Draft:Distributed Machine Learning


Distributed Machine Learning (DML)

Distributed Machine Learning (DML) is a field of computer science focused on analyzing data in distributed environments, addressing challenges related to computation, communication, storage, and human interaction. DML has evolved out of contributions from various computing disciplines. Historically, DML algorithms and systems have appeared in the research literature of fields such as Distributed Data Mining[1][2][3], Meta Learning[4], High Performance Data Mining[5], Privacy Preserving Distributed Data Mining[6][7][8], Federated Machine Learning[9], and Multi-Agent Learning[10]. The benefits of parallel and distributed computing in machine learning have been acknowledged in many different fields, including Neural Networks[11], Parallel Genetic Algorithms[12][13][14], Multi-Agent Systems[15], and Data Fusion.

Data Models, Computation, and Topology

DML algorithms are generally categorized by the data models employed at distributed nodes. For relational data, the DML algorithms usually belong to one of the following categories:

  • Homogeneous Data Model: All sites possess the same feature sets but contain different data tuples[16][17].
  • Heterogeneous Data Model: Different sites observe different features, which may or may not overlap[18][19].
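To make the two data models concrete, the following minimal sketch splits one small table both ways. The array `table` and the site names are hypothetical and used only for illustration; the sketch also assumes that, in the heterogeneous case, every site sees all tuples and that rows can be linked through a shared key (here, the row index).

```python
import numpy as np

# Hypothetical table: 6 tuples (rows) described by 4 features (columns).
table = np.arange(24).reshape(6, 4)

# Homogeneous model: every site has all 4 features but different tuples.
site_a_rows, site_b_rows = table[:3, :], table[3:, :]

# Heterogeneous model: sites observe different features of the same tuples.
# The row index acts as the shared key that links the two views (assumption).
site_a_cols, site_b_cols = table[:, :2], table[:, 2:]

print(site_a_rows.shape, site_b_rows.shape)  # (3, 4) (3, 4)
print(site_a_cols.shape, site_b_cols.shape)  # (6, 2) (6, 2)
```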

Algorithms also exist for processing semi-structured and unstructured data[20]. DML algorithms can also be classified based on how they perform compute operations. Tasks may be distributed across processors[21] using Single Instruction Multiple Data (SIMD) or Multiple Instruction Multiple Data (MIMD) paradigms. System architecture ranges from tightly coupled (highly interdependent nodes that usually share resources) to loosely coupled (independent nodes that share little).

Network topology also plays an important role in the design of DML algorithms. One may create an overlay network topology that defines how different nodes communicate with each other. Examples include:

  • Client-Server: Client nodes communicate exclusively with a central server.
  • Peer-to-Peer (P2P): Nodes communicate directly with their neighbors without the need for a central server. P2P DML algorithms and systems often utilize local, asynchronous algorithms[22].

Distributed Representation Construction

Principal Component Analysis (PCA) is a popular technique often used to reduce dimensionality of the data by identifying latent features that capture maximum variance. It is widely used in clustering, classification, and predictive modeling applications.

  • Homogeneous PCA: Computing PCA from homogeneous distributed data is straightforward. Local sites can calculate covariance matrices and transmit them to other nodes (e.g. the server in case of the Client-Server model). Nodes can aggregate these locally computed covariance matrices since covariance is additively decomposable. Global eigenvectors are then broadcast back to local sites for data projection after performing eigen analysis on the globally constructed covariance matrix.
  • Heterogeneous PCA: This data model presents a greater challenge. One approach to address this is the Collective Principal Component Analysis (CPCA) algorithm[23][24]. It works as follows:
  1. Perform local PCA at each node and select dominant eigenvectors.
  2. Transmit a sample of projected data and dominant eigenvectors.
  3. Aggregate projected data from all sites.
  4. Perform PCA on the global set to identify and transform dominant eigenvectors back to the original space.
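The homogeneous case described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names `local_stats` and `global_pca` are hypothetical, and the sketch assumes each site sends its row count, feature sums, and sum of outer products (rather than a finished covariance matrix) so that the global mean is handled exactly when the statistics are added.

```python
import numpy as np

def local_stats(X):
    """Statistics one site sends: row count, feature sums, sum of outer products."""
    return X.shape[0], X.sum(axis=0), X.T @ X

def global_pca(stats):
    """Add up the per-site statistics and run eigen analysis at the aggregator."""
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    outer = sum(s[2] for s in stats)
    mean = total / n
    cov = outer / n - np.outer(mean, mean)   # population covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return mean, eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
sites = [rng.normal(size=(50, 3)), rng.normal(size=(80, 3)) + 1.0]
mean, eigvals, eigvecs = global_pca([local_stats(X) for X in sites])

# Each site projects its own rows onto the broadcast eigenvectors.
projected = [(X - mean) @ eigvecs[:, :2] for X in sites]
```

Only a handful of numbers per site crosses the network, independent of how many rows each site holds.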

While exact Principal Components (PCs) theoretically require reconstructing the original data, global PCs can be computed directly from the projected samples because the local eigenvectors are orthonormal and PCA is preserved under orthogonal linear transformations.
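The four CPCA steps can be sketched roughly as below. This follows the steps as listed in this article rather than the published implementation; the names `top_eigvecs` and `cpca` are hypothetical, and the sketch assumes that all sites can sample the same rows through shared row indices.

```python
import numpy as np

def top_eigvecs(X, k):
    """Mean of X and the k dominant eigenvectors (as columns) of its covariance."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False, bias=True)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:k]
    return mean, vecs[:, order]

def cpca(site_blocks, k_local, sample_idx, k_global):
    """site_blocks: per-site arrays holding the same rows but different features."""
    projections, bases = [], []
    for X in site_blocks:
        mean, V = top_eigvecs(X, k_local)                # step 1: local PCA
        projections.append((X[sample_idx] - mean) @ V)   # step 2: projected sample
        bases.append(V)
    Z = np.hstack(projections)                           # step 3: aggregate
    _, W = top_eigvecs(Z, k_global)                      # step 4: PCA on the global set
    # Map each site's slice of W back to that site's original feature space.
    blocks, start = [], 0
    for V in bases:
        blocks.append(V @ W[start:start + V.shape[1], :])
        start += V.shape[1]
    return np.vstack(blocks)

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
site_blocks = [data[:, :2], data[:, 2:]]
global_pcs = cpca(site_blocks, k_local=2, sample_idx=np.arange(200), k_global=2)
```

In this usage every local eigenvector and every row is kept, so the result coincides with centralized PCA up to sign. With fewer local eigenvectors or a smaller sample the result is only an approximation.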

Distributed Clustering

A wide range of distributed clustering algorithms have been reported in the DML literature. They can also be grouped based on the type of data model supported by the distributed nodes.

Homogeneous Data

Forman and Zhang[25] developed a center-based algorithm relying on the exchange of sufficient statistics, extending earlier parallel clustering research[26]. Similarly, the RACHET (Recursive Agglomeration of Clustering Hierarchies by Encircling Tactic) system[27] merges local dendrograms containing sufficient statistics into a global dendrogram. Both methods iterate until convergence.
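The idea of exchanging sufficient statistics can be illustrated with plain k-means, where the per-cluster sums and counts are all a coordinator needs. The choice of k-means and the function names below are assumptions made for this sketch; it is not the exact algorithm from the cited work.

```python
import numpy as np

def local_sufficient_stats(X, centers):
    """Per-cluster feature sums and member counts for one site's rows."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    k, d = centers.shape
    sums, counts = np.zeros((k, d)), np.zeros(k)
    for j in range(k):
        members = X[labels == j]
        sums[j] = members.sum(axis=0)
        counts[j] = len(members)
    return sums, counts

def distributed_kmeans(sites, centers, iters=20):
    for _ in range(iters):
        stats = [local_sufficient_stats(X, centers) for X in sites]
        sums = sum(s for s, _ in stats)       # statistics are additive across sites
        counts = sum(c for _, c in stats)
        new_centers = centers.copy()
        nonempty = counts > 0                 # empty clusters keep their old center
        new_centers[nonempty] = sums[nonempty] / counts[nonempty][:, None]
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

rng = np.random.default_rng(2)
sites = [rng.normal(loc=0.0, size=(60, 2)), rng.normal(loc=5.0, size=(40, 2))]
init = np.array([[0.0, 0.0], [1.0, 1.0]])
centers = distributed_kmeans(sites, init)
```

Because the sums and counts add up exactly, the centers match what the same procedure would compute on the pooled data, which is the sense in which such methods can be exact.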

Parthasarathy and Ogihara[28] addressed the need for suitable distance metrics in this context, utilizing association rules. The PADMA system[29] performed distributed document clustering and analysis via relevance feedback-based supervised learning. Additional research on parallel and distributed clustering is documented elsewhere[30][31].

Heterogeneous Data

McClean et al.[32] focused on clustering heterogeneous databases, specifically data cubes with attributes from differing domains, using Euclidean distance and Kullback-Leibler information divergence.

Kargupta et al.[33] proposed a method based on CPCA. This technique applies standard clustering to local PCs, aggregates representative points to form global PCs, and then projects local data onto these global PCs to refine clusters. Hierarchical approaches[34] and random projection-based techniques[35] have also been explored.

Strehl and Ghosh[36] proposed an Ensemble Clustering framework for combining multiple clusterings (even with varying cluster counts) by maximizing shared information, quantified via mutual information.
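As a small illustration of how shared information between two clusterings can be measured, the sketch below computes a normalized mutual information score. Normalizing by the geometric mean of the two entropies is one common choice and is assumed here; the sketch only scores agreement and does not implement the search for a consensus clustering.

```python
from collections import Counter
from math import log, sqrt

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())

def normalized_mutual_information(a, b):
    denom = sqrt(entropy(a) * entropy(b))
    return mutual_information(a, b) / denom if denom > 0 else 0.0

# Three clusterings of the same six objects (hypothetical labels).
first = [0, 0, 1, 1, 2, 2]
second = ["x", "x", "y", "y", "z", "z"]   # same groups, different label names
third = [0, 1, 0, 1, 0, 1]                # statistically independent of `first`

print(round(normalized_mutual_information(first, second), 6))  # 1.0
print(round(normalized_mutual_information(first, third), 6))   # 0.0
```

The score depends only on how objects are grouped, not on label names or on the two clusterings having the same number of clusters.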

Click-Stream Analysis: An algorithm for click-stream data[37] generates global clusters by analyzing local cluster descriptions, represented by transaction IDs, and removing duplicates to define maximal large item sets.

Distributed Supervised Learning

Homogeneous Data

Approaches often leverage ensemble learning[38][39][40][41], where multiple base models are combined to improve accuracy.

  • Boosting & Stacking: Fan et al.[42] explored AdaBoost-based ensembles, while Breiman[43] applied Arcing for online data aggregation. Stacking[44] has also been experimentally investigated[45].
  • Meta-Learning: This framework[46][47] constructs classifiers locally, then generates meta-classifiers. This can be achieved by learning from locally generated concepts, blending original data with artificial data, or using voting mechanisms. Techniques include knowledge probing[48] and Java-based distributed systems[49][50]. It may be applied recursively, producing a hierarchy of meta-classifiers.
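The voting mechanism mentioned above can be sketched in a few lines. The base learner `train_nearest_mean` and the toy one-feature data are hypothetical stand-ins; any locally trained classifier could take their place.

```python
from collections import Counter

def train_nearest_mean(points):
    """Hypothetical base learner: predict the class whose mean is closest."""
    means = {}
    for label in {y for _, y in points}:
        xs = [x for x, y in points if y == label]
        means[label] = sum(xs) / len(xs)
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

def majority_vote(classifiers, x):
    """Combine the local models by taking the most frequent prediction."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Three sites with the same feature (homogeneous model) but different tuples.
site_data = [
    [(0.1, 0), (0.4, 0), (2.0, 1), (2.3, 1)],
    [(0.0, 0), (0.6, 0), (1.8, 1), (2.5, 1)],
    [(0.3, 0), (0.2, 0), (2.2, 1), (1.9, 1)],
]
local_models = [train_nearest_mean(points) for points in site_data]

print(majority_vote(local_models, 0.5))  # 0
print(majority_vote(local_models, 2.1))  # 1
```

Only the trained models (or their predictions) are shared, never the raw tuples held at each site.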

Heterogeneous Data

Heterogeneous environments pose challenges as local sites observe only a subset of features. Ensemble-based approaches used for homogeneous DML usually generate high-variance local models[51] and fail to detect the interaction between features observed at different sites. Approaches for supervised learning from heterogeneous data include the following:

  • If the problem is decomposable and detecting feature interactions across sites is not required, ensemble approaches or vertical data partitioning[52] may work. In the general case, detecting feature interactions across different sites is critical. The WoRLD system[53] uses "activation spreading" based on first-order statistics to propagate distribution information across different nodes.
  • Tumer and Ghosh[54] proposed aggregation using order statistics (e.g., "spread" and "trimmed mean" classifiers) to handle high-variance models.
  • Park et al.[55] developed a Fourier spectrum-based technique to aggregate decision trees from different nodes. They identify data subsets that local classifiers fail to predict with high confidence and construct a central classifier for these cases.
  • Kargupta et al. proposed the Collective Data Mining (CDM) framework[56]. CDM learns a function using orthonormal basis functions. It generates local basis coefficients and estimates non-linear cross-terms using a small data sample transmitted to a central site. This has been applied to decision trees (using Fourier representations[57][58][59] and resampling[60]) and multivariate regression (using Wavelet representations[61]).
  • Deep Learning: Cohen et al.[62] introduced algorithms for asynchronous deep neural network training using momentum buffers to handle gradient staleness.
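The order-statistics aggregation mentioned in the list above can be illustrated with two simple combiners. The definitions used here are assumptions for the sketch: the trimmed mean drops a fixed number of extreme outputs before averaging, and "spread" is taken to be the average of the smallest and largest outputs. The cited work should be consulted for the exact formulations.

```python
def trimmed_mean(outputs, trim):
    """Average after dropping the `trim` smallest and `trim` largest outputs."""
    ordered = sorted(outputs)
    kept = ordered[trim:len(ordered) - trim]
    if not kept:
        raise ValueError("trim removes every output")
    return sum(kept) / len(kept)

def spread(outputs):
    """Average of the minimum and maximum outputs (assumed definition)."""
    return (min(outputs) + max(outputs)) / 2

# Hypothetical posterior estimates for one class from five local classifiers;
# the last site produced an outlier.
outputs = [0.62, 0.58, 0.60, 0.65, 0.05]

print(round(sum(outputs) / len(outputs), 3))   # plain mean, pulled down by the outlier
print(round(trimmed_mean(outputs, trim=1), 3))
print(round(spread(outputs), 3))
```

The trimmed mean ignores the single outlying site, which is why it suits ensembles of high-variance local models.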

Scaling Up Using High-Performance Machines

High-performance computing (HPC) is integral to DML for processing massive datasets. There is extensive literature regarding the intersection of HPC and machine learning [63][64][65][66][67][68][69][70].

Peer-to-Peer (P2P) DML

P2P algorithms are generally categorized into four types:

  1. Heuristics-based: Peers learn from local and neighbor data. Example: P2P k-Means by Bandyopadhyay et al.[71].
  2. Broadcast-based: Rely on system-wide messaging[72]. Communication costs scale poorly with network size.
  3. Gossip-based: Peers exchange data with random partners. These provide probabilistic accuracy guarantees for aggregates (sum, average, max)[73][74], though overhead can be high.
  4. Local Algorithms: Rely on local rules to limit message propagation. Communication occurs only when local data distributions change (violating the rule). Originating in graph theory[75][76], these have been applied to association rule mining[77], outlier detection[78], meta-classification[79][80][81], Eigen-monitoring[82], Decision Trees[83], SVMs[84][85], neural networks[86], Top-k monitoring[87], and relational learning[88][89]. They also support distributed optimization[90][91][92][93][94][95].
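The gossip-based category above can be illustrated with a pairwise averaging variant for computing a network-wide average. This is a minimal simulation under stated assumptions: any two peers can contact each other (a fully connected overlay), exchanges happen one at a time, and the function name `gossip_average` is hypothetical.

```python
import random

def gossip_average(values, rounds, seed=0):
    """Pairwise gossip: two random peers replace their values with the pair's mean."""
    rng = random.Random(seed)
    state = list(values)
    for _ in range(rounds):
        i, j = rng.sample(range(len(state)), 2)
        state[i] = state[j] = (state[i] + state[j]) / 2
    return state

local_values = [10.0, 0.0, 4.0, 6.0, 20.0, 2.0]   # true average is 7.0
estimates = gossip_average(local_values, rounds=500)
print([round(e, 4) for e in estimates])
```

Each exchange preserves the sum of all values, so every peer's estimate drifts toward the global average without any peer ever seeing the full data set.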

Privacy Preserving Distributed Machine Learning

Privacy-sensitive DML algorithms typically adopt a model of privacy and try to deliver protection under that model. Two common approaches are:

  1. Secure Multi-Party Computation (SMC): A collection of distributed privacy-preserving algorithms has been proposed for computing statistical aggregates and machine learning models based on SMC protocols. Examples include a privacy-preserving technique for constructing decision trees[96], association rule mining from homogeneous and heterogeneous data[97][98], secure sum computation, and secure scalar product computation[99].
  2. Data Perturbation: Data is distorted via randomized techniques before pattern extraction. Examples include randomized value distortion for decision trees[100] and randomized masking[101]. However, simple additive noise has been shown to be insufficient for robust privacy protection[102].
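Secure sum computation, one of the SMC building blocks mentioned above, can be sketched as a ring protocol simulated in a single process. The sketch assumes semi-honest sites that do not collude and a true total smaller than `modulus`; the function name and the example counts are hypothetical.

```python
import random

def secure_sum(private_values, modulus, seed=None):
    """Ring-based secure sum: each site only ever sees a masked running total."""
    rng = random.Random(seed)
    mask = rng.randrange(modulus)             # known only to the initiating site
    running = (mask + private_values[0]) % modulus
    for value in private_values[1:]:          # token passed from site to site
        running = (running + value) % modulus
    return (running - mask) % modulus         # initiator removes its mask

site_counts = [120, 45, 300, 17]
total = secure_sum(site_counts, modulus=10_000, seed=42)
print(total)  # 482
```

Because the running total is offset by a uniformly random mask, an intermediate site learns nothing about the values added before it, provided its neighbors do not pool what they saw.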

Federated Machine Learning

Federated Learning is a specific subset of DML focusing on iterative model training (typically deep learning) across distributed nodes without centralized data storage. Further details are available elsewhere[103].
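The iterative training loop can be sketched with a weighted model-averaging scheme in the style of federated averaging. To keep the example short, it assumes a linear least-squares model instead of a deep network, and the function names, learning rate, and synthetic data are all hypothetical.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few gradient steps of least-squares regression on one node's data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_averaging(nodes, dim, rounds=50):
    w = np.zeros(dim)
    total = sum(len(y) for _, y in nodes)
    for _ in range(rounds):
        # Each node trains locally; only model weights leave the node.
        updates = [(local_update(w, X, y), len(y)) for X, y in nodes]
        # The server averages the weights, weighting each node by its data size.
        w = sum(u * n for u, n in updates) / total
    return w

rng = np.random.default_rng(3)
true_w = np.array([2.0, -1.0])
nodes = []
for size in (30, 50, 20):
    X = rng.normal(size=(size, 2))
    nodes.append((X, X @ true_w + 0.01 * rng.normal(size=size)))

w = federated_averaging(nodes, dim=2)
print(np.round(w, 2))
```

The raw rows in `nodes` never leave their node in this loop; only the weight vectors are exchanged in each round.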

  1. ^ Kargupta H. and Sivakumar K. Existential Pleasures of Distributed Data Mining. In Data Mining: Next Generation Challenges and Future Directions, edited by H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha, MIT/AAAI Press, 2004.
  2. ^ Kargupta, H., & Chan, P. Advances in Distributed Data Mining. MIT/AAAI Press, 2000.
  3. ^ DML Bibliography. https://agnik.com/sparks/dmlbib.html
  4. ^ Chan, Philip K.; Stolfo, Salvatore J. (1995). "A Comparative Evaluation of Voting and Meta-learning on Partitioned Data". Machine Learning Proceedings 1995. pp. 90–98. doi:10.1016/B978-1-55860-377-6.50020-7. ISBN 978-1-55860-377-6.
  5. ^ M. Joshi, E. Han, G. Karypis, and V. Kumar. Parallel algorithms for data mining. In CRPC Parallel Computing Handbook. Morgan Kaufmann, 2000. https://hdl.handle.net/11299/215466
  6. ^ Kantarcioglu, M.; Clifton, C. (2004). "Privacy-preserving distributed mining of association rules on horizontally partitioned data". IEEE Transactions on Knowledge and Data Engineering. 16 (9): 1026–1037. doi:10.1109/TKDE.2004.45.
  7. ^ Kargupta, H.; Datta, S.; Wang, Q.; Krishnamoorthy Sivakumar (2003). "On the privacy preserving properties of random data perturbation techniques". Third IEEE International Conference on Data Mining. pp. 99–106. doi:10.1109/ICDM.2003.1250908. ISBN 0-7695-1978-4.
  8. ^ Gilburd, B.; Schuster, A.; Wolff, R. (2004). "Privacy-preserving data mining on data grids in the presence of malicious participants". Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004. pp. 225–234. doi:10.1109/HPDC.2004.1323540. ISBN 0-7695-2175-4.
  9. ^ Brendan McMahan, H.; Moore, Eider; Ramage, Daniel; Hampson, Seth; Blaise Agüera y Arcas (2016). "Communication-Efficient Learning of Deep Networks from Decentralized Data". arXiv:1602.05629 [cs.LG].
  10. ^ A. Joshi. To learn or not to learn. In Gerhard Weiß and Sundip Sen, editors, Adaption and Learning in Multi-Agent Systems, number 1042 in Lecture Notes in Computer Science: Lecture Notes in Artificial Intelligence, pages 127–139, New York, 1995. Springer-Verlag. Proceedings IJCI'95 Workshop, Montreal, Canada, 1995
  11. ^ McClelland, James L.; Rumelhart, David E. (1986). Parallel Distributed Processing. doi:10.7551/mitpress/5236.001.0001. ISBN 978-0-262-29140-8.
  12. ^ Holland, J. H. Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. U Michigan Press, 1975.
  13. ^ David E. Goldberg, “Genetic Algorithms in Search, Optimization and Machine Learning,” Kluwer Academic Publishers, Boston, MA, 1989.
  14. ^ Cantú-Paz, Erick. "A survey of parallel genetic algorithms." Calculateurs paralleles, reseaux et systems repartis 10.2 (1998): 141-171.
  15. ^ A. Joshi. To learn or not to learn. In Gerhard Weiß and Sundip Sen, editors, Adaption and Learning in Multi-Agent Systems, number 1042 in Lecture Notes in Computer Science: Lecture Notes in Artificial Intelligence, pages 127–139, New York, 1995. Springer-Verlag. Proceedings IJCI'95 Workshop, Montreal, Canada, 1995.
  16. ^ H. Kargupta, I. Hamzaoglu, and B. Stafford. Scalable, distributed data mining using an agent based architecture. In David Heckerman, Heikki Mannila, Daryl Pregibon, and Ramasamy Uthurusamy, editors, Proceedings of Knowledge Discovery And Data Mining, pages 211–214, Menlo Park, CA, 1997. AAAI Press
  17. ^ S. Stolfo et al. Jam: Java agents for meta-learning over distributed databases. In Proceedings Third International Conference on Knowledge Discovery and Data Mining, pages 74–81, Menlo Park, CA, 1997. AAAI Press
  18. ^ Kargupta, Hillol; Huang, Weiyun; Sivakumar, Krishnamoorthy; Johnson, Erik (2001). "Distributed Clustering Using Collective Principal Component Analysis". Knowledge and Information Systems. 3 (4): 422–448. doi:10.1007/PL00011677.
  19. ^ Alexander Strehl and Joydeep Ghosh. Cluster ensembles – a knowledge reuse framework for combining partitionings. In Proceedings of AAAI 2002, Edmonton, Canada, July 2002. AAAI
  20. ^ Bauer, Eric; Kohavi, Ron (1999). "An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants". Machine Learning. 36 (1–2): 105–139. doi:10.1023/A:1007515423169.
  21. ^ M. Joshi, E. Han, G. Karypis, and V. Kumar. Parallel algorithms for data mining. In CRPC Parallel Computing Handbook. Morgan Kaufmann, 2000. https://hdl.handle.net/11299/215466
  22. ^ Schuster, Assaf; Wolff, Ran; Trock, Dan (2005). "A high-performance distributed algorithm for mining association rules". Knowledge and Information Systems. 7 (4): 458–475. doi:10.1007/s10115-004-0176-3.
  23. ^ Kargupta, Hillol; Huang, Weiyun; Sivakumar, Krishnamoorthy; Park, Byung-Hoon; Wang, Shuren (2000). "Collective Principal Component Analysis from Distributed, Heterogeneous Data". Principles of Data Mining and Knowledge Discovery. Lecture Notes in Computer Science. Vol. 1910. pp. 452–457. doi:10.1007/3-540-45372-5_50. ISBN 978-3-540-41066-9.
  24. ^ Kargupta, Hillol; Huang, Weiyun; Sivakumar, Krishnamoorthy; Johnson, Erik (2001). "Distributed Clustering Using Collective Principal Component Analysis". Knowledge and Information Systems. 3 (4): 422–448. doi:10.1007/PL00011677.
  25. ^ G. Forman and B. Zhang. Distributed data clustering can be efficient and exact. In SIGKDD Explorations, volume 2 of 2, pages 34–38. ACM Press, New York, 2000
  26. ^ Zhang, Bin; Hsu, Meichun; Forman, George (2000). "Accurate Recasting of Parameter Estimation Algorithms Using Sufficient Statistics for Efficient Parallel Speed-Up". Principles of Data Mining and Knowledge Discovery. Lecture Notes in Computer Science. Vol. 1910. pp. 243–254. doi:10.1007/3-540-45372-5_24. ISBN 978-3-540-41066-9.
  27. ^ Samatova, Nagiza F.; Ostrouchov, George; Geist, Al; Melechko, Anatoli V. (2002). "RACHET: An Efficient Cover-Based Merging of Clustering Hierarchies from Distributed Datasets". Distributed and Parallel Databases. 11 (2): 157–180. doi:10.1023/A:1013988102576.
  28. ^ Parthasarathy, Srinivasan; Ogihara, Mitsunori (2000). "Clustering Distributed Homogeneous Datasets". Principles of Data Mining and Knowledge Discovery. Lecture Notes in Computer Science. Vol. 1910. pp. 566–574. doi:10.1007/3-540-45372-5_67. ISBN 978-3-540-41066-9.
  29. ^ Bauer, Eric; Kohavi, Ron (1999). "An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants". Machine Learning. 36 (1–2): 105–139. doi:10.1023/A:1007515423169.
  30. ^ Dhillon, Inderjit S.; Modha, Dharmendra S. (2001). "Concept Decompositions for Large Sparse Text Data Using Clustering". Machine Learning. 42 (1–2): 143–175. doi:10.1023/A:1007612920971.
  31. ^ Zhang, Bin; Hsu, Meichun; Forman, George (2000). "Accurate Recasting of Parameter Estimation Algorithms Using Sufficient Statistics for Efficient Parallel Speed-Up". Principles of Data Mining and Knowledge Discovery. Lecture Notes in Computer Science. Vol. 1910. pp. 243–254. doi:10.1007/3-540-45372-5_24. ISBN 978-3-540-41066-9.
  32. ^ S. McClean, B. Scotney, and K. Greer. Conceptual clustering heterogeneous distributed databases. In Workshop on Distributed and Parallel Knowledge Discovery, Boston, MA, USA, 2000.
  33. ^ Kargupta, Hillol; Huang, Weiyun; Sivakumar, Krishnamoorthy; Johnson, Erik (2001). "Distributed Clustering Using Collective Principal Component Analysis". Knowledge and Information Systems. 3 (4): 422–448. doi:10.1007/PL00011677.
  34. ^ E. Johnson and H. Kargupta. Collective, hierarchical clustering from distributed, heterogeneous data. In Lecture Notes in Computer Science, volume 2007. Springer-Verlag, 1999.
  35. ^ R. A. Wolff, K. Bhaduri, and H. Kargupta. Local L2-Thresholding Based Data Mining in Peer-to-Peer Systems. In Proceedings of the SIAM International Conference on Data Mining, 2006.
  36. ^ Alexander Strehl and Joydeep Ghosh. Cluster ensembles – a knowledge reuse framework for combining partitionings. In Proceedings of AAAI 2002, Edmonton, Canada, July 2002. AAAI.
  37. ^ X. Wang and H. Kargupta. Distributed data mining for e-business. In Proceedings of the Seventh International Conference on High Performance Computing (HiPC), Bangalore, India, 2000.
  38. ^ T. G. Dietterich. Ensemble methods in machine learning. In First International Workshop on Multiple Classifier Systems, pages 1–15, New York, 2000. Springer-Verlag.
  39. ^ T. G. Dietterich. Machine learning research: Four current directions. AI Magazine, 18(4):97–136, 1997.
  40. ^ L.K. Hansen and P. Salamon. Neural network ensembles. IEEE Trans. Pattern Analysis and Machine Intelligence, 12:993–1001, 1990.
  41. ^ R. E. Schapire. A brief introduction to boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999.
  42. ^ W. Fan, S. Stolfo, and J. Zhang. The application of adaboost for distributed, scalable and on-line learning. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 362–366, 1999.
  43. ^ Leo Breiman. Arcing the edge. Technical Report 486, Statistics Department, University of California at Berkeley, 1997.
  44. ^ D. H. Wolpert. Stacked generalization. Neural Networks, 5:241–259, 1992.
  45. ^ K. Tumer and J. Ghosh. Analysis of decision boundaries in linearly combined neural classifiers. Pattern Recognition, 29(2):341–348, 1996.
  46. ^ Chan, P., & Stolfo, S. J. A Comparative Evaluation of Voting and Meta-learning on Partitioned Data. In Proceedings of the Twelfth International Conference on Machine Learning (pp. 90–98), 1995.
  47. ^ S. Stolfo et al. Jam: Java agents for meta-learning over distributed databases. In Proceedings Third International Conference on Knowledge Discovery and Data Mining, pages 74–81, Menlo Park, CA, 1997. AAAI Press
  48. ^ H. Guo and J. Slocum. Classification of distributed data. In Workshop on Distributed and Parallel Knowledge Discovery, Boston, MA, USA, 2000
  49. ^ A. L. Prodromidis, P. K. Chan, and S. J. Stolfo. Meta-learning in distributed data mining systems: Issues and approaches. In Advances in Distributed and Parallel Knowledge Discovery, pages 81–114. AAAI/MIT Press, 2000.
  50. ^ P. K. Chan and S. J. Stolfo. Toward parallel and distributed learning by meta-learning. In AAAI/MIT Press, editor, Advances in Knowledge Discovery and Data Mining, pages 227–240, Menlo Park, CA, 1996.
  51. ^ K. Tumer and J. Ghosh. Error correlation and error reduction in ensemble classifiers. Connection Science, Special issue on combining artificial neural networks: ensemble approaches, 8(3 & 4):385–404, 1996.
  52. ^ F. Provost and D. Jensen. Efficient statistics for scaling up learning. In Proceedings of the International Conference on Machine Learning, 2000
  53. ^ F. J. Provost and D. N. Hennessy. Scaling up: Distributed machine learning with cooperation. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 74–79, 1996
  54. ^ K. Tumer and J. Ghosh. Robust order statistics based ensembles for distributed data mining. In H. Kargupta and P. Chan, editors, Advances in Distributed and Parallel Knowledge Discovery, pages 185–210, Menlo Park, CA, 2000. AAAI/MIT Press.
  55. ^ B. Park, R. Ayyagari, and H. Kargupta. A fourier analysis-based approach to decision tree ensembles. In Proceedings of the SIAM International Conference on Data Mining (SDM01), Chicago, USA, 2001.
  56. ^ H. Kargupta, B. Park, D. Hershberger, and E. Johnson. Collective data mining: A new perspective toward distributed data mining. In Advances in Distributed and Parallel Knowledge Discovery, pages 133–184. AAAI/MIT Press, 2000
  57. ^ H. Kargupta and B. Park. Mining decision trees from data streams in a mobile environment. In Proceedings of the IEEE International Conference on Data Mining (ICDM02), Maebashi City, Japan, 2002
  58. ^ H. Kargupta, R. Ayyagari, and K. Sivakumar. The fourier transform of decision trees. In Workshop on Mathematical Foundations of Data Mining at the IEEE International Conference on Data Mining, 2002.
  59. ^ S. Kushwaha, H. Kargupta, and K. Sivakumar. The fourier spectrum of decision trees. In Workshop on Mathematical Foundations of Data Mining at the IEEE International Conference on Data Mining, 2003
  60. ^ K. Sivakumar and H. Kargupta. A resampling technique for learning the fourier spectrum of decision trees. In Proceedings of the IEEE International Conference on Data Mining (ICDM03), Melbourne, Florida, 2003
  61. ^ D. Hershberger and H. Kargupta. Distributed multivariate regression using wavelet-based collective data mining. Journal of Parallel and Distributed Computing, 61(3):372–400, 2001
  62. ^ Cohen, Gilad, et al. "Distributed Asynchronous Deep Learning." arXiv preprint arXiv:2106.11295 (2021)
  63. ^ M. Joshi, E. Han, G. Karypis, and V. Kumar. Parallel algorithms for data mining. In CRPC Parallel Computing Handbook. Morgan Kaufmann, 2000
  64. ^ M. J. Zaki and C-T. Ho. Large-Scale Parallel Data Mining. LNAI State-of-the-Art Survey, Volume 1759. Springer-Verlag, 2000
  65. ^ D. B. Skillicorn. Strategies for parallel data mining. IEEE Concurrency, 7(4):26–35, 1999
  66. ^ A. A. Freitas and S. H. Lavington. Mining Very Large Databases with Parallel Processing. Kluwer Academic Publishers, 1998
  67. ^ H. Kargupta and P. Chan. Advances in Distributed and Parallel Knowledge Discovery. AAAI/MIT Press, 2000
  68. ^ H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha. Data Mining: Next Generation Challenges and Future Directions. AAAI/MIT Press, 2004.
  69. ^ C. Giannella, K. Bhaduri, H. Kargupta. Distributed Data Mining. In The Handbook of Data Mining, Lawrence Erlbaum Associates, 2003
  70. ^ H. Kargupta, R. Bhargava, K. Liu, M. Powers, P. Blair, S. Bushra, J. Dull, K. Sarkar, M. Klein, M. Vasa, and D. Handy. Veds: A mobile information system for public health monitoring. In Proceedings of the KDD-2004, Seattle, 2004
  71. ^ S. Bandyopadhyay, C. Giannella, U. Maulik, H. Kargupta, K. Liu, and S. Datta. Clustering distributed data streams in peer-to-peer environments. Information Sciences, 176(14):1952–1985, 2006
  72. ^ I. Sharfman, A. Schuster, and D. Keren. A geometric approach to monitoring threshold functions over distributed data streams. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pages 301–312, 2006.
  73. ^ D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, pages 482–491, 2003.
  74. ^ S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Gossip algorithms: Design, analysis and applications. In Proceedings of the 24th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 2005
  75. ^ Y. Afek, S. Kutten, and M. Yung. The local detection paradigm and its application to self-stabilization. Theoretical Computer Science, 186(1-2):199–229, 1997.
  76. ^ N. Linial. Locality in distributed graph algorithms. SIAM Journal on Computing, 21(1):193–201, 1992.
  77. ^ Schuster, Assaf, Ran Wolff, and Dan Trock. "A high-performance distributed algorithm for mining association rules." Knowledge and Information Systems 7.4 (2005): 458-475.
  78. ^ Branch, J.W., Szymanski, B., Giannella, C., Wolff, R., & Kargupta, H. (2006). In-network outlier detection in wireless sensor networks. In Proceedings of the 26th IEEE International Conference on Distributed Computing Systems (ICDCS'06) (pp. 51-51). IEEE
  79. ^ R. Wolff, K. Bhaduri, and H. Kargupta. A generic local algorithm for distributed data mining applications in large peer-to-peer networks. In Proceedings of the 2006 SIAM International Conference on Data Mining
  80. ^ Bhaduri, K., Wolff, R., Giannella, C., & Kargupta, H. (2008). Distributed decision-tree induction in peer-to-peer systems. Statistical Analysis and Data Mining: The ASA Data Science Journal, 1(2), 85-103
  81. ^ K. Bhaduri, R. Wolff, C. Giannella, and H. Kargupta. Distributed decision-tree induction in peer-to-peer systems. In Proceedings of the SIAM International Conference on Data Mining (SDM), 2008.
  82. ^ K. Bhaduri and H. Kargupta. A scalable local algorithm for distributed multivariate regression in peer-to-peer networks. In Proceedings of the 2008 SIAM International Conference on Data Mining, 2008.
  83. ^ K. Bhaduri, R. Wolff, C. Giannella, and H. Kargupta. Distributed decision-tree induction in peer-to-peer systems. Statistical Analysis and Data Mining, 1(2):85–103, 2008
  84. ^ P. A. Forero, A. Cano, and G. B. Giannakis. Consensus-based distributed support vector machines. Journal of Machine Learning Research, 11:1663–1707, 2010
  85. ^ H. Zuo, H. Kargupta, and K. Bhaduri. Distributed support vector machines in peer-to-peer networks. In Proceedings of the Workshop on Large-Scale Data Mining: Theory and Applications at KDD 2010
  86. ^ J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492, 2016
  87. ^ K. Bhaduri, S. Das, K. Liu, and H. Kargupta. Distributed top-k inner product monitoring in peer-to-peer networks. In Proceedings of the Workshop on Distributed Data Mining at KDD 2009
  88. ^ H. Kargupta, H. Zuo, and K. Bhaduri. Distributed relational learning in peer-to-peer networks. In Proceedings of the Workshop on Distributed Data Mining at ICDM 2009
  89. ^ H. Zuo and H. Kargupta. Distributed graph mining in peer-to-peer networks. In Proceedings of the Workshop on Mining and Learning with Graphs at KDD 2010.
  90. ^ A. Nedic and A. Ozdaglar. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1):48–61, 2009
  91. ^ J. C. Duchi, A. Agarwal, and M. J. Wainwright. Dual averaging for distributed optimization: Convergence analysis and network scaling. IEEE Transactions on Automatic Control, 57(3):592–606, 2012
  92. ^ S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011
  93. ^ K. I. Tsianos, S. Lawlor, and M. G. Rabbat. Consensus-based distributed optimization: Practical issues and applications in large-scale machine learning. In Proceedings of the 50th Allerton Conference on Communication, Control, and Computing, pages 1543–1550, 2012
  94. ^ M. Rabbat and R. Nowak. Distributed optimization in sensor networks. In Proceedings of the 3rd International Symposium on Information Processing in Sensor Networks, pages 20–27, 2004
  95. ^ W. Shi, Q. Ling, G. Wu, and W. Yin. EXTRA: An exact first-order algorithm for decentralized consensus optimization. SIAM Journal on Optimization, 25(2):944–966, 2015.
  96. ^ E. Johnson and H. Kargupta. Collective, hierarchical clustering from distributed, heterogeneous data. In Lecture Notes in Computer Science, volume 2007. Springer-Verlag, 1999
  97. ^ M. Kantarcioglu and C. Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data. In SIGMOD Workshop on DMKD, Madison, WI, June 2002
  98. ^ J. Vaidya and C. Clifton. Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002
  99. ^ C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Y. Zhu. Tools for privacy preserving distributed data mining. SIGKDD Explorations, 4(2):28–34, 2002
  100. ^ R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pages 439–450, 2000
  101. ^ S. R. M. Oliveira and O. R. Zaiane. Privacy preserving clustering by data transformation. In Proceedings of the 18th Brazilian Symposium on Databases, pages 304–318, 2003
  102. ^ H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. On the privacy preserving properties of random data perturbation techniques. In Proceedings of the IEEE International Conference on Data Mining (ICDM), 2003.
  103. ^ Yang, Q., Liu, Y., Chen, T., & Tong, Y. (2019). Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2), 1-19
