FTSGD: An Adaptive Stochastic Gradient Descent Algorithm for Spark MLlib
Keywords
Adaptive Iterative Learning; Asynchronous Stochastic Gradient Descent; MLlib; Spark
Abstract
The proliferation of massive datasets and the surge of interest in big data analytics have popularized a number of novel distributed data processing platforms such as Hadoop and Spark. Their large and growing ecosystems of libraries enable even novices to take advantage of the latest data analytics and machine learning algorithms. However, time-consuming data synchronization and communication in iterative algorithms on large-scale distributed platforms can lead to significant performance inefficiency. MLlib is Spark's scalable library of common machine learning algorithms, many of which employ Stochastic Gradient Descent (SGD) to find minima or maxima iteratively. However, convergence can be very slow if gradient data are synchronized on every iteration. In this work, we optimize the current implementation of SGD in Spark's MLlib by reusing each data partition multiple times within a single iteration to find better candidate weights more efficiently. Whether to run multiple local iterations within each partition is decided dynamically by the 68-95-99.7 rule. We also design a variant of the momentum algorithm to optimize the step size in every iteration. This method uses a new adaptive rule that decreases the step size whenever neighboring gradients show significantly differing directions. Experiments show that our adaptive algorithm is more efficient and can be up to 7 times faster than the original MLlib SGD.
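The adaptive step-size rule described in the abstract (shrink the step whenever neighboring gradients disagree in direction) can be sketched as follows. This is a minimal single-machine illustration under stated assumptions, not the paper's distributed Spark implementation: the quadratic objective, the decay factor of 0.5, and the function names are all hypothetical choices for the sketch.

```python
def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)^2 (hypothetical example).
    return 2.0 * (w - 3.0)


def adaptive_sgd(w0, lr0=0.8, steps=50):
    """Gradient descent with a sign-based adaptive step size.

    When the current gradient and the previous gradient point in
    opposite directions (their product is negative), the step size is
    halved -- a simple stand-in for the paper's rule of decreasing the
    step whenever neighboring gradients differ in direction.
    """
    w, lr = w0, lr0
    prev_g = None
    for _ in range(steps):
        g = grad(w)
        if prev_g is not None and g * prev_g < 0:
            lr *= 0.5  # neighboring gradients disagree: shrink the step
        w -= lr * g
        prev_g = g
    return w


final_w = adaptive_sgd(10.0)  # converges toward the minimum at w = 3
```

With the initial step size of 0.8 the iterate first overshoots the minimum, the gradient flips sign, the step is halved, and the iteration then contracts smoothly toward w = 3. The paper's actual method additionally decides, via the 68-95-99.7 rule, whether to run extra local iterations inside each Spark partition before synchronizing.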
Publication Date
10-26-2018
Publication Title
Proceedings - IEEE 16th International Conference on Dependable, Autonomic and Secure Computing, IEEE 16th International Conference on Pervasive Intelligence and Computing, IEEE 4th International Conference on Big Data Intelligence and Computing and IEEE 3rd Cyber Science and Technology Congress, DASC-PICom-DataCom-CyberSciTec 2018
Number of Pages
822-827
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00-22
Copyright Status
Unknown
Scopus ID
85056862511 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85056862511
STARS Citation
Zhang, Hong; Liu, Zixia; Huang, Hai; and Wang, Liqiang, "FTSGD: An Adaptive Stochastic Gradient Descent Algorithm for Spark MLlib" (2018). Scopus Export 2015-2019. 10572.
https://stars.library.ucf.edu/scopus2015/10572