FTSGD: An Adaptive Stochastic Gradient Descent Algorithm for Spark MLlib

Keywords

Adaptive Iterative Learning; Asynchronous Stochastic Gradient Descent; MLlib; Spark

Abstract

The proliferation of massive datasets and the surge of interest in big data analytics have popularized a number of novel distributed data processing platforms such as Hadoop and Spark. Their large and growing ecosystems of libraries enable even novices to take advantage of the latest data analytics and machine learning algorithms. However, time-consuming data synchronization and communication in iterative algorithms on large-scale distributed platforms can cause significant performance inefficiency. MLlib is Spark's scalable library of common machine learning algorithms, many of which employ Stochastic Gradient Descent (SGD) to find minima or maxima iteratively. Convergence can be very slow, however, if gradient data are synchronized on every iteration. In this work, we optimize the current implementation of SGD in Spark's MLlib by reusing each data partition multiple times within a single iteration to find better candidate weights more efficiently. Whether to use multiple local iterations within each partition is decided dynamically by the 68-95-99.7 rule. We also design a variant of the momentum algorithm to optimize the step size in every iteration. This method uses a new adaptive rule that decreases the step size whenever neighboring gradients show significantly differing directions. Experiments show that our adaptive algorithm is more efficient and can be 7 times faster than MLlib's original SGD.
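
The two ideas in the abstract (reusing a partition for several local iterations, and shrinking the step size when neighboring gradients disagree) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the 68-95-99.7 rule is applied or what the exact step-size rule is. The sketch assumes a least-squares linear model, treats "differing directions" as a negative dot product between consecutive gradients, and uses a three-sigma check on the gradient norm as a hypothetical criterion for continuing local iterations. The names `adapt_step`, `within_three_sigma`, `local_sgd`, and the `shrink` factor are all illustrative.

```python
import statistics


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def adapt_step(step, prev_grad, grad, shrink=0.5):
    """Shrink the step when consecutive gradients point in opposing directions.

    Assumption: "opposing" means a negative dot product; `shrink` is hypothetical.
    """
    if prev_grad is not None and dot(prev_grad, grad) < 0:
        return step * shrink
    return step


def within_three_sigma(value, history):
    """True if `value` lies within 3 standard deviations of the mean of `history`.

    Under the 68-95-99.7 rule, about 99.7% of normally distributed values pass.
    With fewer than two observations there is nothing to compare against.
    """
    if len(history) < 2:
        return True
    mean = statistics.mean(history)
    sd = statistics.pstdev(history)
    return abs(value - mean) <= 3 * sd


def gradient(w, partition):
    """Mean squared-error gradient of a linear model over (features, label) pairs."""
    g = [0.0] * len(w)
    for x, y in partition:
        err = dot(w, x) - y
        for i, xi in enumerate(x):
            g[i] += 2 * err * xi / len(partition)
    return g


def local_sgd(w, partition, step, max_local_iters=5):
    """Reuse one partition for several local updates before synchronizing.

    Hypothetical stopping rule: stop reusing the partition once the gradient
    norm becomes a three-sigma outlier relative to earlier local iterations.
    """
    prev_grad = None
    norms = []
    for _ in range(max_local_iters):
        grad = gradient(w, partition)
        norm = dot(grad, grad) ** 0.5
        if not within_three_sigma(norm, norms):
            break
        norms.append(norm)
        step = adapt_step(step, prev_grad, grad)
        w = [wi - step * gi for wi, gi in zip(w, grad)]
        prev_grad = grad
    return w, step


# Toy partition drawn from y = 2x; the local updates move w toward 2.
partition = [([1.0], 2.0), ([2.0], 4.0)]
weights, final_step = local_sgd([0.0], partition, step=0.1)
```

In a Spark setting, something like `local_sgd` would run independently inside each partition, with the resulting weights combined afterwards; how that combination is done is outside what the abstract describes.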

Publication Date

10-26-2018

Publication Title

Proceedings - IEEE 16th International Conference on Dependable, Autonomic and Secure Computing, IEEE 16th International Conference on Pervasive Intelligence and Computing, IEEE 4th International Conference on Big Data Intelligence and Computing and IEEE 3rd Cyber Science and Technology Congress, DASC-PICom-DataCom-CyberSciTec 2018

Pages

822-827

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00-22

Scopus ID

85056862511 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85056862511
