Taming Big Data SVM With Locality-Aware Scheduling
Keywords
data locality; HDFS; MPI; parallel SVM; read performance
Abstract
Incorporating the MPI programming model into data-intensive file systems for big data applications is an important topic in performance optimization research. In this paper, we ported an MPI-SVM solver, originally developed for HPC environments, to the Hadoop Distributed File System (HDFS). We analyzed the performance bottlenecks that the SVM solver faces on HDFS. Storage expansion on HDFS is known to produce a skewed data distribution. As a result, we found that some hot nodes consistently receive concentrated I/O requests while other nodes consistently issue remote requests. These remote requests lengthen the I/O delays on hot nodes, which creates a performance bottleneck for our solver. We therefore improved the data preprocessing phase, which requires a large number of I/O operations, by applying a deterministic scheduling method. With this improvement, the read pattern was balanced across nodes. The ratio between the run times of the longest and shortest processes was reduced by 60%, and the average read time was reduced by 78%. The amount of data served by each node also showed a small variance compared with the originally ported SVM algorithm. We believe our design avoids the overhead introduced by remote I/O operations, which will benefit many algorithms that process large-scale data.
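The abstract does not spell out the scheduling method, so the following is only a minimal sketch of one way a deterministic, locality-aware read assignment could work. It assumes each data chunk has a known list of replica-holding nodes and that one reader process runs per node. The names `assign_chunks`, `remote_reads`, `replicas`, and `nodes` are hypothetical and are not taken from the paper or from any HDFS or MPI API.

```python
from collections import defaultdict


def assign_chunks(replicas, nodes):
    """Greedy, deterministic assignment of chunks to reader nodes.

    replicas: dict mapping chunk id -> list of nodes holding a copy (assumed input).
    nodes: list of nodes that run a reader process (assumed input).
    Returns a dict mapping node -> list of chunk ids it should read.
    """
    load = {n: 0 for n in nodes}
    plan = defaultdict(list)
    # Handle chunks with the fewest local options first, so that a chunk
    # stored on a single node is not crowded out by more flexible chunks.
    for chunk in sorted(replicas, key=lambda c: (len(replicas[c]), c)):
        local = [n for n in replicas[chunk] if n in load]
        # Fall back to a remote read only if no reader node holds a copy.
        candidates = local or list(nodes)
        # Least-loaded candidate wins; ties break on node name, which keeps
        # the plan identical on every process that computes it.
        target = min(candidates, key=lambda n: (load[n], n))
        plan[target].append(chunk)
        load[target] += 1
    return dict(plan)


def remote_reads(plan, replicas):
    """Count chunks assigned to a node that does not hold a local copy."""
    return sum(
        1
        for node, chunks in plan.items()
        for c in chunks
        if node not in replicas[c]
    )


# Skewed placement: every chunk has a copy on the "hot" node n1.
nodes = ["n1", "n2", "n3"]
replicas = {
    "c0": ["n1", "n2"],
    "c1": ["n1", "n2"],
    "c2": ["n1", "n3"],
    "c3": ["n1"],
    "c4": ["n1", "n2"],
    "c5": ["n1", "n3"],
}
plan = assign_chunks(replicas, nodes)
for node in nodes:
    print(node, plan.get(node, []))
print("remote reads:", remote_reads(plan, replicas))
```

In this toy placement every chunk could be read from n1, yet the greedy plan gives each node two chunks and needs no remote reads. Because the plan depends only on the shared replica map, each process could compute it independently without coordination; whether the paper's scheduler works this way is not stated in the abstract.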
Publication Date
1-11-2017
Publication Title
Proceedings - 2016 International Conference on Advanced Cloud and Big Data, CBD 2016
Number of Pages
37-44
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/CBD.2016.017
Copyright Status
Unknown
Scopus ID
85013151764 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85013151764
STARS Citation
Ye, Mao; Wang, Jun; Yin, Jiangling; and Han, Dezhi, "Taming Big Data SVM With Locality-Aware Scheduling" (2017). Scopus Export 2015-2019. 7179.
https://stars.library.ucf.edu/scopus2015/7179