Accelerating I/O Performance of SVM on HDFS
Abstract
The Hadoop Distributed File System (HDFS) is a major distributed file system for commodity clusters and cloud computing. Its extensive scalability and replica-based fault tolerance scheme make it well suited for data-intensive applications. Due to the tremendous growth of data, many computation-centric applications have also become data-intensive. However, they do not perform optimally on HDFS, which leaves considerable room for performance optimization. In this paper we ported an MPI-SVM solver, originally developed for HPC environments, to HDFS. Specifically, we used a deterministic scheduling method to improve the data pre-processing stage, which requires a large number of I/O operations. With this improvement, the read pattern is balanced across nodes. The time ratio between the longest process and the shortest process was reduced by 60%. The average read time was also reduced by 78%. The amount of data served by each node also showed a smaller variance than with the originally ported SVM algorithm. We believe that our design avoids the overhead introduced by remote I/O operations, which will be beneficial to many algorithms when coping with large-scale data.
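The abstract does not describe the scheduling method itself, so the following is only an illustrative sketch of what a deterministic, locality-aware read assignment could look like, not the authors' algorithm. It assumes a hypothetical mapping from block IDs to the nodes that hold a replica (on a real cluster this information would come from the NameNode), and the names assign_blocks and imbalance are invented for this example. Each block is assigned to the least-loaded node that stores a replica, with ties broken by node name, so every process can compute the same plan independently and read only local data.

```python
from collections import defaultdict


def assign_blocks(replicas):
    """Map each block to one of the nodes that holds a replica of it.

    replicas: dict of block id -> list of node names storing a replica
    (hypothetical input, assumed for illustration only).
    """
    load = defaultdict(int)  # number of blocks assigned to each node so far
    plan = {}
    for block in sorted(replicas):  # fixed iteration order keeps the plan deterministic
        # Choose the least-loaded replica holder; break ties by node name.
        node = min(replicas[block], key=lambda n: (load[n], n))
        plan[block] = node
        load[node] += 1
    return plan


def imbalance(plan):
    """Ratio of the most-loaded to the least-loaded node among nodes given work."""
    counts = defaultdict(int)
    for node in plan.values():
        counts[node] += 1
    return max(counts.values()) / min(counts.values())


# Six blocks on three nodes, with uneven replica placement (made-up data).
replicas = {
    "b0": ["n1", "n2", "n3"],
    "b1": ["n1", "n2", "n3"],
    "b2": ["n1", "n2", "n3"],
    "b3": ["n1", "n2"],
    "b4": ["n1", "n3"],
    "b5": ["n2", "n3"],
}
plan = assign_blocks(replicas)
ratio = imbalance(plan)
```

Because the plan depends only on the replica map and a fixed ordering, no coordination between processes is needed at read time. A greedy choice like this one does not guarantee an optimal balance for every replica layout; it is meant only to make the idea of a balanced, local-only read pattern concrete.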
Publication Date
12-6-2016
Publication Title
Proceedings - IEEE International Conference on Cluster Computing, ICCC
Number of Pages
132-133
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/CLUSTER.2016.71
Copyright Status
Unknown
Scopus ID
85013226550 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85013226550
STARS Citation
Ye, Mao; Wang, Jun; Yin, Jiangling; and Zhang, Xuhong, "Accelerating I/O Performance of SVM on HDFS" (2016). Scopus Export 2015-2019. 4237.
https://stars.library.ucf.edu/scopus2015/4237