Accelerating I/O Performance of SVM on HDFS

Abstract

The Hadoop Distributed File System (HDFS) is a major distributed file system for commodity clusters and cloud computing. Its extensive scalability and replica-based fault tolerance scheme make it well suited for data-intensive applications. Due to the tremendous growth of data, many computation-centric applications have also become data-intensive. However, they are not optimized for HDFS, which leaves plenty of room for performance improvement. In this paper we ported an MPI-SVM solver, originally developed for HPC environments, to HDFS. We specifically improved the data pre-processing stage, which requires a large number of I/O operations, using a deterministic scheduling method. Our improvement showed a balanced read pattern on each node: the time ratio between the longest and the shortest process was reduced by 60%, and the average read time was reduced by 78%. The amount of data served by each node also showed a small variance in comparison with the originally ported SVM algorithm. We believe that our design avoids the overhead introduced by remote I/O operations, which will be beneficial to many algorithms coping with large-scale data.
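
The abstract does not spell out the scheduling method, but the kind of locality-aware, deterministic block assignment it describes can be sketched with the standard Hadoop FileSystem API (getFileBlockLocations): each HDFS block of the input file is assigned to a worker rank running on one of the block's replica hosts, preferring the least-loaded such rank so reads stay local and balanced, with a round-robin fallback. The class name LocalBlockScheduler, the one-rank-per-host placement, and the host list are illustrative assumptions, not the paper's actual implementation.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of a deterministic, locality-aware mapping of HDFS blocks to worker ranks.
 * A block is handed to a rank whose host stores a replica of that block whenever
 * possible, so pre-processing reads are local; ties are broken by current load.
 */
public class LocalBlockScheduler {

    /** hosts.get(i) is the hostname on which rank i runs (hypothetical placement). */
    public static int[] assignBlocks(FileSystem fs, Path input, List<String> hosts)
            throws IOException {
        FileStatus status = fs.getFileStatus(input);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        int[] owner = new int[blocks.length];
        int[] load = new int[hosts.size()];

        for (int b = 0; b < blocks.length; b++) {
            int best = -1;
            // Prefer the least-loaded rank that holds a local replica of this block.
            for (String replicaHost : blocks[b].getHosts()) {
                int rank = hosts.indexOf(replicaHost);
                if (rank >= 0 && (best < 0 || load[rank] < load[best])) {
                    best = rank;
                }
            }
            // Deterministic fallback when no rank is co-located with a replica.
            if (best < 0) {
                best = b % hosts.size();
            }
            owner[b] = best;
            load[best]++;
        }
        return owner;
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical cluster layout: one worker rank per listed host.
        List<String> hosts = Arrays.asList("node01", "node02", "node03", "node04");
        int[] owner = assignBlocks(fs, new Path(args[0]), hosts);
        for (int b = 0; b < owner.length; b++) {
            System.out.printf("block %d -> rank %d%n", b, owner[b]);
        }
    }
}
```

Because the assignment depends only on the file's block layout and the fixed host list, every run produces the same schedule, which matches the deterministic, balanced read pattern the abstract reports.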

Publication Date

12-6-2016

Publication Title

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Pages

132-133

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/CLUSTER.2016.71

Scopus ID

85013226550 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85013226550
