Scopus Export 2015-2019

Taming Big Data SVM With Locality-Aware Scheduling

Mao Ye, University of Central Florida
Jun Wang, University of Central Florida
Jiangling Yin, University of Central Florida
Dezhi Han, Shanghai Maritime University

Keywords

data locality; HDFS; MPI; parallel SVM; read performance

Abstract

Incorporating the MPI programming model into data-intensive file systems matters for optimizing the performance of big data applications. In this paper, we port an MPI-SVM solver, originally developed for HPC environments, to the Hadoop Distributed File System (HDFS) and analyze the performance bottlenecks the solver faces there. Storage expansion on HDFS is known to produce a skewed data distribution. As a result, we found that some hot nodes consistently receive concentrated I/O requests while other nodes consistently issue remote requests. These remote requests lengthen I/O delays on the hot nodes, which creates a performance bottleneck for our solver. We therefore improved the data preprocessing stage, which requires a large number of I/O operations, with a deterministic scheduling method. With this improvement, the read pattern is balanced across nodes: the time ratio between the longest and the shortest process is reduced by 60%, and the average read time is reduced by 78%. The amount of data served by each node also shows a small variance compared with the originally ported SVM algorithm. We believe our design avoids the overhead introduced by remote I/O operations, which should benefit many algorithms that handle large-scale data.
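
The abstract does not spell out the scheduling algorithm, so the following is only a minimal sketch of the general idea of deterministic, locality-aware read assignment, not the authors' method. It assumes a hypothetical `replicas` map from chunk ID to the nodes that hold a copy (the kind of information HDFS exposes through block locations) and uses a simple greedy heuristic: each chunk is read by the least-loaded node that already stores it, with ties broken by name so the result is reproducible.

```python
def assign_chunks(replicas, nodes):
    """Greedily assign each chunk to the least-loaded node holding a replica.

    replicas: dict mapping chunk id -> list of node names storing that chunk
              (hypothetical input, standing in for HDFS block locations).
    nodes:    list of node names that run reader processes.
    Returns (assignment, load): chunk -> node, and node -> number of chunks.
    """
    load = {n: 0 for n in nodes}
    assignment = {}
    # Handle chunks with the fewest replica choices first; break ties by
    # chunk id so the schedule is the same on every run.
    for chunk in sorted(replicas, key=lambda c: (len(replicas[c]), c)):
        holders = [n for n in replicas[chunk] if n in load]
        # Fall back to any node (a remote read) only if no reader is local.
        candidates = holders or list(nodes)
        target = min(candidates, key=lambda n: (load[n], n))
        assignment[chunk] = target
        load[target] += 1
    return assignment, load


# Skewed layout: node "n0" holds a replica of almost every chunk.
nodes = ["n0", "n1", "n2"]
replicas = {
    0: ["n0", "n1"],
    1: ["n0", "n2"],
    2: ["n0", "n1"],
    3: ["n0"],
    4: ["n0", "n2"],
    5: ["n1", "n2"],
}
assignment, load = assign_chunks(replicas, nodes)
print(load)
```

In this toy layout every chunk ends up being read by a node that stores it, and the reads are spread evenly even though one node holds most of the replicas. A greedy pass like this is not guaranteed to be optimal in general; a matching or flow-based formulation would be a natural alternative.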

Publication Date

1-11-2017

Publication Title

Proceedings - 2016 International Conference on Advanced Cloud and Big Data, CBD 2016

Number of Pages

37-44

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/CBD.2016.017

Copyright Status

Unknown

Scopus ID

85013151764 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85013151764

STARS Citation

Ye, Mao; Wang, Jun; Yin, Jiangling; and Han, Dezhi, "Taming Big Data Svm With Locality-Aware Scheduling" (2017). Scopus Export 2015-2019. 7179.
https://stars.library.ucf.edu/scopus2015/7179
