Optimize Parallel Data Access In Big Data Processing
Abstract
In recent years, the Hadoop Distributed File System (HDFS) has been deployed as the bedrock for many parallel big data processing systems, such as graph processing systems, MPI-based parallel programs, and Scala/Java-based Spark frameworks, which can efficiently support iterative and interactive data analysis in memory. The first part of my dissertation focuses on studying parallel data access in distributed file systems, e.g., HDFS. Since the distributed I/O resources and global data distribution are often not taken into consideration, the data requests from parallel processes/executors will unfortunately be served in a remote and imbalanced fashion on the storage servers. To address these problems, we develop I/O middleware systems and matching-based algorithms that map parallel data requests to storage servers such that local and balanced data access can be achieved. The last part of my dissertation presents our plans to improve the performance of interactive data access in big data analysis. Specifically, most interactive analysis programs scan through the entire data set regardless of which data is actually required. We plan to develop a content-aware method to quickly access the required data without this laborious scanning process.
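To illustrate the idea of mapping parallel requests to storage servers for locality and balance, the following is a minimal sketch only, not the dissertation's actual matching-based algorithm: each request lists the servers holding a replica of its block (HDFS-style replication), and a greedy rule assigns the request to the least-loaded replica holder so that reads stay local while per-server load stays roughly even. The request and server identifiers are hypothetical.

    from collections import defaultdict

    def assign_requests(requests):
        """requests: list of (request_id, [servers holding a replica of that block])."""
        load = defaultdict(int)   # number of requests assigned to each server so far
        assignment = {}
        for req_id, replica_servers in requests:
            # keep the read local: only consider servers that host a replica,
            # and pick the one with the smallest current load
            target = min(replica_servers, key=lambda s: load[s])
            assignment[req_id] = target
            load[target] += 1
        return assignment

    if __name__ == "__main__":
        # hypothetical example: four block requests, three servers, 3-way replication
        reqs = [
            ("blk_0", ["s1", "s2", "s3"]),
            ("blk_1", ["s1", "s2", "s3"]),
            ("blk_2", ["s1", "s3", "s2"]),
            ("blk_3", ["s2", "s3", "s1"]),
        ]
        print(assign_requests(reqs))

A full matching-based formulation would instead solve an assignment problem over all requests and servers at once; the greedy rule above is only meant to convey the locality-plus-balance objective described in the abstract.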
Publication Date
7-7-2015
Publication Title
Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015
Number of Pages
721-724
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/CCGrid.2015.168
Copyright Status
Unknown
Scopus ID
84941248060 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/84941248060
STARS Citation
Yin, Jiangling and Wang, Jun, "Optimize Parallel Data Access In Big Data Processing" (2015). Scopus Export 2015-2019. 2044.
https://stars.library.ucf.edu/scopus2015/2044