Title
SDAFT: A Novel Scalable Data Access Framework for Parallel BLAST
Keywords
HDFS; MPI/POSIX I/O; Parallel sequence search; mpiBLAST
Abstract
To run tasks in a parallel, load-balanced fashion, existing scientific parallel applications such as mpiBLAST introduce a data-initialization stage that moves database fragments from shared storage to local cluster nodes. Unfortunately, with the exponentially increasing size of sequence databases in today's big data era, such an approach is inefficient. In this paper, we develop SDAFT, a scalable data access framework that solves the data movement problem for scientific applications whose data analysis is dominated by "read" operations. SDAFT employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. It consists of two interlocked components: (1) a data-centric load-balanced scheduler (DC-scheduler) that enforces data-process locality and (2) a translation layer that translates conventional parallel I/O operations into HDFS I/O. By evaluating our SDAFT prototype with real-world databases and queries on a wide variety of computing platforms, we found that SDAFT can reduce I/O cost by a factor of 4-10 and double overall execution performance compared with existing schemes.
Publication Date
1-1-2014
Publication Title
Parallel Computing
Volume
40
Issue
10
Pages
697-709
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1016/j.parco.2014.08.001
Copyright Status
Unknown
Scopus ID
85027939163 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85027939163
STARS Citation
Yin, Jiangling; Zhang, Junyao; Wang, Jun; and Feng, Wu Chun, "SDAFT: A Novel Scalable Data Access Framework for Parallel BLAST" (2014). Scopus Export 2010-2014. 8462.
https://stars.library.ucf.edu/scopus2010/8462