Title
Sdaft: A Novel Scalable Data Access Framework For Parallel Blast
Keywords
HDFS; mpiBLAST; MPI/POSIX I/O; Parallel sequence search
Abstract
To run search tasks in a parallel and load-balanced fashion, existing parallel BLAST schemes such as mpiBLAST introduce a data initialization stage that moves database fragments from shared storage to local cluster nodes. Unfortunately, in today's big data era, a rapidly growing sequence database becomes too large to move across the network. In this paper, we develop a Scalable Data Access Framework (SDAFT) to solve this problem. It employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. SDAFT consists of two interlocked components: 1) a data-centric load-balanced scheduler (DC-scheduler) to enforce data-process locality and 2) a translation layer to translate conventional parallel I/O operations into HDFS I/O. By experimenting with our SDAFT prototype system on a real-world database and queries across a wide variety of computing platforms, we found that SDAFT can reduce I/O cost by a factor of 4 to 10 and double the overall execution performance compared with existing schemes.
Publication Date
11-18-2013
Publication Title
Proceedings of DISCS 2013: The 2013 International Workshop on Data-Intensive Scalable Computing Systems, Held in conjunction with SC 2013: The International Conference for High Performance Computing, Networking, Storage and Analysis
Number of Pages
1-6
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1145/2534645.2534647
Copyright Status
Unknown
Scopus ID
85026886068 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85026886068
STARS Citation
Yin, Jiangling; Zhang, Junyao; Wang, Jun; and Feng, Wu Chun, "Sdaft: A Novel Scalable Data Access Framework For Parallel Blast" (2013). Scopus Export 2010-2014. 6470.
https://stars.library.ucf.edu/scopus2010/6470