Title

SDAFT: A Novel Scalable Data Access Framework for Parallel BLAST

Keywords

HDFS; mpiBLAST; MPI/POSIX I/O; Parallel sequence search

Abstract

To run search tasks in a parallel and load-balanced fashion, existing parallel BLAST schemes such as mpiBLAST introduce a data initialization stage that moves database fragments from shared storage to local cluster nodes. Unfortunately, in today's big data era, a rapidly growing sequence database becomes too large to move efficiently across the network. In this paper, we develop a Scalable Data Access Framework (SDAFT) to solve this problem. It employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. SDAFT consists of two interlocked components: 1) a data-centric load-balanced scheduler (DC-scheduler) that enforces data-process locality, and 2) a translation layer that translates conventional parallel I/O operations into HDFS I/O. By experimenting with our SDAFT prototype system using a real-world database and queries on a wide variety of computing platforms, we found that SDAFT can reduce I/O cost by a factor of 4 to 10 and double overall execution performance compared with existing schemes.
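The abstract describes the DC-scheduler only at a high level, so what follows is a hypothetical illustration of data-centric, load-balanced scheduling and not the authors' algorithm. It assumes each worker node runs one search process that requests one database fragment at a time. It also assumes the scheduler knows which nodes hold a replica of each fragment, which in HDFS would come from block-location metadata. The function name `dc_schedule` and its inputs are invented for this sketch. The scheduler gives a requesting worker a fragment stored on that worker's node whenever one remains, which is data-process locality. When none remains, it hands out a remote fragment so the worker does not sit idle.

```python
def dc_schedule(fragment_hosts, nodes):
    """Hypothetical data-centric scheduler sketch (not SDAFT's actual code).

    fragment_hosts: dict mapping fragment id -> set of node names holding a replica
    nodes: list of worker node names, one search process per node (assumption)
    Returns (assignment, local_count): node -> list of fragments, and the
    number of fragments that were served from a node-local replica.
    """
    unassigned = set(fragment_hosts)
    assignment = {n: [] for n in nodes}
    local_count = 0

    # Prefer fragments with fewer replicas: they have fewer chances to be read locally.
    def priority(f):
        return (len(fragment_hosts[f]), f)

    i = 0
    while unassigned:
        # Assumption: workers run at equal speed, so requests arrive round-robin.
        node = nodes[i % len(nodes)]
        i += 1
        local = [f for f in unassigned if node in fragment_hosts[f]]
        if local:
            frag = min(local, key=priority)  # node-local read
            local_count += 1
        else:
            frag = min(unassigned, key=priority)  # remote read keeps the worker busy
        unassigned.remove(frag)
        assignment[node].append(frag)
    return assignment, local_count


if __name__ == "__main__":
    # Six fragments, each replicated on two of three nodes (illustrative data only).
    hosts = {0: {"n1", "n2"}, 1: {"n2", "n3"}, 2: {"n3", "n1"},
             3: {"n1", "n2"}, 4: {"n2", "n3"}, 5: {"n3", "n1"}}
    print(dc_schedule(hosts, ["n1", "n2", "n3"]))
    # -> ({'n1': [0, 3], 'n2': [1, 4], 'n3': [2, 5]}, 6)
```

In this toy run every fragment is read locally and each node receives the same number of tasks. If the replicas are skewed toward one node, the other workers fall back to remote reads rather than waiting. That fallback is the trade-off a locality-aware, load-balanced scheduler has to manage.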

Publication Date

11-18-2013

Publication Title

Proceedings of DISCS 2013: The 2013 International Workshop on Data-Intensive Scalable Computing Systems, Held in conjunction with SC 2013: The International Conference for High Performance Computing, Networking, Storage and Analysis

Pages

1-6

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1145/2534645.2534647

Scopus ID

85026886068 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85026886068
