Faculty Bibliography 2010s

SDAFT: A novel scalable data access framework for parallel BLAST

Authors

J. L. Yin; J. Y. Zhang; J. Wang; W. C. Feng

Abbreviated Journal Title

Parallel Comput.

Keywords

MPI/POSIX I/O; HDFS; Parallel sequence search; mpiBLAST; SEARCH; IMPLEMENTATION; PERFORMANCE; SEQUENCE; GENBANK; SYSTEM; Computer Science, Theory & Methods

Abstract

In order to run tasks in a parallel and load-balanced fashion, existing scientific parallel applications such as mpiBLAST introduce a data-initializing stage that moves database fragments from shared storage to local cluster nodes. Unfortunately, with the exponentially increasing size of sequence databases in today's big data era, such an approach is inefficient. In this paper, we develop a scalable data access framework (SDAFT) to solve the data movement problem for scientific applications whose data analysis is dominated by "read" operations. SDAFT employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. SDAFT consists of two interlocked components: (1) a data-centric load-balanced scheduler (DC-scheduler) that enforces data-process locality and (2) a translation layer that translates conventional parallel I/O operations into HDFS I/O. By experimenting with our SDAFT prototype system using real-world databases and queries on a wide variety of computing platforms, we found that SDAFT can reduce I/O cost by a factor of 4-10 and double the overall execution performance compared with existing schemes. (C) 2014 Elsevier B.V. All rights reserved.
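The DC-scheduler idea described in the abstract (give each database fragment to a process on a node that already stores a replica of it, while keeping the work balanced) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the function name `dc_schedule`, the fewest-replicas-first ordering, and the per-worker cap of ceil(fragments / workers) are assumptions made for illustration. In a real deployment the replica locations would come from HDFS block metadata; here they are passed in as a plain dictionary.

```python
import math
from collections import defaultdict


def dc_schedule(fragment_locations, workers):
    """Hypothetical locality-aware, load-capped fragment scheduler.

    fragment_locations: dict mapping fragment id -> list of node names holding a replica
    workers: list of node names, one entry per worker process
    Returns a dict mapping fragment id -> index into `workers`.
    """
    if not workers:
        raise ValueError("need at least one worker")
    # Assumed balance target: no worker receives more than ceil(F / W) fragments.
    cap = math.ceil(len(fragment_locations) / len(workers))
    load = [0] * len(workers)
    procs_on_node = defaultdict(list)
    for idx, node in enumerate(workers):
        procs_on_node[node].append(idx)

    assignment = {}
    # Fragments with fewer replicas have fewer local options, so place them first.
    for frag in sorted(fragment_locations, key=lambda f: (len(fragment_locations[f]), f)):
        local = [i for node in fragment_locations[frag]
                 for i in procs_on_node.get(node, []) if load[i] < cap]
        # Prefer a local worker under the cap; otherwise fall back to the least-loaded worker.
        candidates = local if local else range(len(workers))
        chosen = min(candidates, key=lambda i: (load[i], i))
        assignment[frag] = chosen
        load[chosen] += 1
    return assignment


if __name__ == "__main__":
    locations = {"f0": ["n1", "n2"], "f1": ["n1", "n2"],
                 "f2": ["n2", "n3"], "f3": ["n3", "n1"]}
    print(dc_schedule(locations, ["n1", "n2", "n3"]))
    # prints {'f0': 0, 'f1': 1, 'f2': 2, 'f3': 0}
```

When replicas are spread across nodes, as in the example, every fragment lands on a worker that can read it locally. If all replicas sit on one node, the cap pushes the overflow onto remote workers, trading some locality for balance; because the cap is ceil(F / W), the least-loaded fallback can never exceed it.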

Journal Title

Parallel Computing

Volume

Issue/Number

Publication Date

1-1-2015

Document Type

Article

DOI Link

http://dx.doi.org/10.1016/j.parco.2014.08.001

Language

English

First Page

697

Last Page

709

WOS Identifier

WOS:000347018800010

ISSN

0167-8191

Recommended Citation

"SDAFT: A novel scalable data access framework for parallel BLAST" (2015). Faculty Bibliography 2010s. 6333.
https://stars.library.ucf.edu/facultybib2010/6333