Title

DL-MPI: Enabling Data Locality Computation for MPI-Based Data-Intensive Applications

Keywords

Hadoop file system; HPC application; MPI

Abstract

Currently, most MPI-based scientific applications adopt a compute-centric architecture: the data they need is accessed by MPI processes running on different nodes through a shared file system. Unfortunately, the explosive growth of scientific data undermines the performance of MPI-based applications, especially when they run on commodity clusters. In this paper, we present a novel approach, which we call DL-MPI, to enable data locality computation for MPI-based data-intensive applications. DL-MPI allows MPI-based programs to obtain data distribution information for compute nodes through a novel data locality API. In addition, the problem of allocating data processing tasks to parallel processes is formulated as an integer optimization problem whose objectives are achieving data locality computation and optimal parallel execution time. For heterogeneous runtime environments, we propose a probability-based scheduling algorithm that dynamically schedules tasks to processes by evaluating the unprocessed local data and the computing ability of each compute node. We demonstrate the functionality of our methods through the implementation of scientific data processing programs as well as the incorporation of DL-MPI into existing HPC applications. © 2013 IEEE.
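The abstract describes dynamic scheduling driven by each node's unprocessed local data and computing ability. The following is a minimal Python sketch of one way such a locality-aware, probability-based scheduler could behave; it is not the DL-MPI algorithm, and it does not solve the integer optimization problem mentioned above. It assumes a known map from data chunks to the nodes holding a replica (for example, block locations reported by a distributed file system such as HDFS) and a known relative speed per node. The class name `LocalityScheduler`, the method `next_task`, and the weighting rule are all hypothetical choices made for illustration.

```python
import random


class LocalityScheduler:
    """Hypothetical probability-based, locality-aware task scheduler.

    Illustrative sketch only; not the DL-MPI algorithm from the paper.
    """

    def __init__(self, replicas, speed, seed=None):
        # replicas: chunk id -> set of node ids holding a local copy (assumed input,
        # e.g. block locations reported by a distributed file system).
        # speed: node id -> positive relative computing ability (assumed, e.g. chunks/s).
        self.replicas = {c: set(nodes) for c, nodes in replicas.items()}
        self.speed = dict(speed)
        self.unprocessed = set(self.replicas)
        self.rng = random.Random(seed)

    def local_chunks(self, node):
        # Unprocessed chunks with a replica on `node`, sorted for reproducibility.
        return sorted(c for c in self.unprocessed if node in self.replicas[c])

    def load(self, node):
        # Estimated time for `node` to finish its remaining local data.
        return len(self.local_chunks(node)) / self.speed[node]

    def next_task(self, node):
        """Assign one chunk to the idle `node`; return None when all data is done."""
        if not self.unprocessed:
            return None
        local = self.local_chunks(node)
        if local:
            # Weight each local chunk by the load of its other (known) holders, so
            # the idle node preferentially takes data that busy nodes also hold.
            weights = [
                1.0 + sum(self.load(j) for j in self.replicas[c] - {node} if j in self.speed)
                for c in local
            ]
            chunk = self.rng.choices(local, weights=weights, k=1)[0]
        else:
            # No local data left: pick a donor node with probability proportional
            # to its estimated remaining time, then read one of its chunks remotely.
            donors = sorted(j for j in self.speed if j != node and self.load(j) > 0)
            if donors:
                donor = self.rng.choices(donors, weights=[self.load(j) for j in donors], k=1)[0]
                chunk = self.rng.choice(self.local_chunks(donor))
            else:
                # Remaining chunks live only on nodes with no known speed.
                chunk = self.rng.choice(sorted(self.unprocessed))
        self.unprocessed.discard(chunk)
        return chunk


if __name__ == "__main__":
    sched = LocalityScheduler(
        replicas={0: {"a", "b"}, 1: {"a"}, 2: {"b", "c"}, 3: {"c"}},
        speed={"a": 2.0, "b": 1.0, "c": 1.0, "d": 1.0},  # node "d" holds no data
        seed=7,
    )
    # Simulate idle nodes requesting work in round-robin order.
    nodes = ["a", "b", "c", "d"]
    step = 0
    while (chunk := sched.next_task(nodes[step % len(nodes)])) is not None:
        print(f"node {nodes[step % len(nodes)]} -> chunk {chunk}")
        step += 1
```

In this sketch, a node always prefers its own unprocessed data and reads remotely only when its local share is exhausted. The randomized choice weighted by estimated remaining time is one simple way to steer work away from slow or heavily loaded nodes in a heterogeneous cluster; the paper's actual probability model may differ.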

Publication Date

1-1-2013

Publication Title

Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013

Pages

506-511

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/BigData.2013.6691614

Scopus ID

84893218367 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84893218367
