Title

An Improved Task Assignment Scheme For Hadoop Running In The Clouds

Keywords

Cloud computing; Data-intensive computing; Hadoop; MapReduce; Parallel and distributed computing; Task assignment

Abstract

Data-intensive problems are now so prevalent that organizations across many industries face them in their business operations. It is often crucial for enterprises to be able to analyze large volumes of data effectively and in a timely manner. MapReduce and its open-source implementation, Hadoop, dramatically simplified the development of parallel data-intensive computing applications for ordinary users, and the combination of Hadoop and cloud computing made large-scale parallel data-intensive computing more accessible to potential users than ever before. Although Hadoop has become the most popular data management framework for parallel data-intensive computing in the clouds, the Hadoop scheduler is not a perfect match for cloud environments. In this paper, we discuss the issues with the Hadoop task assignment scheme and present an improved scheme for heterogeneous computing environments, such as public clouds. The proposed scheme is based on an optimal minimum makespan algorithm. It projects and compares the completion times of every task slot's next data block, and explicitly strives to shorten the completion time of the map phase of MapReduce jobs. We conducted extensive simulations to evaluate the performance of the proposed scheme against the Hadoop scheme in two types of heterogeneous computing environments that are typical of public cloud platforms. The simulation results showed that the proposed scheme could remarkably reduce the map phase completion time, and could reduce the amount of remote processing to an even greater extent, which makes data processing less vulnerable to both network congestion and disk contention. © 2013 Dai and Bassiouni.
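The idea of projecting each task slot's next completion time can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a plain greedy earliest-completion-time heuristic that ignores data locality and remote processing. The function name `assign_blocks` and the parameter `seconds_per_block` (per-slot processing time for one equally sized data block) are assumptions made for illustration only.

```python
import heapq


def assign_blocks(num_blocks, seconds_per_block):
    """Greedy sketch: hand each data block to the slot whose projected
    completion time for its next block is earliest.

    seconds_per_block[i] is the (assumed known) time slot i needs to
    process one block. Returns (blocks per slot, map-phase makespan).
    """
    # Heap entries: (projected finish time of the slot's next block, slot index).
    heap = [(t, i) for i, t in enumerate(seconds_per_block)]
    heapq.heapify(heap)

    assignment = [0] * len(seconds_per_block)
    makespan = 0.0
    for _ in range(num_blocks):
        # Pick the slot that would finish its next block soonest.
        finish, i = heapq.heappop(heap)
        assignment[i] += 1
        makespan = max(makespan, finish)
        # Project when this slot would finish one more block.
        heapq.heappush(heap, (finish + seconds_per_block[i], i))
    return assignment, makespan


if __name__ == "__main__":
    # Two hypothetical slots: one fast (1 s/block), one slow (2 s/block).
    print(assign_blocks(3, [1.0, 2.0]))
```

In this toy setting the faster slot receives two of the three blocks, because its second block is projected to finish no later than the slower slot's first.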

Publication Date

1-1-2013

Publication Title

Journal of Cloud Computing

Volume

2

Issue

1

Number of Pages

1-16

Document Type

Article

Personal Identifier

scopus

DOI Link

https://doi.org/10.1186/2192-113X-2-23

Scopus ID

84926427013 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84926427013
