Title
An Improved Task Assignment Scheme For Hadoop Running In The Clouds
Keywords
Cloud computing; Data-intensive computing; Hadoop; MapReduce; Parallel and distributed computing; Task assignment
Abstract
Data-intensive problems are now so prevalent that organizations across many industries face them in their business operations. It is often crucial for enterprises to be able to analyze large volumes of data effectively and in a timely manner. MapReduce and its open-source implementation, Hadoop, have dramatically simplified the development of parallel data-intensive computing applications for ordinary users, and the combination of Hadoop and cloud computing has made large-scale parallel data-intensive computing far more accessible to potential users than ever before. Although Hadoop has become the most popular data management framework for parallel data-intensive computing in the clouds, the Hadoop scheduler is not a perfect match for cloud environments. In this paper, we discuss the issues with the Hadoop task assignment scheme and present an improved scheme for heterogeneous computing environments, such as public clouds. The proposed scheme is based on an optimal minimum makespan algorithm. It projects and compares the completion times of the next data block on every task slot, and explicitly strives to shorten the completion time of the map phase of MapReduce jobs. We conducted extensive simulations to evaluate the performance of the proposed scheme against the Hadoop scheme in two types of heterogeneous computing environments that are typical of public cloud platforms. The simulation results showed that the proposed scheme could remarkably reduce the map phase completion time, and that it could reduce the amount of remote processing to an even more significant extent, which makes data processing less vulnerable to both network congestion and disk contention. © 2013 Dai and Bassiouni.
Publication Date
1-1-2013
Publication Title
Journal of Cloud Computing
Volume
2
Issue
1
Number of Pages
1-16
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1186/2192-113X-2-23
Copyright Status
Unknown
Scopus ID
84926427013 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/84926427013
STARS Citation
Dai, Wei and Bassiouni, Mostafa, "An Improved Task Assignment Scheme For Hadoop Running In The Clouds" (2013). Scopus Export 2010-2014. 7266.
https://stars.library.ucf.edu/scopus2010/7266