Abstract

Today Big Data computer platforms employ resource management systems such as Yarn, Torque, Mesos, and Google Borg to enable sharing the physical computing among many users or applications. Given virtualization and resource management systems, users are able to launch their applications on the same node with low mutual interference and management overhead on CPU and memory. However, there are still challenges to be addressed before these systems can be fully adopted to manage the IO resources in Big Data File Systems (BDFS) and shared network facilities. In this study, we mainly study on three IO management problems systematically, in terms of the proportional sharing of block IO in container-based virtualization, the network IO contention in MPI-based HPC applications and the data migration overhead in HPC workflows. To improve the proportional sharing, we develop a prototype system called BDFS-Container, by containerizing BDFS at Linux block IO level. Central to BDFS-Container, we propose and design a proactive IOPS throttling based mechanism named IOPS Regulator, which improves proportional IO sharing under the BDFS IO pattern by 74.4% on an average. In the aspect of network IO resource management, we exploit using virtual switches to facilitate network traffic manipulation and reduce mutual interference on the network for in-situ applications. In order to dynamically allocate the network bandwidth when it is needed, we adopt SARIMA-based techniques to analyze and predict MPI traffic issued from simulations. Third, to solve the data migration problem in small-medium sized HPC clusters, we propose to construct a sided IO path, named as SideIO, to explicitly direct analysis data to BDFS that co-locates computation with data. By experimenting with two real-world scientific workflows, SideIO completely avoids the most expensive data movement overhead and achieves up to 3x speedups compared with current solutions.

Notes

If this is your thesis or dissertation, and want to learn how to access it or for more information about readership statistics, contact us at STARS@ucf.edu

Graduation Date

2018

Semester

Summer

Advisor

Wang, Jun

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Electrical Engineering and Computer Engineering

Degree Program

Computer Engineering

Format

application/pdf

Identifier

CFE0007195

URL

http://purl.fcla.edu/fcla/etd/CFE0007195

Language

English

Release Date

August 2021

Length of Campus-only Access

3 years

Access Status

Doctoral Dissertation (Campus-only Access)

Share

COinS