Today's Big Data computing platforms employ resource management systems such as Yarn, Torque, Mesos, and Google Borg to share physical computing resources among many users and applications. With virtualization and resource management systems, users can launch their applications on the same node with low mutual interference and low management overhead for CPU and memory. However, challenges remain before these systems can be fully adopted to manage the IO resources in Big Data File Systems (BDFS) and shared network facilities. In this study, we systematically investigate three IO management problems: the proportional sharing of block IO in container-based virtualization, network IO contention in MPI-based HPC applications, and data migration overhead in HPC workflows. First, to improve proportional sharing, we develop a prototype system called BDFS-Container by containerizing BDFS at the Linux block IO level. Central to BDFS-Container is a proactive, IOPS-throttling-based mechanism named IOPS Regulator, which improves proportional IO sharing under the BDFS IO pattern by 74.4% on average. Second, for network IO resource management, we use virtual switches to facilitate network traffic manipulation and reduce mutual interference on the network for in-situ applications. To allocate network bandwidth dynamically when it is needed, we adopt SARIMA-based techniques to analyze and predict the MPI traffic issued by simulations. Third, to solve the data migration problem in small- to medium-sized HPC clusters, we propose constructing a side IO path, named SideIO, that explicitly directs analysis data to BDFS, which co-locates computation with data. In experiments with two real-world scientific workflows, SideIO completely avoids the most expensive data movement overhead and achieves up to 3x speedup compared with current solutions.
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Electrical Engineering and Computer Engineering
Doctoral Dissertation (Campus-only Access)
Huang, Dan, "Managing IO Resource for Co-running Data Intensive Applications in Virtual Clusters" (2018). Electronic Theses and Dissertations, 2004-2019. 6049.