Abstract
The modern information era continuously generates large amounts of data at high speed and across broad geographical areas. Obtaining knowledge and understanding by analyzing and learning from such data is invaluable. Such data analytics tasks commonly share several features: the data can be large-scale and geographically distributed; the demand for computing capability can be enormous; tasks can be time-critical; some data can be private; participants can have heterogeneous capabilities and non-IID data; and multiple data analytics tasks may be submitted simultaneously. These features pose challenges to contemporary computing infrastructure and learning models. In view of this, we develop techniques that tackle these challenges together, aiming for more efficient collaborative distributed data analysis and learning. We propose a hierarchical framework that supports data analytics on multiple Apache Spark clusters. We propose reinforcement-learning-based resource management approaches that improve overall efficiency and reduce deadline violations when scheduling general and time-critical data analytics workflows across computing resources. We establish a new hybrid framework for efficient privacy-preserving federated learning and, building on it, propose an algorithm that improves asynchronous federated learning among heterogeneous participants with non-IID data. We also propose an asynchronous stochastic gradient descent algorithm, together with a convergence analysis, for general distributed learning among heterogeneous participants with non-IID data. Experiments demonstrate the efficacy of the proposed approaches.
Graduation Date
2022
Semester
Spring
Advisor
Wang, Liqiang
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Degree Program
Computer Science
Format
application/pdf
Identifier
CFE0009448; DP0027171
URL
https://purls.library.ucf.edu/go/DP0027171
Language
English
Release Date
November 2023
Length of Campus-only Access
1 year
Access Status
Doctoral Dissertation (Open Access)
STARS Citation
Liu, Zixia, "Towards More Efficient Collaborative Distributed Data Analysis and Learning" (2022). Electronic Theses and Dissertations, 2020-2023. 1477.
https://stars.library.ucf.edu/etd2020/1477