The modern information era gives rise to the persistent generation of large amounts of data at rapid speed and with broad geographical distribution. Obtaining knowledge and understanding by analyzing and learning from such data is invaluable. Such data analytical tasks commonly share several features: data can be large-scale and geographically distributed; the demand for computing capability can be enormous; tasks can be time-critical; some data can be private; participants can have heterogeneous capabilities and non-IID data; and multiple data analytical tasks may be submitted simultaneously. These features pose challenges to contemporary computing infrastructure and learning models. In view of this, we develop techniques that tackle these challenges together, towards more efficient collaborative distributed data analysis and learning. We propose a hierarchical framework that supports data analytics on multiple Apache Spark clusters. We propose reinforcement learning based resource management approaches that improve overall efficiency and reduce deadline violations when scheduling general and time-critical data analytical workflows among computing resources. We establish a new hybrid framework for efficient privacy-preserving federated learning, and on top of it we propose an algorithm that improves asynchronous federated learning for heterogeneous participants with non-IID data. We also propose an asynchronous stochastic gradient descent algorithm, with convergence analysis, for general distributed learning among heterogeneous participants with non-IID data. Experiments show the efficacy of the proposed approaches.
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Doctoral Dissertation (Open Access)
Liu, Zixia, "Towards More Efficient Collaborative Distributed Data Analysis and Learning" (2022). Electronic Theses and Dissertations, 2020-. 1477.