ORCID
0000-0002-8523-9074
Keywords
computer vision, foundation models, federated learning, multi-modal learning, parameter-efficient fine-tuning
Abstract
The emergence of large-scale foundation models has reshaped computer vision and multi-modal learning, but their substantial computational and communication demands pose significant challenges for deployment in federated learning (FL) systems. FL enables collaborative model training across decentralized clients without sharing raw data, which helps preserve data privacy. However, adapting foundation models in FL remains difficult due to communication bottlenecks, heterogeneous data distributions, limited device resources, and scarce labeled data. This dissertation presents a unified line of research that addresses these challenges through four complementary frameworks, progressively advancing the efficiency, personalization, and practicality of foundation model adaptation in federated settings.
First, FedPEFT investigates parameter-efficient fine-tuning as a means to enable foundation models in FL. By updating and communicating only lightweight parameter subsets, FedPEFT drastically reduces communication overhead while maintaining strong performance and robustness under diverse federated conditions.
Building upon this communication-efficient foundation, FedPerfix focuses on personalized federated learning for Vision Transformers. Through a systematic analysis of transformer components, it identifies heterogeneity-sensitive layers and introduces a selective personalization strategy that balances global generalization with client-specific adaptation.
The third contribution, FedCola, extends federated learning to multi-modal transformers. It introduces a collaborative framework that enables clients with unpaired vision and language data to jointly train a unified multi-modal model. By addressing both in-modality and cross-modality gaps using parameter-based collaboration, FedCola establishes a systematic approach to multi-modal federated learning.
Finally, FedMox advances toward real-world deployment by introducing Practical Semi-Supervised Federated Learning, which models settings where edge clients hold unlabeled, low-resolution data and limited resources. Through a sparse mixture-of-experts architecture with spatial routing and soft aggregation, FedMox enables efficient foundation model adaptation for complex tasks such as object detection.
Together, these contributions demonstrate that foundation models can be efficiently trained, personalized, and deployed in federated environments, advancing FL toward scalable and practical real-world applications.
Completion Date
2026
Semester
Spring
Committee Chair
Chen, Chen
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Document Type
Dissertation
Identifier
DP0053099
STARS Citation
Sun, Guangyu, "Federated Learning in the Era of Foundation Models" (2026). Graduate Studies Theses and Dissertations 2026. 192.
https://stars.library.ucf.edu/gradstudies_etd_2026/192
Accessibility Statement
This item was created or digitized prior to April 24, 2027, or is a reproduction of legacy media created before that date. It is preserved in its original, unmodified state specifically for research, reference, or historical recordkeeping. In accordance with the ADA Title II Final Rule, the University Libraries provides accessible versions of archival materials upon request. To request an accommodation for this item, please submit an accessibility request form.