ORCID
https://orcid.org/0009-0008-4555-7677
Keywords
Representation Learning, Metric Learning, Efficient Transfer Learning
Abstract
Representation learning and transfer learning are fundamental challenges in deep learning, particularly for downstream computer vision tasks. While existing research has focused predominantly on loss functions and network architectures, this dissertation explores how modeling distance metrics and data distributions can improve both representation learning and the efficiency of transfer learning in computer vision, with a specific focus on image captioning and image retrieval.
We make contributions in three main subareas. First, we develop an image captioning system that leverages geometric relationships among objects through a novel object graph structure, enriching global visual representations for more accurate caption generation. We further incorporate adversarial learning to model data distributions, enabling the generation of diverse yet coherent captions by matching feature discrepancies before and after the adversarial discriminator. Second, we advance image retrieval by aligning cross-modal distributions between the visual and textual domains. Our approach adapts representations across vision and language modalities while minimizing distributional gaps among different classes. We further improve retrieval performance through proxy-based domain adaptation within deep metric learning, aligning data distributions to proxy points in the feature space. Finally, we improve transfer learning efficiency by introducing learnable visual prompts that accompany input images during fine-tuning of pre-trained models. This approach enables effective knowledge transfer to downstream image retrieval tasks. We demonstrate that learning deep metrics between visual prompts and aligning class-based information significantly improves the effectiveness of transfer learning.
Our comprehensive experiments demonstrate that explicitly modeling data metrics and distributions substantially improves performance across a range of computer vision tasks. The proposed methods advance the state of the art in both representation learning and transfer learning, particularly in image captioning and image retrieval.
Completion Date
2025
Semester
Spring
Committee Chair
Hua, Kien
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Identifier
DP0029374
Document Type
Dissertation/Thesis
Campus Location
Orlando (Main) Campus
STARS Citation
Ren, Li, "Modeling Data Metrics And Distributions For Representation And Efficient Transfer Learning" (2025). Graduate Thesis and Dissertation post-2024. 205.
https://stars.library.ucf.edu/etd2024/205