ORCID
https://orcid.org/0000-0002-4537-2620
Keywords
Representation space, Contrastive multi-modal learning, Multi-task learning, Out-of-distribution detection, LLM tokenizer optimization, Neuro-symbolic representation
Abstract
Deploying artificial intelligence in real-world settings requires models that can process diverse data modalities while meeting strict standards of computational efficiency and operational reliability. This thesis investigates the structured multi-modal representation space by systematically advancing how embedding and tokenization spaces are constructed, evaluated, and utilized across domains. To address the challenge of combining shared and task-specific learning objectives in multi-modal environments, we first propose a multi-task contrastive learning framework that partitions the embedding space. This structure accommodates diverse classification and regression requirements, significantly improving overall accuracy and generalization while retaining the robust feature alignment of contrastive learning. Building on these representations, we address system reliability by developing an angular distance-based out-of-distribution detection methodology. By formulating a distance transformation consistent with the training process, this approach accurately identifies when a model operates outside its knowledge limits, establishing robust safety boundaries without computationally expensive retraining. Furthermore, to improve data utilization efficiency, we introduce a neuro-symbolic framework for 3D scene representation. This method automatically converts dense 3D point clouds into compact hybrid formats: it optimizes over the parameter space to substitute recognized neural entities with verified symbolic objects, which drastically reduces the computational overhead of downstream tasks. Finally, we optimize the semantic representation space within Large Language Models to improve inference speed and generation quality. By introducing a Bayesian model of token importance, calculated from domain-specific token frequency and gradients, we update the tokenization space to enhance semantic understanding while minimizing memory requirements. By addressing these fundamental challenges, this dissertation lays the groundwork for reliable and transparent AI systems for real-world applications.
Completion Date
2026
Semester
Spring
Committee Chair
Dr. Hao Zheng
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Department of Electrical and Computer Engineering
Format
Document Type
Dissertation
Identifier
DP0053181
Release Date
5-15-2027
STARS Citation
Hossain, M Shifat, "Optimization of the Structured Multi-Modal Representation Space for Efficient and Reliable AI Systems" (2026). Graduate Studies Theses and Dissertations 2026. 84.
https://stars.library.ucf.edu/gradstudies_etd_2026/84
Accessibility Statement
This item was created or digitized prior to April 24, 2027, or is a reproduction of legacy media created before that date. It is preserved in its original, unmodified state specifically for research, reference, or historical recordkeeping. In accordance with the ADA Title II Final Rule, the University Libraries provides accessible versions of archival materials upon request. To request an accommodation for this item, please submit an accessibility request form.