ORCID
0000-0003-2755-4727
Keywords
Machine Learning, Artificial Intelligence, Transformers, Drug-Drug Interaction, Proteomics, Bioinformatics
Abstract
A significant catalyst of the recent Artificial Intelligence and Machine Learning revolution has been large language models (LLMs), which underpin prominent AI systems such as ChatGPT, Bard, and Gemini. The majority of high-performance LLMs are based on the transformer architecture, and their recent performance gains are largely driven by increases in layer count and dimensionality, resulting in an exponential rise in the total number of learnable parameters and necessitating substantial data for effective training. This may pose a significant barrier in certain application domains, such as bioinformatics, pharmaceutical research, and medicine, where data is constrained by its sensitive nature, confidentiality restrictions, and personal privacy laws. The purpose of this study is to design architectural-level and training-loop modifications to the transformer architecture that make it more suitable for low-data applications. Our proposed modifications include selective transfer learning, cross transfer learning, hierarchical system design, convolutional transformers, and knowledge graph fusion models. To test the efficacy of the proposed techniques, we have applied them to several applications, including Enzyme Commission (EC) number prediction from full-scale protein sequences. Here, we have linked four ProtBert transformer modules in a hierarchical manner and selectively pretrained them, raising accuracy from 89.93% to 97.88% and the F1 score from 0.9323 to 0.9787 compared to conventional transformer architectures. We have also designed a multimodal, knowledge-graph-integrated transformer model that predicts fatal drug-drug interaction events from drug SMILES and achieves accuracy increases of up to 6.78% in inductive test settings. Lastly, we have developed a parallel, modular transformer architecture for predicting Gene Ontology (GO) terms from full-scale protein sequences that obtains better results with less training data and computational resources than conventional transformer models.
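The hierarchical EC-number pipeline mentioned in the abstract can be pictured as one ProtBert encoder per EC level, with each level's classifier conditioned on the previous level's prediction. The sketch below is purely illustrative and is not the dissertation's code: the HierarchicalECClassifier name, the placeholder class counts per level, and the choice to feed the previous level's logits into the next head are assumptions made for this example; only the public Rostlab/prot_bert checkpoint and the four-level structure of EC numbers come from outside the example.

# Illustrative sketch only (assumptions noted above), not the dissertation's implementation.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

PROTBERT = "Rostlab/prot_bert"  # publicly available pretrained protein language model

class HierarchicalECClassifier(nn.Module):
    def __init__(self, classes_per_level=(7, 70, 250, 5000)):  # placeholder class counts
        super().__init__()
        # One ProtBert module per EC level; each module could be pretrained or
        # frozen selectively before fine-tuning (selective transfer learning).
        self.encoders = nn.ModuleList(
            [BertModel.from_pretrained(PROTBERT) for _ in classes_per_level]
        )
        hidden = self.encoders[0].config.hidden_size
        heads, prev = [], 0
        for n_cls in classes_per_level:
            # Each head sees the pooled sequence embedding plus the previous
            # level's logits, making the prediction hierarchical.
            heads.append(nn.Linear(hidden + prev, n_cls))
            prev = n_cls
        self.heads = nn.ModuleList(heads)

    def forward(self, input_ids, attention_mask):
        logits_per_level, prev_logits = [], None
        for encoder, head in zip(self.encoders, self.heads):
            pooled = encoder(input_ids=input_ids,
                             attention_mask=attention_mask).pooler_output
            feats = pooled if prev_logits is None else torch.cat(
                [pooled, prev_logits], dim=-1)
            prev_logits = head(feats)
            logits_per_level.append(prev_logits)
        return logits_per_level  # one logits tensor per EC level

if __name__ == "__main__":
    tokenizer = BertTokenizer.from_pretrained(PROTBERT, do_lower_case=False)
    seq = "M K T A Y I A K Q R"  # ProtBert expects space-separated amino acids
    batch = tokenizer(seq, return_tensors="pt")
    model = HierarchicalECClassifier()
    print([logits.shape for logits in model(batch["input_ids"], batch["attention_mask"])])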
Completion Date
2024
Semester
Fall
Committee Chair
Yuan, Jiann-Shiun
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Electrical and Computer Engineering
Degree Program
Electrical Engineering
Format
Identifier
DP0029040
Language
English
Release Date
12-15-2024
Document Type
Dissertation
Campus Location
Orlando (Main) Campus
STARS Citation
Tamir, Azwad, "Architectural and Training Regime Modifications to Transformer Models for low data Applications" (2024). Graduate Thesis and Dissertation post-2024. 73.
https://stars.library.ucf.edu/etd2024/73
Accessibility Status
PDF accessibility verified using Adobe Acrobat Pro Accessibility Checker