Abstract
Powered by high-throughput genomic technologies, RNA sequencing (RNA-Seq) can measure transcriptome-wide mRNA expression and molecular activity in cells. Elucidating gene expression at isoform resolution enables the detection of better molecular signatures for phenotype prediction, and the identified biomarkers may provide insights into the functional consequences of disease. This dissertation focuses on developing advanced machine learning algorithms for mining large-scale RNA-Seq data in cancer transcriptome analysis. A platform-integrated model for transcript quantification (IntMTQ) is developed to improve the performance of RNA-Seq on isoform expression estimation. IntMTQ provides more precise RNA-Seq-based isoform quantification, and the gene expression learned by IntMTQ consistently provides more and better molecular features for downstream analyses. In light of recent challenges posed by the COVID-19 pandemic, computational methods are developed and applied to RNA-Seq data of lung cancer cell lines to detect novel molecular signatures that are highly correlated with SARS-CoV-2 pathogenesis and prognosis for COVID-19 studies. The results of these analyses demonstrate that post-transcriptional gene regulation provides additional molecular signatures for COVID-19 therapeutic targets compared to transcriptional signatures. To further investigate post-transcriptional regulation, a pan-cancer analysis is performed to reveal discrete intronic polyadenylation in the human cancer transcriptome. The identified intronic APA profile can add prognostic and predictive power beyond conventional gene expression profiles in cancer survival analysis and phenotype prediction. In view of this, a biological pathway encoded transformer model is proposed to maximize the use of RNA-Seq data for cancer phenotype prediction.
Notes
If this is your thesis or dissertation and you want to learn how to access it, or would like more information about readership statistics, contact us at STARS@ucf.edu.
Graduation Date
2023
Semester
Spring
Advisor
Zhang, Wei
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Degree Program
Computer Science
Identifier
CFE0009868; DP0028138
URL
https://purls.library.ucf.edu/go/DP0028138
Language
English
Release Date
November 2023
Length of Campus-only Access
None
Access Status
Doctoral Dissertation (Open Access)
STARS Citation
Sun, Jiao, "Machine Learning Algorithms For Molecular Signature Identification with High-throughput Genome Sequencing Data" (2023). Electronic Theses and Dissertations, 2020-2023. 1897.
https://stars.library.ucf.edu/etd2020/1897