Abstract
MicroRNAs (miRNAs) are post-transcriptional regulators of gene expression and play an essential role in phenotype development. Understanding the mechanisms that regulate miRNAs offers insight into gene expression and gene regulation. The transcription start site (TSS) is key to studying gene expression. However, the TSSs of miRNAs can be thousands of nucleotides away from the precursor miRNAs, which makes them hard to detect with conventional RNA-Seq experiments. Previous methods tried to take advantage of sequencing data using sequence features or integrated epigenetic markers, but their predictions were either not condition-specific or low-resolution. Furthermore, the availability of large amounts of single-cell RNA-Seq (scRNA-Seq) data provides remarkable opportunities for studying gene regulatory mechanisms at single-cell resolution. Incorporating gene regulatory mechanisms can assist with cell type identification and state discovery from scRNA-Seq data. In this dissertation, we studied computational modeling of gene transcription initialization and expression, including two novel approaches to identify TSSs under various types of conditions and one case study at the single-cell level. First, we studied how TSSs can be identified from Cap Analysis Gene Expression (CAGE) experimental data using deep learning neural networks. Using a control model, we showed that DeepBind binding-score features from protein binding motif models can improve overall prediction performance. Furthermore, on data from unseen cell lines, our approach performed better than existing tools. Second, to better predict the TSSs of miRNAs in a condition-specific manner, we built D-miRT, a two-stream convolutional neural network based on integrated low-resolution epigenetic features and high-resolution sequence features. D-miRT outperformed all baseline models and demonstrated high accuracy on miRNA TSS prediction tasks. Compared with the most recent approaches to cell-specific miRNA TSS identification, and evaluated on cell lines unseen during model training, D-miRT also showed superior performance. Third, to study gene transcription initialization and regulation from a single-cell perspective, we developed INSISTC, an unsupervised machine learning-based approach that incorporates network structure information for single-cell type classification. We showed that, in contrast to other clustering algorithms, INSISTC with the SC3 algorithm provides cluster number estimation. Future studies on gene expression and regulation will benefit from INSISTC's adaptability with regard to the kinds of biological networks that can be used.
Notes
If this is your thesis or dissertation and you want to learn how to access it, or want more information about readership statistics, contact us at STARS@ucf.edu
Graduation Date
2022
Semester
Fall
Advisor
Hu, Haiyan
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Degree Program
Computer Science
Format
application/pdf
Identifier
CFE0009427; DP0027150
URL
https://purls.library.ucf.edu/go/DP0027150
Language
English
Release Date
December 2025
Length of Campus-only Access
3 years
Access Status
Doctoral Dissertation (Campus-only Access)
STARS Citation
Zheng, Hansi, "Computational Study of Gene Transcription Initialization and Regulation" (2022). Electronic Theses and Dissertations, 2020-2023. 1456.
https://stars.library.ucf.edu/etd2020/1456
Restricted to the UCF community until December 2025; it will then be open access.