This thesis addresses two important computational problems in genomics and metagenomics using publicly available next-generation sequencing data. The first concerns gene regulation: we explore how distal regulatory elements may interact with proximal regulatory elements. The second concerns metagenomics: we study how to reconstruct bacterial strain genomes from shotgun reads. Studying gene regulation, especially distal gene regulation, is important because regulatory elements, including those in distal regulatory regions, orchestrate when, where, and how strongly a gene is activated under each experimental condition, and their dysfunction results in various types of disease. Moreover, research on distal gene regulation is still developing. The study of bacterial strains is also vital, as bacterial strains are the main source of drug resistance, mixed infection, reinfection, and related problems. The study of novel bacterial strains is still in its infancy: only one existing tool can work with multiple metagenomic samples, and its performance is suboptimal. We identified hundreds of pairs of regulatory elements that are biologically sound and are likely to contribute to the interaction between distal and proximal regulatory regions. We demonstrated for the first time that ribosomal protein genes share common distal regulatory regions under the same experimental conditions and might be differentially regulated across different experimental conditions. In addition, we developed a novel approach called SMS to reconstruct novel bacterial strains from multiple shotgun metagenomic samples. In tests on 702 simulated and 195 experimental datasets, SMS inferred the strains present with high accuracy, including the number of strains, their abundances, and their variations. SMS performed much better than the two existing approaches it was compared with. Our studies shed new light on genomics and provide novel tools for metagenomics.
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Doctoral Dissertation (Open Access)
Wang, Saidi, "Computational Methods to Analyze Next-generation Sequencing Data in Genomics and Metagenomics" (2022). Electronic Theses and Dissertations, 2020-. 1691.