Abstract
Studying bacterial strains in environmental samples is important. Environmental samples are mixed DNA samples collected from the ocean, soil, lakes, human body sites, and other habitats. Strains from natural environments give us new insights into the diversity of our Earth. For bacterial strains on or inside the human body, identifying the strains and reconstructing their genomes is critical to selecting the proper treatment for the diseases they cause. However, this is challenging because an environmental sample mixes DNA from a large number of unknown microbial species. Most available computational methods depend on previously sequenced genomes and marker genes, and therefore cannot fully discover the strains or reconstruct their genomes from shotgun metagenomic reads. In this dissertation, we studied bacterial strain reconstruction through one case study on shotgun metagenomic sequencing and two novel approaches that improve the performance of bacterial strain reconstruction. First, we studied how newly sequenced genomes affect the analysis results obtained from shotgun metagenomic datasets. In this study, we found two additional phyla related to colitis development compared with a previous study, and these two phyla were also more statistically significant. Furthermore, one major conclusion of the previous study was not supported when we repeated the analysis with an updated marker gene database and updated metagenomics tools. Second, to better analyze shotgun metagenomic datasets, we developed BHap, a novel algorithm based on fuzzy flow networks and de Bruijn graphs, to reconstruct bacterial strains. BHap achieved high precision, recall, and F1 score and showed low susceptibility to sequencing errors. It also outperformed existing tools in precision, recall, F1 score, and the accuracy of the estimated number of strains.
Finally, we developed a second approach, mixtureS, which considers all genome positions. MixtureS uses the expectation-maximization (EM) algorithm and the frequency differences among strains to distinguish different strains of a bacterial species in shotgun metagenomic datasets. Compared with several existing methods, including BHap, mixtureS performed better in terms of precision, recall, and the accuracy of the predicted number of strains and their abundances. Based on the BHap and mixtureS methods, we also developed two software tools, which will be valuable for future strain studies in metagenomics.
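As an illustration of the general idea of separating strains by allele frequency with EM, the following is a minimal sketch and not the mixtureS implementation. It assumes a simplified, hypothetical model in which each polymorphic site has a read depth and an alternate-allele read count, and the counts are drawn from a mixture of binomial distributions, one component per frequency level. The function name `em_binomial_mixture`, its parameters, and the toy counts are all assumptions made for this example.

```python
import math


def binom_logpmf(k, n, p):
    """Log-probability of k successes in n trials with success rate p."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite at the boundaries
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


def em_binomial_mixture(counts, depths, init_freqs, n_iter=200, tol=1e-8):
    """Fit a binomial mixture to per-site alternate-allele counts (illustrative).

    counts[i]  : reads supporting the alternate allele at site i
    depths[i]  : total reads covering site i
    init_freqs : starting guess for each component's allele frequency
    Returns (freqs, weights, resp); resp[i][j] is the posterior probability
    that site i belongs to component j.
    """
    m = len(init_freqs)
    freqs = list(init_freqs)
    weights = [1.0 / m] * m
    resp = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each site.
        resp = []
        for k, n in zip(counts, depths):
            logs = [math.log(weights[j]) + binom_logpmf(k, n, freqs[j])
                    for j in range(m)]
            top = max(logs)  # subtract the max for numerical stability
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])

        # M-step: re-estimate each component's frequency and mixing weight.
        new_freqs, new_weights = [], []
        for j in range(m):
            mass = sum(r[j] for r in resp)
            alt = sum(r[j] * k for r, k in zip(resp, counts))
            cov = sum(r[j] * n for r, n in zip(resp, depths))
            new_freqs.append(alt / cov if cov > 0 else freqs[j])
            new_weights.append(max(mass / len(counts), 1e-12))

        change = max(abs(a - b) for a, b in zip(freqs, new_freqs))
        freqs, weights = new_freqs, new_weights
        if change < tol:
            break
    return freqs, weights, resp


# Toy data (invented): ten sites at depth 100, two frequency levels.
toy_counts = [20, 22, 18, 21, 19, 80, 78, 82, 79, 81]
toy_depths = [100] * 10
est_freqs, est_weights, _ = em_binomial_mixture(toy_counts, toy_depths, [0.3, 0.7])
print(est_freqs, est_weights)
```

In this toy setting, sites whose alternate-allele frequency clusters around one level are assigned to the same component, which is the sense in which frequency differences can separate strains. A real method would also need to handle sequencing errors, choose the number of components, and link sites into strain genomes, none of which this sketch attempts.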
Notes
If this is your thesis or dissertation and you want to learn how to access it, or want more information about readership statistics, contact us at STARS@ucf.edu
Graduation Date
2020
Semester
Fall
Advisor
Hu, Haiyan
Degree
Doctor of Philosophy (Ph.D.)
College
College of Engineering and Computer Science
Department
Computer Science
Degree Program
Computer Science
Format
application/pdf
Identifier
CFE0008348; DP0023785
URL
https://purls.library.ucf.edu/go/DP0023785
Language
English
Release Date
December 2020
Length of Campus-only Access
None
Access Status
Doctoral Dissertation (Open Access)
STARS Citation
Li, Xin, "Reconstruction of Bacterial Strain Genomes from Shotgun Metagenomic Reads" (2020). Electronic Theses and Dissertations, 2020-2023. 377.
https://stars.library.ucf.edu/etd2020/377