Abstract

Studying bacterial strains in environmental samples is important. Environmental samples are mixed DNA samples collected from the ocean, soil, lakes, human body sites, and other habitats. Samples from natural environments give us new insights into the diversity of our planet. For bacterial strains on or inside the human body, identifying the strains that cause a disease and reconstructing their genomes is critical to selecting the proper treatment. However, this is challenging because an environmental sample contains DNA from a large number of unknown microbial species mixed together. Most available computational methods depend on previously sequenced genomes and marker genes, and therefore cannot fully discover the strains and reconstruct their genomes from shotgun metagenomic reads. In this dissertation, we studied bacterial strain reconstruction through one case study on shotgun metagenomic sequencing and two novel approaches that improve the performance of bacterial strain reconstruction. First, we studied how newly sequenced genomes affect the results of analyzing shotgun metagenomic datasets. In this study, we found two additional phyla related to colitis development compared with a previous study, and these two phyla were also more statistically significant. Furthermore, we found that one major conclusion of the previous study was not supported when the analysis was repeated with an updated marker gene database and metagenomics tools. Second, to better analyze shotgun metagenomic datasets, we developed BHap, a novel algorithm based on fuzzy flow networks and de Bruijn graphs, to reconstruct bacterial strains. BHap had high precision, recall, and F1 score, and low susceptibility to sequencing errors. It also outperformed existing tools, with better precision, better recall, a higher F1 score, and more accurate estimation of the number of strains.
Finally, we developed a second approach, mixtureS, which considers all genome positions. The mixtureS method uses the expectation-maximization (EM) algorithm and the frequency differences between strains to distinguish different strains of a bacterial species in shotgun metagenomic datasets. Compared with several existing methods, including BHap, mixtureS performed better in terms of precision, recall, and the accuracy of the predicted strain number and abundance. Based on BHap and mixtureS, we also developed two software tools, which will be valuable for future strain studies in metagenomics.

Notes

If this is your thesis or dissertation and you want to learn how to access it, or would like more information about readership statistics, contact us at STARS@ucf.edu

Graduation Date

2020

Semester

Fall

Advisor

Hu, Haiyan

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Computer Science

Degree Program

Computer Science

Format

application/pdf

Identifier

CFE0008348; DP0023785

URL

https://purls.library.ucf.edu/go/DP0023785

Language

English

Release Date

December 2020

Length of Campus-only Access

None

Access Status

Doctoral Dissertation (Open Access)
