Abstract

Metagenomics uses sequencing technologies to study genetic sequences from whole microbial communities. Binning metagenomic reads is the most fundamental step in metagenomic studies and is essential for understanding microbial functions, compositions, and interactions in environmental samples. Various taxonomy-dependent and taxonomy-independent approaches have been developed based on information such as sequence similarity, sequence composition, or k-mer frequency. However, it remains challenging to bin reads from species with similar or low abundance, or reads from unknown species. In this dissertation, we introduce one taxonomy-independent and three taxonomy-dependent approaches to improve the performance of metagenomic read binning. The taxonomy-independent method, called MBBC, bins reads by considering k-mer frequencies in reads, without reference genomes. The first two taxonomy-dependent methods both bin reads by measuring the similarity of reads to Markov chains trained on different taxa. The major difference between these two methods is that the first selects potential taxa using a taxonomic decision tree, while the second, called MBMC, selects potential taxa using the ordinary least squares (OLS) method. The third taxonomy-dependent method bins reads by combining MBMC with the clustering of Markov chains derived from assembled reads. In tests on both simulated and real datasets, these tools showed performance superior or comparable to that of various state-of-the-art methods. We anticipate that our tools can significantly improve the accuracy of metagenomic read binning and thus be widely applied to real environmental samples.
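As a rough illustration of the idea of measuring the similarity of a read to Markov chains trained on different taxa, the following minimal sketch trains a fixed-order Markov chain per reference sequence and assigns a read to the taxon whose chain gives the highest log-likelihood. It is not the dissertation's implementation: the function names, the chain order of 2, the add-one pseudocount, and the toy sequences are all assumptions made for illustration, and the taxa-selection steps (decision tree or OLS) are omitted.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_markov_chain(genome, order=2, pseudocount=1.0):
    """Estimate log P(next base | previous `order` bases) from one sequence.

    Hypothetical helper: `order` and `pseudocount` are illustrative choices.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(len(genome) - order):
        context = genome[i:i + order]
        next_base = genome[i + order]
        counts[context][next_base] += 1.0

    model = {}
    for context in ("".join(p) for p in product(ALPHABET, repeat=order)):
        # Smooth with pseudocounts so unseen transitions keep nonzero probability.
        total = sum(counts[context][b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        model[context] = {
            b: math.log((counts[context][b] + pseudocount) / total) for b in ALPHABET
        }
    return model


def log_likelihood(read, model, order=2):
    """Sum the log transition probabilities along the read.

    Transitions involving characters outside ACGT (for example N) are skipped.
    """
    score = 0.0
    for i in range(len(read) - order):
        context = read[i:i + order]
        next_base = read[i + order]
        if context in model and next_base in model[context]:
            score += model[context][next_base]
    return score


def assign_read(read, models, order=2):
    """Return the taxon label whose Markov chain scores the read highest."""
    return max(models, key=lambda taxon: log_likelihood(read, models[taxon], order))


# Toy usage with two artificial "genomes" (assumed data, not real taxa).
models = {
    "taxon_a": train_markov_chain("AT" * 200),
    "taxon_b": train_markov_chain("GC" * 200),
}
print(assign_read("ATATATATAT", models))
```

In practice the chains would be trained on real reference genomes, and a higher order would usually be chosen according to the amount of training sequence available.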

Notes

If this is your thesis or dissertation and you want to learn how to access it, or would like more information about readership statistics, contact us at STARS@ucf.edu.

Graduation Date

2016

Semester

Fall

Advisor

Hu, Haiyan

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Computer Science

Degree Program

Computer Science

Format

application/pdf

Identifier

CFE0006515

URL

http://purl.fcla.edu/fcla/etd/CFE0006515

Language

English

Release Date

December 2019

Length of Campus-only Access

3 years

Access Status

Doctoral Dissertation (Open Access)

STARS Citation

Wang, Ying, "Computational Approaches for Binning Metagenomic Reads" (2016). Electronic Theses and Dissertations. 5344.
https://stars.library.ucf.edu/etd/5344

Included in

Computer Sciences Commons
