Metagenomics; Binning; Taxonomy-independent; EM Algorithm; Markov properties; INTERPOLATED MARKOV-MODELS; PHYLOGENETIC CLASSIFICATION; TAXONOMIC CLASSIFICATION; MAXIMUM-LIKELIHOOD; GENOMIC FRAGMENTS; DNA-SEQUENCES; MICROBIAL GENOMES; L-TUPLES; ALGORITHM; CHALLENGES; Biochemical Research Methods; Biotechnology & Applied Microbiology; Mathematical & Computational Biology
Background: Binning environmental shotgun reads is one of the most fundamental tasks in metagenomic studies, in which mixed reads from different species or operational taxonomic units (OTUs) are separated into different groups. While dozens of binning methods are available, there is still room for improvement. Results: We developed a novel taxonomy-independent approach called MBBC (Metagenomic Binning Based on Clustering) to cluster environmental shotgun reads by considering the k-mer frequency in reads and the Markov properties of the inferred OTUs. Tested on twelve simulated datasets, MBBC reliably estimated the species number, the genome size, and the relative abundance of each species, regardless of whether the reads contained errors. Tested on multiple experimental datasets, MBBC outperformed two state-of-the-art taxonomy-independent methods in terms of the accuracy of the estimated species number, the genome sizes, and the percentage of correctly assigned reads, among other metrics. Conclusions: We have developed a novel method for binning metagenomic reads based on clustering. This method is demonstrated to reliably predict species numbers, genome sizes, relative species abundances, and k-mer coverage in simple datasets. Our method also achieves high accuracy in read binning. The MBBC software is freely available at http://eecs.ucf.edu/~xiaoman/MBBC/MBBC.html.
Wang, Ying; Hu, Haiyan; and Li, Xiaoman, "MBBC: an efficient approach for metagenomic binning based on clustering" (2015). Faculty Bibliography 2010s. 6860.