Title

MBBC: an efficient approach for metagenomic binning based on clustering

Authors

Y. Wang; H. Y. Hu; X. M. Li

Abbreviated Journal Title

BMC Bioinformatics

Keywords

Metagenomics; Binning; Taxonomy-independent; EM Algorithm; Markov properties; INTERPOLATED MARKOV-MODELS; PHYLOGENETIC CLASSIFICATION; TAXONOMIC CLASSIFICATION; MAXIMUM-LIKELIHOOD; GENOMIC FRAGMENTS; DNA-SEQUENCES; MICROBIAL GENOMES; L-TUPLES; ALGORITHM; CHALLENGES; Biochemical Research Methods; Biotechnology & Applied Microbiology; Mathematical & Computational Biology

Abstract

Background: Binning environmental shotgun reads is one of the most fundamental tasks in metagenomic studies, in which mixed reads from different species or operational taxonomical units (OTUs) are separated into different groups. While dozens of binning methods are available, there is still room for improvement. Results: We developed a novel taxonomy-independent approach called MBBC (Metagenomic Binning Based on Clustering) to cluster environmental shotgun reads, by considering k-mer frequency in reads and Markov properties of the inferred OTUs. Tested on twelve simulated datasets, MBBC reliably estimated the species number, the genome size, and the relative abundance of each species, independent of whether there are errors in reads. Tested on multiple experimental datasets, MBBC outperformed two state-of-the-art taxonomy-independent methods, in terms of the accuracy of the estimated species number, genome sizes, and percentages of correctly assigned reads, among other metrics. Conclusions: We have developed a novel method for binning metagenomic reads based on clustering. This method is demonstrated to reliably predict species numbers, genome sizes, relative species abundances, and k-mer coverage in simple datasets. Our method also has a high accuracy in read binning. The MBBC software is freely available at http://eecs.ucf.edu/~xiaoman/MBBC/MBBC.html.
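
As a rough illustration of the "k-mer frequency in reads" signal mentioned in the abstract, the following minimal sketch computes relative k-mer frequencies for a single read. It is not taken from the MBBC software; the function name `kmer_frequencies` and the choice to skip k-mers containing non-ACGT characters are assumptions made only for this example.

```python
from collections import Counter
from typing import Dict


def kmer_frequencies(read: str, k: int) -> Dict[str, float]:
    """Return the relative frequency of each k-mer in one read.

    Hypothetical helper for illustration only; k-mers containing
    characters other than A, C, G, T (e.g. N) are skipped.
    """
    read = read.upper()
    valid = set("ACGT")
    counts = Counter(
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= valid
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    # The read "ACGTAC" contains the 2-mers AC, CG, GT, TA, AC.
    print(kmer_frequencies("ACGTAC", 2))
```

Frequency vectors of this kind are one common input to taxonomy-independent clustering; how MBBC combines them with Markov properties of the inferred OTUs is described in the paper itself, not here.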

Journal Title

BMC Bioinformatics

Volume

16

Publication Date

1-1-2015

Document Type

Article

Language

English

First Page

11

WOS Identifier

WOS:000349250400001

ISSN

1471-2105
