Clustering DNA Sequences Using the Out-of-Place Measure with Reduced n-Grams

Keywords

Alignment-free method; Out-of-place measure; Phylogenetic analysis; Reduced n-gram

Abstract

The alignment-free n-gram method, with the out-of-place measure as its distance, has been successfully applied to real-time automatic categorization of text and natural languages. However, its performance on genome sequences, and how n should be selected when comparing them, remain unclear. Here we propose a symmetric version of the out-of-place measure and a new approach for finding the optimal range of n for constructing a phylogenetic tree with the symmetric out-of-place measure. We then apply the method to real genome sequence datasets. The resulting phylogenetic trees match the standard biological classification. These results indicate that the proposed method is a powerful tool for phylogenetic analysis in terms of both classification accuracy and computational efficiency.
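As a rough illustration of the idea, the sketch below ranks the n-grams of each sequence by frequency and sums the rank differences, following the classic out-of-place measure from text categorization. The function names (`ngram_ranks`, `out_of_place`, `symmetric_out_of_place`), the penalty for missing n-grams, and the symmetrization by averaging both directions are assumptions made for illustration; the article's own symmetric measure, its reduced n-grams, and its procedure for choosing n are not reproduced here and may differ.

```python
from collections import Counter


def ngram_ranks(seq, n, top=None):
    """Map each n-gram of seq to its frequency rank (0 = most frequent)."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    # Sort by descending count; break ties alphabetically so ranks are deterministic.
    ordered = sorted(counts, key=lambda g: (-counts[g], g))
    if top is not None:
        ordered = ordered[:top]
    return {g: rank for rank, g in enumerate(ordered)}


def out_of_place(a, b):
    """One-directional out-of-place distance from rank profile a to rank profile b."""
    # Assumption: an n-gram missing from b costs the size of the larger profile.
    penalty = max(len(a), len(b))
    return sum(abs(rank - b[g]) if g in b else penalty for g, rank in a.items())


def symmetric_out_of_place(a, b):
    """Assumed symmetrization: the average of the two one-directional distances."""
    return (out_of_place(a, b) + out_of_place(b, a)) / 2


if __name__ == "__main__":
    x = ngram_ranks("ACGTACGT", 2)
    y = ngram_ranks("ACGTTGCA", 2)
    print(symmetric_out_of_place(x, y))
```

Pairwise distances computed this way could be passed to any standard tree-building or hierarchical clustering routine; which one the article uses is not stated in the abstract.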

Publication Date

10-7-2016

Publication Title

Journal of Theoretical Biology

Volume

406

Page Range

61-72

Document Type

Article

Personal Identifier

scopus

DOI Link

https://doi.org/10.1016/j.jtbi.2016.06.029

Scopus ID

84978298571 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84978298571
