Clustering DNA Sequences Using the Out-of-Place Measure with Reduced N-Grams
Keywords
Alignment-free method; Out-of-place measure; Phylogenetic analysis; Reduced n-gram
Abstract
The alignment-free, n-gram-based method that uses the out-of-place measure as its distance has been successfully applied to real-time automatic categorization of text and natural languages. However, its performance on genome sequences, and how to select n when comparing them, remain unclear. Here we propose a symmetric version of the out-of-place measure and a new approach for finding the optimal range of n for constructing a phylogenetic tree with the symmetric out-of-place measure. We then apply our method to real genome sequence datasets. The resulting phylogenetic trees match the standard biological classification. These results show that the proposed method is a powerful tool for phylogenetic analysis in terms of both classification accuracy and computational efficiency.
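As a rough illustration of the idea in the abstract, the sketch below ranks the n-grams of two sequences by frequency and sums the rank differences in both directions. This record does not give the authors' exact definitions, so everything here is an assumption: the function names are hypothetical, the penalty for a missing n-gram follows the common text-categorization convention, and symmetry is obtained simply by adding the two one-directional scores. The reduced n-gram step named in the title is not shown.

```python
from collections import Counter


def ngram_ranks(seq, n, top=None):
    """Map each n-gram of seq to its frequency rank (0 = most frequent)."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda g: (-counts[g], g))
    if top is not None:
        ordered = ordered[:top]  # optionally keep only the top-ranked n-grams
    return {g: rank for rank, g in enumerate(ordered)}


def out_of_place(a, b):
    """One-directional out-of-place score from rank profile a to profile b."""
    # Assumed penalty for an n-gram of a that is absent from b.
    penalty = max(len(a), len(b))
    return sum(abs(r - b[g]) if g in b else penalty for g, r in a.items())


def symmetric_out_of_place(seq1, seq2, n, top=None):
    """Assumed symmetrization: the sum of the two one-directional scores."""
    p1 = ngram_ranks(seq1, n, top)
    p2 = ngram_ranks(seq2, n, top)
    return out_of_place(p1, p2) + out_of_place(p2, p1)
```

A pairwise matrix of such scores could then be passed to any standard tree-building routine; which routine the authors used is not stated in this record.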
Publication Date
10-7-2016
Publication Title
Journal of Theoretical Biology
Volume
406
Pages
61-72
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1016/j.jtbi.2016.06.029
Copyright Status
Unknown
Scopus ID
84978298571 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/84978298571
STARS Citation
Huang, Hsin Hsiung and Yu, Chenglong, "Clustering DNA Sequences Using the Out-of-Place Measure with Reduced N-Grams" (2016). Scopus Export 2015-2019. 3443.
https://stars.library.ucf.edu/scopus2015/3443