An Ensemble Distance Measure Of K-Mer And Natural Vector For The Phylogenetic Analysis Of Multiple-Segmented Viruses
Keywords
H7N9 virus; Hausdorff distance; K-mer; Natural Vector; Phylogenetic analysis
Abstract
The Natural Vector combined with the Hausdorff distance has been successfully applied to classifying and clustering multiple-segmented viruses. K-mer methods also yield promising results for global genome comparison. It is not known whether combining these two approaches leads to more accurate results. The author proposes a method that combines the Hausdorff distances of the 5-mer counting vectors and the natural vectors, which achieves the best classification without cutting off any sample. When the proposed method is used to predict the taxonomic labels of the 2363 NCBI reference viral genomes dataset, the accuracy rates are 96.95%, 94.37%, 99.41% and 93.82% for the Baltimore, family, subfamily, and genus labels, respectively. The proposed method is further applied to 48 isolates of the influenza A H7N9 virus, each of which has eight complete segments of nucleotide sequences. The single-linkage clustering trees and the statistical hypothesis testing results all indicate that the proposed ensemble distance measure can cluster viruses well using all of their genome segments.
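The ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes the commonly used 12-dimensional natural vector (per-base count, mean position, and scaled second central moment), Euclidean distance between vectors, and overlapping 5-mer windows. The abstract does not say how the two Hausdorff distances are combined, so the weighted sum in `ensemble_distance` and its `weight` parameter are purely hypothetical placeholders, as are all function names.

```python
from itertools import product
from math import dist

ALPHABET = "ACGT"

def kmer_counts(seq, k=5):
    """Count vector over all 4**k k-mers, using overlapping windows."""
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    vec = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # windows with non-ACGT symbols are skipped
        if j is not None:
            vec[j] += 1
    return vec

def natural_vector(seq):
    """Assumed 12-dim form: counts, mean positions, scaled 2nd central moments."""
    n = len(seq)
    counts, means, moments = [], [], []
    for base in ALPHABET:
        pos = [i + 1 for i, c in enumerate(seq) if c == base]  # 1-based positions
        n_k = len(pos)
        mu = sum(pos) / n_k if n_k else 0.0
        d2 = sum((p - mu) ** 2 for p in pos) / (n_k * n) if n_k else 0.0
        counts.append(n_k)
        means.append(mu)
        moments.append(d2)
    return counts + means + moments

def hausdorff(set_a, set_b):
    """Symmetric Hausdorff distance between two non-empty sets of vectors."""
    def directed(xs, ys):
        return max(min(dist(x, y) for y in ys) for x in xs)
    return max(directed(set_a, set_b), directed(set_b, set_a))

def ensemble_distance(segments_a, segments_b, k=5, weight=0.5):
    """Hypothetical combination: weighted sum of the two Hausdorff distances."""
    d_kmer = hausdorff([kmer_counts(s, k) for s in segments_a],
                       [kmer_counts(s, k) for s in segments_b])
    d_nv = hausdorff([natural_vector(s) for s in segments_a],
                     [natural_vector(s) for s in segments_b])
    return weight * d_kmer + (1 - weight) * d_nv
```

Each virus is passed as a list of segment sequences, for example eight strings for an influenza A isolate. The Hausdorff distance compares the two sets of segment vectors as sets, so the segments do not need to be paired or ordered.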
Publication Date
6-7-2016
Publication Title
Journal of Theoretical Biology
Volume
398
Pages
136-144
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1016/j.jtbi.2016.03.004
Copyright Status
Unknown
Scopus ID
84963690241 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/84963690241
STARS Citation
Huang, Hsin Hsiung, "An Ensemble Distance Measure Of K-Mer And Natural Vector For The Phylogenetic Analysis Of Multiple-Segmented Viruses" (2016). Scopus Export 2015-2019. 2670.
https://stars.library.ucf.edu/scopus2015/2670