An Ensemble Distance Measure Of K-Mer And Natural Vector For The Phylogenetic Analysis Of Multiple-Segmented Viruses

Keywords

H7N9 virus; Hausdorff distance; K-mer; Natural Vector; Phylogenetic analysis

Abstract

The Natural Vector combined with the Hausdorff distance has been successfully applied to classifying and clustering multiple-segmented viruses. K-mer methods have also yielded promising results for global genome comparison. It is not known whether combining these two approaches leads to more accurate results. The author proposes a method that combines the Hausdorff distances of the 5-mer counting vectors and the natural vectors, and that achieves the best classification without excluding any sample. When the proposed method is used to predict the taxonomic labels of the 2363 NCBI reference viral genomes dataset, the accuracy rates are 96.95%, 94.37%, 99.41% and 93.82% for the Baltimore, family, subfamily, and genus labels, respectively. We further applied the proposed method to 48 isolates of influenza A H7N9 virus, each of which has eight complete segments of nucleotide sequence. The single-linkage clustering trees and the statistical hypothesis testing results all indicate that the proposed ensemble distance measure clusters viruses well using all of their genome segments.
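
The ingredients named in the abstract can be illustrated with a minimal Python sketch. It treats each virus as a set of segment sequences, maps every segment to a 5-mer counting vector and to a 12-dimensional natural vector (per-nucleotide count, mean position, and normalized second central moment), and compares two viruses with the Hausdorff distance under the Euclidean metric. The abstract does not state how the two Hausdorff distances are combined, so the weighted sum in `ensemble_distance` and its `weight` parameter are assumptions made for illustration only, as are all function names.

```python
from itertools import product
from math import dist

ALPHABET = "ACGT"

def kmer_counts(seq, k=5):
    """Count vector over all 4**k k-mers; windows with other symbols are skipped."""
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    return counts

def natural_vector(seq):
    """12-dim vector: counts, mean positions (1-based), normalized 2nd moments."""
    n = len(seq)
    counts, means, moments = [], [], []
    for base in ALPHABET:
        pos = [i + 1 for i, c in enumerate(seq) if c == base]
        n_k = len(pos)
        mu = sum(pos) / n_k if n_k else 0.0
        d2 = sum((p - mu) ** 2 for p in pos) / (n_k * n) if n_k else 0.0
        counts.append(float(n_k))
        means.append(mu)
        moments.append(d2)
    return counts + means + moments

def hausdorff(A, B):
    """Hausdorff distance between two finite sets of equal-length vectors."""
    def directed(X, Y):
        return max(min(dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))

def ensemble_distance(segs_a, segs_b, k=5, weight=1.0):
    """ASSUMPTION: combine the two Hausdorff distances as a weighted sum."""
    d_kmer = hausdorff([kmer_counts(s, k) for s in segs_a],
                       [kmer_counts(s, k) for s in segs_b])
    d_nv = hausdorff([natural_vector(s) for s in segs_a],
                     [natural_vector(s) for s in segs_b])
    return d_kmer + weight * d_nv
```

For an eight-segment influenza isolate, `segs_a` and `segs_b` would each be a list of eight nucleotide strings. Because the two component distances live on different scales, a real application would probably need to normalize them before combining; the sketch leaves that choice to the `weight` parameter.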

Publication Date

6-7-2016

Publication Title

Journal of Theoretical Biology

Volume

398

Pages

136-144

Document Type

Article

Personal Identifier

scopus

DOI Link

https://doi.org/10.1016/j.jtbi.2016.03.004

Scopus ID

84963690241 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84963690241
