Data partitions and complex models in Bayesian analysis: The phylogeny of Gymnophthalmid lizards

    Authors

    T. A. Castoe; T. M. Doan; C. L. Parkinson

    Abbreviated Journal Title

    Syst. Biol.

    Keywords

    Autocorrelated gamma; Bayesian analysis; combining data; Gymnophthalmidae; likelihood models; partitioning data; Reptilia; site-specific gamma; SITE RATE VARIATION; MAXIMUM-LIKELIHOOD-ESTIMATION; MITOCHONDRIAL RIBOSOMAL DNA; NUCLEOTIDE SUBSTITUTION; MOLECULAR SYSTEMATICS; SECONDARY STRUCTURE; BOOTSTRAP MEASURES; GENE-SEQUENCES; EMPIRICAL-DATA; RNA STRUCTURES; Evolutionary Biology

    Abstract

    Phylogenetic studies incorporating multiple loci, and multiple genomes, are becoming increasingly common. Coincident with this trend in genetic sampling, model-based likelihood techniques including Bayesian phylogenetic methods continue to gain popularity. Few studies, however, have examined model fit and sensitivity to such potentially heterogeneous data partitions within combined data analyses using empirical data. Here we investigate the relative model fit and sensitivity of Bayesian phylogenetic methods when alternative site-specific partitions of among-site rate variation (with and without autocorrelated rates) are considered. Our primary goal in choosing a best-fit model was to employ the simplest model that was a good fit to the data while optimizing topology and/or Bayesian posterior probabilities. Thus, we were not interested in complex models that did not practically affect our interpretation of the topology under study. We applied these alternative models to a four-gene data set including one protein-coding nuclear gene (c-mos), one protein-coding mitochondrial gene (ND4), and two mitochondrial rRNA genes (12S and 16S) for the diverse yet poorly known lizard family Gymnophthalmidae. Our results suggest that the best-fit model partitioned among-site rate variation separately among the c-mos, ND4, and 12S + 16S gene regions. We found this model yielded identical topologies to those from analyses based on the GTR+I+G model, but significantly changed posterior probability estimates of clade support. This partitioned model also produced more precise (less variable) estimates of posterior probabilities across generations of long Bayesian runs, compared to runs employing a GTR+I+G model estimated for the combined data. We use this three-way gamma partitioning in Bayesian analyses to reconstruct a robust phylogenetic hypothesis for the relationships of genera within the lizard family Gymnophthalmidae. We then reevaluate the higher-level taxonomic arrangement of the Gymnophthalmidae. Based on our findings, we discuss the utility of nontraditional parameters for modeling among-site rate variation and the implications and future directions for complex model building and testing.

    Journal Title

    Systematic Biology

    Volume

    53

    Issue/Number

    3

    Publication Date

    1-1-2004

    Document Type

    Article

    Language

    English

    First Page

    448

    Last Page

    469

    WOS Identifier

    WOS:000222351000006

    ISSN

    1063-5157
