Data partitions and complex models in Bayesian analysis: The phylogeny of Gymnophthalmid lizards
Autocorrelated gamma; Bayesian analysis; combining data; Gymnophthalmidae; likelihood models; partitioning data; Reptilia; site-specific gamma; SITE RATE VARIATION; MAXIMUM-LIKELIHOOD-ESTIMATION; MITOCHONDRIAL RIBOSOMAL DNA; NUCLEOTIDE SUBSTITUTION; MOLECULAR SYSTEMATICS; SECONDARY STRUCTURE; BOOTSTRAP MEASURES; GENE-SEQUENCES; EMPIRICAL-DATA; RNA STRUCTURES; Evolutionary Biology
Phylogenetic studies incorporating multiple loci, and multiple genomes, are becoming increasingly common. Coincident with this trend in genetic sampling, model-based likelihood techniques including Bayesian phylogenetic methods continue to gain popularity. Few studies, however, have examined model fit and sensitivity to such potentially heterogeneous data partitions within combined data analyses using empirical data. Here we investigate the relative model fit and sensitivity of Bayesian phylogenetic methods when alternative site-specific partitions of among-site rate variation (with and without autocorrelated rates) are considered. Our primary goal in choosing a best-fit model was to employ the simplest model that was a good fit to the data while optimizing topology and/or Bayesian posterior probabilities. Thus, we were not interested in complex models that did not practically affect our interpretation of the topology under study. We applied these alternative models to a four-gene data set including one protein-coding nuclear gene (c-mos), one protein-coding mitochondrial gene (ND4), and two mitochondrial rRNA genes (12S and 16S) for the diverse yet poorly known lizard family Gymnophthalmidae. Our results suggest that the best-fit model partitioned among-site rate variation separately among the c-mos, ND4, and 12S + 16S gene regions. We found this model yielded identical topologies to those from analyses based on the GTR+I+G model, but significantly changed posterior probability estimates of clade support. This partitioned model also produced more precise (less variable) estimates of posterior probabilities across generations of long Bayesian runs, compared to runs employing a GTR+I+G model estimated for the combined data. We use this three-way gamma partitioning in Bayesian analyses to reconstruct a robust phylogenetic hypothesis for the relationships of genera within the lizard family Gymnophthalmidae. 
We then reevaluate the higher-level taxonomic arrangement of the Gymnophthalmidae. Based on our findings, we discuss the utility of nontraditional parameters for modeling among-site rate variation and the implications and future directions for complex model building and testing.
"Data partitions and complex models in Bayesian analysis: The phylogeny of Gymnophthalmid lizards" (2004). Faculty Bibliography 2000s. 4250.