Sensitivity of Phylogeny Estimation to Taxonomic Sampling

Poe, Steven
March 1998
Systematic Biology;Mar1998, Vol. 47 Issue 1, p18
Academic Journal
Recent studies have shown that addition or deletion of taxa from a data matrix can change the estimate of phylogeny. I used 29 data sets from the literature to examine the effect of taxon sampling on phylogeny estimation within data sets. I then used multiple regression to assess the effect of number of taxa, number of characters, homoplasy, strength of support, and tree symmetry on the sensitivity of data sets to taxonomic sampling. Sensitivity to sampling was measured by mapping characters from a matrix of culled taxa onto optimal trees for that reduced matrix and onto the pruned optimal tree for the entire matrix, then comparing the length of the reduced tree to the length of the pruned complete tree. Within-data-set patterns can be described by a second-order equation relating fraction of taxa sampled to sensitivity to sampling. Multiple regression analyses found number of taxa to be a significant predictor of sensitivity to sampling; retention index, number of informative characters, total support index, and tree symmetry were nonsignificant predictors. I derived a predictive regression equation relating fraction of taxa sampled and number of taxa potentially sampled to sensitivity to taxonomic sampling and calculated values for this equation within the bounds of the variables examined. The length difference between the complete tree and a subsampled tree was generally small (average difference of 0-2.9 steps), indicating that subsampling taxa is probably not an important problem for most phylogenetic analyses using up to 20 taxa. [Multiple regression; modeling; phylogeny estimation; taxonomic sampling.]
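The length comparison described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's procedure. It assumes unordered characters scored as single-symbol states with no missing data. It uses Fitch parsimony to count steps and represents rooted binary trees as nested tuples. It also takes both trees as given, so the tree search that finds an optimal tree for the reduced matrix is out of scope. The function names `prune`, `fitch_length`, and `sampling_sensitivity` are hypothetical. The four-taxon matrix and trees are invented toy data and are not drawn from the 29 data sets.

```python
from typing import Dict, Optional, Set, Tuple, Union

# Hypothetical representation: a leaf is a taxon name; an internal node is a
# pair of subtrees, so trees are rooted and strictly binary.
Tree = Union[str, Tuple["Tree", "Tree"]]


def prune(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Drop taxa not in `keep`, collapsing any node left with a single child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    left, right = (prune(child, keep) for child in tree)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)


def _fitch(node: Tree, matrix: Dict[str, str], i: int) -> Tuple[Set[str], int]:
    """Fitch pass for character i: return (state set, steps) for this subtree."""
    if isinstance(node, str):
        return {matrix[node][i]}, 0
    (a, steps_a), (b, steps_b) = (_fitch(child, matrix, i) for child in node)
    common = a & b
    if common:
        return common, steps_a + steps_b
    # Disjoint child state sets cost one extra step.
    return a | b, steps_a + steps_b + 1


def fitch_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Total parsimony length of all characters in `matrix` on `tree`."""
    n_chars = len(next(iter(matrix.values())))
    return sum(_fitch(tree, matrix, i)[1] for i in range(n_chars))


def sampling_sensitivity(reduced_tree: Tree, complete_tree: Tree,
                         reduced_matrix: Dict[str, str]) -> int:
    """Extra steps from using the pruned complete tree instead of the reduced tree.

    Both trees are scored on the culled matrix. If `reduced_tree` is optimal
    for that matrix, the result is >= 0.
    """
    pruned = prune(complete_tree, set(reduced_matrix))
    assert pruned is not None, "no sampled taxa occur in the complete tree"
    return fitch_length(pruned, reduced_matrix) - fitch_length(reduced_tree, reduced_matrix)


if __name__ == "__main__":
    # Toy data, invented for illustration only (taxa B and F are culled).
    complete = (((("A", "B"), "C"), "D"), ("E", "F"))
    reduced_matrix = {"A": "111", "C": "001", "D": "110", "E": "000"}
    reduced_opt = (("A", "D"), ("C", "E"))
    print(prune(complete, set(reduced_matrix)))  # ((('A', 'C'), 'D'), 'E')
    print(sampling_sensitivity(reduced_opt, complete, reduced_matrix))  # 1
```

The reduced tree is assumed optimal for the culled matrix, so the difference cannot be negative. A value of 0 means the pruned complete tree is itself a most-parsimonious tree for the reduced matrix. In the toy case, the pruned tree costs 5 steps and the reduced optimal tree costs 4.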


Related Articles

  • Large-Scale Phylogenies and Measuring the Performance of Phylogenetic Estimators. Kim, Junhyong // Systematic Biology;Mar1998, Vol. 47 Issue 1, p43 

    Performance measures of phylogenetic estimation methods such as accuracy, consistency, and power are an attempt at summarizing an ensemble of a given estimator's behavior. These summaries characterize an ensemble behavior with a single number, leading to a variety of definitions. In particular,...

  • Is It Better to Add Taxa or Characters to a Difficult Phylogenetic Problem? Graybeal, Anna // Systematic Biology;Mar1998, Vol. 47 Issue 1, p9 

    The effects on phylogenetic accuracy of adding characters and/or taxa were explored using data generated by computer simulation. The conditions of this study were constrained but allowed for systematic investigation of certain parameters. The starting point for the study was a...

  • Taxonomic Sampling, Phylogenetic Accuracy, and Investigator Bias. Hillis, David M. // Systematic Biology;Mar1998, Vol. 47 Issue 1, p3 

    Comments on a series of articles published in the March 1998 issue of 'Systematic Biology,' which examine the effects of taxonomic sampling on phylogenetic analysis. Local versus global effects of taxonomic sampling; Taxonomic sampling schemes; Addition of increasingly distantly-related taxa to...

  • Lagomorphs Misplaced by More Characters and Fewer Taxa. Halanych, Kenneth M. // Systematic Biology;Mar1998, Vol. 47 Issue 1, p138 

    Discusses the limitations and advantages of collecting more data from a few representative taxa or obtaining limited data representation from a wide range of taxa, in phylogenetic reconstruction for Lagomorpha. Problems of taxonomic representation in a four-taxon approach; Underrepresentation...

  • On the Best Evolutionary Rate for Phylogenetic Analysis. Yang, Ziheng // Systematic Biology;Mar1998, Vol. 47 Issue 1, p125 

    The effect of the evolutionary rate of a gene on the accuracy of phylogeny reconstruction was examined by computer simulation. The evolutionary rate is measured by the tree length, that is, the expected total number of nucleotide substitutions per site on the phylogeny. DNA sequence data were...

  • Towards an Inclusive Philosophy for Phylogenetic Inference. Faith, Daniel P.; Trueman, John W. H. // Systematic Biology;June2001, Vol. 50 Issue 3, p331 

    We defend and expand on our earlier proposal for an inclusive philosophical framework for phylogenetics, based on an interpretation of Popperian corroboration that is decoupled from the popular falsificationist interpretation of Popperian philosophy. Any phylogenetic inference method can provide...

  • Component Coding, Three-Item Coding, and Consensus Methods. Williams, David M.; Humphries, Christopher J. // Systematic Biology;Apr2003, Vol. 52 Issue 2, p255 

    Discusses the use of component coding, three-item coding and consensus methods to express cladograms or the phylogenetic relationships of organisms. Description of component coding as matrix representation with parsimony; Consideration of each node in three-item coding as a relation between...

  • Tree Robustness and Clade Significance. Lee, Michael S. Y. // Systematic Biology;Dec2000, Vol. 49 Issue 4, p829 

    Assesses the cladistic significance in analyzing tree robustness. Basis of testing; Use of comparison across different data sets; Ratio in supporting the conflicting characters.


    Examines the use of cladistic parsimony to infer phylogenetic trees. Evaluation of tree topologies; Presuppositions of parsimony; Problems on parsimony.

