Large-Scale Phylogenies and Measuring the Performance of Phylogenetic Estimators

Kim, Junhyong
March 1998
Systematic Biology;Mar1998, Vol. 47 Issue 1, p43
Academic Journal
Performance measures of phylogenetic estimation methods such as accuracy, consistency, and power are an attempt at summarizing an ensemble of a given estimator's behavior. These summaries characterize an ensemble behavior with a single number, leading to a variety of definitions. In particular, the relationships between different performance measures such as accuracy and consistency or accuracy and error depend on the exact definition of these measures. In addition, it is relatively common to use large-sample behavior to infer similar behavior for small samples. In fact, large-sample results such as the claimed asymptotic efficiency of the maximum-likelihood estimator are often uninformative for small samples. Conversely, small-sample behavior using simulations is sometimes used to imply large-sample behavior such as consistency. However, such extrapolation is often difficult. How the performance of a phylogenetic estimator scales with the addition of taxa must be qualified with respect to whether the whole tree is being estimated or a fixed subset of taxa is being estimated. It must also be qualified with respect to how tree models are sampled. Over the ensemble of all possible trees of a given size, the performance of the estimators for the whole tree estimate suffers when the tree size becomes larger. However, under certain models of cladogenesis, the estimate can improve with the addition of taxa. In fact, at all numbers of taxa there are subsets of tree models that are easier to estimate than others. This suggests that with judicious addition or subtraction of taxa we can move from tree models that are more difficult to estimate at one number of taxa to those that are easier to estimate at another number of taxa. [Accuracy; consistency; efficiency; large-scale phylogeny; performance.]
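
The abstract's point that a single-number summary of an ensemble depends on its exact definition can be illustrated with a minimal sketch. The code below is not taken from the article: the two measures (whole-tree match and mean fraction of recovered splits), the function names, and the toy five-taxon data are all assumptions chosen for illustration. Each unrooted tree is represented by its nontrivial splits, with each split written as the side that does not contain the reference taxon E.

```python
from typing import FrozenSet, List, Set

# Hypothetical representation: a tree is the set of its nontrivial splits,
# each split stored as the taxon set on the side without reference taxon "E".
Split = FrozenSet[str]
Tree = Set[Split]


def exact_match_accuracy(true_tree: Tree, estimates: List[Tree]) -> float:
    """Fraction of estimates whose whole topology equals the true tree."""
    return sum(est == true_tree for est in estimates) / len(estimates)


def mean_split_accuracy(true_tree: Tree, estimates: List[Tree]) -> float:
    """Mean fraction of true splits recovered, averaged over estimates."""
    per_estimate = [len(est & true_tree) / len(true_tree) for est in estimates]
    return sum(per_estimate) / len(per_estimate)


# Illustrative true tree ((A,B),C,(D,E)) with splits AB|CDE and ABC|DE.
true_tree: Tree = {frozenset("AB"), frozenset("ABC")}

# Made-up ensemble of four estimates (not real simulation results).
estimates: List[Tree] = [
    {frozenset("AB"), frozenset("ABC")},  # fully correct
    {frozenset("AB"), frozenset("ABC")},  # fully correct
    {frozenset("AB"), frozenset("ABD")},  # recovers one of two splits
    {frozenset("AC"), frozenset("ABC")},  # recovers one of two splits
]

whole_tree_score = exact_match_accuracy(true_tree, estimates)
split_score = mean_split_accuracy(true_tree, estimates)
print(whole_tree_score, split_score)
```

On this toy ensemble the same estimator scores 0.5 under the whole-tree definition and 0.75 under the per-split definition, which is one way the choice of definition can change how "accuracy" relates to other measures.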


Related Articles

  • Sensitivity of Phylogeny Estimation to Taxonomic Sampling. Poe, Steven // Systematic Biology;Mar1998, Vol. 47 Issue 1, p18 

    Recent studies have shown that addition or deletion of taxa from a data matrix can change the estimate of phylogeny. I used 29 data sets from the literature to examine the effect of taxon sampling on phylogeny estimation within data sets. I then used multiple regression to assess the effect of...

  • Is It Better to Add Taxa or Characters to a Difficult Phylogenetic Problem? Graybeal, Anna // Systematic Biology;Mar1998, Vol. 47 Issue 1, p9 

    The effects on phylogenetic accuracy of adding characters and/or taxa were explored using data generated by computer simulation. The conditions of this study were constrained but allowed for systematic investigation of certain parameters. The starting point for the study was a...

  • Taxonomic Sampling, Phylogenetic Accuracy, and Investigator Bias. Hillis, David M. // Systematic Biology;Mar1998, Vol. 47 Issue 1, p3 

    Comments on a series of articles published in the March 1998 issue of 'Systematic Biology,' which examine the effects of taxonomic sampling on phylogenetic analysis. Local versus global effects of taxonomic sampling; Taxonomic sampling schemes; Addition of increasingly distantly-related taxa to...

  • Lagomorphs Misplaced by More Characters and Fewer Taxa. Halanych, Kenneth M. // Systematic Biology;Mar1998, Vol. 47 Issue 1, p138 

    Discusses the limitations and advantages of collecting more data from a few representative taxa or obtaining limited data representation from a wide range of taxa, in phylogenetic reconstruction for Lagomorpha. Problems of taxonomic representation in a four-taxon approach; Underrepresentation...

  • On the Best Evolutionary Rate for Phylogenetic Analysis. Yang, Ziheng // Systematic Biology;Mar1998, Vol. 47 Issue 1, p125 

    The effect of the evolutionary rate of a gene on the accuracy of phylogeny reconstruction was examined by computer simulation. The evolutionary rate is measured by the tree length, that is, the expected total number of nucleotide substitutions per site on the phylogeny. DNA sequence data were...

  • Towards an Inclusive Philosophy for Phylogenetic Inference. Faith, Daniel P.; Trueman, John W. H. // Systematic Biology;June2001, Vol. 50 Issue 3, p331 

    We defend and expand on our earlier proposal for an inclusive philosophical framework for phylogenetics, based on an interpretation of Popperian corroboration that is decoupled from the popular falsificationist interpretation of Popperian philosophy. Any phylogenetic inference method can provide...

  • Component Coding, Three-Item Coding, and Consensus Methods. Williams, David M.; Humphries, Christopher J. // Systematic Biology;Apr2003, Vol. 52 Issue 2, p255 

    Discusses the use of component coding, three-item coding and consensus methods to express cladograms or the phylogenetic relationships of organisms. Description of component coding as matrix representation with parsimony; Consideration of each node in three-item coding as a relation between...

  • Tree Robustness and Clade Significance. Lee, Michael S. Y. // Systematic Biology;Dec2000, Vol. 49 Issue 4, p829 

    Assesses the cladistic significance in analyzing tree robustness. Basis of testing; Use of comparison across different data sets; Ratio in supporting the conflicting characters.


    Examines the use of cladistic parsimony to infer phylogenetic trees. Evaluation of tree topologies; Presuppositions of parsimony; Problems on parsimony.

