Phylogenetic reconstruction from transpositions

Feng Yue; Meng Zhang; Jijun Tang
January 2008
BMC Genomics;2008 Supplement 2, Vol. 9, Special section p1
Academic Journal
Background: With the advent of high-throughput sequencing and the consequent reduction in sequencing cost, many organisms have been completely sequenced and most of their genes identified. It has thus become possible to represent whole genomes as ordered lists of gene identifiers and to study the rearrangement of these entities by computational means. As a result, genome rearrangement data has attracted increasing attention from both biologists and computer scientists as a new type of data for phylogenetic analysis. The main genome rearrangement events include inversions, transpositions and transversions. To date, GRAPPA and MGR are the most accurate methods for rearrangement phylogeny, and both assume that inversion is the only event. However, because computing the transposition distance is complex, it is very difficult to analyze datasets in which transpositions are dominant.

Results: We extend GRAPPA to handle transpositions. The new method, named GRAPPA-TP, has two major extensions: a heuristic method to estimate transposition distance, and a new transposition median solver for three genomes. Although GRAPPA-TP uses a greedy approach to compute the transposition distance, it is very accurate when genomes are relatively close. GRAPPA-TP is available from http://phylo.cse.sc.edu/.

Conclusion: Our extensive testing on simulated datasets shows that GRAPPA-TP is very accurate in terms of ancestral genome inference and phylogenetic reconstruction. Simulation results also suggest that model match is critical in genome rearrangement analysis: transpositions cannot be accurately simulated with other events, including inversions.
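To make the idea of a greedy transposition-distance estimate concrete, here is a minimal Python sketch. It is not GRAPPA-TP's actual heuristic, which the abstract does not describe. As an assumption for illustration, it treats genomes as unsigned gene orders over the same gene set and greedily applies whichever transposition leaves the fewest breakpoints. All function names are hypothetical.

```python
from itertools import combinations
from math import ceil


def transpose(perm, i, j, k):
    """Swap the adjacent blocks perm[i:j] and perm[j:k] (0 <= i < j < k <= n)."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def breakpoints(perm):
    """Count neighbours (framed by 0 and n+1) that are not of the form x, x+1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if b != a + 1)


def relabel(source, target):
    """Rename genes so that target becomes the identity 1..n."""
    pos = {gene: idx + 1 for idx, gene in enumerate(target)}
    return [pos[gene] for gene in source]


def lower_bound(source, target):
    """A transposition removes at most three breakpoints, so d >= ceil(b / 3)."""
    return ceil(breakpoints(relabel(source, target)) / 3)


def greedy_transposition_distance(source, target):
    """Illustrative greedy estimate (an upper bound), NOT the GRAPPA-TP method."""
    perm = relabel(source, target)
    n = len(perm)
    steps = 0
    # Some transposition always removes at least one breakpoint,
    # so the breakpoint count strictly decreases and the loop terminates.
    while breakpoints(perm) > 0:
        perm = min(
            (transpose(perm, i, j, k)
             for i, j, k in combinations(range(n + 1), 3)),
            key=breakpoints,
        )
        steps += 1
    return steps


if __name__ == "__main__":
    a = ["g1", "g4", "g5", "g2", "g3"]
    b = ["g1", "g2", "g3", "g4", "g5"]
    print(lower_bound(a, b), greedy_transposition_distance(a, b))
```

The greedy count is only an upper bound on the true transposition distance, while `lower_bound` gives the standard breakpoint-based lower bound. When the two agree, as they tend to for closely related gene orders, the estimate is exact.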

