The Impact of Gene Duplication, Insertion, Deletion, Lateral Gene Transfer and Sequencing Error on Orthology Inference: A Simulation Study

Dalquen, Daniel A.; Altenhoff, Adrian M.; Gonnet, Gaston H.; Dessimoz, Christophe
February 2013
PLoS ONE;Feb2013, Vol. 8 Issue 2, p1
Academic Journal
The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically been considered in cross-method assessment studies. Yet, while dependent on model assumptions, simulation-based benchmarking offers unique advantages: unlike with empirical data, all aspects of simulated data are known with certainty. Furthermore, the flexibility of simulation makes it possible to investigate performance factors in isolation from one another. Here, we use simulated data to dissect the performance of six methods for orthology inference available as standalone software packages (Inparanoid, OMA, OrthoInspector, OrthoMCL, QuartetS, SPIMAP) as well as two generic approaches (bidirectional best hit and reciprocal smallest distance). We investigate the impact of various evolutionary forces (gene duplication, insertion, deletion, and lateral gene transfer) and technological artefacts (ambiguous sequences) on orthology inference. We show that while gene duplication/loss and insertion/deletion are well handled by most methods (albeit with different trade-offs between precision and recall), lateral gene transfer disrupts all methods. As for ambiguous sequences, which might result from poor sequencing, assembly, or genome annotation, we show that they affect alignment score-based orthology methods more strongly than their distance-based counterparts.
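
To make the bidirectional best hit approach and the precision/recall trade-off mentioned in the abstract concrete, here is a minimal Python sketch. It is not the implementation used in the study. The function names, the toy gene identifiers, and the scores are all hypothetical, and it assumes a single symmetric alignment score per gene pair, with higher scores meaning greater similarity. With simulated data the true orthologs are known, so predictions can be scored directly against them.

```python
from typing import Dict, Set, Tuple

# Hypothetical input: one alignment score per (gene in genome A, gene in genome B).
Scores = Dict[Tuple[str, str], float]
Pairs = Set[Tuple[str, str]]


def best_hits(scores: Scores, from_first: bool) -> Dict[str, str]:
    """Map each gene on one side to its highest-scoring partner on the other side."""
    best: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        query, target = (a, b) if from_first else (b, a)
        # Ties keep the first pair encountered (an arbitrary choice for this sketch).
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(scores: Scores) -> Pairs:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(scores, from_first=True)
    reverse = best_hits(scores, from_first=False)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


def precision_recall(predicted: Pairs, truth: Pairs) -> Tuple[float, float]:
    """Score predicted ortholog pairs against the known (simulated) truth."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy example with made-up scores and a made-up set of true ortholog pairs.
toy_scores: Scores = {
    ("a1", "b1"): 90.0,
    ("a1", "b2"): 40.0,
    ("a2", "b1"): 60.0,
    ("a2", "b2"): 55.0,
    ("a3", "b2"): 70.0,
}
toy_truth: Pairs = {("a1", "b1"), ("a2", "b2")}

predicted_pairs = bidirectional_best_hits(toy_scores)
precision, recall = precision_recall(predicted_pairs, toy_truth)
```

In this toy case the sketch predicts two pairs, one of which is in the truth set, so it misses one true pair and reports one false one. Reciprocal smallest distance follows the same reciprocal pattern but, as its name indicates, selects the partner with the smallest distance instead of the highest score.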


Related Articles

  • GenPhyloData: realistic simulation of gene family evolution. Sjöstrand, Joel; Arvestad, Lars; Lagergren, Jens; Sennblad, Bengt // BMC Bioinformatics;2013, Vol. 14 Issue 1, p1 

    Background: PrIME-GenPhyloData is a suite of tools for creating realistic simulated phylogenetic trees, in particular for families of homologous genes. It supports generation of trees based on a birth-death process and--perhaps more interestingly--also supports generation of gene family trees...

  • A cost-effective and universal strategy for complete prokaryotic genomic sequencing proposed by computer simulation.  // BMC Research Notes;2012, Vol. 5 Issue 1, p80 

    The article focuses on a study conducted to test various pyrosequencing strategies by simulated assembling from 100 prokaryotic genomes, which found that a cost-effective way for prokaryotic whole genome sequencing is a single end 454 Jr. run combined with a paired end 454 Jr. run. Solution to...

  • Genomic insights into the evolution of hybrid isoprenoid biosynthetic gene clusters in the MAR4 marine streptomycete clade. Gallagher, Kelley A.; Jensen, Paul R. // BMC Genomics;11/17/2015, Vol. 16, p1 

    Background: Considerable advances have been made in our understanding of the molecular genetics of secondary metabolite biosynthesis. Coupled with increased access to genome sequence data, new insight can be gained into the diversity and distributions of secondary metabolite biosynthetic gene...

  • Natural Genetic Transformation Generates a Population of Merodiploids in Streptococcus pneumoniae. Johnston, Calum; Caymaris, Stéphanie; Zomer, Aldert; Bootsma, Hester J.; Prudhomme, Marc; Granadel, Chantal; Hermans, Peter W. M.; Polard, Patrice; Martin, Bernard; Claverys, Jean-Pierre // PLoS Genetics;Sep2013, Vol. 9 Issue 9, p1 

    Partial duplication of genetic material is prevalent in eukaryotes and provides potential for evolution of new traits. Prokaryotes, which are generally haploid in nature, can evolve new genes by partial chromosome duplication, known as merodiploidy. Little is known about merodiploid formation...

  • Which Phylogenetic Networks are Merely Trees with Additional Arcs? FRANCIS, ANDREW R.; STEEL, MIKE // Systematic Biology;Sep2015, Vol. 64 Issue 5, p768 

    A binary phylogenetic network may or may not be obtainable from a tree by the addition of directed edges (arcs) between tree arcs. Here, we establish a precise and easily tested criterion (based on "2-SAT") that efficiently determines whether or not any given network can be realized in this way....

  • Gene duplications and horizontal gene transfer during early evolution. Gogarten, J.; Hilario, Elena; Olendzenski, Lorraine // Origins of Life & Evolution of the Biosphere;1996, Vol. 26 Issue 3-5, p284 

    An abstract of the paper "Gene Duplications and Horizontal Gene Transfer During Early Evolution," by J. Peter Gogarten and colleagues presented at the triennial meeting of the International Society for the Study of the Origin of Life (ISSOL) in July 1996 in Orléans, France is provided.

  • Signature of backward replication slippage at the copy number variation junction. Ohye, Tamae; Inagaki, Hidehito; Ozaki, Mamoru; Ikeda, Toshiro; Kurahashi, Hiroki // Journal of Human Genetics;May2014, Vol. 59 Issue 5, p247 

    Copy number abnormalities such as deletions and duplications give rise to a variety of medical problems and also manifest innocuous genomic variations. Aberrant DNA replication is suggested as the mechanism underlying de novo copy number abnormalities, but the precise details have remained...

  • A unique phenotype in a patient with a rare triplication of the 22q11.2 region and new clinical insights of the 22q11.2 microduplication syndrome: a report of two cases. Vaz, Sara O.; Pires, Renato; Pires, Luís M.; Carreira, Isabel M.; Anjos, Rui; Maciel, Paula; Mota-Vieira, Luisa // BMC Pediatrics;Aug2015, Vol. 15 Issue 1, p1 

    Background: The rearrangements of the 22q11.2 chromosomal region, most frequently deletions and duplications, have been known to be responsible for multiple congenital anomaly disorders. These rearrangements are implicated in syndromes that have some phenotypic resemblances. While the 22q11.2...

  • Precise detection of chromosomal translocation or inversion breakpoints by whole-genome sequencing. Suzuki, Toshifumi; Tsurusaki, Yoshinori; Nakashima, Mitsuko; Miyake, Noriko; Saitsu, Hirotomo; Takeda, Satoru; Matsumoto, Naomichi // Journal of Human Genetics;Dec2014, Vol. 59 Issue 12, p649 

    Structural variations (SVs), including translocations, inversions, deletions and duplications, are potentially associated with Mendelian diseases and contiguous gene syndromes. Determination of SV-related breakpoints at the nucleotide level is important to reveal the genetic causes for diseases....

