Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions?

Pallejà, Albert; Harrington, Eoghan D.; Bork, Peer
January 2008
BMC Genomics;2008, Vol. 9, Special section p1
Academic Journal
Background: Across the fully sequenced microbial genomes there are thousands of examples of overlapping genes. Many of these overlaps are only a few nucleotides long and are thought to function by permitting the coordinated regulation of gene expression. However, there should also be selective pressure against long overlaps, as overlapping reading frames increase the risk of deleterious mutations. Here we examine the longest overlaps and assess whether they are the product of special functional constraints or of erroneous annotation.
Results: We analysed the genes that overlap by 60 bp or more in 338 fully sequenced prokaryotic genomes. The likely functional significance of an overlap was determined by comparing each of the genes to its respective orthologs. If a gene's length differed significantly from that of its orthologs, it was considered unlikely to be functional, and the overlap was attributed to an error either in sequencing or in gene prediction. Focusing on 715 co-directional overlaps longer than 60 bp, we classified the erroneous ones into five categories: i) 5'-end extension of the downstream gene due to either a mispredicted start codon or a frameshift at the 5' end of the gene (409 overlaps), ii) fragmentation of a gene caused by a frameshift (163), iii) 3'-end extension of the upstream gene due to either a frameshift at the 3' end of the gene or a point mutation at the stop codon (68), iv) redundant gene predictions (4), and v) 5'- and 3'-end extension, a combination of i) and iii) (71). We also studied 75 divergent overlaps, which could be classified as misannotations of group i). Nevertheless, we found some long convergent overlaps (54) that might be true overlaps, although a substantial fraction of convergent overlaps could be classified as group iii) (124).
Conclusion: Among the 968 overlaps longer than 60 bp that we analysed, we did not find a single genuine one in the co-directional or divergent orientations, and we concluded that these overlaps reflect an excessive number of misannotations. Only the convergent orientation seems to permit some long overlaps, although convergent overlaps are also affected by misannotations. We propose a simple rule to flag these erroneous gene length predictions to facilitate automatic annotation.
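The abstract does not give the authors' code or the exact flagging rule they propose. The Python sketch below only illustrates the two steps it describes: finding neighbouring gene pairs that overlap by 60 bp or more and labelling them co-directional, convergent or divergent, and comparing a gene's length with those of its orthologs. The `Gene` record, the function names and the 20% relative-deviation threshold in `length_outlier` are hypothetical choices for illustration, not the rule from the paper. It assumes linear coordinates (no wrap-around on circular chromosomes) and only checks genes that are adjacent in start-coordinate order.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record on linear, 1-based, inclusive coordinates."""
    name: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of shared nucleotides between two genes (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def orientation(left: Gene, right: Gene) -> str:
    """Relative orientation of two neighbouring genes; `left` lies first on the genome."""
    if left.strand == right.strand:
        return "co-directional"
    if left.strand == "+" and right.strand == "-":
        return "convergent"   # 3' ends face each other: -> <-
    return "divergent"        # 5' ends face each other: <- ->


def long_overlaps(genes: list[Gene], min_bp: int = 60) -> list[tuple[str, str, int, str]]:
    """Adjacent gene pairs (sorted by start) that overlap by at least min_bp nucleotides."""
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    hits = []
    for left, right in zip(ordered, ordered[1:]):
        ov = overlap_length(left, right)
        if ov >= min_bp:
            hits.append((left.name, right.name, ov, orientation(left, right)))
    return hits


def length_outlier(gene_len: int, ortholog_lens: list[int], max_rel_dev: float = 0.2) -> bool:
    """Hypothetical flag: True if gene_len differs from the ortholog median by more than max_rel_dev."""
    ref = median(ortholog_lens)
    return abs(gene_len - ref) / ref > max_rel_dev


example = [
    Gene("geneA", 100, 1000, "+"),
    Gene("geneB", 901, 1800, "+"),   # shares 100 bp with geneA
    Gene("geneC", 1790, 2500, "-"),  # shares only 11 bp with geneB
]
print(long_overlaps(example))  # [('geneA', 'geneB', 100, 'co-directional')]
```

In this toy case, `geneB` is 900 bp long; if its orthologs were around 600 bp, `length_outlier(900, [600, 610, 590])` would return `True`, hinting at a 5'-end extension of the downstream gene rather than a functional overlap.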


Related Articles

  • The genome: you gain some, you lose some. Breuning, Martyn H. // European Journal of Human Genetics;Jun2008, Vol. 16 Issue 6, p663 

    The author reflects on the study of the germline rates of de novo meiotic deletions and duplications causing various genomic disorders. The author cites the importance of Darwinian theory in evaluating the DNA sequence throughout the genome. The author finds that increased dosage of the genes...

  • Demonstration of IS711 transposition in Brucella ovis and Brucella pinnipedialis. Ocampo-Sosa, Alain A.; García-Lobo, Juan M. // BMC Microbiology;2008, Vol. 8, Special section p1 

    Background: The Brucella genome contains an insertion sequence (IS) element called IS711 or IS6501, which is specific to the genus. The copy number of IS711 varies in the genome of the different Brucella species, ranging from 7 in B. abortus, B. melitensis and B. suis to more than 30 in B. ovis...

  • On Two Quantum Approaches to Adaptive Mutations in Bacteria. Ogryzko, Vasily // NeuroQuantology;Dec2009, Vol. 7 Issue 4, p564 

    The phenomenon of adaptive mutations has been attracting the attention of biologists for several decades as challenging the basic premise of the Central Dogma of Molecular Biology. Two approaches, based on the quantum theoretical principles (QMAMs - Quantum Models of Adaptive Mutations) have been...

  • Transcriptome Analysis of the Role of GlnD/GlnBK in Nitrogen Stress Adaptation by Sinorhizobium meliloti Rm1021. Yurgel, Svetlana N.; Rice, Jennifer; Kahn, Michael L. // PLoS ONE;Mar2013, Vol. 8 Issue 3, p1 

    Transcriptional changes in the nitrogen stress response (NSR) of wild type S. meliloti Rm1021, and isogenic strains missing both PII proteins, GlnB and GlnK, or carrying a ΔglnD-sm2 mutation were analyzed using whole-genome microarrays. This approach allowed us to identify a number of new...

  • A Naturally Occurring Mutation in ropB Suppresses SpeB Expression and Reduces M1T1 Group A Streptococcal Systemic Virulence. Hollands, Andrew; Aziz, Ramy K.; Kansal, Rita; Kotb, Malak; Nizet, Victor; Walker, Mark J. // PLoS ONE;2008, Vol. 3 Issue 12, p1 

    Epidemiological studies of group A streptococcus (GAS) have noted an inverse relationship between SpeB expression and invasive disease. However, the role of SpeB in the course of infection is still unclear. In this study we utilize a SpeB-negative M1T1 clinical isolate, 5628, with a naturally...

  • Affinity maturation of B cells involves not only a few but a whole spectrum of relevant mutations. Weiser, Armin A.; Wittenbrink, Nicole; Zhang, Lei; Schmelzer, Andrej I.; Valai, Atijeh; Or-Guil, Michal // International Immunology;May2011, Vol. 23 Issue 5, p345 

    Affinity maturation of B lymphocytes within germinal centers involves both diversification of their B-cell receptors (BCRs) by somatic hypermutation (SHM) and a crucial receptor-mediated selection step. However, in contrast to recent advances in revealing the molecular mechanism of SHM, the...

  • Novel mutations in patients with McArdle disease by analysis of skeletal muscle mRNA. García-Consuegra, I.; Rubio, J. C.; Nogales-Gadea, G.; Bautista, J.; Jiménez, S.; Cabello, A.; Lucía, A.; Andreu, A. L.; Arenas, J.; Martin, M. A. // Journal of Medical Genetics;Mar2009, Vol. 46 Issue 3, p198 

    Objective: To identify pathogenic mutant alleles of the PYGM gene in "genetic manifesting heterozygous" patients with McArdle disease--that is, those in whom we could find only a single mutant allele by genomic DNA analysis. Methods: We studied four unrelated patients. PCR-RFLP, gene sequencing,...

  • Estimation of Population Heterozygosity and Library Construction-Induced Mutation Rate From Expressed Sequence Tag Collections. Long, A. D.; Beldade, P.; Macdonald, S. J. // Genetics;May2007, Vol. 176 Issue 1, p711 

    Unigene alignments obtained from cDNA libraries made using multiple individuals are not currently used to estimate population heterozygosity, as they are known to harbor mutations created during library construction. We describe an estimator of population heterozygosity that utilizes only SNPs...

  • Codon adaptation index analysis of RNA genome plant viruses. Kadam, U. S.; Ghosh, S. B. // Current Science (00113891);1/10/2008, Vol. 94 Issue 1, p24 

    The article reports on the proposal of the codon adaptation index (CAI) as a quantitative way of predicting the expression level of a gene based on its codon sequence. It is stated that CAI is designed to forecast the level of gene expression and to evaluate the adaptation of viral genes to their hosts....

