High accuracy mass spectrometry analysis as a tool to verify and improve gene annotation using Mycobacterium tuberculosis as an example

de Souza, Gustavo A.; Målen, Hiwa; Søfteland, Tina; Sælensminde, Gisle; Prasad, Swati; Jonassen, Inge; Wiker, Harald G.
January 2008
BMC Genomics;2008, Vol. 9, Special section p1
Academic Journal
Background: Although genome annotations are available for diverse lineages of the Mycobacterium tuberculosis complex, discrepancies between gene prediction methods remain an obstacle to generating unbiased protein datasets. M. tuberculosis is a case in point. The two most widely used annotations, from the Sanger Institute and The Institute for Genomic Research (TIGR), differ by up to 12% in the number of annotated open reading frames, and 46% of the genes present in both annotations have different start codons. Such differences show why the actual protein products must be identified to validate each gene annotation, including its coding region.

Results: To this end, we analysed a culture filtrate sample from M. tuberculosis on a high-accuracy LTQ-Orbitrap mass spectrometer and applied refined N-terminal prediction to compare the two gene annotations. Of the 449 proteins identified from the MS data, we validated 35 tryptic peptides that were specific to one of the two datasets, representing 24 different proteins. Five of these proteins were annotated only in the Sanger database. For the remaining proteins, the observed differences arose from differing annotations of translational start sites.

Conclusion: Even in a relatively simple sample, likely representing only about 10% of the bacterial proteome, we detected major differences between gene annotation approaches. This suggests that high-throughput proteomics can be used to validate and improve gene annotations, in particular high-throughput automatic annotations.
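The core comparison, finding tryptic peptides that only one annotation can produce, can be illustrated with a minimal sketch. This is not the authors' pipeline, which relied on MS database searching and refined N-terminal prediction. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) and an arbitrary detectable length window of 6 to 30 residues. The function names and the toy protein "geneX" are hypothetical. Genes annotated in only one dataset contribute all of their peptides to that dataset's specific set. A shifted start codon yields peptides specific to each version around the N-terminus.

```python
from __future__ import annotations

import re

# Assumption: simplified trypsin rule, cleave C-terminal to K or R unless
# the next residue is P. Missed cleavages are not modelled.
# Zero-width splitting requires Python 3.7+.
_TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(seq: str, min_len: int = 6, max_len: int = 30) -> set[str]:
    """Return fully tryptic peptides of `seq` within an (assumed) length window."""
    return {p for p in _TRYPSIN_SITE.split(seq) if min_len <= len(p) <= max_len}


def annotation_specific_peptides(
    annot_a: dict[str, str], annot_b: dict[str, str], **kw
) -> tuple[set[str], set[str]]:
    """Peptides predicted from annotation A but not B, and from B but not A."""
    peps_a = set().union(*(tryptic_peptides(s, **kw) for s in annot_a.values()))
    peps_b = set().union(*(tryptic_peptides(s, **kw) for s in annot_b.values()))
    return peps_a - peps_b, peps_b - peps_a


# Hypothetical toy protein "geneX": annotation A uses an upstream start codon,
# annotation B a downstream GTG start (read as Met at initiation, Val internally).
annot_a = {"geneX": "MPLAEGTSVSDNQLVEKAGFTIPWHR"}
annot_b = {"geneX": "MSDNQLVEKAGFTIPWHR"}

only_a, only_b = annotation_specific_peptides(annot_a, annot_b)
print(only_a)  # {'MPLAEGTSVSDNQLVEK'}
print(only_b)  # {'MSDNQLVEK'}

# An observed peptide falling in only_a or only_b supports that annotation.
observed = {"MSDNQLVEK", "AGFTIPWHR"}
print(observed & only_b)  # {'MSDNQLVEK'}
```

Shared peptides such as "AGFTIPWHR" cannot discriminate between the annotations. Only peptides in the set difference carry evidence for one start site or one gene model. A real analysis would also have to handle missed cleavages, initiator methionine removal and isobaric residues (I/L).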


Related Articles

  • Definition of novel cell envelope associated proteins in Triton X-114 extracts of Mycobacterium tuberculosis H37Rv. Målen, Hiwa; Pathak, Sharad; Søfteland, Tina; de Souza, Gustavo A.; Wiker, Harald G. // BMC Microbiology;2010, Vol. 10, p132 

    Background: Membrane- and membrane-associated proteins are important for the pathogenicity of bacteria. We have analysed the content of these proteins in virulent Mycobacterium tuberculosis H37Rv using Triton X-114 detergent phase separation for extraction of lipophilic proteins, followed by...

  • A metabolomics approach exploring the function of the ESX-3 type VII secretion system of M. smegmatis. Loots, Du; Meissner-Roloff, Reinart; Newton-Foot, Mae; Gey van Pittius, Nicolaas // Metabolomics;Jun2013, Vol. 9 Issue 3, p631 

    The genome of Mycobacterium, including Mycobacterium tuberculosis, contains five copies of a cluster of genes encoding a novel type VII secretion system, named the ESX gene cluster region. This ESX-3 gene cluster is essential for in vitro growth and is thought to play a role in iron and zinc...

  • customProDB: an R package to generate customized protein databases from RNA-Seq data for proteomics search. Wang, Xiaojing; Zhang, Bing // Bioinformatics;Dec2013, Vol. 29 Issue 24, p3235 

    Summary: Database search is the most widely used approach for peptide and protein identification in mass spectrometry-based proteomics studies. Our previous study showed that sample-specific protein databases derived from RNA-Seq data can better approximate the real protein pools in the samples...

  • Blueprint for the white plague. Young, Douglas B. // Nature;6/11/1998, Vol. 393 Issue 6685, p515 

    Reports that the complete sequence of the genome of Mycobacterium tuberculosis has been determined. Research by Stewart Cole et al. in this issue; Tuberculosis in human history; Prior research by Robert Koch; Ability of M. tuberculosis to cause disease; High level of sequence conservation in...

  • On the Impact of the Pangenome and Annotation Discrepancies While Building Protein Sequence Databases for Bacteria Proteogenomics. Machado, Karla C. T.; Fortuin, Suereta; Tomazella, Gisele Guicardi; Fonseca, Andre F.; Warren, Robin Mark; Wiker, Harald G.; de Souza, Sandro Jose; de Souza, Gustavo Antonio // Frontiers in Microbiology;6/20/2019, pN.PAG 

    In proteomics, peptide information within mass spectrometry (MS) data from a specific organism sample is routinely matched against a protein sequence database that best represents that organism. However, if the species/strain in the sample is unknown or genetically poorly characterized, it...

  • Function Prediction and Analysis of Mycobacterium tuberculosis Hypothetical Proteins. Mazandu, Gaston K.; Mulder, Nicola J. // International Journal of Molecular Sciences;Jun2012, Vol. 13 Issue 6, p7283 

    High-throughput biology technologies have yielded complete genome sequences and functional genomics data for several organisms, including crucial microbial pathogens of humans, animals and plants. However, up to 50% of genes within a genome are often labeled "unknown", "uncharacterized" or...

  • The PE16 (Rv1430) of Mycobacterium tuberculosis Is an Esterase Belonging to Serine Hydrolase Superfamily of Proteins. Sultana, Rafiya; Vemula, Mani Harika; Banerjee, Sharmishta; Guruprasad, Lalitha // PLoS ONE;Feb2013, Vol. 8 Issue 2, p1 

    The PE and PPE multigene families, first discovered during the sequencing of M. tuberculosis H37Rv genome are responsible for antigenic variation and have been shown to induce increased humoral and cell mediated immune response in the host. Using the bioinformatics tools, we had earlier reported...

  • Stabilization of the genome of the mismatch repair deficient Mycobacterium tuberculosis by context-dependent codon choice. Wanner, Roger M.; Güthlein, Carolin; Springer, Burkhard; Böttger, Erik C.; Ackermann, Martin // BMC Genomics;2008, Vol. 9, Special section p1 

    Background: The rate at which a stretch of DNA mutates is determined by the cellular systems for DNA replication and repair, and by the nucleotide sequence of the stretch itself. One sequence feature with a particularly strong influence on the mutation rate is the presence of nucleotide repeats. Some microbial...

  • Clonal Expansion of Both Modern and Ancient Genotypes of Mycobacterium tuberculosis in Southern Taiwan. Jia-Ru Chang; Yih-Yuan Chen; Tsi-Shu Huang; Wei-Feng Huang; Shu-Chen Kuo; Fan-Chen Tseng; Ih-Jen Su; Chien-Hsing Lin; Yao-Shen Chen; Jun-Ren Sun; Tzong-Shi Chiueh; Horng-Yunn Dou; Neyrolles, Olivier // PLoS ONE;Aug2012, Vol. 7 Issue 8, Special section p1 

    We present the first comprehensive analysis of Mycobacterium tuberculosis isolates circulating in the Kaohsiung region of southern Taiwan. The major spoligotypes found in the 224 isolates studied were Beijing lineages (n = 97; 43.3%), EAI lineages (n = 72; 32.1%) and Haarlem lineages (n = 18;...

