Specialized microbial databases for inductive exploration of microbial genome sequences

Gang Fang; Ho, Christine; Yaowu Qiu; Cubas, Virginie; Zhou Yu; Cabau, Cédric; Cheung, Frankie; Moszer, Ivan; Danchin, Antoine
January 2005
BMC Genomics;2005, Vol. 6, p14
Academic Journal
Background: The enormous amount of genome sequence data calls for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects.

Methods: The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subqueries, have been implemented.

Results: Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore (http://bioinfo.hku.hk/genochore.html), a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya), has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition, they provide a weekly update of searches against the world-wide protein sequence data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns.

Conclusion: This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList for Thermoanaerobacter tengcongensis; LeptoList, with two different genomes of Leptospira interrogans; and SepiList for Staphylococcus epidermidis), associated with related organisms for comparison.


Related Articles

  • Editorial: Focus on InterPro. Apweiler, Rolf // Briefings in Bioinformatics;Sep2002, Vol. 3 Issue 3, p221 

    Introduces a series of articles on protein databases for similar sequences.

  • Swiss databank to start charging for use. Abbott, Alison // Nature;7/16/1998, Vol. 394 Issue 6690, p214 

    Reports that the Swiss-Prot protein sequence databank, one of the world's most widely used reference sources on proteins, is introducing a system of licensing for private companies. The licensing is intended to ensure stable financing and help fund an expansion of activities.

  • Protein Subcellular Localization Feature of Essential/Nonessential Genes in 28 Prokaryotes. Liu Xiao; Geng Xiaoli; Tang Hongling // Applied Mechanics & Materials;2014, Issue 644-650, p5197 

    This study investigated the correlation between essential/nonessential genes and protein subcellular localization. The protein sequences of the essential/nonessential genes of 28 prokaryotes in the Database of Essential Genes were analyzed with PSORTb 3.0. Results show that proteins of essential...

  • ProDom: Automated clustering of homologous domains. Servant, Florence; Bru, Catherine; Carrère, Sébastien; Courcelle, Emmanuel; Gouzy, Jérôme; Peyruc, David; Kahn, Daniel // Briefings in Bioinformatics;Sep2002, Vol. 3 Issue 3, p246 

    The ProDom database is a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases. An associated database, ProDom-CG, has been derived as a restriction of ProDom to completely sequenced genomes. The ProDom construction method is based...

  • The PRINTS database: A resource for identification of protein families. Attwood, Terri K. // Briefings in Bioinformatics;Sep2002, Vol. 3 Issue 3, p252 

    The PRINTS database houses a collection of protein fingerprints, which may be used to assign family and functional attributes to uncharacterised sequences, such as those currently emanating from the various genome-sequencing projects. The April 2002 release includes 1,700 family fingerprints,...

  • High-quality protein knowledge resource: SWISS-PROT and TrEMBL. O'Donovan, Claire; Martin, Maria Jesus; Gattiker, Alexandre; Gasteiger, Elisabeth; Bairoch, Amos; Apweiler, Rolf // Briefings in Bioinformatics;Sep2002, Vol. 3 Issue 3, p275 

    SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, posttranslational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with...

  • Applications of InterPro in protein annotation and genome analysis. Biswas, Margaret; O'Rourke, John F.; Camon, Evelyn; Fraser, Gill; Kanapin, Alexander; Karavidopoulou, Youla; Kersey, Paul; Kriventseva, Evgenia; Mittard, Virginie; Mulder, Nicola; Phan, Isabelle; Servant, Florence; Apweiler, Rolf // Briefings in Bioinformatics;Sep2002, Vol. 3 Issue 3, p285 

    The applications of InterPro span a range of biologically important areas that includes automatic annotation of protein sequences and genome analysis. In automatic annotation of protein sequences InterPro has been utilised to provide reliable characterisation of sequences, identifying them as...

  • Detecting protein sequence conservation via metric embeddings. E. Halperin; J. Buhler; R. Karp; R. Krauthgamer; B. Westover // Bioinformatics;Jan2009 Supplement, Vol. 19, p122 

    Motivation: Comparing two protein databases is a fundamental task in biosequence annotation. Given two databases, one must find all pairs of proteins that align with high score under a biologically meaningful substitution score matrix, such as a BLOSUM matrix (Henikoff and Henikoff, 1992)....

  • CHIKVPRO - a protein sequence annotation database for chikungunya virus. Mishra, Akaash Kumar; Jain, Chakresh Kumar; Agrawal, Apurva; Jain, Saransh; Jain, Kumar Sambhav; Dudha, Namrata; Kumar, Kapila; Sharma, Sanjeev K.; Gupta, Sanjay // Bioinformation;2010, Vol. 5 Issue 1, p4 

    In the recent past, there has been a resurgence of interest in Chikungunya virus (CHIKV) attributed to massive outbreaks of Chikungunya fever in the South-East Asia Region. This has been reflected in a substantial increase in submissions of CHIKV genome sequences to NCBI (National Center for...

