Semantically linking and browsing PubMed abstracts with gene ontology

Vanteru, Bhanu C.; Shaik, Jahangheer S.; Yeasin, Mohammed
January 2008
BMC Genomics;2008 Supplement 1, Vol. 9, Special section p1
Academic Journal
Background: Technological advances over the past decade have led to massive progress in the field of biotechnology. That progress is documented in research articles, and PubMed is currently the most widely used repository for bio-literature. As of 2007, PubMed contained about 17 million abstracts, a volume that calls for methods to efficiently retrieve and browse relevant information. State-of-the-art tools such as GOPubmed use simple keyword-based techniques to retrieve abstracts from PubMed and link them to the Gene Ontology (GO). This paper changes the paradigm by introducing a semantics-enabled technique, called SEGOPubmed, that links PubMed to the Gene Ontology for ontology-based browsing. A Latent Semantic Analysis (LSA) framework is used to semantically interface PubMed abstracts to the Gene Ontology.

Results: An empirical analysis compares the performance of SEGOPubmed with that of GOPubmed. The analysis is first performed using a few well-referenced query words. A statistical analysis is then performed using a GO-curated dataset as ground truth. The analysis suggests that SEGOPubmed performs better than the classic GOPubmed because it incorporates semantics.

Conclusions: The LSA technique is applied to the PubMed abstracts obtained on the basis of the user query and the semantic similarity between the query and the abstracts. The analyses using well-referenced keywords show that the proposed semantic-sensitive technique outperformed string-comparison-based techniques in associating relevant abstracts with GO terms. SEGOPubmed also retrieved abstracts in which the keywords do not appear in isolation (i.e., they appear in combination with other terms), which simple term-matching techniques could not retrieve.
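The contrast the abstract draws between keyword matching and LSA-based matching can be illustrated with a minimal sketch. This is not the authors' SEGOPubmed implementation: the five "abstracts", the two GO-like labels, and all function names are invented for illustration, and the truncated SVD is approximated with a plain power iteration so the example needs only the standard library. Texts are compared by projecting raw term counts onto the top two latent dimensions, which is one simple variant of LSA and may differ from the weighting used in the paper.

```python
import math
from collections import Counter

# Toy corpus standing in for PubMed abstracts (invented for illustration).
ABSTRACTS = [
    "caspase activation triggers apoptosis",
    "caspase cleavage during programmed cell death",
    "apoptosis is programmed cell death",
    "ion channel membrane transport",
    "membrane transport ion gradient",
]

def tokenize(text):
    return text.lower().split()

VOCAB = sorted({w for doc in ABSTRACTS for w in tokenize(doc)})
INDEX = {w: i for i, w in enumerate(VOCAB)}

def term_vector(text):
    """Raw term counts over VOCAB; out-of-vocabulary words are ignored."""
    vec = [0.0] * len(VOCAB)
    for word, count in Counter(tokenize(text)).items():
        if word in INDEX:
            vec[INDEX[word]] = float(count)
    return vec

def top_singular_vectors(columns, k, iters=300):
    """Approximate the top-k left singular vectors of the term-document
    matrix A by power iteration on the term-term matrix A A^T, removing
    the directions already found at every step (deflation)."""
    n = len(columns[0])
    gram = [[sum(c[i] * c[j] for c in columns) for j in range(n)]
            for i in range(n)]
    basis = []
    for _ in range(k):
        v = [1.0 + 0.01 * i for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            for u in basis:
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        basis.append(v)
    return basis

def project(text, basis):
    """Coordinates of a text in the latent space spanned by `basis`."""
    vec = term_vector(text)
    return [sum(a * b for a, b in zip(vec, u)) for u in basis]

def cosine(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)

def keyword_overlap(a, b):
    """Number of shared words: the plain string-matching baseline."""
    return len(set(tokenize(a)) & set(tokenize(b)))

BASIS = top_singular_vectors([term_vector(d) for d in ABSTRACTS], k=2)

if __name__ == "__main__":
    query_abstract = "caspase cleavage"
    for go_label in ["apoptosis", "ion transport"]:  # hypothetical labels
        sim = cosine(project(query_abstract, BASIS), project(go_label, BASIS))
        print(go_label, keyword_overlap(query_abstract, go_label), round(sim, 3))
```

In this toy setting the text "caspase cleavage" shares no word with the label "apoptosis", so a keyword matcher scores it zero, yet its latent-space cosine similarity to that label is close to 1 because the words co-occur across the corpus. Its similarity to the unrelated label "ion transport" stays close to 0. A real system would use a library SVD routine and a far larger corpus.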


Related Articles

  • Gene Ontology annotations: what they mean and where they come from. Hill, David P.; Smith, Barry; McAndrews-Hill, Monica S.; Blake, Judith A. // BMC Bioinformatics;2008 Supplement 5, Vol. 9, Special section p1 

    To address the challenges of information integration and retrieval, the computational genomics community increasingly has come to rely on the methodology of creating annotations of scientific literature using terms from controlled structured vocabularies such as the Gene Ontology (GO). Here we...

  • Sampled delights. Chapman, Tim // Nature;2/6/2003, Vol. 421 Issue 6923, p665 

    Features an automated storage and retrieval system developed by the Charlottesville, Virginia-based biotechnology company, Biophile Co. Physical description; Specifications; Performance.

  • Automated Querying of Genome Databases. Schattner, Peter // PLoS Computational Biology;Jan2007, Vol. 3 Issue 1, p3 

    The article focuses on the identification of tools that facilitate automated genome-database querying and effective applications. The author describes the role of genomic databases as a medium for integrating and analyzing data from multiple biological databases. Some of the...

  • TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets. Schmieder, Robert; Yan Wei Lim; Rohwer, Forest; Edwards, Robert // BMC Bioinformatics;2010, Vol. 11, p341 

    Background: Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases....

  • Combining heterogeneous data sources for accurate functional annotation of proteins. Sokolov, Artem; Funk, Christopher; Graim, Kiley; Verspoor, Karin; Ben-Hur, Asa // BMC Bioinformatics;2013, Vol. 14 Issue S3, p1 

    Combining heterogeneous sources of data is essential for accurate prediction of protein function. The task is complicated by the fact that while sequence-based features can be readily compared across species, most other data are species-specific. In this paper, we present a multi-view extension...

  • Model information system for educational administrators in Pakistan. Shafique, Farzana // Pakistan Journal of Library & Information Science;2012, Issue 13, p1 

    An abstract of the article related to the model information system for educational administrators in Pakistan by Farzana Shafique is presented.

  • Slim-Prim: an integrated data system for clinical and translational research. Viangteeravat, Teeradache; Brooks, Ian; Vuthipadadon, Somchan; Huang, Eunice; Smith, Ebony; Homayouni, Ramin; McDonald, Chanchai // BMC Bioinformatics;2009 Supplement 7, Vol. 10, p1 

    An abstract of a study on Slim-Prim, an integrated data system developed for clinical and translational research, conducted by Teeradache Viangteeravat and colleagues, is presented.

  • SCAPEGOATING HUMANS, SCAPEGOATING TECHNOLOGIES: EXAMINING ANOTHER SIDE OF INFORMATION SYSTEM PROJECT CONTROL. Fulk, H. Kevin; Obyung Kwun; Alijani, Ghasem S. // Allied Academies International Conference: Proceedings of the Ac;Apr2012, Vol. 16 Issue 1, p55 

    A large number of information system (IS) projects are considered to be failures for not achieving the objectives set for them. One means to improve success of these projects is adequate control. While a growing body of studies have examined IS project control, researchers have given little...

  • Growing genome database.  // R&D Magazine;Oct2005, Vol. 47 Issue 10, p11 

    This article reports on the availability of the Integrated Microbial Genomes (IMG) version 1.2, a data management system by the U.S. Department of Energy's Joint Genome Institute in October 2005. The new version contains 270 additional public genomes and nine new JGI genomes (four finished, five...

