Extracting and connecting chemical structures from text sources using chemicalize.org

Southan, Christopher; Stracz, Andras
June 2013
Journal of Cheminformatics;2013, Vol. 5 Issue 1, p1
Academic Journal
Background: Exploring bioactive chemistry requires navigating between structures and data from a variety of text-based sources. While PubChem currently includes approximately 16 million document-extracted structures (15 million from patents), the extent of public inter-document and document-to-database links is still well below any estimated total, especially for journal articles. A major expansion in access to text-entombed chemistry is enabled by chemicalize.org. This on-line resource can process IUPAC names, SMILES, InChI strings, CAS numbers and drug names from pasted text, PDFs or URLs to generate structures, calculate properties and launch searches. Here, we explore its utility for answering questions related to chemical structures in documents and where these overlap with database records. These aspects are illustrated using a common theme of Dipeptidyl Peptidase 4 (DPPIV) inhibitors.
Results: Full-text open URL sources facilitated the download of over 1400 structures from a DPPIV patent and the alignment of specific examples with IC50 data. Uploading the SMILES to PubChem revealed extensive linking to patents and papers, including prior submissions with chemicalize.org as the submitting source. A DPPIV medicinal chemistry paper was completely extracted and structures were aligned to the activity results table, as well as linked to other documents via PubChem. In both cases, key structures with data were partitioned from common chemistry by dividing them into individual new PDFs for conversion. Over 500 structures were also extracted from a batch of PubMed abstracts related to DPPIV inhibition. The drug structures could be stepped through each text occurrence and included some converted MeSH-only IUPAC names not linked in PubChem. Performing set intersections proved effective for detecting compounds-in-common between documents and merged extractions.
Conclusion: This work demonstrates the utility of chemicalize.org for the exploration of chemical structure connectivity between documents and databases, including structure searches in PubChem, InChIKey searches in Google and the chemicalize.org archive. It has the flexibility to extract text from any internal, external or Web source. It synergizes with other open tools, and the application is undergoing continued development. It should thus facilitate progress in medicinal chemistry, chemical biology and other bioactive chemistry domains.
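
The set-intersection step mentioned in the Results can be illustrated with a minimal sketch. It assumes each extraction has already been reduced to a collection of structure identifiers (for example InChIKeys or canonical SMILES exported from the tool). The function name and the placeholder identifiers are hypothetical and are not taken from the article or from chemicalize.org.

```python
def compounds_in_common(*extractions):
    """Return the identifiers that occur in every supplied extraction.

    Each extraction is any iterable of structure identifiers
    (e.g. InChIKey strings). Duplicates within one extraction are ignored.
    """
    sets = [set(extraction) for extraction in extractions]
    # With no input there is nothing in common, so return an empty set.
    return set.intersection(*sets) if sets else set()


# Placeholder identifiers standing in for real InChIKeys (hypothetical data).
patent = {"KEY-A", "KEY-B", "KEY-C"}
paper = {"KEY-B", "KEY-C", "KEY-D"}
abstracts = {"KEY-C", "KEY-E"}

# Compounds shared by the patent and the paper.
shared_patent_paper = compounds_in_common(patent, paper)

# Compounds found in all three sources.
shared_all = compounds_in_common(patent, paper, abstracts)

# A merged extraction is the union of the individual sets.
merged = patent | paper | abstracts

print(sorted(shared_patent_paper))
print(sorted(shared_all))
print(len(merged))
```

Comparing on a canonical identifier such as the InChIKey, rather than on the raw name strings found in the text, is what would make the same compound match across documents that name it differently.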


Related Articles

  • WebChem Viewer: a tool for the easy dissemination of chemical and structural data sets. Durrant, Jacob D.; Amaro, Rommie E. // BMC Bioinformatics;2014, Vol. 15 Issue 1, p1 

    Background Sharing sets of chemical data (e.g., chemical properties, docking scores, etc.) among collaborators with diverse skill sets is a common task in computer-aided drug design and medicinal chemistry. The ability to associate this data with images of the relevant molecular structures...

  • Deriving Conceptual Schema from Domain Ontology: A Web Application Reverse Engineering Approach. Benslimane, Sidi; Malki, Mimoun; Bouchiha, Djelloul // International Arab Journal of Information Technology (IAJIT);Apr2010, Vol. 7 Issue 2, p167 

    The heterogeneous and dynamic nature of components making up a web application, the lack of effective programming mechanisms for implementing basic software engineering principles in it, and undisciplined development processes induced by the high pressure of a very short time-to-market, make web...

  • A special breed. Stapleton, Jennifer // Computer Bulletin;Mar2001, Vol. 43 Issue 2, p4 

    BCS specialist groups can look forward to a fulfilling future as the Web strategy unfolds

  • ONTOCS: A WEB-BASED SYSTEM FOR COLLABORATIVE ONTOLOGY CONSTRUCTION. Dosam HWANG; In Keun LEE; JUNG, Jason J. // Computing & Informatics;2009, Vol. 28 Issue 6, p781 

    A number of studies on ontology editing tools and ontology-based applications have been proposed for automatically processing knowledge and information. However, the existing methodologies and tools for dealing with ontologies have assumed that the system is restricted to a single user. Main...

  • An Efficient Algorithm for Frequent Pattern Mining using Web Analysis Approach. Verma, Monika; Pandey, Shikha // International Journal of Computer Science Engineering & Technolo;Jul2012, Vol. 2 Issue 7, p1327 

    In this paper a complete structure with new modified algorithm for mining and finding web usage patterns from a Web Application is presented. The Web Application can be a real Web site that has all the challenging aspects of real-life Web usage mining, including evolving user profiles and...

  • A METHOD OF DETECTING SQL INJECTION ATTACK TO SECURE WEB APPLICATIONS. Manmadhan, Sruthy; T., Manesh // International Journal of Distributed & Parallel Systems;Nov2012, Vol. 3 Issue 6, p1 

    Web applications are becoming an important part of our daily life, so attacks against them are also increasing rapidly. Of these attacks, a major role is held by SQL injection attacks (SQLIA). This paper proposes a new method for preventing SQL injection attacks in JSP web applications. The basic...

  • Rubabel: wrapping open Babel with Ruby. Smith, Rob; Williamson, Ryan; Ventura, Dan; Prince, John T. // Journal of Cheminformatics;2013, Vol. 5 Issue 1, p1 

    Background: The number and diversity of wrappers for chemoinformatic toolkits suggests the diverse needs of the chemoinformatic community. While existing chemoinformatics libraries provide a broad range of utilities, many chemoinformaticians find compiled language libraries intimidating,...

  • Verizon's Annual Breach Report Finds 92 Percent Of Attacks Follow Nine Basic Patterns. Gormisky, Liz // Defense Daily;4/23/2014, p3 

    The article focuses on the findings of American broadband and telecommunications company Verizon Communications in its 2014 Data Breach Investigations Report (DBIR) that 92 percent of data breaches or attacks follow nine basic patterns. Topics include the compilation of the DBIR as a partnership...

  • Model-driven web engineering methods: a literature review. Hincapié Londoño, Jesús Andrés; Duitama, John Freddy // Revista Facultad de Ingenieria Universidad de Antioquia;jun2012, Issue 63, p69 

    This paper presents some of the model-driven Web engineering methods that have been proposed, and discusses and analyzes the advantages and disadvantages of such methods regarding current tendencies and best practices on model-driven engineering. The idea is to present each approach and analyze...

