Hierarchical Classification of Protein Folds Using a Novel Ensemble Classifier

Lin, Chen; Zou, Ying; Qin, Ji; Liu, Xiangrong; Jiang, Yi; Ke, Caihuan; Zou, Quan
February 2013
PLoS ONE;Feb2013, Vol. 8 Issue 2, p1
Academic Journal
The analysis of biological information from protein sequences is important for the study of cellular functions and interactions, and protein fold recognition plays a key role in the prediction of protein structures. Unfortunately, predicting protein fold patterns is challenging because of the existence of compound protein structures. Here, we processed the latest release of the Structural Classification of Proteins (SCOP, version 1.75) database and applied novel techniques that increase the accuracy of protein fold classification. The proposed techniques combine ensemble classification with a hierarchical framework. In the first layer, similar or redundant sequences are removed in two ways, and a set of base classifiers, fused by various selection strategies, divides the input into seven classes. In the second layer, an analogous ensemble method is used to predict all protein folds. To our knowledge, this is the first time that all protein folds have been detected hierarchically. Our experimental results demonstrate the efficiency and effectiveness of the proposed method: it achieved a success rate of 74.21%, which is much higher than the results obtained with previous methods (ranging from 45.6% to 70.5%). In the second layer of classification, the prediction accuracy ranged from 23.13% to 46.05%. Although this value may not be remarkably high, it is encouraging when compared with the relatively low counts of proteins from most fold recognition programs. The web server Hierarchical Protein Fold Prediction (HPFP) is available at http://datamining.xmu.edu.cn/software/hpfp.
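
The two-layer scheme described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not name the base classifiers, the features, or the selection strategies, so the nearest-centroid base learner, the plurality vote used for fusion, the feature subsets, and the toy feature vectors below are all assumptions chosen for illustration. The labels only mimic SCOP-style identifiers, and the redundancy-removal step is omitted.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Fuse base-classifier outputs by plurality; ties go to the earliest vote."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


class NearestCentroid:
    """Toy base classifier (an assumption) that sees only some feature indices."""

    def __init__(self, feature_idx):
        self.feature_idx = feature_idx
        self.centroids = {}

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append([x[i] for i in self.feature_idx])
        # Mean of each selected feature per label.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, x):
        v = [x[i] for i in self.feature_idx]

        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(v, self.centroids[label]))

        return min(self.centroids, key=sq_dist)


class Ensemble:
    """Several base classifiers whose predictions are fused by majority_vote."""

    def __init__(self, feature_subsets):
        self.members = [NearestCentroid(s) for s in feature_subsets]

    def fit(self, X, y):
        for member in self.members:
            member.fit(X, y)
        return self

    def predict_one(self, x):
        return majority_vote([m.predict_one(x) for m in self.members])


class HierarchicalFoldClassifier:
    """Layer 1 predicts the structural class; layer 2 predicts a fold within it."""

    def __init__(self, feature_subsets):
        self.feature_subsets = feature_subsets
        self.class_model = Ensemble(feature_subsets)
        self.fold_models = {}

    def fit(self, X, classes, folds):
        self.class_model.fit(X, classes)
        by_class = defaultdict(lambda: ([], []))
        for x, c, f in zip(X, classes, folds):
            by_class[c][0].append(x)
            by_class[c][1].append(f)
        # One fold-level ensemble per structural class.
        for c, (Xc, fc) in by_class.items():
            self.fold_models[c] = Ensemble(self.feature_subsets).fit(Xc, fc)
        return self

    def predict_one(self, x):
        c = self.class_model.predict_one(x)
        return c, self.fold_models[c].predict_one(x)


# Made-up three-dimensional feature vectors with hypothetical labels.
X = [
    [0.0, 0.0, 0.1], [0.1, 0.1, 0.0],  # class "a", fold "a.1"
    [0.0, 1.0, 1.0], [0.1, 0.9, 1.1],  # class "a", fold "a.2"
    [5.0, 0.0, 0.0], [5.1, 0.1, 0.1],  # class "b", fold "b.1"
    [5.0, 1.0, 1.0], [4.9, 0.9, 0.9],  # class "b", fold "b.2"
]
classes = ["a", "a", "a", "a", "b", "b", "b", "b"]
folds = ["a.1", "a.1", "a.2", "a.2", "b.1", "b.1", "b.2", "b.2"]

model = HierarchicalFoldClassifier([[0, 1], [0, 2], [0, 1, 2]]).fit(X, classes, folds)
print(model.predict_one([4.8, 0.95, 1.05]))
```

Routing each sequence to a class-specific fold ensemble keeps every second-layer problem small, but an error in the first layer cannot be recovered in the second. That is consistent with the lower second-layer accuracy range the abstract reports.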


Related Articles

  • Concomitant prediction of function and fold at the domain level with GO-based profiles. Lopez, Daniel; Pazos, Florencio // BMC Bioinformatics;2013, Vol. 14 Issue S3, p1 

    Predicting the function of newly sequenced proteins is crucial due to the pace at which these raw sequences are being obtained. Almost all resources for predicting protein function assign functional terms to whole chains, and do not distinguish which particular domain is responsible for the...

  • De Novo Structure Prediction of Globular Proteins Aided by Sequence Variation-Derived Contacts. Kosciolek, Tomasz; Jones, David T. // PLoS ONE;Mar2014, Vol. 9 Issue 3, p1 

    The advent of high accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm – FRAGFOLD, with...

  • Protein Function Prediction using Text-based Features extracted from the Biomedical Literature: The CAFA Challenge. Wong, Andrew; Shatkay, Hagit // BMC Bioinformatics;2013, Vol. 14 Issue S3, p1 

    Background: Advances in sequencing technology over the past decade have resulted in an abundance of sequenced proteins whose function is yet unknown. As such, computational systems that can automatically predict and annotate protein function are in demand. Most computational systems use features...

  • An Evolution-Based Approach to De Novo Protein Design and Case Study on Mycobacterium tuberculosis. Mitra, Pralay; Shultis, David; Brender, Jeffrey R.; Czajka, Jeff; Marsh, David; Gray, Felicia; Cierpicki, Tomasz; Zhang, Yang // PLoS Computational Biology;Oct2013, Vol. 9 Issue 10, p1 

    Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we...

  • Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment. Eric L. Peterson; Jané Kondev; Julie A. Theriot; Rob Phillips // Bioinformatics;Jun2009, Vol. 25 Issue 11, p1356 

    Motivation: Many proteins with vastly dissimilar sequences are found to share a common fold, as evidenced in the wealth of structures now available in the Protein Data Bank. One idea that has found success in various applications is the concept of a reduced amino acid alphabet, wherein similar...

  • Exploration of the relationship between topology and designability of conformations. Leelananda, Sumudu P.; Towfic, Fadi; Jernigan, Robert L.; Kloczkowski, Andrzej // Journal of Chemical Physics;6/21/2011, Vol. 134 Issue 23, p235101 

    Protein structures are evolutionarily more conserved than sequences, and sequences with very low sequence identity frequently share the same fold. This leads to the concept of protein designability. Some folds are more designable and lots of sequences can assume that fold. Elucidating the...

  • General overview on structure prediction of twilight-zone proteins. Bee Yin Khor; Gee Jun Tye; Theam Soon Lim; Yee Siew Choong // Theoretical Biology & Medical Modelling;9/4/2015, Vol. 12 Issue 1, p1 

    Protein structure prediction from amino acid sequence has been one of the most challenging aspects in computational structural biology despite significant progress in recent years showed by critical assessment of protein structure prediction (CASP) experiments. When experimentally determined...

  • Origin and Evolution of Protein Fold Designs Inferred from Phylogenomic Analysis of CATH Domain Structures in Proteomes. Bukhari, Syed Abbas; Caetano-Anollés, Gustavo // PLoS Computational Biology;Mar2013, Vol. 9 Issue 3, p1 

    The spatial arrangements of secondary structures in proteins, irrespective of their connectivity, depict the overall shape and organization of protein domains. These features have been used in the CATH and SCOP classifications to hierarchically partition fold space and define the architectural...

  • Predicting Protein Folds with Fold-Specific PSSM Libraries. Yoojin Hong; Chintapalli, Sree Vamsee; Kyung Dae Ko; Bhardwaj, Gaurav; Zhenhai Zhang; Rossum, Damian van; Patterson, Randen L. // PLoS ONE;2011, Vol. 6 Issue 6, p1 

    Accurately assigning folds for divergent protein sequences is a major obstacle to structural studies. Herein, we outline an effective method for fold recognition using sets of PSSMs, each of which is constructed for different protein folds. Our analyses demonstrate that FSL (Fold-specific...

