On the Relevance of Sophisticated Structural Annotations for Disulfide Connectivity Pattern Prediction

Becker, Julien; Maes, Francis; Wehenkel, Louis
February 2013
PLoS ONE; Feb 2013, Vol. 8, Issue 2, p. 1
Academic Journal
Disulfide bridges strongly constrain the native structure of many proteins, and predicting their formation is therefore a key sub-problem of protein structure and function inference. Most recently proposed approaches to this prediction problem adopt the following pipeline: first, they enrich the primary sequence with structural annotations; second, they apply a binary classifier to each candidate pair of cysteines to predict disulfide bonding probabilities; and finally, they use a maximum weight graph matching algorithm to derive the predicted disulfide connectivity pattern of a protein. In this paper, we adopt this three-step pipeline and propose an extensive study of the relevance of various structural annotations and feature encodings. In particular, we consider five kinds of structural annotations, three of which are novel in the context of disulfide bridge prediction. To be usable by machine learning algorithms, these annotations must be encoded into features. For this purpose, we propose four different feature encodings based on local windows and on different kinds of histograms. Combining the structural annotations with these encodings leads to a large number of possible feature functions. To identify a minimal subset of relevant feature functions among them, we propose an efficient and interpretable feature function selection scheme, designed to avoid any form of overfitting. We apply this scheme on top of three supervised learning algorithms: k-nearest neighbors, support vector machines and extremely randomized trees. Our results indicate that the PSSM (position-specific scoring matrix) together with the CSP (cysteine separation profile) is sufficient to construct a high-performance disulfide pattern predictor, and that extremely randomized trees reach a disulfide pattern prediction accuracy on the benchmark dataset SPX that improves over the state of the art.
A web-application is available at http://m24.giga.ulg.ac.be:81/x3CysBridges.
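
The last step of the pipeline described above, turning pairwise bonding probabilities into a connectivity pattern, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `best_connectivity` and `_search`, the cysteine positions and the probabilities are hypothetical, and an exhaustive search stands in for the polynomial-time maximum weight matching algorithm a real predictor would use. It assumes every cysteine is bonded (an even number of cysteines) and that the classifier scores are already available.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (position of first cysteine, position of second), first < second


def _search(cysteines: List[int], prob: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Exhaustively pair the first remaining cysteine with every other one."""
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_score, best_pairs = float("-inf"), []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        score, pairs = _search(remaining, prob)
        score += prob[(first, partner)]
        if score > best_score:
            best_score, best_pairs = score, [(first, partner)] + pairs
    return best_score, best_pairs


def best_connectivity(cysteines: List[int], prob: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Return (total weight, bridges) of the maximum-weight perfect matching.

    Assumption: all cysteines are bonded, so their number must be even.
    The search is exponential and only suitable for a handful of cysteines.
    """
    if len(cysteines) % 2 != 0:
        raise ValueError("this sketch assumes an even number of bonded cysteines")
    return _search(sorted(cysteines), prob)


# Hypothetical classifier output for four cysteines (made-up values).
toy_prob = {
    (5, 12): 0.9, (5, 30): 0.8, (5, 41): 0.1,
    (12, 30): 0.2, (12, 41): 0.8, (30, 41): 0.1,
}
total, bridges = best_connectivity([5, 12, 30, 41], toy_prob)
print(bridges)
```

In this toy input the single most probable pair is (5, 12), yet choosing it forces the weak pair (30, 41). The matching step instead selects the pattern whose summed probabilities are highest, which is why the pipeline uses graph matching rather than picking pairs greedily.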


