An Ensemble Method for Predicting Subnuclear Localizations from Primary Protein Structures

Han, Guo Sheng; Yu, Zu Guo; Anh, Vo; Krishnajith, Anaththa P. D.; Tian, Yu-Chu
February 2013
PLoS ONE;Feb2013, Vol. 8 Issue 2, p1
Academic Journal
Background: Predicting protein subnuclear localization is a challenging problem. Previous works based on non-sequence information, including Gene Ontology annotations and kernel fusion, each have their own limitations. The aim of this work is twofold: first, to propose a novel individual feature extraction method; second, to develop an ensemble method that improves prediction performance using comprehensive information, represented as a high-dimensional feature vector obtained by 11 feature extraction methods. Methodology/Principal Findings: A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It only considers feature extraction methods based on amino acid classifications and physicochemical properties. To speed up the system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: the Lei dataset, a multi-localization dataset, the SNL9 dataset and a new independent dataset. In leave-one-out cross-validation, the overall prediction accuracy is 75.2% for 6 localizations on the Lei dataset and 72.1% for 9 localizations on the SNL9 dataset; the overall accuracy is 71.7% for the multi-localization dataset and 69.8% for the new independent dataset. Comparisons with existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis. Conclusions: Our method is effective and valuable for predicting protein subnuclear localizations. A web server has been designed to implement the proposed method.
It is freely available at http://bioinformatics.awowshop.com/snlpred_page.php.
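The abstract mentions features built from amino acid classifications and physicochemical properties. As a minimal sketch of what one such feature can look like, the Python below computes composition and transition descriptors (in the style of CTD descriptors) over the common three-class hydrophobicity grouping of the 20 amino acids. The grouping, the hypothetical function name `composition_transition` and the 6-dimensional output are assumptions for illustration; this is not presented as one of the paper's 11 feature extraction methods or as its implementation.

```python
# Hypothetical illustration: CTD-style composition/transition features using
# a common 3-class hydrophobicity grouping. Not necessarily one of the
# paper's 11 feature extraction methods.
HYDROPHOBICITY_CLASSES = {
    **{aa: "1" for aa in "RKEDQN"},   # polar
    **{aa: "2" for aa in "GASTPHY"},  # neutral
    **{aa: "3" for aa in "CLVIMFW"},  # hydrophobic
}


def encode(sequence: str) -> str:
    """Map each residue to its class label, skipping non-standard residues."""
    return "".join(
        HYDROPHOBICITY_CLASSES[aa]
        for aa in sequence.upper()
        if aa in HYDROPHOBICITY_CLASSES
    )


def composition_transition(sequence: str) -> list:
    """Return 3 composition + 3 transition features for one sequence."""
    classes = encode(sequence)
    n = len(classes)
    if n == 0:
        return [0.0] * 6
    # Composition: fraction of residues falling in each class.
    composition = [classes.count(c) / n for c in "123"]
    # Transition: fraction of adjacent residue pairs that switch between
    # two given classes (in either direction).
    pairs = [classes[i:i + 2] for i in range(n - 1)]
    denom = max(len(pairs), 1)
    transition = [
        sum(p in (a + b, b + a) for p in pairs) / denom
        for a, b in (("1", "2"), ("1", "3"), ("2", "3"))
    ]
    return composition + transition


features = composition_transition("MKRLLVA")  # 3 compositions, 3 transitions
```

Concatenating vectors like this from several encodings is one plausible way to build a high-dimensional input for an SVM-based classifier, though the abstract does not say how the authors combine their features.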


Related Articles

  • Making automated multiple alignments of very large numbers of protein sequences. Sievers, Fabian; Dineen, David; Wilm, Andreas; Higgins, Desmond G. // Bioinformatics;Apr2013, Vol. 29 Issue 8, p989 

    Motivation: Recent developments in sequence alignment software have made possible multiple sequence alignments (MSAs) of >100 000 sequences in reasonable times. At present, there are no systematic analyses concerning the scalability of the alignment quality as the number of aligned sequences is...

  • ANARCI: antigen receptor numbering and receptor classification. Dunbar, James; Deane, Charlotte M. // Bioinformatics;1/15/2016, Vol. 32 Issue 2, p298 

    Motivation: Antibody amino-acid sequences can be numbered to identify equivalent positions. Such annotations are valuable for antibody sequence comparison, protein structure modelling and engineering. Multiple different numbering schemes exist, they vary in the nomenclature they use to annotate...

  • Protein Function Prediction using Text-based Features extracted from the Biomedical Literature: The CAFA Challenge. Wong, Andrew; Shatkay, Hagit // BMC Bioinformatics;2013, Vol. 14 Issue S3, p1 

    Background: Advances in sequencing technology over the past decade have resulted in an abundance of sequenced proteins whose function is yet unknown. As such, computational systems that can automatically predict and annotate protein function are in demand. Most computational systems use features...

  • Evaluating Long-Term Relationship of Protein Sequence by Use of D-Interval Conditional Probability and Its Impact on Protein Structural Class Prediction. Fei Gu; Hang Chen // Protein & Peptide Letters;Oct2009, Vol. 16 Issue 10, p1267 

    To fix the large and expanding gap between sequence known proteins and structure known proteins, it is important to study on protein structural class prediction (PSCP) for its foundation and usefulness in protein structure analysis. In this paper, the d-interval conditional probability index was...

  • SALIGN: a web server for alignment of multiple protein sequences and structures. Braberg, Hannes; Webb, Benjamin M.; Tjioe, Elina; Pieper, Ursula; Sali, Andrej; Madhusudhan, M.S. // Bioinformatics;8/1/2012, Vol. 28 Issue 15, p2072 

    Summary: Accurate alignment of protein sequences and/or structures is crucial for many biological analyses, including functional annotation of proteins, classifying protein sequences into families, and comparative protein structure modeling. Described here is a web interface to SALIGN, the...

  • A COMPARATIVE STUDY OF PROTEIN STRUCTURE VISUALIZATION TOOLS FOR VARIOUS DISPLAY CAPABILITIES. Ansari, Shaheda N.; Iliyas, Sayyed // Bioscience Discovery: An International Journal of Life Sciences;Jun2011, Vol. 2 Issue 2, p222 

    A molecular graphics visualization tool is required to view the structure that is encoded by atomic coordinate PDB files and to be able to manipulate the images to view the molecule from various perspectives. Without a proper tool, the PDB file will be read as a text file that lists each atom...

  • Comparative Modeling: The State of the Art and Protein Drug Target Structure Prediction. Tianyun Liu; Tang, Grace W.; Capriotti, Emidio // Combinatorial Chemistry & High Throughput Screening;Jul2011, Vol. 14 Issue 6, p532 

    No abstract available.

  • On the Relationship Between Catalytic Residues and their Protein Contact Number. Shao-Wei Huang; Sung-Huan Yu; Chien-Hua Shih; Huei-Wen Guan; Tsun-Tsao Huang; Jenn-Kang Hwang // Current Protein & Peptide Science;Sep2011, Vol. 12 Issue 6, p574 

    Due to advances in structural biology, an increasing number of protein structures of unknown function have been deposited in Protein Data Bank (PDB). These proteins are usually characterized by novel structures and sequences. Conventional comparative methodology (such as sequence alignment,...

  • Analysis of casein alpha S1 & S2 proteins from different mammalian species. Masoodi, Tariq Ahmad; Shafi, Gowhar // Bioinformation;2010, Vol. 4 Issue 9, p430 

    Nowadays, the quality of any food used for human consumption is, to a considerable extent, considered by its possible contribution to the maintenance or improvement of the consumer's health. In developed countries there is increasing interest in goat milk and its derivates, the quality of which...

