Classification of protein sequences by means of irredundant patterns

Comin, Matteo; Verzotto, Davide
January 2010
BMC Bioinformatics;2010 Supplement 1, Vol. 11, Special section p1
Academic Journal
Background: The classification of protein sequences using string algorithms provides valuable insights for protein function prediction. Several methods, based on a variety of different patterns, have been previously proposed. Almost all string-based approaches discover patterns that are not "independent," and therefore the associated scores count the contribution of patterns that cover the same region of a sequence multiple times. Results: In this paper we use a class of patterns, called irredundant, that is specifically designed to address this issue. Loosely speaking, the set of irredundant patterns is the smallest class of "independent" patterns that can describe all common patterns in two sequences, and thus it avoids overcounting. We present a novel discriminative method, called Irredundant Class, based on the statistics of irredundant patterns combined with the power of support vector machines. Conclusion: Tests on benchmark data show that Irredundant Class outperforms most of the string algorithms previously proposed and achieves results as good as current state-of-the-art methods. Moreover, the footprints of the most discriminative irredundant patterns can be used to guide the identification of functional regions in protein sequences.


Related Articles

  • DIALIGN-T: An improved algorithm for segment-based multiple sequence alignment. Subramanian, Amarendran R.; Weyer-Menkhoff, Jan; Kaufmann, Michael; Morgenstern, Burkhard // BMC Bioinformatics;2005, Vol. 6, p66

    Background: We present a complete re-implementation of the segment-based approach to multiple protein alignment that contains a number of improvements compared to the previous version 2.2 of DIALIGN. This previous version is superior to Needleman-Wunsch-based multi-alignment programs on locally...

  • iDBPs: a web server for the identification of DNA binding proteins. Nimrod, Guy; Schushan, Maya; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir // Bioinformatics;Mar2010, Vol. 26 Issue 5, p692 

    Summary: The iDBPs server uses the three-dimensional (3D) structure of a query protein to predict whether it binds DNA. First, the algorithm predicts the functional region of the protein based on its evolutionary profile; the assumption is that large clusters of conserved residues are good...

  • FPGA accelerator for protein secondary structure prediction based on the GOR algorithm. Fei Xia; Yong Dou; Guoqing Lei; Yusong Tan // BMC Bioinformatics;2011 Supplement 1, Vol. 12 Issue Suppl 1, p1 

    Background: Protein is an important molecule that performs a wide range of functions in biological systems. Recently, the protein folding attracts much more attention since the function of protein can be generally derived from its molecular structure. The GOR algorithm is one of the most...

  • Computational method for predicting protein functional sites with the use of specificity determinants. Kalinina, O. V.; Russell, R. B.; Rakhmaninova, A. B.; Gelfand, M. S. // Molecular Biology;Jan2007, Vol. 41 Issue 1, p137 

    The currently available body of decoded amino acid sequences of various proteins exceeds manifold the experimental capabilities of their functional annotation. Therefore, in silico annotation using bioinformatics methods becomes increasingly important. Such annotation is actually a prediction;...

  • Accessibility and partner number of protein residues, their relationship and a webserver, ContPlot for their display. Pal, Arumay; Bahadur, Ranjit Prasad; Ray, Partha Sarathi; Chakrabarti, Pinak // BMC Bioinformatics;2009, Vol. 10, Special section p1 

    Background: Depending on chemical features residues have preferred locations -- interior or exterior -- in protein structures, which also determine how many other residues are found around them. The close packing of residues is the hallmark of protein interior and protein-protein interaction...

  • A Novel Markov Pairwise Protein Sequence Alignment Method for Sequence Comparison. Xing-Ming Zhao; Yiu-Ming Cheung; De-Shuang Huang // Protein & Peptide Letters;Oct2005, Vol. 12 Issue 7, p665

    The Smith-Waterman (SW) algorithm is a typical technique for local sequence alignment in computational biology. However, the SW algorithm does not consider the local behaviours of the amino acids, which may result in loss of some useful information. Inspired by the success of Markov Edit...

  • SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines. Cao, Renzhi; Zheng Wang; Yiheng Wang; Jianlin Cheng // BMC Bioinformatics;2014, Vol. 15 Issue 1, p1 

    Background: It is important to predict the quality of a protein structural model before its native structure is known. The method that can predict the absolute local quality of individual residues in a single protein model is rare, yet particularly needed for using, ranking and refining protein...

  • Loose and Strict Repeats in Weighted Sequences of Proteins. Hui Zhang; Qing Guo; Jing Fan; Iliopoulos, Costas S. // Protein & Peptide Letters;Sep2010, Vol. 17 Issue 9, p1136 

    A weighted sequence is a string in which a set of characters may appear at each position with respective probabilities of occurrence. Weighted sequences are able to summarize poorly defined short sequences, as well as the profiles of protein families and complete chromosome sequences. Thus it is...

  • Making automated multiple alignments of very large numbers of protein sequences. Sievers, Fabian; Dineen, David; Wilm, Andreas; Higgins, Desmond G. // Bioinformatics;Apr2013, Vol. 29 Issue 8, p989 

    Motivation: Recent developments in sequence alignment software have made possible multiple sequence alignments (MSAs) of >100 000 sequences in reasonable times. At present, there are no systematic analyses concerning the scalability of the alignment quality as the number of aligned sequences is...

