Using Pseudo Amino Acid Composition to Predict Protein Attributes Via Cellular Automata and Other Approaches

Xuan Xiao; Kuo-Chen Chou
June 2011
Current Bioinformatics;Jun2011, Vol. 6 Issue 2, p251
Academic Journal
With the avalanche of protein sequences generated in the post-genomic age, many typical topics in bioinformatics, proteomics and systems biology involve identifying various attributes of uncharacterized proteins or depend on this kind of knowledge. Unfortunately, it is both time-consuming and costly to acquire the desired information purely by conducting biochemical experiments. Therefore, it is highly desirable to develop automated methods for quickly and accurately identifying various attributes of proteins based on their sequence information alone. This is where bioinformatics and artificial intelligence (AI) techniques converge. To establish powerful computational methods in this regard, one of the key procedures is to find an effective mathematical expression for the protein samples that can truly reflect their intrinsic correlation with the target to be predicted. To realize this, the pseudo amino acid (PseAA) composition, or PseAAC, was proposed. Stimulated by the concept of PseAAC, a series of different modes of PseAAC were developed to deal with proteins or protein-related systems. The current review is mainly focused on those PseAAC modes that were formulated via cellular automata. By using some optimal space-time evolution rules of cellular automata, a protein sequence can be represented by a unique image, the so-called cellular automata (CA) image or CAI. Many important features, which are deeply hidden in piles of long and complicated amino acid sequences, can be clearly revealed through their CAIs. It is anticipated that, owing to its impressive power, intuitiveness and relative simplicity, the CAI approach holds great potential in bioinformatics and other related areas.
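
The idea of turning a sequence into a cellular automata image can be illustrated with a minimal sketch. It is not the authors' implementation: the 5-bit residue encoding (`encode`), the default rule number, the periodic boundary, and the function names are all assumptions chosen for illustration, since the abstract does not specify them. Each residue is mapped to a binary code, the concatenated bits form the first row, and an elementary (two-state, nearest-neighbor) cellular automaton rule is applied repeatedly; the stacked rows form the image.

```python
# Hypothetical sketch: sequence -> binary row -> rows evolved by an elementary CA.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def encode(sequence):
    """Map each residue to a 5-bit code (its index in AMINO_ACIDS).

    This encoding is an assumption made for illustration only.
    """
    bits = []
    for residue in sequence.upper():
        index = AMINO_ACIDS.index(residue)  # ValueError for non-standard residues
        bits.extend(int(b) for b in format(index, "05b"))
    return bits


def step(cells, rule):
    """Apply one update of an elementary CA (Wolfram rule number, periodic edges)."""
    n = len(cells)
    out = []
    for i in range(n):
        # Neighborhood value 0..7 built from (left, center, right) cells.
        neighborhood = (cells[i - 1] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        out.append((rule >> neighborhood) & 1)
    return out


def ca_image(sequence, rule=84, steps=100):
    """Return the image as a list of rows; row 0 is the encoded sequence.

    The default rule number is an illustrative choice, not taken from the abstract.
    """
    rows = [encode(sequence)]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows


if __name__ == "__main__":
    # Print a small image for a short, made-up peptide.
    for row in ca_image("ACDEFGHIK", steps=10):
        print("".join("#" if cell else "." for cell in row))
```

Features for a classifier would then be extracted from the resulting 0/1 matrix; which features to use is left open here.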


Related Articles

  • Amino Acid Sequence Database Suitable for the Protein and Proteome Analysis. Kawakami, Takao; Ozaki, Junko; Kondo, Kazuhiro; Sato, Shinji; Yunokawa, Harunobu // Current Proteomics;Dec2008, Vol. 5 Issue 4, p267 

    An amino acid sequence database is one of the essential components of current mass spectrometry-based proteomics. Routine protein identification, as well as posttranslational modification analysis, is based on correlation between the mass spectrometry data of peptides obtained from proteome and...

  • Pseudo Amino Acid Composition and its Applications in Bioinformatics, Proteomics and System Biology. Kuo-Chen Chou // Current Proteomics;Dec2009, Vol. 6 Issue 4, p262 

    With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop automated methods for efficiently identifying various attributes of uncharacterized proteins. This is one of the most important tasks facing us today in bioinformatics, and the information...

  • A Functional Proteomic Approach to the Identification and Characterization of Protein Composition in Wheat Leaf. Jung-Feng Hsieh; Shui-Tein Chen // Current Proteomics;Dec2008, Vol. 5 Issue 4, p253 

    Proteomic and bioinformatic approaches were applied to analyze the composition and function of wheat leaf proteins. Wheat proteins were precipitated by ammonium sulfate and analyzed by two-dimensional gel electrophoresis and mass spectrometry. A total of 200 wheat proteins were selected to...

  • Designing patterns for profile HMM search. Yanni Sun // Bioinformatics;Jan2007, Vol. 23 Issue 2, pe36 

    Motivation: Profile HMMs are a powerful tool for modeling conserved motifs in proteins. These models are widely used by search tools to classify new protein sequences into families based on domain architecture. However, the proliferation of known motifs and new proteomic sequence data poses a...

  • Identification and Characterization of the Rat DVL2 Gene Using Bioinformatic Tools. Varişli, Lokman; Çen, Osman // Turkish Journal of Biology;2007, Vol. 31 Issue 2, p81 

    We identified and characterized the rat DVL2 gene using bioinformatics. In addition to the structure and chromosomal localization of the rat DVL2 gene, the transcribed and translated protein product of the gene was analyzed in silico. Results showed that the rat DVL2 gene consists of 15 exons...

  • NNAlign: A Web-Based Prediction Method Allowing Non-Expert End-User Discovery of Sequence Motifs in Quantitative Peptide Data. Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole; Buus, Søren; Nielsen, Morten // PLoS ONE;2011, Vol. 6 Issue 11, p1 

    Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale thereby enabling entirely new ''omics''-based approaches towards the analysis of complex biological processes. However, the amount and...

  • Classification of protein sequences by means of irredundant patterns. Comin, Matteo; Verzotto, Davide // BMC Bioinformatics;2010 Supplement 1, Vol. 11, Special section p1 

    Background: The classification of protein sequences using string algorithms provides valuable insights for protein function prediction. Several methods, based on a variety of different patterns, have been previously proposed. Almost all string-based approaches discover patterns that are not...

  • Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space. Yaniv Loewenstein; Elon Portugaly; Menachem Fromer; Michal Linial // Bioinformatics;Jul2008, Vol. 24 Issue 13, pi41 

    Motivation: UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets....

  • DIALIGN-T: An improved algorithm for segment-based multiple sequence alignment. Subramanian, Amarendran R.; Weyer-Menkhoff, Jan; Kaufmann, Michael; Morgenstern, Burkhard // BMC Bioinformatics;2005, Vol. 6, p66 

    Background: We present a complete re-implementation of the segment-based approach to multiple protein alignment that contains a number of improvements compared to the previous version 2.2 of DIALIGN. This previous version is superior to Needleman-Wunsch-based multi-alignment programs on locally...

