Prediction of protein long-range contacts using an ensemble of genetic algorithm classifiers with sequence profile centers

Peng Chen; Jinyan Li
January 2010
BMC Structural Biology;2010 Supplement 1, Vol. 10, Special section p1
Academic Journal
Background: Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It helps in determining protein structures and understanding protein folding, and therefore in advancing the annotation of protein functions.

Results: In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is built on sequence profile centers (SPCs). Each SPC is the average of the sequence profiles of residue pairs belonging to the same class, either contact or non-contact. The GaCs are trained on multiple, different pairings of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, which have roughly the same sizes as the positive ones, are constructed by random sampling from the original imbalanced negative data. As a result, about 21.5% of long-range contacts are correctly predicted. We also found that the ensemble of GaCs improves accuracy by around 5.6% over a single GaC.

Conclusions: Classifiers that use sequence profile centers may advance long-range contact prediction. With this approach, key structural features in proteins could be determined with high efficiency and accuracy.
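The abstract does not describe the genetic algorithm itself, so the following Python sketch illustrates only the two ideas it does state: a sequence profile center as the average profile of one class, and an ensemble trained on balanced subsets obtained by randomly sampling the negative data. The nearest-center voting rule, the Euclidean distance, and all function names (`profile_center`, `train_ensemble`, `predict_contact`) are assumptions made for illustration, not the authors' method.

```python
import math
import random


def profile_center(profiles):
    """Element-wise mean of equal-length profile vectors (one per residue pair)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def train_ensemble(positives, negatives, n_members=5, seed=0):
    """Build one (contact center, non-contact center) pair per balanced subset.

    Each member pairs all positives with a fresh random sample of negatives
    of the same size, which counters the class imbalance.
    """
    rng = random.Random(seed)
    contact_center = profile_center(positives)
    members = []
    for _ in range(n_members):
        sampled_negatives = rng.sample(negatives, len(positives))
        members.append((contact_center, profile_center(sampled_negatives)))
    return members


def predict_contact(members, profile):
    """Majority vote over members.

    Assumption: a member votes 'contact' when the profile is closer
    (Euclidean distance) to its contact center than to its non-contact center.
    """
    votes = sum(
        math.dist(profile, contact) < math.dist(profile, non_contact)
        for contact, non_contact in members
    )
    return 2 * votes > len(members)
```

In this sketch every member shares the same contact center, because only the negative data are resampled; a fuller treatment might also vary the positive subsets, as the phrase "multiple but different pairs" suggests.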


Related Articles

  • GASH: An improved algorithm for maximizing the number of equivalent residues between two protein structures. Standley, Daron M.; Toh, Hiroyuki; Nakamura, Haruki // BMC Bioinformatics;2005, Vol. 6, p221 

    Background: We introduce GASH, a new, publicly accessible program for structural alignment and superposition. Alignments are scored by the Number of Equivalent Residues (NER), a quantitative measure of structural similarity that can be applied to any structural alignment method. Multiple...

  • An improved Multiple Sequence Alignment Genetic Algorithm. Jia Tie-jun; Zhao Xi-qiang // International Journal of Digital Content Technology & its Applic;Dec2011, Vol. 5 Issue 12, p436 

    In order to find an efficient multiple sequence alignment genetic algorithm, a new model based on rough set-LVQ neural networks and an improved algorithm and procedure are put forward by describing and analyzing multiple sequence alignment genetic algorithms, the structure characteristics of...

  • Genotype determination for polymorphisms in linkage disequilibrium. Zhaoxia Yu; Garner, Chad; Ziogas, Argyrios; Anton-Culver, Hoda; Schaid, Daniel J. // BMC Bioinformatics;2009, Vol. 10, Special section p1 

    Background: Genome-wide association studies with single nucleotide polymorphisms (SNPs) show great promise to identify genetic determinants of complex human traits. In current analyses, genotype calling and imputation of missing genotypes are usually considered as two separated tasks. The...

  • ISHAPE: new rapid and accurate software for haplotyping. Delaneau, Olivier; Coulonges, Cédric; Boelle, Pierre-Yves; Nelson, George; Spadoni, Jean-Louis; Zagury, Jean-François // BMC Bioinformatics;2007, Vol. 8, p205 

    Background: We have developed a new haplotyping program based on the combination of an iterative multiallelic EM algorithm (IEM), bootstrap resampling and a pseudo Gibbs sampler. The use of the IEM-bootstrap procedure considerably reduces the space of possible haplotype configurations to be...

  • 3D Protein structure prediction with genetic tabu search algorithm. Xiaolong Zhang; Ting Wang; Huiping Luo; Yang, Jack Y.; Youping Deng; Jinshan Tang; Mary Qu Yang // BMC Systems Biology;2010 Supplement 1, Vol. 4, p1 

    Background: Protein structure prediction (PSP) has important applications in different fields, such as drug design, disease prediction, and so on. In protein structure prediction, there are two important issues. The first one is the design of the structure model and the second one is the design...

  • A novel hierarchical ensemble classifier for protein fold recognition. Xia Guo; Xieping Gao // PEDS: Protein Engineering, Design & Selection;Nov2008, Vol. 21 Issue 11, p659 

    The ensemble classifier plays a critical role in protein fold recognition. In this article, a novel hierarchical ensemble classifier named GAOEC (Genetic-Algorithm Optimized Ensemble Classifier) is presented and it can be constructed in the following steps. First, a novel optimized classifier...

  • An Evaluation of Feature Selection Approaches in Finding Amyloidogenic Regions in Protein Sequences.  // International Journal of Computer Applications;Oct2010, Vol. 8, p1 

    The article presents a study that investigates selection models in evaluating the amyloidogenic regions of protein sequences in India. It examines the wrapper method performance along with embedded and filter models. Moreover, the novel integrated selection scheme on Genetic Algorithm (GA) and...

  • Software for Optimising Genetic Algorithm Engine Replacing Genetic Manipulations Performed by Organisms. // Japanese Biotechnology & Medical Technology;Mar/Apr2002, Vol. 2 Issue 1, p7 

    Reports on the development of the software General-Purpose Genetic Algorithm Optimizing Engine which permits optimization of applications based on the genetic algorithm. Disadvantage of the genetic algorithm; Concept of the genetic algorithm; Establishment of the basic concept on the...

  • FREE SHAPE CONTEXT DESCRIPTORS OPTIMIZED WITH GENETIC ALGORITHM FOR THE DETECTION OF DEAD TREE TRUNKS IN ALS POINT CLOUDS. Polewski, P.; Yao, W.; Heurich, M.; Krzystek, P.; Stilla, U. // ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Informa;8/19/2015, Vol. 2 Issue 3-W5, p41 

    In this paper, a new family of shape descriptors called Free Shape Contexts (FSC) is introduced to generalize the existing 3D Shape Contexts. The FSC introduces more degrees of freedom than its predecessor by allowing the level of complexity to vary between its parts. Also, each part of the FSC...

