Review of Common Sequence Alignment Methods: Clues to Enhance Reliability

Lambert, Christophe; Van Campenhout, Jean-Marc; DeBolle, Xavier; Depiereux, Eric
February 2003
Current Genomics;Feb2003, Vol. 4 Issue 2, p131
Academic Journal
Today, in many areas of molecular biology, sequence alignment has become an essential tool for studying the structure-function relationships of proteins. With the impressive increase in the number of available sequences, alignments provide a substantial amount of information through various computational methods. These approaches have become a crucial tool for putting forward working hypotheses for time-consuming bench work, such as protein engineering and site-directed mutagenesis. However, alignment methods remain far from perfect. All methods are severely limited in the twilight zone, which lies around 25% identity between pairs of sequences. More worrying is the very high rate of false positive results generated by most algorithms, which depend on empirical parameters and are hard to validate by statistical criteria. After reviewing the main methods, this paper draws the user's attention to the fact that evaluations of algorithm performance are limited entirely to the evaluation of alignment power (sensitivity). With reference to a given truth defined from the alignment of known structures, the power is defined as the proportion of the truth that is recovered in the solution. The power may be overestimated because of a lack of independent sets of poorly related sequences, and its value depends entirely on the criterion used to define the truth. Confidence (selectivity), on the other hand, represents the proportion of the solution that is true. Depending on the method and the parameters used, confidence may be much lower than power, and it is rarely evaluated. For non-trivial alignments, when the power is high, confidence is low, which means that correctly aligned positions are embedded in large regions that are unduly aligned. One possible solution to these problems is to use a consensus of several multiple alignment methods, which will increase the confidence of the results.
The addition of external information, such as predicted secondary structure and/or predicted solvent accessibility, is another approach that should increase the performance of existing multiple alignment methods.
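
The definitions of power and confidence given in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: an alignment is represented as the set of residue index pairs placed in the same column, the "truth" is a reference set of such pairs, and the consensus is a simple vote count across methods. All function names and the toy sequences are hypothetical.

```python
from collections import Counter


def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue index pairs that share a column.

    row_a and row_b are the two gapped rows of a pairwise alignment
    (assumed representation, not the paper's).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = set()
    i = j = 0  # ungapped residue indices in each sequence
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def power_and_confidence(reference, solution):
    """Power: share of the truth recovered. Confidence: share of the solution that is true."""
    correct = len(reference & solution)
    power = correct / len(reference) if reference else 0.0
    confidence = correct / len(solution) if solution else 0.0
    return power, confidence


def consensus_pairs(solutions, min_votes):
    """Keep only the pairs proposed by at least min_votes methods (assumed voting rule)."""
    votes = Counter(pair for solution in solutions for pair in solution)
    return {pair for pair, n in votes.items() if n >= min_votes}


# Toy data: a reference ("truth") and three hypothetical method outputs.
reference = aligned_pairs("ACD-EF", "AC-GEF")
method_1 = aligned_pairs("ACDEF", "ACGEF")    # aligns one extra, untrue pair
method_2 = aligned_pairs("ACD-EF", "AC-GEF")  # matches the reference
method_3 = aligned_pairs("ACDEF-", "-ACGEF")  # shifted by one position

print(power_and_confidence(reference, method_1))
consensus = consensus_pairs([method_1, method_2, method_3], min_votes=2)
print(power_and_confidence(reference, consensus))
```

In this toy case the first method recovers every true pair but also aligns one pair that is not in the reference, so its confidence is lower than its power. Keeping only pairs supported by two of the three methods removes that pair, which is the kind of gain in confidence the abstract attributes to a consensus.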


Related Articles

  • Current approaches to whole genome phylogenetic analysis. Savva, George; Dicks, Jo; Roberts, Ian N. // Briefings in Bioinformatics;Mar2003, Vol. 4 Issue 1, p63 

    It has long been known that evolutionary trees (phylogenies) can be estimated by comparing the DNA or protein sequences of homologous genes across different organisms. More recently, attempts have been made to estimate phylogenies by comparing entire genomes. These attempts have focused largely...

  • Protein structure modeling for structural genomics. Sánchez, Roberto; Pieper, Ursula; Melo, Francisco; Eswar, Narayanan; Martí-Renom, Marc A.; Madhusudhan, M.S.; Mirković, Nebojša; Šali, Andrej // Nature Structural Biology;Nov2000 Supplement, Vol. 7, p986 

    The shapes of most protein sequences will be modeled based on their similarity to experimentally determined protein structures. The current role, limitations, challenges and prospects for protein structure modeling (using information about genes and genomes) are discussed in the context of...

  • 100,000 protein structures for the biologist. Šali, Andrej // Nature Structural Biology;Dec98, Vol. 5 Issue 12, p1029 

    Structural genomics promises to deliver experimentally determined three-dimensional structures for many thousands of protein domains. These domains will be carefully selected, so that the methods of fold assignment and comparative protein structure modeling will result in useful models for most...

  • Target practice. Šali, Andrej // Nature Structural Biology;Jun2001, Vol. 8 Issue 6, p482 

    Focuses on a study that estimated the scope of structural genomics by simulation of target selection strategies based on known protein sequence families. Description of structural genomics; Methodology of the study; Results and discussion.

  • Applications of InterPro in protein annotation and genome analysis. Biswas, Margaret; O'Rourke, John F.; Camon, Evelyn; Fraser, Gill; Kanapin, Alexander; Karavidopoulou, Youla; Kersey, Paul; Kriventseva, Evgenia; Mittard, Virginie; Mulder, Nicola; Phan, Isabelle; Servant, Florence; Apweiler, Rolf // Briefings in Bioinformatics;Sep2002, Vol. 3 Issue 3, p285 

    The applications of InterPro span a range of biologically important areas that includes automatic annotation of protein sequences and genome analysis. In automatic annotation of protein sequences InterPro has been utilised to provide reliable characterisation of sequences, identifying them as...

  • A novel knowledge-based approach to design inorganic-binding peptides. Ersin Emre Oren; Candan Tamerler; Deniz Sahin; Marketa Hnilova; Urartu Ozgur Safak Seker; Mehmet Sarikaya; Ram Samudrala // Bioinformatics;Nov2007, Vol. 23 Issue 21, p2816 

    Motivation: The discovery of solid-binding peptide sequences is accelerating along with their practical applications in biotechnology and materials sciences. A better understanding of the relationships between the peptide sequences and their binding affinities or specificities will enable...

  • ConFunc--functional annotation in the twilight zone. Mark N. Wass; Michael J. E. Sternberg // Bioinformatics;Mar2008, Vol. 24 Issue 6, p798 

    Motivation: The success of genome sequencing has resulted in many protein sequences without functional annotation. We present ConFunc, an automated Gene Ontology (GO)-based protein function prediction approach, which uses conserved residues to generate sequence profiles to infer function....

  • The Vast, Conserved Mammalian lincRNome. Managadze, David; Lobkovsky, Alexander E.; Wolf, Yuri I.; Shabalina, Svetlana A.; Rogozin, Igor B.; Koonin, Eugene V. // PLoS Computational Biology;Feb2013, Vol. 9 Issue 2, Special section p1 

    We compare the sets of experimentally validated long intergenic non-coding (linc)RNAs from human and mouse and apply a maximum likelihood approach to estimate the total number of lincRNA genes as well as the size of the conserved part of the lincRNome. Under the assumption that the sets of...

  • Functional and Evolutionary Analysis of the Genome of an Obligate Fungal Symbiont. Vogel, Kevin J.; Moran, Nancy A. // Genome Biology & Evolution;May2013, Vol. 5 Issue 5, p891 

    Nutritional symbionts of insects include some of the most bizarre genomes studied to date, with extremely reduced size, biased base composition, and limited metabolic abilities. A monophyletic group of aphids within the subfamily Cerataphidinae have lost the bacterial symbiont common to all...

