Comparative mapping of sequence-based and structure-based protein domains

Zhang, Ya; Chandonia, John-Marc; Ding, Chris; Holbrook, Stephen R.
January 2005
BMC Bioinformatics;2005, Vol. 6, p77
Academic Journal
Background: Protein domains have long been an ill-defined concept in biology. They are generally described as autonomous folding units with evolutionary and functional independence. Both structure-based and sequence-based domain definitions have been widely used, but whether either type of model alone can capture all essential features of domains is still an open question. Methods: Here we provide insight into domain definitions through comparative mapping of two domain classification databases, one sequence-based (Pfam) and the other structure-based (SCOP). A mapping score is defined to indicate the significance of the mapping, and the properties of the mapping matrices are studied. Results: The mapping results show general agreement between the two databases, as well as many interesting areas of disagreement. In cases of disagreement, the functional and evolutionary characteristics of the domains are examined to determine which domain definition is biologically more informative.
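
The abstract does not give the formula for the mapping score, so the following is only an illustrative sketch of what a mapping matrix between two domain classifications might look like. It assumes the input is a list of (Pfam family, SCOP family) pairs, one per pair of domains found on the same stretch of a protein, and it uses a Jaccard-style overlap as a stand-in score. The function names, the toy labels (`PF_A`, `scop_x`, and so on) and the choice of Jaccard overlap are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def build_mapping_matrix(pairs):
    """Count how often each (Pfam family, SCOP family) pair co-occurs.

    The sparse matrix is stored as a Counter keyed by (pfam, scop).
    """
    return Counter(pairs)


def jaccard_scores(matrix):
    """Assumed stand-in score: shared count divided by the count of
    observations involving either family (a Jaccard-style overlap).
    This is NOT the mapping score defined in the article."""
    row_totals = Counter()  # total observations per Pfam family
    col_totals = Counter()  # total observations per SCOP family
    for (pfam, scop), n in matrix.items():
        row_totals[pfam] += n
        col_totals[scop] += n
    return {
        (pfam, scop): n / (row_totals[pfam] + col_totals[scop] - n)
        for (pfam, scop), n in matrix.items()
    }


# Hypothetical toy observations; real input would come from proteins
# annotated in both Pfam and SCOP.
observations = [
    ("PF_A", "scop_x"),
    ("PF_A", "scop_x"),
    ("PF_A", "scop_y"),
    ("PF_B", "scop_y"),
]

matrix = build_mapping_matrix(observations)
scores = jaccard_scores(matrix)

for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

In this toy setup, a score near 1 would indicate that a Pfam family and a SCOP family almost always appear together (agreement between the two databases), while low scores spread across a row or column would flag one-to-many mappings of the kind the abstract describes as areas of disagreement.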


Related Articles

  • DIALIGN-T: An improved algorithm for segment-based multiple sequence alignment. Subramanian, Amarendran R.; Weyer-Menkhoff, Jan; Kaufmann, Michael; Morgenstern, Burkhard // BMC Bioinformatics;2005, Vol. 6, p66

    Background: We present a complete re-implementation of the segment-based approach to multiple protein alignment that contains a number of improvements compared to the previous version 2.2 of DIALIGN. This previous version is superior to Needleman-Wunsch-based multi-alignment programs on locally...

  • Amino acid size, charge, hydropathy indices and matrices for protein structure analysis. Biro, J. C. // Theoretical Biology & Medical Modelling;2006, Vol. 3, p15 

    Background: Prediction of protein folding and specific interactions from only the sequence (ab initio) is a major challenge in bioinformatics. It is believed that such prediction will prove possible if Anfinsen's thermodynamic principle is correct for all kinds of proteins, and all the...

  • Evolution of NIN-Like Proteins in Arabidopsis, Rice, and Lotus japonicus. Schauser, Leif; Wieloch, Wioletta; Stougaard, Jens // Journal of Molecular Evolution;Feb2005, Vol. 60 Issue 2, p229

    Genetic studies in Lotus japonicus and pea have identified Nin as a core symbiotic gene required for establishing symbiosis between legumes and nitrogen fixing bacteria collectively called Rhizobium. Sequencing of additional Lotus cDNAs combined with analysis of genome sequences from Arabidopsis and...

  • Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition. Tamura, Takeyuki; Akutsu, Tatsuya // BMC Bioinformatics;2007 Supplement 2, Vol. 8, p466 

    Background: Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. This is a problem of predicting which part in a cell a given protein is transported to, where an amino acid sequence of the protein is given as an input. This problem is becoming...

  • Motivated Proteins: A web application for studying small three-dimensional protein motifs. Leader, David P.; Milner-White, E. James // BMC Bioinformatics;2009, Vol. 10, Special section p1 

    Background: Small loop-shaped motifs are common constituents of the three-dimensional structure of proteins. Typically they comprise between three and seven amino acid residues, and are defined by a combination of dihedral angles and hydrogen bonding partners. The most abundant of these are...

  • Columba: an integrated database of proteins, structures, and annotations. Trißl, Silke; Rother, Kristian; Müller, Heiko; Steinke, Thomas; Koch, Ina; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf // BMC Bioinformatics;2005, Vol. 6, p81 

    Background: Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because...

  • SSMap: A new UniProt-PDB mapping resource for the curation of structural-related information in the UniProt/Swiss-Prot Knowledgebase. David, Fabrice P. A.; Yip, Yum L. // BMC Bioinformatics;2008, Vol. 9, Special section p1 

    Background: Sequences and structures provide valuable complementary information on protein features and functions. However, it is not always straightforward for users to gather information concurrently from the sequence and structure levels. The UniProt knowledgebase (UniProtKB) strives to help...

  • ProDom: Automated clustering of homologous domains. Servant, Florence; Bru, Catherine; Carrère, Sébastien; Courcelle, Emmanuel; Gouzy, Jérôme; Peyruc, David; Kahn, Daniel // Briefings in Bioinformatics;Sep2002, Vol. 3 Issue 3, p246

    The ProDom database is a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases. An associated database, ProDom-CG, has been derived as a restriction of ProDom to completely sequenced genomes. The ProDom construction method is based...

  • Detecting protein sequence conservation via metric embeddings. E. Halperin; J. Buhler; R. Karp; R. Krauthgamer; B. Westover // Bioinformatics;Jan2009 Supplement, Vol. 19, p122 

    Motivation: Comparing two protein databases is a fundamental task in biosequence annotation. Given two databases, one must find all pairs of proteins that align with high score under a biologically meaningful substitution score matrix, such as a BLOSUM matrix (Henikoff and Henikoff, 1992)....

