An assessment of the amount of untapped fold level novelty in under-sampled areas of the tree of life

Roche, Daniel Barry; Brüls, Thomas
October 2015
Scientific Reports;10/9/2015, p1
Academic Journal
Previous studies of protein fold space suggest that fold coverage is plateauing. However, sequence sampling has been, and to a large extent remains, heavily biased towards culturable phyla. Sustained technological developments have fuelled the advent of metagenomics and single-cell sequencing, which might correct the current sequencing bias. The extent to which these efforts affect structural diversity remains unclear, although preliminary results suggest that uncultured organisms could constitute a source of new folds. We investigate to what extent genomes from uncultured and under-sampled phyla accessed through single-cell sequencing, metagenomics and high-throughput culturing efforts have the potential to increase protein fold space, and conclude that i) genomes from under-sampled phyla appear enriched in sequences not covered by current protein family and fold profile libraries, ii) this enrichment is linked to an excess of short (and possibly partly spurious) sequences in some of the datasets, iii) the discovery rate of novel folds among sequences not covered by current fold and family profile libraries may be as high as 36%, but would ultimately translate into only a marginal increase in the global discovery of novel folds. Thus, genomes from under-sampled phyla should have a rather limited impact on increasing coarse-grained, tertiary-structure-level novelty.

