Nh3D: A reference dataset of non-homologous protein structures

Thiruv, B.; Quon, G.; Saldanha, S. A.; Steipe, B.
January 2005
BMC Structural Biology;2005, Vol. 5, p12
Academic Journal
Background: The statistical analysis of protein structures requires datasets in which structural features can be considered independently distributed, i.e. not related through common ancestry, and that fulfil minimal requirements regarding the experimental quality of the structures they contain. However, non-redundant datasets based on sequence similarity invariably contain distantly related homologues. Here we provide a reference dataset of non-homologous protein domains, assuming that structural dissimilarity at the topology level is incompatible with recognizable common ancestry. The dataset is based on domains at the Topology level of the CATH database, which hierarchically classifies all protein structures. It contains the best refined representative of each Topology level; structural dissimilarity between representatives is validated and internally duplicated fragments are removed. The compilation of Nh3D is fully scripted.

Results: The current Nh3D list contains 570 domains with a total of 90780 residues. It covers more than 70% of folds at the Topology level of the CATH database and represents more than 90% of the structures in the PDB that have been classified by CATH. We observe that even though all protein pairs are structurally dissimilar, some pairwise sequence identities after global alignment are greater than 30%.

Conclusion: Nh3D is freely available as a reference dataset for the statistical analysis of sequence and structure features of proteins in the PDB. Regularly updated versions of Nh3D and the corresponding PDB-formatted coordinate sets are accessible from our Web site http://www.schematikon.org.
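
The selection step described in the abstract (one best refined representative per CATH Topology level) can be illustrated with a minimal Python sketch. This is not the authors' script: the `Domain` record, its field names, and the use of crystallographic resolution as the only quality criterion are assumptions made for illustration, since the abstract does not state how "best refined" is scored. The sketch relies only on the fact that a full CATH code has the form Class.Architecture.Topology.Homologous-superfamily, so its first three fields identify the Topology level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Hypothetical domain record; field names are illustrative only."""
    domain_id: str     # any unique domain identifier
    cath_code: str     # full CATH code "C.A.T.H", e.g. "1.10.8.10"
    resolution: float  # resolution in angstroms; lower is treated as better


def topology_of(cath_code: str) -> str:
    """Return the Class.Architecture.Topology prefix of a CATH code."""
    return ".".join(cath_code.split(".")[:3])


def pick_representatives(domains):
    """Keep one domain per Topology level: the one with the lowest resolution.

    Assumption: resolution stands in for the paper's refinement-quality
    criterion, which the abstract does not specify.
    """
    best = {}
    for dom in domains:
        key = topology_of(dom.cath_code)
        current = best.get(key)
        if current is None or dom.resolution < current.resolution:
            best[key] = dom
    return best
```

A complete pipeline along the lines of the abstract would additionally verify that the chosen representatives are pairwise structurally dissimilar and remove internally duplicated fragments; both steps are omitted here.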


