Probabilistic models and machine learning in structural bioinformatics

Hamelryck, Thomas
October 2009
Statistical Methods in Medical Research;Oct2009, Vol. 18 Issue 5, p505
Academic Journal
Structural bioinformatics is concerned with the molecular structure of biomacromolecules on a genomic scale, using computational methods. Classic problems in structural bioinformatics include the prediction of protein and RNA structure from sequence, the design of artificial proteins or enzymes, and the automated analysis and comparison of biomacromolecules in atomic detail. The determination of macromolecular structure from experimental data (for example from nuclear magnetic resonance, X-ray crystallography or small-angle X-ray scattering) has close ties with the field of structural bioinformatics. Recently, probabilistic models and machine learning methods based on Bayesian principles have begun to provide efficient and rigorous solutions to challenging problems that were long regarded as intractable. In this review, I will highlight some important recent developments in the prediction, analysis and experimental determination of macromolecular structure that are based on such methods. These developments include generative models of protein structure, the estimation of the parameters of energy functions that are used in structure prediction, the superposition of macromolecules, and structure determination methods that are based on inference. Although this review is not exhaustive, I believe the selected topics give a good impression of the exciting new probabilistic road the field of structural bioinformatics is taking.


Related Articles

  • Inferring clonal evolution of tumors from single nucleotide somatic mutations. Wei Jiao; Vembu, Shankar; Deshwar, Amit G.; Stein, Lincoln; Morris, Quaid // BMC Bioinformatics;2014, Vol. 15 Issue 1, p2 

    Background High-throughput sequencing allows the detection and quantification of frequencies of somatic single nucleotide variants (SNV) in heterogeneous tumor cell populations. In some cases, the evolutionary history and population frequency of the subclonal lineages of tumor cells present in...

  • A Bayesian nonparametric method for prediction in EST analysis. Lijoi, Antonio; Mena, Ramsés H.; Prünster, Igor // BMC Bioinformatics;2007 Supplement 2, Vol. 8, p339 

    Background: Expressed sequence tag (EST) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected...

  • A method for enhancement of short read sequencing alignment with Bayesian inference. Weixing Feng; Fengfei Song; Yansheng Dong; Bo He // Journal of Chemical & Pharmaceutical Research;2013, Vol. 5 Issue 11, p200 

    Next-generation short read sequencing is widely utilized in genome-wide association studies. However, as an indirect measurement technique, short read sequencing requires an alignment step to map all sequencing reads to a reference genome before the genomic information of interest can be acquired. Facing to huge...

  • Strelka: accurate somatic small-variant calling from sequenced tumor–normal sample pairs. Saunders, Christopher T.; Wong, Wendy S. W.; Swamy, Sajani; Becq, Jennifer; Murray, Lisa J.; Cheetham, R. Keira // Bioinformatics;Jul2012, Vol. 28 Issue 14, p1811 

    Motivation: Whole genome and exome sequencing of matched tumor–normal sample pairs is becoming routine in cancer research. The consequent increased demand for somatic variant analysis of paired samples requires methods specialized to model this problem so as to sensitively call variants...

  • A conditional random fields method for RNA sequence–structure relationship modeling and conformation sampling. Wang, Zhiyong; Xu, Jinbo // Bioinformatics;Jul2011, Vol. 27 Issue 13, pi102 

    Accurate tertiary structures are very important for the functional study of non-coding RNA molecules. However, predicting RNA tertiary structures is extremely challenging, because of a large conformation space to be explored and lack of an accurate scoring function differentiating the native...

  • Improved similarity scores for comparing motifs. Tanaka, Emi; Bailey, Timothy; Grant, Charles E.; Noble, William Stafford; Keich, Uri // Bioinformatics;Jun2011, Vol. 27 Issue 12, p1603 

    Motivation: A question that often comes up after applying a motif finder to a set of co-regulated DNA sequences is whether the reported putative motif is similar to any known motif. While several tools have been designed for this task, Habib et al. pointed out that the scores that are commonly...

  • ToPS: A Framework to Manipulate Probabilistic Models of Sequence Data. Kashiwabara, André Yoshiaki; Bonadio, Ígor; Onuchic, Vitor; Amado, Felipe; Mathias, Rafael; Durham, Alan Mitchell // PLoS Computational Biology;Oct2013, Vol. 9 Issue 10, p1 

    Discrete Markovian models can be used to characterize patterns in sequences of values and have many applications in biological sequence analysis, including gene prediction, CpG island detection, alignment, and protein profiling. We present ToPS, a computational framework that can be used to...

  • Bayesian prediction of tissue-regulated splicing using RNA sequence and cellular context. Xiong, Hui Yuan; Barash, Yoseph; Frey, Brendan J. // Bioinformatics;Sep2011, Vol. 27 Issue 18, p2554 

    Motivation: Alternative splicing is a major contributor to cellular diversity in mammalian tissues and relates to many human diseases. An important goal in understanding this phenomenon is to infer a ‘splicing code’ that predicts how splicing is regulated in different cell types by...

  • A dynamic Bayesian Markov model for phasing and characterizing haplotypes in next-generation sequencing. Zhang, Yu // Bioinformatics;Apr2013, Vol. 29 Issue 7, p878 

    Motivation: Next-generation sequencing (NGS) technologies have enabled whole-genome discovery and analysis of genetic variants in many species of interest. Individuals are often sequenced at low coverage for detecting novel variants, phasing haplotypes and inferring population structures....

