Bayesian biclustering of gene expression data

Gu, Jiajun; Liu, Jun S.
January 2008
BMC Genomics;2008 Supplement 1, Vol. 9, Special section p1
Academic Journal
Background: Biclustering of gene expression data searches for local patterns of gene expression. A bicluster (or a two-way cluster) is defined as a set of genes whose expression profiles are mutually similar within a subset of experimental conditions/samples. Although several biclustering algorithms have been studied, few are based on rigorous statistical models. Results: We developed a Bayesian biclustering model (BBC) and implemented a Gibbs sampling procedure for its statistical inference. We showed that the BBC model can correctly identify multiple clusters in gene expression data. Using simulated data, both generated from the model and with realistic characteristics, we demonstrated that the BBC algorithm outperforms other methods in both robustness and accuracy. We also showed that the model is stable under two normalization methods, the interquartile range normalization and the smallest quartile range normalization. Applying the BBC algorithm to the yeast expression data, we observed that the majority of the biclusters we found are supported by significant biological evidence, such as enrichment of gene functions and of transcription factor binding sites in the corresponding promoter sequences. Conclusions: The BBC algorithm is shown to be a robust model-based biclustering method that can discover biologically significant gene-condition clusters in microarray data. The BBC model can easily handle missing data via Monte Carlo imputation and has the potential to be extended to the integrated study of gene transcription networks.
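
The abstract names interquartile range normalization but does not define it. As a rough illustration only, the sketch below assumes one common reading: each condition (column) is centered on its median and divided by its interquartile range. The function names `iqr_normalize` and `normalize_matrix` are hypothetical, and the procedure used in the article itself may differ.

```python
from statistics import median, quantiles


def iqr_normalize(column):
    """Center values on their median and scale by the interquartile range.

    Assumption: this is one common form of IQR-based scaling, not
    necessarily the definition used in the article.
    """
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; cannot scale")
    med = median(column)
    return [(x - med) / iqr for x in column]


def normalize_matrix(matrix):
    """Normalize each column of a genes x conditions matrix (list of rows)."""
    columns = [iqr_normalize(list(col)) for col in zip(*matrix)]
    # Transpose back so that rows are genes again.
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    expression = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
    for row in normalize_matrix(expression):
        print(row)
```

Scaling by the interquartile range rather than the standard deviation makes the result less sensitive to a few extreme expression values. The sketch does not handle missing values.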


Related Articles

  • A Graphical Modelling Approach to the Dissection of Highly Correlated Transcription Factor Binding Site Profiles. Stojnic, Robert; Fu, Audrey Qiuyan; Adryan, Boris // PLoS Computational Biology;Nov2012, Vol. 8 Issue 11, p1 

    Inferring the combinatorial regulatory code of transcription factors (TFs) from genome-wide TF binding profiles is challenging. A major reason is that TF binding profiles significantly overlap and are therefore highly correlated. Clustered occurrence of multiple TFs at genomic sites may arise...

  • A Bayesian Search for Transcriptional Motifs. Miller, Andrew K.; Print, Cristin G.; Nielsen, Poul M. F.; Crampin, Edmund J. // PLoS ONE;2010, Vol. 5 Issue 11, p1 

    Identifying transcription factor (TF) binding sites (TFBSs) is an important step towards understanding transcriptional regulation. A common approach is to use gaplessly aligned, experimentally supported TFBSs for a particular TF, and algorithmically search for more occurrences of the same TFBSs....

  • Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions. Levitsky, Victor G.; Ignatieva, Elena V.; Ananko, Elena A.; Turnaev, Igor I.; Merkulova, Tatyana I.; Kolchanov, Nikolay A.; Hodgman, T. C. // BMC Bioinformatics;2007 Supplement 2, Vol. 8, p481 

    Background: Reliable transcription factor binding site (TFBS) prediction methods are essential for computer annotation of large amounts of genome sequence data. However, current methods to predict TFBSs are hampered by the high false-positive rates that occur when only sequence conservation at the...

  • Potential binding sites for SF-1: Recognition by the SiteGA method, experimental verification, and search for new target genes. Klimova, N. V.; Levitsky, V. G.; Ignatieva, E. V.; Vasiliev, G. V.; Kobzev, V. F.; Busygina, T. V.; Merkulov, T. I.; Kolchanov, N. A. // Molecular Biology;May2006, Vol. 40 Issue 3, p454 

    The transcription factor SF-1 (steroidogenic factor 1) regulates the expression of the steroidogenesis genes, coordinates the development and function of the hypothalamic-pituitary-gonadal and adrenal systems, and plays an important role in the development and function of the reproductive...

  • Discovery of putative transcription factor binding sites from microarray-based gene expression profiles. Vilo, Jaak; Brazma, Alvis; Jonassen, Inge; Ukkonen, Esko // Nature Genetics;Nov99 Supplement, Vol. 23, p79 

    Presents an abstract for the article on the discovery of putative transcription factor binding sites from microarray-based gene expression profiles.

  • MotEvo: integrated Bayesian probabilistic methods for inferring regulatory sites and motifs on multiple alignments of DNA sequences. Arnold, Phil; Erb, Ionas; Pachkov, Mikhail; Molina, Nacho; van Nimwegen, Erik // Bioinformatics;Feb2012, Vol. 28 Issue 4, p487 

    Motivation: Probabilistic approaches for inferring transcription factor binding sites (TFBSs) and regulatory motifs from DNA sequences have been developed for over two decades. Previous work has shown that prediction accuracy can be significantly improved by incorporating features such as the...

  • Tissue-specific regulatory elements in mammalian promoters. Smith, Andrew D.; Sumazin, Pavel; Zhang, Michael Q. // Molecular Systems Biology;2007, Vol. 3 Issue 1, p73 

    Transcription factor-binding sites and the cis-regulatory modules they compose are central determinants of gene expression. We previously showed that binding site motifs and modules in proximal promoters can be used to predict a significant portion of mammalian tissue-specific transcription....

  • In silico representation and discovery of transcription factor binding sites. Pavesi, Giulio; Mauri, Giancarlo; Pesole, Graziano // Briefings in Bioinformatics;Sep2004, Vol. 5 Issue 3, p217 

    Understanding the complex mechanisms governing basic biological processes requires the characterisation of regulatory motifs modulating gene expression at transcriptional and post-transcriptional level. In particular, extent, chronology and cell-specificity of transcription are modulated by the...

  • Integrating transcription factor binding site information with gene expression datasets. Jeffery, Ian B.; Madden, Stephen F.; McGettigan, Paul A.; Perrière, Guy; Culhane, Aedín C.; Higgins, Desmond G. // Bioinformatics;Feb2007, Vol. 23 Issue 3, p298 

    Motivation: Microarrays are widely used to measure gene expression differences between sets of biological samples. Many of these differences will be due to differences in the activities of transcription factors. In principle, these differences can be detected by associating motifs in promoters...

