Prediction-based approaches to characterize bidirectional promoters in the mammalian genome

Qu Yang, Mary; Elnitski, Laura L.
January 2008
BMC Genomics;2008 Supplement 1, Vol. 9, Special section p1
Academic Journal
Background: Machine learning approaches are emerging as a way to discriminate various classes of functional elements. Previous attempts to create Regulatory Potential (RP) scores to discriminate functional DNA from nonfunctional DNA included using Markov models trained to distinguish promoter and enhancer sequences from ancestral repeats. We proposed that knowledge gleaned from those methods could be further refined using a multiple class predictor to separate classes of promoter elements from enhancers or nonfunctional DNA.
Results: We extended our previous work, which identified over 5,000 candidate bidirectional promoters in the human genome, to map the orthologous promoter regions in the mouse genome. Our algorithm measured the robustness of evidence provided by the spliced EST annotations and incorporated evidence from annotations of UCSC Known Genes and GenBank mRNA. In preparation for de novo prediction of this promoter type, we examined characteristic features of the dataset as a whole. For instance, bidirectional promoters score very highly among all functional elements for Regulatory Potential scores. This result was unexpected due to the limited sequence conservation found in these noncoding regions. We demonstrate that bidirectional promoters can be classified apart from other genomic features including non-bidirectional promoters, i.e. those promoters having no nearby upstream genes. Furthermore, bidirectional promoters consistently score at the level of very highly conserved functional elements in the genome, namely developmental enhancers. The high scores are due to sequence-based characteristics within the promoters, not the surrounding exons. These results indicate that high-scoring RP regions can be deconvoluted into various functional classes of genomic elements. Using a multiple class predictor, we are able to discriminate bidirectional promoters from enhancers, non-bidirectional promoters, and non-promoter regions on the basis of RP scores and CpG islands.
Conclusions: We examine orthology at bidirectional promoters, use discriminatory machine learning approaches to differentiate multiple types of promoters from other functional and nonfunctional features in the genome, and begin the process of deconvoluting classes of functional regions that score well with RP scores. These types of approaches precede supervised learning techniques to discover unannotated promoter regions.


Related Articles

  • Genome-wide sequence-based prediction of peripheral proteins using a novel semi-supervised learning technique. Bhardwaj, Nitin; Gerstein, Mark; Hui Lu // BMC Bioinformatics;2010 Supplement 1, Vol. 11, Special section p1 

    Background: In supervised learning, traditional approaches to building a classifier use two sets of examples with pre-defined classes along with a learning algorithm. The main limitation of this approach is that examples from both classes are required which might be infeasible in certain cases,...

  • Binning sequences using very sparse labels within a metagenome. Chon-Kit Kenneth Chan; Hsu, Arthur L.; Halgamuge, Saman K.; Sen-Lin Tang // BMC Bioinformatics;2008, Vol. 9, Special section p1 

    Background: In metagenomic studies, a process called binning is necessary to assign contigs that belong to multiple species to their respective phylogenetic groups. Most of the current methods of binning, such as BLAST, k-mer and PhyloPythia, involve assigning sequence fragments by comparing...

  • Contents. Criminisi, Antonio; Shotton, Jamie; Konukoglu, Ender // Foundations & Trends in Computer Graphics & Vision;2011, Vol. 7 Issue 2/3, preceding p83 

    The table of contents for the publication "Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning" is presented.

  • Introduction to the special issue on image and video retrieval: theory and applications. Kompatsiaris, Ioannis; Marchand-Maillet, Stephane; Zwol, Roelof; Marcel, Sébastien // Multimedia Tools & Applications;Oct2011, Vol. 55 Issue 1, p1 

    An introduction is presented in which the editor discusses various reports within the issue on topics including the use of supervised learning, retagging videos and face recognition.

  • Editorial. Steinley, Douglas // Journal of Classification;Oct2015, Vol. 32 Issue 3, p357 

    An introduction is presented in which the editor discusses various reports within the issue on topics including the fractionally-supervised classification method for unsupervised and supervised learning, the semi-definite programming (SDP) method for analyzing data, and an exact algorithm for clustering.

  • Supervised learning-based tagSNP selection for genome-wide disease classifications. Qingzhong Liu; Jack Yang; Zhongxue Chen; Qu Yang, Mary; Sung, Andrew H.; Xudong Huang // BMC Genomics;2008 Supplement 1, Vol. 9, Special section p1 

    Background: Comprehensive evaluation of common genetic variations through association of single nucleotide polymorphisms (SNPs) with complex human diseases on the genome-wide scale is an active area in human genome research. One of the fundamental questions in a SNP-disease association study is...

  • Predicting co-complexed protein pairs using genomic and proteomic data integration. Zhang, Lan V.; Wong, Sharyl L.; King, Oliver D.; Roth, Frederick P. // BMC Bioinformatics;2004, Vol. 5, p38 

    Background: Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass...

  • The rise and fall of supervised machine learning techniques. Jensen, Lars Juhl; Bateman, Alex // Bioinformatics;Dec2011, Vol. 27 Issue 24, p3331 

    In this article, the authors reflect on the growth and decline of supervised machine learning techniques. According to the authors, machine learning holds immense importance in the field of bioinformatics and biomedical sciences. They point out that there has been a rise in the popularity of...

  • Analysis of Perceptron-Based Active Learning. Dasgupta, Sanjoy; Kalai, Adam Tauman; Tauman, Adam // Journal of Machine Learning Research;2/1/2009, Vol. 10 Issue 2, p281 

    We start by showing that in an active learning setting, the Perceptron algorithm needs Ω(1/ε²) labels to learn linear separators within generalization error ε. We then present a simple active learning algorithm for this problem, which combines a modification of the Perceptron update...

