Variable structure motifs for transcription factor binding sites

Reid, John E; Evans, Kenneth J; Dyer, Nigel; Wernisch, Lorenz; Ott, Sascha
January 2010
BMC Genomics;2010, Vol. 11, Special section p1
Academic Journal
Background: Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this model have been proposed, most of which take account of dependencies between the bases in the binding sites. However, some transcription factors are known to exhibit some flexibility and bind to DNA in more than one possible physical configuration. In some cases this variation is known to affect the function of binding sites. With the increasing volume of ChIP-seq data available it is now possible to investigate models that incorporate this flexibility. Previous work on variable length models has been constrained by: a focus on specific zinc finger proteins in yeast using restrictive models; a reliance on hand-crafted models for just one transcription factor at a time; and a lack of evaluation on realistically sized data sets.

Results: We re-analysed binding sites from the TRANSFAC database and found motivating examples where our new variable length model provides a better fit. We analysed several ChIP-seq data sets with a novel motif search algorithm and compared the results to one of the best standard PWM finders and a recently developed alternative method for finding motifs of variable structure. All the methods performed comparably in held-out cross validation tests. Known motifs of variable structure were recovered for p53, Stat5a and Stat5b. In addition our method recovered a novel generalised version of an existing PWM for Sp1 that allows for variable length binding. This motif improved classification performance.

Conclusions: We have presented a new gapped PWM model for variable length DNA binding sites that is neither too restrictive nor over-parameterised. Our comparison with existing tools shows that on average it does not have better predictive accuracy than existing methods. However, it does provide more interpretable models of motifs of variable structure that are suitable for follow-up structural studies. To our knowledge, we are the first to apply variable length motif models to eukaryotic ChIP-seq data sets and consequently the first to show their value in this domain. The results include a novel motif for the ubiquitous transcription factor Sp1.
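
As a rough illustration of the contrast the abstract draws between fixed-length PWMs and variable length sites, the following minimal Python sketch scores a sequence against two PWM blocks separated by a spacer of 0 to max_gap unscored bases. It is not the authors' gapped PWM model or their search algorithm: the two-block layout, the uniform background, the toy probabilities and the names log_odds, score_block and best_gapped_hit are all assumptions made for this example.

```python
import math

BASES = "ACGT"


def log_odds(pwm, background=0.25):
    """Convert per-position base probabilities into log2-odds scores
    against a uniform background (an assumption of this sketch)."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in pwm]


def score_block(site, block):
    """Fixed-length PWM score: sum of per-position log-odds."""
    return sum(col[base] for col, base in zip(block, site))


def best_gapped_hit(seq, left, right, max_gap=1):
    """Scan seq for the best match to left + spacer + right, where the
    spacer is 0..max_gap bases long and contributes nothing to the score.
    Returns (score, start, gap_length)."""
    best = (-math.inf, None, None)
    for gap in range(max_gap + 1):
        width = len(left) + gap + len(right)
        for start in range(len(seq) - width + 1):
            left_site = seq[start:start + len(left)]
            right_site = seq[start + len(left) + gap:start + width]
            score = score_block(left_site, left) + score_block(right_site, right)
            if score > best[0]:
                best = (score, start, gap)
    return best


def toy_column(preferred):
    """Hypothetical PWM column: 0.7 for the preferred base, 0.1 otherwise."""
    return {b: (0.7 if b == preferred else 0.1) for b in BASES}


# Toy motif: "GG", an optional single-base spacer, then "CC".
LEFT = log_odds([toy_column("G"), toy_column("G")])
RIGHT = log_odds([toy_column("C"), toy_column("C")])

if __name__ == "__main__":
    for sequence in ("AAGGCCAA", "AAGGTCCAA"):
        print(sequence, best_gapped_hit(sequence, LEFT, RIGHT))
```

With max_gap=0 the scan reduces to an ordinary fixed-length PWM scan over the concatenated blocks. Allowing the spacer lets one model cover both configurations, at the cost of one extra reported value (the chosen gap length) per hit.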


Related Articles

  • CAAT Box.  // Encyclopedic Reference of Cancer;2001, p147 

    A definition of the term "CAAT box" is presented. It is part of a conserved sequence located upstream of eukaryotic transcription units and is recognized by a large group of transcription factors.

  • Genomic structure and cloning of two transcript isoforms of human Sp8. Milona, Maria-athina; Gough, Julie E.; Edgar, Alasdair J. // BMC Genomics;2004, Vol. 5, p86 

    Background: The Specificity proteins (Sp) are a family of transcription factors that have three highly conserved zinc-fingers located towards the carboxy-terminal that bind GC-boxes and assist in the initiation of gene transcription. Human Sp1-7 genes have been characterized. Recently, the...

  • What can we learn from noncoding regions of similarity between genomes? Down, Thomas A.; Hubbard, Tim J. P. // BMC Bioinformatics;2004, Vol. 5, p1 

    Background: In addition to known protein-coding genes, large amounts of apparently non-coding sequence are conserved between the human and mouse genomes. It seems reasonable to assume that these conserved regions are more likely to contain functional elements than less-conserved portions of the...

  • STUDY OF RICE TELOMERE BINDING PROTEIN 1 (RTBP1): AN IN SILICO APPROACH. Mukherjee, Koel; Kumar, Ashutosh; Pandey, Dev M.; Vidyarthi, Ambarish S. // International Journal of Pharmaceutical Sciences Review & Resear;Sep/Oct2011, Vol. 10 Issue 1, p193 

    Helix Turn Helix is one of the simple structural motifs that reside in the sequence specific DNA binding domain of transcription factors. The interaction of HTH motif with DNA regulates the gene expression. RTBP1 is a sequence specific transcription factor binding protein involved in telomere...

  • Cupin: A candidate molecular structure for the Nep 1-like protein family. Cechin, Adelmo L.; Sinigaglia, Marialva; Lemke, Ney; Echeverrigaray, Sérgio; Cabrera, Odalys G.; Pereira, Gonçalo A. G.; Mombach, José C. M. // BMC Plant Biology;2008, Vol. 8, Special section p1 

    Background: NEP1-like proteins (NLPs) are a novel family of microbial elicitors of plant necrosis. Some NLPs induce a hypersensitive-like response in dicot plants though the basis for this response remains unclear. In addition, the spatial structure and the role of these highly conserved...

  • W-ChIPeaks: a comprehensive web application tool for processing ChIP-chip and ChIP-seq data. Lan, Xun; Bonneville, Russell; Apostolos, Jeff; Wu, Wangcheng; Jin, Victor X // Bioinformatics;Feb2011, Vol. 27 Issue 3, p428 

    Summary: ChIP-based technology is becoming the leading technology to globally profile thousands of transcription factors and elucidate the transcriptional regulation mechanisms in living cells. It has evolved rapidly in recent years, from hybridization with spotted or tiling microarray...

  • Isolation and characterization of a cDNA encoding ethylene-responsive element binding protein (EREBP)/AP2-type protein, RCBF2, in Oryza sativa L. Liu, Jin-Ge; Zhang, Zhen; Qin, Qiu-Lin; Peng, Ri-He; Xiong, Ai-Sheng; Chen, Jian-Min; Xu, Fang; Zhu, Hong; Yao, Quan-Hong // Biotechnology Letters;Jan2007, Vol. 29 Issue 1, p165 

    A transcription factor RCBF2 which interacts with C-repeat/DRE was isolated from Oryza sativa L. by a yeast one-hybrid method. Analysis of the deduced RCBF2 amino acid sequence revealed that RCBF2 contained a conserved ethylene-responsive element binding protein (EREBP)/AP2 domain of 59 amino...

  • The ets-Related Transcription Factor GABP Directs Bidirectional Transcription. Collins, Patrick J.; Kobayashi, Yuya; Nguyen, Loan; Trinklein, Nathan D.; Myers, Richard M. // PLoS Genetics;Nov2007, Vol. 3 Issue 11, p2247 

    Approximately 10% of genes in the human genome are distributed such that their transcription start sites are located less than 1 kb apart on opposite strands. These divergent gene pairs have a single intergenic segment of DNA, which in some cases appears to share regulatory elements, but it is...

  • The Reconstruction of Condition-Specific Transcriptional Modules Provides New Insights in the Evolution of Yeast AP-1 Proteins. Goudot, Christel; Etchebest, Catherine; Devaux, Frédéric; Lelandais, Gaëlle // PLoS ONE;2011, Vol. 6 Issue 6, p1 

    AP-1 proteins are transcription factors (TFs) that belong to the basic leucine zipper family, one of the largest families of TFs in eukaryotic cells. Despite high homology between their DNA binding domains, these proteins are able to recognize diverse DNA motifs. In yeasts, these motifs are...

