TITLE

Homopolymer tract length dependent enrichments in functional regions of 27 eukaryotes and their novel dependence on the organism DNA (G+C)% composition

AUTHOR(S)
Yue Zhou; Bizzaro, Jeffrey W.; Marx, Kenneth A.
PUB. DATE
January 2004
SOURCE
BMC Genomics;2004, Vol. 5, p95
SOURCE TYPE
Academic Journal
DOC. TYPE
Article
ABSTRACT
Background: DNA homopolymer tracts, poly(dA).poly(dT) and poly(dG).poly(dC), are the simplest of simple sequence repeats. Homopolymer tracts have been systematically examined in the coding, intron and flanking regions of a limited number of eukaryotes. As the number of publicly available DNA sequences increases, the representation (over and under) of homopolymer tracts of different lengths in these regions of different genomes can be compared.
Results: We carried out a survey of the extent of homopolymer tract over-representation (enrichment) and over-proportional length distribution (above expected length), primarily in single gene documents but including some whole chromosomes, of 27 eukaryotes across the (G+C)% composition range from 20 - 60%. A total of 5.2 x 10^7 bases from 15,560 cleaned (redundancy removed) sequence documents were analyzed. Calculated frequencies of non-overlapping long homopolymer tracts were found to be over-represented in non-coding sequences of eukaryotes. Long poly(dA).poly(dT) tracts demonstrated an exponential increase with tract length compared to predicted frequencies. A novel negative slope was observed for all eukaryotes between their (G+C)% composition and the threshold length N where poly(dA).poly(dT) tracts exhibited over-representation, and a corresponding positive slope was observed for poly(dG).poly(dC) tracts. The tract size thresholds at which over-representation of tracts in different eukaryotes began to occur were between 4 and 11 bp, depending upon the organism (G+C)% composition. The higher the GC%, the lower the threshold N value was for poly(dA).poly(dT) tracts, meaning that over-representation begins at shorter tract lengths in more GC-rich surrounding sequence. We also observed a novel relationship between the highest over-representations, as well as lengths of homopolymer tracts in excess of their random occurrence expected maximum lengths.
Conclusions: We discuss how our novel tract over-representation observations can be accounted for by a few models. A likely model for poly(dA).poly(dT) tract over-representation involves the known insertion into genomes of DNA synthesized from retroviral mRNAs containing 3′ polyA tails. A proposed model that can account for a number of our observed results concerns the origin of the isochore nature of eukaryotic genomes via a non-equilibrium GC% dependent mutation rate mechanism. Our data also suggest that tract lengthening via slip strand replication is not governed by a simple thermodynamic loop energy model.
ACCESSION #
28859385
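
The abstract compares observed homopolymer tract frequencies with those predicted from base composition. The following is a minimal sketch of that kind of comparison, not the authors' actual procedure: it assumes a simple model in which every base is drawn independently with probability equal to its observed frequency, and the function names (tract_counts, expected_tracts, enrichment) are hypothetical.

```python
from collections import Counter
from itertools import groupby


def tract_counts(seq, base):
    """Count maximal (non-overlapping) runs of `base` in `seq`, keyed by run length."""
    counts = Counter()
    for symbol, run in groupby(seq.upper()):
        if symbol == base:
            counts[sum(1 for _ in run)] += 1
    return counts


def expected_tracts(seq_len, p, n):
    """Expected number of maximal runs of exactly length n.

    Assumption: bases are independent and `base` occurs with probability p.
    """
    if n > seq_len:
        return 0.0
    if n == seq_len:
        return p ** n
    # Runs flanked on both sides by a different base.
    interior = max(seq_len - n - 1, 0) * (1 - p) ** 2 * p ** n
    # Runs touching either end of the sequence need only one flanking base.
    edges = 2 * (1 - p) * p ** n
    return interior + edges


def enrichment(seq, base, n):
    """Observed / expected count of length-n tracts of `base` (values above 1 mean over-representation)."""
    seq = seq.upper()
    p = seq.count(base) / len(seq)
    observed = tract_counts(seq, base)[n]
    expected = expected_tracts(len(seq), p, n)
    return observed / expected if expected > 0 else float("nan")
```

Under these assumptions, the threshold length N described in the abstract would correspond to the smallest n at which enrichment(seq, base, n) stays above 1. On a single strand, a poly(dA).poly(dT) tract appears as a run of either A or T, so both bases would need to be counted.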

 

Related Articles

  • In silico identification of novel selenoproteins in the Drosophila melanogaster genome. Castellano, Sergi; Morozova, Nadya; Morey, Marta; Berry, Maria J.; Serras, Florenci; Corominas, Montserrat; Guigó, Roderic // EMBO Reports;Aug2001, Vol. 2 Issue 8, p697 

    In selenoproteins, incorporation of the amino acid selenocysteine is specified by the UGA codon, usually a stop signal. The alternative decoding of UGA is conferred by an mRNA structure, the SECIS element, located in the 3'-untranslated region of the selenoprotein mRNA. Because of the...

  • On mending the genome and mentoring the epigenome. Albertini, David // Journal of Assisted Reproduction & Genetics;Apr2011, Vol. 28 Issue 4, p285 

    No abstract available.

  • AUG_hairpin: prediction of a downstream secondary structure influencing the recognition of a translation start site. Kochetov, Alex V; Palyanov, Andrey; Titov, Igor I; Grigorovich, Dmitry; Sarai, Akinori; Kolchanov, Nikolay A // BMC Bioinformatics;2007, Vol. 8, p1 

    Background: The translation start site plays an important role in the control of translation efficiency of eukaryotic mRNAs. The recognition of the start AUG codon by eukaryotic ribosomes is considered to depend on its nucleotide context. However, the fraction of eukaryotic mRNAs with the start...

  • Full Genome Sequencing and Genetic Characterization of Eubenangee Viruses Identify Pata Virus as a Distinct Species within the Genus Orbivirus. Belaganahalli, Manjunatha N.; Maan, Sushila; Maan, Narender S.; Nomikou, Kyriaki; Pritchard, Ian; Lunt, Ross; Kirkland, Peter D.; Attoui, Houssam; Brownlie, Joe; Mertens, Peter P. C. // PLoS ONE;Mar2012, Vol. 7 Issue 3, p1 

    Eubenangee virus has previously been identified as the cause of Tammar sudden death syndrome (TSDS). Eubenangee virus (EUBV), Tilligery virus (TILV), Pata virus (PATAV) and Ngoupe virus (NGOV) are currently all classified within the Eubenangee virus species of the genus Orbivirus, family...

  • Modeling Inhomogeneous DNA Replication Kinetics. Gauthier, Michel G.; Norio, Paolo; Bechhoefer, John // PLoS ONE;Mar2012, Vol. 7 Issue 3, p1 

    In eukaryotic organisms, DNA replication is initiated at a series of chromosomal locations called origins, where replication forks are assembled proceeding bidirectionally to replicate the genome. The distribution and firing rate of these origins, in conjunction with the velocity at which forks...

  • Pigs in sequence space: A 0.66X coverage pig genome survey based on shotgun sequencing. Wernersson, Rasmus; Schierup, Mikkel H; Jørgensen, Frank G; Gorodkin, Jan; Panitz, Frank; Stærfeldt, Hans-Henrik; Christensen, Ole F; Mailund, Thomas; Hornshøj, Henrik; Klein, Ami; Jun Wang; Bin Liu; Hu, Songnian; Wei Dong; Wei Li; Wong, Gane KS; Yu, Jun; Wang, Jian; Bendixen, Christian; Fredholm, Merete // BMC Genomics;2005, Vol. 6, p70 

    Background: Comparative whole genome analysis of Mammalia can benefit from the addition of more species. The pig is an obvious choice due to its economic and medical importance as well as its evolutionary position in the artiodactyls. Results: We have generated ~3.84 million shotgun sequences...

  • Unpredictability of metabolism-the key role of metabolomics science in combination with next-generation genome sequencing. Weckwerth, Wolfram // Analytical & Bioanalytical Chemistry;Sep2011, Vol. 400 Issue 7, p1967 

    Next-generation sequencing provides technologies which sequence whole prokaryotic and eukaryotic genomes in days, perform genome-wide association studies, chromatin immunoprecipitation followed by sequencing and RNA sequencing for transcriptome studies. An exponentially growing volume of...

  • Genome size differentiates co-occurring populations of the planktonic diatom Ditylum brightwellii (Bacillariophyta). Koester, Julie A; Swalwell, Jarred E.; Von Dassow, Peter; Armbrust, E. Virginia // BMC Evolutionary Biology;2010, Vol. 10, p1 

    Background: Diatoms are one of the most species-rich groups of eukaryotic microbes known. Diatoms are also the only group of eukaryotic micro-algae with a diplontic life history, suggesting that the ancestral diatom switched to a life history dominated by a duplicated genome. A key mechanism of...
