Marked variation in predicted and observed variability of tandem repeat loci across the human genome

O'Dushlaine, Colm T.; Shields, Denis C.
January 2008
BMC Genomics;2008, Vol. 9, Special section p1
Academic Journal
Background: Tandem repeat (TR) variants in the human genome play key roles in a number of diseases. However, current models predicting variability are based on limited training sets. We conducted a systematic analysis of TRs of unit lengths 2-12 nucleotides in Whole Genome Shotgun (WGS) sequences to define the extent of variation of 209,214 unique repeat loci throughout the genome.

Results: We applied a multivariate statistical model to predict TR variability. Predicted heterozygosity correlated with heterozygosity in the CEPH polymorphism database (correlation ρ = 0.29, p < 0.0005) better than the correlation between the CEPH and WGS data (ρ = 0.17), presumably because the model smoothes noise from small sample sizes. A multivariate logistic model of 8 parameters accounted for 36% of the variation in the WGS data. Validation studies of 70 experimentally investigated TRs revealed high concordance with the model's predictions (p < 0.0001).

Conclusion: Variability among 2-12-mer TRs in the genome can be modeled by a few parameters, which do not markedly differ according to unit length, consistent with a common mechanism for the generation of variability among such TRs. Analysis of the distributions of observed and predicted variants across the genome showed a general concordance, indicating that the repeat variation dataset does not exhibit strong regional ascertainment biases. This revealed a deficit of variant repeats in chromosomes 19 and Y -- likely to reflect a reduction in 2-mer repeats in the former and a reduced level of recombination in the latter -- and excesses in chromosomes 6, 13, 20 and 21.


Related Articles

  • Genetic Databases and their Potential in Pharmacogenomics. Lagoumintzis, George; Poulas, Konstantinos; Patrinos, George P. // Current Pharmaceutical Design;7/1/2010, Vol. 16 Issue 20, p2224 

    No abstract available.

  • Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation. Flannick, Jason; Korn, Joshua M.; Fontanillas, Pierre; Grant, George B.; Banks, Eric; Depristo, Mark A.; Altshuler, David // PLoS Computational Biology;Jul2012, Vol. 8 Issue 7, p1 

    High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize...

  • Diversidad genética en la población mexicana: Utilización de marcadores de ADN. Guardado-Estrada, Mariano; Queipo, Gloria; Meraz-Ríos, Marco; Berumen-Campos, Jaime // Revista Medica del Hospital General de Mexico;jul-sep2008, Vol. 71 Issue 3, p162 

    The beginning of the 21st century marks the start of what is known as the genomic era, with the complete sequencing of the human genome. Obtaining sequences from individuals all around the world could precisely define the variation among human individuals. The knowledge of this genome variation will help to...

  • Natural Genetic Variation Caused by Transposable Elements in Humans. Bennett, E. Andrew; Coleman, Laura E.; Tsui, Circe; Pittard, W. Stephen; Devine, Scott E. // Genetics;Oct2004, Vol. 168 Issue 2, p933 

    Transposons and transposon-like repetitive elements collectively occupy 44% of the human genome sequence. In an effort to measure the levels of genetic variation that are caused by human transposons, we have developed a new method to broadly detect transposon insertion polymorphisms of all kinds...

  • Identification of 197 genetic variations in six human methyltransferase genes in the Japanese population. Saito, S.; Iida, A.; Sekine, A.; Miura, Y.; Sakamoto, T.; Ogawa, C.; Kawauchi, S.; Higuchi, S.; Nakamura, Y. // Journal of Human Genetics;2001, Vol. 46 Issue 9, p529 

    Methylation is an important event in the biotransformation pathway for many drugs and xenobiotic compounds. We screened DNA from 48 Japanese individuals for single-nucleotide polymorphisms (SNPs) in six methyltransferase (MT) genes (catechol-O-MT, COMT; guanidinoacetate N-MT, GAMT; histamine...

  • Distribution and Effects of Nonsense Polymorphisms in Human Genes. Yamaguchi-Kabata, Yumi; Shimada, Makoto K.; Hayakawa, Yosuke; Minoshima, Shinsei; Chakraborty, Ranajit; Gojobori, Takashi; Imanishi, Tadashi // PLoS ONE;2008, Vol. 3 Issue 10, p1 

    Background: A great amount of data has been accumulated on genetic variations in the human genome, but we still do not know much about how the genetic variations affect gene function. In particular, little is known about the distribution of nonsense polymorphisms in human genes despite their...

  • GeneTalk: an expert exchange platform for assessing rare sequence variants in personal genomes. Kamphans, Tom; Krawitz, Peter M. // Bioinformatics;Oct2012, Vol. 28 Issue 19, p2515 

    Summary: Next-generation sequencing has become a powerful tool in personalized medicine. Exomes or even whole genomes of patients suffering from rare diseases are screened for sequence variants. After filtering out common polymorphisms, the assessment and interpretation of detected personal...

  • Genetic diversity in the block 2 region of the merozoite surface protein-1 of Plasmodium falciparum in central India.  // Malaria Journal;2012, Vol. 11 Issue 1, p78 

    The article presents the findings of a research on the molecular characterization of block two region of merozoite surface protein-1 (MSP-1) gene from the tribal-dominated, forested region of Madhya Pradesh, India. It discusses the methodology of the research. It concludes an extensive genetic...

  • Integrating 400 million variants from 80,000 human samples with extensive annotations: towards a knowledge base to analyze disease cohorts. Hakenberg, Jörg; Wei-Yi Cheng; Thomas, Philippe; Ying-Chih Wang; Uzilov, Andrew V.; Rong Chen // BMC Bioinformatics;1/8/2016, Vol. 17, p1 

    Background: Data from a plethora of high-throughput sequencing studies is readily available to researchers, providing genetic variants detected in a variety of healthy and disease populations. While each individual cohort helps gain insights into polymorphic and disease-associated variants, a...

