Haplotype block partitioning as a tool for dimensionality reduction in SNP association studies

Pattaro, Cristian; Ruczinski, Ingo; Fallin, Danièle M.; Parmigiani, Giovanni
January 2008
BMC Genomics;2008, Vol. 9, Special section p1
Academic Journal
Background: Identification of disease-related genes in association studies is challenged by the large number of SNPs typed. To address the dilution of power caused by high dimensionality, and to generate results that are biologically interpretable, it is critical to take into account the spatial correlation of SNPs along the genome. With the goal of identifying true genetic associations, partitioning the genome according to spatial correlation can be a powerful and meaningful way to address this dimensionality problem.
Results: We developed and validated an MCMC Algorithm To Identify blocks of Linkage DisEquilibrium (MATILDE) for clustering contiguous SNPs, and a statistical testing framework to detect association using partitions as units of analysis. We compared its ability to detect true SNP associations with that of the most commonly used algorithm for block partitioning, as implemented in the Haploview and HapBlock software. Simulations were based on artificially assigning phenotypes to individuals with SNPs corresponding to region 14q11 of the HapMap database. When block partitioning is performed using MATILDE, the ability to correctly identify a disease SNP is higher than with the alternatives considered, especially for small effects. The advantages apply both to true positive findings and to limiting the number of false discoveries. Finer partitions, provided by LD-based methods or by marker-by-marker analysis, are efficient only for detecting large effects or in the presence of large sample sizes. The probabilistic approach we propose offers several additional advantages, including: a) adapting the estimation of blocks to the population, technology, and sample size of the study; b) probabilistic assessment of uncertainty about block boundaries and about whether any two SNPs are in the same block; c) user selection of the probability threshold for assigning SNPs to the same block.
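Points (b) and (c) can be illustrated with a small sketch. It assumes that an MCMC sampler (not shown) has already produced posterior samples of partitions, each encoded as a list of block labels over contiguous SNPs. The function names `co_membership`, `threshold_partition`, and `bonferroni_alpha` are hypothetical, and the adjacent-cut consensus rule and Bonferroni correction are simple illustrative stand-ins, not MATILDE's actual consensus or testing procedure.

```python
from typing import List, Sequence


def co_membership(samples: Sequence[Sequence[int]]) -> List[List[float]]:
    """Estimate P(SNP i and SNP j in the same block) from posterior samples.

    Each sample is a list of block labels, one per SNP (assumed encoding).
    The estimate is the fraction of samples in which the two labels match.
    """
    n_samples = len(samples)
    n_snps = len(samples[0])
    counts = [[0] * n_snps for _ in range(n_snps)]
    for labels in samples:
        for i in range(n_snps):
            for j in range(n_snps):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_samples for c in row] for row in counts]


def threshold_partition(prob: List[List[float]], threshold: float) -> List[List[int]]:
    """Hypothetical consensus rule: keep adjacent SNPs together when their
    co-membership probability reaches the user-chosen threshold, else cut."""
    blocks = [[0]]
    for i in range(1, len(prob)):
        if prob[i - 1][i] >= threshold:
            blocks[-1].append(i)  # same block as the previous SNP
        else:
            blocks.append([i])  # start a new block at SNP i
    return blocks


def bonferroni_alpha(n_units: int, family_alpha: float = 0.05) -> float:
    """Per-unit significance level when testing n_units blocks (or SNPs)."""
    return family_alpha / n_units


# Four made-up posterior samples over five contiguous SNPs.
samples = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 2],
]
prob = co_membership(samples)
blocks = threshold_partition(prob, 0.7)
print(blocks)                          # [[0, 1], [2, 3], [4]]
print(bonferroni_alpha(len(blocks)))   # per-block level for 3 tests
```

Raising the threshold yields finer partitions (at 0.8 every SNP is its own block), while lowering it merges SNPs (at 0.5 all five form one block). With three blocks instead of five SNPs, the per-test Bonferroni level rises from 0.05/5 to 0.05/3, which is one simple way to see how grouping SNPs can reduce the power dilution described above.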
Conclusion: We demonstrate that, in realistic scenarios, our adaptive, study-specific block partitioning approach is as efficient as, or more efficient than, currently available LD-based approaches in guiding the search for disease loci.


Related Articles

  • Predicting the probability of H3K4me3 occupation at a base pair from the genome sequence context. Ha, Misook; Hong, Soondo; Li, Wen-Hsiung // Bioinformatics;May2013, Vol. 29 Issue 9, p1199 

    Motivation: Histone modifications regulate chromatin structure and gene expression. Although nucleosome formation is known to be affected by primary DNA sequence composition, no sequence signature has been identified for histone modifications. It is known that dense H3K4me3 nucleosome sites are...

  • Opinion: The Human Genome Diversity Project: past, present and future. Cavalli-Sforza, L. Luca // Nature Reviews Genetics;Apr2005, Vol. 6 Issue 4, p333 

    The Human Genome Project, in accomplishing its goal of sequencing one human genome, heralded a new era of research, a component of which is the systematic study of human genetic variation. Despite delays, the Human Genome Diversity Project has started to make progress in understanding the...

  • A genome-wide cis-regulatory element discovery method based on promoter sequences and gene co-expression networks. Zhen Gao; Ruizhe Zhao; Jianhua Ruan // BMC Genomics;2013, Vol. 14 Issue Suppl 1, p1 

    Background: Deciphering cis-regulatory networks has become an attractive yet challenging task. This paper presents a simple method for cis-regulatory network discovery which aims to avoid some of the common problems of previous approaches. Results: Using promoter sequences and gene expression...

  • A draft genome sequence and functional screen reveals the repertoire of type III secreted proteins of Pseudomonas syringae pathovar tabaci 11528. Studholme, David J.; Ibanez, Selena Gimenez; MacLean, Daniel; Dangl, Jeffery L.; Chang, Jeff H.; Rathjen, John P. // BMC Genomics;2009, Vol. 10, p395 

    Background: Pseudomonas syringae is a widespread bacterial pathogen that causes disease on a broad range of economically important plant species. Pathogenicity of P. syringae strains is dependent on the type III secretion system, which secretes a suite of up to about thirty virulence 'effector'...

  • Genome size in natural and synthetic autopolyploids and in a natural segmental allopolyploid of several Triticeae species. Eilam, T.; Anikster, Y.; Millet, E.; Manisterski, J.; Feldman, M. // Genome;Mar2009, Vol. 52 Issue 3, p275 

    Nuclear DNA amount (1C) was determined by flow cytometry in the autotetraploid cytotype of Hordeum bulbosum, in the cytologically diploidized autotetraploid cytotypes of Elymus elongatus, Hordeum murinum subsp. murinum and Hordeum murinum subsp. leporinum, in Hordeum marinum subsp. gussoneanum,...

  • Boundaries in vertebrate genomes: different solutions to adequately insulate gene expression domains. Moltó, Eduardo; Fernández, Almudena; Montoliu, Lluis // Briefings in Functional Genomics & Proteomics;Jul2009, Vol. 8 Issue 4, p283 

    Gene expression domains are normally not arranged in vertebrate genomes according to their expression patterns. Instead, it is not unusual to find genes expressed in different cell types, or in different developmental stages, sharing a particular region of a chromosome. Therefore, the existence...

  • Vive la difference! Lee, Charles // Nature Genetics;Jul2005, Vol. 37 Issue 7, p660 

    Until very recently, it was widely touted that the complete DNA sequences of any two human beings were 99.9% identical. A new study refutes this notion through a comprehensive comparison of two individual genomes which detects hundreds of new structural genomic variants.

  • IP6K gene identification in plant genomes by tag searching. Fassetti, Fabio; Leone, Ofelia; Palopoli, Luigi; Rombo, Simona E.; Saiardi, Adolfo // BMC Proceedings;2011 Supplement 2, Vol. 5 Issue Suppl 2, p1 

    Background: Plants have played a special role in inositol polyphosphate (IP) research since the first IP, the fully phosphorylated inositol ring of phytic acid (IP6), was discovered in plant seeds. It is now known that phytic acid is further metabolized by the IP6 Kinases (IP6Ks) to generate IP...

  • A Thermodynamic Switch for Chromosome Colocalization. Nicodemi, Mario; Panning, Barbara; Prisco, Antonella // Genetics;May2008, Vol. 179 Issue 1, p717 

    A general model for the early recognition and colocalization of homologous DNA sequences is proposed. We show, on thermodynamic grounds, how the distance between two homologous DNA sequences is spontaneously regulated by the concentration and affinity of diffusible mediators binding them, which...

