A haplotype inference algorithm for trios based on deterministic sampling

Iliadis, Alexandros; Watkinson, John; Anastassiou, Dimitris; Wang, Xiaodong
January 2010
BMC Genetics;2010, Vol. 11, p78
Academic Journal
Background: In genome-wide association studies, thousands of individuals are genotyped in hundreds of thousands of single nucleotide polymorphisms (SNPs). Statistical power can be increased when haplotypes, rather than three-valued genotypes, are used in analysis, so the problem of haplotype phase inference (phasing) is particularly relevant. Several phasing algorithms have been developed for data from unrelated individuals, based on different models, some of which have been extended to father-mother-child "trio" data.

Results: We introduce a technique for phasing trio datasets using a tree-based deterministic sampling scheme. We have compared our method with publicly available algorithms PHASE v2.1, BEAGLE v3.0.2 and 2SNP v1.7 on datasets of varying number of markers and trios. We have found that the computational complexity of PHASE makes it prohibitive for routine use; on the other hand 2SNP, though the fastest method for small datasets, was significantly inaccurate. We have shown that our method outperforms BEAGLE in terms of speed and accuracy for small to intermediate dataset sizes in terms of number of trios for all marker sizes examined. Our method is implemented in the "Tree-Based Deterministic Sampling" (TDS) package, available for download at http://www.ee.columbia.edu/~anastas/tds

Conclusions: Using a tree-based deterministic sampling technique, we present an intuitive and conceptually simple phasing algorithm for trio data. The trade-off between speed and accuracy achieved by our algorithm makes it a strong candidate for routine use on trio datasets.
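
As a rough illustration of what phasing trio data involves, the sketch below resolves the child's paternal and maternal alleles at each SNP using Mendelian inheritance alone. It is not the TDS algorithm from the article; it only shows which sites family structure settles outright and which remain ambiguous and need a statistical method. The 0/1/2 genotype coding (count of the alternate allele) and the function names `phase_child_site` and `phase_child` are assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Alleles a parent can transmit, given a three-valued genotype
# (assumed coding: 0 = hom. reference, 1 = heterozygous, 2 = hom. alternate).
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def phase_child_site(father: int, mother: int, child: int) -> Optional[Tuple[int, int]]:
    """Return (paternal_allele, maternal_allele) for the child at one SNP.

    Returns None when Mendelian rules leave the phase ambiguous,
    and raises ValueError when the trio is Mendelian-inconsistent.
    """
    options = [
        (p, m)
        for p in TRANSMITTABLE[father]
        for m in TRANSMITTABLE[mother]
        if p + m == child
    ]
    if not options:
        raise ValueError("Mendelian inconsistency in trio genotypes")
    # Exactly one transmission pattern fits: the site is phased by pedigree alone.
    return options[0] if len(options) == 1 else None


def phase_child(father: List[int], mother: List[int], child: List[int]):
    """Apply the per-site rule along a sequence of SNPs."""
    return [phase_child_site(f, m, c) for f, m, c in zip(father, mother, child)]


if __name__ == "__main__":
    father = [0, 1, 2, 1]
    mother = [1, 1, 0, 2]
    child = [1, 1, 1, 2]
    print(phase_child(father, mother, child))
```

In this toy input, only the second site, where all three family members are heterozygous, stays unresolved. Sites of that kind (and sites with missing genotypes, which this sketch does not handle) are where a model-based phasing method has to choose between the remaining configurations.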


Related Articles

  • Slicing and Dicing the Genome: A Statistical Physics Approach to Population Genetics. Maruvka, Yosef; Shnerb, Nadav; Solomon, Sorin; Yaari, Gur; Kessler, David // Journal of Statistical Physics;Mar2011, Vol. 142 Issue 6, p1302 

    The inference of past demographic parameters from current genetic polymorphism is a fundamental problem in population genetics. The standard techniques utilize a reconstruction of the gene-genealogy, a cumbersome process that may be applied only to small numbers of sequences. We present a method...

  • Correcting Estimators of θ and Tajima's D for Ascertainment Biases Caused by the Single-Nucleotide Polymorphism Discovery Process. Ramírez-Soriano, Anna; Nielsen, Rasmus // Genetics;Feb2009, Vol. 181 Issue 2, p701 

    Most single-nucleotide polymorphism (SNP) data suffer from an ascertainment bias caused by the process of SNP discovery followed by SNP genotyping. The final genotyped data are biased toward an excess of common alleles compared to directly sequenced data, making standard genetic methods of...

  • Genetic variations in the CεmX domain of human membrane-bound IgE. Lei Wan; Jiun-Bo Chen; Hsih Hsin Chen; Janice Huang; Hui-Ming Yu; Shue-Fen Luo; Fuu Jen Tsai; Tse Wen Chang // Immunogenetics;May2010, Vol. 62 Issue 5, p273 

    The ε chain of membrane-bound IgE (mIgE) is expressed predominantly as a “long” isoform, containing an extra segment of 52 amino acid (a.a.) residues, referred to as CεmX, between the CH4 domain and the C-terminal membrane-anchoring transmembrane peptide. CεmX results from...

  • Cytokine gene polymorphism in human disease: on-line databases, Supplement 3. Hollegaard, M. V.; Bidwell, J. L. // Genes & Immunity;Jun2006, Vol. 7 Issue 4, p269 

    Within the past few years, the focus on cytokine single nucleotide polymorphism (SNP) function and association with human diseases has increased considerably. This third supplement to the Cytokine Gene Polymorphism in Human Disease: On-line database describes the positive associations of...

  • A Taq I PCR-RFLP Detecting a Novel SNP in Exon 2 of the Bovine POU1F1 Gene. Chuanying Pan; Xianyong Lan; Yikun Guo; Jianhong Shu; Chuzhao Lei; Xinzhuang Wang // Biochemical Genetics;Aug2008, Vol. 46 Issue 7/8, p424 

    PCR-SSCP and DNA sequencing methods were applied to reveal three novel single nucleotide polymorphisms (SNPs) in exon 2 of the POU1F1 gene in 963 Chinese cattle belonging to eight breeds. Among them, a silent SNP (NM_174579:c.545G > A) detected by TaqI endonuclease is...

  • Sample-size properties of a case-control association analysis of multistage SNP studies for identifying disease susceptibility genes. Kitamura, Nobutaka; Akazawa, Kouhei; Toyabe, Shin-ichi; Miyashita, Akinori; Kuwano, Ryozo; Nakamura, Junichiro // Journal of Human Genetics;May2008, Vol. 53 Issue 5, p390 

    A two-stage association study is the most commonly used method to efficiently identify disease susceptibility genes. However, some recent single nucleotide polymorphism (SNP) studies have utilized three-stage designs. The purpose of this study was to investigate the practical properties of...

  • The genetic structure of 3′untranslated region of the HLA-G gene: polymorphisms and haplotypes. Castelli, E. C.; Mendes-Junior, C. T.; Deghaide, N. H. S.; de Albuquerque, R. S.; Muniz, Y. C. N.; Simões, R. T.; Carosella, E. D.; Moreau, P.; Donadi, E. A. // Genes & Immunity;Mar2010, Vol. 11 Issue 2, p134 

    The HLA-G gene is predominantly expressed at the maternal–fetal interface. It has been associated with maternal–fetal tolerance and with the inhibition of cytotoxic T lymphocyte and natural killer cytolytic functions. At least two variations in the 3′untranslated region (UTR)...

  • Estimating genetic association parameters from family data. Whittemore, Alice S. // Biometrika;Mar2004, Vol. 91 Issue 1, p219 

    We consider the problem of estimating a parameter θ reflecting association between a disease and genotypes of a genetic polymorphism, using nuclear family data. In many applications, some parental genotypes are missing, and the distribution of these genotypes is unknown. Since...

  • Genetic polymorphism of thiopurine S-methyltransferase in Argentina. Laróvere, L. E.; de Kremer, R. Dodelson; Lambooy, L. H. J.; De Abreu, R. A. // Annals of Clinical Biochemistry;Jul2003, Vol. 40 Issue 4, p388 

    Background: Thiopurine methyltransferase (TPMT) catalyses the S-methylation of 6-thiopurine drugs, which are commonly used in the treatment of autoimmune diseases, leukaemia and organ transplantation. TPMT activity is polymorphic as a result of gene mutations. Ethnic variations in phenotype and...

