Visualization of SNPs with t-SNE

Platzer, Alexander
February 2013
PLoS ONE;Feb2013, Vol. 8 Issue 2, p1
Academic Journal
Background: Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose.

Principal Findings: We compare PCA with a newer method, t-Distributed Stochastic Neighbor Embedding (t-SNE), for the visualization of large SNP datasets. We also propose a set of key figures for evaluating these visualizations; t-SNE performs better on all of them.

Significance: PCA remains a reasonably good method for transforming data, but for visualization it should be replaced by a newer dimension-reduction method such as t-SNE. To evaluate visualization performance, we propose key figures based on cross-validation with machine learning methods, as well as indices of cluster validity.
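The abstract does not include the paper's evaluation code. As an illustrative sketch only, the following Python (NumPy) code shows the general shape of such a comparison. It projects a 0/1/2-coded genotype matrix with PCA and then scores the embedding with two key figures of the kind the abstract describes: a leave-one-out k-nearest-neighbour accuracy (a cross-validation figure) and the mean silhouette width (a cluster-validity index). The simulated two-population data, the choice of k = 5, and all function names are hypothetical assumptions, not the paper's method. A t-SNE embedding, for example from scikit-learn's `sklearn.manifold.TSNE(...).fit_transform`, could be passed to the same two scoring functions for a side-by-side comparison.

```python
import numpy as np


def simulate_genotypes(n_per_pop=50, n_snps=200, seed=0):
    """Hypothetical toy data: two populations with different allele frequencies.

    Genotypes use the additive 0/1/2 coding (copies of the alternative allele).
    """
    rng = np.random.default_rng(seed)
    p_a = rng.uniform(0.05, 0.95, n_snps)
    p_b = 1.0 - p_a  # deliberately strong differentiation for the demo
    g_a = rng.binomial(2, p_a, size=(n_per_pop, n_snps))
    g_b = rng.binomial(2, p_b, size=(n_per_pop, n_snps))
    genotypes = np.vstack([g_a, g_b]).astype(float)
    labels = np.array([0] * n_per_pop + [1] * n_per_pop)
    return genotypes, labels


def pca_embed(x, n_components=2):
    """Project the rows of x onto the leading principal components via SVD."""
    centered = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # explained-variance ratio per component
    return centered @ vt[:n_components].T, explained[:n_components]


def pairwise_distances(y):
    """Euclidean distance matrix between the rows of y."""
    diff = y[:, None, :] - y[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


def knn_loo_accuracy(y, labels, k=5):
    """Leave-one-out k-nearest-neighbour accuracy in the embedding."""
    d = pairwise_distances(y)
    np.fill_diagonal(d, np.inf)  # a point may not vote for itself
    neighbours = np.argsort(d, axis=1)[:, :k]
    votes = labels[neighbours]
    predicted = np.array([np.bincount(v).argmax() for v in votes])
    return float(np.mean(predicted == labels))


def silhouette(y, labels):
    """Mean silhouette width, a cluster-validity index in [-1, 1]."""
    d = pairwise_distances(y)
    scores = []
    for i in range(len(y)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from its own cluster
        a = d[i, same].mean()
        b = min(d[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


genotypes, labels = simulate_genotypes()
embedding, explained = pca_embed(genotypes)
print("explained variance ratio:", explained.round(3))
print("kNN leave-one-out accuracy:", knn_loo_accuracy(embedding, labels))
print("mean silhouette width:", round(silhouette(embedding, labels), 3))
```

Both figures depend only on the low-dimensional coordinates and the known population labels, so they can rank different visualization methods on the same footing.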


Related Articles

  • Efficient techniques for genotype-phenotype correlational analysis. Saha, Subrata; Rajasekaran, Sanguthevar; Bi, Jinbo; Pathak, Sudipta // BMC Medical Informatics & Decision Making;2013, Vol. 13 Issue 1, p1 

    Background: Single Nucleotide Polymorphisms (SNPs) are sequence variations found in individuals at some specific points in the genomic sequence. As SNPs are highly conserved throughout evolution and within a population, the map of SNPs serves as an excellent genotypic marker. Conventional SNPs...

  • Investigation of Inversion Polymorphisms in the Human Genome Using Principal Components Analysis. Ma, Jianzhong; Amos, Christopher I. // PLoS ONE;Jul2012, Vol. 7 Issue 7, p1 

    Despite the significant advances made over the last few years in mapping inversions with the advent of paired-end sequencing approaches, our understanding of the prevalence and spectrum of inversions in the human genome has lagged behind other types of structural variants, mainly due to the lack...

  • HaploPOP: a software that improves population assignment by combining markers into haplotypes. Duforet-Frebourg, Nicolas; Gattepaille, Lucie M.; Blum, Michael G. B.; Jakobsson, Mattias // BMC Bioinformatics;Jul2015, Vol. 16 Issue 1, p1 

    Background: In ecology and forensics, some population assignment techniques use molecular markers to assign individuals to known groups. However, assigning individuals to known populations can be difficult if the level of genetic differentiation among populations is small. Most assignment...

  • Rare Variants Detection with Kernel Machine Learning Based on Likelihood Ratio Test. Zeng, Ping; Zhao, Yang; Zhang, Liwei; Huang, Shuiping; Chen, Feng // PLoS ONE;Mar2014, Vol. 9 Issue 3, p1 

    This paper mainly utilizes likelihood-based tests to detect rare variants associated with a continuous phenotype under the framework of kernel machine learning. Both the likelihood ratio test (LRT) and the restricted likelihood ratio test (ReLRT) are investigated. The relationship between the...

  • Human genetics: Pleiotropic mutations. Stower, Hannah // Nature Reviews Genetics;Jan2012, Vol. 13 Issue 1, p5 

    The article presents a study in which the authors use a large-scale analysis of single nucleotide polymorphisms (SNPs) and genes reported to be associated with common complex traits and diseases to show abundant pleiotropy, in which an individual gene mutation has a role in multiple diseases.

  • Author reply to A commentary on Pitfalls of predicting complex traits from SNPs. Wray, Naomi R.; Yang, Jian; Hayes, Ben J.; Price, Alkes L.; Goddard, Michael E.; Visscher, Peter M. // Nature Reviews Genetics;Dec2013, Vol. 14 Issue 12, p894 

    A response from the authors of the article "Pitfalls of predicting complex traits from SNPs" in the 2013 issue is presented.

  • A commentary on Pitfalls of predicting complex traits from SNPs. de los Campos, Gustavo; Sorensen, Daniel A. // Nature Reviews Genetics;Dec2013, Vol. 14 Issue 12, p894 

    A letter to the editor is presented in response to the article "Pitfalls of predicting complex traits from SNPs" in the 2013 issue.

  • SNP-SIG Meeting 2011: Identification and annotation of SNPs in the context of structure, function, and disease. Bromberg, Yana; Capriotti, Emidio // BMC Genomics;2012, Vol. 13 Issue Suppl 4, p1 

    Information about several topics discussed at the Single Nucleotide Polymorphism (SNP) Special Interest Group (SNP-SIG) meeting at the Intelligent Systems for Molecular Biology/European Conference on Computational Biology (ISMB/ECCB) conference held on July 15, 2011 in Vienna, Austria is...

  • How to evaluate performance of prediction methods? Measures and their interpretation in variation effect analysis. Vihinen, Mauno // BMC Genomics;2012, Vol. 13 Issue Suppl 4, p1 

    Background: Prediction methods are increasingly used in biosciences to forecast diverse features and characteristics. Binary two-state classifiers are the most common applications. They are usually based on machine learning approaches. For the end user it is often problematic to evaluate the...
