Automated SNP detection from a large collection of white spruce expressed sequences: contributing factors and approaches for the categorization of SNPs

Pavy, Nathalie; Parsons, Lee S; Paule, Charles; MacKay, John; Bousquet, Jean
January 2006
BMC Genomics;2006, Vol. 7, p174
Academic Journal
Background: High-throughput genotyping technologies represent a highly efficient way to accelerate genetic mapping and enable association studies. As a first step toward this goal, we aimed to develop a resource of candidate Single Nucleotide Polymorphisms (SNPs) in white spruce (Picea glauca [Moench] Voss), a softwood tree of major economic importance.
Results: A white spruce SNP resource encompassing 12,264 SNPs was constructed from a set of 6,459 contigs derived from Expressed Sequence Tags (ESTs) by using the Bayesian-based statistical software PolyBayes. Several parameters influencing the SNP prediction were analysed, including the a priori expected polymorphism, the probability score (PSNP), and the contig depth and length. SNP detection in 3' and 5' reads from the same clones revealed a level of inconsistency between overlapping sequences as low as 1%. A subset of 245 predicted SNPs was verified through the independent resequencing of genomic DNA of a genotype also used to prepare cDNA libraries. The validation rate reached a maximum of 85% for SNPs predicted with either PSNP ≥ 0.95 or ≥ 0.99. A total of 9,310 SNPs were detected by using PSNP ≥ 0.95 as a criterion. The SNPs were distributed among 3,590 contigs encompassing an array of broad functional categories, with an overall frequency of 1 SNP per 700 nucleotide sites. Experimental and statistical approaches were used to evaluate the proportion of paralogous SNPs, with estimates in the range of 8 to 12%. The 3,789 coding SNPs identified through coding region annotation and ORF prediction were distributed into 39% nonsynonymous and 61% synonymous substitutions. Overall, there were 0.9 SNPs per 1,000 nonsynonymous sites and 5.2 SNPs per 1,000 synonymous sites, for a genome-wide nonsynonymous to synonymous substitution rate ratio (Ka/Ks) of 0.17.
Conclusion: We integrated the SNP data in the ForestTreeDB database along with functional annotations to provide a tool facilitating the choice of candidate genes for mapping purposes or association studies.
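
The Ka/Ks figure quoted in the Results can be reproduced from the two per-site densities given there. The short Python sketch below assumes the ratio is a plain quotient of those two densities; the abstract does not describe the authors' exact estimation procedure, and the function and variable names are illustrative only.

```python
# Per-site SNP densities as quoted in the abstract (SNPs per 1,000 sites).
NONSYN_PER_1000_SITES = 0.9
SYN_PER_1000_SITES = 5.2


def ka_ks(nonsyn_rate: float, syn_rate: float) -> float:
    """Return the nonsynonymous/synonymous rate ratio.

    Assumption: a simple quotient of the two per-site densities.
    Both rates must be expressed per the same number of sites.
    """
    if syn_rate <= 0:
        raise ValueError("synonymous rate must be positive")
    return nonsyn_rate / syn_rate


ratio = ka_ks(NONSYN_PER_1000_SITES, SYN_PER_1000_SITES)
print(round(ratio, 2))  # 0.17, matching the value reported in the abstract
```

A ratio well below 1, as here, is conventionally read as a sign that nonsynonymous changes are removed by purifying selection more often than synonymous ones.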


Related Articles

  • Generation and analysis of a 29,745 unique Expressed Sequence Tags from the Pacific oyster (Crassostrea gigas) assembled into a publicly accessible database: the GigasDatabase. Fleury, Elodie; Huvet, Arnaud; Lelong, Christophe; de Lorgeril, Julien; Boulo, Viviane; Gueguen, Yannick; Bachère, Evelyne; Tanguy, Arnaud; Moraga, Dario; Fabioux, Caroline; Lindeque, Penelope; Shaw, Jenny; Reinhardt, Richard; Prunet, Patrick; Davey, Grace; Lapègue, Sylvie; Sauvage, Christopher; Corporeau, Charlotte; Moal, Jeanne; Gavory, Frederick // BMC Genomics;2009, Vol. 10, p341 

    Background: Although bivalves are among the most-studied marine organisms because of their ecological role and economic importance, very little information is available on the genome sequences of oyster species. This report documents three large-scale cDNA sequencing projects for the Pacific...

  • Empirical Bayes analysis of single nucleotide polymorphisms. Schwender, Holger; Ickstadt, Katja // BMC Bioinformatics;2008, Vol. 9, Special section p1 

    Background: An important goal of whole-genome studies concerned with single nucleotide polymorphisms (SNPs) is the identification of SNPs associated with a covariate of interest such as the case-control status or the type of cancer. Since these studies often comprise the genotypes of hundreds of...

  • High Polymorphism in Est-SSR Loci for Cellulose Synthase and β-Amylase of Sugarcane Varieties (Saccharum spp.) Used by the Industrial Sector for Ethanol Production. Augusto, Raphael; Maranho, Rone; Mangolin, Claudete; Pires da Silva Machado, Maria // Applied Biochemistry & Biotechnology;Jan2015, Vol. 175 Issue 2, p965

    High and low polymorphisms in simple sequence repeats of expressed sequence tag (EST-SSR) for specific proteins and enzymes, such as β-amylase, cellulose synthase, xyloglucan endotransglucosylase, fructose 1,6-bisphosphate aldolase, and fructose 1,6-bisphosphatase, were used to illustrate the...

  • Mouse SNP Miner: an annotated database of mouse functional single nucleotide polymorphisms. Reuveni, Eli; Ramensky, Vasily E; Gross, Cornelius // BMC Genomics;2007, Vol. 8, p24 

    Background: The mapping of quantitative trait loci in rat and mouse has been extremely successful in identifying chromosomal regions associated with human disease-related phenotypes. However, identifying the specific phenotype-causing DNA sequence variations within a quantitative trait locus has...

  • Genetic Polymorphisms in a 1.2 kb Long Fragment within Intron 2 of Chicken UBTD2 Gene. H. Y. Zhang; W. J. Yang; Y. Z. Luo; J. L Han // International Journal of Poultry Science;2013, Vol. 12 Issue 5, p307 

    The widely expressed chicken UBTD2 gene in different types of tissues from embryonic to adult developmental stages implies its important role in regulating protein ubiquitination and delivery of ubiquitinated substrates, however, there is no specific study on the genomic DNA structure and...

  • Using SNP Analysis for a Clinical Look at Disease. Gwynne, Peter // Drug Discovery & Development;Jan2003, Vol. 6 Issue 1, p45

    Focuses on the use of single-nucleotide polymorphism analysis to determine the genetic basis of a disease. Facilitation of patient care; Benefits from the sequencing of human genome. INSET: Several SNPs in a Single Tube.

  • Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Halushka, Marc K.; Fan, Jian-Bing; Bentley, Kimberly; Hsie, Linda; Shen, Naiping; Weder, Alan; Cooper, Richard; Lipshutz, Robert; Chakravarti, Aravinda // Nature Genetics;Jul99, Vol. 22 Issue 3, p239 

    Sequence variation in human genes is largely confined to single-nucleotide polymorphisms (SNPs) and is valuable in tests of association with common diseases and pharmacogenetic traits. We performed a systematic and comprehensive survey of molecular variation to assess the nature, pattern and...

  • Response to Belgard et al. Skafidas, E; Testa, R; Zantomio, D; Chana, G; Everall, I P; Pantelis, C // Molecular Psychiatry;Jun2014, Vol. 19 Issue 6, p743 

    A correction to the article "Response to Belgard et al" on single nucleotide polymorphism (SNP) that was published online on January 14, 2014 is presented.

  • Sampling SNPs. Yang, Zhiyong; Ka-Shu Wong, Gane; Eberle, Michael A.; Kibukawa, Miho; Passey, Douglas A.; Hughes, William R.; Kruglyak, Leonid; Yu, Jun // Nature Genetics;Sep2000, Vol. 26 Issue 1, p13 

    Focuses on the launch of projects to study single-nucleotide polymorphism (SNP). DNA sequencing; Rate of SNP; Analysis of alleles.

