Software for optimization of SNP and PCR-RFLP genotyping to discriminate many genomes with the fewest assays

Shea N Gardner; Mark C Wagner
January 2005
BMC Genomics;2005, Vol. 6, p73
Academic Journal
Background: Microbial forensics is important in tracking the source of a pathogen, whether the disease is a naturally occurring outbreak or part of a criminal investigation.

Results: A method and the SPR Opt (SNP and PCR-RFLP Optimization) software are presented for performing a comprehensive, whole-genome analysis to forensically discriminate multiple sequences. Tools are described for optimizing forensic typing using Single Nucleotide Polymorphism (SNP) and PCR-Restriction Fragment Length Polymorphism (PCR-RFLP) analyses across multiple isolate sequences of a species. The PCR-RFLP analysis includes prediction and selection of optimal primers and restriction enzymes to enable maximum isolate discrimination based on sequence information. SPR Opt calculates all SNP or PCR-RFLP variations present in the sequences, groups them into haplotypes according to their co-segregation across those sequences, and performs combinatoric analyses to determine which sets of haplotypes provide maximal discrimination among all the input sequences. It then finds the set combinations that require querying membership in the fewest haplotypes (i.e. performing the fewest assays). These analyses highlight variable regions based on existing sequence data. The resulting markers may be heterogeneous among unsequenced isolates as well, and thus may be useful for characterizing the relationships among unsequenced as well as sequenced isolates. The predictions are multi-locus. Analyses of mumps and SARS viruses are summarized. Phylogenetic trees created based on SNPs, PCR-RFLPs, and full genomes are compared for SARS virus, illustrating that purported phylogenies based only on SNP or PCR-RFLP variations do not match those based on multiple sequence alignment of the full genomes.

Conclusion: This is the first software to optimize the selection of forensic markers to maximize information gained from the fewest assays, accepting whole or partial genome sequence data as input. As more sequence data become available for multiple strains and isolates of a species, automated, computational approaches such as those described here will be essential to make sense of large amounts of information, and to guide and optimize efforts in the laboratory. The software and source code for SPR Opt are publicly available and free for non-profit use at http://www.llnl.gov/IPandC/technology/software/softwaretitles/spropt.php.
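
The haplotype grouping and fewest-assay search described in the abstract can be illustrated with a small Python sketch. This is not the SPR Opt source code. It is a toy version that assumes the input is a set of already aligned, equal-length sequences, considers SNPs only (PCR-RFLP primer and enzyme selection is not modeled), and uses brute-force enumeration that would not scale to real genomes. The function names `group_haplotypes` and `smallest_discriminating_sets` are hypothetical, chosen for this example.

```python
from itertools import combinations


def group_haplotypes(sequences):
    """Group variable alignment columns by how they split the isolates.

    `sequences` maps isolate name -> aligned sequence (equal lengths assumed).
    Returns the sorted isolate names and a dict that maps each co-segregation
    pattern to the list of 0-based positions sharing that pattern.
    """
    names = sorted(sequences)
    length = len(sequences[names[0]])
    groups = {}
    for pos in range(length):
        column = [sequences[name][pos] for name in names]
        if len(set(column)) < 2:
            continue  # invariant site: cannot discriminate any isolates
        # Relabel alleles by order of first appearance, so that columns which
        # partition the isolates identically receive the same pattern.
        seen = {}
        pattern = tuple(seen.setdefault(base, len(seen)) for base in column)
        groups.setdefault(pattern, []).append(pos)
    return names, groups


def smallest_discriminating_sets(sequences):
    """Find the smallest haplotype sets giving maximal discrimination.

    Exhaustive search (an assumption of this sketch, not necessarily the
    strategy used by SPR Opt). Each returned set is a list of haplotypes,
    and each haplotype is the list of positions that co-segregate.
    """
    names, groups = group_haplotypes(sequences)
    patterns = list(groups)
    if not patterns:
        return []

    def resolved(combo):
        # Number of distinct isolate signatures under the chosen haplotypes.
        return len({tuple(p[i] for p in combo) for i in range(len(names))})

    target = resolved(patterns)  # best achievable using every haplotype
    for size in range(1, len(patterns) + 1):
        hits = [
            [groups[p] for p in combo]
            for combo in combinations(patterns, size)
            if resolved(combo) == target
        ]
        if hits:
            return hits  # all sets of the minimal size
    return []


if __name__ == "__main__":
    toy = {
        "A": "ACGTAG",
        "B": "ACGTCG",
        "C": "ATGTAA",
        "D": "ATGTCA",
    }
    print(smallest_discriminating_sets(toy))
```

In the toy alignment, positions 1 and 5 co-segregate and therefore form one haplotype, so assaying either of them together with position 4 is enough to tell all four isolates apart. Only one assay per haplotype is needed, which is why grouping by co-segregation reduces the search space before the combinatoric step.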


Related Articles

  • Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Davey, John W.; Hohenlohe, Paul A.; Etter, Paul D.; Boone, Jason Q.; Catchen, Julian M.; Blaxter, Mark L. // Nature Reviews Genetics;Jul2011, Vol. 12 Issue 7, p499 

    The advent of next-generation sequencing (NGS) has revolutionized genomic and transcriptomic approaches to biology. These new sequencing tools are also valuable for the discovery, validation and assessment of genetic markers in populations. Here we review and discuss best practices for several...

  • SNPHarvester: a filtering-based approach for detecting epistatic interactions in genome-wide association studies. Can Yang; Zengyou He; Xiang Wan; Qiang Yang; Hong Xue; Weichuan Yu // Bioinformatics;Feb2009, Vol. 25 Issue 4, p504 

    Motivation: Hundreds of thousands of single nucleotide polymorphisms (SNPs) are available for genome-wide association (GWA) studies nowadays. The epistatic interactions of SNPs are believed to be very important in determining individual susceptibility to complex diseases. However, existing...

  • Mapping Accuracy of Short Reads from Massively Parallel Sequencing and the Implications for Quantitative Expression Profiling. Palmieri, Nicola; Schlötterer, Christian // PLoS ONE;2009, Vol. 4 Issue 7, p1 

    Background: Massively parallel sequencing offers an enormous potential for expression profiling, in particular for interspecific comparisons. Currently, different platforms for massively parallel sequencing are available, which differ in read length and sequencing costs. The 454-technology...

  • BEsTRF: a tool for optimal resolution of terminal-restriction fragment length polymorphism analysis based on user-defined primer-enzyme-sequence databases. Blaz Stres; James M. Tiedje; Bostjan Murovec // Bioinformatics;Jun2009, Vol. 25 Issue 12, p1556 

    Summary: BEsTRF (Best Estimated T-RF) provides a standalone environment for analyzing primers-enzymes-gene section combinations used in terminal-restriction fragment length polymorphism (T-RFLP) for its optimal resolution. User-defined sequence databases of several hundred thousand DNA sequences...

  • Molecular characterization of infectious bursal disease virus (IBDV): Diversity of very virulent IBDV in Tanzania. Kasanga, C. J.; Yamaguchi, T.; Wambura, P. N.; Maeda-Machang'u, A. D.; Ohya, K.; Fukushi, H. // Archives of Virology;Apr2007, Vol. 152 Issue 4, p783 

    Nucleotide sequences of the VP2 hypervariable region (VP2-HVR) of 14 infectious bursal disease viruses (IBDVs) isolated in Tanzania from 2001 to 2004 were determined. Phylogenetic analysis showed that the isolates diverged into two genotypes and belonged to the very virulent (VV) type. In the...

  • PanCGH: a genotype-calling algorithm for pangenome CGH data. Jumamurat R. Bayjanov; Michiel Wels; Marjo Starrenburg; Johan E. T. van Hylckama Vlieg; Roland J. Siezen; Douwe Molenaar // Bioinformatics;Feb2009, Vol. 25 Issue 3, p309 

    Motivation: Pangenome arrays contain DNA oligomers targeting several sequenced reference genomes from the same species. In microbiology, these can be employed to investigate the often high genetic variability within a species by comparative genome hybridization (CGH). The biological...

  • Dynamic equilibrium of Marek's disease genomes during in vitro serial passage. Spatz, Stephen; Volkening, Jeremy; Gimeno, Isabel; Heidari, Mohammad; Witter, Richard // Virus Genes;Dec2012, Vol. 45 Issue 3, p526 

    Attenuation of Gallid herpesvirus-2 (GaHV-2), the causative agent of Marek's disease, can occur through serial passage of a virulent field isolate in avian embryo fibroblasts. In order to gain a better understanding of the genes involved in attenuation and associate observed changes in phenotype...

  • Detecting disease rare alleles using single SNPs in families and haplotyping in unrelated subjects from the Genetic Analysis Workshop 17 data. Kraja, Aldi T.; Czajkowski, Jacek; Feitosa, Mary F.; Borecki, Ingrid B.; Province, Michael A. // BMC Proceedings;2011 Supplement 9, Vol. 5 Issue Suppl 9, p1 

    We present an evaluation of discovery power for two association tests that work well with common alleles but are applied to the Genetic Analysis Workshop 17 simulations with rare causative single-nucleotide polymorphisms (SNPs) (minor allele frequency [MAF] < 1%). The methods used were...

  • Quantifying single nucleotide variant detection sensitivity in exome sequencing. Meynert, Alison M.; Bicknell, Louise S.; Hurles, Matthew E.; Jackson, Andrew P.; Taylor, Martin S. // BMC Bioinformatics;2013, Vol. 14 Issue 1, p1 

    Background: The targeted capture and sequencing of genomic regions has rapidly demonstrated its utility in genetic studies. Inherent in this technology is considerable heterogeneity of target coverage and this is expected to systematically impact our sensitivity to detect genuine polymorphisms....

