Power and sample size estimation in high dimensional biology

Gadbury, Gary L.; Page, Grier P.; Edwards, Jode; Kayo, Tsuyoshi; Prolla, Tomas A.; Weindruch, Richard; Permana, Paska A.; Mountz, John D.; Allison, David B.
August 2004
Statistical Methods in Medical Research;Aug2004, Vol. 13 Issue 4, p325
Academic Journal
Genomic scientists often test thousands of hypotheses in a single experiment. One example is a microarray experiment that seeks to determine differential gene expression among experimental groups. Planning such experiments involves determining a sample size that will allow meaningful interpretations. Traditional power analysis methods may not be well suited to this task when thousands of hypotheses are tested in discovery-oriented basic research. We introduce the concept of expected discovery rate (EDR) and an approach that combines parametric mixture modelling with parametric bootstrapping to estimate the sample size needed for a desired accuracy of results. While the examples included are derived from microarray studies, the methods herein are 'extraparadigmatic' in their approach to study design and are applicable to most high-dimensional biological situations. Pilot data from three different microarray experiments are used to extrapolate the EDR, as well as the related false discovery rate, at different sample sizes and thresholds.
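As a rough illustration of how an EDR and a false discovery rate might change with sample size and threshold, the sketch below uses a deliberately simplified setting. It is not the mixture-model and bootstrap procedure described in the abstract. It assumes that the EDR can be read as the expected proportion of genes with a real effect that are declared significant (the abstract does not define it), that every affected gene shares one standardized effect size, and that each gene is tested with a two-sided z-test with known unit variance. The function name `edr_and_fdr` and all parameter values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def edr_and_fdr(n_per_group, alpha, effect_size, prop_real):
    """Return (EDR, FDR) under a simplified two-group z-test model.

    Assumptions (illustrative only, not taken from the article):
      - a fraction `prop_real` of genes has a real effect;
      - every affected gene has the same standardized effect size;
      - each gene is tested with a two-sided z-test at threshold `alpha`.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Mean of the z statistic for an affected gene, n per group, unit variance.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that an affected gene is declared significant.
    edr = STD_NORMAL.cdf(-z_crit + shift) + STD_NORMAL.cdf(-z_crit - shift)
    # Expected shares of all genes that are false and true discoveries.
    false_pos = (1 - prop_real) * alpha
    true_pos = prop_real * edr
    fdr = false_pos / (false_pos + true_pos)
    return edr, fdr


if __name__ == "__main__":
    # Hypothetical planning scenario: 10% affected genes, effect size 1.0.
    for n in (5, 10, 20, 40):
        edr, fdr = edr_and_fdr(n, alpha=0.001, effect_size=1.0, prop_real=0.1)
        print(f"n={n:2d}  EDR={edr:.3f}  FDR={fdr:.3f}")
```

Under these assumptions, raising the sample size at a fixed threshold increases the EDR and lowers the false discovery rate, which is the kind of trade-off a planning table over sample sizes and thresholds would display.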


Related Articles

  • Identification of proprotein convertase substrates using genome-wide expression correlation analysis.  // BMC Genomics;2011 Supplement 2, Vol. 12 Issue Suppl 2, p618 

    Background: Subtilisin/kexin-like proprotein convertase (PCSK) enzymes have important regulatory function in a wide variety of biological processes. PCSKs proteolytically process at a target sequence that contains basic amino acids arginine and lysine, which results in functional maturation of...

  • Population Models of Genomic Imprinting. II. Maternal and Fertility Selection. Spencer, Hamish G.; Dorn, Timothy; LoFaro, Thomas // Genetics;Aug2006, Vol. 173 Issue 4, p2391 

    Under several hypotheses for the evolutionary origin of imprinting, genes with maternal and reproductive effects are more likely to be imprinted. We thus investigate the effect of genomic imprinting in single-locus diallelic models of maternal and fertility selection. First, the model proposed...

  • Detecting multiple associations in genome-wide studies. Dudbridge, Frank; Gusnanto, Arief; Koeleman, Bobby P. C. // Human Genomics;Mar2006, Vol. 2 Issue 5, p310 

    Recent developments in the statistical analysis of genome-wide studies are reviewed. Genome-wide analyses are becoming increasingly common in areas such as scans for disease-associated markers and gene expression profiling. The data generated by these studies present new problems for statistical...

  • Editorial. Elgar, Greg // Briefings in Functional Genomics & Proteomics;Feb2004, Vol. 2 Issue 4, p277 

    Provides information on articles on functional genomics and proteomics. Approaches to systems biology; Gap between researchers over networking in systems biology; Views of biologists on in vitro system; Relationship between gene expression and chromatin structure.

  • The eloquent ape: genes, brains and the evolution of language. Fisher, Simon E.; Marcus, Gary F. // Nature Reviews Genetics;Jan2006, Vol. 7 Issue 1, p9 

    The human capacity to acquire complex language seems to be without parallel in the natural world. The origins of this remarkable trait have long resisted adequate explanation, but advances in fields that range from molecular genetics to cognitive neuroscience offer new promise. Here we...

  • (Im)Perfect robustness and adaptation of metabolic networks subject to metabolic and gene-expression regulation: marrying control engineering with metabolic control analysis. Fei He; Fromion, Vincent; Westerhoff, Hans V. // BMC Systems Biology;2013, Vol. 7 Issue 1, p1 

    Background Metabolic control analysis (MCA) and supply-demand theory have led to appreciable understanding of the systems properties of metabolic networks that are subject exclusively to metabolic regulation. Supply-demand theory has not yet considered gene-expression regulation explicitly...

  • POTION: an end-to-end pipeline for positive Darwinian selection detection in genome-scale data through phylogenetic comparison of protein-coding genes. Hongo, Jorge A.; de Castro, Giovanni M.; Cintra, Leandro C.; Zerlotini, Adhemar; Lobo, Francisco P. // BMC Genomics;Aug2015, Vol. 16 Issue 1, p1 

    Background: Detection of genes evolving under positive Darwinian evolution in genome-scale data is nowadays a prevailing strategy in comparative genomics studies to identify genes potentially involved in adaptation processes. Despite the large number of studies aiming to detect and contextualize...

  • Validation of Computational Methods in Genomics. Dougherty, Edward R; Jianping Hua; Bittner, Michael L. // Current Genomics;Mar2007, Vol. 8 Issue 1, p1 

    High-throughput technologies for genomics provide tens of thousands of genetic measurements, for instance, gene-expression measurements on microarrays, and the availability of these measurements has motivated the use of machine learning (inference) methods for classification, clustering, and...

  • A functional genomics guide to the galaxy of neuronal cell types. Diaz, Elva // Nature Neuroscience;Jan2006, Vol. 9 Issue 1, p10 

    The article focuses on the use of a functional genomics approach to uncover the molecular basis of neuronal identity. Genome-wide approaches might be used to delineate neuronal cell types based on global gene expression. Researchers conduct an analysis of the gene expression in 12 distinct...

