Improving the statistical detection of regulated genes from microarray data using intensity-based variance estimation

Comander, Jason; Natarajan, Sripriya; Gimbrone Jr., Michael A.; García-Cardeña, Guillermo
January 2004
BMC Genomics;2004, Vol. 5, p17
Academic Journal
Background: Gene microarray technology provides the ability to study the regulation of thousands of genes simultaneously, but its potential is limited without an estimate of the statistical significance of the observed changes in gene expression. Due to the large number of genes being tested and the comparatively small number of array replicates (e.g., N = 3), standard statistical methods such as the Student's t-test fail to produce reliable results. Two other statistical approaches commonly used to improve significance estimates are a penalized t-test and a Z-test using intensity-dependent variance estimates.

Results: The performance of these approaches is compared using a dataset of 23 replicates, and a new implementation of the Z-test is introduced that pools together variance estimates of genes with similar minimum intensity. Significance estimates based on 3 replicate arrays are calculated using each statistical technique, and their accuracy is evaluated by comparing them to a reliable estimate based on the remaining 20 replicates. The reproducibility of each test statistic is evaluated by applying it to multiple, independent sets of 3 replicate arrays. Two implementations of a Z-test using intensity-dependent variance produce more reproducible results than two implementations of a penalized t-test. Furthermore, the minimum intensity-based Z-statistic demonstrates higher accuracy and higher or equal precision than all other statistical techniques tested.

Conclusion: An intensity-based variance estimation technique provides one simple, effective approach that can improve p-value estimates for differentially regulated genes derived from replicated microarray datasets. Implementations of the Z-test algorithms are available at http://vessels.bwh.harvard.edu/software/papers/bmcg2004.
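
The idea of a Z-test with variance pooled across genes of similar minimum intensity can be illustrated with a minimal sketch. This is not the authors' implementation (which is available at the URL above); the function name `intensity_pooled_z`, the sliding-window pooling scheme, and the `window` parameter are assumptions made for illustration only. The sketch assumes one log-ratio per gene per replicate array and one minimum-intensity value per gene.

```python
import math
import numpy as np

def intensity_pooled_z(log_ratios, min_intensity, window=51):
    """Illustrative Z-test with intensity-dependent variance (hypothetical helper).

    log_ratios    : array of shape (genes, replicates), log expression ratios
    min_intensity : array of shape (genes,), each gene's minimum spot intensity
    window        : number of intensity-ranked neighbours pooled per gene
                    (an assumed tuning parameter, not taken from the paper)
    Returns (z, p): Z-statistics and two-sided normal p-values per gene.
    """
    log_ratios = np.asarray(log_ratios, dtype=float)
    min_intensity = np.asarray(min_intensity, dtype=float)
    n_genes, n_reps = log_ratios.shape

    # Per-gene mean and unbiased sample variance across replicates.
    mean = log_ratios.mean(axis=1)
    var = log_ratios.var(axis=1, ddof=1)

    # Rank genes by minimum intensity so that neighbours have similar intensity.
    order = np.argsort(min_intensity)
    sorted_var = var[order]

    # Sliding-window average of variances, truncated at both ends of the ranking.
    half = window // 2
    csum = np.concatenate(([0.0], np.cumsum(sorted_var)))
    idx = np.arange(n_genes)
    lo = np.maximum(idx - half, 0)
    hi = np.minimum(idx + half + 1, n_genes)
    pooled_sorted = (csum[hi] - csum[lo]) / (hi - lo)

    # Map pooled variances back to the original gene order.
    pooled = np.empty(n_genes)
    pooled[order] = pooled_sorted

    # Z = mean / standard error, using the pooled variance
    # (assumes the pooled variance is strictly positive).
    z = mean / np.sqrt(pooled / n_reps)

    # Two-sided p-value under a standard normal null: p = erfc(|z| / sqrt(2)).
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return z, p
```

Because the variance for each gene is borrowed from many genes of similar intensity, the estimate is far more stable than a per-gene variance computed from 3 replicates, which is the motivation the abstract gives for preferring this family of tests over the standard t-test.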


Related Articles

  • GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Sean Davis; Paul S. Meltzer // Bioinformatics;Jul2007, Vol. 23 Issue 14, p1846 

    Microarray technology has become a standard molecular biology tool. Experimental data have been generated on a huge number of organisms, tissue types, treatment conditions and disease states. The Gene Expression Omnibus (Barrett et al., 2005), developed by the National Center for Bioinformatics...

  • Discovering gene expression patterns in time course microarray experiments by ANOVA SCA. María José Nueda; Ana Conesa; Johan A. Westerhuis; Huub C. J. Hoefsloot; Age K. Smilde; Manuel Talón; Alberto Ferrer // Bioinformatics;Jul2007, Vol. 23 Issue 14, p1792 

    Motivation: Designed microarray experiments are used to investigate the effects that controlled experimental factors have on gene expression and learn about the transcriptional responses associated with external variables. In these datasets, signals of interest coexist with varying sources of...

  • CALIB: a Bioconductor package for estimating absolute expression levels from two-color microarray data. Hui Zhao; Kristof Engelen; Bart De Moor; Kathleen Marchal // Bioinformatics;Jul2007, Vol. 23 Issue 13, p1700 

    In this article we describe a new Bioconductor package ‘CALIB’ for normalization of two-color microarray data. This approach is based on the measurements of external controls and estimates an absolute target level for each gene and condition pair, as opposed to working with...

  • Characterization of mismatch and high-signal intensity probes associated with Affymetrix genechips. Yonghong Wang; Ze-Hong Miao; Yves Pommier; Ernest S. Kawasaki; Audrey Player // Bioinformatics;Sep2007, Vol. 23 Issue 16, p2088 

    Motivation: For Affymetrix microarray platforms, gene expression is determined by computing the difference in signal intensities between perfect match (PM) and mismatch (MM) probesets. Although the use of PM is not controversial, MM probesets have been associated with variance and ultimately...

  • Toxicity of Methanol and Formaldehyde Towards Saccharomyces cerevisiae as Assessed by DNA Microarray Analysis. Yasokawa, Daisuke; Murata, Satomi; Iwahashi, Yumiko; Kitagawa, Emiko; Nakagawa, Ryoji; Hashido, Tazusa; Iwahashi, Hitoshi // Applied Biochemistry & Biotechnology;Mar2010, Vol. 160 Issue 6, p1685 

    To assess the toxicity of the C1 compounds methanol and formaldehyde, gene expression profiles of treated baker's yeast were analyzed using DNA microarrays. Among approximately 6,000 open reading frames (ORFs), 314 were repressed and 375 were induced in response to methanol. The gene process...

  • PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Mootha, Vamsi K; Lindgren, Cecilia M; Eriksson, Karl-Fredrik; Subramanian, Aravind; Sihag, Smita; Lehar, Joseph; Puigserver, Pere; Carlsson, Emma; Ridderstrale, Martin; Laurila, Esa; Houstis, Nicholas; Daly, Mark J; Patterson, Nick; Mesirov, Jill P; Golub, Todd R; Tamayo, Pablo; Spiegelman, Bruce; Lander, Eric S; Hirschhorn, Joel N // Nature Genetics;Jul2003, Vol. 34 Issue 3, p267 

    DNA microarrays can be used to identify gene expression changes characteristic of human disease. This is challenging, however, when relevant differences are subtle at the level of individual genes. We introduce an analytical strategy, Gene Set Enrichment Analysis, designed to detect modest but...

  • A Bayesian method for analysing spotted microarray data. Meiklejohn, Colin D.; Townsend, Jeffrey P. // Briefings in Bioinformatics;Dec2005, Vol. 6 Issue 4, p318 

    In the decade since their invention, spotted microarrays have been undergoing technical advances that have increased the utility, scope and precision of their ability to measure gene expression. At the same time, more researchers are taking advantage of the fundamentally quantitative nature of...

  • TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection. Haiyan Wang; Hongyan Zhang; Zhijun Dai; Ming-shun Chen; Zheming Yuan // BMC Medical Genomics;2013, Vol. 6 Issue Suppl 1, p1 

    Background: One of the challenges in classification of cancer tissue samples based on gene expression data is to establish an effective method that can select a parsimonious set of informative genes. The Top Scoring Pair (TSP), k-Top Scoring Pairs (k-TSP), Support Vector Machines (SVM), and...

  • Bayesian variable selection for disease classification using gene expression data. Yang Ai-Jun; Song Xin-Yuan // Bioinformatics;Jan2010, Vol. 26 Issue 2, p215 

    Motivation: An important application of gene expression microarray data is the classification of samples into categories. Accurate classification depends upon the method used to identify the most relevant genes. Owing to the large number of genes and relatively small sample size, the selection...

