Power and sample size estimation in high dimensional biology
Genomic scientists often test thousands of hypotheses in a single experiment. One example is a microarray experiment that seeks to determine differential gene expression among experimental groups. Planning such experiments involves a determination of sample size that will allow meaningful interpretations. Traditional power analysis methods may not be well suited to this task when thousands of hypotheses are tested in a discovery oriented basic research. We introduce the concept of expected discovery rate (EDR) and an approach that combines parametric mixture modelling with parametric bootstrapping to estimate the sample size needed for a desired accuracy of results. While the examples included are derived from microarray studies, the methods, herein, are ‘extraparadigmatic’ in the approach to study design and are applicable to most high dimensional biological situations. Pilot data from three different microarray experiments are used to extrapolate EDR as well as the related false discovery rate at different sample sizes and thresholds.