PUMP: Estimating Power, Minimum Detectable Effect Size, and Sample Size When Adjusting for Multiple Outcomes in Multi-Level Experiments
Main Article Content
Abstract
For randomized controlled trials (RCTs) with a single intervention's impact being measured on multiple outcomes, researchers often apply a multiple testing procedure (such as Bonferroni or Benjamini-Hochberg) to adjust p values. Such an adjustment reduces the likelihood of spurious findings, but also changes the statistical power, sometimes substantially. A reduction in power means a reduction in the probability of detecting effects when they do exist. This consideration is frequently ignored in typical power analyses, as existing tools do not easily accommodate the use of multiple testing procedures. We introduce the PUMP (Power Under Multiplicity Project) R package as a tool for analysts to estimate statistical power, minimum detectable effect size, and sample size requirements for multi-level RCTs with multiple outcomes. PUMP uses a simulation-based approach to flexibly estimate power for a wide variety of experimental designs, number of outcomes, multiple testing procedures, and other user choices. By assuming linear mixed effects models, we can draw directly from the joint distribution of test statistics across outcomes and thus estimate power via simulation. One of PUMP's main innovations is accommodating multiple outcomes, which are accounted for in two ways. First, power estimates from PUMP properly account for the adjustment in p values from applying a multiple testing procedure. Second, when considering multiple outcomes rather than a single outcome, different definitions of statistical power emerge. PUMP allows researchers to consider a variety of definitions of power in order to choose the most appropriate types of power for the goals of their study. The package supports a variety of commonly used frequentist multi-level RCT designs and linear mixed effects models. In addition to the main functionality of estimating power, minimum detectable effect size, and sample size requirements, the package allows the user to easily explore sensitivity of these quantities to changes in underlying assumptions.