5B.1 Power Analysis
What is Statistical Power?
Statistical power is the probability of correctly rejecting the null hypothesis when it is false—in other words, the probability of detecting an effect when one truly exists.
- Power = 1 - β, where β is the probability of a Type II error (false negative)
- Standard target: 80% power (β = 0.20)
- More stringent: 90% power for high-stakes decisions
An underpowered study wastes resources—you collect data but can't detect effects even if they exist. It's ethically problematic (participants' time for nothing), scientifically problematic (contributes to publication bias), and practically problematic (you don't learn what you set out to learn).
Key Parameters
Power depends on four interconnected quantities:
| Parameter | Symbol | Effect on Power |
|---|---|---|
| Sample size | N | ↑ N → ↑ Power |
| Effect size | δ or d | ↑ Effect → ↑ Power |
| Variance/noise | σ² | ↑ Variance → ↓ Power |
| Significance level | α | ↑ α → ↑ Power (but more false positives) |
Given any three, you can solve for the fourth. Common scenarios:
- Sample size calculation: Given target power (80%), effect size, and variance → find required N
- Minimum detectable effect: Given N, power, and variance → find smallest effect you can detect
- Power calculation: Given N, effect size, and variance → find your actual power
Analytical Formulas
Simple Two-Group Comparison (t-test)
Sample size per group for comparing means:
n = 2 × (z_{1-α/2} + z_{1-β})² × σ² / δ²
Where:
- n = sample size per group
- z_{1-α/2} = 1.96 for α = 0.05 (two-sided)
- z_{1-β} = 0.84 for 80% power, 1.28 for 90% power
- σ² = pooled variance of the outcome
- δ = minimum detectable effect (difference in means)
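To make the formula concrete, here is a minimal sketch that computes n directly (assuming scipy is available), plugging in σ = 15 and δ = 5 to match the Stata example later in this section:
# Python: analytic sample size from the formula above (a minimal sketch)
from scipy.stats import norm

alpha = 0.05    # significance level (two-sided)
power = 0.80    # target power
sigma = 15      # assumed SD, matching the Stata example below
delta = 5       # assumed minimum detectable difference in means

z_alpha = norm.ppf(1 - alpha / 2)   # 1.96
z_beta = norm.ppf(power)            # 0.84

n_per_group = 2 * (z_alpha + z_beta)**2 * sigma**2 / delta**2
print(f"n per group: {n_per_group:.1f}")  # ~141; Stata's t-based answer is 142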
Minimum Detectable Effect (MDE)
Given sample size, what's the smallest effect you can detect?
MDE = (z_{1-α/2} + z_{1-β}) × σ × √(1 / (P(1-P)N))
Where:
- P = proportion assigned to treatment
- N = total sample size
- σ = standard deviation of the outcome
For P = 0.5 (50% treated), √(1/(0.25N)) = 2/√N, so MDE = 2.8 × 2 × σ/√N = 5.6 × σ/√N (at 80% power, α = 0.05).
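The same arithmetic in a minimal sketch (again assuming scipy), with σ normalized to 1 so the MDE is in standard-deviation units, and N = 400 chosen to match the statsmodels example below (200 per group):
# Python: analytic MDE from the formula above (a minimal sketch)
import numpy as np
from scipy.stats import norm

alpha, power = 0.05, 0.80
P = 0.5       # half the sample treated
N = 400       # total sample size (200 per group)
sigma = 1.0   # SD normalized to 1, so the MDE is in SD units (Cohen's d)

z_sum = norm.ppf(1 - alpha / 2) + norm.ppf(power)   # ~2.80
mde = z_sum * sigma * np.sqrt(1 / (P * (1 - P) * N))
print(f"MDE: {mde:.3f}")  # ~0.280, close to the 0.281 from statsmodels below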
Power Commands in Python, Stata, and R
# Python: Power analysis with statsmodels
import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Calculate required sample size
n = analysis.solve_power(
    effect_size=0.3,           # Cohen's d
    power=0.8,                 # Target power
    alpha=0.05,                # Significance level
    ratio=1.0,                 # n2/n1 ratio
    alternative='two-sided'
)
print(f"Required n per group: {n:.0f}")
# Calculate power given sample size
power = analysis.solve_power(
    effect_size=0.3,
    nobs1=100,                 # Sample size in group 1
    alpha=0.05,
    ratio=1.0,
    alternative='two-sided'
)
print(f"Power: {power:.1%}")
# Calculate MDE given sample size and power
mde = analysis.solve_power(
    power=0.8,
    nobs1=200,
    alpha=0.05,
    ratio=1.0,
    alternative='two-sided'
)
print(f"MDE (Cohen's d): {mde:.3f}")
* Stata: Built-in power commands
* Calculate required sample size for two-sample t-test
power twomeans 50 55, sd(15) power(0.8)
* Output: n per group needed to detect difference of 5 (50 vs 55)
* Calculate power given sample size
power twomeans 50 55, sd(15) n(100)
* Calculate MDE given sample size and power
power twomeans 50, sd(15) n(200) power(0.8)
* Two-sample proportions
power twoproportions 0.3 0.4, power(0.8)
* Create power curve table
power twomeans 50 55, sd(15) n(50(50)300) table
* Create power curve graph
power twomeans 50 55, sd(15) n(50(50)300) graph
# R: Power analysis with pwr package
library(pwr)
# Calculate required sample size
result <- pwr.t.test(
  d = 0.3,                  # Cohen's d effect size
  sig.level = 0.05,         # Alpha
  power = 0.8,              # Target power
  type = "two.sample",      # Two-group comparison
  alternative = "two.sided"
)
print(result)
# n is per group
# Calculate power given sample size
result <- pwr.t.test(
  d = 0.3,
  n = 100,
  sig.level = 0.05,
  type = "two.sample"
)
print(result$power)
# Calculate MDE given sample size
result <- pwr.t.test(
  n = 200,
  sig.level = 0.05,
  power = 0.8,
  type = "two.sample"
)
print(result$d)
# Power curve plot
plot(result)
Python output:
Required n per group: 176
Power: 55.8%
MDE (Cohen's d): 0.281
Stata output:
. power twomeans 50 55, sd(15) power(0.8)
Performing iteration ...
Estimated sample size for a two-sample means test
t test assuming sd1 = sd2 = sd
Ho: m2 = m1 versus Ha: m2 != m1
Study parameters:
alpha = 0.0500
power = 0.8000
delta = 5.0000
m1 = 50.0000
m2 = 55.0000
sd = 15.0000
Estimated sample sizes:
N = 284
N per group = 142
. power twomeans 50 55, sd(15) n(100)
Estimated power for a two-sample means test
t test assuming sd1 = sd2 = sd
alpha = 0.0500
N = 200
N per group = 100
delta = 5.0000
m1 = 50.0000
m2 = 55.0000
sd = 15.0000
power = 0.6468
. power twomeans 50, sd(15) n(200) power(0.8)
Estimated experimental-group mean for a two-sample means test
m2 = 53.3267 (MDE = 3.33)
R output:
Two-sample t test power calculation
n = 175.3847
d = 0.3
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
[1] 0.5577254
[1] 0.2809507
Simulation-Based Power Analysis
When analytical formulas don't apply (complex designs, non-standard estimators), use simulation:
- Specify the data generating process with your assumed effect
- Simulate many datasets (1000+)
- Run your analysis on each dataset
- Count how often you reject the null (that's your power)
# Python: Simulation-based power for clustered design
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from tqdm import tqdm
def simulate_cluster_rct(n_clusters, cluster_size, effect, icc=0.1):
    """Simulate a cluster-randomized trial."""
    # Total variance = 1, split between cluster and individual
    cluster_var = icc
    indiv_var = 1 - icc
    # Generate data
    data = []
    for c in range(n_clusters):
        treated = np.random.binomial(1, 0.5)
        cluster_effect = np.random.normal(0, np.sqrt(cluster_var))
        for i in range(cluster_size):
            indiv_error = np.random.normal(0, np.sqrt(indiv_var))
            y = effect * treated + cluster_effect + indiv_error
            data.append({'cluster': c, 'treated': treated, 'y': y})
    return pd.DataFrame(data)

def run_analysis(df):
    """Run analysis with cluster-robust SEs."""
    model = smf.ols('y ~ treated', data=df).fit(
        cov_type='cluster', cov_kwds={'groups': df['cluster']}
    )
    return model.pvalues['treated']
# Run power simulation
n_sims = 1000
effect_size = 0.3
n_clusters = 30
cluster_size = 20
significant = 0
for _ in tqdm(range(n_sims)):
    df = simulate_cluster_rct(n_clusters, cluster_size, effect_size)
    pval = run_analysis(df)
    if pval < 0.05:
        significant += 1
power = significant / n_sims
print(f"Estimated power: {power:.1%}")
* Stata: Simulation-based power
* Define program for one simulation
capture program drop sim_cluster
program define sim_cluster, rclass
    args n_clusters cluster_size effect icc
    clear
    local total = `n_clusters' * `cluster_size'
    set obs `total'
    * Generate cluster IDs
    gen cluster = ceil(_n / `cluster_size')
    * Cluster-level treatment (randomize at cluster level)
    bysort cluster: gen treated = (runiform() < 0.5) if _n == 1
    bysort cluster: replace treated = treated[1]
    * Generate outcome with ICC
    local clust_sd = sqrt(`icc')
    local ind_sd = sqrt(1 - `icc')
    bysort cluster: gen cluster_fe = rnormal(0, `clust_sd') if _n == 1
    bysort cluster: replace cluster_fe = cluster_fe[1]
    gen y = `effect' * treated + cluster_fe + rnormal(0, `ind_sd')
    * Run analysis with clustered SEs
    quietly reg y treated, cluster(cluster)
    return scalar pval = 2*ttail(e(df_r), abs(_b[treated]/_se[treated]))
end
* Run simulations
simulate pval=r(pval), reps(1000) seed(42): ///
sim_cluster 30 20 0.3 0.1
* Calculate power
count if pval < 0.05
display "Power: " r(N)/1000
# R: Simulation-based power for cluster RCT
library(lme4)
library(lmerTest)
simulate_cluster_rct <- function(n_clusters, cluster_size, effect, icc = 0.1) {
  cluster_var <- icc
  indiv_var <- 1 - icc
  data <- do.call(rbind, lapply(1:n_clusters, function(c) {
    treated <- rbinom(1, 1, 0.5)
    cluster_effect <- rnorm(1, 0, sqrt(cluster_var))
    data.frame(
      cluster = c,
      treated = treated,
      y = effect * treated + cluster_effect + rnorm(cluster_size, 0, sqrt(indiv_var))
    )
  }))
  return(data)
}
run_analysis <- function(df) {
  # Account for clustering with a random intercept; lmerTest supplies the p-value
  model <- lmer(y ~ treated + (1|cluster), data = df)
  p_value <- summary(model)$coefficients["treated", "Pr(>|t|)"]
  return(p_value)
}
# Run power simulation
set.seed(42)
n_sims <- 1000
effect_size <- 0.3
p_values <- replicate(n_sims, {
  df <- simulate_cluster_rct(30, 20, effect_size)
  run_analysis(df)
})
power <- mean(p_values < 0.05)
cat(sprintf("Estimated power: %.1f%%\n", power * 100))
Python output:
100%|██████████| 1000/1000 [00:12<00:00, 78.32it/s]
Estimated power: 62.4%
Stata output:
. simulate pval=r(pval), reps(1000) seed(42): sim_cluster 30 20 0.3 0.1
command: sim_cluster 30 20 0.3 0.1
pval: r(pval)
Simulations (1000)
[progress dots omitted; 1,000 replications completed]
. count if pval < 0.05
618
. display "Power: " r(N)/1000
Power: .618
R output:
Estimated power: 61.8%
Complex Designs
Cluster Randomization
When treatment is assigned at the group level (schools, villages), you need to account for intraclass correlation (ICC):
Design Effect
DEFF = 1 + (m - 1) × ICC
Where:
- m = cluster size
- ICC = intraclass correlation (typically 0.01-0.20)
Effective sample size = N / DEFF
With ICC = 0.1 and cluster size m = 20: DEFF = 1 + 19 × 0.1 = 2.9
You need ~3x as many observations as in individual-level randomization!
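A minimal sketch of the adjustment, reusing the N = 284 from the Stata two-means example above (the ICC and cluster size are assumed values):
# Python: design-effect adjustment (a minimal sketch)
icc = 0.1            # assumed intraclass correlation
m = 20               # assumed cluster size
n_individual = 284   # N from the individual-level Stata example above

deff = 1 + (m - 1) * icc          # 2.9
n_clustered = n_individual * deff
print(f"DEFF: {deff:.1f}")
print(f"N needed under clustering: {n_clustered:.0f}")  # ~824 individuals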
Multiple Treatment Arms
With k treatment arms, you need larger samples for two reasons: each pairwise comparison uses only a fraction of the sample, and running several pairwise tests requires a multiple testing correction. A common approach is Bonferroni: divide α by the number of comparisons and recalculate power.
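As a rough illustration of what the correction costs, the sketch below reuses statsmodels' TTestIndPower, assuming three pairwise comparisons and the d = 0.3 from earlier:
# Python: Bonferroni-adjusted sample size (a minimal sketch)
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
k = 3   # assumed number of pairwise comparisons

n_single = analysis.solve_power(effect_size=0.3, power=0.8, alpha=0.05)
n_bonf = analysis.solve_power(effect_size=0.3, power=0.8, alpha=0.05 / k)
print(f"n per group, unadjusted: {n_single:.0f}")  # ~175
print(f"n per group, Bonferroni: {n_bonf:.0f}")    # ~234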
Pre-Post Designs (ANCOVA)
If you have a baseline measure of the outcome, ANCOVA can dramatically increase power:
ANCOVA Power Gain
Required N (ANCOVA) = Required N (simple) × (1 - ρ²)
Where ρ = correlation between baseline and endline outcome.
If ρ = 0.5: Need only 75% as many observations
If ρ = 0.7: Need only 51% as many observations
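A minimal sketch of the adjustment, reusing the assumed N = 284 from the simple two-group design above:
# Python: ANCOVA sample-size adjustment (a minimal sketch)
n_simple = 284   # assumed N from the simple comparison above

for rho in (0.3, 0.5, 0.7):
    n_ancova = n_simple * (1 - rho**2)
    print(f"rho = {rho}: N = {n_ancova:.0f} ({1 - rho**2:.0%} of the simple design)")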
Further Reading
- Duflo, Glennerster & Kremer (2007): sections on power analysis in experimental economics
- Optimal Design: free software for power analysis
- EGAP Methods Guide: power analysis for experiments