5B.1  Power Analysis

~3 hours · Sample Size, MDE, Simulation

What is Statistical Power?

Statistical power is the probability of correctly rejecting the null hypothesis when it is false—in other words, the probability of detecting an effect when one truly exists.

  • Power = 1 - β, where β is the probability of a Type II error (false negative)
  • Standard target: 80% power (β = 0.20)
  • More stringent: 90% power for high-stakes decisions

Why Power Matters

An underpowered study wastes resources: you collect data but cannot detect effects even if they exist. It is ethically problematic (participants give their time for nothing), scientifically problematic (it contributes to publication bias), and practically problematic (you don't learn what you set out to learn).

Key Parameters

Power depends on four interconnected quantities:

Parameter            Symbol    Effect on Power
Sample size          N         ↑ N → ↑ Power
Effect size          δ or d    ↑ Effect → ↑ Power
Variance/noise       σ²        ↑ Variance → ↓ Power
Significance level   α         ↑ α → ↑ Power (but more false positives)

Given any three, you can solve for the fourth. Common scenarios:

  • Sample size calculation: Given target power (80%), effect size, and variance → find required N
  • Minimum detectable effect: Given N, power, and variance → find smallest effect you can detect
  • Power calculation: Given N, effect size, and variance → find your actual power

Analytical Formulas

Simple Two-Group Comparison (t-test)

Sample size per group for comparing means:

n = 2 × (z_{1-α/2} + z_{1-β})² × σ² / δ²

Where:
  n = sample size per group
  z_{1-α/2} = 1.96 for α = 0.05 (two-sided)
  z_{1-β} = 0.84 for 80% power, 1.28 for 90% power
  σ² = pooled variance of the outcome
  δ = minimum detectable effect (difference in means)
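
As a sanity check, the formula is easy to compute directly. Here is a minimal sketch (the scipy dependency and the n_per_group helper are illustrative, not part of the packages shown below):

# Sketch: sample size per group from the normal-approximation formula above
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """n per group for a two-sided, two-sample comparison of means."""
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)            # 0.84 for 80% power
    return 2 * (z_alpha + z_beta)**2 * sigma**2 / delta**2

# Detecting a 5-point difference with sd = 15: ~141.3 per group
# (the t-based Stata result below gives 142)
print(n_per_group(delta=5, sigma=15))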
          

Minimum Detectable Effect (MDE)

Given sample size, what's the smallest effect you can detect?

MDE = (z_{1-α/2} + z_{1-β}) × σ × √[1 / (P(1-P) × N)]

Where:
  P = proportion assigned to treatment
  N = total sample size
  σ = standard deviation of outcome
          

For P = 0.5 (50% treated): MDE = 2.8 × σ × √(4/N) = 5.6 × σ / √N (at 80% power, α = 0.05)
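
The same back-of-the-envelope MDE in code (a sketch; the mde helper name is illustrative):

# Sketch: MDE from the formula above
from scipy.stats import norm

def mde(N, sigma, P=0.5, alpha=0.05, power=0.80):
    """Smallest detectable difference in means given total sample size N."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z * sigma * (1 / (P * (1 - P) * N))**0.5

# sd = 15, N = 200 total: ~5.94, i.e. 5.6 * 15 / sqrt(200)
print(mde(N=200, sigma=15))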

Power Commands in Stata/R/Python

# Python: Power analysis with statsmodels
import math

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Calculate required sample size
n = analysis.solve_power(
    effect_size=0.3,     # Cohen's d
    power=0.8,           # Target power
    alpha=0.05,          # Significance level
    ratio=1.0,           # n2/n1 ratio
    alternative='two-sided'
)
print(f"Required n per group: {math.ceil(n)}")  # round up, never to nearest

# Calculate power given sample size
power = analysis.solve_power(
    effect_size=0.3,
    nobs1=100,           # Sample size group 1
    alpha=0.05,
    ratio=1.0,
    alternative='two-sided'
)
print(f"Power: {power:.1%}")

# Calculate MDE given sample size and power
mde = analysis.solve_power(
    power=0.8,
    nobs1=200,
    alpha=0.05,
    ratio=1.0,
    alternative='two-sided'
)
print(f"MDE (Cohen's d): {mde:.3f}")

* Stata: Built-in power commands

* Calculate required sample size for two-sample t-test
power twomeans 50 55, sd(15) power(0.8)
* Output: n per group needed to detect difference of 5 (50 vs 55)

* Calculate power given sample size
power twomeans 50 55, sd(15) n(100)

* Calculate MDE given sample size and power
power twomeans 50, sd(15) n(200) power(0.8)

* Two-sample proportions
power twoproportions 0.3 0.4, power(0.8)

* Create power curve table
power twomeans 50 55, sd(15) n(50(50)300) table

* Create power curve graph
power twomeans 50 55, sd(15) n(50(50)300) graph

# R: Power analysis with pwr package
library(pwr)

# Calculate required sample size
result <- pwr.t.test(
  d = 0.3,              # Cohen's d effect size
  sig.level = 0.05,     # Alpha
  power = 0.8,          # Target power
  type = "two.sample", # Two-group comparison
  alternative = "two.sided"
)
print(result)
# n is per group

# Calculate power given sample size
result <- pwr.t.test(
  d = 0.3,
  n = 100,
  sig.level = 0.05,
  type = "two.sample"
)
print(result$power)

# Calculate MDE given sample size
result <- pwr.t.test(
  n = 200,
  sig.level = 0.05,
  power = 0.8,
  type = "two.sample"
)
print(result$d)

# Power curve plot
plot(result)

Python Output:
Required n per group: 176
Power: 55.8%
MDE (Cohen's d): 0.281

Stata Output:
. power twomeans 50 55, sd(15) power(0.8)

Performing iteration ...

Estimated sample size for a two-sample means test
t test assuming sd1 = sd2 = sd
Ho: m2 = m1  versus  Ha: m2 != m1

Study parameters:

        alpha =    0.0500
        power =    0.8000
        delta =    5.0000
           m1 =   50.0000
           m2 =   55.0000
           sd =   15.0000

Estimated sample sizes:

            N =       284
  N per group =       142

. power twomeans 50 55, sd(15) n(100)

Estimated power for a two-sample means test
t test assuming sd1 = sd2 = sd

        alpha =    0.0500
            N =       200
  N per group =       100
        delta =    5.0000
           m1 =   50.0000
           m2 =   55.0000
           sd =   15.0000

        power =    0.6468

. power twomeans 50, sd(15) n(200) power(0.8)

Estimated experimental-group mean for a two-sample means test
           m2 =   53.3267  (MDE = 3.33)

R Output:
     Two-sample t test power calculation

              n = 175.3847
              d = 0.3
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

[1] 0.5577254

[1] 0.2809507

Simulation-Based Power Analysis

When analytical formulas don't apply (complex designs, non-standard estimators), use simulation:

  1. Specify the data generating process with your assumed effect
  2. Simulate many datasets (1000+)
  3. Run your analysis on each dataset
  4. Count how often you reject the null (that's your power)

# Python: Simulation-based power for clustered design
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from tqdm import tqdm

def simulate_cluster_rct(n_clusters, cluster_size, effect, icc=0.1):
    """Simulate a cluster-randomized trial."""
    # Total variance = 1, split between cluster and individual
    cluster_var = icc
    indiv_var = 1 - icc

    # Generate data
    data = []
    for c in range(n_clusters):
        treated = np.random.binomial(1, 0.5)
        cluster_effect = np.random.normal(0, np.sqrt(cluster_var))

        for i in range(cluster_size):
            indiv_error = np.random.normal(0, np.sqrt(indiv_var))
            y = effect * treated + cluster_effect + indiv_error
            data.append({'cluster': c, 'treated': treated, 'y': y})

    return pd.DataFrame(data)

def run_analysis(df):
    """Run analysis with cluster-robust SEs."""
    model = smf.ols('y ~ treated', data=df).fit(
        cov_type='cluster', cov_kwds={'groups': df['cluster']}
    )
    return model.pvalues['treated']

# Run power simulation
n_sims = 1000
effect_size = 0.3
n_clusters = 30
cluster_size = 20

significant = 0
for _ in tqdm(range(n_sims)):
    df = simulate_cluster_rct(n_clusters, cluster_size, effect_size)
    pval = run_analysis(df)
    if pval < 0.05:
        significant += 1

power = significant / n_sims
print(f"Estimated power: {power:.1%}")

* Stata: Simulation-based power

* Define program for one simulation
capture program drop sim_cluster
program define sim_cluster, rclass
    args n_clusters cluster_size effect icc

    clear
    local total = `n_clusters' * `cluster_size'
    set obs `total'

    * Generate cluster IDs
    gen cluster = ceil(_n / `cluster_size')

    * Cluster-level treatment (randomize at cluster level)
    bysort cluster: gen treated = (runiform() < 0.5) if _n == 1
    bysort cluster: replace treated = treated[1]

    * Generate outcome with ICC
    local clust_sd = sqrt(`icc')
    local ind_sd = sqrt(1 - `icc')

    bysort cluster: gen cluster_fe = rnormal(0, `clust_sd') if _n == 1
    bysort cluster: replace cluster_fe = cluster_fe[1]

    gen y = `effect' * treated + cluster_fe + rnormal(0, `ind_sd')

    * Run analysis with clustered SEs
    quietly reg y treated, cluster(cluster)
    return scalar pval = 2*ttail(e(df_r), abs(_b[treated]/_se[treated]))
end

* Run simulations
simulate pval=r(pval), reps(1000) seed(42): ///
    sim_cluster 30 20 0.3 0.1

* Calculate power
count if pval < 0.05
display "Power: " r(N)/1000

# R: Simulation-based power for cluster RCT
library(lme4)
library(lmerTest)

simulate_cluster_rct <- function(n_clusters, cluster_size, effect, icc = 0.1) {
  cluster_var <- icc
  indiv_var <- 1 - icc

  data <- do.call(rbind, lapply(1:n_clusters, function(c) {
    treated <- rbinom(1, 1, 0.5)
    cluster_effect <- rnorm(1, 0, sqrt(cluster_var))

    data.frame(
      cluster = c,
      treated = treated,
      y = effect * treated + cluster_effect + rnorm(cluster_size, 0, sqrt(indiv_var))
    )
  }))

  return(data)
}

run_analysis <- function(df) {
  # Account for clustering with a random intercept (lmerTest supplies the p-value)
  model <- lmer(y ~ treated + (1|cluster), data = df)
  p_value <- summary(model)$coefficients["treated", "Pr(>|t|)"]
  return(p_value)
}

# Run power simulation
set.seed(42)
n_sims <- 1000
effect_size <- 0.3

p_values <- replicate(n_sims, {
  df <- simulate_cluster_rct(30, 20, effect_size)
  run_analysis(df)
})

power <- mean(p_values < 0.05)
cat(sprintf("Estimated power: %.1f%%\n", power * 100))

Python Output:
100%|██████████| 1000/1000 [00:12<00:00, 78.32it/s]
Estimated power: 62.4%

Stata Output:
. simulate pval=r(pval), reps(1000) seed(42): sim_cluster 30 20 0.3 0.1

      command:  sim_cluster 30 20 0.3 0.1
         pval:  r(pval)

Simulations (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
    (output omitted)
..................................................  1000

. count if pval < 0.05
  618

. display "Power: " r(N)/1000
Power: .618

R Output:

Estimated power: 61.8%

Complex Designs

Cluster Randomization

When treatment is assigned at the group level (schools, villages), you need to account for intraclass correlation (ICC):

Design Effect

Design Effect (DEFF) = 1 + (m - 1) × ICC

Where:
  m = cluster size
  ICC = intraclass correlation (typically 0.01-0.20)

Effective sample size = N / DEFF
          

With ICC = 0.1 and cluster size = 20: DEFF = 1 + 19 × 0.1 = 2.9
You need ~3x as many observations as in individual-level randomization!
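
In code, the inflation is a one-liner (a sketch; the deff helper and the reuse of n = 176 from the statsmodels example are illustrative):

# Sketch: inflate an individually randomized sample size by the design effect
def deff(m, icc):
    """Design effect for clusters of size m with intraclass correlation icc."""
    return 1 + (m - 1) * icc

n_individual = 176                 # per-group n from the statsmodels example above
d = deff(m=20, icc=0.10)           # 1 + 19 * 0.1 = 2.9
print(d, n_individual * d)         # ~510 individuals per group, ~26 clusters of 20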

Multiple Treatment Arms

With k treatment arms, you need larger samples. For pairwise comparisons, apply a multiple testing correction (e.g., Bonferroni: α/k) and recalculate power.
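
To see the cost, rerun the statsmodels calculation with a Bonferroni-adjusted α (a sketch; k = 3 is an illustrative choice, not from the original):

# Sketch: Bonferroni-adjusted sample size for k pairwise comparisons
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for k in (1, 3):
    n = analysis.solve_power(effect_size=0.3, power=0.8, alpha=0.05 / k)
    print(f"k = {k}: n per group = {n:.1f}")
# With k = 3 the per-group n rises well above the ~176 needed at alpha = 0.05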

Pre-Post Designs (ANCOVA)

If you have a baseline measure of the outcome, ANCOVA can dramatically increase power:

ANCOVA Power Gain

Required N (ANCOVA) = Required N (simple) × (1 - ρ²)

Where ρ = correlation between baseline and endline outcome.

If ρ = 0.5: Need only 75% as many observations
If ρ = 0.7: Need only 51% as many observations
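
The adjustment is simple arithmetic (a sketch; the ancova_n helper and the reuse of N = 352 from the earlier example are illustrative):

# Sketch: ANCOVA sample-size reduction for a given baseline-endline correlation
def ancova_n(n_simple, rho):
    """Required N when the analysis controls for a baseline with correlation rho."""
    return n_simple * (1 - rho**2)

for rho in (0.5, 0.7):
    print(rho, ancova_n(352, rho))   # 352 = 2 x 176 from the statsmodels example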
          
Further Reading

  • Duflo, Glennerster, & Kremer (2007), "Using Randomization in Development Economics Research: A Toolkit": sections on power analysis in experimental economics
  • Optimal Design: free software for power analysis
  • EGAP Methods Guide: power analysis for experiments