5B.4 Balance Checks
Why Check Balance?
Randomization should create treatment and control groups that are similar on average. A balance table compares baseline characteristics across groups to check that it did. A balance check:
- Detects implementation errors (e.g., randomization code bugs)
- Identifies accidental imbalances that may need adjustment
- Is required in most experimental papers (typically Table 1)
- Builds credibility that the estimated treatment effect is not confounded by pre-existing differences
Creating Balance Tables
# Python: Balance table
import pandas as pd
import numpy as np
from scipy import stats
def balance_table(df, treatment_col, covariates):
    """Create a balance table comparing treatment vs control."""
    results = []
    for var in covariates:
        treat = df[df[treatment_col] == 1][var].dropna()
        control = df[df[treatment_col] == 0][var].dropna()
        # Means
        mean_t = treat.mean()
        mean_c = control.mean()
        diff = mean_t - mean_c
        # T-test for difference
        t_stat, p_val = stats.ttest_ind(treat, control)
        # Standardized difference
        pooled_sd = np.sqrt((treat.var() + control.var()) / 2)
        std_diff = diff / pooled_sd if pooled_sd > 0 else 0
        results.append({
            'Variable': var,
            'Control Mean': f"{mean_c:.3f}",
            'Treatment Mean': f"{mean_t:.3f}",
            'Difference': f"{diff:.3f}",
            'Std. Diff.': f"{std_diff:.3f}",
            'P-value': f"{p_val:.3f}"
        })
    return pd.DataFrame(results)
# Create sample data
np.random.seed(42)
n = 200
df = pd.DataFrame({
    'treatment': np.random.binomial(1, 0.5, n),
    'age': np.random.normal(35, 10, n),
    'income': np.random.normal(50000, 15000, n),
    'education': np.random.normal(14, 2, n),
    'female': np.random.binomial(1, 0.5, n)
})
# Generate balance table
covariates = ['age', 'income', 'education', 'female']
balance = balance_table(df, 'treatment', covariates)
print(balance.to_string(index=False))
* Stata: Balance table with iebaltab (World Bank's ietoolkit)
* Install: ssc install ietoolkit
* Basic balance table
iebaltab age income education female, ///
    grpvar(treatment) ///
    save("balance_table.xlsx") replace
* With additional options
iebaltab age income education female, ///
    grpvar(treatment) ///
    ftest ///
    stdev ///
    starlevels(0.1 0.05 0.01) ///
    save("balance_table.xlsx") replace
* Manual approach
foreach var in age income education female {
    ttest `var', by(treatment)
}
# R: Balance table with cobalt package
library(cobalt)
# Create sample data
set.seed(42)
n <- 200
df <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  age = rnorm(n, 35, 10),
  income = rnorm(n, 50000, 15000),
  education = rnorm(n, 14, 2),
  female = rbinom(n, 1, 0.5)
)
# Create balance table
balance <- bal.tab(
  treatment ~ age + income + education + female,
  data = df,
  binary = "std"  # Standardize binary variables
)
print(balance)
Variable Control Mean Treatment Mean Difference Std. Diff. P-value
age 34.821 35.432 0.611 0.062 0.571
income 49823.456 50412.789 589.333 0.041 0.712
education 13.912 14.087 0.175 0.089 0.423
female 0.485 0.520 0.035 0.070 0.593
. iebaltab age income education female, grpvar(treatment) ftest stdev
Balance table
(1) (2)
Control Treatment Diff P-value
Mean Mean (2)-(1)
[SD] [SD]
--------------------------------------------------------------------
age 34.82 35.43 0.61 0.571
[9.84] [9.92]
income 49823.46 50412.79 589.33 0.712
[14523.12] [14812.45]
education 13.91 14.09 0.18 0.423
[1.98] [1.95]
female 0.49 0.52 0.04 0.593
[0.50] [0.50]
--------------------------------------------------------------------
N 98 102
F-test of joint significance: F(4, 195) = 0.23, p = 0.921
Balance table saved to: balance_table.xlsx
Balance Measures
Type Diff.Un
age Contin. 0.0621
income Contin. 0.0408
education Contin. 0.0892
female Binary 0.0700
Sample sizes
Control Treated
All 98 102
Effective sample sizes
Control Treated
All 98 102
Interpreting Balance
What to Look For
| Metric | Good Balance | Concern |
|---|---|---|
| P-values | Most > 0.05, uniformly distributed | Many < 0.05, or one very small |
| Standardized differences | \|d\| < 0.1 | \|d\| > 0.25 |
| Joint F-test | p > 0.05 | p < 0.05 |
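
The joint F-test row can be made concrete with a small sketch. One common way to run it, assumed here and not necessarily what any particular package does internally, is to regress the treatment indicator on all covariates and test whether the coefficients are jointly zero. The function name `joint_balance_f_test` and the simulated data are hypothetical, and the test uses the classical (homoskedastic) F-statistic.

```python
import numpy as np
from scipy import stats

def joint_balance_f_test(treatment, covariates):
    """Classical F-test that all covariate coefficients are zero in a
    regression of the treatment indicator on the covariates."""
    y = np.asarray(treatment, dtype=float)
    X = np.asarray(covariates, dtype=float)
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])           # unrestricted: intercept + covariates
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss_unrestricted = np.sum((y - X1 @ beta) ** 2)
    rss_restricted = np.sum((y - y.mean()) ** 2)    # restricted: intercept only
    df_num, df_den = k, n - k - 1
    f_stat = ((rss_restricted - rss_unrestricted) / df_num) / (rss_unrestricted / df_den)
    p_value = stats.f.sf(f_stat, df_num, df_den)
    return f_stat, p_value, (df_num, df_den)

# Hypothetical data: treatment assigned independently of the covariates
rng = np.random.default_rng(0)
n = 200
covs = np.column_stack([
    rng.normal(35, 10, n),        # age
    rng.normal(50000, 15000, n),  # income
    rng.normal(14, 2, n),         # education
    rng.binomial(1, 0.5, n),      # female
])
treat = rng.binomial(1, 0.5, n)

f_stat, p_value, dof = joint_balance_f_test(treat, covs)
print(f"F{dof} = {f_stat:.2f}, p = {p_value:.3f}")
```

A large p-value means the covariates, taken together, do not predict treatment status, which is what successful randomization implies.
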
With many variables, some will be "significantly" different by chance: even under successful randomization, about 5% of tests reject at α = 0.05. Don't panic about one or two significant differences. Focus on: (1) the overall pattern, (2) large standardized differences, and (3) theoretically important variables.
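
The arithmetic behind the "5% by chance" point can be sketched in a few lines. This assumes the tests are independent, which is only an approximation when covariates are correlated; the helper name `chance_imbalance` is hypothetical.

```python
def chance_imbalance(k, alpha=0.05):
    """Expected number of false positives and probability of at least one,
    across k independent balance tests at significance level alpha."""
    expected = k * alpha
    p_any = 1 - (1 - alpha) ** k
    return expected, p_any

for k in (5, 10, 20):
    expected, p_any = chance_imbalance(k)
    print(f"{k:>2} covariates: expect {expected:.2f} false positives, "
          f"P(at least one) = {p_any:.2f}")
```

With 20 covariates, for example, one "significant" difference is expected on average, and the chance of seeing at least one is roughly 64%.
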
Attrition Analysis
Attrition occurs when participants drop out or don't complete the study. Differential attrition (more dropouts in one group) can bias results.
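
To see why differential attrition matters, here is a small simulation sketch. The attrition rule is purely hypothetical: everyone drops out with probability 0.10, except treated participants with low outcomes, who drop out with probability 0.40. The true effect is set to 5 by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
treatment = rng.binomial(1, 0.5, n)
true_effect = 5.0
outcome = 100 + true_effect * treatment + rng.normal(0, 15, n)

# Hypothetical attrition rule: treated units with low outcomes leave more often
drop_prob = np.where((treatment == 1) & (outcome < 100), 0.40, 0.10)
observed = rng.random(n) > drop_prob

def diff_in_means(y, d):
    """Treatment-minus-control difference in mean outcomes."""
    return y[d == 1].mean() - y[d == 0].mean()

full = diff_in_means(outcome, treatment)
completers = diff_in_means(outcome[observed], treatment[observed])

print(f"True effect:               {true_effect:.2f}")
print(f"Estimate, full sample:     {full:.2f}")
print(f"Estimate, completers only: {completers:.2f}")
```

Because low-outcome treated participants are underrepresented among completers, the completers-only estimate overstates the effect even though assignment was random.
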
Checking for Differential Attrition
# Python: Attrition analysis
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency
# Create sample data with some attrition
np.random.seed(42)
n = 200
df = pd.DataFrame({
    'treatment': np.random.binomial(1, 0.5, n),
    'age': np.random.normal(35, 10, n),
    'outcome': np.random.normal(100, 15, n)
})
# Introduce some attrition (15% overall, slightly higher in treatment)
# replace=False samples distinct rows, so exactly 18 and 12 outcomes go missing
df.loc[np.random.choice(df[df['treatment']==1].index, 18, replace=False), 'outcome'] = np.nan
df.loc[np.random.choice(df[df['treatment']==0].index, 12, replace=False), 'outcome'] = np.nan
# 1. Overall attrition rate by treatment
df['completed'] = df['outcome'].notna().astype(int)
attrition = df.groupby('treatment')['completed'].mean()
print("Completion rates by treatment:")
print(attrition)
# 2. Test if attrition differs by treatment
contingency = pd.crosstab(df['treatment'], df['completed'])
chi2, p, dof, expected = chi2_contingency(contingency)
print(f"\nChi-squared test for differential attrition:")
print(f"Chi2 = {chi2:.3f}, p-value = {p:.4f}")
* Stata: Attrition analysis
* 1. Completion rates
gen completed = !missing(outcome)
tab treatment completed, row
* 2. Test for differential attrition
prtest completed, by(treatment)
* 3. Lee bounds for treatment effects under selection
* (Bounds on treatment effect accounting for differential attrition)
* ssc install leebounds
leebounds outcome treatment
* 4. Balance among completers vs full sample
iebaltab age income education female if completed==1, ///
grpvar(treatment) save("balance_completers.xlsx") replace
# R: Attrition analysis
set.seed(42)
n <- 200
# Create sample data with some attrition
df <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  age = rnorm(n, 35, 10),
  outcome = rnorm(n, 100, 15)
)
# Introduce attrition
df$outcome[sample(which(df$treatment == 1), 18)] <- NA
df$outcome[sample(which(df$treatment == 0), 12)] <- NA
# 1. Completion rates by treatment
df$completed <- !is.na(df$outcome)
print("Completion by treatment:")
print(table(df$treatment, df$completed))
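# 2. Test for differential attrition
# prop.test() treats the first column as "successes", so put TRUE (completed) first
completion_tab <- table(df$treatment, df$completed)[, c("TRUE", "FALSE")]
result <- prop.test(completion_tab)
print(result)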
Completion rates by treatment:
treatment
0    0.877551
1    0.823529
Name: completed, dtype: float64

Chi-squared test for differential attrition:
Chi2 = 0.987, p-value = 0.3206
. gen completed = !missing(outcome)
. tab treatment completed, row
| completed
treatment | 0 1 | Total
-----------+----------------------+----------
0 | 12 86 | 98
| 12.24 87.76 | 100.00
-----------+----------------------+----------
1 | 18 84 | 102
| 17.65 82.35 | 100.00
-----------+----------------------+----------
Total | 30 170 | 200
| 15.00 85.00 | 100.00
. prtest completed, by(treatment)
Two-sample test of proportions 0: Number of obs = 98
1: Number of obs = 102
------------------------------------------------------------------------------
Group | Mean Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
0 | .8775510 .0331389 .8126000 .9425021
1 | .8235294 .0377551 .7495310 .8975279
-------------+----------------------------------------------------------------
diff | .0540216 .0502584 -.0444830 .1525263
| under Ho: .0504892 1.07 0.285
------------------------------------------------------------------------------
diff = prop(0) - prop(1) z = 1.0700
Ho: diff = 0
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(Z < z) = 0.8576 Pr(|Z| > |z|) = 0.2848 Pr(Z > z) = 0.1424
[1] "Completion by treatment:"

    FALSE TRUE
  0    12   86
  1    18   84

    2-sample test for equality of proportions with continuity correction

data:  table(df$treatment, df$completed)
X-squared = 0.98696, df = 1, p-value = 0.3206
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.04686043  0.15503133
sample estimates:
   prop 1    prop 2
0.8775510 0.8235294
If you find imbalance or differential attrition:
1. Report it transparently in your paper
2. Control for imbalanced variables in your main specification
3. Show robustness with and without controls
4. Compute bounds on treatment effects (Lee bounds for attrition)
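
For step 4, the trimming idea behind Lee (2009) bounds can be sketched in Python. This is a minimal illustration, not a replacement for the Stata `leebounds` command: it returns point estimates only (no standard errors or covariate tightening), trims a whole number of observations instead of using exact quantiles, and relies on the monotonicity assumption that treatment affects attrition in one direction only. The function name `lee_bounds` and the simulated data are hypothetical.

```python
import numpy as np

def lee_bounds(outcome, treatment):
    """Point estimates of Lee-style trimming bounds (missing outcome = NaN)."""
    y = np.asarray(outcome, dtype=float)
    d = np.asarray(treatment).astype(int)
    observed = ~np.isnan(y)

    q_t = observed[d == 1].mean()            # response rate, treatment
    q_c = observed[d == 0].mean()            # response rate, control
    y_t = np.sort(y[(d == 1) & observed])
    y_c = np.sort(y[(d == 0) & observed])

    if q_t >= q_c:
        # Treatment group responds more: trim its outcome distribution
        p = (q_t - q_c) / q_t
        k = int(np.floor(p * len(y_t)))      # number of observations to trim
        lower = y_t[:len(y_t) - k].mean() - y_c.mean()   # drop the k largest
        upper = y_t[k:].mean() - y_c.mean()              # drop the k smallest
    else:
        # Control group responds more: trim its outcome distribution
        p = (q_c - q_t) / q_c
        k = int(np.floor(p * len(y_c)))
        lower = y_t.mean() - y_c[k:].mean()              # drop the k smallest
        upper = y_t.mean() - y_c[:len(y_c) - k].mean()   # drop the k largest
    return lower, upper, p

# Hypothetical data: 18 treated and 12 control outcomes go missing
rng = np.random.default_rng(42)
n = 200
treatment = rng.binomial(1, 0.5, n)
outcome = rng.normal(100, 15, n)
outcome[rng.choice(np.flatnonzero(treatment == 1), 18, replace=False)] = np.nan
outcome[rng.choice(np.flatnonzero(treatment == 0), 12, replace=False)] = np.nan

lower, upper, p = lee_bounds(outcome, treatment)
print(f"Trimming share: {p:.3f}; bounds: [{lower:.2f}, {upper:.2f}]")
```

The wider the gap in completion rates between groups, the larger the trimming share and the wider the resulting bounds.
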
- Stata: iebaltab from ietoolkit (ssc install ietoolkit)
- R: cobalt package, tableone package
- Reference: Lee, D. S. (2009). "Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects." Review of Economic Studies.