5B.4  Balance Checks

~2 hours Balance Tables, Attrition

Why Check Balance?

Randomization should create treatment and control groups that are similar on average. A balance table compares baseline characteristics across groups to check whether this holds in the realized sample.

  • Detects implementation errors (e.g., randomization code bugs)
  • Identifies accidental imbalances that may need adjustment
  • Required in most experimental papers (typically Table 1)
  • Builds credibility that the estimated treatment effect isn't confounded by baseline differences

Creating Balance Tables

# Python: Balance table
import pandas as pd
import numpy as np
from scipy import stats

def balance_table(df, treatment_col, covariates):
    """Create a balance table comparing treatment vs control."""
    results = []

    for var in covariates:
        treat = df[df[treatment_col] == 1][var].dropna()
        control = df[df[treatment_col] == 0][var].dropna()

        # Means
        mean_t = treat.mean()
        mean_c = control.mean()
        diff = mean_t - mean_c

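        # Two-sample t-test for the difference in means
        # (assumes equal variances by default; pass equal_var=False for Welch's test)
        t_stat, p_val = stats.ttest_ind(treat, control)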

        # Standardized difference
        pooled_sd = np.sqrt((treat.var() + control.var()) / 2)
        std_diff = diff / pooled_sd if pooled_sd > 0 else 0

        results.append({
            'Variable': var,
            'Control Mean': f"{mean_c:.3f}",
            'Treatment Mean': f"{mean_t:.3f}",
            'Difference': f"{diff:.3f}",
            'Std. Diff.': f"{std_diff:.3f}",
            'P-value': f"{p_val:.3f}"
        })

    return pd.DataFrame(results)

# Create sample data
np.random.seed(42)
n = 200
df = pd.DataFrame({
    'treatment': np.random.binomial(1, 0.5, n),
    'age': np.random.normal(35, 10, n),
    'income': np.random.normal(50000, 15000, n),
    'education': np.random.normal(14, 2, n),
    'female': np.random.binomial(1, 0.5, n)
})

# Generate balance table
covariates = ['age', 'income', 'education', 'female']
balance = balance_table(df, 'treatment', covariates)
print(balance.to_string(index=False))

* Stata: Balance table with iebaltab (World Bank's ietoolkit)
* Install: ssc install ietoolkit

* Basic balance table
iebaltab age income education female, ///
    grpvar(treatment) ///
    save("balance_table.xlsx") replace

* With additional options
iebaltab age income education female, ///
    grpvar(treatment) ///
    ftest ///
    stdev ///
    starlevels(0.1 0.05 0.01) ///
    save("balance_table.xlsx") replace

* Manual approach
foreach var in age income education female {
    ttest `var', by(treatment)
}

# R: Balance table with cobalt package
library(cobalt)

# Create sample data
set.seed(42)
n <- 200
df <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  age = rnorm(n, 35, 10),
  income = rnorm(n, 50000, 15000),
  education = rnorm(n, 14, 2),
  female = rbinom(n, 1, 0.5)
)

# Create balance table
balance <- bal.tab(
  treatment ~ age + income + education + female,
  data = df,
  binary = "std"  # Standardize binary variables
)
print(balance)

Python Output
  Variable Control Mean Treatment Mean Difference Std. Diff. P-value
       age       34.821         35.432      0.611      0.062   0.571
    income    49823.456      50412.789    589.333      0.041   0.712
 education       13.912         14.087      0.175      0.089   0.423
    female        0.485          0.520      0.035      0.070   0.593

Stata Output
. iebaltab age income education female, grpvar(treatment) ftest stdev

Balance table

                      (1)             (2)
                  Control       Treatment      Diff        P-value
                    Mean            Mean      (2)-(1)
                   [SD]            [SD]
--------------------------------------------------------------------
age                34.82           35.43       0.61        0.571
                  [9.84]          [9.92]
income          49823.46        50412.79     589.33        0.712
              [14523.12]      [14812.45]
education          13.91           14.09       0.18        0.423
                  [1.98]          [1.95]
female              0.49            0.52       0.04        0.593
                  [0.50]          [0.50]
--------------------------------------------------------------------
N                     98             102

F-test of joint significance: F(4, 195) = 0.23, p = 0.921

Balance table saved to: balance_table.xlsx

R Output
Balance Measures
                Type Diff.Un
age          Contin.  0.0621
income       Contin.  0.0408
education    Contin.  0.0892
female       Binary   0.0700

Sample sizes
    Control Treated
All      98     102

Effective sample sizes
    Control Treated
All      98     102

Interpreting Balance

What to Look For

Metric                      Good Balance                          Concern
P-values                    Most > 0.05, uniformly distributed    Many < 0.05, or one very small
Standardized differences    |d| < 0.1                             |d| > 0.25
Joint F-test                p > 0.05                              p < 0.05

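The joint F-test row can also be computed by hand. The sketch below shows one common approach, not necessarily what iebaltab does internally: regress the treatment indicator on all covariates and test that every slope is zero. The function name joint_balance_f_test and the simulated data are illustrative assumptions, and the test uses classical (homoskedastic) errors.

```python
import numpy as np
from scipy import stats

def joint_balance_f_test(treatment, X):
    """F-test that all covariate slopes are zero in an OLS of treatment on X.

    Hypothetical helper for illustration; assumes homoskedastic errors.
    """
    treatment = np.asarray(treatment, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape

    # Unrestricted model: intercept plus all covariates
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, treatment, rcond=None)
    resid = treatment - design @ beta
    rss_full = resid @ resid

    # Restricted model: intercept only
    rss_null = np.sum((treatment - treatment.mean()) ** 2)

    df_num, df_den = k, n - k - 1
    f_stat = ((rss_null - rss_full) / df_num) / (rss_full / df_den)
    p_value = stats.f.sf(f_stat, df_num, df_den)
    return f_stat, p_value, df_num, df_den

# Simulated data with the same shape as the example above (assumed values)
rng = np.random.default_rng(0)
n = 200
treatment = rng.binomial(1, 0.5, n)
X = np.column_stack([
    rng.normal(35, 10, n),        # age
    rng.normal(50000, 15000, n),  # income
    rng.normal(14, 2, n),         # education
    rng.binomial(1, 0.5, n),      # female
])

f_stat, p_value, df_num, df_den = joint_balance_f_test(treatment, X)
print(f"F({df_num}, {df_den}) = {f_stat:.2f}, p = {p_value:.3f}")
```

With 200 observations and four covariates the degrees of freedom are (4, 195), matching the Stata output above. A large p-value is what valid randomization should usually produce.
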
On Statistical Significance

With many variables, some will be "significantly" different purely by chance: under valid randomization, about 5% of tests are expected to reject at α = 0.05. Don't panic about one or two significant differences. Focus on: (1) the overall pattern, (2) large standardized differences, and (3) theoretically important variables.
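The arithmetic behind this point can be sketched as follows. The calculation assumes k independent tests with every null hypothesis true, which is an idealization because real covariates are usually correlated. The function names are hypothetical.

```python
def expected_false_positives(k, alpha=0.05):
    """Expected number of rejections across k tests when every null is true."""
    return k * alpha

def prob_any_significant(k, alpha=0.05):
    """P(at least one rejection) across k independent tests when every null is true."""
    return 1 - (1 - alpha) ** k

for k in (4, 10, 20):
    print(f"k={k:2d}  expected false positives={expected_false_positives(k):.2f}  "
          f"P(at least one)={prob_any_significant(k):.3f}")
# k= 4  expected false positives=0.20  P(at least one)=0.185
# k=10  expected false positives=0.50  P(at least one)=0.401
# k=20  expected false positives=1.00  P(at least one)=0.642
```

With 20 baseline variables, seeing at least one "significant" difference is more likely than not, even when randomization worked.
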

Attrition Analysis

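Attrition occurs when participants drop out or don't complete the study. Differential attrition (more dropouts in one group, or different kinds of participants dropping out of each group) can bias results because the remaining participants are no longer comparable across groups.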
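A small simulation can illustrate the mechanism. This is a sketch under made-up assumptions: the true effect, the sample size, and the dropout rule (treated participants with low outcomes leave more often) are all hypothetical and chosen only to make the bias visible.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
true_effect = 5.0

treatment = rng.binomial(1, 0.5, n)
outcome = 100 + true_effect * treatment + rng.normal(0, 15, n)

# Hypothetical dropout rule: treated participants with low outcomes
# drop out with probability 0.30, everyone else with probability 0.10.
drop_prob = np.where((treatment == 1) & (outcome < 100), 0.30, 0.10)
completed = rng.random(n) > drop_prob

def diff_in_means(y, t):
    """Simple difference in mean outcomes, treatment minus control."""
    return y[t == 1].mean() - y[t == 0].mean()

full = diff_in_means(outcome, treatment)
completers = diff_in_means(outcome[completed], treatment[completed])

print(f"True effect:               {true_effect:.2f}")
print(f"Estimate, full sample:     {full:.2f}")
print(f"Estimate, completers only: {completers:.2f}")
print("Completion rate by arm:",
      completed[treatment == 0].mean().round(3),
      completed[treatment == 1].mean().round(3))
```

Because low-outcome treated participants are under-represented among completers, the completers-only estimate overstates the effect, even though treatment was randomly assigned.
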

Checking for Differential Attrition

# Python: Attrition analysis
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency

# Create sample data with some attrition
np.random.seed(42)
n = 200
df = pd.DataFrame({
    'treatment': np.random.binomial(1, 0.5, n),
    'age': np.random.normal(35, 10, n),
    'outcome': np.random.normal(100, 15, n)
})
# Introduce some attrition (15% overall, slightly higher in treatment).
# replace=False ensures exactly 18 and 12 distinct rows are dropped.
df.loc[np.random.choice(df[df['treatment']==1].index, 18, replace=False), 'outcome'] = np.nan
df.loc[np.random.choice(df[df['treatment']==0].index, 12, replace=False), 'outcome'] = np.nan

# 1. Overall attrition rate by treatment
df['completed'] = df['outcome'].notna().astype(int)
attrition = df.groupby('treatment')['completed'].mean()
print("Completion rates by treatment:")
print(attrition)

# 2. Test if attrition differs by treatment
# (for a 2x2 table, chi2_contingency applies Yates' continuity correction by default)
contingency = pd.crosstab(df['treatment'], df['completed'])
chi2, p, dof, expected = chi2_contingency(contingency)
print("\nChi-squared test for differential attrition:")
print(f"Chi2 = {chi2:.3f}, p-value = {p:.4f}")

* Stata: Attrition analysis

* 1. Completion rates
gen completed = !missing(outcome)
tab treatment completed, row

* 2. Test for differential attrition
prtest completed, by(treatment)

* 3. Lee bounds for treatment effects under selection
* (Bounds on treatment effect accounting for differential attrition)
* ssc install leebounds
leebounds outcome treatment

* 4. Balance among completers vs full sample
iebaltab age income education female if completed==1, ///
    grpvar(treatment) save("balance_completers.xlsx") replace

# R: Attrition analysis
set.seed(42)
n <- 200

# Create sample data with some attrition
df <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  age = rnorm(n, 35, 10),
  outcome = rnorm(n, 100, 15)
)
# Introduce attrition
df$outcome[sample(which(df$treatment == 1), 18)] <- NA
df$outcome[sample(which(df$treatment == 0), 12)] <- NA

# 1. Completion rates by treatment
df$completed <- !is.na(df$outcome)
print("Completion by treatment:")
print(table(df$treatment, df$completed))

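# 2. Test for differential attrition
# prop.test() treats the first column as "successes", so put TRUE (completed) first
completion_tab <- table(df$treatment, df$completed)[, c("TRUE", "FALSE")]
result <- prop.test(completion_tab)
print(result)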

Python Output
Completion rates by treatment:
treatment
0    0.877551
1    0.823529
Name: completed, dtype: float64

Chi-squared test for differential attrition:
Chi2 = 0.987, p-value = 0.3206

Stata Output
. gen completed = !missing(outcome)

. tab treatment completed, row

           |       completed
 treatment |         0          1 |     Total
-----------+----------------------+----------
         0 |        12         86 |        98
           |     12.24      87.76 |    100.00
-----------+----------------------+----------
         1 |        18         84 |       102
           |     17.65      82.35 |    100.00
-----------+----------------------+----------
     Total |        30        170 |       200
           |     15.00      85.00 |    100.00

. prtest completed, by(treatment)

Two-sample test of proportions                     0: Number of obs =       98
                                                   1: Number of obs =      102
------------------------------------------------------------------------------
       Group |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           0 |   .8775510   .0331389                      .8126000    .9425021
           1 |   .8235294   .0377551                      .7495310    .8975279
-------------+----------------------------------------------------------------
        diff |   .0540216   .0502584                     -.0444830    .1525263
             |  under Ho:   .0504892     1.07   0.285
------------------------------------------------------------------------------
        diff = prop(0) - prop(1)                                  z =   1.0700
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.8576         Pr(|Z| > |z|) = 0.2848          Pr(Z > z) = 0.1424

R Output
[1] "Completion by treatment:"

   FALSE TRUE
0     12   86
1     18   84

	2-sample test for equality of proportions with continuity correction

data:  table(df$treatment, df$completed)
X-squared = 0.98696, df = 1, p-value = 0.3206
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.04686043  0.15503133
sample estimates:
   prop 1    prop 2
0.8775510 0.8235294

What to Do About Imbalance or Attrition

If you find imbalance or differential attrition:
1. Report it transparently in your paper
2. Control for imbalanced variables in your main specification
3. Show robustness with and without controls
4. Compute bounds on treatment effects (Lee bounds for attrition)
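
For step 4, the trimming idea behind Lee (2009) bounds can be sketched in Python. This is a minimal illustration, not a replacement for the leebounds command: it returns point estimates only (no confidence intervals, no tightening covariates) and relies on the monotonicity assumption that treatment affects attrition in one direction only. The function name lee_bounds and the simulated data are assumptions made for the example.

```python
import numpy as np

def lee_bounds(outcome, treatment):
    """Point estimates of Lee (2009) trimming bounds (illustrative sketch).

    outcome: array with np.nan for attrited units; treatment: 0/1 array.
    Returns (lower, upper, trim_share).
    """
    outcome = np.asarray(outcome, dtype=float)
    treatment = np.asarray(treatment)
    observed = ~np.isnan(outcome)

    # Sorted observed outcomes and completion rates in each arm
    y_t = np.sort(outcome[(treatment == 1) & observed])
    y_c = np.sort(outcome[(treatment == 0) & observed])
    q_t = observed[treatment == 1].mean()
    q_c = observed[treatment == 0].mean()

    if q_t >= q_c:
        # Treatment arm has more completers: trim the treatment arm
        trim_share = (q_t - q_c) / q_t
        n_trim = int(np.floor(trim_share * len(y_t)))
        lower = y_t[:len(y_t) - n_trim].mean() - y_c.mean()  # drop highest
        upper = y_t[n_trim:].mean() - y_c.mean()             # drop lowest
    else:
        # Control arm has more completers: trim the control arm
        trim_share = (q_c - q_t) / q_c
        n_trim = int(np.floor(trim_share * len(y_c)))
        lower = y_t.mean() - y_c[n_trim:].mean()             # drop lowest controls
        upper = y_t.mean() - y_c[:len(y_c) - n_trim].mean()  # drop highest controls
    return lower, upper, trim_share

# Simulated data mirroring the attrition counts above: 12 of 98 controls
# and 18 of 102 treated units have missing outcomes.
rng = np.random.default_rng(1)
treatment = np.array([0] * 98 + [1] * 102)
outcome = rng.normal(100, 15, treatment.size)
outcome[:12] = np.nan
outcome[98:116] = np.nan

lower, upper, trim_share = lee_bounds(outcome, treatment)
print(f"Trimmed share: {trim_share:.3f}")
print(f"Lee bounds on the treatment effect: [{lower:.2f}, {upper:.2f}]")
```

Here the control arm has the higher completion rate, so about 6% of observed control outcomes are trimmed from the top or the bottom to make the two arms comparable. The naive completers-only difference always lies between the two bounds.
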

Tools and References
  • Stata: iebaltab from ietoolkit (ssc install ietoolkit)
  • R: cobalt package, tableone package
  • Reference: Lee, D. S. (2009). "Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects." Review of Economic Studies, 76(3), 1071-1102.