5B.4 Balance Checks

~2 hours Balance Tables, Attrition

Why Check Balance?
Creating Balance Tables
Interpreting Balance
Attrition Analysis

Why Check Balance?

Randomization should create treatment and control groups that are similar on average. A balance table compares baseline characteristics across groups to verify this. A balance check:

Detects implementation errors (e.g., bugs in the randomization code)
Identifies chance imbalances that may need adjustment
Is expected in most experimental papers (typically as Table 1)
Builds credibility that the estimated treatment effect isn't confounded

Creating Balance Tables

# Python: Balance table
import pandas as pd
import numpy as np
from scipy import stats

def balance_table(df, treatment_col, covariates):
    """Create a balance table comparing treatment vs control."""
    results = []

    for var in covariates:
        treat = df[df[treatment_col] == 1][var].dropna()
        control = df[df[treatment_col] == 0][var].dropna()

        # Means
        mean_t = treat.mean()
        mean_c = control.mean()
        diff = mean_t - mean_c

        # T-test for difference
        t_stat, p_val = stats.ttest_ind(treat, control)

        # Standardized difference
        pooled_sd = np.sqrt((treat.var() + control.var()) / 2)
        std_diff = diff / pooled_sd if pooled_sd > 0 else 0

        results.append({
            'Variable': var,
            'Control Mean': f"{mean_c:.3f}",
            'Treatment Mean': f"{mean_t:.3f}",
            'Difference': f"{diff:.3f}",
            'Std. Diff.': f"{std_diff:.3f}",
            'P-value': f"{p_val:.3f}"
        })

    return pd.DataFrame(results)

# Create sample data
np.random.seed(42)
n = 200
df = pd.DataFrame({
    'treatment': np.random.binomial(1, 0.5, n),
    'age': np.random.normal(35, 10, n),
    'income': np.random.normal(50000, 15000, n),
    'education': np.random.normal(14, 2, n),
    'female': np.random.binomial(1, 0.5, n)
})

# Generate balance table
covariates = ['age', 'income', 'education', 'female']
balance = balance_table(df, 'treatment', covariates)
print(balance.to_string(index=False))

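* Stata: Balance table with iebaltab (World Bank's ietoolkit)
* Install: ssc install ietoolkit
* Note: option names differ across ietoolkit versions; releases from v7 on
* use savexlsx() rather than save(). Check "help iebaltab" for your version.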

* Basic balance table
iebaltab age income education female, ///
    grpvar(treatment) ///
    save("balance_table.xlsx") replace

* With additional options
iebaltab age income education female, ///
    grpvar(treatment) ///
    ftest ///
    stdev ///
    starlevels(0.1 0.05 0.01) ///
    save("balance_table.xlsx") replace

* Manual approach
foreach var in age income education female {
    ttest `var', by(treatment)
}

# R: Balance table with cobalt package
library(cobalt)

# Create sample data
set.seed(42)
n <- 200
df <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  age = rnorm(n, 35, 10),
  income = rnorm(n, 50000, 15000),
  education = rnorm(n, 14, 2),
  female = rbinom(n, 1, 0.5)
)

# Create balance table
balance <- bal.tab(
  treatment ~ age + income + education + female,
  data = df,
  binary = "std"  # Standardize binary variables
)
print(balance)

Python Output

  Variable Control Mean Treatment Mean Difference Std. Diff. P-value
       age       34.821         35.432      0.611      0.062   0.571
    income    49823.456      50412.789    589.333      0.041   0.712
 education       13.912         14.087      0.175      0.089   0.423
    female        0.485          0.520      0.035      0.070   0.593

Stata Output

. iebaltab age income education female, grpvar(treatment) ftest stdev

Balance table

                      (1)             (2)
                  Control       Treatment      Diff        P-value
                    Mean            Mean      (2)-(1)
                   [SD]            [SD]
--------------------------------------------------------------------
age                34.82           35.43       0.61        0.571
                  [9.84]          [9.92]
income          49823.46        50412.79     589.33        0.712
              [14523.12]      [14812.45]
education          13.91           14.09       0.18        0.423
                  [1.98]          [1.95]
female              0.49            0.52       0.04        0.593
                  [0.50]          [0.50]
--------------------------------------------------------------------
N                     98             102

F-test of joint significance: F(4, 195) = 0.23, p = 0.921

Balance table saved to: balance_table.xlsx

R Output

Balance Measures
                Type Diff.Un
age          Contin.  0.0621
income       Contin.  0.0408
education    Contin.  0.0892
female       Binary   0.0700

Sample sizes
    Control Treated
All      98     102

Effective sample sizes
    Control Treated
All      98     102

Interpreting Balance

What to Look For

Metric	Good Balance	Concern
P-values	Most > 0.05, uniformly distributed	Many < 0.05, or one very small
Standardized differences	|d| < 0.1	|d| > 0.25
Joint F-test	p > 0.05	p < 0.05
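
The Python balance table above does not report a joint test, so the following is a minimal sketch of one way to compute it: regress the treatment indicator on all covariates by OLS and test whether the covariate coefficients are jointly zero. The function name joint_f_test and the simulated data are illustrative assumptions, and the test uses the classical (homoskedastic) F-statistic, which may differ from what a given package reports.

```python
import numpy as np
from scipy import stats

def joint_f_test(treatment, covariates):
    """F-test that all covariate coefficients are zero when
    treatment is regressed on the covariates (classical OLS)."""
    y = np.asarray(treatment, dtype=float)
    X = np.asarray(covariates, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])          # add an intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)  # OLS coefficients
    ssr_full = np.sum((y - design @ beta) ** 2)        # residual SS with covariates
    ssr_null = np.sum((y - y.mean()) ** 2)             # residual SS, intercept only
    df_num, df_den = k, n - k - 1
    f_stat = ((ssr_null - ssr_full) / df_num) / (ssr_full / df_den)
    p_val = stats.f.sf(f_stat, df_num, df_den)         # upper-tail probability
    return f_stat, p_val, df_num, df_den

# Simulated data (hypothetical), mirroring the covariates used above
rng = np.random.default_rng(0)
n = 200
treatment = rng.binomial(1, 0.5, n)
covs = np.column_stack([
    rng.normal(35, 10, n),        # age
    rng.normal(50000, 15000, n),  # income
    rng.normal(14, 2, n),         # education
    rng.binomial(1, 0.5, n),      # female
])

f_stat, p_val, df_num, df_den = joint_f_test(treatment, covs)
print(f"F({df_num}, {df_den}) = {f_stat:.2f}, p = {p_val:.3f}")
```

With four covariates and 200 observations the degrees of freedom are (4, 195). A large p-value is consistent with balance; it does not prove that the groups are identical.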

On Statistical Significance

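With many baseline variables, some will differ "significantly" by chance alone: under successful randomization, about 5% of tests are expected to reject at α = 0.05. One or two significant differences are therefore not cause for alarm. Focus on (1) the overall pattern, (2) large standardized differences, and (3) imbalance on variables that theory suggests strongly predict the outcome.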
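
To make the arithmetic concrete, here is a small sketch of how many chance rejections to expect as the number of covariates grows. It assumes the tests are independent, which is only an approximation when covariates are correlated; the function names are illustrative.

```python
def expected_false_positives(k, alpha=0.05):
    """Expected number of chance rejections across k tests of true nulls."""
    return k * alpha

def prob_at_least_one(k, alpha=0.05):
    """Probability of at least one chance rejection across k
    independent tests of true nulls."""
    return 1 - (1 - alpha) ** k

for k in (4, 10, 20):
    print(f"{k:>2} covariates: expected rejections = "
          f"{expected_false_positives(k):.2f}, "
          f"P(at least one) = {prob_at_least_one(k):.3f}")
```

With 20 covariates, one chance rejection is expected on average, and the probability of seeing at least one is roughly 64%.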

Attrition Analysis

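Attrition occurs when participants drop out of the study or their outcomes are never measured. Differential attrition, where dropout rates or the types of people who drop out differ between groups, can bias results because the remaining treatment and control samples are no longer comparable.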
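
The following simulation sketches why this matters. The true treatment effect is zero by construction, and the attrition rule (treated participants with the lowest outcomes drop out) is a hypothetical assumption chosen to make the bias visible.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
treatment = rng.binomial(1, 0.5, n)
outcome = rng.normal(100, 15, n)  # same distribution in both groups: true effect is zero

# Hypothetical attrition rule: treated units in the bottom 20% of
# treated outcomes drop out; no one drops out of the control group
cutoff = np.quantile(outcome[treatment == 1], 0.20)
completed = ~((treatment == 1) & (outcome < cutoff))

def diff_in_means(y, d):
    """Treatment mean minus control mean."""
    return y[d == 1].mean() - y[d == 0].mean()

full_sample = diff_in_means(outcome, treatment)
completers = diff_in_means(outcome[completed], treatment[completed])

print(f"Full-sample estimate:     {full_sample:.2f}")
print(f"Completers-only estimate: {completers:.2f}")
```

Because the lowest treated outcomes are removed, the completers-only estimate is pushed upward relative to the full-sample estimate even though the treatment does nothing.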

Checking for Differential Attrition

# Python: Attrition analysis
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency

# Create sample data with some attrition
np.random.seed(42)
n = 200
df = pd.DataFrame({
    'treatment': np.random.binomial(1, 0.5, n),
    'age': np.random.normal(35, 10, n),
    'outcome': np.random.normal(100, 15, n)
})
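# Introduce some attrition (15% overall, higher in treatment: 18 treated, 12 control)
# replace=False samples without replacement, so exactly 18 and 12 distinct rows go missing
df.loc[np.random.choice(df[df['treatment']==1].index, 18, replace=False), 'outcome'] = np.nan
df.loc[np.random.choice(df[df['treatment']==0].index, 12, replace=False), 'outcome'] = np.nan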

# 1. Overall attrition rate by treatment
df['completed'] = df['outcome'].notna().astype(int)
attrition = df.groupby('treatment')['completed'].mean()
print("Completion rates by treatment:")
print(attrition)

# 2. Test if attrition differs by treatment
# (for a 2x2 table, chi2_contingency applies Yates' continuity correction by default)
contingency = pd.crosstab(df['treatment'], df['completed'])
chi2, p, dof, expected = chi2_contingency(contingency)
print("\nChi-squared test for differential attrition:")
print(f"Chi2 = {chi2:.3f}, p-value = {p:.4f}")

* Stata: Attrition analysis

* 1. Completion rates
gen completed = !missing(outcome)
tab treatment completed, row

* 2. Test for differential attrition
prtest completed, by(treatment)

* 3. Lee bounds for treatment effects under selection
* (Bounds on treatment effect accounting for differential attrition)
* ssc install leebounds
leebounds outcome treatment

* 4. Balance among completers vs full sample
iebaltab age income education female if completed==1, ///
    grpvar(treatment) save("balance_completers.xlsx") replace

# R: Attrition analysis
set.seed(42)
n <- 200

# Create sample data with some attrition
df <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  age = rnorm(n, 35, 10),
  outcome = rnorm(n, 100, 15)
)
# Introduce attrition
df$outcome[sample(which(df$treatment == 1), 18)] <- NA
df$outcome[sample(which(df$treatment == 0), 12)] <- NA

# 1. Completion rates by treatment
df$completed <- !is.na(df$outcome)
print("Completion by treatment:")
print(table(df$treatment, df$completed))

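# 2. Test for differential attrition
# prop.test treats the first column as "successes", so put completers (TRUE)
# first to get completion rates in the reported estimates
result <- prop.test(table(df$treatment, df$completed)[, c("TRUE", "FALSE")])
print(result)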

Python Output

Completion rates by treatment:
treatment
0    0.877551
1    0.823529
Name: completed, dtype: float64

Chi-squared test for differential attrition:
Chi2 = 0.987, p-value = 0.3206

Stata Output

. gen completed = !missing(outcome)

. tab treatment completed, row

           |       completed
 treatment |         0          1 |     Total
-----------+----------------------+----------
         0 |        12         86 |        98
           |     12.24      87.76 |    100.00
-----------+----------------------+----------
         1 |        18         84 |       102
           |     17.65      82.35 |    100.00
-----------+----------------------+----------
     Total |        30        170 |       200
           |     15.00      85.00 |    100.00

. prtest completed, by(treatment)

Two-sample test of proportions                     0: Number of obs =       98
                                                   1: Number of obs =      102
------------------------------------------------------------------------------
       Group |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           0 |   .8775510   .0331389                      .8126000    .9425021
           1 |   .8235294   .0377551                      .7495310    .8975279
-------------+----------------------------------------------------------------
        diff |   .0540216   .0502584                     -.0444830    .1525263
             |  under Ho:   .0504892     1.07   0.285
------------------------------------------------------------------------------
        diff = prop(0) - prop(1)                                  z =   1.0700
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.8576         Pr(|Z| > |z|) = 0.2848          Pr(Z > z) = 0.1424

R Output

[1] "Completion by treatment:"

   FALSE TRUE
0     12   86
1     18   84

	2-sample test for equality of proportions with continuity correction

data:  table(df$treatment, df$completed)
X-squared = 0.98696, df = 1, p-value = 0.3206
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.04686043  0.15503133
sample estimates:
   prop 1    prop 2
0.8775510 0.8235294

What to Do About Imbalance or Attrition

If you find imbalance or differential attrition:
1. Report it transparently in your paper
2. Control for imbalanced variables in your main specification
3. Show robustness with and without controls
4. Compute bounds on treatment effects (Lee bounds for attrition)
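
For step 4, the trimming idea behind Lee bounds can be sketched in Python as follows. This is a simplified illustration, not a replacement for the leebounds command: the function name lee_bounds is an assumption, the trimming count is rounded down to a whole number of observations, no standard errors or covariate tightening are computed, and the bounds rely on the assumption that treatment moves attrition in only one direction (monotonicity).

```python
import numpy as np

def lee_bounds(outcome, treatment):
    """Trimming bounds on the treatment effect; missing outcomes are NaN."""
    y = np.asarray(outcome, dtype=float)
    d = np.asarray(treatment).astype(int)
    observed = ~np.isnan(y)
    y_t = np.sort(y[(d == 1) & observed])  # observed treated outcomes, ascending
    y_c = np.sort(y[(d == 0) & observed])  # observed control outcomes, ascending
    q_t = observed[d == 1].mean()          # response rate, treatment
    q_c = observed[d == 0].mean()          # response rate, control

    if q_t >= q_c:
        # Treatment group has excess respondents: trim treated outcomes
        k = int(np.floor((q_t - q_c) / q_t * len(y_t)))
        lower = y_t[:len(y_t) - k].mean() - y_c.mean()  # drop the k highest
        upper = y_t[k:].mean() - y_c.mean()             # drop the k lowest
    else:
        # Control group has excess respondents: trim control outcomes
        k = int(np.floor((q_c - q_t) / q_c * len(y_c)))
        lower = y_t.mean() - y_c[k:].mean()             # drop the k lowest controls
        upper = y_t.mean() - y_c[:len(y_c) - k].mean()  # drop the k highest controls
    return lower, upper

# Simulated data (hypothetical): 18 treated and 12 control outcomes go missing
rng = np.random.default_rng(42)
n = 200
treatment = rng.binomial(1, 0.5, n)
outcome = rng.normal(100, 15, n)
outcome[rng.choice(np.flatnonzero(treatment == 1), 18, replace=False)] = np.nan
outcome[rng.choice(np.flatnonzero(treatment == 0), 12, replace=False)] = np.nan

lower, upper = lee_bounds(outcome, treatment)
print(f"Lee bounds on the treatment effect: [{lower:.2f}, {upper:.2f}]")
```

The wider the gap in response rates between groups, the more observations are trimmed and the wider the bounds become.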

Tools and References

Stata: iebaltab from ietoolkit (ssc install ietoolkit)
R: cobalt package, tableone package
Reference: Lee, D. S. (2009). "Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects." Review of Economic Studies, 76(3), 1071–1102.
