6A Matching
The Logic of Matching
The Fundamental Problem
We want to estimate the effect of a treatment D on outcome Y. The problem: treated and untreated units may differ systematically in ways that also affect Y. For example, people who participate in job training programs (treatment) may be more motivated—and motivation affects employment (outcome) regardless of training.
Matching solution: Find untreated units that "look like" treated units on observable characteristics X. If we can find comparison units with identical X values, comparing outcomes gives us a causal effect—provided there are no unmeasured confounders.
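The logic can be sketched in a toy example (all data and variable names here are illustrative): with a single binary confounder, the naive treated-vs-control comparison is biased, but comparing outcomes only within cells that share the same covariate value holds the confounder fixed.

```python
import pandas as pd

# Toy data: motivation confounds training (treatment) and the outcome
df = pd.DataFrame({
    'motivated': [1, 1, 0, 0, 1, 0],
    'treatment': [1, 0, 1, 0, 1, 0],
    'outcome':   [9, 7, 5, 4, 10, 6],
})

# Naive difference mixes motivated and unmotivated units
naive = (df.loc[df.treatment == 1, 'outcome'].mean()
         - df.loc[df.treatment == 0, 'outcome'].mean())

# Within-cell (exact-match) differences hold motivation fixed
cell = df.groupby(['motivated', 'treatment'])['outcome'].mean().unstack()
within = cell[1] - cell[0]
print(f"Naive: {naive:.2f}; within-cell effects: {within.to_dict()}")
```

The within-cell differences can then be averaged over the treated units' covariate distribution to recover the ATT.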
Visual Intuition: How Matching Works
[Figure: two panels. Left — treated (red) and control (blue) differ systematically. Right — compare only similar units (matched pairs).]
- Ho et al. (2007) - "Matching as Nonparametric Preprocessing" explains why matching is preprocessing, not a method itself
- Causal Inference: The Mixtape, Ch. 5 - Excellent intuitive treatment of matching
- Rosenbaum & Rubin (1983) - Original propensity score paper
Propensity Score Matching
The propensity score is the probability of treatment given observed covariates: e(X) = P(D=1|X). The key insight (Rosenbaum & Rubin, 1983) is that conditioning on the propensity score balances all covariates, reducing many dimensions to one.
The process has 4 steps: (1) Estimate propensity scores using logistic regression, (2) Match treated to control units with similar scores, (3) Verify balance improved on all covariates, (4) Estimate treatment effect on matched sample. Watch for overlap/common support—if propensity scores for treated and control don't overlap, matching is impossible.
# Python: Propensity Score Matching
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
# Step 1: Estimate propensity scores
X = df[['age', 'income', 'education']]
y = df['treatment']
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
df['pscore'] = model.predict_proba(X)[:, 1]
# Step 2: Match treated to nearest control
treated = df[df['treatment'] == 1]
control = df[df['treatment'] == 0]
nn = NearestNeighbors(n_neighbors=1, metric='euclidean')
nn.fit(control[['pscore']])
distances, indices = nn.kneighbors(treated[['pscore']])
# Step 3: Create matched sample
matched_control = control.iloc[indices.flatten()]
matched_df = pd.concat([treated.reset_index(drop=True),
                        matched_control.reset_index(drop=True)],
                       keys=['treated', 'control'])
# Step 4: Estimate ATT
att = treated['outcome'].mean() - matched_control['outcome'].mean()
print(f"ATT: {att:.3f}")
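The block above does not check the common-support condition flagged earlier. A minimal sketch of that check (the `df` here is a small illustrative stand-in for the data frame with `pscore` and `treatment` from Step 1) drops treated units whose scores fall outside the control group's range:

```python
import pandas as pd

# Illustrative scores; in practice use df['pscore'] from Step 1
df = pd.DataFrame({
    'treatment': [1, 1, 1, 0, 0, 0],
    'pscore':    [0.95, 0.60, 0.40, 0.55, 0.35, 0.20],
})

lo = df.loc[df.treatment == 0, 'pscore'].min()
hi = df.loc[df.treatment == 0, 'pscore'].max()
on_support = df['pscore'].between(lo, hi)

n_dropped = int(((df.treatment == 1) & ~on_support).sum())
print(f"Treated units off common support: {n_dropped}")

# Keep all controls plus treated units inside the control score range
df_trimmed = df[(df.treatment == 0) | on_support]
```

Report how many treated units were trimmed; the estimate then applies only to the on-support treated population.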
* Stata: Propensity Score Matching with teffects
* Method 1: teffects psmatch (built-in, recommended)
teffects psmatch (outcome) (treatment age income education), ///
    atet nn(1)
* With caliper (maximum allowed distance)
teffects psmatch (outcome) (treatment age income education), ///
    atet nn(1) caliper(0.1)
* Method 2: psmatch2 (community-contributed, more options)
* ssc install psmatch2
psmatch2 treatment age income education, ///
    outcome(outcome) logit common
* Check balance after matching
pstest age income education, treated(treatment)
# R: Propensity Score Matching with MatchIt
library(MatchIt)
library(cobalt)
# Estimate propensity scores and match
m.out <- matchit(treatment ~ age + income + education,
                 data = df,
                 method = "nearest",
                 distance = "glm",  # logistic regression
                 caliper = 0.1)
# Check match quality
summary(m.out)
# Extract matched data
matched_df <- match.data(m.out)
# Estimate ATT on matched sample
att_model <- lm(outcome ~ treatment, data = matched_df,
                weights = weights)
summary(att_model)
Interpreting PSM Results
Example output: ATT: 0.342 (SE: 0.089)
This means: among those who received treatment, the average effect was a 0.342-unit increase in the outcome compared to what they would have experienced without treatment. The ATT (Average Treatment Effect on the Treated) is the relevant estimand when evaluating a voluntary program: we want to know whether the program helped those who actually participated.
Key checks: (1) How many treated units were matched? (2) What's the average distance between matched pairs? (3) Did covariate balance improve? If you lost many treated units to trimming, your estimate applies only to the "matchable" population, not all treated units.
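Checks (1) and (2) can be computed directly from the matching output. A sketch on a simulated 1-NN propensity-score match (all scores and the caliper value are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
ps_treated = rng.uniform(0.3, 0.9, 50)    # simulated propensity scores
ps_control = rng.uniform(0.1, 0.7, 200)

# 1-NN match on the score, then enforce a caliper
idx = np.abs(ps_treated[:, None] - ps_control[None, :]).argmin(axis=1)
dist = np.abs(ps_treated - ps_control[idx])
caliper = 0.05
kept = dist <= caliper

print(f"(1) Matched treated units: {kept.sum()}/{len(ps_treated)}")
print(f"(2) Mean within-pair distance: {dist[kept].mean():.4f}")
# (3) Balance: recompute covariate SMDs using only the kept pairs
```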
Coarsened Exact Matching
CEM (Iacus et al., 2012) temporarily coarsens covariates into bins, performs exact matching within bins, then uses the original (non-coarsened) data for analysis. This avoids propensity score model dependence and guarantees a maximum imbalance bound.
Use CEM when you have a clear sense of what covariate values should be "close enough" (e.g., ages 30-35 are comparable). CEM discards more data but provides stronger balance guarantees. Use PSM when covariates are numerous or continuous without natural groupings.
# Python: Coarsened Exact Matching
import pandas as pd
import numpy as np
# Step 1: Coarsen variables into bins
df['age_bin'] = pd.cut(df['age'], bins=[18, 30, 45, 60, 100])
df['income_bin'] = pd.cut(df['income'], bins=5)
df['edu_bin'] = df['education'] # already categorical
# Step 2: Create strata
df['strata'] = df.groupby(['age_bin', 'income_bin', 'edu_bin']).ngroup()
# Step 3: Keep only matched strata (both treated and control)
strata_counts = df.groupby(['strata', 'treatment']).size().unstack(fill_value=0)
matched_strata = strata_counts[(strata_counts[0] > 0) & (strata_counts[1] > 0)].index
df_matched = df[df['strata'].isin(matched_strata)].copy()
# Step 4: Calculate CEM weights (treated = 1; controls reweighted
# so each stratum's controls count like its treated units)
counts = df_matched.groupby(['strata', 'treatment']).size().unstack()
df_matched['cem_weight'] = 1.0
is_control = df_matched['treatment'] == 0
df_matched.loc[is_control, 'cem_weight'] = (
    df_matched.loc[is_control, 'strata'].map(counts[1] / counts[0])
)
print(f"Matched: {len(df_matched)} of {len(df)} observations")
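The block stops before the effect estimate. A sketch of the final step, a weighted difference in means using the CEM weights (column names follow the example above; the data here are a small illustrative stand-in):

```python
import numpy as np
import pandas as pd

# Illustrative matched sample with CEM weights
df_matched = pd.DataFrame({
    'treatment':  [1, 1, 0, 0, 0],
    'outcome':    [10, 8, 6, 7, 5],
    'cem_weight': [1.0, 1.0, 0.5, 0.5, 1.0],
})

t = df_matched[df_matched.treatment == 1]
c = df_matched[df_matched.treatment == 0]
att = (np.average(t['outcome'], weights=t['cem_weight'])
       - np.average(c['outcome'], weights=c['cem_weight']))
print(f"ATT (CEM-weighted): {att:.3f}")
```

A weighted regression of the outcome on treatment (plus covariates) gives the same point estimate with standard errors.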
* Stata: Coarsened Exact Matching
* ssc install cem
* Basic CEM with automatic coarsening
cem age income education, treatment(treatment)
* Custom cutpoints for coarsening
cem age (#5) income (#4) education, treatment(treatment)
* (#5) requests 5 equally spaced cutpoints
* Estimate treatment effect with CEM weights
reg outcome treatment age income education [pw=cem_weights]
* Check match quality
imb age income education, treatment(treatment)
# R: Coarsened Exact Matching with MatchIt
library(MatchIt)
# CEM matching
m.cem <- matchit(treatment ~ age + income + education,
                 data = df,
                 method = "cem")
# With custom cutpoints
m.cem <- matchit(treatment ~ age + income + education,
                 data = df,
                 method = "cem",
                 cutpoints = list(age = c(30, 45, 60),
                                  income = "q4"))  # quartiles
# Extract matched data with weights
matched_df <- match.data(m.cem)
# Estimate ATT using weights
att_model <- lm(outcome ~ treatment + age + income + education,
                data = matched_df,
                weights = weights)
summary(att_model)
Nearest Neighbor Matching
Match on Mahalanobis distance (accounts for covariance) or Euclidean distance directly on covariates.
# Python: Mahalanobis Distance Matching
from scipy.spatial.distance import cdist
import numpy as np
covariates = ['age', 'income', 'education']
X_treated = df.loc[df['treatment'] == 1, covariates].values
X_control = df.loc[df['treatment'] == 0, covariates].values
# Calculate Mahalanobis distance matrix
cov_matrix = np.cov(df[covariates].values.T)
VI = np.linalg.inv(cov_matrix)
distances = cdist(X_treated, X_control, metric='mahalanobis', VI=VI)
# Find nearest neighbor for each treated
matches = np.argmin(distances, axis=1)
# Extract matched control outcomes
control_indices = df[df['treatment'] == 0].index
matched_outcomes = df.loc[control_indices[matches], 'outcome'].values
# ATT
att = df.loc[df['treatment'] == 1, 'outcome'].mean() - matched_outcomes.mean()
* Stata: Mahalanobis Distance Matching
* Using teffects nnmatch
teffects nnmatch (outcome age income education) (treatment), ///
    atet metric(maha) nn(1)
* With bias adjustment (Abadie-Imbens)
teffects nnmatch (outcome age income education) (treatment), ///
    atet metric(maha) nn(3) biasadj(age income education)
* Multiple matches with replacement
teffects nnmatch (outcome age income education) (treatment), ///
    atet metric(maha) nn(5)
# R: Mahalanobis Distance Matching
library(MatchIt)
# Mahalanobis distance matching
m.maha <- matchit(treatment ~ age + income + education,
                  data = df,
                  method = "nearest",
                  distance = "mahalanobis")
# Multiple matches (k:1)
m.maha <- matchit(treatment ~ age + income + education,
                  data = df,
                  method = "nearest",
                  distance = "mahalanobis",
                  ratio = 3)  # 3 controls per treated
# Mahalanobis matching with a caliper on the propensity score:
# distance = "glm" supplies the score for the caliper, while
# mahvars specifies the variables matched on Mahalanobis distance
m.maha <- matchit(treatment ~ age + income + education,
                  data = df,
                  method = "nearest",
                  distance = "glm",
                  mahvars = ~ age + income + education,
                  caliper = 0.25,
                  std.caliper = TRUE)
Balance Diagnostics
Always check that matching improved covariate balance. Report standardized mean differences (SMD) and variance ratios.
# Python: Balance Diagnostics
import numpy as np
import matplotlib.pyplot as plt

def calc_smd(df, var, treatment_col):
    """Calculate standardized mean difference."""
    treated = df[df[treatment_col] == 1][var]
    control = df[df[treatment_col] == 0][var]
    pooled_std = np.sqrt((treated.var() + control.var()) / 2)
    return (treated.mean() - control.mean()) / pooled_std
# Before and after matching
covariates = ['age', 'income', 'education']
smd_before = [calc_smd(df, v, 'treatment') for v in covariates]
smd_after = [calc_smd(matched_df, v, 'treatment') for v in covariates]
# Love plot
fig, ax = plt.subplots(figsize=(8, 5))
y_pos = np.arange(len(covariates))
ax.scatter(smd_before, y_pos, marker='o', label='Before', s=100)
ax.scatter(smd_after, y_pos, marker='s', label='After', s=100)
ax.axvline(x=0.1, color='r', linestyle='--', alpha=0.5)
ax.axvline(x=-0.1, color='r', linestyle='--', alpha=0.5)
ax.set_yticks(y_pos)
ax.set_yticklabels(covariates)
ax.set_xlabel('Standardized Mean Difference')
ax.legend()
plt.tight_layout()
plt.show()
* Stata: Balance Diagnostics
* After psmatch2
pstest age income education, treated(treatment) both graph
* Standardized differences table
tebalance summarize
* Density plots of propensity scores
tebalance density
* Overidentification test for balance
tebalance overid
# R: Balance Diagnostics with cobalt
library(cobalt)
# Balance table
bal.tab(m.out, un = TRUE, thresholds = c(m = 0.1))
# Love plot (visual balance check)
love.plot(m.out,
          binary = "std",
          thresholds = c(m = 0.1),
          var.order = "unadjusted")
# Density plots
bal.plot(m.out, var.name = "age", which = "both")
# Full balance summary
summary(m.out)
Interpreting Balance Diagnostics
Standardized Mean Difference (SMD): The difference in means between treated and control, divided by the pooled standard deviation. Rule of thumb: SMD < 0.1 indicates good balance; SMD < 0.25 is acceptable.
Variance Ratio: Ratio of variances between groups. Should be close to 1.0 (range 0.5-2.0 is acceptable).
What to report: Always show a balance table or Love plot comparing before/after matching. If balance is poor on key covariates, your matching has failed and you need to: (1) try different matching specifications, (2) include additional covariates, or (3) acknowledge the limitation.
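Both diagnostics are straightforward to compute by hand. A sketch for a single covariate on simulated groups (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
age_t = rng.normal(40, 5, 300)   # treated group
age_c = rng.normal(38, 6, 300)   # control group

# SMD: mean difference over the pooled standard deviation
smd = ((age_t.mean() - age_c.mean())
       / np.sqrt((age_t.var(ddof=1) + age_c.var(ddof=1)) / 2))
# Variance ratio: should be near 1
var_ratio = age_t.var(ddof=1) / age_c.var(ddof=1)
print(f"SMD: {smd:.3f} (want < 0.1), variance ratio: {var_ratio:.3f}")
```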
Doubly Robust Estimation
Doubly robust methods (Robins et al., 1994) combine propensity score weighting with outcome regression. The estimator is consistent if either the propensity score model OR the outcome model is correctly specified—you get "two chances" to get it right.
Standard matching relies entirely on the propensity score model. If it's misspecified, your estimate is biased. Doubly robust methods add an outcome model as insurance. With modern machine learning (SuperLearner, random forests), you can flexibly estimate both models without imposing restrictive functional forms.
# Python: Doubly Robust Estimation (AIPW)
from sklearn.linear_model import LogisticRegression, LinearRegression
import numpy as np
covariates = ['age', 'income', 'education']
X = df[covariates].values
D = df['treatment'].values
Y = df['outcome'].values
# Step 1: Estimate propensity scores
ps_model = LogisticRegression(max_iter=1000)
ps_model.fit(X, D)
pscore = ps_model.predict_proba(X)[:, 1]
# Step 2: Estimate outcome models
mu1_model = LinearRegression()
mu0_model = LinearRegression()
mu1_model.fit(X[D == 1], Y[D == 1])
mu0_model.fit(X[D == 0], Y[D == 0])
mu1 = mu1_model.predict(X) # E[Y(1)|X]
mu0 = mu0_model.predict(X) # E[Y(0)|X]
# Step 3: AIPW estimator
aipw_1 = mu1 + D * (Y - mu1) / pscore
aipw_0 = mu0 + (1 - D) * (Y - mu0) / (1 - pscore)
ate_aipw = np.mean(aipw_1 - aipw_0)
print(f"ATE (AIPW): {ate_aipw:.3f}")
* Stata: Doubly Robust with teffects
* AIPW (Augmented IPW); estimates the ATE
* (the atet statistic is not available with teffects aipw)
teffects aipw (outcome age income education) ///
    (treatment age income education)
* IPWRA (IPW + Regression Adjustment) does support atet
teffects ipwra (outcome age income education) ///
    (treatment age income education), atet
* With different link functions
teffects aipw (outcome age income education, poisson) ///
    (treatment age income education, probit)
# R: Doubly Robust with AIPW package
library(AIPW)
# Basic AIPW
aipw_obj <- AIPW$new(
  Y = df$outcome,
  A = df$treatment,
  W = df[, c("age", "income", "education")],
  Q.SL.library = "SL.glm",  # outcome model
  g.SL.library = "SL.glm"   # propensity model
)
# Fit and summarize
aipw_obj$fit()
aipw_obj$summary()
# With SuperLearner (ML ensemble)
library(SuperLearner)
aipw_ml <- AIPW$new(
  Y = df$outcome,
  A = df$treatment,
  W = df[, c("age", "income", "education")],
  Q.SL.library = c("SL.glm", "SL.ranger", "SL.xgboost"),
  g.SL.library = c("SL.glm", "SL.ranger")
)
aipw_ml$fit()
aipw_ml$summary()
All matching methods assume no unmeasured confounders—that once you condition on observed covariates, treatment assignment is as good as random. This is untestable. Use sensitivity analysis (e.g., Rosenbaum bounds) to assess robustness to hidden bias.
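For matched pairs, a minimal version of Rosenbaum's sensitivity analysis bounds the sign-test p-value when an unobserved confounder could multiply a unit's odds of treatment by up to Γ. This is a simplified sketch (full analyses typically use the Wilcoxon signed-rank statistic; the function name and example differences are illustrative):

```python
import numpy as np
from scipy.stats import binom

def rosenbaum_sign_bounds(diffs, gammas=(1.0, 1.5, 2.0)):
    """Upper bounds on the one-sided sign-test p-value under
    hidden bias of magnitude gamma."""
    d = np.asarray(diffs, dtype=float)
    d = d[d != 0]                     # drop tied pairs
    pos = int((d > 0).sum())          # pairs where the treated unit did better
    bounds = {}
    for g in gammas:
        p_plus = g / (1 + g)          # worst-case P(positive pair) under bias
        bounds[g] = binom.sf(pos - 1, len(d), p_plus)
    return bounds

diffs = [2.1, 1.4, -0.3, 0.8, 1.9, 0.5, -0.1, 1.2, 0.9, 1.6]
for g, p in rosenbaum_sign_bounds(diffs).items():
    print(f"Gamma={g}: p-value bound {p:.3f}")
```

If the conclusion survives only at Γ near 1, even a weak unmeasured confounder could overturn it; robustness at Γ of 2 or more is more reassuring.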
- Iacus, S., King, G., & Porro, G. (2012). "Causal Inference without Balance Checking: Coarsened Exact Matching." Political Analysis.
- Ho, D., Imai, K., King, G., & Stuart, E. (2007). "Matching as Nonparametric Preprocessing." Political Analysis.
- Abadie, A. & Imbens, G. (2006). "Large Sample Properties of Matching Estimators." Econometrica.