6B Difference-in-Differences
Classic 2x2 DiD
The simplest DiD compares changes in outcomes between a treatment group and control group before and after a policy change.
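Before running any regression, it helps to see that the 2x2 DiD estimate is just arithmetic on four group-period means; the interaction coefficient in the saturated regression recovers exactly this number. A minimal sketch with hypothetical means:

```python
# 2x2 DiD by hand: a difference of two before/after differences.
# All numbers are hypothetical, for illustration only.
treat_pre, treat_post = 20.0, 28.0   # treatment-group means
ctrl_pre, ctrl_post = 18.0, 22.0     # control-group means

# Change for the treated minus change for the controls
did = (treat_post - treat_pre) - (ctrl_post - ctrl_pre)
print(did)  # 4.0 — the treated gained 8, but 4 of that is the common trend
```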
# Python: Classic 2x2 DiD
import pandas as pd
import statsmodels.formula.api as smf
# Data structure: unit, time, treated_group, post, outcome
# treated_group = 1 if unit is in treatment group
# post = 1 if period is after treatment
# Create interaction term
df['treat_post'] = df['treated_group'] * df['post']
# DiD regression
model = smf.ols('outcome ~ treated_group + post + treat_post', data=df).fit()
print(model.summary())
# With clustered standard errors
model_clustered = smf.ols('outcome ~ treated_group + post + treat_post',
                          data=df).fit(cov_type='cluster',
                                       cov_kwds={'groups': df['unit_id']})
* Stata: Classic 2x2 DiD
* Basic DiD regression
reg outcome i.treated_group##i.post, cluster(unit_id)
* Equivalent with manually created interaction
gen treat_post = treated_group * post
reg outcome treated_group post treat_post, cluster(unit_id)
* With controls
reg outcome i.treated_group##i.post age income, cluster(unit_id)
# R: Classic 2x2 DiD
library(fixest)
# Basic DiD
did_model <- feols(outcome ~ treated_group * post,
                   data = df,
                   cluster = ~ unit_id)
summary(did_model)
# Using lm with sandwich standard errors
library(sandwich)
library(lmtest)
model <- lm(outcome ~ treated_group * post, data = df)
coeftest(model, vcov = vcovCL, cluster = ~ unit_id)
Two-Way Fixed Effects
With panel data, use unit and time fixed effects to control for time-invariant unit characteristics and common time shocks.
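To see why this works, here is a minimal sketch on invented data: the outcome is built from unit effects, year effects, and a known slope of 2.0 on a regressor `x`. The two-way within transformation (subtract unit means and year means, add back the grand mean) sweeps both sets of effects out exactly in a balanced panel and recovers the slope.

```python
import numpy as np
import pandas as pd

# Hypothetical balanced panel with known unit effects, year effects,
# and a true coefficient of 2.0 on x (no noise, for clarity)
rng = np.random.default_rng(0)
rows = [(i, t, x := rng.normal(), 1.5 * i - 0.7 * t + 2.0 * x)
        for i in range(10) for t in range(6)]
panel = pd.DataFrame(rows, columns=['unit_id', 'year', 'x', 'y'])

def demean(s):
    # Two-way within transformation (exact for a balanced panel)
    return (s - s.groupby(panel['unit_id']).transform('mean')
              - s.groupby(panel['year']).transform('mean') + s.mean())

beta = (demean(panel['x']) * demean(panel['y'])).sum() / (demean(panel['x']) ** 2).sum()
print(round(beta, 6))  # 2.0 — unit and year effects are removed exactly
```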
# Python: Two-Way Fixed Effects
import pandas as pd
from linearmodels.panel import PanelOLS
# Set panel index
df = df.set_index(['unit_id', 'year'])
# TWFE regression
# With from_formula, the effects are specified inside the formula itself
model = PanelOLS.from_formula(
    'outcome ~ treated + EntityEffects + TimeEffects',  # add covariates as needed
    data=df
)
results = model.fit(cov_type='clustered', cluster_entity=True)
print(results.summary)
* Stata: Two-Way Fixed Effects
* Set panel structure
xtset unit_id year
* TWFE with xtreg
xtreg outcome treated i.year, fe cluster(unit_id)
* Alternative: reghdfe (faster, more FE options)
* ssc install reghdfe
reghdfe outcome treated, absorb(unit_id year) cluster(unit_id)
# R: Two-Way Fixed Effects with fixest
library(fixest)
# TWFE regression
twfe_model <- feols(outcome ~ treated | unit_id + year,
                    data = df,
                    cluster = ~ unit_id)
summary(twfe_model)
# With covariates
twfe_model <- feols(outcome ~ treated + age + income | unit_id + year,
                    data = df,
                    cluster = ~ unit_id)
Event Study Design
Event studies estimate treatment effects for each period relative to treatment, allowing you to visualize pre-trends and dynamic effects.
# Python: Event Study
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from linearmodels.panel import PanelOLS
# Create relative time variable
df['rel_time'] = df['year'] - df['treatment_year']
# Create dummies for each relative time (excluding -1 as reference)
rel_time_dummies = pd.get_dummies(df['rel_time'], prefix='rel')
rel_time_dummies = rel_time_dummies.drop('rel_-1', axis=1) # reference period
df = pd.concat([df, rel_time_dummies], axis=1)
# Event study regression (use the array interface: dummy names such as
# 'rel_-5' are not valid terms in a formula)
rel_cols = list(rel_time_dummies.columns)
df_panel = df.set_index(['unit_id', 'year'])
model = PanelOLS(df_panel['outcome'], df_panel[rel_cols],
                 entity_effects=True, time_effects=True)
results = model.fit(cov_type='clustered', cluster_entity=True)
# Plot event study coefficients
coefs = results.params[rel_cols]
ses = results.std_errors[rel_cols]
plt.figure(figsize=(10, 6))
plt.errorbar(range(len(coefs)), coefs, yerr=1.96*ses, fmt='o', capsize=3)
plt.axhline(y=0, color='r', linestyle='--')
plt.axvline(x=coefs.index.get_loc('rel_0')-0.5, color='gray', linestyle=':')
plt.xlabel('Periods Relative to Treatment')
plt.ylabel('Coefficient')
plt.title('Event Study')
plt.show()
* Stata: Event Study
* Create relative time
gen rel_time = year - treatment_year
* Event study with factor variables
* (Stata factor variables cannot be negative, so shift rel_time first)
cap drop rel_shift
gen rel_shift = rel_time + 10   // adjust the shift to your data's longest lead
reghdfe outcome ib9.rel_shift, absorb(unit_id year) cluster(unit_id)
* Using eventdd (event study visualization)
* ssc install eventdd
eventdd outcome, timevar(rel_time) ///
    method(hdfe, absorb(unit_id year)) ///
    cluster(unit_id) graph
* Alternative: coefplot for manual plotting
reghdfe outcome ib9.rel_shift, absorb(unit_id year) cluster(unit_id)
coefplot, keep(*.rel_shift) vertical yline(0) xline(10.5)
# R: Event Study with fixest
library(fixest)
# Create relative time
df$rel_time <- df$year - df$treatment_year
# Event study with i() syntax
es_model <- feols(outcome ~ i(rel_time, ref = -1) | unit_id + year,
                  data = df,
                  cluster = ~ unit_id)
summary(es_model)
# Built-in event study plot
iplot(es_model,
xlab = "Periods Relative to Treatment",
main = "Event Study")
# Customized plot
iplot(es_model,
ci.col = "steelblue",
pt.pch = 19,
grid = TRUE)
Staggered Adoption: The Problem
When units are treated at different times, standard TWFE can produce biased estimates because it uses already-treated units as controls. Recent econometric work (Goodman-Bacon 2021; de Chaisemartin & D'Haultfoeuille 2020) shows that with heterogeneous treatment effects the TWFE estimate can be badly distorted, even taking the wrong sign.
Standard TWFE implicitly compares:
- Newly-treated vs. never-treated (good)
- Newly-treated vs. not-yet-treated (good)
- Newly-treated vs. already-treated (problematic!)
The third comparison can produce negative weights, causing bias when treatment effects are heterogeneous across time or units.
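This bias is easy to reproduce. The sketch below builds a hypothetical balanced panel (all numbers invented for illustration) with two adoption cohorts and treatment effects that grow with time since adoption, then computes the TWFE coefficient via the two-way within transformation, which is exact for a balanced panel:

```python
import pandas as pd

# Hypothetical panel: cohort A treated from 2003, cohort B from 2007,
# effects grow by 2 per year after adoption (heterogeneous dynamics)
rows = []
for u in range(40):
    first = 2003 if u < 20 else 2007
    for t in range(2000, 2011):
        d = int(t >= first)
        eff = 2.0 * (t - first + 1) * d
        rows.append((u, t, d, eff, 0.1 * u + 0.5 * (t - 2000) + eff))
sim = pd.DataFrame(rows, columns=['unit_id', 'year', 'treated', 'eff', 'outcome'])

def two_way_demean(s):
    # Exact two-way within transformation for a balanced panel
    return (s - s.groupby(sim['unit_id']).transform('mean')
              - s.groupby(sim['year']).transform('mean') + s.mean())

y_dm = two_way_demean(sim['outcome'])
d_dm = two_way_demean(sim['treated'])
twfe = (d_dm * y_dm).sum() / (d_dm ** 2).sum()
true_att = sim.loc[sim['treated'] == 1, 'eff'].mean()
print(f"TWFE: {twfe:.2f}   true ATT: {true_att:.2f}")  # TWFE ~0.43 vs ATT ~7.67
```

Because effects grow over time, the late-vs-already-treated comparisons subtract away the early cohort's growing effects, dragging the TWFE estimate far below the true average effect on the treated.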
Modern DiD Estimators
Several new estimators address the staggered treatment problem. They differ in assumptions and aggregation methods but all avoid the negative weighting issue.
Callaway-Sant'Anna (2021)
# Python: Callaway-Sant'Anna with csdid
# pip install csdid
from csdid import ATTgt
# Estimate group-time ATTs
att_gt = ATTgt(
    data=df,
    yname='outcome',
    tname='year',
    idname='unit_id',
    gname='first_treat',          # year of first treatment (0 if never)
    control_group='nevertreated'  # or 'notyettreated'
)
results = att_gt.fit()
# Aggregate to overall ATT
agg_overall = results.aggregate('simple')
print(agg_overall)
# Aggregate to event-study
agg_event = results.aggregate('event')
agg_event.plot()
* Stata: Callaway-Sant'Anna
* ssc install csdid
* Basic estimation
csdid outcome, ivar(unit_id) time(year) gvar(first_treat)
* Aggregate to overall ATT
csdid_stats simple
* Event study aggregation
csdid_stats event
* Plot event study
csdid_plot, style(rcap)
# R: Callaway-Sant'Anna with did package
library(did)
# Estimate group-time ATTs
out <- att_gt(
  yname = "outcome",
  tname = "year",
  idname = "unit_id",
  gname = "first_treat",
  data = df,
  control_group = "nevertreated"  # or "notyettreated"
)
summary(out)
# Aggregate to simple ATT
agg_simple <- aggte(out, type = "simple")
summary(agg_simple)
# Event study aggregation
agg_es <- aggte(out, type = "dynamic")
ggdid(agg_es)
de Chaisemartin-D'Haultfoeuille (2020)
* Stata: de Chaisemartin-D'Haultfoeuille
* ssc install did_multiplegt
* Basic estimation
did_multiplegt outcome unit_id year treatment, ///
    robust_dynamic dynamic(5) placebo(3) ///
    cluster(unit_id) breps(100)
* With controls
did_multiplegt outcome unit_id year treatment, ///
    robust_dynamic dynamic(5) placebo(3) ///
    controls(age income) cluster(unit_id)
# R: de Chaisemartin-D'Haultfoeuille with DIDmultiplegt
library(DIDmultiplegt)
# Basic estimation
result <- did_multiplegt(
  df = df,
  Y = "outcome",
  G = "unit_id",
  T = "year",
  D = "treatment",
  dynamic = 5,
  placebo = 3,
  cluster = "unit_id"
)
summary(result)
# Plot results
did_multiplegt_plot(result)
Sun-Abraham (2021)
* Stata: Sun-Abraham Interaction-Weighted Estimator
* ssc install eventstudyinteract
* Create cohort variable and control-cohort indicator
gen cohort = first_treat
gen never_treat = (first_treat == 0)
* Also create relative-time dummies rel_time_* by hand, omitting the
* reference period (e.g. t = -1) and avoiding minus signs in names
* Sun-Abraham estimator
eventstudyinteract outcome rel_time_*, ///
    cohort(cohort) control_cohort(never_treat) ///
    absorb(unit_id year) vce(cluster unit_id)
# R: Sun-Abraham with fixest
library(fixest)
# Create cohort indicator
df$cohort <- df$first_treat
# Sun-Abraham estimator using sunab()
sa_model <- feols(outcome ~ sunab(cohort, year) | unit_id + year,
                  data = df,
                  cluster = ~ unit_id)
summary(sa_model)
# Aggregate and plot
summary(sa_model, agg = "ATT")
iplot(sa_model)
Comparing All Estimators
A powerful robustness check is to plot multiple DiD estimators on the same figure, as in Braghieri, Levy, and Makarin (2022) on social media and mental health. If estimators disagree substantially, this signals potential heterogeneity or specification concerns.
Different estimators make different assumptions:
- TWFE: Assumes homogeneous treatment effects (can be biased with staggered adoption)
- Callaway-Sant'Anna: Uses never-treated or not-yet-treated as controls
- Sun-Abraham: Interaction-weighted approach with cohort-specific effects
- de Chaisemartin-D'Haultfoeuille: Robust to heterogeneous effects and switching treatments
- Borusyak et al.: Imputation approach using only untreated observations for counterfactuals
When all estimators agree, this strengthens your causal claims. When they diverge, investigate why.
# Python: Multi-Estimator Comparison Plot (Braghieri et al. style)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
# After running each estimator, collect the event-study coefficients
# Example structure: dict with estimator -> (rel_time, coef, ci_lower, ci_upper)
results = {
'TWFE': {'rel_time': rel_times, 'coef': twfe_coefs, 'ci_lo': twfe_ci_lo, 'ci_hi': twfe_ci_hi},
'Callaway-Sant\'Anna': {'rel_time': rel_times, 'coef': cs_coefs, 'ci_lo': cs_ci_lo, 'ci_hi': cs_ci_hi},
'Sun-Abraham': {'rel_time': rel_times, 'coef': sa_coefs, 'ci_lo': sa_ci_lo, 'ci_hi': sa_ci_hi},
'de Chaisemartin-D\'H': {'rel_time': rel_times, 'coef': dcdh_coefs, 'ci_lo': dcdh_ci_lo, 'ci_hi': dcdh_ci_hi},
}
# Create comparison plot
fig, ax = plt.subplots(figsize=(12, 7))
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
markers = ['o', 's', '^', 'D']
offsets = [-0.15, -0.05, 0.05, 0.15] # Horizontal jitter
for i, (name, data) in enumerate(results.items()):
    x = np.array(data['rel_time']) + offsets[i]
    y = data['coef']
    yerr = [np.array(y) - np.array(data['ci_lo']),
            np.array(data['ci_hi']) - np.array(y)]
    ax.errorbar(x, y, yerr=yerr, fmt=markers[i], color=colors[i],
                capsize=3, capthick=1.5, label=name, markersize=8, linewidth=1.5)
# Add reference lines
ax.axhline(y=0, color='black', linestyle='-', linewidth=0.8)
ax.axvline(x=-0.5, color='gray', linestyle='--', linewidth=1, alpha=0.7)
# Formatting
ax.set_xlabel('Periods Relative to Treatment', fontsize=12)
ax.set_ylabel('Treatment Effect Estimate', fontsize=12)
ax.set_title('DiD Estimator Comparison', fontsize=14, fontweight='bold')
ax.legend(loc='upper left', framealpha=0.9)
ax.set_xticks(range(min(rel_times), max(rel_times)+1))
plt.tight_layout()
plt.savefig('estimator_comparison.png', dpi=300, bbox_inches='tight')
plt.show()
* Stata: Multi-Estimator Comparison Plot (Braghieri et al. style)
* Run each estimator and store results, then combine in one figure
* 1. Run TWFE event study (shift rel_time: factor variables can't be negative)
cap drop rel_shift
gen rel_shift = rel_time + 10   // adjust the shift to your data
reghdfe outcome ib9.rel_shift, absorb(unit_id year) cluster(unit_id)
parmest, saving(twfe_results, replace)
* 2. Run Callaway-Sant'Anna
csdid outcome, ivar(unit_id) time(year) gvar(first_treat)
csdid_stats event
* Export results manually or use csdid's stored matrices
* 3. Run Sun-Abraham
eventstudyinteract outcome rel_time_*, cohort(cohort) ///
control_cohort(never_treat) absorb(unit_id year) cluster(unit_id)
matrix sa_coef = e(b_iw)
matrix sa_var = e(V_iw)
* 4. Run de Chaisemartin-D'Haultfoeuille
did_multiplegt outcome unit_id year treatment, ///
robust_dynamic dynamic(5) placebo(3) cluster(unit_id)
* Combine results into one dataset for plotting
clear
input str30 estimator rel_time coef ci_lo ci_hi
* ... manually enter or append saved results ...
end
* Create the comparison plot
twoway (rcap ci_lo ci_hi rel_time if estimator=="TWFE", ///
lcolor(navy) lwidth(medium)) ///
(scatter coef rel_time if estimator=="TWFE", ///
mcolor(navy) msymbol(O) msize(medium)) ///
(rcap ci_lo ci_hi rel_time if estimator=="CS", ///
lcolor(cranberry) lwidth(medium)) ///
(scatter coef rel_time if estimator=="CS", ///
mcolor(cranberry) msymbol(S) msize(medium)) ///
(rcap ci_lo ci_hi rel_time if estimator=="SA", ///
lcolor(forest_green) lwidth(medium)) ///
(scatter coef rel_time if estimator=="SA", ///
mcolor(forest_green) msymbol(T) msize(medium)) ///
(rcap ci_lo ci_hi rel_time if estimator=="dCDH", ///
lcolor(dkorange) lwidth(medium)) ///
(scatter coef rel_time if estimator=="dCDH", ///
mcolor(dkorange) msymbol(D) msize(medium)), ///
xline(-0.5, lcolor(gray) lpattern(dash)) ///
yline(0, lcolor(black)) ///
legend(order(2 "TWFE" 4 "Callaway-Sant'Anna" ///
6 "Sun-Abraham" 8 "de Chaisemartin-D'H") ///
rows(1) position(6)) ///
xtitle("Periods Relative to Treatment") ///
ytitle("Treatment Effect Estimate") ///
title("DiD Estimator Comparison")
graph export "estimator_comparison.png", replace width(1200)
# R: Multi-Estimator Comparison Plot (Braghieri et al. style)
library(tidyverse)
library(fixest)
library(did)
# Run each estimator and extract event-study coefficients
# 1. TWFE
twfe_model <- feols(outcome ~ i(rel_time, ref = -1) | unit_id + year,
data = df, cluster = ~unit_id)
# 2. Callaway-Sant'Anna
cs_out <- att_gt(yname = "outcome", tname = "year", idname = "unit_id",
gname = "first_treat", data = df, control_group = "nevertreated")
cs_agg <- aggte(cs_out, type = "dynamic")
# 3. Sun-Abraham
sa_model <- feols(outcome ~ sunab(first_treat, year) | unit_id + year,
data = df, cluster = ~unit_id)
# Extract coefficients into a tidy data frame
extract_coefs <- function(model, name) {
coefs <- coef(model)
se <- sqrt(diag(vcov(model)))
tibble(
estimator = name,
rel_time = parse_number(names(coefs)),
coef = coefs,
ci_lo = coefs - 1.96 * se,
ci_hi = coefs + 1.96 * se
)
}
all_results <- bind_rows(
extract_coefs(twfe_model, "TWFE"),
tibble(
estimator = "Callaway-Sant'Anna",
rel_time = cs_agg$egt,
coef = cs_agg$att.egt,
ci_lo = cs_agg$att.egt - 1.96 * cs_agg$se.egt,
ci_hi = cs_agg$att.egt + 1.96 * cs_agg$se.egt
),
  extract_coefs(sa_model, "Sun-Abraham"),
  # 4. Borusyak et al.: did_imp_es comes from the didimputation package
  # (estimated in the imputation section of this module)
  tibble(
estimator = "Borusyak et al.",
rel_time = did_imp_es$term,
coef = did_imp_es$estimate,
ci_lo = did_imp_es$conf.low,
ci_hi = did_imp_es$conf.high
)
)
# Create the comparison plot
ggplot(all_results, aes(x = rel_time, y = coef, color = estimator, shape = estimator)) +
geom_hline(yintercept = 0, color = "black", linewidth = 0.5) +
geom_vline(xintercept = -0.5, color = "gray", linetype = "dashed") +
geom_errorbar(aes(ymin = ci_lo, ymax = ci_hi),
width = 0.2, linewidth = 0.8,
position = position_dodge(width = 0.3)) +
geom_point(size = 3, position = position_dodge(width = 0.3)) +
scale_color_manual(values = c("TWFE" = "#1f77b4",
"Callaway-Sant'Anna" = "#ff7f0e",
"Sun-Abraham" = "#2ca02c",
"de Chaisemartin-D'H" = "#d62728",
"Borusyak et al." = "#9467bd")) +
scale_shape_manual(values = c("TWFE" = 16,
"Callaway-Sant'Anna" = 15,
"Sun-Abraham" = 17,
"de Chaisemartin-D'H" = 18,
"Borusyak et al." = 8)) +
labs(x = "Periods Relative to Treatment",
y = "Treatment Effect Estimate",
title = "DiD Estimator Comparison",
color = "Estimator", shape = "Estimator") +
theme_minimal(base_size = 12) +
theme(legend.position = "bottom",
panel.grid.minor = element_blank())
ggsave("estimator_comparison.png", width = 10, height = 6, dpi = 300)
- Braghieri, L., Levy, R., & Makarin, A. (2022). Social Media and Mental Health. American Economic Review, 112(11), 3660-3693. doi:10.1257/aer.20211218
Parallel Trends Diagnostics
The parallel trends assumption is untestable, but you can provide supporting evidence by checking pre-treatment trends.
# Python: Parallel Trends Check
import matplotlib.pyplot as plt
# Plot mean outcomes by group over time
fig, ax = plt.subplots(figsize=(10, 6))
for treated, group_df in df.groupby('treated_group'):
    yearly_means = group_df.groupby('year')['outcome'].mean()
    label = 'Treated' if treated == 1 else 'Control'
    ax.plot(yearly_means.index, yearly_means.values, marker='o', label=label)
ax.axvline(x=treatment_year, color='r', linestyle='--', label='Treatment')
ax.legend()
ax.set_xlabel('Year')
ax.set_ylabel('Mean Outcome')
plt.show()
# Statistical test: pre-treatment coefficients = 0
# (from event study, test joint significance of pre-treatment dummies)
* Stata: Parallel Trends Diagnostics
* Visual check
collapse (mean) outcome, by(year treated_group)
twoway (connected outcome year if treated_group==1) ///
    (connected outcome year if treated_group==0), ///
    xline(2015, lpattern(dash)) ///
    legend(label(1 "Treated") label(2 "Control"))
* Test pre-treatment coefficients jointly = 0
* (factor variables cannot be negative, so shift rel_time first)
cap drop rel_shift
gen rel_shift = rel_time + 10   // adjust the shift to your data
reghdfe outcome ib9.rel_shift, absorb(unit_id year) cluster(unit_id)
testparm i(0/8).rel_shift      // the dummies for negative rel_time values
* Placebo test: fake treatment timing
gen fake_post = (year >= 2013) // true treatment in 2015
gen fake_treat_post = treated_group * fake_post
reg outcome treated_group fake_post fake_treat_post if year < 2015
# R: Parallel Trends Diagnostics
library(ggplot2)
library(dplyr)
# Visual check
df %>%
  group_by(year, treated_group) %>%
  summarise(mean_outcome = mean(outcome)) %>%
  ggplot(aes(x = year, y = mean_outcome, color = factor(treated_group))) +
  geom_line() +
  geom_point() +
  geom_vline(xintercept = 2015, linetype = "dashed") +
  labs(color = "Group", y = "Mean Outcome")
# Test pre-treatment coefficients
library(fixest)
es_model <- feols(outcome ~ i(rel_time, ref = -1) | unit_id + year,
                  data = df, cluster = ~ unit_id)
# Joint Wald test that all pre-treatment coefficients are zero
wald(es_model, keep = "rel_time::-")
Goodman-Bacon Decomposition
TWFE DiD with staggered adoption is a weighted average of all possible 2×2 DiD comparisons. Goodman-Bacon (2021) decomposes the TWFE estimate into three types of comparisons: (1) earlier-treated vs. later-treated, (2) later-treated vs. earlier-treated, and (3) treated vs. never-treated. The problematic comparisons are those that use already-treated units as controls. Visualizing the weights vs. estimates by comparison type reveals whether bias is likely.
# Python: No direct package, but can implement manually
# or use R via rpy2. Conceptual implementation:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Get unique treatment cohorts
cohorts = df[df['treatment'] == 1].groupby('unit_id')['year'].min().unique()
# For each pair of cohorts, compute 2x2 DiD
# (Simplified — full implementation requires careful timing)
# In practice, use R's bacondecomp or Stata's bacondecomp
# Visualization of results (after computing in R/Stata)
# bacon_df = pd.read_csv('bacon_decomposition.csv')
# plt.scatter(bacon_df['weight'], bacon_df['estimate'],
# c=bacon_df['type'].map(color_map))
# plt.xlabel('Weight'); plt.ylabel('2x2 DiD Estimate')
# plt.axhline(y=0, linestyle='--', color='gray')
# plt.title('Goodman-Bacon Decomposition')
# plt.show()
* Install: ssc install bacondecomp
* Goodman-Bacon decomposition
bacondecomp outcome treatment, id(unit_id) time(year)
* The command automatically produces:
* - Table with 2x2 estimates, weights, and comparison types
* - Scatter plot of weight vs estimate
library(bacondecomp)
# Goodman-Bacon decomposition of TWFE
bacon_out <- bacon(outcome ~ treatment, data = df,
id_var = "unit_id", time_var = "year")
# View decomposition
print(bacon_out)
# Visualization: weight vs estimate by comparison type
library(ggplot2)
ggplot(bacon_out, aes(x = weight, y = estimate,
shape = type, color = type)) +
geom_point(size = 3) +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(x = "Weight", y = "2x2 DiD Estimate",
title = "Goodman-Bacon Decomposition",
color = "Comparison Type", shape = "Comparison Type") +
theme_minimal()
# Weighted average = TWFE coefficient
cat("TWFE estimate:", sum(bacon_out$estimate * bacon_out$weight), "\n")
- Goodman-Bacon, A. (2021). "Difference-in-Differences with Variation in Treatment Timing." Journal of Econometrics, 225(2), 254-277.
TWFE Weight Diagnostics
de Chaisemartin & D'Haultfoeuille (2020) show that TWFE weights can be negative. When weights are negative and treatment effects are heterogeneous, TWFE can be severely biased—even producing wrong signs. The twowayfeweights command computes and visualizes these weights. If many weights are negative, use robust estimators instead.
# Python: Use R via rpy2 or implement manually
# The key insight is computing the regression weights
# from the TWFE regression and checking for negatives
# Conceptual check via the two-way within transformation
# (sequential demeaning; exact for balanced panels)
df['treatment_dm'] = df['treatment'] - df.groupby('unit_id')['treatment'].transform('mean')
df['treatment_dm'] -= df.groupby('year')['treatment_dm'].transform('mean')
# The implicit TWFE weight on each treated observation is proportional to its
# demeaned treatment: negative treatment_dm for a treated observation means
# a negative weight
print("Treated observations with negative implicit weights:",
      (df.loc[df['treatment'] == 1, 'treatment_dm'] < 0).sum())
* Install: ssc install twowayfeweights
* Compute TWFE weights
twowayfeweights outcome unit_id year treatment, type(feTR)
* The command reports:
* - Number of positive and negative weights
* - Sum of negative weights (measure of potential bias)
* - Under what assumptions TWFE is valid
* With sensitivity analysis
twowayfeweights outcome unit_id year treatment, type(feTR) ///
summary_measures
library(TwoWayFEWeights)
# Compute TWFE weights
weights_out <- twowayfeweights(df, "outcome", "unit_id", "year", "treatment",
                               type = "feTR")
summary(weights_out)
# Check for negative weights
neg_weights <- weights_out$weights[weights_out$weights < 0]
cat("Number of negative weights:", length(neg_weights), "\n")
cat("Sum of negative weights:", sum(neg_weights), "\n")
# Histogram of weights
hist(weights_out$weights, breaks = 30, main = "TWFE Weights Distribution",
     xlab = "Weight", col = "steelblue")
abline(v = 0, lty = 2, lwd = 2)
If a substantial fraction of weights are negative, do NOT rely on TWFE. Use one of the robust estimators covered in this module instead.
Borusyak et al. Imputation Estimator
Borusyak, Jaravel & Spiess (2024) propose an imputation approach: (1) estimate the unit FE + time FE model using ONLY untreated observations, (2) impute counterfactual outcomes for treated observations, (3) the treatment effect is the difference between actual and imputed outcomes. This avoids the negative weighting problem entirely.
# Python: No direct package yet
# Conceptual implementation of the imputation approach:
import statsmodels.api as sm
import pandas as pd
# Step 1: Estimate FE model on untreated observations only
untreated = df[df['treatment'] == 0]
# Add unit and time dummies
untreated_dummies = pd.get_dummies(untreated[['unit_id', 'year']],
                                   columns=['unit_id', 'year'], dtype=float)
model_untreated = sm.OLS(untreated['outcome'], untreated_dummies).fit()
# Step 2: Predict counterfactual for treated observations
treated = df[df['treatment'] == 1]
treated_dummies = pd.get_dummies(treated[['unit_id', 'year']],
                                 columns=['unit_id', 'year'], dtype=float)
# Align columns (units never observed untreated have no estimated FE,
# so their counterfactuals are not identified)
treated_dummies = treated_dummies.reindex(columns=untreated_dummies.columns, fill_value=0)
y0_hat = model_untreated.predict(treated_dummies)
# Step 3: ATT = mean(Y1 - Y0_hat)
att = (treated['outcome'].values - y0_hat).mean()
print(f"Imputation ATT: {att:.4f}")
# In practice, use R's didimputation or Stata's did_imputation
* Install: ssc install did_imputation
* Borusyak et al. imputation estimator
* Static (overall) ATT
did_imputation outcome unit_id year first_treated
* Event study
did_imputation outcome unit_id year first_treated, ///
    horizons(0/5) pretrends(5)
* The command reports ATT by horizon (event time)
* with standard errors clustered at unit level
library(didimputation)
# Borusyak et al. imputation estimator
# Static ATT
did_imp <- did_imputation(data = df, yname = "outcome",
                          gname = "first_treated",  # cohort indicator
                          tname = "year", idname = "unit_id")
summary(did_imp)
# Event study with pre-trends
did_imp_es <- did_imputation(data = df, yname = "outcome",
                             gname = "first_treated",
                             tname = "year", idname = "unit_id",
                             horizon = TRUE, pretrends = TRUE)
summary(did_imp_es)
# Plot the event study (did_imputation returns a tidy data frame)
library(ggplot2)
ggplot(did_imp_es, aes(x = as.numeric(term), y = estimate)) +
  geom_point() +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Periods Relative to Treatment", y = "ATT")
- Borusyak, K., Jaravel, X., & Spiess, J. (2024). "Revisiting Event-Study Designs: Robust and Efficient Estimation." Review of Economic Studies, 91(6), 3253-3285.
Fuzzy Difference-in-Differences
Standard (sharp) DiD assumes perfect compliance—all units in the treatment group actually receive treatment. Fuzzy DiD handles imperfect compliance, analogous to fuzzy RDD. The idea: use the group×post interaction as an instrument for actual treatment. This yields a LATE for compliers, just as in IV estimation.
from linearmodels.iv import IV2SLS
from linearmodels.panel import PanelOLS
import pandas as pd

# Sharp DiD (panel with unit and year fixed effects)
panel = df.set_index(['unit_id', 'year'])
sharp = PanelOLS(panel['outcome'], panel[['treatment_actual']],
                 entity_effects=True, time_effects=True).fit(
    cov_type='clustered', cluster_entity=True)

# Fuzzy DiD: instrument actual treatment with assignment (group x post).
# IV2SLS has no fixed-effects absorption, so include unit and year
# dummies explicitly (fine for moderately sized panels)
fe_dummies = pd.get_dummies(df[['unit_id', 'year']],
                            columns=['unit_id', 'year'],
                            drop_first=True, dtype=float)
fe_dummies['const'] = 1.0
fuzzy = IV2SLS(df['outcome'],
               exog=fe_dummies,
               endog=df[['treatment_actual']],
               instruments=df[['treatment_assigned']]).fit(
    cov_type='clustered', clusters=df['unit_id'])
print(fuzzy.summary)
* Sharp DiD
reghdfe outcome treatment_actual, absorb(unit_id year) cluster(unit_id)
* Fuzzy DiD: IV within DiD framework
* treatment_assigned = group x post interaction (intent to treat)
ivreghdfe outcome (treatment_actual = treatment_assigned), ///
    absorb(unit_id year) cluster(unit_id)
* Or equivalently with xtivreg
xtivreg outcome (treatment_actual = treatment_assigned) i.year, ///
    fe vce(cluster unit_id)
* Check the first stage (ivreghdfe reports it when you add the first option)
ivreghdfe outcome (treatment_actual = treatment_assigned), ///
    absorb(unit_id year) cluster(unit_id) first
library(fixest)
# Sharp DiD (standard)
sharp <- feols(outcome ~ treatment_actual | unit_id + year, data = df)
# Fuzzy DiD: instrument actual treatment with assignment
# The DiD interaction (group x post) instruments for actual treatment
fuzzy <- feols(outcome ~ 1 | unit_id + year |
treatment_actual ~ treatment_assigned, data = df)
summary(fuzzy)
# The coefficient on treatment_actual is now the LATE
# (Local Average Treatment Effect for compliers)
# First-stage F-statistic
fitstat(fuzzy, "ivf") # should be > 10
Fuzzy DiD estimates the LATE for compliers—units whose actual treatment status changes because of the policy. This is the same interpretation as in IV/2SLS. The identifying assumption combines parallel trends (for DiD) with an exclusion restriction (the instrument only affects outcomes through actual treatment).
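In the 2x2 case this Wald-DiD logic reduces to simple arithmetic: the fuzzy DiD estimand is the intent-to-treat DiD in outcomes divided by the first-stage DiD in take-up rates. A sketch with hypothetical group-period means:

```python
# Hypothetical means: outcome (y_*) and treatment take-up rate (d_*)
y_treat_pre, y_treat_post = 10.0, 14.0
y_ctrl_pre, y_ctrl_post = 9.0, 11.0
d_treat_pre, d_treat_post = 0.0, 0.8   # 80% take up after the policy
d_ctrl_pre, d_ctrl_post = 0.0, 0.2     # 20% take up anyway

itt = (y_treat_post - y_treat_pre) - (y_ctrl_post - y_ctrl_pre)  # reduced form
first_stage = (d_treat_post - d_treat_pre) - (d_ctrl_post - d_ctrl_pre)
late = itt / first_stage
print(f"ITT={itt}, first stage={first_stage:.1f}, LATE={late:.2f}")
# ITT=2.0, first stage=0.6, LATE=3.33
```

This ratio is exactly what 2SLS with the group-by-post interaction as the instrument computes in the 2x2 design: the ITT effect is scaled up by the share of compliers.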
- de Chaisemartin, C. & D'Haultfoeuille, X. (2018). "Fuzzy Differences-in-Differences." Review of Economic Studies, 85(2), 999-1028.
HonestDiD Sensitivity Analysis
Standard parallel trends tests (pre-treatment coefficient = 0) have low power (Roth, 2022). HonestDiD (Rambachan & Roth, 2023) asks: how robust are your results to violations of parallel trends? It computes confidence intervals that remain valid even if pre-trends are violated by up to a magnitude M. The key output is a sensitivity plot showing how CIs change as the allowed violation M increases.
# Python: No direct package — use R via rpy2
# Conceptual approach:
import numpy as np
# After running event study, extract coefficients and vcov
# betahat = np.array([...]) # event study coefficients
# sigma = np.array([[...]]) # variance-covariance matrix
# The idea: for a given M (maximum violation of parallel trends),
# compute the worst-case bias and adjust confidence intervals
# M = 0: standard CI (assumes exact parallel trends)
# M > 0: wider CI (allows for some violation)
# In practice, use R's HonestDiD package:
# from rpy2.robjects.packages import importr
# honestdid = importr('HonestDiD')
# sensitivity = honestdid.createSensitivityResults_relativeMagnitudes(
# betahat=betahat, sigma=sigma,
# numPrePeriods=4, numPostPeriods=5)
print("Use R's HonestDiD package for sensitivity analysis")
print("See: https://github.com/asheshrambachan/HonestDiD")
* Install: ssc install honestdid
* Step 1: Run event study
* (factor variables cannot be negative, so shift rel_time first)
cap drop rel_shift
gen rel_shift = rel_time + 10   // adjust the shift to your data
reghdfe outcome ib9.rel_shift, absorb(unit_id year) cluster(unit_id)
* Step 2: HonestDiD sensitivity analysis
honestdid, pre(4) post(5) mvec(0 0.5 1 1.5 2)
* The command produces a sensitivity plot showing
* how confidence intervals widen as M increases.
* If CIs still exclude zero at reasonable M values,
* the result is robust to plausible parallel trends violations.
library(HonestDiD)
library(fixest)
# Step 1: Run event study
es <- feols(outcome ~ i(rel_time, ref = c(-1, -Inf)) | unit_id + year,
            data = df, cluster = ~unit_id)
# Step 2: Extract coefficients and variance-covariance matrix
betahat <- coef(es)
sigma <- vcov(es)
# Step 3: Sensitivity analysis — relative magnitudes approach
sensitivity <- createSensitivityResults_relativeMagnitudes(
  betahat = betahat,
  sigma = sigma,
  numPrePeriods = 4,             # number of pre-treatment periods
  numPostPeriods = 5,            # number of post-treatment periods
  Mbarvec = seq(0, 2, by = 0.5)  # grid of M values
)
# Step 4: Plot robust CIs against the original (M = 0) CI
original_ci <- constructOriginalCS(betahat = betahat, sigma = sigma,
                                   numPrePeriods = 4, numPostPeriods = 5)
createSensitivityPlot_relativeMagnitudes(sensitivity, original_ci)
# Original CI (M=0) vs robust CIs
print(sensitivity)
Roth (2022) shows that standard pre-trend tests have low power—failing to reject parallel trends does NOT mean trends are actually parallel. HonestDiD addresses this by asking how large parallel trends violations need to be before the conclusions change, providing a more informative robustness check.
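The mechanics can be sketched in a few lines. This is a deliberately stylized illustration of the relative-magnitudes idea with invented numbers: allow post-period bias up to Mbar times the largest observed pre-period deviation and widen the confidence interval accordingly. HonestDiD's actual procedure additionally accounts for estimation error in the pre-period coefficients and optimizes over violation paths, which this toy calculation ignores.

```python
import numpy as np

beta_post, se_post = 1.2, 0.3                # hypothetical post-period estimate
pre_coefs = np.array([0.05, -0.10, 0.08])    # hypothetical pre-period coefficients
max_pre_dev = np.abs(pre_coefs).max()        # largest observed pre-trend deviation

for m_bar in [0.0, 0.5, 1.0, 2.0]:
    worst_bias = m_bar * max_pre_dev         # allowed post-period bias
    lo = beta_post - 1.96 * se_post - worst_bias
    hi = beta_post + 1.96 * se_post + worst_bias
    verdict = "excludes zero" if lo > 0 else "includes zero"
    print(f"Mbar={m_bar}: [{lo:.3f}, {hi:.3f}] ({verdict})")
```

In this example the interval still excludes zero even at Mbar = 2, so a conclusion of a positive effect would survive parallel-trends violations twice as large as anything seen pre-treatment.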
- Rambachan, A. & Roth, J. (2023). "A More Credible Approach to Parallel Trends." Review of Economic Studies, 90(5), 2555-2591.
- Roth, J. (2022). "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." AER: Insights, 4(3), 305-322.
- Callaway, B. & Sant'Anna, P. H. C. (2021). "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, 225(2), 200-230.
- de Chaisemartin, C. & D'Haultfoeuille, X. (2020). "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." American Economic Review, 110(9), 2964-2996.
- Sun, L. & Abraham, S. (2021). "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects." Journal of Econometrics, 225(2), 175-199.
- Goodman-Bacon, A. (2021). "Difference-in-Differences with Variation in Treatment Timing." Journal of Econometrics, 225(2), 254-277.
- Roth, J. (2022). "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." American Economic Review: Insights, 4(3), 305-322.
- Borusyak, K., Jaravel, X., & Spiess, J. (2024). "Revisiting Event-Study Designs: Robust and Efficient Estimation." Review of Economic Studies, 91(6), 3253-3285.
- Rambachan, A. & Roth, J. (2023). "A More Credible Approach to Parallel Trends." Review of Economic Studies, 90(5), 2555-2591.
- de Chaisemartin, C. & D'Haultfoeuille, X. (2018). "Fuzzy Differences-in-Differences." Review of Economic Studies, 85(2), 999-1028.
- de Chaisemartin, C. & D'Haultfoeuille, X. (2022). "Two-Way Fixed Effects and Differences-in-Differences with Heterogeneous Treatment Effects: A Survey." Econometrics Journal, 25(3), 493-510.