6B  Difference-in-Differences

~3 hours | Classic 2x2, Staggered Adoption, Event Studies

Classic 2x2 DiD

The simplest DiD compares changes in outcomes between a treatment group and control group before and after a policy change.
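Numerically, the DiD estimate is just a double difference of four group-period means. A quick hand computation using the cell means implied by the regression coefficients in the output below (the numbers are reconstructed from those coefficients, not new data):

```python
import pandas as pd

# Cell means implied by the regression coefficients shown in the output below
means = pd.DataFrame({
    'group':   ['control', 'control', 'treated', 'treated'],
    'period':  ['pre', 'post', 'pre', 'post'],
    'outcome': [3.2456, 3.7023, 3.3690, 4.4080],
}).set_index(['group', 'period'])['outcome']

change_treated = means['treated', 'post'] - means['treated', 'pre']  # 1.0390
change_control = means['control', 'post'] - means['control', 'pre']  # 0.4567
did = change_treated - change_control
print(f"DiD estimate: {did:.4f}")  # 0.5823
```

The regression formulation gives the same number as this double difference, plus standard errors.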

# Python: Classic 2x2 DiD
import pandas as pd
import statsmodels.formula.api as smf

# Data structure: unit, time, treated_group, post, outcome
# treated_group = 1 if unit is in treatment group
# post = 1 if period is after treatment

# Create interaction term
df['treat_post'] = df['treated_group'] * df['post']

# DiD regression
model = smf.ols('outcome ~ treated_group + post + treat_post', data=df).fit()
print(model.summary())

# With clustered standard errors
model_clustered = smf.ols('outcome ~ treated_group + post + treat_post',
                          data=df).fit(cov_type='cluster',
                                       cov_kwds={'groups': df['unit_id']})
* Stata: Classic 2x2 DiD

* Basic DiD regression
reg outcome i.treated_group##i.post, cluster(unit_id)

* Equivalent with manually created interaction
gen treat_post = treated_group * post
reg outcome treated_group post treat_post, cluster(unit_id)

* With controls
reg outcome i.treated_group##i.post age income, cluster(unit_id)
# R: Classic 2x2 DiD
library(fixest)

# Basic DiD
did_model <- feols(outcome ~ treated_group * post,
                   data = df,
                   cluster = ~ unit_id)
summary(did_model)

# Using lm with sandwich standard errors
library(sandwich)
library(lmtest)
model <- lm(outcome ~ treated_group * post, data = df)
coeftest(model, vcov = vcovCL, cluster = ~ unit_id)
Python Output
                            OLS Regression Results
==============================================================================
Dep. Variable:                outcome   R-squared:                       0.312
Model:                            OLS   Adj. R-squared:                  0.309
Method:                 Least Squares   F-statistic:                     89.23
Date:                Mon, 27 Jan 2026   Prob (F-statistic):           1.23e-45
Time:                        14:32:18   Log-Likelihood:                -1234.5
No. Observations:                1000   AIC:                             2477.
Df Residuals:                     996   BIC:                             2497.
Df Model:                           3
==============================================================================
                    coef    std err          t      P>|t|      [0.025   0.975]
------------------------------------------------------------------------------
Intercept         3.2456      0.089     36.467      0.000       3.071    3.420
treated_group     0.1234      0.126      0.979      0.328      -0.124    0.371
post              0.4567      0.126      3.624      0.000       0.210    0.704
treat_post        0.5823      0.178      3.271      0.001       0.233    0.932
==============================================================================

DiD Estimate: 0.582 (SE: 0.178, p < 0.01)
Interpretation: Treatment increased outcome by 0.58 units
Stata Output
Linear regression                               Number of obs     =      1,000
                                                F(3, 49)          =      42.17
                                                Prob > F          =     0.0000
                                                R-squared         =     0.3124
                                                Root MSE          =     1.2345

                  (Std. Err. adjusted for 50 clusters in unit_id)
------------------------------------------------------------------------------
             |               Robust
     outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.treated_~p |   .1234567   .1456789     0.85   0.401    -.1693456    .4162590
      1.post |   .4567891   .1234567     3.70   0.001     .2085234    .7050548
             |
 treated_g~p#|
        post |
        1 1  |   .5823456   .1789012     3.26   0.002     .2227890    .9419022
             |
       _cons |   3.245678   .1023456    31.71   0.000     3.039765    3.451591
------------------------------------------------------------------------------

DiD Estimate (1.treated_group#1.post): 0.582 (p = 0.002)
R Output
OLS estimation, Dep. Var.: outcome
Observations: 1,000
Standard-errors: Clustered (unit_id)
                   Estimate Std. Error t value  Pr(>|t|)
(Intercept)         3.24568    0.10235 31.7134   < 2e-16 ***
treated_group       0.12346    0.14568  0.8474    0.4012
post                0.45679    0.12346  3.6997    0.0005 ***
treated_group:post  0.58235    0.17890  3.2551    0.0020 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 1.234   Adj. R2: 0.309

DiD Estimate: 0.582   95% CI: [0.223, 0.942]
Interpretation: Treatment effect = 0.58 units (p = 0.002)

Two-Way Fixed Effects

With panel data, use unit and time fixed effects to control for time-invariant unit characteristics and common time shocks.
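Mechanically, TWFE is OLS after a "within" transformation that strips out unit means and year means. A minimal sketch on made-up numbers (illustrative only; in practice use PanelOLS/reghdfe/feols below, which also handle the standard errors):

```python
import pandas as pd

# Tiny balanced panel, made-up numbers: unit 1 treated from 2001 onward
df = pd.DataFrame({
    'unit_id': [1, 1, 1, 2, 2, 2],
    'year':    [2000, 2001, 2002] * 2,
    'treated': [0, 1, 1, 0, 0, 0],
    'outcome': [1.0, 3.0, 3.5, 2.0, 2.5, 3.0],
})

def within(s):
    # subtract unit means and year means, add back the grand mean
    return (s - s.groupby(df['unit_id']).transform('mean')
              - s.groupby(df['year']).transform('mean')
              + s.mean())

y_til, d_til = within(df['outcome']), within(df['treated'])
# OLS slope on the demeaned data = TWFE coefficient on 'treated'
beta = (d_til * y_til).sum() / (d_til * d_til).sum()
print(f"TWFE estimate: {beta:.3f}")
```

Running OLS of `outcome` on `treated` plus unit and year dummies returns the same slope; the demeaned form is just faster and is what the packages do internally.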

# Python: Two-Way Fixed Effects
import pandas as pd
from linearmodels.panel import PanelOLS

# Set panel index
df = df.set_index(['unit_id', 'year'])

# TWFE regression
model = PanelOLS.from_formula(
    'outcome ~ treated + controls',
    data=df,
    entity_effects=True,
    time_effects=True
)
results = model.fit(cov_type='clustered', cluster_entity=True)
print(results.summary)
* Stata: Two-Way Fixed Effects

* Set panel structure
xtset unit_id year

* TWFE with xtreg
xtreg outcome treated i.year, fe cluster(unit_id)

* Alternative: reghdfe (faster, more FE options)
* ssc install reghdfe
reghdfe outcome treated, absorb(unit_id year) cluster(unit_id)
# R: Two-Way Fixed Effects with fixest
library(fixest)

# TWFE regression
twfe_model <- feols(outcome ~ treated | unit_id + year,
                    data = df,
                    cluster = ~ unit_id)
summary(twfe_model)

# With covariates
twfe_model <- feols(outcome ~ treated + age + income | unit_id + year,
                    data = df,
                    cluster = ~ unit_id)
Python Output
                          PanelOLS Estimation Summary
================================================================================
Dep. Variable:                outcome   R-squared:                        0.4523
Estimator:                   PanelOLS   R-squared (Between):              0.3245
No. Observations:                5000   R-squared (Within):               0.4523
Date:                Mon, Jan 27 2026   R-squared (Overall):              0.3891
Time:                        14:35:22   Log-likelihood                   -6234.5
Cov. Estimator:             Clustered   F-statistic:                      156.78
Entities:                         500   P-value                           0.0000
Avg Obs:                       10.000   Distribution:                  F(2,4498)
Min Obs:                       10.000
Max Obs:                       10.000   F-statistic (robust):             89.234
Time periods:                      10   P-value                           0.0000
Avg Obs:                       500.00   Distribution:                  F(2,4498)
Min Obs:                       500.00
Max Obs:                       500.00

                             Parameter Estimates
================================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
--------------------------------------------------------------------------------
treated        0.5234     0.0912     5.7391     0.0000      0.3446      0.7022
controls       0.0345     0.0123     2.8049     0.0051      0.0104      0.0586
================================================================================

F-test for entity effects: 12.34 (p = 0.0000)
F-test for time effects:    8.91 (p = 0.0000)
Stata Output
. xtset unit_id year
       panel variable:  unit_id (strongly balanced)
        time variable:  year, 2010 to 2019
                delta:  1 unit

. reghdfe outcome treated, absorb(unit_id year) cluster(unit_id)
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                          Number of obs     =      5,000
Absorbing 2 HDFE groups                         F(1, 499)         =      32.89
Statistics robust to heteroskedasticity         Prob > F          =     0.0000
                                                R-squared         =     0.4523
                                                Adj R-squared     =     0.3891
                                                Within R-sq.      =     0.0234
Number of clusters (unit_id) =        500       Root MSE          =     0.8912

                 (Std. Err. adjusted for 500 clusters in unit_id)
------------------------------------------------------------------------------
             |               Robust
     outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     treated |   .5234123   .0912456     5.74   0.000     .3440234    .7028012
       _cons |   3.456789   .0456123    75.79   0.000     3.367123    3.546455
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE |  Categories - Loss   Coverage   Dups. |
-------------+---------------------------------------+
     unit_id |         500    500      99.8%       0 |
        year |          10     10     100.0%       0 |
-----------------------------------------------------+
R Output
OLS estimation, Dep. Var.: outcome
Observations: 5,000
Fixed-effects: unit_id: 500,  year: 10
Standard-errors: Clustered (unit_id)
        Estimate Std. Error t value  Pr(>|t|)
treated  0.52341    0.09125  5.7389  1.67e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.891   Adj. R2: 0.389   Within R2: 0.023

Wald test for joint significance of FEs:
  Unit FEs: F(499, 4490) = 12.34, p < 0.001
  Time FEs: F(9, 4490)   =  8.91, p < 0.001

Event Study Design

Event studies estimate treatment effects for each period relative to treatment, allowing you to visualize pre-trends and dynamic effects.
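In the simplest two-group, common-timing case, these coefficients are just the treated-minus-control gap at each relative period, re-centered on the gap in the omitted period t = -1. A hand computation with hypothetical gaps:

```python
import numpy as np

# Hypothetical treated-minus-control mean gaps by period relative to treatment
rel_time = np.arange(-3, 4)   # -3, -2, -1, 0, 1, 2, 3
gap = np.array([0.05, 0.04, 0.06, 0.35, 0.48, 0.55, 0.58])

# Re-center on the gap at t = -1 (the omitted reference period)
coefs = gap - gap[rel_time == -1]
print(dict(zip(rel_time, coefs.round(2))))
```

The re-centered pre-period values hover near zero (flat pre-trend) while the post-period values jump, which is exactly the pattern the event-study plot is meant to reveal. With covariates or staggered timing this simple equivalence breaks down and the regression form is needed.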

# Python: Event Study
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from linearmodels.panel import PanelOLS

# Create relative time variable
df['rel_time'] = df['year'] - df['treatment_year']

# Create dummies for each relative time (excluding -1 as reference)
rel_time_dummies = pd.get_dummies(df['rel_time'], prefix='rel').astype(int)
rel_time_dummies = rel_time_dummies.drop('rel_-1', axis=1)  # reference period
# Formulas choke on '-' in variable names, so recode e.g. 'rel_-5' -> 'rel_m5'
rel_time_dummies.columns = [c.replace('-', 'm') for c in rel_time_dummies.columns]
df = pd.concat([df, rel_time_dummies], axis=1)

# Event study regression (use only the dummies; 'rel_time' itself must not enter)
rel_cols = list(rel_time_dummies.columns)
formula = 'outcome ~ ' + ' + '.join(rel_cols)

df_panel = df.set_index(['unit_id', 'year'])
model = PanelOLS.from_formula(formula, data=df_panel,
                              entity_effects=True, time_effects=True)
results = model.fit(cov_type='clustered', cluster_entity=True)

# Plot event study coefficients
coefs = results.params[rel_cols]
ses = results.std_errors[rel_cols]

plt.figure(figsize=(10, 6))
plt.errorbar(range(len(coefs)), coefs, yerr=1.96*ses, fmt='o', capsize=3)
plt.axhline(y=0, color='r', linestyle='--')
plt.axvline(x=coefs.index.get_loc('rel_0')-0.5, color='gray', linestyle=':')
plt.xlabel('Periods Relative to Treatment')
plt.ylabel('Coefficient')
plt.title('Event Study')
plt.show()
* Stata: Event Study

* Create relative time
gen rel_time = year - treatment_year

* Event study with factor variables
* Note: Stata factor variables cannot take negative values; shift rel_time
* to be nonnegative first (e.g. replace rel_time = rel_time + 5, then ib4.)
reghdfe outcome ib(-1).rel_time, absorb(unit_id year) cluster(unit_id)

* Using eventdd (event study visualization)
* ssc install eventdd
eventdd outcome, timevar(rel_time) ///
    method(hdfe, absorb(unit_id year)) ///
    cluster(unit_id) graph

* Alternative: coefplot for manual plotting
reghdfe outcome ib(-1).rel_time, absorb(unit_id year) cluster(unit_id)
coefplot, keep(*.rel_time) vertical yline(0) xline(10.5)
# R: Event Study with fixest
library(fixest)

# Create relative time
df$rel_time <- df$year - df$treatment_year

# Event study with i() syntax
es_model <- feols(outcome ~ i(rel_time, ref = -1) | unit_id + year,
                  data = df,
                  cluster = ~ unit_id)
summary(es_model)

# Built-in event study plot
iplot(es_model,
      xlab = "Periods Relative to Treatment",
      main = "Event Study")

# Customized plot
iplot(es_model,
      ci.col = "steelblue",
      pt.pch = 19,
      grid = TRUE)
Python Output
Event Study Coefficients (reference: t = -1)
============================================
Period    Coef.   Std.Err.   95% CI
------    -----   --------   ------
t-5      -0.023      0.089   [-0.198, 0.152]
t-4       0.012      0.087   [-0.159, 0.183]
t-3      -0.045      0.091   [-0.223, 0.133]
t-2       0.034      0.088   [-0.138, 0.206]
t-1       [reference period]
t=0       0.312      0.095   [ 0.126, 0.498] **
t+1       0.456      0.098   [ 0.264, 0.648] ***
t+2       0.523      0.102   [ 0.323, 0.723] ***
t+3       0.578      0.108   [ 0.366, 0.790] ***
t+4       0.612      0.115   [ 0.387, 0.837] ***
t+5       0.634      0.121   [ 0.397, 0.871] ***

Pre-trend test (H0: all pre-treatment coefs = 0):
  F(4, 4490) = 0.89, p = 0.469
  [No evidence of differential pre-trends]

[Event Study Plot displayed showing flat pre-trend, positive post-treatment effects]
Stata Output
. reghdfe outcome ib(-1).rel_time, absorb(unit_id year) cluster(unit_id)

HDFE Linear regression                          Number of obs     =      5,000
Absorbing 2 HDFE groups                         F(10, 499)        =      18.23
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4891
                                                Adj R-squared     =     0.4234
Number of clusters (unit_id) =        500       Root MSE          =     0.8456

                 (Std. Err. adjusted for 500 clusters in unit_id)
------------------------------------------------------------------------------
             |               Robust
     outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    rel_time |
         -5  |  -.0234567   .0891234    -0.26   0.793    -.1986234    .1517100
         -4  |   .0123456   .0875123     0.14   0.888    -.1596234    .1843146
         -3  |  -.0456789   .0912345    -0.50   0.617    -.2250234    .1336656
         -2  |   .0345678   .0878912     0.39   0.694    -.1381234    .2072590
         -1  |          0  (omitted)
          0  |   .3123456   .0951234     3.28   0.001     .1254234    .4992678
          1  |   .4567891   .0978123     4.67   0.000     .2646234    .6489548
          2  |   .5234567   .1023456     5.11   0.000     .3224234    .7244900
          3  |   .5789012   .1078912     5.37   0.000     .3670234    .7907790
          4  |   .6123456   .1145678     5.34   0.000     .3873234    .8373678
          5  |   .6345678   .1212345     5.23   0.000     .3964234    .8727122
             |
       _cons |   3.234567   .0567891    56.96   0.000     3.123234    3.345900
------------------------------------------------------------------------------

. testparm L(-5/-2).rel_time

 ( 1)  -5.rel_time = 0
 ( 2)  -4.rel_time = 0
 ( 3)  -3.rel_time = 0
 ( 4)  -2.rel_time = 0

       F(  4,   499) =    0.89
            Prob > F =    0.4692
R Output
OLS estimation, Dep. Var.: outcome
Observations: 5,000
Fixed-effects: unit_id: 500,  year: 10
Standard-errors: Clustered (unit_id)
              Estimate Std. Error t value Pr(>|t|)
rel_time::-5  -0.02346    0.08912 -0.2632   0.7926
rel_time::-4   0.01235    0.08751  0.1411   0.8878
rel_time::-3  -0.04568    0.09123 -0.5007   0.6168
rel_time::-2   0.03457    0.08789  0.3932   0.6943
rel_time::0    0.31235    0.09512  3.2838   0.0011 **
rel_time::1    0.45679    0.09781  4.6700   < 1e-5 ***
rel_time::2    0.52346    0.10235  5.1145   < 1e-6 ***
rel_time::3    0.57890    0.10789  5.3656   < 1e-7 ***
rel_time::4    0.61235    0.11457  5.3448   < 1e-7 ***
rel_time::5    0.63457    0.12123  5.2346   < 1e-6 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.846

Joint test of pre-treatment coefficients:
  F(4, 499) = 0.89, p = 0.469

[iplot() displays event study figure with flat pre-trends and positive,
 increasing post-treatment effects]

Staggered Adoption: The Problem

When units are treated at different times, standard TWFE can produce biased estimates because it implicitly uses already-treated units as controls. When treatment effects are heterogeneous, the resulting estimate can be attenuated or even take the wrong sign (Goodman-Bacon, 2021).

The TWFE Problem with Staggered Treatment

Standard TWFE implicitly compares:

  • Newly-treated vs. never-treated (good)
  • Newly-treated vs. not-yet-treated (good)
  • Newly-treated vs. already-treated (problematic!)

The third comparison can produce negative weights, causing bias when treatment effects are heterogeneous across time or units.
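The bias is easy to produce in a tiny worked example: two cohorts, zero noise, and effects that grow with time since treatment. Every realized effect below is at least 1, yet TWFE returns 0.5 (a sketch with made-up numbers):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Early unit E treated from t=1 (effects 1, 2, 3); late unit L treated at t=3
# (effect 1). Baseline outcome is 0 for everyone; y = treatment effect only.
df = pd.DataFrame({
    'unit': ['E'] * 4 + ['L'] * 4,
    't':    [0, 1, 2, 3] * 2,
    'D':    [0, 1, 1, 1,  0, 0, 0, 1],
    'y':    [0, 1, 2, 3,  0, 0, 0, 1],
})

# Static TWFE: regress y on D with unit and time fixed effects
twfe = smf.ols('y ~ D + C(unit) + C(t)', data=df).fit().params['D']
true_att = df.loc[df['D'] == 1, 'y'].mean()  # realized effects: 1, 2, 3, 1
print(f"TWFE: {twfe:.2f}  vs. average realized effect: {true_att:.2f}")
```

The early cohort's growing effect gets absorbed into the time fixed effects when it serves as a "control" for the late cohort, dragging the pooled coefficient far below any individual effect.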

Modern DiD Estimators

Several new estimators address the staggered treatment problem. They differ in assumptions and aggregation methods but all avoid the negative weighting issue.
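The building block in Callaway-Sant'Anna, for example, is itself a plain 2x2 DiD: ATT(g, t) is the change from base period g-1 to period t for cohort g, minus the same change for the controls. A hand computation on noise-free, made-up data (the `att_gt` helper here is illustrative, not the packages' API):

```python
import pandas as pd

# Cohort g=2 treated from t=2 (effects 2 then 3); never-treated follow a +1 trend
df = pd.DataFrame({
    'unit':        ['g2'] * 4 + ['nev'] * 4,
    't':           [0, 1, 2, 3] * 2,
    'first_treat': [2] * 4 + [0] * 4,   # 0 = never treated
    'y':           [0, 1, 4, 6,  0, 1, 2, 3],
})

def att_gt(df, g, t):
    """ATT(g, t): 2x2 DiD from base period g-1 to t, never-treated as controls."""
    def change(mask):
        sub = df[mask].set_index('t')['y']
        return sub[t] - sub[g - 1]
    return change(df['first_treat'] == g) - change(df['first_treat'] == 0)

print(att_gt(df, 2, 2))  # recovers the true effect 2
print(att_gt(df, 2, 3))  # recovers the true effect 3
```

Because each ATT(g, t) compares a cohort only to clean controls, already-treated units never contaminate the comparison; the estimators then differ mainly in how these blocks are aggregated.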

Callaway-Sant'Anna (2021)

# Python: Callaway-Sant'Anna with csdid
# pip install csdid
from csdid import ATTgt

# Estimate group-time ATTs
att_gt = ATTgt(
    data=df,
    yname='outcome',
    tname='year',
    idname='unit_id',
    gname='first_treat',  # year of first treatment (0 if never)
    control_group='nevertreated'  # or 'notyettreated'
)
results = att_gt.fit()

# Aggregate to overall ATT
agg_overall = results.aggregate('simple')
print(agg_overall)

# Aggregate to event-study
agg_event = results.aggregate('event')
agg_event.plot()
* Stata: Callaway-Sant'Anna
* ssc install csdid

* Basic estimation
csdid outcome, ivar(unit_id) time(year) gvar(first_treat)

* Aggregate to overall ATT
csdid_stats simple

* Event study aggregation
csdid_stats event

* Plot event study
csdid_plot, style(rcap)
# R: Callaway-Sant'Anna with did package
library(did)

# Estimate group-time ATTs
out <- att_gt(
  yname = "outcome",
  tname = "year",
  idname = "unit_id",
  gname = "first_treat",
  data = df,
  control_group = "nevertreated"  # or "notyettreated"
)
summary(out)

# Aggregate to simple ATT
agg_simple <- aggte(out, type = "simple")
summary(agg_simple)

# Event study aggregation
agg_es <- aggte(out, type = "dynamic")
ggdid(agg_es)
Python Output
Callaway-Sant'Anna (2021) Group-Time ATT Estimates
==================================================
Control Group: Never Treated
Anticipation Periods: 0
Estimation Method: Doubly Robust (DR)

Group-Time Average Treatment Effects:
-------------------------------------
 Group | Time |   ATT   | Std.Err  |    [95% CI]
-------|------|---------|----------|----------------
  2014 | 2014 |  0.298  |  0.112   | [0.078, 0.518]
  2014 | 2015 |  0.412  |  0.098   | [0.220, 0.604]
  2014 | 2016 |  0.489  |  0.105   | [0.283, 0.695]
  2015 | 2015 |  0.267  |  0.108   | [0.055, 0.479]
  2015 | 2016 |  0.378  |  0.095   | [0.192, 0.564]
  2016 | 2016 |  0.234  |  0.115   | [0.009, 0.459]

Aggregated Treatment Effects:
-----------------------------
Simple ATT: 0.346 (SE: 0.089)
95% CI: [0.172, 0.520]
p-value: 0.0001

[Event study plot shows cohort-specific treatment effects]
Stata Output
. csdid outcome, ivar(unit_id) time(year) gvar(first_treat)

Callaway and Sant'Anna (2021) Doubly Robust DiD Estimator
Control Group: Never Treated
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 g2014 t2014 |   .2978123   .1123456     2.65   0.008     .0776234    .5180012
 g2014 t2015 |   .4123456   .0981234     4.20   0.000     .2200234    .6046678
 g2014 t2016 |   .4891234   .1052345     4.65   0.000     .2828678    .6953790
 g2015 t2015 |   .2671234   .1081234     2.47   0.014     .0552012    .4790456
 g2015 t2016 |   .3781234   .0952345     3.97   0.000     .1914678    .5647790
 g2016 t2016 |   .2341234   .1152345     2.03   0.042     .0082678    .4599790
------------------------------------------------------------------------------

. csdid_stats simple

ATT Aggregation: Simple
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ATT |   .3462345   .0891234     3.88   0.000     .1715567    .5209123
------------------------------------------------------------------------------

. csdid_stats event, window(-5 5)
[Event study table with coefficients for each relative time period]
R Output
Call:
att_gt(yname = "outcome", tname = "year", idname = "unit_id",
       gname = "first_treat", data = df, control_group = "nevertreated")

Group-Time Average Treatment Effects:
 Group Time    ATT     SE  [95% Pointwise Conf. Band]
  2014 2014 0.2978 0.1123   [0.0777, 0.5180]
  2014 2015 0.4123 0.0981   [0.2200, 0.6047]
  2014 2016 0.4891 0.1052   [0.2829, 0.6954]
  2015 2015 0.2671 0.1081   [0.0552, 0.4790]
  2015 2016 0.3781 0.0952   [0.1915, 0.5648]
  2016 2016 0.2341 0.1152   [0.0083, 0.4600]
Signif. codes: `*' confidence band does not cover 0

P-value for pre-test of parallel trends assumption: 0.523
Control Group: Never Treated
Anticipation Periods: 0
Estimation Method: Doubly Robust

> summary(agg_simple)
Call:
aggte(MP = out, type = "simple")

Overall ATT:
    ATT     SE         [95% CI]
 0.3462 0.0891 [0.1716, 0.5209] *

[ggdid() displays event study plot with proper aggregation]

de Chaisemartin-D'Haultfoeuille (2020)

* Stata: de Chaisemartin-D'Haultfoeuille
* ssc install did_multiplegt

* Basic estimation
did_multiplegt outcome unit_id year treatment, ///
    robust_dynamic dynamic(5) placebo(3) ///
    cluster(unit_id) breps(100)

* With controls
did_multiplegt outcome unit_id year treatment, ///
    robust_dynamic dynamic(5) placebo(3) ///
    controls(age income) cluster(unit_id)
# R: de Chaisemartin-D'Haultfoeuille with DIDmultiplegt
library(DIDmultiplegt)

# Basic estimation
result <- did_multiplegt(
  df = df,
  Y = "outcome",
  G = "unit_id",
  T = "year",
  D = "treatment",
  dynamic = 5,
  placebo = 3,
  cluster = "unit_id"
)
summary(result)

# Plot results
did_multiplegt_plot(result)
Stata Output
. did_multiplegt outcome unit_id year treatment, ///
>     robust_dynamic dynamic(5) placebo(3) cluster(unit_id) breps(100)

de Chaisemartin and D'Haultfoeuille (2020) Estimator
Robust to heterogeneous treatment effects
Treatment is first switched on at some point during the panel

Effect at time of treatment switch-on
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Effect_0 |   .3123456   .0912345     3.42   0.001     .1335291    .4911621
------------------------------------------------------------------------------

Dynamic effects
------------------------------------------------------------------------------
    Effect_1 |   .4234567   .0978123     4.33   0.000     .2317478    .6151656
    Effect_2 |   .4891234   .1023456     4.78   0.000     .2885295    .6897173
    Effect_3 |   .5234567   .1089123     4.81   0.000     .3099923    .7369211
    Effect_4 |   .5567891   .1145678     4.86   0.000     .3322403    .7813379
    Effect_5 |   .5789012   .1198234     4.83   0.000     .3440516    .8137508
------------------------------------------------------------------------------

Placebo tests (pre-treatment effects should be zero)
------------------------------------------------------------------------------
   Placebo_1 |  -.0123456   .0856789    -0.14   0.885    -.1802722    .1555810
   Placebo_2 |   .0234567   .0891234     0.26   0.792    -.1512220    .1981354
   Placebo_3 |  -.0345678   .0923456    -0.37   0.708    -.2155618    .1464262
------------------------------------------------------------------------------
Joint test for placebos: chi2(3) = 0.78, p = 0.854
R Output
de Chaisemartin and D'Haultfoeuille (2020) Estimator
====================================================

Instantaneous effect at treatment switch-on:
  Effect_0: 0.3123 (SE: 0.0912)
  95% CI: [0.134, 0.491]
  p-value: 0.0006

Dynamic effects (periods after treatment):
 Effect | Estimate |   SE   |     95% CI
--------|----------|--------|---------------
    1   |  0.4235  | 0.0978 | [0.232, 0.615]
    2   |  0.4891  | 0.1023 | [0.289, 0.690]
    3   |  0.5235  | 0.1089 | [0.310, 0.737]
    4   |  0.5568  | 0.1146 | [0.332, 0.781]
    5   |  0.5789  | 0.1198 | [0.344, 0.814]

Placebo tests (pre-treatment periods):
 Placebo | Estimate  |   SE   |      95% CI
---------|-----------|--------|-----------------
   -1    |  -0.0123  | 0.0857 | [-0.180, 0.156]
   -2    |   0.0235  | 0.0891 | [-0.151, 0.198]
   -3    |  -0.0346  | 0.0923 | [-0.216, 0.146]

Joint test for placebos: chi2(3) = 0.78, p = 0.854
[No evidence against parallel trends]

[did_multiplegt_plot() shows event study figure]

Sun-Abraham (2021)

* Stata: Sun-Abraham Interaction-Weighted Estimator
* ssc install eventstudyinteract

* Create cohort variable and relative time dummies
gen cohort = first_treat
gen never_treat = (first_treat == 0)

* Sun-Abraham estimator
eventstudyinteract outcome rel_time_*, ///
    cohort(cohort) control_cohort(never_treat) ///
    absorb(unit_id year) vce(cluster unit_id)
# R: Sun-Abraham with fixest
library(fixest)

# Create cohort indicator
df$cohort <- df$first_treat

# Sun-Abraham estimator using sunab()
sa_model <- feols(outcome ~ sunab(cohort, year) | unit_id + year,
                  data = df,
                  cluster = ~ unit_id)
summary(sa_model)

# Aggregate and plot
summary(sa_model, agg = "ATT")
iplot(sa_model)
Stata Output
. eventstudyinteract outcome rel_time_*, ///
>     cohort(cohort) control_cohort(never_treat) ///
>     absorb(unit_id year) vce(cluster unit_id)

Sun and Abraham (2021) Interaction-Weighted Estimator
Control cohort: never_treat

IW estimates by relative time to treatment
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  rel_time_5 |  -.0178234   .0892345    -0.20   0.842    -.1927199    .1570731
  rel_time_4 |   .0089123   .0878912     0.10   0.919    -.1633513    .1811759
  rel_time_3 |  -.0312456   .0912345    -0.34   0.732    -.2100621    .1475709
  rel_time_2 |   .0234567   .0891234     0.26   0.792    -.1512220    .1981354
  rel_time_1 |          0  (reference)
   rel_time0 |   .2891234   .0912456     3.17   0.002     .1102853    .4679615
   rel_time1 |   .4123456   .0956789     4.31   0.000     .2248173    .5998739
   rel_time2 |   .4789123   .0998234     4.80   0.000     .2832620    .6745626
   rel_time3 |   .5234567   .1045678     5.01   0.000     .3185074    .7284060
   rel_time4 |   .5567891   .1098234     5.07   0.000     .3415391    .7720391
   rel_time5 |   .5812345   .1145678     5.07   0.000     .3566857    .8057833
------------------------------------------------------------------------------

Average ATT across all post-treatment periods: 0.4736 (SE: 0.0923)
R Output
OLS estimation, Dep. Var.: outcome
Observations: 5,000
Fixed-effects: unit_id: 500,  year: 10
Standard-errors: Clustered (unit_id)
Sun and Abraham (2021) estimation using cohort-specific effects
                 Estimate Std. Error t value Pr(>|t|)
cohort::2014:-5   -0.0178     0.0892 -0.1996   0.8419
cohort::2014:-4    0.0089     0.0879  0.1014   0.9193
cohort::2014:-3   -0.0312     0.0912 -0.3426   0.7320
cohort::2014:-2    0.0235     0.0891  0.2632   0.7925
cohort::2014:0     0.2891     0.0912  3.1696   0.0016 **
cohort::2014:1     0.4123     0.0957  4.3101   < 1e-4 ***
cohort::2014:2     0.4789     0.0998  4.7978   < 1e-5 ***
cohort::2015:-4   -0.0145     0.0901 -0.1609   0.8723
...
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Aggregated ATT (average post-treatment effect):
    ATT     SE         [95% CI]
 0.4736 0.0923 [0.2927, 0.6545] ***

[iplot() shows Sun-Abraham event study with cohort-weighted estimates]

Parallel Trends Diagnostics

The parallel trends assumption is untestable, but you can provide supporting evidence by checking pre-treatment trends.

# Python: Parallel Trends Check
import matplotlib.pyplot as plt

# Plot mean outcomes by group over time
fig, ax = plt.subplots(figsize=(10, 6))

for treated, group_df in df.groupby('treated_group'):
    yearly_means = group_df.groupby('year')['outcome'].mean()
    label = 'Treated' if treated == 1 else 'Control'
    ax.plot(yearly_means.index, yearly_means.values, marker='o', label=label)

treatment_year = 2015  # first treated year (illustrative)
ax.axvline(x=treatment_year, color='r', linestyle='--', label='Treatment')
ax.legend()
ax.set_xlabel('Year')
ax.set_ylabel('Mean Outcome')
plt.show()

# Statistical test: pre-treatment coefficients = 0
# (from event study, test joint significance of pre-treatment dummies)
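The Stata snippet below includes a placebo test with fake treatment timing; the same idea in Python, sketched on made-up pre-period data with exactly parallel trends (so the placebo interaction should come out at zero):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative pre-period data: both groups trend up by +1 per year,
# treated group sits 0.2 higher throughout (levels are made up)
years = list(range(2010, 2015))
df_pre = pd.DataFrame({
    'year':          years * 2,
    'treated_group': [1] * 5 + [0] * 5,
    'outcome':       [3.4 + (y - 2010) for y in years]
                     + [3.2 + (y - 2010) for y in years],
})

# Pretend treatment happened in 2013, two years before the real policy
df_pre['fake_post'] = (df_pre['year'] >= 2013).astype(int)
m = smf.ols('outcome ~ treated_group * fake_post', data=df_pre).fit()
print(f"placebo DiD: {m.params['treated_group:fake_post']:.3f}")
```

With perfectly parallel trends the fake interaction is exactly zero; in real data a placebo estimate near zero (and insignificant) is the supporting evidence you are looking for.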
* Stata: Parallel Trends Diagnostics

* Visual check
collapse (mean) outcome, by(year treated_group)
twoway (connected outcome year if treated_group==1) ///
       (connected outcome year if treated_group==0), ///
       xline(2015, lpattern(dash)) ///
       legend(label(1 "Treated") label(2 "Control"))

* Test pre-treatment coefficients jointly = 0
reghdfe outcome ib(-1).rel_time, absorb(unit_id year) cluster(unit_id)
testparm i(-5/-2).rel_time   // joint test of the pre-treatment dummies

* Placebo test: fake treatment timing
gen fake_post = (year >= 2013)  // true treatment in 2015
gen fake_treat_post = treated_group * fake_post
reg outcome treated_group fake_post fake_treat_post if year < 2015
# R: Parallel Trends Diagnostics
library(ggplot2)
library(dplyr)

# Visual check
df %>%
  group_by(year, treated_group) %>%
  summarise(mean_outcome = mean(outcome)) %>%
  ggplot(aes(x = year, y = mean_outcome, color = factor(treated_group))) +
  geom_line() +
  geom_point() +
  geom_vline(xintercept = 2015, linetype = "dashed") +
  labs(color = "Group", y = "Mean Outcome")

# Test pre-treatment coefficients
es_model <- feols(outcome ~ i(rel_time, ref = -1) | unit_id + year,
                  data = df, cluster = ~ unit_id)

# Wald test for pre-treatment coefficients = 0
# (the keep argument is a regular expression matching the pre-period terms)
wald(es_model, keep = "rel_time::-")
Python Output
Parallel Trends Diagnostic
==========================

Mean Outcomes by Group and Year:
        Treated   Control   Difference
2010       3.12      3.08         0.04
2011       3.25      3.21         0.04
2012       3.38      3.33         0.05
2013       3.51      3.47         0.04
2014       3.64      3.59         0.05
-------- TREATMENT (2015) --------
2015       4.21      3.72         0.49
2016       4.35      3.84         0.51
2017       4.48      3.96         0.52
2018       4.61      4.09         0.52
2019       4.73      4.21         0.52

Pre-treatment difference is stable (~0.04-0.05)
Post-treatment gap widens to ~0.50

Joint test of pre-treatment coefficients:
  F(4, 495) = 0.76
  p-value = 0.552
Conclusion: Cannot reject parallel pre-trends

[Plot shows parallel lines pre-2015, diverging after treatment]
Stata Output
. testparm L(-5/-2).rel_time

 ( 1)  -5.rel_time = 0
 ( 2)  -4.rel_time = 0
 ( 3)  -3.rel_time = 0
 ( 4)  -2.rel_time = 0

       F(  4,   499) =    0.76
            Prob > F =    0.5523

Conclusion: Cannot reject H0 that all pre-treatment coefficients = 0
This supports (but does not prove) parallel trends

. reg outcome treated_group fake_post fake_treat_post if year < 2015

      Source |       SS           df       MS      Number of obs   =     2,500
-------------+----------------------------------   F(3, 2496)      =      0.89
       Model |  2.34567891         3  .781892970   Prob > F        =    0.4456
    Residual |  2189.12345     2,496  .877244171   R-squared       =    0.0011
-------------+----------------------------------   Adj R-squared   =   -0.0001
       Total |  2191.46913     2,499  .876939227   Root MSE        =    .93662

------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
treated_gr~p |   .0412345   .0456789     0.90   0.367    -.0483456    .1308146
   fake_post |   .1234567   .0478912     2.58   0.010     .0295234    .2173900
fake_treat~t |   .0078912   .0567891     0.14   0.889    -.1034567    .1192391
       _cons |   3.123456   .0345678    90.35   0.000     3.055678    3.191234
------------------------------------------------------------------------------

Placebo DiD estimate: 0.008 (p = 0.889)
[No spurious "effect" in pre-treatment period]
R Output
# Visual inspection shows parallel pre-trends
# (Plot displayed with parallel lines before treatment year)

Wald test for joint significance of pre-treatment coefficients:
  H0: All pre-treatment coefficients = 0
  Tested coefficients: rel_time::-5, rel_time::-4, rel_time::-3, rel_time::-2
  Wald stat: 3.04 on 4 df
  p-value: 0.5510
Conclusion: Cannot reject H0 (parallel trends assumption supported)

Note: This test has limited power. Even with p > 0.05:
  - Parallel trends could be violated
  - Small violations may be economically meaningful
  - Consider sensitivity analysis (Rambachan & Roth, 2023)

Pre-treatment coefficient magnitudes:
  All coefficients < 0.05 in absolute value
  All 95% CIs include zero
  No evidence of differential pre-trends
Key Papers on Modern DiD
  • Callaway, B. & Sant'Anna, P. (2021). "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics.
  • de Chaisemartin, C. & D'Haultfoeuille, X. (2020). "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." AER.
  • Sun, L. & Abraham, S. (2021). "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects." Journal of Econometrics.
  • Goodman-Bacon, A. (2021). "Difference-in-Differences with Variation in Treatment Timing." Journal of Econometrics.
  • Roth, J. (2022). "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." AER: Insights.