Module 12: Difference-in-Differences: Classic 2x2, Staggered Treatment, and Parallel Trends
The simplest DiD design compares the change in outcomes for a treatment group with the change for a control group, before and after a policy change.
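In notation, the 2x2 DiD estimate is the difference between the two group-level changes, and it equals the coefficient on the treat_post interaction in the regressions below:

\[
\widehat{\text{DiD}} = (\bar{Y}_{\text{treated, post}} - \bar{Y}_{\text{treated, pre}}) - (\bar{Y}_{\text{control, post}} - \bar{Y}_{\text{control, pre}})
\]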
# Python: Classic 2x2 DiD
import pandas as pd
import statsmodels.formula.api as smf

# Data structure: unit, time, treated_group, post, outcome
# treated_group = 1 if unit is in treatment group
# post = 1 if period is after treatment

# Create interaction term
df['treat_post'] = df['treated_group'] * df['post']

# DiD regression
model = smf.ols('outcome ~ treated_group + post + treat_post', data=df).fit()
print(model.summary())

# With clustered standard errors
model_clustered = smf.ols('outcome ~ treated_group + post + treat_post',
                          data=df).fit(cov_type='cluster',
                                       cov_kwds={'groups': df['unit_id']})
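Equivalently, the 2x2 estimate is just a difference of four group means. A quick check, using the same df and column names as above, that it matches the treat_post coefficient:

# Hand-computed 2x2 DiD from the four group-period means
means = df.groupby(['treated_group', 'post'])['outcome'].mean()
did_by_hand = (means[1, 1] - means[1, 0]) - (means[0, 1] - means[0, 0])
print(did_by_hand)  # should equal model.params['treat_post']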
* Stata: Classic 2x2 DiD
* Basic DiD regression
reg outcome i.treated_group##i.post, cluster(unit_id)

* Equivalent with manually created interaction
gen treat_post = treated_group * post
reg outcome treated_group post treat_post, cluster(unit_id)
* With controls
reg outcome i.treated_group##i.post age income, cluster(unit_id)
# R: Classic 2x2 DiD
library(fixest)

# Basic DiD
did_model <- feols(outcome ~ treated_group * post,
data = df,
cluster = ~ unit_id)
summary(did_model)
# Using lm with sandwich standard errors
library(sandwich)
library(lmtest)
model <- lm(outcome ~ treated_group * post, data = df)
coeftest(model, vcov = vcovCL, cluster = ~ unit_id)
When units are treated at different times, the standard two-way fixed effects (TWFE) regression can produce biased estimates because it uses already-treated units as controls. Recent econometric research has shown that, when treatment effects are heterogeneous, this can yield estimates with the wrong sign or magnitude.
The TWFE Problem with Staggered Treatment
Standard TWFE implicitly compares:
Newly-treated vs. never-treated (good)
Newly-treated vs. not-yet-treated (good)
Newly-treated vs. already-treated (problematic!)
The third comparison can produce negative weights, causing bias when treatment effects are heterogeneous across time or units.
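To make the problem concrete, here is a minimal simulation sketch in Python (hypothetical data and parameter values, not from this module's dataset). Treatment effects grow with time since adoption, so the already-treated comparisons pull the TWFE estimate away from the true average effect:

# Simulated panel: two adoption cohorts (2012 and 2015) plus never-treated
# units, with a treatment effect that grows by 0.5 per year since adoption.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
rows = []
for i in range(300):
    g = rng.choice([2012, 2015, 0])          # first treatment year (0 = never)
    for t in range(2010, 2020):
        d = int(g != 0 and t >= g)
        effect = 0.5 * (t - g + 1) if d else 0.0
        rows.append({'unit_id': i, 'year': t, 'd': d, 'effect': effect,
                     'outcome': 0.01 * i + 0.1 * (t - 2010) + effect
                                + rng.normal()})
sim = pd.DataFrame(rows)

# Standard TWFE: unit and year fixed effects via dummies
twfe = smf.ols('outcome ~ d + C(unit_id) + C(year)', data=sim).fit()
print('TWFE estimate:', round(twfe.params['d'], 2))
print('True ATT:     ', round(sim.loc[sim.d == 1, 'effect'].mean(), 2))
# With effects growing over time, the TWFE estimate typically lands well
# below the true average effect on the treated.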
Modern DiD Estimators
Several new estimators address the staggered treatment problem. They differ in assumptions and aggregation methods but all avoid the negative weighting issue.
Callaway-Sant'Anna (2021)
# Python: Callaway-Sant'Anna with csdid
# pip install csdid
from csdid import ATTgt

# Estimate group-time ATTs
att_gt = ATTgt(
    data=df,
    yname='outcome',
    tname='year',
    idname='unit_id',
    gname='first_treat',           # year of first treatment (0 if never)
    control_group='nevertreated'   # or 'notyettreated'
)
results = att_gt.fit()

# Aggregate to overall ATT
agg_overall = results.aggregate('simple')
print(agg_overall)

# Aggregate to event-study
agg_event = results.aggregate('event')
agg_event.plot()
* Stata: Callaway-Sant'Anna
* ssc install csdid

* Basic estimation
csdid outcome, ivar(unit_id) time(year) gvar(first_treat)

* Aggregate to overall ATT
csdid_stats simple
* Event study aggregation
csdid_stats event
* Plot event study
csdid_plot, style(rcap)
# R: Callaway-Sant'Anna with did package
library(did)

# Estimate group-time ATTs
out <- att_gt(
yname = "outcome",
tname = "year",
idname = "unit_id",
gname = "first_treat",
data = df,
control_group = "nevertreated" # or "notyettreated"
)
summary(out)
# Aggregate to simple ATT
agg_simple <- aggte(out, type = "simple")
summary(agg_simple)
# Event study aggregation
agg_es <- aggte(out, type = "dynamic")
ggdid(agg_es)
Parallel Trends Diagnostic
==========================
Mean Outcomes by Group and Year:

Year    Treated   Control   Difference
2010      3.12      3.08       0.04
2011      3.25      3.21       0.04
2012      3.38      3.33       0.05
2013      3.51      3.47       0.04
2014      3.64      3.59       0.05
-------- TREATMENT (2015) --------
2015      4.21      3.72       0.49
2016      4.35      3.84       0.51
2017      4.48      3.96       0.52
2018      4.61      4.09       0.52
2019      4.73      4.21       0.52
The pre-treatment difference is stable (~0.04-0.05), while the post-treatment gap widens to ~0.50, so the implied DiD estimate is about 0.51 - 0.04 ≈ 0.47.
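A table like this is one groupby away. A sketch using the column names from the 2x2 examples above:

# Mean outcomes by group and year, as in the table above
tab = (df.groupby(['year', 'treated_group'])['outcome'].mean()
         .unstack('treated_group')
         .rename(columns={0: 'Control', 1: 'Treated'}))
tab['Difference'] = tab['Treated'] - tab['Control']
print(tab.round(2))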
Joint test of pre-treatment coefficients:
F(4, 495) = 0.76
p-value = 0.552
Conclusion: Cannot reject parallel pre-trends
[Plot shows parallel lines pre-2015, diverging after treatment]
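For reference, the kind of event-study regression behind these pre-trend tests. A sketch in Python with statsmodels, assuming the 2x2 variable names used above and the single 2015 treatment date from the table:

# Event-study: lead/lag dummies for treated units, omitting rel_time == -1
import statsmodels.formula.api as smf

df['rel_time'] = df['year'] - 2015
dummies = []
for k in sorted(df['rel_time'].unique()):
    if k == -1:                                  # reference period
        continue
    name = f'd_m{-k}' if k < 0 else f'd_p{k}'
    df[name] = ((df['rel_time'] == k) & (df['treated_group'] == 1)).astype(int)
    dummies.append(name)

es = smf.ols('outcome ~ treated_group + C(year) + ' + ' + '.join(dummies),
             data=df).fit(cov_type='cluster', cov_kwds={'groups': df['unit_id']})

# Joint F-test that all pre-treatment leads are zero
leads = [d for d in dummies if d.startswith('d_m')]
print(es.f_test(', '.join(f'{d} = 0' for d in leads)))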
Stata Output
. testparm L(-5/-2).rel_time
( 1) -5.rel_time = 0
( 2) -4.rel_time = 0
( 3) -3.rel_time = 0
( 4) -2.rel_time = 0
F( 4, 499) = 0.76
Prob > F = 0.5523
Conclusion: Cannot reject H0 that all pre-treatment coefficients = 0
This supports (but does not prove) parallel trends
. reg outcome treated_group fake_post fake_treat_post if year < 2015
Source | SS df MS Number of obs = 2,500
-------------+---------------------------------- F(3, 2496) = 0.89
Model | 2.34567891 3 .781892970 Prob > F = 0.4456
Residual | 2189.12345 2,496 .877244171 R-squared = 0.0011
-------------+---------------------------------- Adj R-squared = -0.0001
Total | 2191.46913 2,499 .876939227 Root MSE = .93662
------------------------------------------------------------------------------
outcome | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
treated_gr~p | .0412345 .0456789 0.90 0.367 -.0483456 .1308146
fake_post | .1234567 .0478912 2.58 0.010 .0295234 .2173900
fake_treat~t | .0078912 .0567891 0.14 0.889 -.1034567 .1192391
_cons | 3.123456 .0345678 90.35 0.000 3.055678 3.191234
------------------------------------------------------------------------------
Placebo DiD estimate: 0.008 (p = 0.889)
[No spurious "effect" in pre-treatment period]
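The same placebo exercise sketched in Python; the 2013 fake cutoff below is a hypothetical choice, since the cutoff behind the Stata output above is not shown:

# Placebo DiD: pretend treatment happened at a fake date inside the pre-period
import statsmodels.formula.api as smf

pre = df[df['year'] < 2015].copy()
pre['fake_post'] = (pre['year'] >= 2013).astype(int)   # hypothetical cutoff
pre['fake_treat_post'] = pre['treated_group'] * pre['fake_post']
placebo = smf.ols('outcome ~ treated_group + fake_post + fake_treat_post',
                  data=pre).fit(cov_type='cluster',
                                cov_kwds={'groups': pre['unit_id']})
print(placebo.params['fake_treat_post'])   # should be near zero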
R Output
# Visual inspection shows parallel pre-trends
# (Plot displayed with parallel lines before treatment year)
Wald test for joint significance of pre-treatment coefficients:
H0: All pre-treatment coefficients = 0
Tested coefficients:
rel_time::-5, rel_time::-4, rel_time::-3, rel_time::-2
Wald stat: 3.04 on 4 df
p-value: 0.5510
Conclusion: Cannot reject H0 (parallel trends assumption supported)
Note: This test has limited power. Even with p > 0.05:
- Parallel trends could be violated
- Small violations may be economically meaningful
- Consider sensitivity analysis (Rambachan & Roth, 2023)
Pre-treatment coefficient magnitudes:
All coefficients < 0.05 in absolute value
All 95% CIs include zero
No evidence of differential pre-trends
Key Papers on Modern DiD
Callaway, B. & Sant'Anna, P. (2021). "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics.
de Chaisemartin, C. & D'Haultfoeuille, X. (2020). "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." AER.
Sun, L. & Abraham, S. (2021). "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects." Journal of Econometrics.
Goodman-Bacon, A. (2021). "Difference-in-Differences with Variation in Treatment Timing." Journal of Econometrics.
Roth, J. (2022). "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." AER: Insights.
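Rambachan, A. & Roth, J. (2023). "A More Credible Approach to Parallel Trends." Review of Economic Studies.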