6 Causal Inference
Learning Objectives
- Implement causal inference methods in Python, Stata, and R
- Choose the appropriate method for your research design
- Conduct robustness checks and sensitivity analysis
- Apply modern advances (staggered DiD, ML-based matching)
- Present and interpret causal estimates correctly
Prerequisites: Causal Inference Theory
This module assumes you understand the theoretical foundations of causal inference: potential outcomes, selection bias, and identification strategies. If you need to build this background:
- Causal Inference: The Mixtape by Scott Cunningham (free online)
- The Effect by Nick Huntington-Klein (free online)
- Mostly Harmless Econometrics by Angrist & Pischke (classic text)
- Mastering 'Metrics by Angrist & Pischke (accessible introduction)
Choosing Your Method
The identification strategy depends on what source of variation you can credibly argue is exogenous. Ask yourself: Why do some units get treated and others don't? The answer determines which method is appropriate.
| Data Structure & Variation Source | Method | Technical Requirements | Example Papers |
|---|---|---|---|
| Cross-sectional with rich observables. Treatment is selected on observables: you believe that after conditioning on X, treatment is as good as random. | Matching | Substantial covariate overlap (common support); no unmeasured confounders (CIA/unconfoundedness); enough control units with similar propensity scores. | Dehejia & Wahba (2002); Chetty et al. (2014) |
| Panel data with staggered policy adoption. Some units adopt treatment at different times; you assume absent treatment, treated and control units would have evolved similarly. | Difference-in-Differences | Pre-treatment parallel trends (testable in pre-period); no anticipation effects; SUTVA (no spillovers); with staggered timing, beware of negative weighting. | Card & Krueger (1994); Dube et al. (2019) |
| Treatment assigned by a score crossing a cutoff. Eligibility is determined by whether a running variable exceeds a threshold (test scores, income limits, age cutoffs). | Regression Discontinuity | No manipulation of the running variable (McCrary density test); continuity of potential outcomes at cutoff; local treatment effect at cutoff (not generalizable). | Lee (2008); Lee & Lemieux (2010) |
| Endogenous treatment with external shifter. You have a variable Z that affects treatment D but has no direct effect on outcome Y except through D. | Instrumental Variables | Relevance (strong first stage, F > 10); exclusion restriction (no direct effect); monotonicity for LATE interpretation; beware of weak instruments. | Angrist & Krueger (1991); Acemoglu et al. (2001) |
| Single treated unit (or very few) with many potential controls. A policy affects one state/country; you construct a synthetic comparison by weighting other units. | Synthetic Control | Good pre-treatment fit (low RMSPE); donor pool units are unaffected by treatment (SUTVA); sufficient pre-treatment periods for validation. | Abadie et al. (2010); Abadie (2021) |
Every causal inference method requires untestable assumptions. The goal is not to find a "perfect" method but to choose one whose assumptions are most plausible in your context—and then stress-test those assumptions with robustness checks, placebo tests, and sensitivity analysis.
Method Overview
Matching
Find untreated units similar to treated units on observable characteristics. Methods include propensity score matching, coarsened exact matching, and nearest-neighbor matching.
Key Assumption: Conditional Independence (CIA)
After conditioning on observables X, treatment assignment is independent of potential outcomes. There are no unmeasured confounders.
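As a toy illustration of the logic (not a substitute for dedicated packages such as MatchIt in R or teffects in Stata), the sketch below estimates the ATT by 1-nearest-neighbor matching on covariates with simulated data. All names and parameter values here are hypothetical.

```python
import numpy as np

def nn_match_att(X, D, Y):
    """ATT via 1-nearest-neighbor matching on covariates, with replacement.
    For each treated unit, take the closest control in Euclidean distance
    and difference the outcomes. No standard errors here."""
    treated = np.where(D == 1)[0]
    controls = np.where(D == 0)[0]
    gaps = [Y[i] - Y[controls[np.argmin(np.linalg.norm(X[controls] - X[i], axis=1))]]
            for i in treated]
    return float(np.mean(gaps))

# Simulated data: selection on observables (CIA holds), true effect = 2
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
p = 1 / (1 + np.exp(-0.5 * (X[:, 0] + X[:, 1])))   # treatment depends only on X
D = rng.binomial(1, p)
Y = 2.0 * D + X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n)
att = nn_match_att(X, D, Y)
```

Because treatment here depends only on X, matching on X removes the selection bias; a naive difference in means would be contaminated by the X terms in the outcome.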
Difference-in-Differences
Compare changes over time between treatment and control groups. Includes classic 2x2 DiD, staggered adoption designs, and modern heterogeneous treatment effects estimators.
Key Assumption: Parallel Trends
In the absence of treatment, treated and control groups would have followed the same trajectory. Trends, not levels, must be equal.
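The classic 2x2 case can be computed directly from four group-period means. A minimal simulated sketch (hypothetical numbers; the level gap between groups cancels, as parallel trends requires):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Parallel trends: both groups share a +0.5 time trend; levels differ
pre_c  = 1.0 + rng.normal(scale=1, size=n)          # control, pre
post_c = 1.5 + rng.normal(scale=1, size=n)          # control, post (trend only)
pre_t  = 3.0 + rng.normal(scale=1, size=n)          # treated, pre (higher level)
post_t = 3.5 + 2.0 + rng.normal(scale=1, size=n)    # treated, post: trend + effect 2

# DiD: difference the time changes across groups; level gaps cancel
did = float((post_t.mean() - pre_t.mean()) - (post_c.mean() - pre_c.mean()))
```

Note that a naive post-period comparison (post_t vs. post_c) would recover the level gap plus the effect, not the effect alone.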
Regression Discontinuity
Exploit discontinuous changes in treatment probability at a threshold. Sharp RDD (deterministic assignment) and fuzzy RDD (probabilistic assignment).
Key Assumptions: Continuity + No Manipulation
Potential outcomes are continuous at the cutoff; units cannot precisely manipulate their running variable. Effect is local to the cutoff.
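A sharp RD estimate can be sketched as two local linear fits, one on each side of the cutoff, differenced at the threshold. The bandwidth below is picked by hand for illustration; in practice use a data-driven choice (e.g. the rdrobust packages):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
x = rng.uniform(-1, 1, n)             # running variable, cutoff at 0
d = (x >= 0).astype(float)            # sharp RD: treatment jumps at the cutoff
y = 1 + 0.8 * x + 1.5 * d + rng.normal(scale=0.5, size=n)   # true jump = 1.5

h = 0.3                               # bandwidth (hypothetical, hand-picked)
left  = (x < 0) & (x > -h)
right = (x >= 0) & (x < h)
# Fit a line within the bandwidth on each side; difference the intercepts at 0
b_left  = np.polyfit(x[left],  y[left],  1)
b_right = np.polyfit(x[right], y[right], 1)
tau = float(np.polyval(b_right, 0.0) - np.polyval(b_left, 0.0))
```

The estimate is local: it speaks only to units near x = 0, which is exactly the "not generalizable" caveat in the table above.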
Instrumental Variables
Use external variation that affects treatment but not outcomes directly. Includes Bartik/shift-share instruments and recent econometric advances.
Key Assumptions: Relevance + Exclusion
Z must affect D (first stage); Z must only affect Y through D (exclusion). Exclusion restriction is untestable.
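With a single binary instrument, 2SLS reduces to the Wald estimator: the reduced form divided by the first stage. A simulated sketch (hypothetical setup) in which a confounder u biases OLS but the instrument recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
z = rng.binomial(1, 0.5, n).astype(float)   # instrument (e.g. random encouragement)
u = rng.normal(size=n)                      # unobserved confounder
# Endogenous treatment: depends on both z (exogenous) and u (endogenous)
d = (0.5 * z + 0.5 * u + rng.normal(scale=0.5, size=n) > 0.5).astype(float)
y = 2.0 * d + u + rng.normal(size=n)        # true effect = 2; u biases OLS upward

first_stage  = d[z == 1].mean() - d[z == 0].mean()   # effect of z on d
reduced_form = y[z == 1].mean() - y[z == 0].mean()   # effect of z on y
iv_est = float(reduced_form / first_stage)           # Wald / 2SLS estimate
ols = float(np.cov(d, y)[0, 1] / np.var(d, ddof=1))  # biased benchmark
```

Always report the first stage: if it is weak, the ratio above blows up small reduced-form noise, which is the weak-instrument problem flagged in the table.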
Synthetic Control
Construct a weighted combination of control units to match the treated unit's pre-treatment trajectory. Ideal for case studies and policy evaluations.
Key Assumptions: Fit + SUTVA
Synthetic control matches pre-treatment outcomes well; donor pool units are unaffected. Good fit ≠ identification, but poor fit is a red flag.
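The weights solve a least-squares problem constrained to the simplex (non-negative, summing to one). The solver below is a plain projected-gradient sketch of that optimization, not the algorithm used by the Synth packages; the data are simulated and the parameter values hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (Duchi et al. 2008)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

rng = np.random.default_rng(4)
T0, J = 30, 10                          # pre-treatment periods, donor units
w_true = np.zeros(J); w_true[:3] = [0.5, 0.3, 0.2]
donors = rng.normal(size=(T0, J)).cumsum(axis=0)     # donor outcome paths
treated_pre = donors @ w_true + rng.normal(scale=0.05, size=T0)

# Projected gradient descent on ||treated_pre - donors @ w||^2 over the simplex
L = 2 * np.linalg.eigvalsh(donors.T @ donors)[-1]    # gradient Lipschitz constant
w = np.full(J, 1.0 / J)
for _ in range(5000):
    grad = -2 * donors.T @ (treated_pre - donors @ w)
    w = project_simplex(w - grad / L)

rmspe = float(np.sqrt(np.mean((treated_pre - donors @ w) ** 2)))
```

The pre-treatment RMSPE is the fit diagnostic from the table: a large value relative to the outcome scale is the red flag mentioned above.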
Experiments (RCTs)
Randomly assign treatment to ensure comparability between groups. The gold standard for causal inference when feasible.
Key Assumption: Proper Randomization
Randomization was correctly implemented and maintained (no differential attrition, no compliance issues). Check balance on observables.
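A minimal sketch of the two standard computations for an RCT, on simulated data with hypothetical values: a balance t-test on a pre-treatment covariate, and the difference-in-means treatment effect estimate.

```python
import numpy as np

def balance_t(x, d):
    """Two-sample t-statistic comparing a covariate across treatment arms."""
    x1, x0 = x[d == 1], x[d == 0]
    se = np.sqrt(x1.var(ddof=1) / len(x1) + x0.var(ddof=1) / len(x0))
    return float((x1.mean() - x0.mean()) / se)

rng = np.random.default_rng(5)
n = 1000
d = rng.binomial(1, 0.5, n)            # randomized assignment
age = rng.normal(40, 10, n)            # pre-treatment covariate, independent of d
y = 1.0 * d + rng.normal(size=n)       # true treatment effect = 1

t_stat = balance_t(age, d)             # should be small under valid randomization
ate = float(y[d == 1].mean() - y[d == 0].mean())   # difference in means
```

A large balance t-statistic on a pre-treatment covariate is evidence against the randomization (or a sign of differential attrition), which is exactly the check this assumption calls for.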
Modern Developments
Causal inference methods have seen major advances in recent years. Each subpage covers these modern techniques with implementation code:
- Staggered DiD: When treatment adoption varies across time, classic two-way fixed effects can be biased. New estimators by Callaway & Sant'Anna (2021), Sun & Abraham (2021), and de Chaisemartin & D'Haultfoeuille (2020) provide solutions.
- Double/Debiased ML: When you have many potential confounders, machine learning can help select controls while maintaining valid inference. See Chernozhukov et al. (2018).
- Synthetic Control Extensions: Augmented SC (Ben-Michael et al. 2021) and Synthetic DiD (Arkhangelsky et al. 2021) combine matching with DiD logic.
- Shift-Share/Bartik Instruments: Popular in labor and trade, but with subtle identification issues. See Goldsmith-Pinkham et al. (2020) and Borusyak et al. (2022) on when they work.
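To make the staggered-DiD point concrete, here is a minimal group-time sketch in the spirit of Callaway & Sant'Anna: each adoption cohort g is compared only to never-treated units, with period g-1 as the baseline, and the 2x2 estimates are then averaged. This is a simplified illustration on simulated data, not the estimator in the did (R) or csdid (Stata) packages.

```python
import numpy as np

rng = np.random.default_rng(6)
units, periods = 300, 6
cohort = rng.choice([3, 5, 0], size=units)     # adopt at t=3, t=5, or never (0)
alpha = rng.normal(size=units)                 # unit fixed effects
y = np.zeros((units, periods))
for t in range(periods):
    treated = (cohort > 0) & (t >= cohort)
    y[:, t] = alpha + 0.5 * t + 2.0 * treated + rng.normal(scale=0.3, size=units)

def att_gt(g, t):
    """2x2 DiD for cohort g at period t vs. never-treated, baseline g-1."""
    tr, nt = cohort == g, cohort == 0
    return (y[tr, t].mean() - y[tr, g - 1].mean()) \
         - (y[nt, t].mean() - y[nt, g - 1].mean())

# Average ATT(g, t) over all post-treatment cohort-period cells
pairs = [(g, t) for g in (3, 5) for t in range(g, periods)]
overall = float(np.mean([att_gt(g, t) for g, t in pairs]))
```

Using never-treated units as the only comparison group avoids the "forbidden comparisons" (late adopters vs. earlier adopters) that drive the negative-weighting problem in two-way fixed effects.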
Each submodule provides working code in Python, Stata, and R. We show you how to implement each method, conduct diagnostic tests, and interpret results. For the underlying theory and derivations, consult the textbooks listed above and the survey articles below.
- Imbens, G. & Wooldridge, J. (2009). "Recent Developments in the Econometrics of Program Evaluation." Journal of Economic Literature.
- Angrist, J. & Pischke, J. (2010). "The Credibility Revolution in Empirical Economics." Journal of Economic Perspectives.
- Athey, S. & Imbens, G. (2017). "The State of Applied Econometrics." Journal of Economic Perspectives.