6C Regression Discontinuity
Sharp RDD
In a sharp RDD, treatment is deterministically assigned at the cutoff: everyone above is treated, everyone below is not.
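A minimal simulation makes the sharp-assignment logic concrete. This is an illustrative sketch with invented data and names (`running_var`, `cutoff`, `effect` are not tied to the rdrobust examples): treatment is a deterministic function of the score, and a raw difference in means near the cutoff approximates the jump, though it stays biased by the slope of the outcome in the running variable, which is why local polynomial fits are preferred.

```python
import numpy as np

# Hypothetical simulation: sharp assignment at a cutoff of 50
rng = np.random.default_rng(0)
n, cutoff, effect = 2000, 50.0, 2.0

running_var = rng.uniform(0, 100, n)
treated = (running_var >= cutoff).astype(int)  # sharp: 1 above, 0 below
outcome = 10 + 0.1 * running_var + effect * treated + rng.normal(0, 1, n)

# Naive local comparison within +/- 5 points of the cutoff; the result is
# the true effect (2.0) plus slope bias from comparing means, not limits
window = 5.0
near = np.abs(running_var - cutoff) < window
jump = outcome[near & (treated == 1)].mean() - outcome[near & (treated == 0)].mean()
print(f"local mean difference near cutoff: {jump:.2f}")
```

The deterministic rule `running_var >= cutoff` is what distinguishes the sharp design from the fuzzy design covered later.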
# Python: Sharp RDD with rdrobust
# pip install rdrobust
from rdrobust import rdrobust, rdbwselect

# running_var: the score that determines treatment (e.g., test score)
# cutoff: the threshold (e.g., 50 points)

# Center the running variable at cutoff
df['x_centered'] = df['running_var'] - cutoff

# Sharp RDD estimation
result = rdrobust(
    y=df['outcome'],
    x=df['x_centered'],
    c=0  # cutoff (centered)
)
print(result)

# With covariates
result = rdrobust(
    y=df['outcome'],
    x=df['x_centered'],
    c=0,
    covs=df[['age', 'gender']]
)
* Stata: Sharp RDD with rdrobust
* ssc install rdrobust
* Center running variable
gen x_centered = running_var - 50
* Sharp RDD
rdrobust outcome x_centered, c(0)
* With covariates
rdrobust outcome x_centered, c(0) covs(age gender)
* Specify kernel and polynomial order
rdrobust outcome x_centered, c(0) kernel(triangular) p(1)
# R: Sharp RDD with rdrobust
library(rdrobust)

# Center running variable
df$x_centered <- df$running_var - 50

# Sharp RDD
rd_result <- rdrobust(y = df$outcome,
                      x = df$x_centered,
                      c = 0)
summary(rd_result)

# With covariates
rd_result <- rdrobust(y = df$outcome,
                      x = df$x_centered,
                      c = 0,
                      covs = cbind(df$age, df$gender))
Sharp RDD Estimates using Local Polynomial Regression
=====================================================
Number of Obs.                 2847
BW type                       mserd
Kernel                   Triangular
VCE method                       NN

                      Left of c   Right of c
Number of Obs.             1423         1424
Eff. Number of Obs.         612          587
Order est. (p)                1            1
Order bias (q)                2            2
BW est. (h)               8.421        8.421
BW bias (b)              13.892       13.892
rho (h/b)                 0.606        0.606

=============================================================================
Method          Coef.    Std. Err.    z       P>|z|      [ 95% C.I. ]
=============================================================================
Conventional    2.847    0.792        3.595   0.000   [1.295 , 4.399]
Robust          -        -            3.142   0.002   [1.084 , 4.743]
=============================================================================

Bandwidth Selection
The bandwidth determines how much data around the cutoff is used. Narrower bandwidth = more local but less precise. Modern methods select optimal bandwidth automatically.
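The bias-variance trade-off can be simulated directly. In this illustrative sketch (invented data; the cubic term supplies curvature away from the cutoff), a wide bandwidth lets the local linear fit absorb curvature into the estimated jump, while a narrow bandwidth stays close to the true effect at the cost of noise:

```python
import numpy as np

rng = np.random.default_rng(1)
n, effect = 5000, 2.0
x = rng.uniform(-50, 50, n)  # centered running variable
# Cubic curvature far from the cutoff is what wide bandwidths
# mistake for part of the jump
y = 0.05 * x + 0.0003 * x**3 + effect * (x >= 0) + rng.normal(0, 1, n)

def rd_estimate(x, y, h):
    """Local linear RD: difference of fitted intercepts within |x| < h."""
    intercepts = {}
    for side, mask in [("left", (x < 0) & (x > -h)),
                       ("right", (x >= 0) & (x < h))]:
        slope, intercept = np.polyfit(x[mask], y[mask], 1)
        intercepts[side] = intercept
    return intercepts["right"] - intercepts["left"]

for h in (3, 5, 10, 25):
    print(f"h={h:>2}: estimate={rd_estimate(x, y, h):.3f}")
```

With the true effect at 2.0, the estimate degrades visibly as h grows, which is the same pattern the bandwidth-robustness loops below are designed to surface.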
# Python: Bandwidth Selection
from rdrobust import rdbwselect, rdrobust

# Optimal bandwidth selection
bw_result = rdbwselect(
    y=df['outcome'],
    x=df['x_centered'],
    c=0
)
print(bw_result)

# Use specific bandwidth
result = rdrobust(
    y=df['outcome'],
    x=df['x_centered'],
    c=0,
    h=5  # manual bandwidth
)

# Robustness: different bandwidths
for h in [3, 5, 7, 10]:
    result = rdrobust(y=df['outcome'], x=df['x_centered'], c=0, h=h)
    print(f"h={h}: estimate={result.coef[0]:.3f}")
* Stata: Bandwidth Selection
* Optimal bandwidth
rdbwselect outcome x_centered, c(0)
* With different methods
rdbwselect outcome x_centered, c(0) bwselect(mserd) // MSE
rdbwselect outcome x_centered, c(0) bwselect(cerrd) // CER
* Robustness check: vary bandwidth
foreach h in 3 5 7 10 {
    rdrobust outcome x_centered, c(0) h(`h')
}
# R: Bandwidth Selection
library(rdrobust)
# Optimal bandwidth
bw <- rdbwselect(y = df$outcome, x = df$x_centered, c = 0)
summary(bw)
# Use specific bandwidth
rd_result <- rdrobust(y = df$outcome, x = df$x_centered, c = 0, h = 5)
# Robustness: multiple bandwidths
bandwidths <- c(3, 5, 7, 10)
results <- lapply(bandwidths, function(h) {
    rdrobust(y = df$outcome, x = df$x_centered, c = 0, h = h)
})
# Extract and compare estimates
Bandwidth Selection using local polynomial regression
Number of Obs.                 2847
Kernel                   Triangular
Order loc. poly. (p)              1
Order bias (q)                    2
=======================================================
BW Type            BW (h)      BW (b)
=======================================================
mserd               8.421      13.892
msetwo              7.934      12.156
msesum              8.187      13.024
msecommon           8.302      13.458
cerrd               5.847       9.642
certwo              5.506       8.438
cersum              5.682       9.040
cercommon           5.764       9.341
=======================================================

Robustness to bandwidth choice:
h=3: estimate=3.124
h=5: estimate=2.956
h=7: estimate=2.891
h=10: estimate=2.784

Fuzzy RDD
In a fuzzy RDD, the probability of treatment jumps at the cutoff but assignment isn't deterministic. This is essentially IV estimation, using the threshold crossing as an instrument for treatment.
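The IV logic can be sketched by hand: estimate the jump in the outcome (the reduced form) and the jump in treatment probability (the first stage), and take their ratio. This is a simplified Wald estimate on invented data; rdrobust's fuzzy option does the same thing with local polynomials and robust inference.

```python
import numpy as np

rng = np.random.default_rng(2)
n, effect = 20000, 3.0
x = rng.uniform(-10, 10, n)  # centered running variable
above = x >= 0
# Crossing the cutoff raises treatment probability from 0.2 to 0.8,
# so assignment is fuzzy rather than deterministic
treated = (rng.uniform(0, 1, n) < np.where(above, 0.8, 0.2)).astype(int)
y = 1.0 + 0.05 * x + effect * treated + rng.normal(0, 1, n)

# Wald / IV logic: scale the outcome jump by the first-stage jump
window = 1.0
near = np.abs(x) < window

def jump(v):
    return v[near & above].mean() - v[near & ~above].mean()

first_stage = jump(treated)        # jump in P(treated), about 0.6
reduced_form = jump(y)             # jump in the outcome
wald = reduced_form / first_stage  # fuzzy RD estimate of the effect
print(f"first stage: {first_stage:.2f}, fuzzy RD (Wald) estimate: {wald:.2f}")
```

Dividing by the first stage rescales the intention-to-treat jump into an effect per unit of treatment, which is why checking the size of the first-stage jump (as the code below does) matters: a weak first stage makes the ratio unstable.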
# Python: Fuzzy RDD
from rdrobust import rdrobust

# Fuzzy RDD: treatment is not perfectly determined by cutoff
result = rdrobust(
    y=df['outcome'],
    x=df['x_centered'],
    c=0,
    fuzzy=df['treatment']  # actual treatment status
)
print(result)

# First stage: check jump in treatment probability
first_stage = rdrobust(
    y=df['treatment'],
    x=df['x_centered'],
    c=0
)
print("First stage jump:", first_stage.coef[0])
* Stata: Fuzzy RDD
* Fuzzy RDD with treatment indicator
rdrobust outcome x_centered, c(0) fuzzy(treatment)
* Check first stage
rdrobust treatment x_centered, c(0)
* Plot first stage discontinuity
rdplot treatment x_centered, c(0) graph_options(title("First Stage"))
# R: Fuzzy RDD
library(rdrobust)
# Fuzzy RDD
rd_fuzzy <- rdrobust(y = df$outcome,
                     x = df$x_centered,
                     c = 0,
                     fuzzy = df$treatment)
summary(rd_fuzzy)
# First stage: jump in treatment at cutoff
first_stage <- rdrobust(y = df$treatment,
                        x = df$x_centered,
                        c = 0)
summary(first_stage)
Fuzzy RDD Estimates using Local Polynomial Regression
=====================================================
Number of Obs.                 2847
BW type                       mserd
Kernel                   Triangular
VCE method                       NN

                      Left of c   Right of c
Number of Obs.             1423         1424
Eff. Number of Obs.         589          612
Order est. (p)                1            1
Order bias (q)                2            2
BW est. (h)               7.892        7.892
BW bias (b)              12.456       12.456
rho (h/b)                 0.634        0.634

=============================================================================
Method          Coef.    Std. Err.    z       P>|z|      [ 95% C.I. ]
=============================================================================
Conventional    4.523    1.156        3.912   0.000   [2.257 , 6.789]
Robust          -        -            3.421   0.001   [1.892 , 7.234]
=============================================================================

First stage jump: 0.628 (SE = 0.089)
(Treatment probability jumps by 62.8 percentage points at the cutoff)
First-stage F-statistic: 49.79

Diagnostics and Validity Tests
Key validity checks: (1) no manipulation of the running variable (density test), (2) continuity of predetermined covariates at the cutoff, (3) no spurious jumps at placebo cutoffs.
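The intuition behind the manipulation test can be sketched crudely: if units cannot sort themselves across the cutoff, roughly equal numbers should land just below and just above it. This is a simplified stand-in on invented data; the rddensity package used below implements the formal local polynomial density test.

```python
import numpy as np
from scipy import stats

# Invented running variable with no manipulation built in
rng = np.random.default_rng(3)
x = rng.uniform(-50, 50, 10000)

# Count observations in a narrow band on each side of the cutoff
window = 2.0
n_below = int(((x >= -window) & (x < 0)).sum())
n_above = int(((x >= 0) & (x < window)).sum())

# Under the null of no sorting, an observation near the cutoff is
# equally likely to fall on either side
result = stats.binomtest(n_below, n_below + n_above, p=0.5)
print(f"below: {n_below}, above: {n_above}, p-value: {result.pvalue:.3f}")
```

A large p-value is consistent with no manipulation; a pile-up on one side (e.g., applicants nudging their scores just over a scholarship threshold) would drive it toward zero.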
# Python: RDD Diagnostics
from rddensity import rddensity
from rdrobust import rdrobust

# 1. McCrary density test (manipulation check)
density_test = rddensity(X=df['x_centered'], c=0)
print(density_test)

# 2. Covariate balance at cutoff
covariates = ['age', 'gender', 'prior_income']
for cov in covariates:
    result = rdrobust(y=df[cov], x=df['x_centered'], c=0)
    print(f"{cov}: coef={result.coef[0]:.3f}, p={result.pv[0]:.3f}")

# 3. Placebo cutoffs
placebo_cutoffs = [-10, -5, 5, 10]  # relative to true cutoff
for pc in placebo_cutoffs:
    result = rdrobust(y=df['outcome'], x=df['x_centered'], c=pc)
    print(f"Placebo c={pc}: coef={result.coef[0]:.3f}")
* Stata: RDD Diagnostics
* ssc install rddensity
* 1. McCrary density test
rddensity x_centered, c(0) plot
* 2. Covariate balance tests
foreach var in age gender prior_income {
    rdrobust `var' x_centered, c(0)
}
* 3. Placebo cutoffs
foreach c in -10 -5 5 10 {
    rdrobust outcome x_centered, c(`c')
}
* 4. Donut hole test (exclude obs near cutoff)
rdrobust outcome x_centered if abs(x_centered) > 1, c(0)
# R: RDD Diagnostics
library(rdrobust)
library(rddensity)
# 1. McCrary density test
dens_test <- rddensity(X = df$x_centered, c = 0)
summary(dens_test)
# Plot density
rdplotdensity(rdd = dens_test, X = df$x_centered)
# 2. Covariate balance tests
covariates <- c("age", "gender", "prior_income")
lapply(covariates, function(var) {
    rdrobust(y = df[[var]], x = df$x_centered, c = 0)
})
# 3. Placebo cutoffs
placebo <- c(-10, -5, 5, 10)
lapply(placebo, function(c) {
    rdrobust(y = df$outcome, x = df$x_centered, c = c)
})
Manipulation Testing using Local Polynomial Density Estimation
==============================================================
Number of Obs.                 2847
Cutoff                            0
Model                  unrestricted
Kernel                   Triangular
BW method                      comb
Running Variable: x_centered
------------------------------------------
                    Left of c   Right of c
------------------------------------------
Number of Obs.           1423         1424
Eff. Obs.                 812          798
BW est.                 9.234        8.921
------------------------------------------
Test          Statistic      p-value
------------------------------------------
Binomial          0.024        0.807
T-statistic      -0.412        0.680
------------------------------------------
Note: No evidence of manipulation at the cutoff.

Covariate Balance at Cutoff:
age: coef=0.234, p=0.672 (no discontinuity)
gender: coef=0.012, p=0.891 (no discontinuity)
prior_income: coef=-124.56, p=0.534 (no discontinuity)

Placebo Cutoff Tests:
Placebo c=-10: coef=0.234 (p=0.782)
Placebo c=-5: coef=-0.156 (p=0.856)
Placebo c=5: coef=0.089 (p=0.923)
Placebo c=10: coef=-0.312 (p=0.645)

Visualization
Always visualize the discontinuity. Good RDD plots show the raw data and fitted relationships on both sides of the cutoff.
# Python: RDD Plot
from rdrobust import rdplot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# rdplot creates binned scatter with polynomial fit
rdplot(
    y=df['outcome'],
    x=df['x_centered'],
    c=0,
    nbins=[20, 20],  # bins on each side
    binselect='es'   # evenly spaced
)

# Custom plot
fig, ax = plt.subplots(figsize=(10, 6))

# Scatter with bins
df['bin'] = pd.cut(df['x_centered'], bins=40)
binned = df.groupby('bin').agg({
    'x_centered': 'mean',
    'outcome': 'mean'
}).dropna()
ax.scatter(binned['x_centered'], binned['outcome'], alpha=0.7)

# Fit lines separately on each side of the cutoff
for side, data in df.groupby(df['x_centered'] >= 0):
    z = np.polyfit(data['x_centered'], data['outcome'], 1)
    p = np.poly1d(z)
    x_range = np.linspace(data['x_centered'].min(), data['x_centered'].max(), 100)
    ax.plot(x_range, p(x_range), 'r-', linewidth=2)

ax.axvline(x=0, color='gray', linestyle='--')
ax.set_xlabel('Running Variable (centered)')
ax.set_ylabel('Outcome')
plt.show()
* Stata: RDD Plot
* rdplot creates optimal binned scatter plot
rdplot outcome x_centered, c(0)
* With options
rdplot outcome x_centered, c(0) ///
    nbins(20 20) ///
    graph_options(title("RDD: Effect of Scholarship") ///
        xtitle("Test Score (centered)") ///
        ytitle("College Enrollment"))
* CI shade plot
rdplot outcome x_centered, c(0) ci(95) shade
# R: RDD Plot
library(rdrobust)
# rdplot creates binned scatter with polynomial fit
rdplot(y = df$outcome,
       x = df$x_centered,
       c = 0,
       title = "RDD: Effect of Scholarship",
       x.label = "Test Score (centered)",
       y.label = "College Enrollment")

# Custom ggplot version
library(ggplot2)
ggplot(df, aes(x = x_centered, y = outcome)) +
    geom_point(alpha = 0.3) +
    geom_smooth(data = subset(df, x_centered < 0),
                method = "lm", se = TRUE, color = "blue") +
    geom_smooth(data = subset(df, x_centered >= 0),
                method = "lm", se = TRUE, color = "blue") +
    geom_vline(xintercept = 0, linetype = "dashed") +
    labs(x = "Running Variable (centered)", y = "Outcome")
RD Plot with evenly-spaced bins
Cutoff c = 0                     Number of Obs = 2847
Number of bins: 20 (left), 20 (right)
Bin selection: Evenly spaced
Polynomial order: 4 (fit), 4 (CI)

                    Left of c   Right of c
Number of Obs.           1423         1424
Eff. Obs.                1423         1424
Bin Length               2.50         2.50
IMSE-optimal bins          17           19

[Plot displayed: binned scatter showing a clear discontinuity at the cutoff]
Left limit at cutoff: 45.23
Right limit at cutoff: 48.08
Visual gap at cutoff: ~2.85 points

RDD identifies the effect only at the cutoff: the Local Average Treatment Effect (LATE). This may not generalize to units far from the cutoff. Be careful about extrapolation.
References
- Calonico, S., Cattaneo, M., & Titiunik, R. (2014). "Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs." Econometrica.
- Cattaneo, M., Idrobo, N., & Titiunik, R. (2020). A Practical Introduction to Regression Discontinuity Designs. Cambridge University Press.
- Lee, D. & Lemieux, T. (2010). "Regression Discontinuity Designs in Economics." Journal of Economic Literature.