6C  Regression Discontinuity

~2 hours Sharp, Fuzzy, Diagnostics

Sharp RDD

In a sharp RDD, treatment is deterministically assigned at the cutoff: everyone above is treated, everyone below is not.

# Python: Sharp RDD with rdrobust
# pip install rdrobust
from rdrobust import rdrobust, rdbwselect

# running_var: the score that determines treatment (e.g., test score)
# cutoff: the threshold (e.g., 50 points)

# Center the running variable at cutoff
df['x_centered'] = df['running_var'] - cutoff

# Sharp RDD estimation
result = rdrobust(
    y=df['outcome'],
    x=df['x_centered'],
    c=0  # cutoff (centered)
)
print(result)

# With covariates
result = rdrobust(
    y=df['outcome'],
    x=df['x_centered'],
    c=0,
    covs=df[['age', 'gender']]
)
* Stata: Sharp RDD with rdrobust
* ssc install rdrobust

* Center running variable
gen x_centered = running_var - 50

* Sharp RDD
rdrobust outcome x_centered, c(0)

* With covariates
rdrobust outcome x_centered, c(0) covs(age gender)

* Specify kernel and polynomial order
rdrobust outcome x_centered, c(0) kernel(triangular) p(1)
# R: Sharp RDD with rdrobust
library(rdrobust)

# Center running variable
df$x_centered <- df$running_var - 50

# Sharp RDD
rd_result <- rdrobust(y = df$outcome,
                      x = df$x_centered,
                      c = 0)
summary(rd_result)

# With covariates
rd_result <- rdrobust(y = df$outcome,
                      x = df$x_centered,
                      c = 0,
                      covs = cbind(df$age, df$gender))
Python Output
Sharp RDD Estimates using Local Polynomial Regression
=====================================================

Number of Obs.                 2847
BW type                       mserd
Kernel                   Triangular
VCE method                       NN

Number of Obs.                 1423        1424
Eff. Number of Obs.             612         587
Order est. (p)                    1           1
Order bias (q)                    2           2
BW est. (h)                   8.421       8.421
BW bias (b)                  13.892      13.892
rho (h/b)                     0.606       0.606

=============================================================================
        Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]
=============================================================================
  Conventional     2.847     0.792     3.595     0.000     [1.295 , 4.399]
        Robust         -         -     3.142     0.002     [1.084 , 4.743]
=============================================================================
Stata Output
Sharp RD estimates using local polynomial regression.

      Cutoff c = 0 | Left of c  Right of c       Number of obs =       2847
-------------------+----------------------       BW type       =      mserd
     Number of obs |      1423        1424       Kernel        = Triangular
Eff. Number of obs |       612         587       VCE method    =         NN
    Order est. (p) |         1           1
    Order bias (q) |         2           2
       BW est. (h) |     8.421       8.421
       BW bias (b) |    13.892      13.892
         rho (h/b) |     0.606       0.606

Outcome: outcome. Running variable: x_centered.
--------------------------------------------------------------------------------
            Method |   Coef.    Std. Err.    z      P>|z|    [95% Conf. Interval]
-------------------+------------------------------------------------------------
      Conventional |   2.847      0.792    3.595    0.000      1.295       4.399
            Robust |      -          -     3.142    0.002      1.084       4.743
--------------------------------------------------------------------------------
R Output
Sharp RD estimates using local polynomial regression.

Number of Obs.                 2847
BW type                       mserd
Kernel                   Triangular
VCE method                       NN

Number of Obs.                 1423         1424
Eff. Number of Obs.             612          587
Order est. (p)                    1            1
Order bias (q)                    2            2
BW est. (h)                   8.421        8.421
BW bias (b)                  13.892       13.892
rho (h/b)                     0.606        0.606

=============================================================================
        Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]
=============================================================================
  Conventional     2.847     0.792     3.595     0.000     [1.295 , 4.399]
        Robust         -         -     3.142     0.002     [1.084 , 4.743]
=============================================================================

Bandwidth Selection

The bandwidth determines how much data around the cutoff is used. Narrower bandwidth = more local but less precise. Modern methods select optimal bandwidth automatically.

# Python: Bandwidth Selection
from rdrobust import rdbwselect

# Optimal bandwidth selection
bw_result = rdbwselect(
    y=df['outcome'],
    x=df['x_centered'],
    c=0
)
print(bw_result)

# Use specific bandwidth
result = rdrobust(
    y=df['outcome'],
    x=df['x_centered'],
    c=0,
    h=5  # manual bandwidth
)

# Robustness: different bandwidths
for h in [3, 5, 7, 10]:
    result = rdrobust(y=df['outcome'], x=df['x_centered'], c=0, h=h)
    print(f"h={h}: estimate={result.coef[0]:.3f}")
* Stata: Bandwidth Selection

* Optimal bandwidth
rdbwselect outcome x_centered, c(0)

* With different methods
rdbwselect outcome x_centered, c(0) bwselect(mserd)  // MSE
rdbwselect outcome x_centered, c(0) bwselect(cerrd)  // CER

* Robustness check: vary bandwidth
foreach h in 3 5 7 10 {
    rdrobust outcome x_centered, c(0) h(`h')
}
# R: Bandwidth Selection
library(rdrobust)

# Optimal bandwidth
bw <- rdbwselect(y = df$outcome, x = df$x_centered, c = 0)
summary(bw)

# Use specific bandwidth
rd_result <- rdrobust(y = df$outcome, x = df$x_centered, c = 0, h = 5)

# Robustness: multiple bandwidths
bandwidths <- c(3, 5, 7, 10)
results <- lapply(bandwidths, function(h) {
  rdrobust(y = df$outcome, x = df$x_centered, c = 0, h = h)
})
# Extract and compare estimates
Python Output
Bandwidth Selection using local polynomial regression

Number of Obs.                 2847
Kernel                   Triangular
Order loc. poly. (p)              1
Order bias (q)                    2

=======================================================
         BW Type         BW (h)      BW (b)
=======================================================
         mserd           8.421      13.892
         msetwo          7.934      12.156
         msesum          8.187      13.024
         msecommon       8.302      13.458
         cerrd           5.847       9.642
         certwo          5.506       8.438
         cersum          5.682       9.040
         cercommon       5.764       9.341
=======================================================

Robustness to bandwidth choice:
h=3: estimate=3.124
h=5: estimate=2.956
h=7: estimate=2.891
h=10: estimate=2.784
Stata Output
Bandwidth estimators for local polynomial regression

      Cutoff c = 0 | Left of c  Right of c       Number of obs =       2847
-------------------+----------------------       Kernel        = Triangular
     Number of obs |      1423        1424       VCE method    =         NN
Order loc. poly (p)|         1           1
   Order bias (q)  |         2           2

--------------------------------------------------------------------------------
            Method |         h          b                rho
-------------------+------------------------------------------------------------
             mserd |     8.421     13.892              0.606
            msetwo |     7.934     12.156              0.653
            msesum |     8.187     13.024              0.629
         msecommon |     8.302     13.458              0.617
             cerrd |     5.847      9.642              0.606
            certwo |     5.506      8.438              0.653
            cersum |     5.682      9.040              0.629
         cercommon |     5.764      9.341              0.617
--------------------------------------------------------------------------------

Robustness across bandwidths:
h=3:  Coef. = 3.124  (SE = 1.234)
h=5:  Coef. = 2.956  (SE = 0.923)
h=7:  Coef. = 2.891  (SE = 0.834)
h=10: Coef. = 2.784  (SE = 0.756)
R Output
Bandwidth Selection for RD Local Polynomial Regression

Number of Obs.                 2847
Kernel                   Triangular
Order loc. poly. (p)              1
Order bias (q)                    2

Bandwidth Estimates:
          Type       h (left)    h (right)        b
         mserd          8.421        8.421   13.892
         msetwo         7.934        7.934   12.156
         msesum         8.187        8.187   13.024
      msecommon         8.302        8.302   13.458
         cerrd          5.847        5.847    9.642
         certwo         5.506        5.506    8.438
         cersum         5.682        5.682    9.040
      cercommon         5.764        5.764    9.341

Robustness check across bandwidths:
  h=3:  Estimate = 3.124, SE = 1.234, p = 0.011
  h=5:  Estimate = 2.956, SE = 0.923, p = 0.001
  h=7:  Estimate = 2.891, SE = 0.834, p < 0.001
  h=10: Estimate = 2.784, SE = 0.756, p < 0.001

Fuzzy RDD

In fuzzy RDD, the probability of treatment jumps at the cutoff but isn't deterministic. This is essentially IV using the threshold as an instrument for treatment.

# Python: Fuzzy RDD
from rdrobust import rdrobust

# Fuzzy RDD: treatment is not perfectly determined by cutoff
result = rdrobust(
    y=df['outcome'],
    x=df['x_centered'],
    c=0,
    fuzzy=df['treatment']  # actual treatment status
)
print(result)

# First stage: check jump in treatment probability
first_stage = rdrobust(
    y=df['treatment'],
    x=df['x_centered'],
    c=0
)
print("First stage jump:", first_stage.coef[0])
* Stata: Fuzzy RDD

* Fuzzy RDD with treatment indicator
rdrobust outcome x_centered, c(0) fuzzy(treatment)

* Check first stage
rdrobust treatment x_centered, c(0)

* Plot first stage discontinuity
rdplot treatment x_centered, c(0) graph_options(title("First Stage"))
# R: Fuzzy RDD
library(rdrobust)

# Fuzzy RDD
rd_fuzzy <- rdrobust(y = df$outcome,
                     x = df$x_centered,
                     c = 0,
                     fuzzy = df$treatment)
summary(rd_fuzzy)

# First stage: jump in treatment at cutoff
first_stage <- rdrobust(y = df$treatment,
                        x = df$x_centered,
                        c = 0)
summary(first_stage)
Python Output
Fuzzy RDD Estimates using Local Polynomial Regression
=====================================================

Number of Obs.                 2847
BW type                       mserd
Kernel                   Triangular
VCE method                       NN

Number of Obs.                 1423        1424
Eff. Number of Obs.             589         612
Order est. (p)                    1           1
Order bias (q)                    2           2
BW est. (h)                   7.892       7.892
BW bias (b)                  12.456      12.456
rho (h/b)                     0.634       0.634

=============================================================================
        Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]
=============================================================================
  Conventional     4.523     1.156     3.912     0.000     [2.257 , 6.789]
        Robust         -         -     3.421     0.001     [1.892 , 7.234]
=============================================================================

First stage jump: 0.628
(Treatment probability jumps by 62.8 percentage points at the cutoff)
Stata Output
Fuzzy RD estimates using local polynomial regression.

      Cutoff c = 0 | Left of c  Right of c       Number of obs =       2847
-------------------+----------------------       BW type       =      mserd
     Number of obs |      1423        1424       Kernel        = Triangular
Eff. Number of obs |       589         612       VCE method    =         NN
    Order est. (p) |         1           1
    Order bias (q) |         2           2
       BW est. (h) |     7.892       7.892
       BW bias (b) |    12.456      12.456
         rho (h/b) |     0.634       0.634

Outcome: outcome. Running variable: x_centered. Treatment: treatment
--------------------------------------------------------------------------------
            Method |   Coef.    Std. Err.    z      P>|z|    [95% Conf. Interval]
-------------------+------------------------------------------------------------
      Conventional |   4.523      1.156    3.912    0.000      2.257       6.789
            Robust |      -          -     3.421    0.001      1.892       7.234
--------------------------------------------------------------------------------

First Stage Estimate:
--------------------------------------------------------------------------------
      Conventional |   0.628      0.089    7.056    0.000      0.453       0.803
--------------------------------------------------------------------------------
R Output
Fuzzy RD estimates using local polynomial regression.

Number of Obs.                 2847
BW type                       mserd
Kernel                   Triangular
VCE method                       NN

Number of Obs.                 1423         1424
Eff. Number of Obs.             589          612
Order est. (p)                    1            1
Order bias (q)                    2            2
BW est. (h)                   7.892        7.892
BW bias (b)                  12.456       12.456
rho (h/b)                     0.634        0.634

Fuzzy RD Estimates:
=============================================================================
        Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]
=============================================================================
  Conventional     4.523     1.156     3.912     0.000     [2.257 , 6.789]
        Robust         -         -     3.421     0.001     [1.892 , 7.234]
=============================================================================

First Stage Results:
Jump in treatment probability at cutoff: 0.628 (SE = 0.089)
First-stage F-statistic: 49.79

Diagnostics and Validity Tests

Key validity checks: (1) no manipulation of the running variable, (2) continuity of covariates at cutoff.

# Python: RDD Diagnostics
from rddensity import rddensity
from rdrobust import rdrobust

# 1. McCrary density test (manipulation check)
density_test = rddensity(X=df['x_centered'], c=0)
print(density_test)

# 2. Covariate balance at cutoff
covariates = ['age', 'gender', 'prior_income']
for cov in covariates:
    result = rdrobust(y=df[cov], x=df['x_centered'], c=0)
    print(f"{cov}: coef={result.coef[0]:.3f}, p={result.pv[0]:.3f}")

# 3. Placebo cutoffs
placebo_cutoffs = [-10, -5, 5, 10]  # relative to true cutoff
for pc in placebo_cutoffs:
    result = rdrobust(y=df['outcome'], x=df['x_centered'], c=pc)
    print(f"Placebo c={pc}: coef={result.coef[0]:.3f}")
* Stata: RDD Diagnostics
* ssc install rddensity

* 1. McCrary density test
rddensity x_centered, c(0) plot

* 2. Covariate balance tests
foreach var in age gender prior_income {
    rdrobust `var' x_centered, c(0)
}

* 3. Placebo cutoffs
foreach c in -10 -5 5 10 {
    rdrobust outcome x_centered, c(`c')
}

* 4. Donut hole test (exclude obs near cutoff)
rdrobust outcome x_centered if abs(x_centered) > 1, c(0)
# R: RDD Diagnostics
library(rdrobust)
library(rddensity)

# 1. McCrary density test
dens_test <- rddensity(X = df$x_centered, c = 0)
summary(dens_test)

# Plot density
rdplotdensity(rdd = dens_test, X = df$x_centered)

# 2. Covariate balance tests
covariates <- c("age", "gender", "prior_income")
lapply(covariates, function(var) {
  rdrobust(y = df[[var]], x = df$x_centered, c = 0)
})

# 3. Placebo cutoffs
placebo <- c(-10, -5, 5, 10)
lapply(placebo, function(c) {
  rdrobust(y = df$outcome, x = df$x_centered, c = c)
})
Python Output
Manipulation Testing using Local Polynomial Density Estimation
==============================================================

Number of Obs.                 2847
Cutoff                            0
Model                        unrestricted
Kernel                       Triangular
BW method                    comb

Running Variable: x_centered
------------------------------------------
            Left of c    Right of c
------------------------------------------
Number of Obs.   1423         1424
Eff. Obs.         812          798
BW est.         9.234        8.921
------------------------------------------

Test                 Statistic    p-value
------------------------------------------
Binomial                 0.024      0.807
T-statistic             -0.412      0.680
------------------------------------------
Note: No evidence of manipulation at the cutoff.

Covariate Balance at Cutoff:
age: coef=0.234, p=0.672 (no discontinuity)
gender: coef=0.012, p=0.891 (no discontinuity)
prior_income: coef=-124.56, p=0.534 (no discontinuity)

Placebo Cutoff Tests:
Placebo c=-10: coef=0.234 (p=0.782)
Placebo c=-5: coef=-0.156 (p=0.856)
Placebo c=5: coef=0.089 (p=0.923)
Placebo c=10: coef=-0.312 (p=0.645)
Stata Output
Manipulation testing using local polynomial density estimation.

      Cutoff c = 0 | Left of c  Right of c       Number of obs =       2847
-------------------+----------------------       Model         = unrestricted
     Number of obs |      1423        1424       Kernel        =  Triangular
Eff. Number of obs |       812         798       BW method     =        comb
         BW est.   |     9.234       8.921
--------------------------------------------------------------------------------

Method              |       T          P>|T|
--------------------+---------------------------
Robust              |    -0.412        0.680
--------------------+---------------------------
Ho: no discontinuity in density at c = 0

Covariate Balance Tests:
--------------------------------------------------------------------------------
    Variable |   Coef.    Std. Err.    z      P>|z|
-------------+----------------------------------------------
         age |   0.234      0.552    0.424    0.672
      gender |   0.012      0.087    0.138    0.891
prior_income | -124.56    199.234   -0.625    0.534
--------------------------------------------------------------------------------

Placebo Cutoff Tests:
c=-10: Coef. = 0.234  (p = 0.782)
c=-5:  Coef. = -0.156 (p = 0.856)
c=5:   Coef. = 0.089  (p = 0.923)
c=10:  Coef. = -0.312 (p = 0.645)
R Output
Manipulation test using local polynomial density estimation

Cutoff c = 0           Number of obs = 2847
Model = unrestricted   Kernel = Triangular

                   Left of c    Right of c
Number of Obs.          1423          1424
Eff. Obs.                812           798
BW est.                9.234         8.921

Test Results:
                 Statistic     p-value
T-statistic         -0.412       0.680
------------------------------------------
Conclusion: No evidence of manipulation (fail to reject null)

Covariate Balance Tests:
              Coef.    Std. Err.       z     Pr(>|z|)
age           0.234       0.552    0.424        0.672
gender        0.012       0.087    0.138        0.891
prior_income -124.56    199.234   -0.625        0.534

Placebo Cutoff Tests:
  Cutoff   Estimate   Std. Err.   p-value
     -10      0.234       0.845     0.782
      -5     -0.156       0.867     0.856
       5      0.089       0.912     0.923
      10     -0.312       0.678     0.645

Visualization

Always visualize the discontinuity. Good RDD plots show the raw data and fitted relationships on both sides of the cutoff.

# Python: RDD Plot
from rdrobust import rdplot
import matplotlib.pyplot as plt

# rdplot creates binned scatter with polynomial fit
rdplot(
    y=df['outcome'],
    x=df['x_centered'],
    c=0,
    nbins=[20, 20],  # bins on each side
    binselect='es'   # evenly spaced
)

# Custom plot
fig, ax = plt.subplots(figsize=(10, 6))

# Scatter with bins
df['bin'] = pd.cut(df['x_centered'], bins=40)
binned = df.groupby('bin').agg({
    'x_centered': 'mean',
    'outcome': 'mean'
}).dropna()
ax.scatter(binned['x_centered'], binned['outcome'], alpha=0.7)

# Fit lines on each side
import numpy as np
for side, data in df.groupby(df['x_centered'] >= 0):
    z = np.polyfit(data['x_centered'], data['outcome'], 1)
    p = np.poly1d(z)
    x_range = np.linspace(data['x_centered'].min(), data['x_centered'].max(), 100)
    ax.plot(x_range, p(x_range), 'r-', linewidth=2)

ax.axvline(x=0, color='gray', linestyle='--')
ax.set_xlabel('Running Variable (centered)')
ax.set_ylabel('Outcome')
plt.show()
* Stata: RDD Plot

* rdplot creates optimal binned scatter plot
rdplot outcome x_centered, c(0)

* With options
rdplot outcome x_centered, c(0) ///
    nbins(20 20) ///
    graph_options(title("RDD: Effect of Scholarship") ///
                  xtitle("Test Score (centered)") ///
                  ytitle("College Enrollment"))

* CI shade plot
rdplot outcome x_centered, c(0) ci(95) shade
# R: RDD Plot
library(rdrobust)

# rdplot creates binned scatter with polynomial fit
rdplot(y = df$outcome,
       x = df$x_centered,
       c = 0,
       title = "RDD: Effect of Scholarship",
       x.label = "Test Score (centered)",
       y.label = "College Enrollment")

# Custom ggplot version
library(ggplot2)

ggplot(df, aes(x = x_centered, y = outcome)) +
  geom_point(alpha = 0.3) +
  geom_smooth(data = subset(df, x_centered < 0),
              method = "lm", se = TRUE, color = "blue") +
  geom_smooth(data = subset(df, x_centered >= 0),
              method = "lm", se = TRUE, color = "blue") +
  geom_vline(xintercept = 0, linetype = "dashed") +
  labs(x = "Running Variable (centered)", y = "Outcome")
Python Output
RD Plot with evenly-spaced bins

Cutoff c = 0                         Number of Obs = 2847
Number of bins:  20 (left)  20 (right)
Bin selection:   Evenly spaced
Polynomial order: 4 (fit)  4 (CI)

                Left of c    Right of c
Number of Obs.       1423          1424
Eff. Obs.            1423          1424
Bin Length           2.50          2.50
IMSE-optimal bins      17            19

[Plot displayed: Binned scatter plot showing clear discontinuity at cutoff]

Left side mean (at cutoff):  45.23
Right side mean (at cutoff): 48.08
Visual gap at cutoff: ~2.85 points
Stata Output
RD Plot with evenly-spaced bins.

      Cutoff c = 0 | Left of c  Right of c       Number of obs =       2847
-------------------+----------------------       Kernel        =     Uniform
     Number of obs |      1423        1424
 Eff. Number of obs|      1423        1424
     Number of bins|        20          20
      Bin Length   |      2.50        2.50
Poly. Order (fit)  |         4           4
Poly. Order (CI)   |         4           4

[Graph saved as rdd_plot.gph]

Visual inspection shows clear discontinuity:
  - Left limit at cutoff:  45.23
  - Right limit at cutoff: 48.08
  - Estimated jump:         2.85
R Output
RD Plot

Cutoff c = 0                      Number of Obs = 2847
Number of bins: 20 (left), 20 (right)
Bin selection: Evenly spaced
Polynomial order: 4 (fit), 4 (CI)

                  Left of c    Right of c
Number of Obs.         1423          1424
Eff. Obs.              1423          1424
Bin Length             2.50          2.50
IMSE-optimal bins        17            19

[Plot: Binned scatter with fitted polynomial on each side of cutoff]
Title: "RDD: Effect of Scholarship"
X-axis: Test Score (centered)
Y-axis: College Enrollment

Visual discontinuity estimate:
  Left limit:  0.452 (enrollment rate)
  Right limit: 0.481 (enrollment rate)
  Jump:        0.029 (2.9 percentage points)
Local Average Treatment Effect

RDD identifies the effect only at the cutoff—the Local Average Treatment Effect (LATE). This may not generalize to units far from the cutoff. Be careful about extrapolation.

References
  • Calonico, S., Cattaneo, M., & Titiunik, R. (2014). "Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs." Econometrica.
  • Cattaneo, M., Idrobo, N., & Titiunik, R. (2020). A Practical Introduction to Regression Discontinuity Designs. Cambridge.
  • Lee, D. & Lemieux, T. (2010). "Regression Discontinuity Designs in Economics." JEL.