6D  Instrumental Variables

~2.5 hours · 2SLS, Weak IV, Bartik

Two-Stage Least Squares

IV estimation uses an instrument Z that shifts the endogenous treatment D but has no direct effect on the outcome Y. Two identifying conditions are required: relevance (Z predicts D, which is testable in the first stage) and the exclusion restriction (Z affects Y only through D, which is untestable and must be defended on substantive grounds).
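Before the estimation code, a minimal simulation (synthetic data; all coefficients invented for illustration) makes the point concrete: an unobserved confounder biases OLS upward, while IV with a valid instrument recovers the true effect.

```python
import numpy as np

# Synthetic data: u is unobserved ability that raises both education and wages
rng = np.random.default_rng(0)
n = 50_000
u = rng.normal(size=n)                    # unobserved confounder
z = rng.normal(size=n)                    # instrument: shifts education only
education = 0.5 * z + 0.8 * u + rng.normal(size=n)
wage = 1.0 * education + 1.0 * u + rng.normal(size=n)   # true effect = 1.0

# OLS slope: biased upward because education and wage share u
ols_beta = np.cov(education, wage)[0, 1] / np.var(education, ddof=1)

# IV (Wald/2SLS with a single instrument): cov(z, y) / cov(z, d)
iv_beta = np.cov(z, wage)[0, 1] / np.cov(z, education)[0, 1]

print(f"OLS:  {ols_beta:.3f}   (biased upward)")
print(f"2SLS: {iv_beta:.3f}   (close to the true 1.0)")
```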

# Python: 2SLS with linearmodels
from linearmodels.iv import IV2SLS
import pandas as pd

# Example: effect of education on wages
# Instrument: distance to college

# 2SLS estimation
model = IV2SLS.from_formula(
    'wage ~ 1 + experience + [education ~ distance_to_college]',
    data=df
)
results = model.fit(cov_type='robust')
print(results.summary)

# With multiple controls
model = IV2SLS.from_formula(
    'wage ~ 1 + experience + age + [education ~ distance_to_college]',
    data=df
)

# Manual 2SLS (for understanding)
import statsmodels.api as sm

# First stage: regress endogenous on instrument
first_stage = sm.OLS(df['education'],
                     sm.add_constant(df[['distance_to_college', 'experience']])).fit()
df['education_hat'] = first_stage.fittedvalues

# Second stage: regress outcome on predicted endogenous
second_stage = sm.OLS(df['wage'],
                      sm.add_constant(df[['education_hat', 'experience']])).fit()
# Note: SEs from manual 2SLS are wrong! Use IV2SLS for correct SEs.
* Stata: 2SLS with ivregress

* Basic 2SLS
ivregress 2sls wage experience (education = distance_to_college), robust

* With first-stage results
ivregress 2sls wage experience (education = distance_to_college), first robust

* Using ivreg2 (more diagnostics)
* ssc install ivreg2
ivreg2 wage experience (education = distance_to_college), robust first

* With clustering
ivregress 2sls wage experience (education = distance_to_college), ///
    vce(cluster state)
# R: 2SLS with ivreg (AER package)
library(AER)
library(sandwich)

# 2SLS estimation
iv_model <- ivreg(wage ~ education + experience |
                  distance_to_college + experience,
                  data = df)
summary(iv_model, vcov = vcovHC, diagnostics = TRUE)

# Using fixest (faster, more flexible)
library(fixest)

iv_model <- feols(wage ~ experience | education ~ distance_to_college,
                  data = df,
                  vcov = "hetero")
summary(iv_model, stage = 1:2)  # show both stages
Python Output
                          IV-2SLS Estimation Summary
==============================================================================
Dep. Variable:                   wage   R-squared:                      0.2341
Estimator:                    IV-2SLS   Adj. R-squared:                 0.2338
No. Observations:                3010   F-statistic:                    156.34
Date:                Mon, Jan 27 2026   P-value (F-stat)                0.0000
Time:                        14:23:45   Distribution:                  chi2(2)
Cov. Estimator:                robust

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      8.2341     1.4562      5.655      0.000      5.380      11.088
experience     0.4521     0.0234     19.321      0.000      0.406       0.498
education      0.8934     0.1123      7.956      0.000      0.673       1.114
==============================================================================

Endogenous: education
Instruments: distance_to_college
Robust Covariance (Heteroskedastic)
Debiased: False
Stata Output
Instrumental variables 2SLS regression               Number of obs =     3,010
                                                     Wald chi2(2)  =    312.68
                                                     Prob > chi2   =    0.0000
                                                     R-squared     =    0.2341
                                                     Root MSE      =    8.2145

------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   education |    .893412   .1123456     7.96   0.000     .6732206    1.113604
  experience |    .452134   .0234123    19.32   0.000      .406247    .4980209
       _cons |   8.234156   1.456234     5.66   0.000     5.379959    11.08835
------------------------------------------------------------------------------
Instrumented:  education
Instruments:   experience distance_to_college

First-stage regression summary statistics
------------------------------------------------------------------------------
    Variable |    R-sq.   Adj. R-sq.  Robust F(1,3007)   Prob > F
-------------+----------------------------------------------------------------
   education |   0.0892      0.0889        42.34          0.0000
------------------------------------------------------------------------------
R Output
Call:
ivreg(formula = wage ~ education + experience | distance_to_college +
    experience, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-28.4521  -5.2341  -0.1234   5.1234  32.5678

Coefficients:
             Estimate Std. Err. t value Pr(>|t|)
(Intercept)   8.2341    1.4562    5.66  1.7e-08 ***
education     0.8934    0.1123    7.96  2.2e-15 ***
experience    0.4521    0.0234   19.32  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.214 on 3007 degrees of freedom
Multiple R-Squared: 0.2341,	Adjusted R-squared: 0.2338
Wald test: 156.3 on 2 and 3007 DF,  p-value: < 2.2e-16

Diagnostic tests:
                  df1  df2 statistic p-value
Weak instruments    1 3007     42.34  <2e-16 ***
---

IV Diagnostics

Always report: (1) first-stage F-statistic, (2) overidentification test (with multiple instruments), (3) comparison of OLS and IV estimates.

# Python: IV Diagnostics
from linearmodels.iv import IV2SLS
import statsmodels.api as sm

# Estimate IV model
model = IV2SLS.from_formula(
    'wage ~ 1 + experience + [education ~ z1 + z2]',
    data=df
)
results = model.fit(cov_type='robust')

# 1. First-stage F-statistic
first_stage = sm.OLS(df['education'],
                     sm.add_constant(df[['z1', 'z2', 'experience']])).fit()
print(f"First-stage F: {first_stage.fvalue:.2f}")

# 2. Durbin-Wu-Hausman test (endogeneity test)
print("Wu-Hausman test:", results.wu_hausman())

# 3. Sargan-Hansen overidentification test
print("Sargan test:", results.sargan)
* Stata: IV Diagnostics

* Estimate with ivreg2 for full diagnostics
ivreg2 wage experience (education = z1 z2), robust first

* Key diagnostics reported automatically:
* - Kleibergen-Paap F statistic (weak ID)
* - Hansen J statistic (overidentification)
* - Endogeneity test

* Post-estimation tests with ivregress
ivregress 2sls wage experience (education = z1 z2), robust

* First-stage F
estat firststage

* Endogeneity test
estat endogenous

* Overidentification test (requires >1 instrument per endogenous)
estat overid
# R: IV Diagnostics
library(AER)

# Estimate with diagnostics
iv_model <- ivreg(wage ~ education + experience |
                  z1 + z2 + experience,
                  data = df)

# Full diagnostics
summary(iv_model, diagnostics = TRUE)

# Components:
# - Weak instruments: F-stat on excluded instruments
# - Wu-Hausman: endogeneity test (OLS consistent?)
# - Sargan: overidentification test

# With fixest
library(fixest)
iv_model <- feols(wage ~ experience | education ~ z1 + z2,
                  data = df)
fitstat(iv_model, ~ ivf + ivwald + sargan)
Python Output
First-stage F: 38.42
(F > 10 rule of thumb from Staiger & Stock; Stock & Yogo give formal critical values)

Wu-Hausman test:
  Statistic: 12.456
  P-value: 0.0004
  Conclusion: Reject null of exogeneity - OLS is inconsistent

Sargan test (overidentification):
  J-statistic: 1.234
  P-value: 0.267
  Degrees of freedom: 1
  Conclusion: Fail to reject - instruments appear valid

Summary of IV Diagnostics:
--------------------------
First-stage F-statistic:     38.42  (> 10, strong instruments)
Wu-Hausman (endogeneity):    p = 0.0004 (education is endogenous)
Sargan (overid):             p = 0.267 (instruments appear valid)
Stata Output
First-stage regression summary statistics
------------------------------------------------------------------------------
                                      Adjusted      Partial
    Variable |    R-sq.     R-sq.      R-sq.     F(2,3006)    Prob > F
-------------+----------------------------------------------------------------
   education |   0.1234    0.1228     0.0892       38.42      0.0000
------------------------------------------------------------------------------

Stock-Yogo weak ID test critical values for single endogenous regressor:
                                10%    15%    20%    25%
2SLS relative bias               19.93  11.59  8.75   7.25
2SLS Size of nominal 5% test     22.30  12.83  9.54   7.80

Kleibergen-Paap rk Wald F statistic:    38.42

Endogeneity test of endogenous regressors:
  Chi-sq(1) = 12.456    P-val = 0.0004
  Reject H0: education is exogenous

Hansen J statistic (overidentification test):
  Chi-sq(1) = 1.234     P-val = 0.267
  Fail to reject H0: instruments are valid
R Output
Diagnostic tests:
                    df1  df2 statistic p-value
Weak instruments      2 3006     38.42  <2e-16 ***
Wu-Hausman            1 3006     12.46  0.0004 ***
Sargan                1   NA      1.23  0.2670
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation:
- Weak instruments test: F = 38.42 >> 10, instruments are strong
- Wu-Hausman test: p = 0.0004, reject exogeneity (education is endogenous)
- Sargan test: p = 0.267, fail to reject (overidentifying restrictions valid)

First-stage results (using fixest):
               Estimate Std. Error  t value   Pr(>|t|)
z1              0.4523     0.0734    6.162    <2e-16 ***
z2              0.3156     0.0812    3.887    0.0001 ***
experience      0.0234     0.0089    2.629    0.0086 **

First-stage F-stat (joint test of instruments): 38.42

Weak Instruments

Weak instruments (a low first-stage F) bias 2SLS toward OLS and make conventional standard errors and confidence intervals unreliable. When F < 10, use weak-instrument robust methods such as Anderson-Rubin confidence sets or LIML.
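A small Monte Carlo (synthetic data, pure illustration) shows the mechanism: with many weak instruments, 2SLS is pulled toward the inconsistent OLS estimate; with a strong first stage, the pull largely disappears.

```python
import numpy as np

rng = np.random.default_rng(2)

def tsls(n=500, k=20, pi=0.05):
    """One 2SLS draw with k instruments, only the first (weakly) relevant.
    True effect of d on y is 1; the OLS plim in this design is about 1.5."""
    u = rng.normal(size=n)                      # confounder
    Z = rng.normal(size=(n, k))
    d = pi * Z[:, 0] + u + rng.normal(size=n)
    y = d + u + rng.normal(size=n)
    Zc = np.column_stack([np.ones(n), Z])
    d_hat = Zc @ np.linalg.lstsq(Zc, d, rcond=None)[0]   # first stage
    Xc = np.column_stack([np.ones(n), d_hat])
    return np.linalg.lstsq(Xc, y, rcond=None)[0][1]      # second-stage slope

weak = np.median([tsls(pi=0.05) for _ in range(1000)])
strong = np.median([tsls(pi=0.5) for _ in range(1000)])
print(f"median 2SLS, weak first stage (pi=0.05):  {weak:.2f}")   # close to OLS
print(f"median 2SLS, strong first stage (pi=0.5): {strong:.2f}") # much closer to 1
```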

# Python: Weak Instrument Robust Inference
from linearmodels.iv import IV2SLS
import numpy as np

# Anderson-Rubin confidence interval (weak-IV robust)
def anderson_rubin_ci(y, d, z, x, alpha=0.05):
    """Anderson-Rubin CI: invert the AR test over a grid of beta values."""
    import statsmodels.api as sm
    from scipy import stats

    z = np.asarray(z).reshape(len(y), -1)
    x = np.asarray(x).reshape(len(y), -1)
    k = z.shape[1]
    X_full = sm.add_constant(np.column_stack([z, x]))

    # Restriction matrix: test that the k instrument coefficients
    # (columns 1..k, after the constant) are jointly zero
    R = np.zeros((k, X_full.shape[1]))
    R[np.arange(k), np.arange(1, k + 1)] = 1.0

    betas = np.linspace(-5, 5, 1000)
    ar_stats = []
    for beta in betas:
        resid = y - beta * d
        model = sm.OLS(resid, X_full).fit()
        # AR statistic: F-test that the instruments have no explanatory power
        # for the residual (NOT the regression's overall F-statistic)
        ar_stats.append(np.squeeze(model.f_test(R).fvalue).item())

    # CI: all beta values at which the AR test fails to reject
    critical = stats.f.ppf(1 - alpha, dfn=k, dfd=len(y) - X_full.shape[1])
    accepted = betas[np.array(ar_stats) < critical]
    return accepted.min(), accepted.max()
* Stata: Weak Instrument Robust Inference

* Limited Information Maximum Likelihood (LIML)
ivregress liml wage experience (education = z1 z2), robust

* Weak-IV robust confidence intervals with weakiv
* ssc install weakiv
weakiv ivregress 2sls wage experience (education = z1 z2)

* Check Stock-Yogo critical values
ivreg2 wage experience (education = z1 z2), first
* Compare Kleibergen-Paap F to Stock-Yogo critical values

* Continuously updating GMM (CUE) - also weak-IV robust
ivreg2 wage experience (education = z1 z2), cue robust
# R: Weak Instrument Robust Inference
library(AER)
library(ivmodel)

# Check first-stage F
iv_model <- ivreg(wage ~ education + experience |
                  z1 + z2 + experience,
                  data = df)
summary(iv_model, diagnostics = TRUE)

# Weak-IV robust inference with ivmodel
iv_robust <- ivmodel(Y = df$wage,
                     D = df$education,
                     Z = cbind(df$z1, df$z2),
                     X = df$experience)

# Anderson-Rubin confidence interval
AR.test(iv_robust)

# LIML estimator
LIML(iv_robust)
Python Output
Anderson-Rubin Confidence Interval (Weak-IV Robust)
===================================================

Testing H0: beta = beta_0 for grid of beta values...

AR 95% Confidence Interval: [0.612, 1.234]

Comparison of Methods:
                     Estimate     95% CI
2SLS                   0.893     [0.673, 1.114]
Anderson-Rubin           -       [0.612, 1.234]
LIML                   0.897     [0.665, 1.129]

Note: AR CI is wider but valid even with weak instruments.
When first-stage F > 10, 2SLS and AR CIs are similar.

Grid search details:
  Beta range searched: [-5, 5]
  Grid points: 1000
  Critical value (F, df1=2, df2=3006): 3.00
  Betas not rejected: 623 points
Stata Output
LIML estimation
-------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+-----------------------------------------------------------------
   education |    .897234   .1156789     7.76   0.000     .6704994    1.123969
  experience |    .451234   .0234567    19.23   0.000     .4052593    .4972087
       _cons |   8.156234   1.478912     5.52   0.000     5.257419    11.05505
-------------------------------------------------------------------------------

Weak-instrument-robust inference (weakiv):
Tests of H0: beta=b0

                                                   95% conf. set for beta
                   Test statistic      p-value      (robust to weak IV)
Anderson-Rubin     chi2(2)=23.45       0.0000       [0.612, 1.234]
Wald               chi2(1)=60.12       0.0000       [0.673, 1.114]

Conditional Likelihood Ratio (CLR) test:
CLR statistic:     23.12              p-value:       0.0000
95% CLR CI:        [0.628, 1.221]

Stock-Yogo critical values (2SLS, 5% maximal bias):
  # instruments = 2:  critical F = 19.93
  Actual F = 38.42 > 19.93: instruments are NOT weak
R Output
Anderson-Rubin Test
-------------------
AR Statistic: 23.45 on 2 and 3006 DF
p-value: < 2.2e-16

95% Anderson-Rubin Confidence Set: [0.612, 1.234]

LIML Estimator
--------------
            Estimate Std. Error   t value   Pr(>|t|)
education     0.8972     0.1157     7.757   < 2e-16 ***

Comparison of Estimators:
              Estimate        SE      95% CI
2SLS            0.8934    0.1123    [0.673, 1.114]
LIML            0.8972    0.1157    [0.670, 1.124]
Fuller(1)       0.8956    0.1142    [0.672, 1.119]

Note: LIML is median-unbiased with weak instruments,
while 2SLS bias is towards OLS.

Weak IV diagnostics:
First-stage F: 38.42
Stock-Yogo 10% maximal size: 19.93
Conclusion: F >> critical value, instruments are strong

Bartik/Shift-Share Instruments

Shift-share (Bartik) instruments interact national or sectoral shocks with local industry shares. They are widely used in labor and trade economics, and recent papers clarify the conditions under which they are valid.
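Before the full construction below, a toy hand computation (two locations, two industries, all numbers invented) shows the formula B_l = sum_k s_{lk} * g_k:

```python
# Base-period employment shares s_{lk} (invented numbers)
shares = {
    'A': {'mfg': 0.7, 'svc': 0.3},
    'B': {'mfg': 0.2, 'svc': 0.8},
}
# National growth rates g_k by industry (invented numbers)
growth = {'mfg': -0.02, 'svc': 0.05}

# Bartik instrument: exposure-weighted average of the national shocks
bartik = {loc: sum(s[k] * growth[k] for k in s) for loc, s in shares.items()}
print(bartik)
# A is manufacturing-heavy, so its predicted growth is near zero;
# B is services-heavy, so its predicted growth is strongly positive.
```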

# Python: Constructing a Bartik Instrument
import pandas as pd
import numpy as np

# Components:
# - shares_{l,k}: share of industry k in location l (pre-period)
# - growth_k: national growth rate in industry k

# Step 1: Industry shares by location (base period)
base_year = df[df['year'] == 2000]
shares = base_year.groupby(['location', 'industry'])['employment'].sum()
total_emp = base_year.groupby('location')['employment'].sum()
shares = shares.div(total_emp, level='location')  # shares_{l,k}

# Step 2: Leave-one-out national growth by industry
# (data assumed to be one row per location-industry, as in the Stata code)
nat_change = df.groupby('industry')['emp_change'].transform('sum')
nat_base = df.groupby('industry')['base_emp'].transform('sum')
df['loo_growth'] = (nat_change - df['emp_change']) / (nat_base - df['base_emp'])

# Step 3: Construct Bartik instrument: B_l = sum_k shares_{l,k} * growth_k
merged = shares.rename('share').reset_index().merge(
    df[['location', 'industry', 'loo_growth']], on=['location', 'industry'])
bartik = (merged['share'] * merged['loo_growth']).groupby(merged['location']).sum()
df = df.merge(bartik.rename('bartik').reset_index(), on='location')

# Use in IV regression
from linearmodels.iv import IV2SLS
model = IV2SLS.from_formula(
    'wage ~ 1 + controls + [local_emp_growth ~ bartik]',
    data=df
)
* Stata: Bartik Instrument

* Assume data has: location, industry, employment, year

* Step 1: Calculate base-year shares
preserve
keep if year == 2000
bysort location: egen total_emp = total(employment)
gen share = employment / total_emp
keep location industry share
tempfile shares
save `shares'
restore

* Step 2: Calculate leave-one-out national growth
bysort industry: egen nat_growth = total(emp_change)
bysort industry: egen nat_base = total(base_emp)
gen loo_growth = (nat_growth - emp_change) / (nat_base - base_emp)

* Step 3: Merge shares and compute Bartik
merge m:1 location industry using `shares'
gen bartik_component = share * loo_growth
bysort location: egen bartik = total(bartik_component)

* Step 4: Use in IV regression
ivregress 2sls wage controls (local_emp_growth = bartik), cluster(location)

* Using ssaggregate (Borusyak, Hull, Jaravel)
* ssc install ssaggregate
ssaggregate outcome treatment, ///
    shares(share) shocks(national_growth) controls(x1 x2) ///
    cluster(location)
# R: Bartik Instrument
library(dplyr)
library(fixest)

# Step 1: Calculate base-year shares
shares <- df %>%
  filter(year == 2000) %>%
  group_by(location, industry) %>%
  summarise(emp = sum(employment)) %>%
  group_by(location) %>%
  mutate(share = emp / sum(emp)) %>%
  select(location, industry, share)

# Step 2: Leave-one-out national growth
growth <- df %>%
  group_by(industry, location) %>%
  summarise(emp_change = sum(emp_change), base_emp = sum(base_emp)) %>%
  group_by(industry) %>%
  mutate(
    loo_growth = (sum(emp_change) - emp_change) / (sum(base_emp) - base_emp)
  )

# Step 3: Construct Bartik
bartik <- shares %>%
  left_join(growth, by = c("location", "industry")) %>%
  mutate(component = share * loo_growth) %>%
  group_by(location) %>%
  summarise(bartik = sum(component, na.rm = TRUE))

# Merge and estimate
df <- df %>% left_join(bartik, by = "location")

# IV regression
iv_model <- feols(wage ~ controls | local_emp_growth ~ bartik,
                  data = df,
                  cluster = ~ location)
Python Output
Bartik Instrument Construction
==============================

Step 1: Industry shares (base year 2000)
Location    Manufacturing  Services  Agriculture  Finance
New York          0.234      0.456        0.012    0.298
California        0.189      0.512        0.089    0.210
Texas             0.312      0.378        0.156    0.154
...

Step 2: National industry growth (leave-one-out)
Industry        Growth Rate
Manufacturing       -0.0234
Services             0.0456
Agriculture         -0.0123
Finance              0.0312

Step 3: Bartik instrument by location
Location      Bartik
New York      0.0189
California    0.0234
Texas        -0.0056
...

IV Regression Results (using Bartik as instrument):
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
local_emp      1.234     0.345       3.576      0.000      0.558       1.910
controls       0.456     0.123       3.707      0.000      0.215       0.697
------------------------------------------------------------------------------
First-stage F: 28.34
Clustered SEs at location level (n_clusters = 150)
Stata Output
Step 1: Base-year industry shares created
  Observations: 3,200 (location x industry)
  Locations: 150
  Industries: 20

Step 2: Leave-one-out growth rates calculated
  Mean growth: 0.0234
  SD growth: 0.0456

Step 3: Bartik instrument created
  Mean Bartik: 0.0145
  SD Bartik: 0.0234

IV Regression with Bartik Instrument
(Std. Err. adjusted for 150 clusters in location)
------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
local_emp_~h |   1.234567   .3456789     3.57   0.000     .5570356    1.912098
    controls |   .4561234   .1234567     3.71   0.000     .2142044    .6980424
       _cons |  12.34567   2.345678     5.26   0.000     7.748211    16.94313
------------------------------------------------------------------------------
Instrumented:  local_emp_growth
Instruments:   controls bartik

First-stage F-statistic: 28.34
Kleibergen-Paap rk Wald F: 28.34
R Output
Bartik Instrument Construction
==============================

Industry shares (sample):
# A tibble: 3,000 x 3
  location   industry       share
  <chr>      <chr>          <dbl>
1 New York   Manufacturing  0.234
2 New York   Services       0.456
3 California Manufacturing  0.189
...

Bartik instrument summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.0456 -0.0123  0.0145  0.0178  0.0312  0.0567

IV Regression with Clustered Standard Errors
============================================
TSLS estimation, Pair(s): 1, Observations: 4,500
Cluster: location, Number of clusters: 150

                 Estimate Std. Error   t value   Pr(>|t|)
local_emp_growth   1.2346     0.3457     3.572   0.000489 ***
controls           0.4561     0.1235     3.707   0.000312 ***

First-stage statistics:
                    F-stat   p-value
local_emp_growth     28.34   < 2e-16

Partial R2 of excluded instruments: 0.0892
Recent Advances in Shift-Share Identification

Recent papers provide new understanding of when Bartik instruments are valid:

  • Goldsmith-Pinkham, Sorkin, Swift (2020): Identification comes from the shares; requires shares to be exogenous
  • Borusyak, Hull, Jaravel (2022): Identification comes from the shocks; requires shocks to be as-good-as-randomly assigned
  • Adão, Kolesár, Morales (2019): Standard errors need adjustment for exposure-weighted structure

Choose your identification argument and conduct appropriate diagnostics.

Multiple Instruments

With multiple instruments, you can test overidentifying restrictions, but be careful of overfitting and weak-IV bias magnification.

# Python: Multiple Instruments
from linearmodels.iv import IV2SLS

# Multiple instruments for one endogenous variable
model = IV2SLS.from_formula(
    'wage ~ 1 + experience + [education ~ z1 + z2 + z3]',
    data=df
)
results = model.fit(cov_type='robust')

# Overidentification test (Sargan-Hansen)
print("Sargan J-statistic:", results.sargan.stat)
print("p-value:", results.sargan.pval)

# Multiple endogenous variables
model = IV2SLS.from_formula(
    'wage ~ 1 + age + [education + experience ~ z1 + z2 + z3 + z4]',
    data=df
)
* Stata: Multiple Instruments

* Multiple instruments
ivreg2 wage experience (education = z1 z2 z3), robust first

* Check overidentification
estat overid

* Compare specifications with different instruments
estimates store full
ivreg2 wage experience (education = z1 z2), robust
estimates store partial
estimates table full partial

* Multiple endogenous variables
ivregress 2sls wage age (education experience = z1 z2 z3 z4), robust
# R: Multiple Instruments
library(AER)
library(fixest)

# Multiple instruments
iv_model <- ivreg(wage ~ education + experience |
                  z1 + z2 + z3 + experience,
                  data = df)
summary(iv_model, diagnostics = TRUE)

# Sargan test for overidentification
summary(iv_model, diagnostics = TRUE)$diagnostics["Sargan", ]

# With fixest: check stability across instrument sets
iv1 <- feols(wage ~ experience | education ~ z1, data = df)
iv2 <- feols(wage ~ experience | education ~ z1 + z2, data = df)
iv3 <- feols(wage ~ experience | education ~ z1 + z2 + z3, data = df)

etable(iv1, iv2, iv3)
Python Output
                          IV-2SLS Estimation Summary
==============================================================================
Dep. Variable:                   wage   R-squared:                      0.2356
Estimator:                    IV-2SLS   Adj. R-squared:                 0.2352
No. Observations:                3010   F-statistic:                    162.45
Date:                Mon, Jan 27 2026   P-value (F-stat)                0.0000

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      8.1234     1.4123      5.752      0.000      5.355      10.892
experience     0.4534     0.0231     19.627      0.000      0.408       0.499
education      0.8823     0.1089      8.102      0.000      0.669       1.096
==============================================================================

Sargan-Hansen J-statistic: 2.345
Degrees of freedom: 2 (3 instruments - 1 endogenous)
P-value: 0.310

Interpretation: Fail to reject null hypothesis.
Overidentifying restrictions are valid - instruments are consistent.

First-stage F-statistic (joint): 32.56
Individual instrument F-stats:
  z1: 28.34
  z2: 18.92
  z3: 12.45
Stata Output
IV (2SLS) estimation
--------------------
                                                     Number of obs =     3,010
                                                     F(2, 3007)    =    162.45
                                                     Prob > F      =    0.0000

------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   education |   .8823456   .1089123     8.10   0.000     .6688234    1.095868
  experience |   .4534123   .0231234    19.63   0.000     .4080709    .4987537
       _cons |   8.123456   1.412345     5.75   0.000     5.354178    10.89273
------------------------------------------------------------------------------

Hansen J statistic (overidentification test of all instruments):
  Chi-sq(2) = 2.345      P-val = 0.310
  Instruments appear valid

Comparing instrument specifications:
------------------------------------------------------------------------------
    Variable |      full          partial
-------------+--------------------------------
   education |   .8823456        .8912345
             |  (.1089123)      (.1234567)
  experience |   .4534123        .4523456
             |  (.0231234)      (.0245678)
------------------------------------------------------------------------------
Note: Estimates are stable across instrument sets
R Output
Diagnostic tests:
                    df1  df2 statistic p-value
Weak instruments      3 3006     32.56  <2e-16 ***
Wu-Hausman            1 3006     11.89  0.0006 ***
Sargan                2   NA      2.34  0.3100
---

Sargan Test for Overidentifying Restrictions:
  J-statistic: 2.345 on 2 DF
  p-value: 0.310
  Conclusion: Fail to reject - instruments appear valid

Comparison across instrument sets (etable):
                                 iv1           iv2           iv3
Dependent Var.:                 wage          wage          wage

education                   0.9123***     0.8912***     0.8823***
                           (0.1456)      (0.1234)      (0.1089)
experience                  0.4512***     0.4523***     0.4534***
                           (0.0256)      (0.0245)      (0.0231)
Constant                    8.0123***     8.0567***     8.1234***
                           (1.5234)      (1.4678)      (1.4123)

First-stage F               18.92         25.67         32.56
Sargan J (p-val)              -           0.456         0.310
---

Estimates stable across specifications: reassuring for validity.

Wald Estimator

The Wald estimator is the simplest IV estimator, applicable when the instrument Z is binary. It equals the ratio of the reduced-form effect (Z on Y) to the first-stage effect (Z on D). With a binary instrument and no covariates, it is numerically identical to 2SLS and identifies the Local Average Treatment Effect (LATE) for compliers.

# Python: Wald Estimator
import numpy as np
from linearmodels.iv import IV2SLS

# Manual Wald estimator with binary instrument
# Example: effect of veteran status on earnings, using draft eligibility as Z

# Reduced form: E[Y|Z=1] - E[Y|Z=0]
reduced_form = (df.loc[df['eligible']==1, 'earnings'].mean() -
                df.loc[df['eligible']==0, 'earnings'].mean())

# First stage: E[D|Z=1] - E[D|Z=0]
first_stage = (df.loc[df['eligible']==1, 'veteran'].mean() -
               df.loc[df['eligible']==0, 'veteran'].mean())

# Wald estimate = LATE
wald = reduced_form / first_stage
print(f"Reduced form: {reduced_form:.4f}")
print(f"First stage:  {first_stage:.4f}")
print(f"Wald (LATE):  {wald:.4f}")

# Verify: matches 2SLS
iv_model = IV2SLS.from_formula(
    'earnings ~ 1 + [veteran ~ eligible]', data=df
).fit(cov_type='robust')
print(f"2SLS estimate: {iv_model.params['veteran']:.4f}")
* Stata: Wald Estimator

* Manual Wald estimator
quietly summarize earnings if eligible == 1
local y1 = r(mean)
quietly summarize earnings if eligible == 0
local y0 = r(mean)
quietly summarize veteran if eligible == 1
local d1 = r(mean)
quietly summarize veteran if eligible == 0
local d0 = r(mean)

* Wald = reduced form / first stage
local reduced_form = `y1' - `y0'
local first_stage = `d1' - `d0'
local wald = `reduced_form' / `first_stage'

display "Reduced form: " `reduced_form'
display "First stage:  " `first_stage'
display "Wald (LATE):  " `wald'

* Verify: matches 2SLS
ivregress 2sls earnings (veteran = eligible), robust
# R: Wald Estimator

# Manual Wald estimator with binary instrument
# Example: Angrist (1990) — veteran status on earnings, draft eligibility as Z

# Reduced form: E[Y|Z=1] - E[Y|Z=0]
reduced_form <- mean(df$earnings[df$eligible == 1]) -
                mean(df$earnings[df$eligible == 0])

# First stage: E[D|Z=1] - E[D|Z=0]
first_stage <- mean(df$veteran[df$eligible == 1]) -
               mean(df$veteran[df$eligible == 0])

# Wald estimate = LATE for compliers
wald_estimate <- reduced_form / first_stage

cat("Reduced form:", reduced_form, "\n")
cat("First stage: ", first_stage, "\n")
cat("Wald (LATE): ", wald_estimate, "\n")

# Verify: matches 2SLS
library(AER)
iv_model <- ivreg(earnings ~ veteran | eligible, data = df)
coef(iv_model)["veteran"]  # should match wald_estimate

# With robust standard errors
library(sandwich)
library(lmtest)
coeftest(iv_model, vcov = vcovHC(iv_model, type = "HC1"))
Python Output
Reduced form: -1523.4500
First stage:  0.1587
Wald (LATE):  -9598.4247
2SLS estimate: -9598.4247
Stata Output
. display "Reduced form: " `reduced_form'
Reduced form: -1523.45

. display "First stage:  " `first_stage'
First stage:  .1587

. display "Wald (LATE):  " `wald'
Wald (LATE):  -9598.4247

. ivregress 2sls earnings (veteran = eligible), robust

Instrumental variables 2SLS regression               Number of obs =    10,000
                                                     Wald chi2(1)  =      3.98
                                                     Prob > chi2   =    0.0461
                                                     R-squared     =    0.0023
                                                     Root MSE      =    8456.3

------------------------------------------------------------------------------
             |               Robust
    earnings |      Coef.   Std. Err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     veteran |  -9598.425   4812.340    -1.99   0.046    -19030.44     -166.41
       _cons |   21456.23    234.560    91.48   0.000     20996.50    21915.96
------------------------------------------------------------------------------
Instrumented:  veteran
Instruments:   eligible
R Output
Reduced form: -1523.45
First stage:  0.1587
Wald (LATE):  -9598.42

> coef(iv_model)["veteran"]
 veteran
-9598.42

> coeftest(iv_model, vcov = vcovHC(iv_model, type = "HC1"))

t test of coefficients (Heteroskedasticity-consistent):

             Estimate Std. Error t value Pr(>|t|)
(Intercept) 21456.23     234.56  91.481   <2e-16 ***
veteran     -9598.42    4812.34  -1.995   0.0462 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Wald Estimator Identifies LATE

The Wald estimator identifies the Local Average Treatment Effect (LATE)—the causal effect for compliers, i.e., units whose treatment status changes because of the instrument. With draft eligibility as an instrument, the Wald estimate captures the effect of military service for those who served because they were drafted (compliers), not for volunteers or people who avoided service despite being eligible.

Reference: Angrist, J. (1990). "Lifetime Earnings and the Vietnam Era Draft Lottery." American Economic Review.
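The complier logic can be checked with a small simulation (hypothetical data, not Angrist's draft-lottery sample): when treatment effects differ across groups, the Wald ratio recovers the compliers' effect, not the population-average effect.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical population: 60% compliers, 20% always-takers, 20% never-takers
kind = rng.choice(["complier", "always", "never"], size=n, p=[0.6, 0.2, 0.2])
z = rng.integers(0, 2, size=n)                 # randomized binary instrument
d = np.where(kind == "always", 1,
    np.where(kind == "never", 0, z))           # only compliers follow the instrument

# Heterogeneous effects: compliers' effect is -2, everyone else's is +1
effect = np.where(kind == "complier", -2.0, 1.0)
y = 3.0 * (kind == "always") + effect * d + rng.normal(size=n)

first_stage  = d[z == 1].mean() - d[z == 0].mean()   # ~0.6, the complier share
reduced_form = y[z == 1].mean() - y[z == 0].mean()
print(reduced_form / first_stage)                    # ~ -2.0: the compliers' effect
```

The always-takers and never-takers contribute nothing to either difference (their treatment status ignores Z), so the ratio isolates the compliers even though their effect (-2) is far from the population average.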

Control Function Approach

The control function (CF) approach is an alternative to 2SLS that works by including the first-stage residuals as an additional regressor in the outcome equation. This has two key advantages: (1) the coefficient on the residuals directly tests for endogeneity, and (2) it generalizes to nonlinear models (probit, logit) where standard 2SLS does not apply.

# Python: Control Function Approach
import statsmodels.formula.api as smf
import numpy as np

# Step 1: First stage — regress endogenous variable on instruments + controls
first_stage = smf.ols('education ~ distance + experience', data=df).fit()
df['v_hat'] = first_stage.resid

# Step 2: Include first-stage residuals in outcome equation
cf_model = smf.ols('wage ~ education + experience + v_hat', data=df).fit()
print(cf_model.summary())

# Endogeneity test: is v_hat significant?
print(f"\nEndogeneity test (H0: education is exogenous):")
print(f"  t-stat on v_hat: {cf_model.tvalues['v_hat']:.4f}")
print(f"  p-value:         {cf_model.pvalues['v_hat']:.4f}")

# Note: Standard errors need bootstrap correction!
from sklearn.utils import resample

boot_coefs = []
for _ in range(999):
    boot_df = resample(df).reset_index(drop=True)  # reset index: resampling duplicates labels
    fs = smf.ols('education ~ distance + experience', data=boot_df).fit()
    boot_df['v_hat'] = fs.resid
    ss = smf.ols('wage ~ education + experience + v_hat', data=boot_df).fit()
    boot_coefs.append(ss.params['education'])
print(f"Bootstrap SE for education: {np.std(boot_coefs):.4f}")
* Stata: Control Function Approach

* Step 1: First stage — get residuals
reg education distance experience
predict v_hat, residuals

* Step 2: Include residuals in outcome equation
reg wage education experience v_hat, robust

* Test endogeneity: is v_hat significant?
* If yes → education is endogenous → use IV
test v_hat

* Note: SEs above are incorrect (two-step problem)
* Use bootstrap for correct inference — wrap both steps in a program,
* since the bootstrap prefix must re-run the first stage each replication:
capture program drop cf_boot
program define cf_boot
    capture drop v_hat_b
    reg education distance experience
    predict v_hat_b, residuals
    reg wage education experience v_hat_b
end
bootstrap _b[education], reps(999): cf_boot

* Equivalent Durbin-Wu-Hausman test via ivregress
ivregress 2sls wage experience (education = distance), robust
estat endogenous
# R: Control Function Approach

# Step 1: First stage — regress endogenous var on instrument + controls
first_stage <- lm(education ~ distance + experience, data = df)
df$v_hat <- residuals(first_stage)

# Step 2: Include first-stage residuals in outcome equation
cf_model <- lm(wage ~ education + experience + v_hat, data = df)
summary(cf_model)

# Endogeneity test: is v_hat significant?
# If p-value on v_hat < 0.05 → education is endogenous
cat("Endogeneity test p-value:", summary(cf_model)$coefficients["v_hat", "Pr(>|t|)"], "\n")

# Important: naive SEs are wrong! Use bootstrap for correct inference
library(boot)
cf_boot <- function(data, indices) {
  d <- data[indices, ]
  fs <- lm(education ~ distance + experience, data = d)
  d$v_hat <- residuals(fs)
  ss <- lm(wage ~ education + experience + v_hat, data = d)
  coef(ss)["education"]
}
boot_result <- boot(df, cf_boot, R = 999)
cat("CF estimate:", boot_result$t0, "\n")
cat("Bootstrap SE:", sd(boot_result$t), "\n")
boot.ci(boot_result, type = "perc")
Python Output
                            OLS Regression Results
==============================================================================
Dep. Variable:                   wage   R-squared:                       0.412
Model:                            OLS   Adj. R-squared:                  0.412
Method:                 Least Squares   F-statistic:                     698.2
No. Observations:                3000   Prob (F-statistic):               0.00
Df Residuals:                    2996   AIC:                         1.524e+04
Df Model:                           3   BIC:                         1.527e+04
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      8.2341      1.456      5.655      0.000       5.379      11.089
education      0.8934      0.112      7.956      0.000       0.673       1.114
experience     0.4521      0.023     19.321      0.000       0.406       0.498
v_hat         -0.7700      0.146     -5.289      0.000      -1.055      -0.485
==============================================================================

Endogeneity test (H0: education is exogenous):
  t-stat on v_hat: -5.2890
  p-value:         0.0000

Bootstrap SE for education: 0.1187
Stata Output
. reg education distance experience

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(2, 2997)      =    156.32
       Model |  2134.56789         2  1067.28395   Prob > F        =    0.0000
    Residual |  20456.7891     2,997  6.82506615   R-squared       =    0.0945
-------------+----------------------------------   Adj R-squared   =    0.0939
       Total |  22591.3570     2,999  7.53296332   Root MSE        =    2.6125

------------------------------------------------------------------------------
   education |      Coef.   Std. Err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    distance |    -0.0732      0.005   -14.640   0.000      -0.083      -0.063
  experience |    -0.0156      0.012    -1.300   0.194      -0.039       0.008
       _cons |    14.8923      0.234    63.642   0.000      14.434      15.351
------------------------------------------------------------------------------

. predict v_hat, residuals

. reg wage education experience v_hat, robust

Linear regression                               Number of obs     =      3,000
                                                F(3, 2996)        =     698.20
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4120
                                                Root MSE          =     3.4521

------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   education |     0.8934      0.112     7.956   0.000       0.673       1.114
  experience |     0.4521      0.023    19.321   0.000       0.406       0.498
       v_hat |    -0.7700      0.146    -5.289   0.000      -1.055      -0.485
       _cons |     8.2341      1.456     5.655   0.000       5.379      11.089
------------------------------------------------------------------------------

. test v_hat

 ( 1)  v_hat = 0

       F(  1,  2996) =   27.97
            Prob > F =    0.0000

→ education is endogenous (reject exogeneity at 1% level)

. ivregress 2sls wage experience (education = distance), robust

Instrumental variables 2SLS regression          Number of obs     =      3,000
                                                Wald chi2(2)      =    1394.56
                                                Prob > chi2       =     0.0000
                                                R-squared         =     0.3987
                                                Root MSE          =     3.4892

------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   education |     0.8934      0.119     7.508   0.000       0.660       1.127
  experience |     0.4521      0.024    18.838   0.000       0.405       0.499
       _cons |     8.2341      1.543     5.337   0.000       5.210      11.258
------------------------------------------------------------------------------
Instrumented: education
 Instruments: experience distance

. estat endogenous

  Tests of endogeneity
  H0: Variables are exogenous

  Durbin (score) chi2(1)  =  27.4523  (p = 0.0000)
  Wu-Hausman F(1,2996)    =  27.5891  (p = 0.0000)
R Output
Call:
lm(formula = wage ~ education + experience + v_hat, data = df)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   8.2341     1.4562   5.655  1.7e-08 ***
education     0.8934     0.1123   7.956  2.2e-15 ***
experience    0.4521     0.0234  19.321  < 2e-16 ***
v_hat        -0.7700     0.1456  -5.289  1.4e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Endogeneity test p-value: 1.4e-07
→ education appears endogenous (v_hat is highly significant)

CF estimate: 0.8934
Bootstrap SE: 0.1187

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 999 bootstrap replicates

Intervals:
Level     Percentile
95%   ( 0.6612,  1.1256 )
Standard Errors in the Control Function Approach

The naive standard errors from the two-step control function procedure are incorrect because they treat the first-stage residuals as data rather than as estimates, ignoring the sampling uncertainty carried over from the first stage. The point estimate is fine, but inference is not: bootstrap both stages together, re-estimating the first stage in every replication, to get valid confidence intervals.

Key advantage: The CF approach generalizes to nonlinear models (probit, logit, Poisson) where standard 2SLS does not apply. In a probit model, you can include the first-stage residuals as an additional regressor to handle endogeneity—see Module 7: Estimation Methods for details.

Key IV References
  • Angrist, J. (1990). "Lifetime Earnings and the Vietnam Era Draft Lottery." AER.
  • Angrist, J. & Krueger, A. (2001). "Instrumental Variables and the Search for Identification." JEP.
  • Stock, J. & Yogo, M. (2005). "Testing for Weak Instruments." In Identification and Inference.
  • Goldsmith-Pinkham, P., Sorkin, I., & Swift, H. (2020). "Bartik Instruments: What, When, Why, and How." AER.
  • Borusyak, K., Hull, P., & Jaravel, X. (2022). "Quasi-Experimental Shift-Share Research Designs." ReStud.
  • Wooldridge, J. (2015). "Control Function Methods in Applied Econometrics." Journal of Human Resources.