6D Instrumental Variables
Two-Stage Least Squares
IV estimation uses an instrument Z that shifts treatment D but does not affect the outcome Y directly. Two assumptions identify the effect: relevance (Z predicts D, testable via the first-stage F-statistic) and the exclusion restriction (Z affects Y only through D, which is untestable in just-identified models).
# Python: 2SLS with linearmodels
from linearmodels.iv import IV2SLS
import pandas as pd
# Example: effect of education on wages
# Instrument: distance to college
# 2SLS estimation
model = IV2SLS.from_formula(
'wage ~ 1 + experience + [education ~ distance_to_college]',
data=df
)
results = model.fit(cov_type='robust')
print(results.summary)
# With multiple controls
model = IV2SLS.from_formula(
'wage ~ 1 + experience + age + [education ~ distance_to_college]',
data=df
)
# Manual 2SLS (for understanding)
import statsmodels.api as sm
# First stage: regress endogenous on instrument
first_stage = sm.OLS(df['education'],
sm.add_constant(df[['distance_to_college', 'experience']])).fit()
df['education_hat'] = first_stage.fittedvalues
# Second stage: regress outcome on predicted endogenous
second_stage = sm.OLS(df['wage'],
sm.add_constant(df[['education_hat', 'experience']])).fit()
# Note: SEs from manual 2SLS are wrong! Use IV2SLS for correct SEs.
* Stata: 2SLS with ivregress
* Basic 2SLS
ivregress 2sls wage experience (education = distance_to_college), robust
* With first-stage results
ivregress 2sls wage experience (education = distance_to_college), first robust
* Using ivreg2 (more diagnostics)
* ssc install ivreg2
ivreg2 wage experience (education = distance_to_college), robust first
* With clustering
ivregress 2sls wage experience (education = distance_to_college), ///
vce(cluster state)
# R: 2SLS with ivreg (AER package)
library(AER)
library(sandwich)
# 2SLS estimation
iv_model <- ivreg(wage ~ education + experience |
distance_to_college + experience,
data = df)
summary(iv_model, vcov = vcovHC, diagnostics = TRUE)
# Using fixest (faster, more flexible)
library(fixest)
iv_model <- feols(wage ~ experience | education ~ distance_to_college,
data = df,
vcov = "hetero")
summary(iv_model, stage = 1:2) # show both stages
IV-2SLS Estimation Summary
==============================================================================
Dep. Variable: wage R-squared: 0.2341
Estimator: IV-2SLS Adj. R-squared: 0.2338
No. Observations: 3010 F-statistic: 156.34
Date: Mon, Jan 27 2026 P-value (F-stat) 0.0000
Time: 14:23:45 Distribution: chi2(2)
Cov. Estimator: robust
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
Intercept 8.2341 1.4562 5.655 0.000 5.380 11.088
experience 0.4521 0.0234 19.321 0.000 0.406 0.498
education 0.8934 0.1123 7.956 0.000 0.673 1.114
==============================================================================
Endogenous: education
Instruments: distance_to_college
Robust Covariance (Heteroskedastic)
Debiased: False
Instrumental variables 2SLS regression          Number of obs = 3,010
Wald chi2(2) = 312.68
Prob > chi2 = 0.0000
R-squared = 0.2341
Root MSE = 8.2145
------------------------------------------------------------------------------
| Robust
wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
education | .893412 .1123456 7.96 0.000 .6732206 1.113604
experience | .452134 .0234123 19.32 0.000 .406247 .4980209
_cons | 8.234156 1.456234 5.66 0.000 5.379959 11.08835
------------------------------------------------------------------------------
Instrumented: education
Instruments: experience distance_to_college
First-stage regression summary statistics
------------------------------------------------------------------------------
Variable | R-sq. Adj. R-sq. Robust F(1,3007) Prob > F
-------------+----------------------------------------------------------------
education | 0.0892 0.0889 42.34 0.0000
------------------------------------------------------------------------------
Call:
ivreg(formula = wage ~ education + experience | distance_to_college +
experience, data = df)
Residuals:
Min 1Q Median 3Q Max
-28.4521 -5.2341 -0.1234 5.1234 32.5678
Coefficients:
Estimate Std. Err. t value Pr(>|t|)
(Intercept) 8.2341 1.4562 5.66 1.7e-08 ***
education 0.8934 0.1123 7.96 2.2e-15 ***
experience 0.4521 0.0234 19.32 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.214 on 3007 degrees of freedom
Multiple R-Squared: 0.2341, Adjusted R-squared: 0.2338
Wald test: 156.3 on 2 and 3007 DF, p-value: < 2.2e-16
Diagnostic tests:
df1 df2 statistic p-value
Weak instruments 1 3007 42.34 <2e-16 ***
---
IV Diagnostics
Always report: (1) first-stage F-statistic, (2) overidentification test (with multiple instruments), (3) comparison of OLS and IV estimates.
# Python: IV Diagnostics
from linearmodels.iv import IV2SLS
import statsmodels.api as sm
# Estimate IV model
model = IV2SLS.from_formula(
'wage ~ 1 + experience + [education ~ z1 + z2]',
data=df
)
results = model.fit(cov_type='robust')
# 1. First-stage F-statistic on the excluded instruments only
#    (first_stage.fvalue would test all regressors jointly, including controls)
first_stage = sm.OLS(df['education'],
    sm.add_constant(df[['z1', 'z2', 'experience']])).fit()
print("First-stage F:", first_stage.f_test('z1 = 0, z2 = 0'))
# linearmodels also reports first-stage diagnostics via results.first_stage
# 2. Durbin-Wu-Hausman test (endogeneity test)
print("Wu-Hausman test:", results.wu_hausman())
# 3. Sargan-Hansen overidentification test
print("Sargan test:", results.sargan)
* Stata: IV Diagnostics
* Estimate with ivreg2 for full diagnostics
ivreg2 wage experience (education = z1 z2), robust first
* Key diagnostics reported automatically:
* - Kleibergen-Paap F statistic (weak ID)
* - Hansen J statistic (overidentification)
* - Endogeneity test
* Post-estimation tests with ivregress
ivregress 2sls wage experience (education = z1 z2), robust
* First-stage F
estat firststage
* Endogeneity test
estat endogenous
* Overidentification test (requires >1 instrument per endogenous)
estat overid
# R: IV Diagnostics
library(AER)
# Estimate with diagnostics
iv_model <- ivreg(wage ~ education + experience |
z1 + z2 + experience,
data = df)
# Full diagnostics
summary(iv_model, diagnostics = TRUE)
# Components:
# - Weak instruments: F-stat on excluded instruments
# - Wu-Hausman: endogeneity test (OLS consistent?)
# - Sargan: overidentification test
# With fixest
library(fixest)
iv_model <- feols(wage ~ experience | education ~ z1 + z2,
data = df)
fitstat(iv_model, ~ ivf + ivwald + sargan)
First-stage F: 38.42 (F > 10 indicates strong instruments - Stock & Yogo rule of thumb)
Wu-Hausman test:
  Statistic: 12.456
  P-value: 0.0004
  Conclusion: Reject null of exogeneity - OLS is inconsistent
Sargan test (overidentification):
  J-statistic: 1.234
  P-value: 0.267
  Degrees of freedom: 1
  Conclusion: Fail to reject - instruments appear valid
Summary of IV Diagnostics:
--------------------------
First-stage F-statistic: 38.42 (> 10, strong instruments)
Wu-Hausman (endogeneity): p = 0.0004 (education is endogenous)
Sargan (overid): p = 0.267 (instruments appear valid)
First-stage regression summary statistics
------------------------------------------------------------------------------
Adjusted Partial
Variable | R-sq. R-sq. R-sq. F(2,3006) Prob > F
-------------+----------------------------------------------------------------
education | 0.1234 0.1228 0.0892 38.42 0.0000
------------------------------------------------------------------------------
Stock-Yogo weak ID test critical values for single endogenous regressor:
10% 15% 20% 25%
2SLS relative bias 19.93 11.59 8.75 7.25
2SLS Size of nominal 5% test 22.30 12.83 9.54 7.80
Kleibergen-Paap rk Wald F statistic: 38.42
Endogeneity test of endogenous regressors:
Chi-sq(1) = 12.456 P-val = 0.0004
Reject H0: education is exogenous
Hansen J statistic (overidentification test):
Chi-sq(1) = 1.234 P-val = 0.267
Fail to reject H0: instruments are valid
Diagnostic tests:
df1 df2 statistic p-value
Weak instruments 2 3006 38.42 <2e-16 ***
Wu-Hausman 1 3006 12.46 0.0004 ***
Sargan 1 NA 1.23 0.2670
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation:
- Weak instruments test: F = 38.42 >> 10, instruments are strong
- Wu-Hausman test: p = 0.0004, reject exogeneity (education is endogenous)
- Sargan test: p = 0.267, fail to reject (overidentifying restrictions valid)
First-stage results (using fixest):
Estimate Std. Error t value Pr(>|t|)
z1 0.4523 0.0734 6.162 <2e-16 ***
z2 0.3156 0.0812 3.887 0.0001 ***
experience 0.0234 0.0089 2.629 0.0086 **
First-stage F-stat (joint test of instruments): 38.42
Weak Instruments
Weak instruments (low first-stage F) cause biased IV estimates and unreliable inference. Use weak-instrument robust methods when F < 10.
# Python: Weak Instrument Robust Inference
from linearmodels.iv import IV2SLS
import numpy as np
# Anderson-Rubin confidence interval (weak-IV robust)
import statsmodels.api as sm
from scipy import stats

def anderson_rubin_ci(y, d, z, x, alpha=0.05):
    """Anderson-Rubin CI via grid search over candidate beta values."""
    betas = np.linspace(-5, 5, 1000)
    k_z = z.shape[1]
    exog = sm.add_constant(np.column_stack([z, x]))
    ar_stats = []
    for beta in betas:
        resid = y - beta * d
        model = sm.OLS(resid, exog).fit()
        # F-test that the coefficients on the instruments (columns 1..k_z)
        # are jointly zero -- NOT the overall regression F
        restriction = np.zeros((k_z, exog.shape[1]))
        restriction[:, 1:1 + k_z] = np.eye(k_z)
        ar_stats.append(float(model.f_test(restriction).fvalue))
    # CI: all betas where the AR test fails to reject
    critical = stats.f.ppf(1 - alpha, dfn=k_z, dfd=len(y) - exog.shape[1])
    accepted = betas[np.array(ar_stats) < critical]
    if accepted.size == 0:
        return None  # empty confidence set is possible with AR
    return accepted.min(), accepted.max()
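Why weak instruments are dangerous can be seen in a small Monte Carlo (assumed data-generating process, illustrative only): with a weak first stage, the 2SLS estimate concentrates near the biased OLS probability limit rather than the true effect.

```python
# Sketch: median 2SLS across simulated draws, strong vs weak first stage.
import numpy as np

rng = np.random.default_rng(42)

def tsls_estimate(n, pi):
    """One draw: true effect is 1.0; pi controls instrument strength."""
    u = rng.normal(size=n)
    z = rng.normal(size=n)
    d = pi * z + u + rng.normal(size=n)
    y = 1.0 * d + 2.0 * u + rng.normal(size=n)
    # With one instrument, 2SLS equals the Wald ratio cov(z,y)/cov(z,d)
    return np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

strong = np.median([tsls_estimate(500, pi=1.0) for _ in range(500)])
weak = np.median([tsls_estimate(500, pi=0.05) for _ in range(500)])
print(f"Median 2SLS, strong instrument: {strong:.2f}")  # near the true 1.0
print(f"Median 2SLS, weak instrument:   {weak:.2f}")    # pulled toward OLS
```

In this DGP the OLS probability limit is about 2.0, so the weak-instrument median lands far from the true effect of 1.0.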
* Stata: Weak Instrument Robust Inference
* Limited Information Maximum Likelihood (LIML)
ivregress liml wage experience (education = z1 z2), robust
* Weak-IV robust confidence intervals with weakiv
* ssc install weakiv
weakiv ivregress 2sls wage experience (education = z1 z2)
* Check Stock-Yogo critical values
ivreg2 wage experience (education = z1 z2), first
* Compare Kleibergen-Paap F to Stock-Yogo critical values
* Continuously updating GMM (CUE) - also weak-IV robust
ivreg2 wage experience (education = z1 z2), cue robust
# R: Weak Instrument Robust Inference
library(AER)
library(ivmodel)
# Check first-stage F
iv_model <- ivreg(wage ~ education + experience |
z1 + z2 + experience,
data = df)
summary(iv_model, diagnostics = TRUE)
# Weak-IV robust inference with ivmodel
iv_robust <- ivmodel(Y = df$wage,
D = df$education,
Z = cbind(df$z1, df$z2),
X = df$experience)
# Anderson-Rubin confidence interval
AR.test(iv_robust)
# LIML estimator
LIML(iv_robust)
Anderson-Rubin Confidence Interval (Weak-IV Robust)
===================================================
Testing H0: beta = beta_0 for grid of beta values...
AR 95% Confidence Interval: [0.612, 1.234]
Comparison of Methods:
Estimate 95% CI
2SLS 0.893 [0.673, 1.114]
Anderson-Rubin - [0.612, 1.234]
LIML 0.897 [0.665, 1.129]
Note: AR CI is wider but valid even with weak instruments.
When first-stage F > 10, 2SLS and AR CIs are similar.
Grid search details:
Beta range searched: [-5, 5]
Grid points: 1000
Critical value (F, df1=2, df2=3006): 3.00
Betas not rejected: 623 points
LIML estimation
-------------------------------------------------------------------------------
| Robust
wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+-----------------------------------------------------------------
education | .897234 .1156789 7.76 0.000 .6704994 1.123969
experience | .451234 .0234567 19.23 0.000 .4052593 .4972087
_cons | 8.156234 1.478912 5.52 0.000 5.257419 11.05505
-------------------------------------------------------------------------------
Weak-instrument-robust inference (weakiv):
Tests of H0: beta=b0
95% conf. set for beta
Test statistic p-value (robust to weak IV)
Anderson-Rubin chi2(2)=23.45 0.0000 [0.612, 1.234]
Wald chi2(1)=60.12 0.0000 [0.673, 1.114]
Conditional Likelihood Ratio (CLR) test:
CLR statistic: 23.12 p-value: 0.0000
95% CLR CI: [0.628, 1.221]
Stock-Yogo critical values (2SLS, 5% maximal bias):
# instruments = 2: critical F = 19.93
Actual F = 38.42 > 19.93: instruments are NOT weak
Anderson-Rubin Test
-------------------
AR Statistic: 23.45 on 2 and 3006 DF
p-value: < 2.2e-16
95% Anderson-Rubin Confidence Set: [0.612, 1.234]
LIML Estimator
--------------
Estimate Std. Error t value Pr(>|t|)
education 0.8972 0.1157 7.757 < 2e-16 ***
Comparison of Estimators:
Estimate SE 95% CI
2SLS 0.8934 0.1123 [0.673, 1.114]
LIML 0.8972 0.1157 [0.670, 1.124]
Fuller(1) 0.8956 0.1142 [0.672, 1.119]
Note: LIML is median-unbiased with weak instruments,
while 2SLS bias is towards OLS.
Weak IV diagnostics:
First-stage F: 38.42
Stock-Yogo 10% maximal size: 19.93
Conclusion: F >> critical value, instruments are strong
Bartik/Shift-Share Instruments
Shift-share (Bartik) instruments interact national/sectoral shocks with local industry shares. They're widely used in labor and trade economics. Recent papers clarify when they're valid.
# Python: Constructing a Bartik Instrument
import pandas as pd
import numpy as np
# Components:
# - shares_{l,k}: share of industry k in location l (pre-period)
# - growth_k: national growth rate in industry k
# Step 1: Calculate industry shares by location (base period)
base_year = df[df['year'] == 2000]
shares = base_year.groupby(['location', 'industry'])['employment'].sum()
total_emp = base_year.groupby('location')['employment'].sum()
shares = shares.div(total_emp, level='location')  # shares_{l,k}
# Step 2: Leave-one-out national growth by industry (exclude each location's
# own change so the instrument is not mechanically driven by local shocks)
ind_totals = df.groupby('industry')[['emp_change', 'base_emp']].transform('sum')
df['loo_growth'] = ((ind_totals['emp_change'] - df['emp_change'])
                    / (ind_totals['base_emp'] - df['base_emp']))
# Step 3: Construct Bartik instrument: B_l = sum_k share_{l,k} * growth_{-l,k}
df = df.merge(shares.rename('share').reset_index(), on=['location', 'industry'])
df['bartik_component'] = df['share'] * df['loo_growth']
bartik = df.groupby('location')['bartik_component'].sum()
df = df.merge(bartik.rename('bartik').reset_index(), on='location')
# Use in IV regression
from linearmodels.iv import IV2SLS
model = IV2SLS.from_formula(
'wage ~ 1 + controls + [local_emp_growth ~ bartik]',
data=df
)
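As a sanity check, the formula B_l = sum_k share_{l,k} * growth_k can be verified against a hand calculation on made-up numbers (toy data, not from the text):

```python
# Toy worked example of the Bartik formula with two locations, two industries.
import pandas as pd

shares = pd.DataFrame({
    'location': ['A', 'A', 'B', 'B'],
    'industry': ['mfg', 'svc', 'mfg', 'svc'],
    'share':    [0.6, 0.4, 0.2, 0.8],
})
growth = pd.Series({'mfg': -0.02, 'svc': 0.05})

bartik = (shares
          .assign(growth=shares['industry'].map(growth))
          .assign(component=lambda d: d['share'] * d['growth'])
          .groupby('location')['component'].sum())
print(bartik)
# A: 0.6*(-0.02) + 0.4*0.05 = 0.008
# B: 0.2*(-0.02) + 0.8*0.05 = 0.036
```

Location A, tilted toward shrinking manufacturing, gets a smaller predicted shock than service-heavy location B, which is exactly the cross-sectional variation the instrument exploits.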
* Stata: Bartik Instrument
* Assume data has: location, industry, employment, year
* Step 1: Calculate base-year shares
preserve
keep if year == 2000
bysort location: egen total_emp = total(employment)
gen share = employment / total_emp
keep location industry share
tempfile shares
save `shares'
restore
* Step 2: Calculate leave-one-out national growth
bysort industry: egen nat_growth = total(emp_change)
bysort industry: egen nat_base = total(base_emp)
gen loo_growth = (nat_growth - emp_change) / (nat_base - base_emp)
* Step 3: Merge shares and compute Bartik
merge m:1 location industry using `shares'
gen bartik_component = share * loo_growth
bysort location: egen bartik = total(bartik_component)
* Step 4: Use in IV regression
ivregress 2sls wage controls (local_emp_growth = bartik), cluster(location)
* Using ssaggregate (Borusyak, Hull, Jaravel)
* ssc install ssaggregate
ssaggregate outcome treatment, ///
shares(share) shocks(national_growth) controls(x1 x2) ///
cluster(location)
# R: Bartik Instrument
library(dplyr)
library(fixest)
# Step 1: Calculate base-year shares
shares <- df %>%
filter(year == 2000) %>%
group_by(location, industry) %>%
summarise(emp = sum(employment)) %>%
group_by(location) %>%
mutate(share = emp / sum(emp)) %>%
select(location, industry, share)
# Step 2: Leave-one-out national growth
growth <- df %>%
group_by(industry, location) %>%
summarise(emp_change = sum(emp_change), base_emp = sum(base_emp)) %>%
group_by(industry) %>%
mutate(
loo_growth = (sum(emp_change) - emp_change) / (sum(base_emp) - base_emp)
)
# Step 3: Construct Bartik
bartik <- shares %>%
left_join(growth, by = c("location", "industry")) %>%
mutate(component = share * loo_growth) %>%
group_by(location) %>%
summarise(bartik = sum(component, na.rm = TRUE))
# Merge and estimate
df <- df %>% left_join(bartik, by = "location")
# IV regression
iv_model <- feols(wage ~ controls | local_emp_growth ~ bartik,
data = df,
cluster = ~ location)
Bartik Instrument Construction
==============================
Step 1: Industry shares (base year 2000)
Location Manufacturing Services Agriculture Finance
New York 0.234 0.456 0.012 0.298
California 0.189 0.512 0.089 0.210
Texas 0.312 0.378 0.156 0.154
...
Step 2: National industry growth (leave-one-out)
Industry Growth Rate
Manufacturing -0.0234
Services 0.0456
Agriculture -0.0123
Finance 0.0312
Step 3: Bartik instrument by location
Location Bartik
New York 0.0189
California 0.0234
Texas -0.0056
...
IV Regression Results (using Bartik as instrument):
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
local_emp 1.234 0.345 3.576 0.000 0.558 1.910
controls 0.456 0.123 3.707 0.000 0.215 0.697
------------------------------------------------------------------------------
First-stage F: 28.34
Clustered SEs at location level (n_clusters = 150)
Step 1: Base-year industry shares created
Observations: 3,200 (location x industry)
Locations: 150
Industries: 20
Step 2: Leave-one-out growth rates calculated
Mean growth: 0.0234
SD growth: 0.0456
Step 3: Bartik instrument created
Mean Bartik: 0.0145
SD Bartik: 0.0234
IV Regression with Bartik Instrument
(Std. Err. adjusted for 150 clusters in location)
------------------------------------------------------------------------------
| Robust
wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
local_emp_~h | 1.234567 .3456789 3.57 0.000 .5570356 1.912098
controls | .4561234 .1234567 3.71 0.000 .2142044 .6980424
_cons | 12.34567 2.345678 5.26 0.000 7.748211 16.94313
------------------------------------------------------------------------------
Instrumented: local_emp_growth
Instruments: controls bartik
First-stage F-statistic: 28.34
Kleibergen-Paap rk Wald F: 28.34
Bartik Instrument Construction
==============================
Industry shares (sample):
# A tibble: 3,000 x 3
  location   industry      share
1 New York   Manufacturing 0.234
2 New York   Services      0.456
3 California Manufacturing 0.189
...
Bartik instrument summary:
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.0456 -0.0123  0.0145  0.0178  0.0312  0.0567
IV Regression with Clustered Standard Errors
============================================
TSLS estimation, Pair(s): 1, Observations: 4,500
Cluster: location, Number of clusters: 150
                 Estimate Std. Error t value Pr(>|t|)
local_emp_growth   1.2346     0.3457   3.572 0.000489 ***
controls           0.4561     0.1235   3.707 0.000312 ***
First-stage statistics:
                 F-stat p-value
local_emp_growth  28.34 < 2e-16
Partial R2 of excluded instruments: 0.0892
Recent papers provide new understanding of when Bartik instruments are valid:
- Goldsmith-Pinkham, Sorkin, Swift (2020): Identification comes from the shares; requires shares to be exogenous
- Borusyak, Hull, Jaravel (2022): Identification comes from the shocks; requires shocks to be as-good-as-randomly assigned
- Adão, Kolesár, Morales (2019): Standard errors need adjustment for exposure-weighted structure
Choose your identification argument and conduct appropriate diagnostics.
Multiple Instruments
With multiple instruments, you can test overidentifying restrictions, but be careful of overfitting and weak-IV bias magnification.
# Python: Multiple Instruments
from linearmodels.iv import IV2SLS
# Multiple instruments for one endogenous variable
model = IV2SLS.from_formula(
'wage ~ 1 + experience + [education ~ z1 + z2 + z3]',
data=df
)
results = model.fit(cov_type='robust')
# Overidentification test (Sargan-Hansen)
print("Sargan J-statistic:", results.sargan.stat)
print("p-value:", results.sargan.pval)
# Multiple endogenous variables
model = IV2SLS.from_formula(
'wage ~ 1 + age + [education + experience ~ z1 + z2 + z3 + z4]',
data=df
)
* Stata: Multiple Instruments
* Multiple instruments
ivreg2 wage experience (education = z1 z2 z3), robust first
* Check overidentification
estat overid
* Compare specifications with different instruments
estimates store full
ivreg2 wage experience (education = z1 z2), robust
estimates store partial
estimates table full partial
* Multiple endogenous variables
ivregress 2sls wage age (education experience = z1 z2 z3 z4), robust
# R: Multiple Instruments
library(AER)
library(fixest)
# Multiple instruments
iv_model <- ivreg(wage ~ education + experience |
z1 + z2 + z3 + experience,
data = df)
summary(iv_model, diagnostics = TRUE)
# Sargan test for overidentification
summary(iv_model, diagnostics = TRUE)$diagnostics["Sargan", ]
# With fixest: check stability across instrument sets
iv1 <- feols(wage ~ experience | education ~ z1, data = df)
iv2 <- feols(wage ~ experience | education ~ z1 + z2, data = df)
iv3 <- feols(wage ~ experience | education ~ z1 + z2 + z3, data = df)
etable(iv1, iv2, iv3)
IV-2SLS Estimation Summary
==============================================================================
Dep. Variable: wage R-squared: 0.2356
Estimator: IV-2SLS Adj. R-squared: 0.2352
No. Observations: 3010 F-statistic: 162.45
Date: Mon, Jan 27 2026 P-value (F-stat) 0.0000
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
Intercept 8.1234 1.4123 5.752 0.000 5.355 10.892
experience 0.4534 0.0231 19.627 0.000 0.408 0.499
education 0.8823 0.1089 8.102 0.000 0.669 1.096
==============================================================================
Sargan-Hansen J-statistic: 2.345
Degrees of freedom: 2 (3 instruments - 1 endogenous)
P-value: 0.310
Interpretation: Fail to reject null hypothesis.
Overidentifying restrictions are valid - instruments are consistent.
First-stage F-statistic (joint): 32.56
Individual instrument F-stats:
z1: 28.34
z2: 18.92
z3: 12.45
IV (2SLS) estimation
--------------------
Number of obs = 3,010
F(2, 3007) = 162.45
Prob > F = 0.0000
------------------------------------------------------------------------------
| Robust
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
education | .8823456 .1089123 8.10 0.000 .6688234 1.095868
experience | .4534123 .0231234 19.63 0.000 .4080709 .4987537
_cons | 8.123456 1.412345 5.75 0.000 5.354178 10.89273
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):
Chi-sq(2) = 2.345 P-val = 0.310
Instruments appear valid
Comparing instrument specifications:
------------------------------------------------------------------------------
Variable | full partial
-------------+--------------------------------
education | .8823456 .8912345
| (.1089123) (.1234567)
experience | .4534123 .4523456
| (.0231234) (.0245678)
------------------------------------------------------------------------------
Note: Estimates are stable across instrument sets
Diagnostic tests:
df1 df2 statistic p-value
Weak instruments 3 3006 32.56 <2e-16 ***
Wu-Hausman 1 3006 11.89 0.0006 ***
Sargan 2 NA 2.34 0.3100
---
Sargan Test for Overidentifying Restrictions:
J-statistic: 2.345 on 2 DF
p-value: 0.310
Conclusion: Fail to reject - instruments appear valid
Comparison across instrument sets (etable):
iv1 iv2 iv3
Dependent Var.: wage wage wage
education 0.9123*** 0.8912*** 0.8823***
(0.1456) (0.1234) (0.1089)
experience 0.4512*** 0.4523*** 0.4534***
(0.0256) (0.0245) (0.0231)
Constant 8.0123*** 8.0567*** 8.1234***
(1.5234) (1.4678) (1.4123)
First-stage F 18.92 25.67 32.56
Sargan J (p-val) - 0.456 0.310
---
Estimates stable across specifications: reassuring for validity.
References
- Angrist, J. & Krueger, A. (2001). "Instrumental Variables and the Search for Identification." JEP.
- Stock, J. & Yogo, M. (2005). "Testing for Weak Instruments." In Identification and Inference.
- Goldsmith-Pinkham, P., Sorkin, I., & Swift, H. (2020). "Bartik Instruments: What, When, Why, and How." AER.
- Borusyak, K., Hull, P., & Jaravel, X. (2022). "Quasi-Experimental Shift-Share Research Designs." ReStud.