Applied Class #4 - Testing the LATE Assumptions
Introduction
This practical session is based on Huber & Mellace (2015). You may find it helpful to consult the paper and/or my lecture notes.
Exercises
Write an R function that uses `rmvnorm()` from the `mvtnorm` package to simulate `n` iid draws from the model given below, with arguments `n`, `alpha`, and `beta`. Your function should return a data frame with named columns `D`, `Z`, and `Y`.

\[
\begin{aligned}
Y &= D + \beta Z + U\\
D &= 1\{\alpha Z + \epsilon > 0\}\\
\begin{bmatrix} U \\ \epsilon \end{bmatrix} &\sim \text{Normal}(0, \Sigma), \quad \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}\\
Z &\sim \text{Bernoulli}(0.5), \, \text{indep. of } (U, \epsilon)
\end{aligned}
\]

Answer the following questions about the model from the preceding part.
- Is the treatment \(D\) endogenous? How can you tell?
- What is the distribution of treatment effects? What is the LATE in this model?
- What is the role of \(\beta\)?
- What is the role of \(\alpha\)?
- Which of the LATE assumptions does the model satisfy?
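
For the simulation exercise, one possible shape for the function is sketched below. The function name `draw_sim` and the seed are illustrative choices, not part of the exercise; the sketch assumes the `mvtnorm` package is installed.

```r
library(mvtnorm)

# Hypothetical name: simulate n iid draws of (D, Z, Y) from the model above.
draw_sim <- function(n, alpha, beta) {
  # Covariance matrix of (U, epsilon): unit variances, covariance 0.5
  Sigma <- matrix(c(1, 0.5,
                    0.5, 1), nrow = 2, byrow = TRUE)
  errors <- rmvnorm(n, mean = c(0, 0), sigma = Sigma)
  U <- errors[, 1]
  eps <- errors[, 2]

  # Instrument drawn independently of the errors
  Z <- rbinom(n, size = 1, prob = 0.5)

  # Treatment and outcome equations
  D <- as.integer(alpha * Z + eps > 0)
  Y <- D + beta * Z + U

  data.frame(D = D, Z = Z, Y = Y)
}

set.seed(1234)  # arbitrary seed, for reproducibility only
sim <- draw_sim(n = 1e5, alpha = 0.6, beta = 1)
head(sim)
```

Storing `D` as a 0/1 integer rather than a logical keeps later subsetting such as `Y[D == 1 & Z == 0]` straightforward.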
Write a function called `get_theta()` to compute the sample analogues of \(\theta_1, \theta_2, \theta_3, \theta_4\) defined in Equation (7) of Huber & Mellace (2015). Your function should take a single input argument: a data frame (or tibble) with columns named `D`, `Z`, and `Y` corresponding to the model from above. It should return a vector with four named elements: `theta1`, `theta2`, `theta3`, and `theta4`.

Check your function from the preceding part by generating 100,000 observations from the model in part 1 with parameter values \(\alpha = 0.6\) and \(\beta = 1\). You should detect a violation of the LATE assumptions. Calculate the Wald estimand. Does it equal the LATE? Repeat for \(\beta = 0\). How do your results change?
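
A minimal sketch of `get_theta()` follows. It rests on an assumption that should be checked against the paper: that Equation (7) compares the mean outcome of the treated with \(Z = 0\) to lower and upper trimmed means of the treated with \(Z = 1\) (trimming share \(q = P(D=1|Z=0)/P(D=1|Z=1)\)), and symmetrically for the untreated (trimming share \(r = P(D=0|Z=1)/P(D=0|Z=0)\)). The helper `trimmed_mean()` is a hypothetical name, and the sketch assumes both shares lie in \((0, 1]\).

```r
# Hypothetical helper: mean of the lower (or upper) share-q tail of y.
trimmed_mean <- function(y, q, lower = TRUE) {
  if (lower) {
    mean(y[y <= quantile(y, q)])
  } else {
    mean(y[y >= quantile(y, 1 - q)])
  }
}

# Sample analogues of theta1, ..., theta4 under the ASSUMED form of Equation (7).
get_theta <- function(dat) {
  D <- dat$D
  Z <- dat$Z
  Y <- dat$Y

  p1_1 <- mean(D[Z == 1] == 1)   # P(D = 1 | Z = 1)
  p1_0 <- mean(D[Z == 0] == 1)   # P(D = 1 | Z = 0)
  q <- p1_0 / p1_1               # trimming share for the treated
  r <- (1 - p1_1) / (1 - p1_0)   # trimming share for the untreated

  y11 <- Y[D == 1 & Z == 1]
  y10 <- Y[D == 1 & Z == 0]
  y00 <- Y[D == 0 & Z == 0]
  y01 <- Y[D == 0 & Z == 1]

  c(theta1 = trimmed_mean(y11, q, lower = TRUE) - mean(y10),
    theta2 = mean(y10) - trimmed_mean(y11, q, lower = FALSE),
    theta3 = trimmed_mean(y00, r, lower = TRUE) - mean(y01),
    theta4 = mean(y01) - trimmed_mean(y00, r, lower = FALSE))
}
```

Under this form, a strictly positive element of the returned vector points to a violated inequality in the sample, before any consideration of sampling uncertainty.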
Repeat the preceding part for a variety of values of \(\beta\) until you find one for which the LATE assumptions are violated but you cannot detect a violation of the inequalities from the paper. Why is this possible?
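
The Wald estimand referred to in the parts above is the ratio of the reduced-form difference in means to the first-stage difference in means, \((E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0])\). A small sketch of its sample analogue, using a hypothetical function name `wald_estimand` and a made-up toy data frame, could look like this:

```r
# Hypothetical helper: sample analogue of the Wald estimand.
wald_estimand <- function(dat) {
  # Effect of Z on Y (reduced form) and of Z on D (first stage)
  reduced_form <- mean(dat$Y[dat$Z == 1]) - mean(dat$Y[dat$Z == 0])
  first_stage <- mean(dat$D[dat$Z == 1]) - mean(dat$D[dat$Z == 0])
  reduced_form / first_stage
}

# Made-up toy data, for illustration only
toy <- data.frame(
  Z = c(0, 0, 0, 0, 1, 1, 1, 1),
  D = c(0, 0, 0, 1, 0, 1, 1, 1),
  Y = c(1, 2, 3, 6, 2, 5, 6, 7)
)
wald_estimand(toy)
```

Applying the same function to simulated data for several values of `beta` makes it easy to compare the Wald estimand with the LATE implied by the model.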
Load the `wooldridge` package and read the documentation for the `card` dataset. Once you understand the contents of the dataset, carry out the following steps to construct a data frame (or tibble) called `card_dat`:
- Define the instrument `Z` as a dummy variable for living near a 4-year college in 1966. (The idea here is that living near a college reduces your costs of attending in a way that doesn’t affect wages.)
- Define the outcome `Y` as the log of weekly earnings in 1976.
- Construct the treatment `D` as a dummy variable that equals one if a person has completed 16 years of education or more by 1976. This is effectively a proxy for “has a four-year degree.”
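
The three steps above can be sketched as follows, assuming the `wooldridge` package is installed. The column choices are assumptions to verify against `?card`: `nearc4` for proximity to a 4-year college, `lwage` for the log wage outcome, and `educ` for years of schooling.

```r
# Load the card dataset from the wooldridge package
data("card", package = "wooldridge")

card_dat <- data.frame(
  Z = card$nearc4,                  # assumed: 1 if near a 4-year college in 1966
  Y = card$lwage,                   # assumed: log wage variable for 1976
  D = as.integer(card$educ >= 16)   # 1 if 16 or more years of education
)

head(card_dat)
```

It is worth confirming from the documentation exactly how the wage variable is measured before interpreting `Y`.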
Apply your function `get_theta()` to `card_dat`. Do you detect any violations of the LATE model? Re-read the documentation for `card` to see if you can find any potential explanation for your results. Interpret the IV estimate for `card_dat` in light of this.

Bonus Question: If you found the preceding parts too easy, here’s a challenge for you! We did not consider statistical significance when looking for a violation of the LATE model in the preceding part. Use the function `boot()` from the R package `boot`, along with your function `get_theta()` from above, to implement the “simple bootstrap with Bonferroni adjustment” described on page 402 of Huber & Mellace (2015) and apply it to `card_dat`. Briefly discuss your findings.
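
As a starting point for the bonus question, the sketch below shows the mechanics of calling `boot()` on a vector-valued statistic and combining one-sided bootstrap p-values with a Bonferroni adjustment. It is not taken from the paper: the recentred, non-studentised p-value, the helper name `bonferroni_boot_pvalue`, and the toy statistic are all assumptions, and the details should be checked against page 402 of Huber & Mellace (2015).

```r
library(boot)

# Hypothetical helper: joint p-value for H0 "every element of theta is <= 0".
# stat_fn takes a data frame and returns a numeric vector (e.g. get_theta).
bonferroni_boot_pvalue <- function(dat, stat_fn, R = 999) {
  # boot() passes the data and a vector of resampled row indices
  boot_out <- boot(
    data = dat,
    statistic = function(d, idx) stat_fn(d[idx, , drop = FALSE]),
    R = R
  )
  theta_hat <- boot_out$t0

  # ASSUMED form: recentre the bootstrap draws at the estimate and
  # compare the deviations with the estimate itself
  p_each <- sapply(seq_along(theta_hat), function(j) {
    mean(boot_out$t[, j] - theta_hat[j] > theta_hat[j])
  })

  # Bonferroni: scale the smallest p-value by the number of inequalities
  min(1, length(theta_hat) * min(p_each))
}

# Toy illustration with a made-up two-element statistic
set.seed(42)
toy <- data.frame(x = rnorm(200, mean = 1))
toy_stat <- function(d) c(a = mean(d$x), b = -mean(d$x))
p_joint <- bonferroni_boot_pvalue(toy, toy_stat, R = 499)
p_joint
```

With `get_theta` in place of `toy_stat` and `card_dat` in place of `toy`, the same call would return a single adjusted p-value for the four inequalities.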