Tutorial #5 - The Minimum Wage and Unemployment

This problem uses a dataset called minwage.dta, drawn from a famous study of the effects of minimum wages by Card & Krueger (1994, AER). You can download a copy from the data directory of my website at https://ditraglia.com/data/minwage.dta. The dataset contains information collected from fast food restaurants in New Jersey and eastern Pennsylvania during two interview waves: the first in March of 1992 and the second in November-December of the same year. Between these two interview waves, on April 1st to be precise, the New Jersey minimum wage increased by just under 19%, from $4.25 to $5.05 per hour. The minimum wage in Pennsylvania was unchanged during this period at $4.25 per hour. In the exercises that follow, you’ll apply a difference-in-differences approach to this dataset to explore the effects of raising the minimum wage.

Each row of minwage.dta corresponds to a restaurant. Here is a description of the variables that you will need to complete the problem. When you see a pair of variables in the table below, e.g. fte / fte2, both measure the same thing, but the one without the 2 comes from the first survey wave and the one with the 2 comes from the second:

| Name | Description |
|---|---|
| state | Dummy variable = 1 for NJ, = 0 for PA |
| wage_st / wage_st2 | Starting wage in dollars/hour at the restaurant |
| fte / fte2 | Full-time equiv. employment = #(Full-time employees) + #(Part-time employees)/2. Excludes managers. |
| chain | Categorical variable taking values in \(\{1, 2, 3, 4\}\) to indicate the four chains in the dataset: Burger King, KFC, Roy Rogers, and Wendy’s |
| co_owned | Dummy variable = 1 if restaurant is company-owned, = 0 if franchised |
| sample | Dummy variable = 1 if wage and employment data are available for both survey waves at this restaurant |

  1. Preliminaries:
    1. Download the data and load it in R using an appropriate package.
    2. Restrict the sample to only those restaurants with sample equal to 1 to ensure that we are making an apples-to-apples comparison throughout the remainder of this exercise.
    3. Rename the column state to treat.
    4. Create a new column called state that equals PA if treat is 0 and NJ if treat is 1.
    5. Create a column called low_wage that takes the value 1 if wage_st is less than 5 and 0 otherwise.
  2. Baseline Diff-in-Diff Estimate: starting wages
    1. Calculate the average wage in each survey wave separately for each state.
    2. Calculate the within-state time-differences based on (a).
    3. Calculate the between-state difference-in-differences based on (b).
    4. Interpret your findings from (c). What do they tell us about the causal effect of increasing the minimum wage? What assumptions are required for this interpretation to be valid?
  3. Baseline Diff-in-Diff Estimate: full time equivalent employment
    1. Repeat question 2 but using full-time equivalent employment as the outcome variable rather than starting wages.
  4. Reshape minwage for Diff-in-Diff regression estimation. You should end up with a tibble called both_waves with a row for each restaurant-time period combination. In addition to the columns state, treat, wage_st, fte, chain, co_owned, it should include a restaurant id variable and a dummy variable called post that indicates whether the observation is from before or after the minimum wage increase in NJ.
  5. Diff-in-Diff Regression Estimates:
    1. Consider the following regression model using the variables treat and post constructed above: \[Y_{i,s,t} = \beta_0 + \beta_1 (\texttt{treat}_{i,s}) + \beta_2 (\texttt{post}_t) + \beta_3 (\texttt{treat}_{i,s} \times \texttt{post}_t) + \epsilon_{i,s,t}\] where \(i\) indexes restaurants, \(s\) indexes states, and \(t\) indexes time periods, i.e. the two survey waves. Explain the meaning of each of the four regression coefficients. Which one gives the difference-in-differences effect?
    2. Estimate the regression from part (a) based on both_waves using wage_st as the outcome variable. Summarize your results, including appropriate statistical inference. If you choose to cluster, at what level do you cluster and why? How do your results compare to those that you calculated in question 2 above?
    3. Estimate the regression from part (a) based on both_waves using fte as the outcome variable. Summarize your results, including appropriate statistical inference. How do they compare to those that you calculated in question 3 above?
    4. An advantage of the regression-based formulation of differences-in-differences is that it allows us to control for other variables that might affect wages and employment. Repeat parts (b) and (c) adding co_owned and dummy variables for each of the four restaurant chains to your regression.
    5. How do your results from part (d) compare with those of parts (b) and (c)?
  6. Probing the Diff-in-Diff Assumptions:
    1. What assumptions are required for the diff-in-diff approach to provide a valid causal estimate of the effects of New Jersey raising its minimum wage?
    2. An alternative to the comparison of NJ and PA restaurants is a within NJ comparison. The key insight here is that only restaurants with starting wages below $5 per hour in the first wave will be affected by the change in minimum wages. Use the variable low_wage to run this alternative to the regression from 5(a) using only observations from NJ. Discuss your findings.
    3. What assumptions are needed for the DD estimate from (b) to be reliable? How plausible are these assumptions compared to those from (a)?
    4. Repeat part (b) but restrict attention to restaurants in PA where there was no change in minimum wages. Discuss your findings. What do these results suggest about the plausibility of the diff-in-diff assumptions in part (b)?

Solutions

Solution to Part 1 - Preliminaries

library(tidyverse)
library(haven)
minwage <- read_dta('https://ditraglia.com/data/minwage.dta')
minwage <- minwage |>  filter(sample == 1) |> 
  rename(treat = state) |> 
  mutate(state = case_when(treat == 0 ~ 'PA',
                           treat == 1 ~ 'NJ'),
         low_wage = 1 * (wage_st < 5))
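
The expression 1 * (wage_st < 5) turns a logical comparison into a 0/1 dummy. A minimal sketch in base R, using a made-up vector toy_wage (hypothetical values, not taken from minwage.dta), shows how the boundary case and missing values behave:

```r
# Hypothetical starting wages, chosen to include the boundary value 5 and an NA
toy_wage <- c(4.25, 4.99, 5.00, 5.05, NA)

# The comparison yields TRUE/FALSE/NA; multiplying by 1 coerces it to 1/0/NA
toy_low_wage <- 1 * (toy_wage < 5)
print(toy_low_wage)
```

Because the inequality is strict, a restaurant paying exactly $5 is coded 0, and a missing wage stays missing rather than being silently coded 0.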

Solution to Part 2 - Baseline Diff-in-Diff Starting Wages

DinD_wage <- minwage |>  group_by(state) |>  
  summarize(mean_wage_st = mean(wage_st), 
            mean_wage_st2 = mean(wage_st2)) |> 
  mutate(diff = mean_wage_st2 - mean_wage_st)
DinD_wage
# A tibble: 2 × 4
  state mean_wage_st mean_wage_st2    diff
  <chr>        <dbl>         <dbl>   <dbl>
1 NJ            4.61          5.08  0.469 
2 PA            4.65          4.62 -0.0348
with(DinD_wage, diff[1] - diff[2])
[1] 0.5040066
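
To spell out the last step, write \(\bar{w}_{s,t}\) for the mean starting wage in state \(s\) and wave \(t\). This notation is introduced here for illustration and does not appear in the original. Using the rounded entries of the diff column:

\[
\widehat{\text{DD}}_{\text{wage}} = \left(\bar{w}_{NJ,2} - \bar{w}_{NJ,1}\right) - \left(\bar{w}_{PA,2} - \bar{w}_{PA,1}\right) \approx 0.469 - (-0.0348) = 0.5038
\]

The small gap between 0.5038 and the printed 0.5040066 comes from rounding in the displayed tibble.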

Solution to Part 3 - Baseline Diff-in-Diff FTE Employment

DinD_emp <- minwage |>  group_by(state) |>  
  summarize(mean_fte = mean(fte), 
            mean_fte2 = mean(fte2)) |> 
  mutate(diff = mean_fte2 - mean_fte)
DinD_emp
# A tibble: 2 × 4
  state mean_fte mean_fte2   diff
  <chr>    <dbl>     <dbl>  <dbl>
1 NJ        17.3      17.6  0.287
2 PA        20.1      18.1 -2.02 
with(DinD_emp, diff[1] - diff[2])
[1] 2.301994
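
Because the sample was restricted to restaurants observed in both waves, the difference of state-level means should coincide with the mean of restaurant-level changes fte2 - fte. The base R sketch below illustrates this on a made-up balanced panel; the data frame toy and all of its values are hypothetical.

```r
# Hypothetical balanced panel: every restaurant is observed in both waves
toy <- data.frame(
  state = c("NJ", "NJ", "PA", "PA"),
  fte   = c(10, 20, 18, 22),   # wave 1 (made-up values)
  fte2  = c(12, 21, 17, 20)    # wave 2 (made-up values)
)

# Route 1: difference of state-level means, as in the solution above
mean_fte  <- tapply(toy$fte,  toy$state, mean)
mean_fte2 <- tapply(toy$fte2, toy$state, mean)
diff_of_means <- mean_fte2 - mean_fte

# Route 2: mean of restaurant-level changes within each state
mean_of_diffs <- tapply(toy$fte2 - toy$fte, toy$state, mean)

# Diff-in-diff computed from each route
dd_means <- unname(diff_of_means["NJ"] - diff_of_means["PA"])
dd_diffs <- unname(mean_of_diffs["NJ"] - mean_of_diffs["PA"])
print(c(dd_means = dd_means, dd_diffs = dd_diffs))
```

The two routes agree only when the panel is balanced. If some restaurants were missing in one wave, the two calculations would average over different sets of restaurants.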

Solution to Part 4 - Reshape dataset

wave1 <- minwage |>  
  select(state, treat, wage_st, fte, chain, co_owned, low_wage) |> 
  mutate(post = 0, id = 1:n())
wave2 <- minwage |>  
  select(state, treat, wage_st2, fte2, chain, co_owned, low_wage) |> 
  mutate(post = 1, id = 1:n()) |> 
  rename(wage_st = wage_st2, fte = fte2)
both_waves <- bind_rows(wave1, wave2) |> 
  arrange(id)
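
The reshape above relies on wave1 and wave2 listing restaurants in the same order, since id = 1:n() is computed separately in each. A sketch of the same wide-to-long logic in base R, on a hypothetical three-restaurant data frame toy_wide, assigns the id once before splitting and then checks the result:

```r
# Hypothetical wide data: one row per restaurant, wave-2 columns end in "2"
toy_wide <- data.frame(
  treat    = c(1, 1, 0),
  wage_st  = c(4.25, 5.00, 4.50),
  wage_st2 = c(5.05, 5.05, 4.50),
  fte      = c(10, 20, 18),
  fte2     = c(12, 21, 17)
)

# Assign the restaurant id once, before splitting into waves
toy_wide$id <- seq_len(nrow(toy_wide))

toy_wave1 <- data.frame(id = toy_wide$id, treat = toy_wide$treat, post = 0,
                        wage_st = toy_wide$wage_st, fte = toy_wide$fte)
toy_wave2 <- data.frame(id = toy_wide$id, treat = toy_wide$treat, post = 1,
                        wage_st = toy_wide$wage_st2, fte = toy_wide$fte2)

# Stack the two waves and sort so each restaurant's rows sit together
toy_long <- rbind(toy_wave1, toy_wave2)
toy_long <- toy_long[order(toy_long$id, toy_long$post), ]

# Checks: two rows per restaurant, exactly one of them with post == 1
stopifnot(nrow(toy_long) == 2 * nrow(toy_wide))
stopifnot(all(table(toy_long$id) == 2))
stopifnot(all(tapply(toy_long$post, toy_long$id, sum) == 1))
print(toy_long)
```

The same three checks could be run on both_waves as a quick sanity test after reshaping.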

Solution to Part 5 - Diff-in-Diff Regression Results

library(modelsummary)
library(estimatr)

# We don't appear to have 

reg_wage1 <- lm_robust(wage_st ~ treat + post + treat:post, both_waves, 
                       clusters = id)
reg_emp1 <- lm_robust(fte ~ treat + post + treat:post, both_waves, 
                      clusters = id)

#---------- (d) control for co_owned and chain 
both_waves <- both_waves |>  mutate(chain = as.factor(chain))
reg_wage2 <- lm_robust(wage_st ~ treat + post + treat:post + co_owned + chain, 
                       both_waves, clusters = id)
reg_emp2 <- lm_robust(fte ~ treat + post + treat:post + co_owned + chain, 
                      both_waves,  clusters = id)

results <- list(reg_wage1, reg_wage2, reg_emp1, reg_emp2)
reg_table <- modelsummary(results, 
                          fmt = 2, 
                          output = 'gt',
                          stars = TRUE,
                          gof_omit = 'Log.Lik|R2|R2 Adj.|AIC|BIC|F|RMSE', 
                          title = 'Diff-in-Diff Regression Estimates')
library(gt)
reg_table |> 
  tab_spanner(label = 'Starting Wage', columns = 2:3) |> 
  tab_spanner(label = 'Full-time Equiv. Employment', columns = 4:5)  

Diff-in-Diff Regression Estimates

|  | Starting Wage (1) | Starting Wage (2) | Full-time Equiv. Employment (3) | Full-time Equiv. Employment (4) |
|---|---|---|---|---|
| (Intercept) | 4.65*** | 4.59*** | 20.11*** | 22.56*** |
|  | (0.04) | (0.04) | (1.50) | (1.52) |
| treat | -0.04 | -0.04 | -2.84+ | -2.14 |
|  | (0.05) | (0.05) | (1.59) | (1.42) |
| post | -0.03 | -0.03 | -2.02 | -2.02 |
|  | (0.05) | (0.05) | (1.38) | (1.38) |
| treat × post | 0.50*** | 0.50*** | 2.30 | 2.30 |
|  | (0.05) | (0.05) | (1.46) | (1.46) |
| co_owned |  | 0.07** |  | -1.01 |
|  |  | (0.03) |  | (0.74) |
| chain2 |  | 0.02 |  | -10.16*** |
|  |  | (0.03) |  | (0.70) |
| chain3 |  | 0.05 |  | -1.35 |
|  |  | (0.03) |  | (1.04) |
| chain4 |  | 0.12** |  | -1.37 |
|  |  | (0.04) |  | (1.17) |
| Num.Obs. | 702 | 702 | 702 | 702 |
| Std.Errors | by: id | by: id | by: id | by: id |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
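
With only treat, post, and their interaction on the right-hand side, the regression is saturated. The intercept is the control-group mean before the change, the coefficient on treat is the pre-period gap between groups, the coefficient on post is the control group's change over time, and the interaction coefficient is the diff-in-diff. The base R sketch below checks this on hypothetical data (the data frame toy and its outcome values are made up). It uses lm() rather than lm_robust(), which should not matter for the point estimates because clustering changes only the standard errors.

```r
# Hypothetical long-format data: two observations in each treat-by-post cell
toy <- data.frame(
  treat = c(0, 0, 0, 0, 1, 1, 1, 1),
  post  = c(0, 0, 1, 1, 0, 0, 1, 1),
  y     = c(4.6, 4.7, 4.6, 4.6, 4.5, 4.7, 5.0, 5.1)  # made-up outcomes
)

# Cell means, indexed as cell[treat, post]
cell <- tapply(toy$y, list(treat = toy$treat, post = toy$post), mean)

# Diff-in-diff "by hand" from the four cell means
dd_by_hand <- (cell["1", "1"] - cell["1", "0"]) -
  (cell["0", "1"] - cell["0", "0"])

# Same specification as in the solution, fit by OLS
fit <- lm(y ~ treat + post + treat:post, data = toy)
b <- coef(fit)

# Each coefficient matches a combination of cell means
stopifnot(isTRUE(all.equal(unname(b["(Intercept)"]), cell["0", "0"])))
stopifnot(isTRUE(all.equal(unname(b["treat"]), cell["1", "0"] - cell["0", "0"])))
stopifnot(isTRUE(all.equal(unname(b["post"]), cell["0", "1"] - cell["0", "0"])))
stopifnot(isTRUE(all.equal(unname(b["treat:post"]), dd_by_hand)))
print(b)
```

This is why the treat × post estimates in columns (1) and (3) reproduce the hand calculations from Parts 2 and 3.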

Solution to Part 6 - Probing the Diff-in-Diff Assumptions

nj_only <- both_waves |>  filter(state == 'NJ')
pa_only <- both_waves |>  filter(state == 'PA')

# Within NJ comparison
nj_wage <- lm_robust(wage_st ~ low_wage + post + low_wage:post, nj_only, 
                     clusters = id)
nj_fte <- lm_robust(fte ~ low_wage + post + low_wage:post, nj_only, 
                    clusters = id)

# Within PA comparison
pa_wage <- lm_robust(wage_st ~ low_wage + post + low_wage:post, pa_only,
              clusters = id)
pa_fte <- lm_robust(fte ~ low_wage + post + low_wage:post, pa_only, 
             clusters = id)

state_results <- list(nj_wage, pa_wage, nj_fte, pa_fte)
state_table <- modelsummary(state_results,  
                            fmt = 2,  
                            output = 'gt', 
                            stars = TRUE, 
                            gof_omit = 'Log.Lik|R2|R2 Adj.|AIC|BIC|F|RMSE',  
                            title = 'Within State Results')

state_table |> 
  tab_spanner(label = 'Starting Wage', columns = 2:3) |> 
  tab_spanner(label = 'Full-time Equiv. Employment', columns = 4:5) |> 
  cols_label(`(1)` = 'NJ',
             `(2)` = 'PA',
             `(3)` = 'NJ',
             `(4)` = 'PA')

Within State Results

|  | Starting Wage: NJ | Starting Wage: PA | Full-time Equiv. Employment: NJ | Full-time Equiv. Employment: PA |
|---|---|---|---|---|
| (Intercept) | 5.11*** | 5.07*** | 18.99*** | 20.70*** |
|  | (0.02) | (0.03) | (1.15) | (3.06) |
| low_wage | -0.65*** | -0.63*** | -2.23+ | -0.89 |
|  | (0.03) | (0.05) | (1.29) | (3.47) |
| post | 0.00 | -0.27*** | -2.25* | -3.85 |
|  | (0.03) | (0.06) | (0.99) | (2.64) |
| low_wage × post | 0.62*** | 0.35*** | 3.30** | 2.81 |
|  | (0.03) | (0.08) | (1.12) | (3.08) |
| Num.Obs. | 570 | 132 | 570 | 132 |
| Std.Errors | by: id | by: id | by: id | by: id |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
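
One way to read the within-state table is to convert the coefficients back into the four group-by-wave means they imply. The helper cell_means() below is a hypothetical convenience function, not part of the original solution, and it is applied to the rounded starting-wage coefficients copied from the table, so its output is accurate only up to rounding.

```r
# Turn the four coefficients of y ~ low_wage + post + low_wage:post
# into the implied mean for each group (high/low wage) and wave (pre/post)
cell_means <- function(b0, b_low, b_post, b_inter) {
  c(
    high_pre  = b0,
    high_post = b0 + b_post,
    low_pre   = b0 + b_low,
    low_post  = b0 + b_low + b_post + b_inter
  )
}

# Rounded starting-wage coefficients copied from the table above
nj_wage_cells <- cell_means(5.11, -0.65, 0.00, 0.62)
pa_wage_cells <- cell_means(5.07, -0.63, -0.27, 0.35)
print(rbind(NJ = nj_wage_cells, PA = pa_wage_cells))
```

On these rounded inputs, the implied mean starting wage of initially low-wage restaurants moves from about 4.46 to about 5.08 in NJ and from about 4.44 to about 4.52 in PA, while initially high-wage restaurants stay at about 5.11 in NJ and fall from about 5.07 to about 4.80 in PA.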