library(tidyverse)
work <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/laborforce1976.csv")
This dataset is adapted from the Wooldridge
econometrics text. It comes from a 1976 survey of 526 working
adults, so I’ll call it work
. It contains each person’s
hourly wage rate (in 1976 dollars), the number of years of education
they had completed, their tenure (the number of years they had been at
their current job), their race, their sex, and whether they were
married.
Answer the following questions either with a plot or by reporting summary statistics (or both).
received in this dataset.
years of education, or did men have more education?
Interpret the estimates for \(\beta_0\) and \(\beta_1\). Use
geom_smooth(method = lm)
in your visualization. On top of
omitted variable bias, we might also be worried about heteroskedasticity
in a model like this because people with higher levels of educational
attainment may have the flexibility to follow their passions, whether
that brings them toward higher or lower paying work. But people with low
educational attainment may find that their only options are minimum wage
jobs. Does visual inspection point to heteroskedasticity in this
model?
Does the Goldfeld-Quandt test find heteroskedasticity in the model above?
One way to do this is to create two variables:
# ssr1 <- __
# ssr2 <- __
which will hold the values for the sum of the squared residuals from
the model fit with the first 3/8 of the data, and then the last 3/8 (the
data must be sorted from low education to high education of course).
Then all you need to do is to compare ssr1 to ssr2 to find which is
bigger, and calculate the test statistic and critical value. For the
critical value, use
qf(.999, df1 = n_star - 2, df2 = n_star - 2)
where n_star
is 3/8 of the number of rows of the dataset, rounded to the nearest
integer. K is 2 in the case of a simple regression because the intercept
counts as a parameter in this test.
You can earn up to 2 points of extra credit by writing a function
goldfeld_quandt
that takes 4 arguments: a tibble, a formula
for the model in quotes (like “wage ~ educ”), an x variable which is the
explanatory variable for which we’re concerned about heteroskedasticity
(educ in this case), and K: the number of explanatory variables in the
model including the intercept. The function should be able to take
any tibble and model and run the Goldfeld-Quandt test for
heteroskedasticity on it. You can still use assignment inside the body
of a function:
# ssr1 <- __
# ssr2 <- __
It might also be a good idea to make a variable n_star
inside the function.
# n_star <- __
You should be able to call your function like this, and it should return a TRUE if heteroskedasticity is detected and a FALSE if not.
# goldfeld_quandt(work, "wage ~ educ", educ, k = 2)
For the sake of an example, here’s a function that implements the White test for heteroskedasticity. Your goldfeld quandt function will of course be pretty different because the two tests are pretty different.
white <- function(tb, formula, x) {
tb %>%
mutate(
resid = residuals(lm(formula, data = tb))
) %>%
lm(paste0("resid ~ ", x, " + I(", x, "^2)"), data = .) %>%
broom::glance() %>%
select(r.squared) %>%
mutate(
test_statistic = r.squared * nrow(tb),
critical_value = qchisq(.999, df = 2),
heteroskedasticity = test_statistic > critical_value
)
}
white(work, "wage ~ educ", "educ")
## # A tibble: 1 × 4
## r.squared test_statistic critical_value heteroskedasticity
## <dbl> <dbl> <dbl> <lgl>
## 1 0.0434 22.8 13.8 TRUE
white(work, "wage ~ educ", "educ")
indicates that theWhite test does detect heteroskedasticity in the “linear-linear” model. Does it detect heteroskedasticity in the log-linear or log-log models? Does the goldfeld-quandt test detect heteroskedasticity in the log-linear or log-loge models?
Hint: taking log(educ)
will throw an error because there
are some people in the dataset with 0 years of education (and log(0) =
-Inf). Instead, just add 1 to educ
.
heteroskedasticity associated with educ
? If so, which
adding variables do the trick?