12  Models with Squared Terms

Reading: If you want a little more information about the topics in this chapter, take a look at Dougherty Chapter 4.1 and Chapter 4.3.

When you’re reading papers in applied economics, you’ll often see models with transformed variables: squared terms, variables interacted with other variables, and logs of variables. This chapter and the next one offer some explanation of why you’ll see those things. All of these models can be estimated using OLS because they are either linear in the parameters \(\beta\), or they can be transformed into a model that is linear in the parameters.

There are lots of instances where nonlinear relationships are more plausible than linear ones. Consider the students dataset again. I’ll coerce study_time and alcohol to be numeric.

library(tidyverse)
students <- read_csv("https://raw.githubusercontent.com/cobriant/students_dataset/main/students.csv") %>%
  mutate(
    # Replace each study-time category with a representative number of hours per week
    study_time = case_when(
      study_time == "less than 2H" ~ 1,
      study_time == "2 - 5H" ~ 3.5,
      study_time == "5 - 10H" ~ 7.5,
      study_time == "more than 10H" ~ 12),
    # Replace each alcohol category with a score from 1 (very low) to 5 (very high)
    alcohol = case_when(
      alcohol == "very low" ~ 1,
      alcohol == "moderately low" ~ 2,
      alcohol == "medium" ~ 3,
      alcohol == "moderately high" ~ 4,
      alcohol == "very high" ~ 5
    ))

students %>%
    lm(final_grade ~ study_time, data = .) %>%
    broom::tidy()
# A tibble: 2 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)   72.1       1.05      68.8  1.47e-213
2 study_time     0.411     0.210      1.95 5.14e-  2

Fitting the model \(final\_grade = \beta_0 + \beta_1 study\_time + u\) gives an estimate for \(\beta_1\) of 0.411, which is significant at the 0.10 level (p = 0.0514) but not at the 0.05 level. The interpretation is that for every extra hour per week a student spends studying, their final grade is expected to increase by 0.411 percentage points. But if someone studied 100 hours per week, would each extra hour still raise their grade just as much? As an economist, you might be doubtful: something like studying likely has diminishing marginal returns. And even with very limited data, that’s what seems to be going on:

students %>%
  ggplot(aes(x = study_time, y = final_grade)) +
  geom_jitter() +
  geom_smooth(
    method = nls, # nls stands for "nonlinear least squares"
    formula = y ~ a + b * x + c * I(x^2), # you can specify a model here
    method.args = list(start = c(a = 1, b = 1, c = 1)), # nls needs starting values for its search for the estimates
    se = FALSE # turn off standard error ribbons
  )
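Because \(a + bx + cx^2\) is linear in the parameters \(a\), \(b\), and \(c\), nonlinear least squares and OLS should land on the same estimates here. A minimal sketch checking this, using R's built-in mtcars data (fuel economy mpg against car weight wt) as a stand-in so that nothing needs to be downloaded:

```r
# Stand-in data: the built-in mtcars data set (no download needed)

# Quadratic fit by nonlinear least squares, with the same starting values as above
fit_nls <- nls(mpg ~ a + b * wt + c * I(wt^2),
               data = mtcars,
               start = c(a = 1, b = 1, c = 1))

# The same quadratic fit by OLS: the model is linear in a, b, and c
fit_ols <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# The two sets of estimates should agree up to numerical precision
cbind(nls = coef(fit_nls), ols = coef(fit_ols))
```

The same reasoning suggests that, in the plot above, `method = "lm"` with `formula = y ~ x + I(x^2)` would draw the same curve without needing starting values.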

Grades improve with study time, but after a while, the improvement starts slowing down. You can model diminishing marginal returns by adding a squared term to your model like this:

students %>%
    lm(final_grade ~ study_time + I(study_time^2), data = .) %>%
    broom::tidy()
# A tibble: 3 × 5
  term            estimate std.error statistic   p.value
  <chr>              <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)      71.8       1.68      42.7   3.02e-145
2 study_time        0.575     0.718      0.800 4.24e-  1
3 I(study_time^2)  -0.0138    0.0581    -0.238 8.12e-  1

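The capital I in I(study_time^2) stands for “inhibit”. Inside a formula, ^ is read as a formula operator (crossing terms), not as exponentiation, so on its own study_time^2 would collapse to just study_time. Wrapping the term in I() inhibits that formula interpretation, so lm computes the arithmetic square and adds it to the model as a separate explanatory variable.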
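To see what I() prevents, here is a small sketch using the built-in mtcars data as a stand-in: without I(), the squared term silently drops out of the model.

```r
# Without I(), ^ is the formula crossing operator: wt^2 crosses wt with itself,
# which is just wt, so no squared term is added
fit_no_I <- lm(mpg ~ wt + wt^2, data = mtcars)

# With I(), wt^2 is computed arithmetically and enters as its own regressor
fit_I <- lm(mpg ~ wt + I(wt^2), data = mtcars)

names(coef(fit_no_I))  # "(Intercept)" "wt"
names(coef(fit_I))     # "(Intercept)" "wt" "I(wt^2)"
```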

The interpretation of this model: grades rise with study time, but that effect eventually wears off (although neither coefficient is statistically significant). Going from 0 to 1 hour of studying is expected to increase your grade by \(0.575 \times (1 - 0) - 0.0138 \times (1^2 - 0^2) = 0.5612\) percentage points, but going from 10 to 11 hours is only expected to increase your grade by \(0.575 \times (11 - 10) - 0.0138 \times (11^2 - 10^2) \approx 0.285\) percentage points. In fact, taking the derivative of the fitted model with respect to study_time, the marginal contribution of studying is \(0.575 - 2 \times 0.0138 \times study\_time\).
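The arithmetic for the quadratic model can be reproduced in a few lines. This sketch hard-codes the rounded estimates 0.575 and -0.0138 from the output above; the helper names `grade_change` and `marginal` are illustrative, not part of any package.

```r
# Rounded estimates from the quadratic model above
b1 <- 0.575
b2 <- -0.0138

# Fitted grade without the intercept (the intercept cancels out of any difference)
f <- function(h) b1 * h + b2 * h^2

# Expected change in final grade when study time goes from h0 to h1 hours per week
grade_change <- function(h0, h1) f(h1) - f(h0)

# Marginal contribution: derivative of the fitted model, b1 + 2 * b2 * h
marginal <- function(h) b1 + 2 * b2 * h

grade_change(0, 1)    # 0.5612
grade_change(10, 11)  # 0.2852
marginal(10)          # 0.299
```

The discrete change from 10 to 11 hours (0.2852) is slightly smaller than the derivative at 10 hours (0.299). For a quadratic, the one-hour change from \(h\) to \(h + 1\) equals the derivative evaluated at the midpoint \(h + 0.5\).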

Exercise 1: When should we expect that x^2 is perfectly correlated with x?
a) Always
b) Whenever x is nonnegative
c) Whenever x only takes on the values 0 and 1
d) Never

Exercise 2: Suppose we estimated the model \(\widehat{final\_grade} = 75 + 0.62\, study\_time - 0.04\, study\_time^2\). What would be the expected increase in your final grade when you go from 0 to 1 hour of studying per week?

Exercise 3: Continuing from the previous question, what would be the expected increase in your final grade when you go from 10 to 11 hours of studying per week?

Exercise 4: Continuing again, what is the estimated marginal contribution of studying, as a function of study time?

Consider a new model with study_time, failures, absences, and alcohol, all with squared terms included:

students %>%
    lm(final_grade ~ study_time + I(study_time^2) + failures + I(failures^2) +
           absences + I(absences^2) + alcohol + I(alcohol^2), data = .) %>%
    broom::tidy()
# A tibble: 9 × 5
  term             estimate std.error statistic  p.value
  <chr>               <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)      76.8       3.04      25.2    4.69e-82
2 study_time        0.132     0.686      0.193  8.47e- 1
3 I(study_time^2)  -0.00315   0.0549    -0.0574 9.54e- 1
4 failures        -10.4       2.51      -4.14   4.24e- 5
5 I(failures^2)     1.87      0.966      1.94   5.32e- 2
6 absences          0.331     0.145      2.29   2.26e- 2
7 I(absences^2)    -0.00659   0.00290   -2.27   2.36e- 2
8 alcohol          -1.88      2.15      -0.872  3.84e- 1
9 I(alcohol^2)      0.248     0.389      0.638  5.24e- 1
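With several squared terms, each variable's marginal effect depends on its own level: for a variable \(x\) entered as \(\beta_1 x + \beta_2 x^2\), the marginal effect is \(\beta_1 + 2\beta_2 x\). A minimal sketch of a hypothetical helper, `marginal_effect`, that reads the two relevant coefficients off any fitted lm, assuming the squared term was written as I(var^2) so that lm names it that way. The built-in mtcars data stands in for the students data.

```r
# Hypothetical helper: marginal effect of `var` at value x0 in a model that
# includes both var and I(var^2), i.e. b_var + 2 * b_var2 * x0
marginal_effect <- function(fit, var, x0) {
  b <- coef(fit)
  b[[var]] + 2 * b[[paste0("I(", var, "^2)")]] * x0
}

# Illustration on mtcars: two variables, each with its own squared term
fit <- lm(mpg ~ wt + I(wt^2) + hp + I(hp^2), data = mtcars)

marginal_effect(fit, "wt", 3)    # effect of weight at wt = 3, holding hp fixed
marginal_effect(fit, "hp", 150)  # effect of horsepower at hp = 150, holding wt fixed
```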

Exercise 5: How should we interpret the estimates on failures and I(failures^2)?
a) Each class you failed the previous year increases your expected grade by 10.4 percentage points, and if you failed many classes the previous year, that effect increases.
b) Each class you failed the previous year increases your expected grade by 10.4 percentage points, but if you failed many classes the previous year, that effect starts to wear off.
c) Each class you failed the previous year brings your expected grade down by 10.4 percentage points, and if you failed many classes the previous year, the effect is worse.
d) Each class you failed the previous year brings your expected grade down by 10.4 percentage points, but if you failed many classes the previous year, that effect starts to wear off.

Exercise 6: How should we interpret the estimates on absences and I(absences^2)?
a) Each absence you have decreases your expected grade by 0.331 percentage points, and with lots of absences, that effect gets worse.
b) Each absence you have increases your expected grade by 0.331 percentage points, but with lots of absences, that effect starts to wear off.
c) Each absence you have increases your expected grade by 0.331 percentage points, and with lots of absences, that effect continues to increase.
d) Each absence you have decreases your expected grade by 0.331 percentage points, but with lots of absences, that effect starts to wear off.

12.1 Classwork