2.3 Deriving the Least Squares Estimators

Estimators of parameters of a linear model

Suppose we hypothesize that a random variable \(Y\) depends linearly on another random variable \(X\). This relationship can be expressed as: \(y_i = \beta_0 + \beta_1 x_i + u_i\).

Here, \(\beta_0\) (the intercept) and \(\beta_1\) (the slope) are parameters that describe the linear relationship between \(X\) and \(Y\), and \(u_i\) represents the unobservable disturbance term.

Given a sample of \(X\) and \(Y\), we aim to derive unbiased estimators for \(\beta_0\) and \(\beta_1\). These estimators allow us to combine observations of \(X\) and \(Y\) to estimate the underlying relationship between the two variables.

Exercise 1: Find the equation of the line (in slope-intercept form) through the points (x, y) = (0, 4) and (2, 6).

The method of least squares, published by Adrien-Marie Legendre in 1805 (though Carl Friedrich Gauss may have discovered it earlier), revolutionized statistics by providing a way to combine multiple observations to find underlying relationships. It solved two major scientific challenges of the early 1800s: helping geodesists calculate Earth’s circumference by combining many measurements of distances between cities and angles to the stars, and enabling astronomers to determine the orbit of the newly discovered Ceres from limited observations. This mathematical technique remains a foundation of modern statistical analysis.

Summation Rules

Exercise 2: If \(x = \{ 3, 7, 2, 5\}\), calculate \(\sum_i x_i\).

Exercise 3: True or false: \(\sum_i 3 x_i = 3 \sum_i x_i\).

Exercise 4: True or false: \(\sum_i 3 x_i y_i = 3 \sum_i x_i \sum_i y_i\).

Notation: Linear Models

Here’s a preview of the notation I’ll use in the next part of this chapter. The idea is that some true linear model relating \(X\) and \(Y\) exists: \(y_i = \beta_0 + \beta_1 x_i + u_i\). When I’m referring to our estimates of the intercept and slope, I’ll put hats over the \(\beta\)s and replace the disturbances \(u_i\) with residuals \(e_i\), because the two turn out to be quite different conceptually: \(y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i\). (There’s a short simulated example after the table below that illustrates the difference.)

| Symbol | Meaning | Example |
|---|---|---|
| \(\beta_0\) | Intercept parameter in a linear model | \(y_i = \beta_0 + \beta_1 x_i + u_i\) |
| \(\beta_1\) | Slope parameter in a linear model | see above |
| \(y_i\) | Dependent variable or outcome variable | see above |
| \(x_i\) | Explanatory variable | see above |
| \(u_i\) | Unobservable term, disturbance, shock | see above |
| \(\hat{\beta}_0\) | Estimate of the intercept | \(y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i\) |
| \(\hat{\beta}_1\) | Estimate of the slope | see above |
| \(e_i\) | Residuals | see above |
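
To make the distinction concrete, here is a small simulated example in Python (a sketch, not part of the assignment; the true intercept of 1 and slope of 2 are arbitrary choices). In a simulation we pick the true model ourselves, so we can compute both the disturbances \(u_i\) and the residuals \(e_i\) and see that they are not the same numbers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical true model (known only because we chose it): y_i = 1 + 2*x_i + u_i
n = 100
x = rng.normal(0, 1, size=n)
u = rng.normal(0, 1, size=n)            # true disturbances (unobservable in real data)
y = 1 + 2 * x + u

# Estimate the intercept and slope by least squares (hats denote estimates)
beta1_hat, beta0_hat = np.polyfit(x, y, 1)

# Residuals measure the gap from the *estimated* line;
# disturbances measure the gap from the *true* line.
e = y - (beta0_hat + beta1_hat * x)

print(u[:3])   # depends on the true beta_0 = 1 and beta_1 = 2
print(e[:3])   # depends on beta_0-hat and beta_1-hat, so the values differ
```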

Exercise 5: Fill in the blanks by selecting one: OLS residuals are the (vertical/horizontal) distances between the observation and the (true/estimated) linear model.

Least Squares: Combining Observations

One reason the method of least squares is so popular is its simplicity and mathematical tractability: the entire procedure can be summed up in one statement. The method of least squares fits the linear model that minimizes the sum of squared residuals.
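
Written out with the notation from the table above, that one statement says: choose \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to solve

\[\min_{\hat{\beta}_0,\, \hat{\beta}_1} \sum_{i=1}^n e_i^2 = \min_{\hat{\beta}_0,\, \hat{\beta}_1} \sum_{i=1}^n \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2.\]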

In the next few videos, we’ll see that for a simple regression, we can take that statement of the method of least squares and derive:

\[\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\]

\[\hat{\beta}_1 = \frac{\sum_i x_i y_i - \bar{x}\bar{y}n}{\sum_i x_i^2 - \bar{x}^2 n}\]
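
As a quick numerical check, here is a short Python sketch (not part of the original assignment; the simulated model \(y_i = 2 + 0.5 x_i + u_i\) is an arbitrary example) that applies these formulas and compares the result to NumPy's built-in least-squares line fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from an arbitrary "true" model: y_i = 2 + 0.5*x_i + u_i
n = 50
x = rng.uniform(0, 10, size=n)
u = rng.normal(0, 1, size=n)
y = 2 + 0.5 * x + u

# Apply the formulas above
x_bar, y_bar = x.mean(), y.mean()
beta1_hat = (np.sum(x * y) - n * x_bar * y_bar) / (np.sum(x**2) - n * x_bar**2)
beta0_hat = y_bar - beta1_hat * x_bar

# Cross-check against NumPy's degree-1 polynomial fit, which is also least squares
slope, intercept = np.polyfit(x, y, 1)
print(beta0_hat, intercept)   # the two intercept estimates should match
print(beta1_hat, slope)       # the two slope estimates should match
```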

Deriving OLS Estimators \(\hat{\beta}_0\) and \(\hat{\beta}_1\)

Suppose we sampled random variables \(X\) and \(Y\) and got this data:

| X | Y |
|---|---|
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
Exercise 6: Draw a plot of \(x_i\) and \(y_i\) and eyeball a line of best fit. What are the slope and intercept of that line (approximately)?

Exercise 7: Use the formulas we derived in this assignment to find the slope and intercept of the OLS line of best fit: \(y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i\). Here are the formulas you can use: \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\) and \(\hat{\beta}_1 = \frac{\sum_{i=1}^n x_i y_i - \bar{x}\bar{y}n}{\sum_{i = 1}^n x_i^2 - \bar{x}^2n}\). Check that your estimates of \(\beta_0\) and \(\beta_1\) are on track by comparing them to your answers to the previous question.

Exercise 8: Fill out the table below to calculate the OLS fitted values (\(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\)) and residuals (\(e_i = y_i - \hat{y}_i\)). Verify that the fitted values you calculate are correct by making sure that each point (\(x_i\), \(\hat{y}_i\)) lies on the line of best fit you drew in Exercise 6.

| \(x_i\) | \(y_i\) | \(\hat{y}_i\) | \(e_i\) |
|---|---|---|---|
| 1 | 4 | \(\underline{\hspace{1cm}}\) | \(\underline{\hspace{1cm}}\) |
| 2 | 2 | \(\underline{\hspace{1cm}}\) | \(\underline{\hspace{1cm}}\) |
| 3 | 1 | \(\underline{\hspace{1cm}}\) | \(\underline{\hspace{1cm}}\) |

Exercise 9: Suppose that the true relationship between \(X\) and \(Y\) is actually this: \(y_i = 5 - x_i + u_i\), so the true parameter values are \(\beta_0 = 5\) and \(\beta_1 = -1\). Calculate the true disturbances \(u_i\). Is it required that \(\sum_{i = 1}^n u_i = 0\) or \(\sum_{i = 1}^n x_i u_i = 0\), as it is with the residuals \(e_i\)?

| \(x_i\) | \(y_i\) | \(u_i\) |
|---|---|---|
| 1 | 4 | \(\underline{\hspace{1cm}}\) |
| 2 | 2 | \(\underline{\hspace{1cm}}\) |
| 3 | 1 | \(\underline{\hspace{1cm}}\) |

Exercise 10: Consider the linear model without an intercept: \(y_i = \beta_0 x_i + u_i\). Recall that OLS minimizes the sum of squared residuals. Write down what that means mathematically in this context, then take the first-order condition and show that the OLS estimator is \(\hat{\beta}_0 = \frac{\sum_i x_i y_i}{\sum_i x_i^2}.\)