Instructions: Open this R script in RStudio. Then hit Ctrl/Cmd Shift K to export this document to html and preview it (if Ctrl/Cmd Shift K doesn’t do anything, Tools > Modify Keyboard Shortcuts > reassign “Compile Notebook” to be Ctrl/Cmd Shift K). Upload your final html version onto Canvas by the time it is due.
lm()
Run the code below to get started. It uses the assignment operator
<-
a couple of times. For instance,
x <- 0:2
should be read “x gets 0:2”. When we run that
piece of code, we’re taking the vector 0:2
and giving it
the name x
. So after running x <- 0:2
, x +
1 will be equal to 1:3. Also notice that after you run the code below,
if you check the top right pane in RStudio, the Environment
tab should let you know that x now refers to 0:2
.
library(tidyverse)
x <- 0:2
u <- c(2, -1, 2)
y <- -3 + 3 * x + u
You now have access to 3 vectors of data: x
,
u
, and y
.
u
is random noise that’s drawn from a distribution with
mean = 0, but in any given sample may not necessarily have mean =
0.y
is generated from x
and u
according to the equation in the line that starts with
y <-
. This is called the true model or the
data generating process: it’s how y
was truly
generated.__
Take a look at the contents of x
, u
, and
y
:
x
## [1] 0 1 2
u
## [1] 2 -1 2
y
## [1] -1 -1 5
Notice that you can apply functions to your vectors like this:
x^2
## [1] 0 1 4
x/y
## [1] 0.0 -1.0 0.4
mean(x)
## [1] 1
Calculate OLS estimates \(\hat{\beta_0}\) and \(\hat{\beta_1}\) using only these functions:
+
, -
*
, /
^2
mean()
, sum()
, and
sqrt()
for square root.# b1 <- __
#
# print(b1)
#
# b0 <- __
#
# print(b0)
After doing those calculations, check that they match the
lm()
results:
tibble(x, y) %>%
lm(y ~ x, data = .) %>%
broom::tidy()
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -2 2.24 -0.894 0.535
## 2 x 3 1.73 1.73 0.333
Check that you have the correct answer:
tibble(x, y) %>%
lm(y ~ x, data = .) %>%
fitted.values()
## 1 2 3
## -2 1 4
tibble(x, y) %>%
lm(y ~ x, data = .) %>%
residuals()
## 1 2 3
## 1 -2 1
plot()
This problem is open-ended. You’ll create 3 visualizations of a small
dataset. You’ll use plot()
for drawing the graphs (or if
you’re already comfortable with ggplot
, you’re welcome to
use that instead).
Run this code to get started:
world_data <- tibble(
country = c("Brazil", "Brazil", "Brazil", "Brazil",
"Haiti", "Haiti", "Haiti", "Haiti",
"Puerto Rico", "Puerto Rico", "Puerto Rico", "Puerto Rico",
"Colombia", "Colombia", "Colombia", "Colombia"),
year = c(1952, 1972, 1992, 2007, 1952, 1972, 1992, 2007,
1952, 1972, 1992, 2007, 1952, 1972, 1992, 2007),
gdpPercap = c(2108.944, 4985.711, 6950.283, 9065.801,
1840.367, 1654.457, 1456.310, 1201.637,
3081.960, 9123.042, 14641.587, 19328.709,
2144.115, 3264.660, 5444.649, 7006.580),
lifeExp = c(50.9, 59.5, 67.1, 72.4, 37.6, 48, 55.1, 60.9,
64.3, 72.2, 73.9, 78.7, 50.6, 61.6, 68.4, 72.9)
)
Your data is called world_data
. It tells you the GDP per
capita and life expectancy for four different countries during four
different years.
print(world_data)
## # A tibble: 16 × 4
## country year gdpPercap lifeExp
## <chr> <dbl> <dbl> <dbl>
## 1 Brazil 1952 2109. 50.9
## 2 Brazil 1972 4986. 59.5
## 3 Brazil 1992 6950. 67.1
## 4 Brazil 2007 9066. 72.4
## 5 Haiti 1952 1840. 37.6
## 6 Haiti 1972 1654. 48
## 7 Haiti 1992 1456. 55.1
## 8 Haiti 2007 1202. 60.9
## 9 Puerto Rico 1952 3082. 64.3
## 10 Puerto Rico 1972 9123. 72.2
## 11 Puerto Rico 1992 14642. 73.9
## 12 Puerto Rico 2007 19329. 78.7
## 13 Colombia 1952 2144. 50.6
## 14 Colombia 1972 3265. 61.6
## 15 Colombia 1992 5445. 68.4
## 16 Colombia 2007 7007. 72.9
Read the qelp docs for qplot()
to find out how to use
it:
?qelp::qplot
Rubric:
color
or fill
at least one
time.Write your code for your 3 plots here: