The Tidy Econometrics Workbook

An Undergraduate’s Guide to Causal Inference, Instrumental Variables, and Time Series with R

Author

Colleen O’Briant

Published

August 14, 2022

Course Objectives

This workbook is a companion text for my EC421 Econometrics course. EC421 is the second and final course in the UO’s undergraduate Econometrics sequence. By the end of this course, you will:

  • Learn to program in R in order to solve all kinds of data analytics and simulation problems
  • Solidify your understanding of probability, statistics, and the versatility of the method of least squares
  • Understand how the crucial assumption of exogeneity allows OLS estimators to have a causal interpretation
  • Learn about the difference between estimators that are consistent and estimators that are unbiased
  • Understand what happens when some of our standard OLS assumptions are violated (like homoskedasticity and no autocorrelation)
  • Learn about major themes in time series modeling like random walks and spurious regressions
  • Understand some strategies to be able to do causal inference when exogeneity cannot be assumed (instrumental variables and differences-in-differences).

Programming in R

Taking your time

We learn to program, same as we learn anything, by forming a mental model of how the framework works, and then trying to use that framework to solve our own problems. Sadly, there are lots of ways you can go wrong here: wrong syntax is pretty obvious because you’ll get errors, but wrong mental models can stick around for a lifetime and they make programming a real uphill battle.

I’ve spent a lot of time over the past couple of years (and I’ve taken a lot of time from the smartest programmers and R people I know) to get this part of the course right. Hopefully you’ll find that solving problems using the tidyverse is simple, easy, and natural. But if you skim through the koans and do them as fast as possible, things will not work out! Remember, the koans aren’t busywork to be done for speed. They will help you build your mental model and learn the syntax, but only if you take your time, read them carefully, and reflect.

Declarative vs Imperative Programming

One wrong way of programming in the tidyverse is to mix programming paradigms (declarative and imperative). This is by far the most common bad habit you’ll see in people’s R code. The tidyverse is declarative, but a lot of the R code you’ll see online is imperative code written in base R. Mixing the two paradigms makes for confusing, sloppy, and complicated code.

Declarative vs imperative: what’s the difference? Imperative programming is relatively low-level: you think in terms of manipulating values using for loops and if statements. Declarative programming is programming at a higher abstraction level: you make use of handy functions (AKA abstractions) to manipulate large swaths of data at one time instead of going value-by-value.

A good metaphor for the difference between imperative and declarative programming is this: suppose I’m trying to help you drive from your house to school. Imperative programming is when I send you turn-by-turn directions, and declarative programming is when I tell you to just put “University of Oregon” into your GPS. With declarative programming, I declare what I want you to do without spelling out exactly how to do it, the way I would have to with imperative programming. Telling you to put “University of Oregon” into your GPS has advantages over giving you turn-by-turn directions: the GPS may have information about traffic and road closures that I’m not aware of. And the declarative approach is much easier for me: I could help the whole class get from their houses to the university by telling everyone to put “University of Oregon” into their GPS, while sending each person their own set of turn-by-turn instructions would be a lot more work.

Likewise, when you use the tidyverse’s abstractions like filter(), mutate(), map(), reduce(), and all of ggplot2’s great plotting functions, you’re taking advantage of the fact that the engineers who built those functions know tricks in R that you may not be aware of to make things run smoothly. And when you’re programming declaratively, you can keep thinking about your problem at a high level instead of getting weighed down by nitty-gritty details. When it comes to data analysis, declarative programming has a lot of huge benefits.

But under the hood, all these great tidyverse functions are just a few for loops and if statements. Imperative programming certainly has its time and place, and that time and place is when your problems include implementing an algorithm by hand. If you’re interested, I highly recommend Project Euler for teaching yourself imperative programming. But imperative programming is not something you’ll need in this workbook. You may have mixed declarative with imperative programming in previous classes, but we’ll stay strictly in the declarative territory for data analytics in this class.
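To make the contrast between the two paradigms concrete, here is a small sketch. The vector temps_f and its values are made up for illustration. Both versions convert Fahrenheit to Celsius: the first spells out every step with a for loop, and the second declares the transformation and lets map_dbl() handle the iteration.

```r
library(tidyverse)

# Hypothetical data: temperatures in Fahrenheit
temps_f <- c(32, 50, 68, 86)

# Imperative: we manage the output vector and the index ourselves
temps_c_loop <- numeric(length(temps_f))
for (i in seq_along(temps_f)) {
  temps_c_loop[i] <- (temps_f[i] - 32) * 5 / 9
}

# Declarative: say what should happen to each element,
# and let map_dbl() take care of visiting every element
temps_c_map <- map_dbl(temps_f, function(f) (f - 32) * 5 / 9)
```

Both vectors hold the same four Celsius values. R’s arithmetic is already vectorized, so (temps_f - 32) * 5 / 9 would also work here; map_dbl() is shown because it is what this workbook uses in place of loops.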

Things to Avoid when Programming Declaratively in the Tidyverse

Use these only when you’re programming imperatively in base R:

  • for and while loops (we’ll use map() instead)
  • if statements (we’ll use dplyr’s vectorized function if_else() instead)
  • matrix() or array() (our 2d data structure of choice is the tibble())
  • $ syntax for extracting a column vector from a tibble. We avoid this because our workflow goes like this: vectors go into tibbles and we do data analysis on tibbles. Going from tibbles to vectors (what $ lets you do) is the reverse of what we need, so we avoid it in this class. It just causes unnecessary headaches!
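As a rough illustration of the workflow described in this list, the sketch below puts two vectors into a tibble and then labels every row with if_else() inside mutate(), with no loop, no if statement, and no $. The scores tibble and the 65-point cutoff are invented for the example.

```r
library(tidyverse)

# Hypothetical data: exam scores for four students
scores <- tibble(
  student = c("A", "B", "C", "D"),
  score   = c(91, 58, 77, 64)
)

# Vectors go into a tibble, and the analysis happens on the tibble.
# if_else() checks every row at once, so no loop or if statement is needed.
scores %>%
  mutate(result = if_else(score >= 65, "pass", "fail"))
```

The result is still a tibble, now with a result column next to the original two, so the next step of an analysis can keep working on it directly.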

One more thing: I often see students using assignment <- wayyyy too much. If you’re creating a variable for something, and you only use that thing one other time, and naming that thing doesn’t help the readability of your code, why are you creating that variable? If you let your default be “no assignment” instead of “always assignment”, then your code will be much prettier and your global environment will stay clean, which prevents lots of confusion.
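Here is a minimal sketch of the difference, using a made-up sales tibble. The first version names every intermediate step; the second gets the same summary with a single pipeline and leaves nothing extra behind in the global environment.

```r
library(tidyverse)

# Hypothetical data
sales <- tibble(
  region = c("north", "south", "north", "south"),
  amount = c(100, 250, 300, 50)
)

# "Always assignment": two throwaway names land in the global environment
big_sales <- filter(sales, amount >= 100)
big_sales_by_region <- group_by(big_sales, region)
summarize(big_sales_by_region, total = sum(amount))

# "No assignment" by default: one pipeline, no intermediate variables
sales %>%
  filter(amount >= 100) %>%
  group_by(region) %>%
  summarize(total = sum(amount))
```

Assignment is still the right call when a name genuinely helps readability or when the object is reused several times, as with sales itself here.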

Setting up your workspace

Grab a Computer

First things first: decide which computer you’d like to do your programming assignments on. It can be a Mac, Windows, or Linux machine: all are equally good. If you have a laptop, use that, so that you can bring it into class. I do absolutely everything on my little MacBook Air. Please let me know ASAP if you don’t have a computer to program on (Chromebooks and iPads won’t work, but there are workarounds I can discuss with you).

Download and Install R

Do this even if you installed R on your computer for a previous class. Following these instructions again will just update your version, which is a good thing.

Go here: https://cran.r-project.org/ and follow the instructions to download R for your Linux, Windows, or Mac. You should download the latest release.

Mac users: make sure you know whether you have an Apple silicon Mac or an older Intel-based Mac, and make sure that you download the correct version of R. If you’re not sure, click the Apple symbol in the upper left of your screen, go to About This Mac, and the information should be there. For instance, mine says “MacBook Air: M1 2020, Chip: Apple M1”. I have an M1 MacBook, so I’d download the version of R for Apple silicon Macs. If yours says anything about Intel, download the Intel version of R.
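Once R is installed, one way to double-check which build you ended up with is to ask R itself. This is only a sketch of a sanity check, not a required step. On a Mac, I would expect the architecture to read aarch64 for the Apple silicon build and x86_64 for the Intel build.

```r
# Version string of the R installation that is currently running
R.version.string

# CPU architecture this copy of R was built for
R.version[["arch"]]
```

If the architecture doesn’t match your machine, downloading the other installer from CRAN and reinstalling should fix it.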

Mac users (again): install XQuartz: https://www.xquartz.org/.

An alternative: R and RStudio are both already installed on all academic workstations at UO. The downside is the limited hours, especially on weekends.

Having issues with this step? Try doing your downloads at home instead of on campus. The campus wifi can sometimes be too slow, corrupting the files you’re trying to download.

Install the Tidyverse

Open up RStudio and run these lines of code in your console to make sure you have the tidyverse installed and attached to your current session. If you aren’t sure how to do that, stop here and wait until the first day of class and I’ll talk about it.

install.packages("tidyverse", dependencies = TRUE)
library(tidyverse)

Install gapminder

You’ll use this package a lot in the koans.

install.packages("gapminder")
library(gapminder)

Install a few Packages we’ll use for Plots

install.packages("gganimate", dependencies = TRUE)
install.packages("hexbin")
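If you’re not sure whether the installs so far worked, a quick check like the one below can help. It is just a sketch: the name needed is my own label for the packages mentioned up to this point.

```r
# Packages this chapter has asked you to install so far
needed <- c("tidyverse", "gapminder", "gganimate", "hexbin")

# TRUE means the package was found in your library;
# FALSE means you should try installing it again
installed <- needed %in% rownames(installed.packages())
setNames(installed, needed)
```

Any package that shows FALSE can be reinstalled with the matching install.packages() line above.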

Install qelp

qelp (quick help) is an alternative set of beginner friendly help docs I created (with contributions from previous EC421 students) for commonly used functions in R and the tidyverse. Once you have the package installed, you can access the help docs from inside RStudio.

install.packages("Rcpp", dependencies = TRUE)
install.packages("devtools", dependencies = TRUE)
library(devtools)
install_github("cobriant/qelp")

Now run:

?qelp::install.packages

If everything went right, the help docs I wrote on the function install.packages should pop up in the lower right-hand pane. Whenever you want to read the qelp docs on a function, type ?, then qelp, then two colons (::), and then the name of the function you’re wondering about. Together, that says “I want the help docs on this function from the package qelp”.

Install the Tidyverse Koans

Visit the koans on github.

Click on the green button that says Code and then hit Download ZIP.

Find the file (probably in your downloads folder). On Macs, opening the file will unzip it. On Windows, you’ll right-click and hit “extract”. Then navigate to the new folder named tidyverse_koans-main and double click on the R project tidyversekoans.Rproj. RStudio should open. If it doesn’t, open RStudio and go to File > Open Project and then find tidyversekoans.Rproj.

In RStudio, go to the lower right-hand panel and click the folder R. This takes you to a list of 20 exercises (koans) you’ll complete as homework over the course of the quarter. The first 3 (K01_vector, K02_tibble, and K03_pipe) will be due before the second week of class.

Open the first koan: K01_vector.R. Before you start, modify 2 keybindings:

First, make it so that you can hit Cmd/Ctrl Shift K to compile a notebook:

Macs: Tools > Modify keyboard shortcuts > filter for Compile Notebook > Cmd Shift K

Windows: Tools > Modify keyboard shortcuts > filter for Compile Notebook > Ctrl Shift K

Second, make it so that you can hit Cmd/Ctrl Shift T to run the test for only the active koan instead of all the koans:

Macs: Tools > Modify keyboard shortcuts > filter for Run a test file > Cmd Shift T

Windows: Tools > Modify keyboard shortcuts > filter for Run a test file > Ctrl Shift T

Now hit Cmd Shift T on a Mac or Ctrl Shift T on Windows. You’ve just tested the first koan. You should see:

[ FAIL 0 | WARN 0 | SKIP 10 | PASS 0 ]

What does this mean? If there are errors in your R script, the test will not complete. Since it completed, you know there are no errors. Since FAIL is 0, you also haven’t failed any of the questions yet. But PASS is also 0, so you haven’t passed the questions either. Since they’re blank right now, the test will skip them. That’s why SKIP is 10.

The tests are meant to help you figure out whether you’re on the right track, but they’re not perfect: if you keep failing the tests but you think your answer is correct, don’t spend too much time worrying about it. The tests are sometimes a little fragile… They’re a work in progress!

Go ahead and start working on the koans and learning about the tidyverse! There’s no need to wait until they’re due to start the koans. I find that the students who end up becoming the strongest programmers spend a lot of time making sure their koans are well done.

When you’re finished with a koan, make sure to run the tests one last time (Ctrl/Cmd Shift T) and then publish an html version of the document (Ctrl/Cmd Shift K; if that doesn’t do anything, change the keybinding for File > Compile Report to be Ctrl/Cmd Shift K). You’ll upload the html version to Canvas for me to grade.

One last thing: whenever you want to work on the koans, make sure you open RStudio by opening the tidyverse_koans-main project, not just the individual koan file. If you open the koans in a session that’s not associated with the tidyverse_koans-main project, the tests will fail to run. You can always see which project your current session is associated with by looking at the upper right-hand corner of RStudio: if you’re in the tidyverse_koans-main project, you’ll see tidyverse_koans-main up there. That’s good. If you’re in no project at all, you’ll see Project: (None) up there. That’s not good, especially if you want the tests to run. If you see Project: (None), just click that text and you’ll be able to switch over to the tidyverse_koans-main project.