install.packages("tidyverse", dependencies = TRUE)
library(tidyverse)
The Tidy Econometrics Workbook
An Undergraduate’s Guide to Causal Inference, Instrumental Variables, and Time Series with R
Course Objectives
This workbook is a companion text for my EC421 Econometrics course. EC421 is the second and final course in the UO’s undergraduate Econometrics sequence. By the end of this course, you will:
- Learn to program in R in order to solve all kinds of data analytics and simulation problems
- Solidify your understanding of probability, statistics, and the versatility of the method of least squares
- Understand how the crucial assumption of exogeneity allows OLS estimators to have a causal interpretation
- Learn about the difference between estimators that are consistent and estimators that are unbiased
- Understand what happens when some of our standard OLS assumptions are violated (like homoskedasticity and no autocorrelation)
- Learn about major themes in time series modeling like random walks and spurious regressions
- Understand some strategies to be able to do causal inference when exogeneity cannot be assumed (instrumental variables and differences-in-differences).
What Students Are Saying About This Course
“Excellent layout by reversing the way we did class, lectures from videos with mini quizzes outside of class and actual work in class where we can get help from the teacher. Perfect lay out for how Econometrics should be taught.”
“The organization of this course is “flipped” from traditional classes. This allows students to learn the material via video lectures prior to class and then put the topics students learned through lectures to use through in-class activities. This was extremely beneficial to my own learning and I feel the class structure allows every potential question to be answered.”
“I really enjoyed being able to work through the koans/watch the lectures on my own and then have those same concepts mirrored in the classworks was really helpful because I feel that it helped me to interact with the material, and meant that I benefited from the support of the instructor and my fellow classmates while working through complex subjects. I feel as though I have picked up concepts much quicker in this course, since lectures were directly followed by those active learning practices. I also really appreciated the lecture review sheet assignments- it was something that I likely would not have done on my own, but that helped me immensely as I was studying for the final and working to comprehend lecture material.”
“Prof O’Briant made sure that almost every time we were learning something, we were engaging with it actively. When learning to code in R, she gives us interactive assignments that teach us and then make us apply that new knowledge in a problem - the fastest and best way I’ve ever learned to code. When watching lectures, she rewards us for taking notes and distilling them into the most important info. And finally, every class period, we collaborate with a small group of other students on problem sets that are very challenging, but with enough collaboration with students and support from the prof, possible to do, and very growthful/educational.”
“This is my first time to have a flipped class and I really like it. I interact with the teachers way more frequently than in other classes. I also like the fact we can finish most of the group work in class time so we don’t have to schedule extra meeting time outside class, which can be quite stressful. I also like the interactions with the teachers. They are very encouraging and helpful. Conceptual and technical problems are all explained very clearly.”
“I think this class helps me solidify my understanding of OLS and R. Group work during class time is a great arrangement because we don’t have to schedule for meetings outside class which can be stressful. I don’t particularly like group work but I agree that it helps solidify our knowledge when we ask and explain to each others.”
“I really liked that there was so much work (i.e. classworks) because it was a really good opportunity to engage with the material and really understand it before the midterm.”
“I want more koans, they were great!”
Programming in R
Taking your time
We learn to program, same as we learn anything, by forming a mental model of how the framework works, and then trying to use that framework to solve our own problems. Sadly, there are lots of ways you can go wrong here: wrong syntax is pretty obvious because you’ll get errors, but wrong mental models can stick around for a lifetime and they make programming a real uphill battle.
I’ve spent a lot of time over the past couple of years (and I’ve taken a lot of time from the smartest programmers and R people I know) to get this part of the course right. Hopefully you’ll find that solving problems using the tidyverse is simple, easy, and natural. But if you skim through the koans and do them as fast as possible, things will not work out! Remember, the koans aren’t busywork to be done for speed. They will help you build your mental model and learn the syntax, but only if you take your time, read them carefully, and reflect.
Declarative vs Imperative Programming
One wrong way of programming in the tidyverse is to mix programming paradigms (declarative and imperative). This is by far the most commonplace bad behavior that you’ll see in people’s R code. The tidyverse is declarative. But you’ll see a lot of R code online that’s imperative, which is written in base R. Mixing the two paradigms makes for confusing, sloppy, and complicated code.
Declarative vs imperative: what’s the difference? Imperative programming is relatively low-level: you think in terms of manipulating values using for loops and if statements. Declarative programming is programming at a higher abstraction level: you make use of handy functions (AKA abstractions) to manipulate large swaths of data at one time instead of going value-by-value.
A good metaphor for the difference between imperative and declarative programming is this: suppose I’m trying to help you drive from your house to school. Imperative programming is when I send you turn-by-turn directions, and declarative programming is when I tell you to just put “University of Oregon” into your GPS. With declarative programming, I can declare what I want you to do without telling you exactly how I want you to do it like with imperative programming. Telling you to put “University of Oregon” into your GPS has advantages over giving you turn-by-turn directions: the GPS may have information about traffic and road closures that I’m not aware of. And the declarative approach is much easier for me: I could help the whole class get from their houses to the university by telling everyone to put “University of Oregon” into their GPS’s, while sending each person their own set of turn-by-turn instructions would be a lot more work.
Likewise, when you use the tidyverse’s abstractions like filter(), mutate(), map(), reduce(), and all of ggplot2’s great plotting functions, you’re taking advantage of the fact that the engineers who built those functions know tricks in R that you may not be aware of to make things run smoothly. And when you’re programming declaratively, you can continue thinking about your problem at a high level instead of getting weighed down by nitty-gritty details. When it comes to data analysis, declarative programming has a lot of huge benefits.
But under the hood, all these great tidyverse functions are just a few for loops and if statements. Imperative programming certainly has its time and place, and that time and place is when your problems include implementing an algorithm by hand. If you’re interested, I highly recommend Project Euler for teaching yourself imperative programming. But imperative programming is not something you’ll need in this workbook. You may have mixed declarative with imperative programming in previous classes, but we’ll stay strictly in the declarative territory for data analytics in this class.
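To make the contrast concrete, here is a minimal sketch (my own toy example, with a made-up vector x) that takes the square root of every element twice: once imperatively with a for loop, and once declaratively with map_dbl() from purrr, the variant of map() that returns a numeric vector.

```r
library(purrr)

# A made-up vector, just for illustration
x <- c(1, 4, 9, 16)

# Imperative: set up an empty container, then fill it one value at a time
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# Declarative: say what you want done to every element of x
roots_map <- map_dbl(x, sqrt)

# Both approaches produce the same numeric vector
identical(roots_loop, roots_map)
```

The loop version makes you manage the index and the container yourself. The map_dbl() version only states the goal, which is the style we stick to in this workbook.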
Things to Avoid when Programming Declaratively in the Tidyverse
Use these only when you’re programming imperatively in base R:
- for and while loops (we’ll use map() instead)
- if statements (we’ll use the vectorized function from dplyr, if_else())
- matrix() or array() (our 2d data structure of choice is the tibble())
- $ syntax for extracting a column vector from a tibble. We avoid this because our workflow goes like this: vectors go into tibbles and we do data analysis on tibbles. Going from tibbles to vectors (what $ lets you do) is the reverse of what we need, so we avoid it in this class. It just causes unnecessary headaches!
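As a rough illustration of the list above, here is a small sketch using a hypothetical tibble called scores (the names and numbers are invented for this example). It labels each row with the vectorized if_else() instead of an if statement, and it computes a column mean with summarize() so the result stays in a tibble and $ is never needed.

```r
library(dplyr)

# Hypothetical data: vectors go into a tibble
scores <- tibble(
  student = c("A", "B", "C"),
  score   = c(55, 72, 90)
)

# if_else() checks the condition for every row at once, so no loop or if statement
scores %>%
  mutate(result = if_else(score >= 60, "pass", "fail"))

# summarize() returns a one-row tibble, so we never pull the column out with $
scores %>%
  summarize(mean_score = mean(score))
```

Notice that both pipelines start from a tibble and end with a tibble, which matches the workflow described above.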
One more thing: I often see students using assignment <- wayyyy too much. If you’re creating a variable for something, and you only use that thing one other time, and naming that thing doesn’t help the readability of your code, why are you creating that variable? If you let your default be “no assignment” instead of “always assignment”, then your code will be much prettier and your global environment will stay clean, which prevents lots of confusion.
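Here is a sketch of what that looks like in practice, using a hypothetical tibble called heights (made-up data). The first version creates three one-off variables; the second runs the same steps as a single pipeline and leaves nothing extra behind.

```r
library(dplyr)

# Hypothetical data for illustration
heights <- tibble(
  name = c("Ann", "Bo", "Cy", "Di"),
  cm   = c(150, 165, 180, 172)
)

# "Always assignment": three variables, each used exactly once
tall <- filter(heights, cm > 160)
tall_m <- mutate(tall, m = cm / 100)
tall_sorted <- arrange(tall_m, desc(m))

# "No assignment": the same steps piped together
heights %>%
  filter(cm > 160) %>%
  mutate(m = cm / 100) %>%
  arrange(desc(m))
```

Both versions give the same tibble, but after the first one your global environment holds tall, tall_m, and tall_sorted, and it is easy to grab the wrong one later.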
Setting up your workspace
Grab a Computer
First things first: decide which computer you’d like to do your programming assignments on. It can be a Mac, Windows, or Linux machine: all are equally good. If you have a laptop, use that, so that you can bring it to class. I do absolutely everything on my little MacBook Air. Please let me know ASAP if you don’t have a computer to program on (Chromebooks and iPads won’t work, but there are workarounds I can discuss with you).
Download and Install R
Do this even if you installed R on your computer for a previous class. Following these instructions again will just update your version, which is a good thing.
Go here: https://cran.r-project.org/ and follow the instructions to download R for your Linux, Windows, or Mac. You should download the latest release.
Mac users: make sure you know whether you have an Apple silicon Mac or an older Intel-based Mac, and make sure that you download the correct version of R. If you’re not sure, hit the Apple symbol in the upper left of your screen, go to About This Mac, and it should have the information there. For instance, mine says “MacBook Air: M1 2020, Chip: Apple M1”. I have an M1 MacBook, so I’d download the version of R for Apple silicon Macs. If yours says anything about Intel, download the Intel version of R.
Mac users (again): install XQuartz: https://www.xquartz.org/.
An alternative: R and RStudio are both already installed on all academic workstations at UO. The downside is the limited hours, especially on weekends.
Having issues with this step? Try doing your downloads at home instead of on campus. The campus wifi can sometimes be too slow, corrupting the files you’re trying to download.
Install the Tidyverse
Open up RStudio and run these lines of code in your console to make sure you have the tidyverse installed and attached to your current session. If you aren’t sure how to do that, stop here and wait until the first day of class and I’ll talk about it.
install.packages("tidyverse", dependencies = TRUE)
library(tidyverse)
Install gapminder
You’ll use this package a lot in the koans.
install.packages("gapminder")
library(gapminder)
Install a few Packages we’ll use for Plots
install.packages("gganimate", dependencies = TRUE)
install.packages("hexbin")
Install qelp
qelp (quick help) is an alternative set of beginner-friendly help docs I created (with contributions from previous EC421 students) for commonly used functions in R and the tidyverse. Once you have the package installed, you can access the help docs from inside RStudio.
install.packages("Rcpp", dependencies = TRUE)
install.packages("devtools", dependencies = TRUE)
library(devtools)
install_github("cobriant/qelp")
Now run:
?qelp::install.packages
If everything went right, the help docs I wrote on the function install.packages should pop up in the lower right hand pane. Whenever you want to read the qelp docs on a function, you type ?, qelp, two colons :: which say “I want the help docs on this function which is from the package qelp”, and then the name of the function you’re wondering about.
Install the Tidyverse Koans
Visit the koans on github.
Click on the green button that says Code and then hit Download ZIP.
Find the file (probably in your downloads folder). On Macs, opening the file will unzip it. On Windows, you’ll right-click and hit “extract”. Then navigate to the new folder named tidyverse_koans-main and double click on the R project tidyversekoans.Rproj. RStudio should open. If it doesn’t, open RStudio and go to File > Open Project and then find tidyversekoans.Rproj.
In RStudio, go to the lower righthand panel and hit the folder R. This takes you to a list of 20 exercises (koans) you’ll complete as homework over the course of the quarter. The first 3 (K01_vector, K02_tibble, and K03_pipe) will be due before the second week of class.
Open the first koan: K01_vector.R. Before you start, modify 2 keybindings:
First, make it so that you can hit Cmd/Ctrl Shift K to compile a notebook:
Macs: Tools > Modify keyboard shortcuts > filter for Compile Notebook > Cmd Shift K
Windows: Tools > Modify keyboard shortcuts > filter for Compile Notebook > Ctrl Shift K
Second, make it so that you can hit Cmd/Ctrl Shift T to run the test for only the active koan instead of all the koans:
Macs: Tools > Modify keyboard shortcuts > filter for Run a test file > Cmd Shift T
Windows: Tools > Modify keyboard shortcuts > filter for Run a test file > Ctrl Shift T
Now hit Cmd/Ctrl Shift T (Cmd Shift T on a Mac; Ctrl Shift T on Windows). You’ve just tested the first koan. You should see:
[ FAIL 0 | WARN 0 | SKIP 10 | PASS 0 ]
What does this mean? If there are errors in your R script, the test will not complete. Since it completed, you know there are no errors. Since FAIL is 0, you also haven’t failed any of the questions yet. But PASS is also 0, so you haven’t passed the questions either. Since they’re blank right now, the test will skip them. That’s why SKIP is 10.
The tests are meant to help you figure out whether you’re on the right track, but they’re not perfect: if you keep failing the tests but you think your answer is correct, don’t spend too much time worrying about it. The tests are sometimes a little fragile… They’re a work in progress!
Go ahead and start working on the koans and learning about the tidyverse! There’s no need to wait until they’re due to start the koans. I find that the students who end up becoming the strongest programmers spend a lot of time making sure their koans are well done.
When you’re finished with a koan, make sure to run the tests one last time (Ctrl/Cmd Shift T) and then publish an HTML version of the document (Ctrl/Cmd Shift K, and if that doesn’t do anything, change the keybinding for File > Compile Report to be Ctrl/Cmd Shift K). You’ll upload the HTML version to Canvas for me to grade.
One last thing: whenever you want to work on the koans, make sure you open RStudio by opening the tidyverse_koans-main project, not just the individual koan file. If you open the koans in a session that’s not associated with the tidyverse_koans-main project, the tests will fail to run. You can always see which project your current session is being associated with by looking at the upper right hand corner of RStudio: if you’re in the tidyverse_koans-main project, you’ll see tidyverse_koans-main up there. That’s good. If you’re in no project at all, you’ll see Project: (None) up there. That’s not good, especially if you want the tests to run. If you see Project: (None), just click that text and you’ll be able to switch over to the tidyverse_koans-main project.