1.2 dplyr

In this assignment, you’ll learn to use the tidyverse package dplyr (pronounced “d-plier”, where ‘d’ stands for data and ‘plier’ refers to tools like pliers and wrenches). The name suggests using dplyr functions the way you would use tools from a toolbox to manipulate your data. Personally, I’m not a big fan of the name: I think it could have been called something like “SQL for R”, because it brings the core operations of SQL (Structured Query Language) into R.

What is SQL? SQL (pronounced either “sequel” or “S-Q-L”) was developed in the 1970s and has long been the standard language for querying relational databases, allowing users to answer virtually any question about their data. With SQL, you can filter, sort, summarize, and combine data in powerful ways.

Why should dplyr have been called SQL for R? First, the name would acknowledge that dplyr isn’t inventing something new; it’s bringing SQL’s time-tested capabilities to R. When you learn dplyr, you’re actually learning SQL concepts that transfer directly to database work. Second, the name would make it clear that whenever you have a question about your data, dplyr is the tool to reach for, just as SQL is the go-to language for database queries. You’re not so much manipulating your data as writing queries against it.
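To make the comparison concrete, here is a minimal sketch that writes one SQL query (shown in the comments) as a dplyr pipeline, one verb per clause. The `students` table, its columns, and its values are hypothetical, made up for illustration; the sketch assumes dplyr is installed and uses the `%>%` pipe, which dplyr re-exports.

```r
library(dplyr)

# Hypothetical study-habits data, made up for illustration only
students <- tibble(
  name  = c("Ana", "Ben", "Cal", "Dee"),
  hours = c(10, 3, 7, 12),
  grade = c(88, 65, 79, 94)
)

# SQL:   SELECT name, grade
#        FROM students
#        WHERE hours > 5
#        ORDER BY grade DESC;
# dplyr: the same query, one verb per clause
result <- students %>%
  filter(hours > 5) %>%      # WHERE hours > 5
  select(name, grade) %>%    # SELECT name, grade
  arrange(desc(grade))       # ORDER BY grade DESC

print(result)
```

One difference worth noticing: a dplyr pipeline runs top to bottom, so `filter()` comes before `select()` here because it needs the `hours` column that `select()` drops. In SQL, the clauses are always written in the same fixed order (SELECT, FROM, WHERE, ORDER BY).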

Let’s look at some examples. If you have data about students’ study habits and grades, dplyr can answer questions like:


In this assignment, you’ll learn how to answer these questions and more using dplyr. What makes dplyr (and SQL) remarkable is that just seven core functions let you answer nearly any question about your data: filter(), select(), mutate(), summarize(), group_by(), arrange(), and slice().

Select, filter, mutate
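Here is a minimal sketch of these three verbs on a small, hypothetical `students` table (the names and numbers are made up for illustration, and the `passed` column is invented for this example). It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical study-habits data, made up for illustration only
students <- tibble(
  name  = c("Ana", "Ben", "Cal", "Dee"),
  hours = c(10, 3, 7, 12),
  grade = c(88, 65, 79, 94)
)

# mutate(): add a new column computed from existing ones
# (like a computed expression in a SQL SELECT list)
with_pass <- students %>%
  mutate(passed = grade >= 70)

# filter(): keep only the rows where a condition is TRUE (SQL WHERE)
passed_only <- with_pass %>%
  filter(passed)

# select(): keep only the columns you name (SQL SELECT)
names_hours <- passed_only %>%
  select(name, hours)

print(names_hours)
```

`filter(passed)` works because `passed` is already a logical column; with a numeric column you would write a comparison such as `filter(grade >= 70)`.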

You’re now ready to do Koan 4, where you’ll practice using filter(), select(), and mutate().

Summarize and group by
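A minimal sketch of the difference group_by() makes, using a hypothetical `students` table with a made-up `section` column (all values invented for illustration; assumes dplyr is installed):

```r
library(dplyr)

# Hypothetical data for two class sections, made up for illustration only
students <- tibble(
  name    = c("Ana", "Ben", "Cal", "Dee", "Eli"),
  section = c("A", "A", "B", "B", "B"),
  hours   = c(10, 3, 7, 12, 2),
  grade   = c(88, 65, 79, 94, 58)
)

# summarize() on its own collapses the whole table to a single row
overall <- students %>%
  summarize(mean_grade = mean(grade), n = n())

# group_by() + summarize() gives one row per group (SQL GROUP BY)
by_section <- students %>%
  group_by(section) %>%
  summarize(mean_grade = mean(grade), mean_hours = mean(hours), n = n())

print(overall)
print(by_section)
```

Without group_by(), summarize() returns one row for the whole table; with it, you get one row per section, much like `GROUP BY section` in SQL. `n()` counts the rows in each group.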

Go ahead and complete Koan 5 on summarize() and group_by().

Arrange and slice
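A minimal sketch of sorting rows and then picking rows by position, again on a hypothetical `students` table (values made up for illustration; assumes dplyr is installed):

```r
library(dplyr)

# Hypothetical study-habits data, made up for illustration only
students <- tibble(
  name  = c("Ana", "Ben", "Cal", "Dee"),
  hours = c(10, 3, 7, 12),
  grade = c(88, 65, 79, 94)
)

# arrange(): sort rows (SQL ORDER BY); ascending by default,
# desc() sorts from largest to smallest
by_hours <- students %>%
  arrange(desc(hours))

# slice(): pick rows by position; here the first two rows,
# similar to LIMIT 2 in many SQL dialects
top_two <- by_hours %>%
  slice(1:2)

print(top_two)
```

Because slice() works by position, the order set by arrange() determines which rows come out on top.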

Finally, finish Koan 6 on arrange() and slice().