1.2 dplyr
In this assignment, you’ll learn to use the tidyverse package dplyr (pronounced “d-plier” - where ‘d’ stands for data, and ‘plier’ refers to tools like pliers and wrenches). The name suggests using dplyr functions the way you would use tools from a toolbox to manipulate your data. Personally, I’m not a big fan of the name: I think it could have been called something like “SQL for R”, because it brings the core operations of SQL (Structured Query Language) into R.
What is SQL? SQL (pronounced either “sequel” or “S-Q-L”) was developed in the 1970s and has long been the standard language for querying relational databases. With SQL, you can filter, sort, summarize, and combine data in powerful ways, which is enough to answer virtually any question about your data.
Why should dplyr have been called SQL for R? First, it would acknowledge that dplyr isn’t inventing something new - it’s bringing SQL’s time-tested capabilities to R. When you learn dplyr, you’re actually learning SQL concepts that transfer directly to database work. Second, the name would make it clear that whenever you have a question about your data, dplyr is the tool you should reach for, just as SQL is the go-to language for database queries. You’re not so much manipulating your data as you are writing queries for it.
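To make the comparison concrete, here is a minimal sketch of one query written both ways. The students table, its column names, and its values are all invented for illustration; the SQL appears only as a comment so you can line it up against the dplyr version.

```r
library(dplyr)

# Hypothetical data: names, columns, and values are made up for this sketch.
students <- tibble(
  name        = c("Ana", "Ben", "Cho", "Dev"),
  study_hours = c(12, 5, 9, 3),
  final_grade = c(95, 78, 88, 64)
)

# The same question in SQL would look roughly like this:
#   SELECT name, final_grade
#   FROM students
#   WHERE final_grade >= 80
#   ORDER BY final_grade DESC;
honor_roll <- students %>%
  filter(final_grade >= 80) %>%     # plays the role of WHERE
  select(name, final_grade) %>%     # plays the role of SELECT
  arrange(desc(final_grade))        # plays the role of ORDER BY

honor_roll
```

Notice that the dplyr steps run in the order you read them, while SQL asks you to write SELECT first even though the filtering happens before it.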
Let’s look at some examples. If you have data about students’ study habits and grades, dplyr can answer questions like:
- Who had the highest final grade?
- How much, on average, did “A” students study?
- Out of all the students who got a “B” on the midterm, how many of them turned it around to earn an “A” in the course?
In this assignment, you’ll learn how to answer these questions and more using dplyr. What makes dplyr (and SQL) remarkable is that just seven core functions allow you to answer nearly any question about your data: select(), filter(), mutate(), summarize(), group_by(), arrange(), and slice().
Select, filter, mutate
You’re now ready to do Koan 4, where you’ll practice using filter(), select(), and mutate().
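If you’d like a preview before starting the koan, here is a small sketch of the three functions working together on the “B on the midterm, A in the course” question from earlier. The data, the column names, and the grade cutoffs (80-89 is a “B”, 90 and up is an “A”) are assumptions made up for this example, not part of the koan.

```r
library(dplyr)

# Hypothetical scores; the columns and cutoffs are assumptions for illustration.
students <- tibble(
  name          = c("Ana", "Ben", "Cho", "Dev"),
  midterm_score = c(85, 82, 91, 70),
  course_score  = c(93, 84, 95, 72)
)

comeback <- students %>%
  # mutate() adds a new column computed from existing ones
  mutate(improvement = course_score - midterm_score) %>%
  # filter() keeps only the rows where every condition is TRUE
  filter(midterm_score >= 80, midterm_score < 90, course_score >= 90) %>%
  # select() keeps only the columns you name
  select(name, improvement)

comeback        # the students who turned a "B" into an "A"
nrow(comeback)  # how many of them there are
```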
Summarize and group by
Go ahead and complete Koan 5 on summarize() and group_by().
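As a rough illustration of how these two fit together, the sketch below tackles the question of how much “A” students studied on average. The data and column names are invented; the idea is that group_by() splits the rows into groups, and summarize() then collapses each group into a single row.

```r
library(dplyr)

# Hypothetical data; letter_grade and study_hours are made-up columns.
students <- tibble(
  name         = c("Ana", "Ben", "Cho", "Dev"),
  letter_grade = c("A", "B", "A", "C"),
  study_hours  = c(12, 5, 9, 3)
)

avg_study <- students %>%
  group_by(letter_grade) %>%          # one group per letter grade
  summarize(
    avg_hours  = mean(study_hours),   # average study time within each group
    n_students = n()                  # number of students in each group
  )

avg_study
```

The result has one row per letter grade, so the answer for “A” students is the avg_hours value in the “A” row.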
Arrange and slice
Finally, finish Koan 6 on arrange() and slice().
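Here is one way these two could answer the “who had the highest final grade?” question, sketched on invented data. The approach is to sort the rows with arrange() and then pick rows by position with slice(); the column names are assumptions for this example only.

```r
library(dplyr)

# Hypothetical data; the names and grades are made up for illustration.
students <- tibble(
  name        = c("Ana", "Ben", "Cho", "Dev"),
  final_grade = c(95, 78, 88, 64)
)

top_student <- students %>%
  arrange(desc(final_grade)) %>%  # sort from highest to lowest grade
  slice(1)                        # keep only the first row after sorting

top_student
```

Changing slice(1) to slice(1:3) would keep the top three rows instead of just the first.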