library(tidyverse)
<- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/person.csv")
people <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/drivers_license.csv")
drivers_license <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/income.csv")
income <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/crime_scene_report.csv")
crime_scene_report <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/facebook_event_checkin.csv")
facebook_event_checkin <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/get_fit_now_checkin.csv")
get_fit_now_check_in <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/get_fit_now_member.csv")
get_fit_now_member <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/interview.csv")
interview <- function(df, row, col) cat(str_wrap(df[row, col], width = 80)) print_all
2.3 Murder Mystery
Run this code to get started:
🕵️ A Crime Has Been Committed – Can You Help Solve It?
A crime has taken place, and the detective needs your help! You’ve been given special access to police data to crack the case. Let’s dive in and see if you can uncover the truth!
To help you navigate the data, here’s a database schema – a diagram that shows how the tables are connected. It’s like a map to guide you through the investigation:
👉 Tip: Click this link to open the schema in a new tab for easy reference as you work through the mystery.
Let’s start by exploring the data. Highlight the name of each data set below and hit Ctrl+Enter (Windows) or Cmd+Return (Mac) to see what’s inside.
people
income
crime_scene_report
drivers_license
Function toolbox:
- pipe
%>%
: takes the data from the left and passes it into the function on the right. Think of it as saying “and then”. filter()
: Use this to find rows that match specific conditions. For example:
income %>% filter(annual_income > 100000)
finds people earning over $100K.people %>% filter(name == "Christoper Peteuil")
finds the person with that exact name.
str_detect()
: Use this insidefilter()
for finding partial matches on character strings. For example:people %>% filter(str_detect(name, "Amy"))
finds everyone named Amy.arrange()
: Sorts data from lowest to highest. For example:
income %>% arrange(annual_income)
sorts people by income (lowest first).- Add
desc()
to reverse the order:income %>% arrange(desc(annual_income))
sorts from highest to lowest.
- Chaining Steps: Combine multiple steps using pipes. For example:
crime_scene_report %>% filter(city == "Duluth") %>% arrange(desc(date))
This means: “Find crimes in Duluth, and then sort them by date (most recent first).”
# Review Examples
# filter: find observations that fit any logical condition
%>%
income filter(annual_income > 100000)
%>%
people filter(name == "Christoper Peteuil")
# filter + str_detect: for partial matches
%>%
people filter(str_detect(name, "Amy"))
# arrange: sorts the data set from lowest to highest by the variable of your choice
%>%
income arrange(annual_income)
# arrange + desc: sort highest to lowest
%>%
income arrange(desc(annual_income))
# chain together multiple steps
%>%
crime_scene_report filter(city == "Duluth") %>%
arrange(desc(date))
👉 Tip: Click this link to open the examples in a new tab for easy reference.
🔍 Let’s Solve the Mystery!
QUESTION 1: Start with the Crime Scene Report
The detective tells you the crime was a murder that happened on January 15th, 2018, in “dplyr City”. I’ve started a query for you, but it’s not quite right. Can you fix it to find the correct report?
%>%
crime_scene_report filter(date == "2025-01-30", city == "Fayetteville")
# A tibble: 0 × 4
# ℹ 4 variables: date <date>, type <chr>, description <chr>, city <chr>
Note: are you annoyed by not being able to easiy read these long character strings because it truncates them? I made a function for you called print_all()
to help. It takes a tibble as the first argument, the (integer) row you want to print as the second argument, and the (integer) column you want to print as the third argument. For example, this will print the long string in row 10, column 3 of crime_scene_report
:
%>%
crime_scene_report print_all(10, 3)
The Gjallarhorn shoulder-mounted rocket system was forged from the armor of
Guardians who fell at the Twilight Gap. Gifted to the survivors of that terrible
battle, the Gjallarhorn is seen as a symbol of honor and survival.
QUESTION 2: Find the First Witness
Once you’ve read the crime scene report, you’ll have clues to identify the witnesses. Write a query to find the first witness.
# people %>%
# filter() %>%
# arrange()
QUESTION 3: Find the Second Witness
Now, write a query to find the second witness.
# people %>%
# filter()
QUESTION 4: Review the Witness Interviews
Next, let’s see what the witnesses had to say. Write two queries to pull the interview transcripts for both witnesses.
# interview %>%
# filter()
# interview %>%
# filter()
QUESTION 5: Solve the Mystery!
You’ve got all the tools and clues you need. Now it’s time to crack the case! Write your final queries to solve the mystery.
🎯 Tip: Use left_join() to Connect the Dots
To create a clean and elegant solution here, try using left_join()
. This function lets you combine tables based on shared keys. For example, to find driver’s license and income information for everyone named Amy, you could do this:
%>%
people filter(str_detect(name, "Amy")) %>%
left_join(drivers_license, join_by(license_id == id)) %>%
left_join(income, join_by(ssn))
# A tibble: 2 × 15
id name license_id address_number address_street_name ssn age height
<dbl> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 38995 Amy C… 537719 3664 Bad Munstereifel B… 1.33e8 54 60
2 57354 Amy S… 238318 208 Bush Lake Dr 3.98e8 45 57
# ℹ 7 more variables: eye_color <chr>, hair_color <chr>, gender <chr>,
# plate_number <chr>, car_make <chr>, car_model <chr>, annual_income <dbl>
Check your final answer by filling out this survey (DO NOT SKIP!!!): https://forms.gle/K7rZ5VykcA7Bb1Qf7