2.3 Murder Mystery

Run this code to get started:

library(tidyverse)
people <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/person.csv")
drivers_license <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/drivers_license.csv")
income <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/income.csv")
crime_scene_report <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/crime_scene_report.csv")
facebook_event_checkin <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/facebook_event_checkin.csv")
get_fit_now_check_in <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/get_fit_now_checkin.csv")
get_fit_now_member <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/get_fit_now_member.csv")
interview <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/interview.csv")
print_all <- function(df, row, col) cat(str_wrap(df[row, col], width = 80))

🕵️ A Crime Has Been Committed – Can You Help Solve It?

A crime has taken place, and the detective needs your help! You’ve been given special access to police data to crack the case. Let’s dive in and see if you can uncover the truth!

To help you navigate the data, here’s a database schema – a diagram that shows how the tables are connected. It’s like a map to guide you through the investigation:

👉 Tip: Click this link to open the schema in a new tab for easy reference as you work through the mystery.

Let’s start by exploring the data. Highlight the name of each data set below and hit Ctrl+Enter (Windows) or Cmd+Return (Mac) to see what’s inside.

people

income

crime_scene_report

drivers_license

Function toolbox:

  1. pipe %>%: takes the data from the left and passes it into the function on the right. Think of it as saying “and then”.
  2. filter(): Use this to find rows that match specific conditions. For example:
  • income %>% filter(annual_income > 100000) finds people earning over $100K.
  • people %>% filter(name == "Christoper Peteuil") finds the person with that exact name.
  1. str_detect(): Use this inside filter() for finding partial matches on character strings. For example: people %>% filter(str_detect(name, "Amy")) finds everyone named Amy.
  2. arrange(): Sorts data from lowest to highest. For example:
  • income %>% arrange(annual_income) sorts people by income (lowest first).
  • Add desc() to reverse the order: income %>% arrange(desc(annual_income)) sorts from highest to lowest.
  1. Chaining Steps: Combine multiple steps using pipes. For example: crime_scene_report %>% filter(city == "Duluth") %>% arrange(desc(date)) This means: “Find crimes in Duluth, and then sort them by date (most recent first).”
# Review Examples

# filter: find observations that fit any logical condition
income %>% 
  filter(annual_income > 100000)

people %>% 
  filter(name == "Christoper Peteuil")

# filter + str_detect: for partial matches
people %>% 
  filter(str_detect(name, "Amy"))

# arrange: sorts the data set from lowest to highest by the variable of your choice
income %>% 
  arrange(annual_income)

# arrange + desc: sort highest to lowest
income %>% 
  arrange(desc(annual_income))

# chain together multiple steps
crime_scene_report %>% 
  filter(city == "Duluth") %>% 
  arrange(desc(date))

👉 Tip: Click this link to open the examples in a new tab for easy reference.

🔍 Let’s Solve the Mystery!

QUESTION 1: Start with the Crime Scene Report

The detective tells you the crime was a murder that happened on January 15th, 2018, in “dplyr City”. I’ve started a query for you, but it’s not quite right. Can you fix it to find the correct report?

crime_scene_report %>%
  filter(date == "2025-01-30", city == "Fayetteville")
# A tibble: 0 × 4
# ℹ 4 variables: date <date>, type <chr>, description <chr>, city <chr>

Note: are you annoyed by not being able to easiy read these long character strings because it truncates them? I made a function for you called print_all() to help. It takes a tibble as the first argument, the (integer) row you want to print as the second argument, and the (integer) column you want to print as the third argument. For example, this will print the long string in row 10, column 3 of crime_scene_report:

crime_scene_report %>%
  print_all(10, 3)
The Gjallarhorn shoulder-mounted rocket system was forged from the armor of
Guardians who fell at the Twilight Gap. Gifted to the survivors of that terrible
battle, the Gjallarhorn is seen as a symbol of honor and survival.

QUESTION 2: Find the First Witness

Once you’ve read the crime scene report, you’ll have clues to identify the witnesses. Write a query to find the first witness.

# people %>%
#   filter() %>%
#   arrange()

QUESTION 3: Find the Second Witness

Now, write a query to find the second witness.

# people %>%
#   filter()

QUESTION 4: Review the Witness Interviews

Next, let’s see what the witnesses had to say. Write two queries to pull the interview transcripts for both witnesses.

# interview %>%
#   filter()

# interview %>%
#   filter()

QUESTION 5: Solve the Mystery!

You’ve got all the tools and clues you need. Now it’s time to crack the case! Write your final queries to solve the mystery.

🎯 Tip: Use left_join() to Connect the Dots

To create a clean and elegant solution here, try using left_join(). This function lets you combine tables based on shared keys. For example, to find driver’s license and income information for everyone named Amy, you could do this:

people %>%
  filter(str_detect(name, "Amy")) %>%
  left_join(drivers_license, join_by(license_id == id)) %>%
  left_join(income, join_by(ssn))
# A tibble: 2 × 15
     id name   license_id address_number address_street_name    ssn   age height
  <dbl> <chr>       <dbl>          <dbl> <chr>                <dbl> <dbl>  <dbl>
1 38995 Amy C…     537719           3664 Bad Munstereifel B… 1.33e8    54     60
2 57354 Amy S…     238318            208 Bush Lake Dr        3.98e8    45     57
# ℹ 7 more variables: eye_color <chr>, hair_color <chr>, gender <chr>,
#   plate_number <chr>, car_make <chr>, car_model <chr>, annual_income <dbl>

Check your final answer by filling out this survey (DO NOT SKIP!!!): https://forms.gle/K7rZ5VykcA7Bb1Qf7