library(tidyverse)
<- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/person.csv")
people <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/drivers_license.csv")
drivers_license <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/income.csv")
income <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/crime_scene_report.csv")
crime_scene_report <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/facebook_event_checkin.csv")
facebook_event_checkin <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/get_fit_now_checkin.csv")
get_fit_now_checkin <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/get_fit_now_member.csv")
get_fit_now_member <- read_csv("https://raw.githubusercontent.com/cobriant/dplyrmurdermystery/master/data/interview.csv")
interview <- function(df, row, col) cat(str_wrap(df[row, col], width = 80)) print_all
1.3 Murder Mystery
In this assignment, you’ll practice a few more dplyr
functions and then use them to solve a murder mystery.
Joins
Oftentimes in data analytics, we have multiple data sets from multiple sources and we need to join them together in certain ways to get the insights we’re after. This section introduces the concept of joins by studying bind_rows()
, bind_cols()
, and left_join()
, but there are others included in dplyr: inner_join()
, right_join()
, full_join()
, and more. Studying left_join()
gives you all the insight you need to be able to use any of these joins successfully if you’re interested in doing so later on.
bind_rows
and bind_cols
Sometimes you have two data sets and you just need to stack them to combine them into a single data set. In that case, you can use bind_rows
or bind_cols
. bind_rows
combines two data sets by stacking one on top of the other (think of each data set as being a row, and you bind the two rows). bind_cols
combines two data sets by putting them side by side (think of each data set as being a column, and you bind the two columns).
left_join
left_join
does something a little different: it takes two tibbles and makes a wider tibble by cross referencing the data and filling out the variables from the second data set with the variables from the first data set according to a key.
Go ahead and complete Koan 7 on left_join
before going any further.
Murder Mystery
In this part of the assignment, you’ll practice using dplyr
functions (especially left_join()
and str_detect()
) to solve a murder mystery. At the end of this document, you’ll find a google survey that you can use to check your final answer. Make sure you don’t skip it!
Double check that you have an internet connection, then run this to get started:
A crime has taken place and the detective needs your help! The detective has given you special access to police data to help them crack the case. All 8 datasets should be in your global environment now. Take a look at each of them.
This diagram will be useful: it shows you how variables in the tables can be linked to connect information from different tables. It’s called a database schema:
The schema is all accurate EXCEPT I had to change the table person
to be called people
, because person
is a reserved name in R.
First, we’ll get more comfortable with the data: How many people are in this database?
%>%
people nrow()
[1] 10011
- How many crime reports are there?
# __ %>%
# nrow()
Did any crimes happen on Christmas of 2017?
%>%
crime_scene_report filter(date == "2017-12-25")
# A tibble: 2 × 4
date type
<date> <chr>
1 2017-12-25 bribery
2 2017-12-25 bribery
description
<chr>
1 "a-piece all round.\n"
2 "Alice opened the door and found that it led into a small passage, not\n"
city
<chr>
1 St. Paul
2 Chattanooga
- How long do the crime scene reports go back? (Hint: use
arrange()
)
# __
- The detective asks you to start by reading the crime scene report. They tell you it was a murder that was filed on January 15th, 2018 in a place called “dplyr City”.
# __
Note: are you annoyed by not being able to easiy read these long character strings because it truncates them? I made a function for you called print_all()
to help. It takes a tibble as the first argument, the (integer) row you want to print as the second argument, and the (integer) column you want to print as the third argument. For example, this will print the long string in row 10, column 3 of crime_scene_report
:
%>%
crime_scene_report print_all(10, 3)
The Gjallarhorn shoulder-mounted rocket system was forged from the armor of
Guardians who fell at the Twilight Gap. Gifted to the survivors of that terrible
battle, the Gjallarhorn is seen as a symbol of honor and survival.
- After reading the correct crime scene report in question 3, you should have an idea about how to find the witnesses. Write a query that identifies the first witness.
# people %>%
# __
- Write a query that identifies the second witness.
# people %>%
# __
- Write two queries that show the interview transcripts for our two subjects.
# interview %>%
# __
# interview %>%
# __
- Go get ’em! Now you know enough to solve the mystery! Show the queries you wrote to solve it. Make sure to check your final answer by filling out this survey: (DON’T SKIP THE SURVEY EVEN IF YOU’RE SURE ABOUT YOUR ANSWER!!): https://forms.gle/K7rZ5VykcA7Bb1Qf7
# __