In this assignment, you’ll practice visualizing different aspects of a data set using the tidyverse package ggplot2.
Recipes
First, load the students dataset, which describes 374 students: their sex, the average number of hours they studied for a class, their grade after the first semester, and their final grade:
# A tibble: 374 × 4
   sex    study_time    grade1 final_grade
   <chr>  <chr>          <dbl>       <dbl>
 1 female less than 2H    66.3        69.4
 2 female less than 2H    66.1        49.1
 3 female 5 - 10H         80.2        76.7
 4 female more than 10H   87.4        83.4
 5 male   2 - 5H          84.1        80.1
 6 female 2 - 5H          63.3        62.4
 7 female 2 - 5H          71.9        73.2
 8 female 2 - 5H          65.7        50.8
 9 male   2 - 5H          77.2        79.0
10 male   less than 2H    64.8        67.3
# ℹ 364 more rows
Note that sex and study_time are discrete/categorical variables, and grade1 and final_grade are continuous.
Bar plots and Histograms
Use a bar plot or a histogram when you want to visualize the distribution of a single variable. If the variable is categorical/discrete, use a bar plot. If the variable is continuous, use a histogram.
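As a minimal sketch, here is one bar plot and one histogram. Because the full students file is not shown here, the code builds a small stand-in tibble from the ten rows printed above; the names bar_plot and hist_plot, and the choice of bins = 5, are only illustrative.

```r
library(ggplot2)
library(tibble)

# Stand-in for the full dataset: only the ten rows printed above
students <- tibble(
  sex = c("female", "female", "female", "female", "male",
          "female", "female", "female", "male", "male"),
  study_time = c("less than 2H", "less than 2H", "5 - 10H", "more than 10H",
                 "2 - 5H", "2 - 5H", "2 - 5H", "2 - 5H", "2 - 5H",
                 "less than 2H"),
  grade1 = c(66.3, 66.1, 80.2, 87.4, 84.1, 63.3, 71.9, 65.7, 77.2, 64.8),
  final_grade = c(69.4, 49.1, 76.7, 83.4, 80.1, 62.4, 73.2, 50.8, 79.0, 67.3)
)

# Categorical variable: geom_bar() counts the rows in each category
bar_plot <- ggplot(students, aes(x = study_time)) +
  geom_bar()

# Continuous variable: geom_histogram() counts the rows in each bin
hist_plot <- ggplot(students, aes(x = final_grade)) +
  geom_histogram(bins = 5)

bar_plot
hist_plot
```

With the full dataset you would probably want more bins than five; the small value here just suits a ten-row sample.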
Boxplots
Use a boxplot to visualize the relationship between one categorical/discrete variable and another continuous variable.
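For example, a boxplot of final grade by study time might look like the sketch below. It uses a small stand-in tibble built from the ten rows printed above rather than the full dataset, and box_plot is just an illustrative name.

```r
library(ggplot2)
library(tibble)

# Stand-in for the full dataset: only the ten rows printed above
students <- tibble(
  sex = c("female", "female", "female", "female", "male",
          "female", "female", "female", "male", "male"),
  study_time = c("less than 2H", "less than 2H", "5 - 10H", "more than 10H",
                 "2 - 5H", "2 - 5H", "2 - 5H", "2 - 5H", "2 - 5H",
                 "less than 2H"),
  grade1 = c(66.3, 66.1, 80.2, 87.4, 84.1, 63.3, 71.9, 65.7, 77.2, 64.8),
  final_grade = c(69.4, 49.1, 76.7, 83.4, 80.1, 62.4, 73.2, 50.8, 79.0, 67.3)
)

# Categorical variable on x, continuous variable on y:
# one box is drawn per study_time category
box_plot <- ggplot(students, aes(x = study_time, y = final_grade)) +
  geom_boxplot()

box_plot
```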
Scatterplots
Use a scatterplot to visualize the relationship between two continuous variables.
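A scatterplot of first-semester grade against final grade could be sketched as follows, again on a stand-in tibble made from the ten rows printed above (scatter_plot is an illustrative name).

```r
library(ggplot2)
library(tibble)

# Stand-in for the full dataset: only the ten rows printed above
students <- tibble(
  sex = c("female", "female", "female", "female", "male",
          "female", "female", "female", "male", "male"),
  study_time = c("less than 2H", "less than 2H", "5 - 10H", "more than 10H",
                 "2 - 5H", "2 - 5H", "2 - 5H", "2 - 5H", "2 - 5H",
                 "less than 2H"),
  grade1 = c(66.3, 66.1, 80.2, 87.4, 84.1, 63.3, 71.9, 65.7, 77.2, 64.8),
  final_grade = c(69.4, 49.1, 76.7, 83.4, 80.1, 62.4, 73.2, 50.8, 79.0, 67.3)
)

# Two continuous variables: geom_point() draws one point per student
scatter_plot <- ggplot(students, aes(x = grade1, y = final_grade)) +
  geom_point()

scatter_plot
```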
Faceting
To visualize the relationship between two discrete variables, draw a bar plot of the first variable and either facet by the second variable or map the second variable to fill.
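Both options can be sketched with study_time as the first variable and sex as the second. The code uses a stand-in tibble built from the ten rows printed above, and the names faceted_plot and filled_plot are illustrative.

```r
library(ggplot2)
library(tibble)

# Stand-in for the full dataset: only the ten rows printed above
students <- tibble(
  sex = c("female", "female", "female", "female", "male",
          "female", "female", "female", "male", "male"),
  study_time = c("less than 2H", "less than 2H", "5 - 10H", "more than 10H",
                 "2 - 5H", "2 - 5H", "2 - 5H", "2 - 5H", "2 - 5H",
                 "less than 2H"),
  grade1 = c(66.3, 66.1, 80.2, 87.4, 84.1, 63.3, 71.9, 65.7, 77.2, 64.8),
  final_grade = c(69.4, 49.1, 76.7, 83.4, 80.1, 62.4, 73.2, 50.8, 79.0, 67.3)
)

# Option 1: facet by the second variable (one panel per sex)
faceted_plot <- ggplot(students, aes(x = study_time)) +
  geom_bar() +
  facet_wrap(~ sex)

# Option 2: map the second variable to fill (stacked bars colored by sex)
filled_plot <- ggplot(students, aes(x = study_time, fill = sex)) +
  geom_bar()

faceted_plot
filled_plot
```

Faceting makes it easier to compare the shape of each group's distribution, while fill keeps everything in one panel so the totals stay visible.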
Selecting the right ggplot
Summary
Aesthetic mappings are wrapped in aes() and they map variables in your tibble to aesthetics in your plot, like which variable gets drawn on the x-axis, which gets drawn on the y-axis, and which gets represented by color or fill.
Geoms are added to the plot as layers with +.
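To see both ideas together, the sketch below maps variables to x, y, and color inside aes() and then stacks two geom layers with +. It runs on a stand-in tibble built from the ten rows printed above; the trend line (geom_smooth() with method = "lm") is only one illustrative choice of second layer, and layered_plot is an illustrative name.

```r
library(ggplot2)
library(tibble)

# Stand-in for the full dataset: only the ten rows printed above
students <- tibble(
  sex = c("female", "female", "female", "female", "male",
          "female", "female", "female", "male", "male"),
  study_time = c("less than 2H", "less than 2H", "5 - 10H", "more than 10H",
                 "2 - 5H", "2 - 5H", "2 - 5H", "2 - 5H", "2 - 5H",
                 "less than 2H"),
  grade1 = c(66.3, 66.1, 80.2, 87.4, 84.1, 63.3, 71.9, 65.7, 77.2, 64.8),
  final_grade = c(69.4, 49.1, 76.7, 83.4, 80.1, 62.4, 73.2, 50.8, 79.0, 67.3)
)

layered_plot <- ggplot(students, aes(x = grade1, y = final_grade)) +
  # Layer 1: points, with sex mapped to the color aesthetic
  geom_point(aes(color = sex)) +
  # Layer 2: a single straight trend line fitted to all points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

layered_plot
```

Mappings given in ggplot() are inherited by every layer, while a mapping given inside a geom, like color here, applies to that layer only.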
Now complete Koans 8, 9, and 10 to practice using ggplot2 functions.