library(tidyverse)
library(MASS)
attach(Boston)
set.seed(1234)
boston <- as_tibble(Boston) %>%
  mutate(train = sample(c(0, 1), prob = c(1/3, 2/3), replace = TRUE, size = 506))
5.3 Bagging
Load a new data set Boston containing housing values in 506 suburbs of Boston. Our dependent variable to predict will be medv: the median value of owner-occupied homes (measured in thousands of dollars). Here are the independent variables:
- crim: per capita crime rate by town
- zn: proportion of residential land zoned for lots over 25,000 square feet
- indus: proportion of non-retail business acres per town
- chas: Charles River dummy variable (1 if the tract bounds the river; 0 if not)
- nox: nitrogen oxides concentration (in parts per 10 million)
- rm: average number of rooms per dwelling
- age: proportion of owner-occupied units built prior to 1940
- dis: weighted mean of distances to five Boston employment centres
- rad: index of accessibility to radial highways
- tax: full-value property tax rate per $10,000
- ptratio: pupil-teacher ratio by town
- lstat: lower status of the population (percent)
- Use tree to create a decision tree for the training set.
library(tree)
# train <- ____
# boston.tree <- tree(medv ~ ., data = train)
#
# boston.tree
# plot(boston.tree)
# text(boston.tree)
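A possible completion (not the only one), assuming the train indicator column created in the setup chunk; note that MASS masks dplyr's select(), so it is called with dplyr:: here:

train <- boston %>%
  filter(train == 1) %>%
  dplyr::select(-train)   # drop the indicator so it is not used as a predictor
boston.tree <- tree(medv ~ ., data = train)

boston.tree
plot(boston.tree)
text(boston.tree)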
- Find the mean squared error for the tree using the test data.
# test <- ___
# test %>%
# mutate(prediction = predict(___, newdata = test)) %>%
# summarize(MSE = ___)
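A sketch of one way to fill in the blanks, building the test set from the same indicator and computing the usual mean squared error:

test <- boston %>%
  filter(train == 0) %>%
  dplyr::select(-train)
test %>%
  mutate(prediction = predict(boston.tree, newdata = test)) %>%
  summarize(MSE = mean((medv - prediction)^2))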
- Observe the high variance of tree: run this code a couple of times to see how much the tree changes based on the training data.
# boston.tree <- train %>%
# slice_sample(n = 150) %>%
# tree(medv ~ ., data = .)
#
# boston.tree
# plot(boston.tree)
# text(boston.tree, pretty = 0)
# test %>%
# mutate(prediction = predict(___, newdata = test)) %>%
# summarize(MSE = ___)
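The blanks in the last pipeline can be filled the same way as before; because each run refits boston.tree on a fresh 150-row sample, the test MSE will move around as well:

test %>%
  mutate(prediction = predict(boston.tree, newdata = test)) %>%
  summarize(MSE = mean((medv - prediction)^2))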
Decision trees are known to have very high variance: different training sets can produce significantly different trees. Bagging (bootstrap aggregating) addresses this by averaging the predictions of many trees. Ideally we would take many training sets from the population, build a separate prediction model on each, and average the resulting predictions.

Since we generally have only one training set, we bootstrap instead: we take repeated samples, with replacement, from that single training set. We grow B regression trees on the B bootstrapped training sets and average their predictions. These trees are grown deep and not pruned, so each individual tree has high variance but low bias; averaging the B trees reduces the variance and leads to a lower test error rate than a single tree. Typically B is set to a large number such as 100.

Bagging improves prediction accuracy at the expense of interpretability, since the averaged model can no longer be drawn as a single tree. The importance of each predictor can still be assessed by recording the total amount by which the residual sum of squares (RSS) decreases due to splits over that predictor, averaged over all B trees; a large value indicates an important predictor.
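To make the averaging concrete, here is a rough sketch (not part of the original exercise) that bags B = 100 trees by hand: each tree is fit to a bootstrap resample of the training set, and the B vectors of test-set predictions are averaged before computing the MSE.

B <- 100
preds <- map(1:B, function(b) {
  boot <- slice_sample(train, n = nrow(train), replace = TRUE)  # bootstrap resample
  fit <- tree(medv ~ ., data = boot)                            # unpruned tree on the resample
  predict(fit, newdata = test)                                  # test-set predictions
})
bagged <- reduce(preds, `+`) / B          # average the B prediction vectors
mean((test$medv - bagged)^2)              # test MSE of the bagged ensemble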
Bagging with randomForest
# install.packages("randomForest")   # run once in the console if the package is not installed
# library(randomForest)
# set.seed(1234)
# bag.boston <- randomForest(medv ~ ., data = train, mtry = 12, importance = T)
#
# bag.boston
#
# importance(bag.boston)
mtry = 12 means all 12 predictors should be considered for each split in the tree.
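Once bag.boston has been fit, randomForest's varImpPlot() gives a graphical version of the same importance measures:

varImpPlot(bag.boston)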
- Find the mean squared error for the bagging approach using the test data.
# test %>%
# mutate(prediction = predict(___, newdata = test)) %>%
# summarize(MSE = ___)
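One possible completion, assuming the fitted bag.boston from the chunk above:

test %>%
  mutate(prediction = predict(bag.boston, newdata = test)) %>%
  summarize(MSE = mean((medv - prediction)^2))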