#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#                   Intro to the Tidyverse by Colleen O'Briant
#                             Koan #20: accumulate()
#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

# In order to progress:
# 1. Read all instructions carefully.
# 2. When you come to an exercise, fill in the blank, un-comment the line
#    (Ctrl/Cmd Shift C), and execute the code in the console (Ctrl/Cmd Return).
#    If the piece of code spans multiple lines, highlight the whole chunk or
#    simply put your cursor at the end of the last line.
# 3. Save (Ctrl/Cmd S).
# 4. Test that your answers are correct (Ctrl/Cmd Shift T).

#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# The last function we'll learn from the purrr family is accumulate().
# accumulate() works just like reduce() to sequentially apply a 2-argument
# function to `.x`, but the difference is that accumulate() outputs the
# intermediate results so that the output is the same length as `.x`, just like
# map(). The last element of the accumulate() output will be the same as the
# reduce() output.

# This function shows you the computations from the max() algorithm we discussed
# in the previous koan. Run this code to see the result:

accumulate(c(1, 3, 9, 2), ~ if_else(.x > .y, .x, .y))
## [1] 1 3 9 9
# The first element of the accumulate() output from above is a 1.
# `.f` hasn't been applied yet. In order to make the output the same length as
# the input, accumulate() just outputs the first element of `.x` as-is.

# The second element of the accumulate output is 3. That's because the maximum
# of 1 and 3 is 3.

# The third element of the accumulate output is 9. That's because the maximum of
# 3 and 9 is 9.

# The fourth element of the accumulate output is 9. That's because the maximum
# of 9 and 2 is 9.

#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

# Now consider the minimum function:

accumulate(1:10, ~ if_else(.x < .y, .x, .y))
##  [1] 1 1 1 1 1 1 1 1 1 1
# 1. Fill in the blanks so that this statement returns TRUEs: ------------------

#1@

# accumulate(c(__, 11, __, __), ~ if_else(.x < .y, .x, .y)) == c(7, 7, __, 2)

#@1

#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

# Consider the sum() function:

accumulate(1:10, `+`)
##  [1]  1  3  6 10 15 21 28 36 45 55
# 2. Fill in the blank so that this statement returns TRUEs: -------------------

#2@

# accumulate(c(__, __, __, __), ~ .x + .y) == c(3, 10, 0, 2)

#@2

#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

In order to generate data that follows a random walk: \(y_t = y_{t-1} + u_t\), you can use an accumulated sum of \(u_t\):

accumulate(c(0, -1, 1, 2, -1), ~ .x + .y)
## [1]  0 -1  0  2  1
# This is what's happening to get the output from the above code:
c(
  0,
  0 + -1,
  0 + -1 + 1,
  0 + -1 + 1 + 2,
  0 + -1 + 1 + 2 + -1
)
## [1]  0 -1  0  2  1
# You can also generate u_t from the normal distribution:

accumulate(rnorm(n = 10), ~ .x + .y)
##  [1] -0.10974189  0.05083043 -0.21194498  0.08343908  1.98233410  2.46133820
##  [7]  2.19150990  2.38076189  1.72843275  2.23485936
# Let's visualize the random walk! Run the code below a couple of times to see
# that random walks aren't more likely to trend upward or downward. Time series
# like stocks or housing prices (whenever the data generating process includes a
# lot of speculation) are modeled well with random walks. That's because the
# best guess anyone can make about the value of the stock next period is its
# value this period.

tibble(
  t = 1:100,
  y = accumulate(rnorm(n = 100), ~ .x + .y)
) %>%
  ggplot(aes(x = t, y = y)) +
  geom_line()

# 3. Take the first difference of this random walk: ----------------------------

#3@

# tibble(
#   u = c(1, 0, 0, 1, -1, -2, 1, 0),
#   y = accumulate(u, ~ .x + .y)
# ) %>%
#   mutate(y_diff = __)

#@3

# Notice that the first difference will recover the vector u (without the first
# term). This happens because:

\(y_t = y_{t-1} + u_t\) \(y_t - y_{t-1} = u_t\)

#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

You can also simulate other autocorrelated time series using accumulate(): Instead of a random walk: \(y_t = y_{t-1} + u_t\), Suppose we want to simulate: \(y_t = \beta_0 + \beta_1 y_{t-1} + u_t\). 4. Generate an autocorrelated time series: ———————————- \(y = .5 + .5*y_{t-1} + u_t\), u_t ~ iid N(0, \(\sigma^2\))

#4@

# accumulate(c(1, 0, __, -1), ~ .5 + .5*.x + .y) == c(1, 1, 3, __)

#@4

# Visualize the autocorrelated time series.

tibble(
  t = 1:100,
  y = accumulate(rnorm(n = 100), ~ .5 + (.5 * .x) + .y)
) %>%
  ggplot(aes(x = t, y = y)) +
  geom_line()

#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

# Wow! You've finished all 20 tidyverse koans, what a wonderful accomplishment.
# You've worked hard and learned so much.

# You may not feel like you've achieved tidyverse enlightenment yet, but as you
# keep practicing, you may be surprised to learn how powerful these concepts
# really are. And you may discover you are a much more powerful programmer than
# you ever thought you could be!