Rule-Based Robot

Prereqs

In this classwork, you will write imperative R code, meaning your code will update variables step by step over time using functions, conditionals, loops, and random draws.

Below are quick warmups. Each one is small, but together they cover everything you need for the GridWorld simulation.

Question 1: functions, return, and print

A function takes inputs, runs steps, and returns an output. print() displays something as a side effect while the function is running.
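For instance (using a made-up function double_it(), separate from the exercise below), the print() call is a side effect you see in the console, while return() determines the function's output:

```r
double_it <- function(x) {
  print(paste0("doubling ", x))  # side effect: displayed while the function runs
  return(x * 2)                  # output: the value the function hands back
}

result <- double_it(3)  # prints "doubling 3"
result                  # 6
```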

Write a function net_payoff() that takes two numbers, benefit and cost, prints both values, and returns benefit - cost.

# net_payoff <- function(___, ___) {
#   print(paste0("benefit: ", ___, " and cost: ", ___))
#   return(___)
# }
# 
# net_payoff(benefit = 5, cost = 2)

Question 2: sample()

sample() lets you simulate randomness. You can sample from a set of values, and you can attach probabilities to them. Use sample() to draw T (true, with probability 0.9) or F (false, with probability 0.1) 10 times. Don’t put T and F in quotes: R will understand them as logicals. Run the code several times and you’ll get different results.

?qelp::sample
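As a model (with made-up values, not the answer to this question), here is sample() drawing five marbles where red is more likely than blue:

```r
# Draw 5 values from "red"/"blue". replace = TRUE allows repeats,
# and prob attaches a probability to each value.
sample(c("red", "blue"), size = 5, replace = TRUE, prob = c(0.7, 0.3))
```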

# sample(___)

Question 3: If/else Ladders

An if else ladder is how you encode rules: “if this, do that; else if that, do this; otherwise do something else.”
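Here is a ladder on a different problem (a made-up temp_label() function), so you can see the shape before filling in the blanks below:

```r
temp_label <- function(temp) {
  if (temp > 30) {
    return("hot")
  } else if (temp > 15) {
    return("mild")
  } else {
    return("cold")
  }
}

temp_label(20)  # "mild"
```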

Write a function sign_label(x) that returns “positive” if x > 0, “zero” if x == 0, and “negative” if x < 0.

# sign_label <- function(x) {
#   if (___) {
#     return(___)
#   } else if (___) {
#     return(___)
#   } else {
#     return(___)
#   }
# }
# 
# sign_label(-5)

Question 4: for and while loops

A for loop runs a fixed number of times. A while loop runs until a condition becomes false.
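Two minimal examples of the two shapes (unrelated to the coin and die exercises below):

```r
# for: runs a fixed number of times
for (i in 1:3) {
  print(i)  # prints 1, then 2, then 3
}

# while: runs until the condition becomes FALSE
countdown <- 3
while (countdown > 0) {
  print(countdown)           # prints 3, then 2, then 1
  countdown <- countdown - 1
}
```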

Use a for loop to write a function coin_flips(n) that flips a fair coin n times using sample(c("H","T"), size = 1), printing “H” or “T”.

# coin_flips <- function(n) {
#   for(i in 1:___) {
#     print(sample(___, size = 1))
#   }
# }
# 
# coin_flips(5)

Use a while loop to write a function roll_until_six() that rolls a die until you get a 6, printing each outcome along the way. Initialize roll as something other than 6 at first to make sure the while loop begins.

# roll_until_six <- function() {
#   roll <- ___
#   while(roll != ___) {
#     roll <- sample(___)
#     print(___)
#   }
# }
# 
# roll_until_six()

Question 5

Often you will want to run a simulation many times and store the outcomes. Add to the function you just wrote, roll_until_six: along with printing roll outcomes, return the number of dice rolls it took to reach a 6. You’ll need to initialize a variable x for counting loops, increment x by 1 inside the loop, and then return x at the end.

# roll_until_six <- function() {
#   roll <- ___
#   x <- ___
#   while(roll != ___) {
#     roll <- sample(___)
#     x <- x + 1
#     print(___)
#   }
#   return(___)
# }
# 
# roll_until_six()

Question 6: Indexing a vector with []

In GridWorld, payoffs are stored in a vector, and the position (1 to 9) chooses which payoff you get. You can get a subset of a vector using square brackets.
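A quick illustration on a made-up vector (note that the brackets also accept a vector of positions, which will come in handy later):

```r
letters_vec <- c("a", "b", "c", "d")
letters_vec[2]        # "b": the second element
letters_vec[c(1, 3)]  # "a" "c": one element per position asked for
```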

Use square brackets to find the fifth element of x.

x <- c(0, 0, 0, 1, 2, 1, 0, 0, 0)
# x[___]

Rules on a Simple GridWorld

For this classwork, consider this GridWorld:

0 0 0
1 2 1
0 0 0
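In code, the cells are numbered 1 through 9 row by row (1–3 across the top, 4–6 in the middle, 7–9 across the bottom), so the grid above flattens into a single payoff vector:

```r
# Grid positions:        Payoffs:
#   1 2 3                  0 0 0
#   4 5 6                  1 2 1
#   7 8 9                  0 0 0
payoffs <- c(0, 0, 0, 1, 2, 1, 0, 0, 0)
payoffs[5]  # 2: the payoff in the center cell
```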

Rule-Based Robot

We’ll use the same move function you created in Classwork 8:

move <- function(cell, action) {
  if (action == "stay") {
    return(cell)
  } else if (action == "south") {
    if (cell <= 6) {
      return(cell + 3)
    } else {
      return(cell)
    }
  } else if (action == "north") {
    if (cell >= 4) {
      return(cell - 3)
    } else {
      return(cell)
    }
  } else if (action == "east") {
    if (cell %in% c(1, 2, 4, 5, 7, 8)) {
      return(cell + 1)
    } else {
      return(cell)
    }
  } else if (action == "west") {
    if (cell %in% c(2, 3, 5, 6, 8, 9)) {
      return(cell - 1)
    } else {
      return(cell)
    }
  }
}
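A few sanity checks on move(), using that row-by-row numbering:

```r
move(1, "south")  # 4: one row down
move(5, "west")   # 4: one column left
move(3, "east")   # 3: already at the right edge, so the robot stays put
```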

Question 7

Fill in the blanks for the policy function on this GridWorld.

Rules:

  • If in (1, 2, 3), move ___.
  • If in (7, 8, 9), move ___.
  • If in (4), move ___.
  • If in (6), move ___.
  • If in (5), ___.

Question 8

Write a function rule_simple(position) that takes a position on the GridWorld and returns “north”, “south”, etc. based on your answers to the previous question.

# rule_simple <- function(position) {
#   if (___) {
#     return("south")
#   } else if (___) {
#     return("north")
#   } else if (___) {
#     return("stay")
#   } else if (___) {
#     return("east")
#   } else if (___) {
#     return("west")
#   }
# }
# 
# library(testthat)
# test_that("rule_simple matches the intended rules", {
#   expect_equal(rule_simple(1), "south")
#   expect_equal(rule_simple(2), "south")
#   expect_equal(rule_simple(3), "south")
#   
#   expect_equal(rule_simple(7), "north")
#   expect_equal(rule_simple(8), "north")
#   expect_equal(rule_simple(9), "north")
#   
#   expect_equal(rule_simple(4), "east")
#   expect_equal(rule_simple(6), "west")
#   expect_equal(rule_simple(5), "stay")
# })

Question 9

Write a function payoffs_simple(position) that takes a position on the GridWorld and uses a vector with square brackets to return the payoff (0, 1, or 2) for being in that position.

# payoffs_simple <- function(position) {
#   ___
# }

# test_that("payoffs_simple matches the GridWorld", {
#   expect_equal(payoffs_simple(c(1, 2, 3)), c(0, 0, 0))
#   expect_equal(payoffs_simple(c(4, 5, 6)), c(1, 2, 1))
#   expect_equal(payoffs_simple(c(7, 8, 9)), c(0, 0, 0))
# })

Simulation

Question 10

Write a function simulation(rule, payoffs) that takes a rule function and a payoff function, simulates an agent starting from a random position on the grid (1 to 9) and acting in accordance with the rule function, and returns the sum of the payoffs it collects. To discount the future, let there be a 10% chance the game ends at the end of each period and a 90% chance it continues.

# simulation <- function(rule, payoffs) {
#   position <- sample(1:9, size = 1)
#   payoff_sum <- 0
#   game_continues <- T
#   
#   while(game_continues) {
#     payoff_sum <- payoffs(position) + ___
#     print(paste0("position: ", position, " and payoff: ", payoffs(position)))
#     game_continues <- sample(c(T, F), size = 1, prob = ___)
#     if (!game_continues) {
#       print("Game ends.")
#     } else {
#         position <- move(position, rule(position))
#       }
#   }
#   return(payoff_sum)
# }
# 
# simulation(rule_simple, payoffs_simple)

Question 11

Use a for loop to run the simulation 1000 times, saving the sum of payoffs from each iteration to a vector. What is the average payoff over the 1000 iterations? It should approximate the average of the cell values you computed in Classwork 8’s GridWorld work.

# payoff_list <- c()
# for (i in 1:1000) {
#   payoff_list <- c(payoff_list, ___)
# }
# mean(payoff_list)

Homework: position 1 is either a bag of coins or a monster

Now consider the same GridWorld, but in any game, position 1 is randomly +5 (bag of coins) or -5 (monster). The robot can see which one it is, and it stays at that value for the duration of the game.

Question 1: draw_shock

Write a function draw_shock() that takes no inputs and randomly returns either 5 or -5.

# draw_shock <- function() {
#   ___
# }

payoffs_shock

Look back at your function payoffs_simple. Here is a new version, payoffs_shock(position, shock), that outputs the payoff for a position given the shock drawn by draw_shock (5 or -5).

payoffs_shock <- function(position, shock) {
    c(0 + shock, 0, 0, 1, 2, 1, 0, 0, 0)[position]
}
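For example, the shock only ever touches position 1:

```r
payoffs_shock(1, 5)   # 5: the bag of coins
payoffs_shock(1, -5)  # -5: the monster
payoffs_shock(9, -5)  # 0: a corner payoff, unaffected by the shock
```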

Question 2: rule_shock_aware(position, shock)

Write a new version of rule_simple that takes a position and a shock (5 or -5, whatever draw_shock() returned), and outputs the direction the agent should go (“north”, “south”, etc.).

# rule_shock_aware <- function(position, shock) {
#   ___
# }
# 
# test_that("rule_shock_aware is at least aware of terminal states", {
#   expect_equal(rule_shock_aware(1, 5), "stay")
#   expect_equal(rule_shock_aware(5, -5), "stay")
# })

Question 3: Rewrite simulation(rule, payoffs) to work with shocks

You shouldn’t have to make many changes: just start the game by drawing a random shock, and then let rule() and payoffs() take a shock.

# simulation <- function(rule, payoffs) {
#   ___
# }
# 
# simulation(rule_shock_aware, payoffs_shock)

Question 4: Run the simulation with shocks 1000 times and report the average payoff.

What happens if 2, 3, or 4 cells are subject to shocks? What if all the cells are? What if all the cells are subject to shocks and the shocks change in each time step? Hard-coding rules for an agent to follow quickly becomes impossible. That’s where reinforcement learning comes in. Stay tuned!

Download this assignment

Here’s a link to download this assignment. No autograder for this one: just compile to html (File > Render Document), and upload the html file to Canvas (one copy per group for the classwork; one copy per individual for the homework).