Sequential Games

Course from a Bird’s Eye View

In the first part of this unit, we learned to program declaratively in R to work with data using tibbles, pipes, dplyr, and ggplot2. Next, we covered the probability and statistics behind fitting a linear model with OLS and a logistic regression (logit) using the method of maximum likelihood.

To wrap up Unit 1 this week, we’ll learn the fundamentals of sequential games and then I’ll introduce the main modeling environment we’ll use for the rest of the course: the GridWorld. The goal of the course is to use R to estimate logit models in which an agent makes sequential decisions over time under uncertainty. We’ll study these problems from both an Engineering perspective (Unit 2: Reinforcement Learning) and an Economics perspective (Unit 3: Dynamic Discrete Choice), and then bring everything together in Unit 4 for a final project.

Game Theory Overview

Game Theory is the study of strategic interactions between rational decision-makers. The ingredients:

Players: the decision-makers. There may be just one person playing against “nature” (shocks that are random according to some probability distribution). Or there might be two or more players. The players might represent individuals, firms, governments, etc.
Actions: what a player can choose to do at any moment.
Strategy: a complete plan of action (what the player would do in any situation).
Payoffs: numbers for each player that rank how much the player likes each outcome. They can represent profit, happiness, political benefit, etc.
Information: what do players know, and when do they know it?
Timing: games can be simultaneous-move games, where players make choices at the same time or without observing each other’s choices, or sequential-move games, where players move in order, each observing the actions taken by those who moved before. We’ll focus on sequential-move games in this class.

We’ll “solve” sequential games by finding each player’s best responses: their best choice given what the other players might do. Then we’ll use best responses to find subgame perfect equilibria (SPE): strategy profiles in which every player’s strategy is a best response not only in the game as a whole but also in every smaller game (subgame) that could be reached during play. You’ll see how this works in this classwork.

The key principle for finding SPEs is backward induction: start at the final decision nodes, work out what the player moving there will choose, and then step back one decision at a time, treating those later choices as given. Each player’s current decision is evaluated against the rational responses that will follow it, so the resulting plan is a best response at every point in the game.
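Backward induction can also be sketched in code. The following is a minimal illustration in Python (used here only for illustration; the course itself works in R), assuming a hypothetical entry game that is not part of this classwork: player 0 (an entrant) chooses whether to enter a market, and if so, player 1 (an incumbent) chooses whether to fight or accommodate. The `Node` class, the `solve` function, and all payoffs are made up for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node in a finite game tree with perfect information (hypothetical helper)."""
    name: str = ""
    player: Optional[int] = None      # index of the player who moves here; None at a leaf
    payoffs: Optional[tuple] = None   # payoff tuple at a leaf; None at a decision node
    children: dict = field(default_factory=dict)  # action label -> child Node


def solve(node, plan):
    """Return the payoffs reached from `node` under backward induction.

    The action chosen at each decision node is recorded in `plan`.
    Ties are broken in favor of the action listed first.
    """
    if not node.children:
        return node.payoffs
    # Solve every subgame below this node first: this is the "work backward" step.
    outcomes = {action: solve(child, plan) for action, child in node.children.items()}
    # The player moving here picks the action that gives them the highest payoff.
    best = max(outcomes, key=lambda action: outcomes[action][node.player])
    plan[node.name] = best
    return outcomes[best]


# Made-up payoffs, listed as (entrant, incumbent).
incumbent = Node(name="incumbent", player=1, children={
    "fight": Node(payoffs=(-1, -1)),
    "accommodate": Node(payoffs=(1, 1)),
})
entrant = Node(name="entrant", player=0, children={
    "stay out": Node(payoffs=(0, 2)),
    "enter": incumbent,
})

plan = {}
outcome = solve(entrant, plan)
print(plan)     # {'incumbent': 'accommodate', 'entrant': 'enter'}
print(outcome)  # (1, 1)
```

In this made-up game the incumbent would accommodate if entry occurred, so the entrant enters. The `plan` dictionary plays the role of a strategy profile: it records a choice at every decision node, not just the final payoffs.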

Sequential Games

The Centipede Game

Suppose two players alternate choosing to “take” or “pass”, starting with player A.

If they “take”, the game ends and payoffs are distributed.
If they “pass”, the pot grows bigger and the other player gets a turn.

Payoffs grow as follows:

Stage 1: (2, 0) if take, listed as (player A, player B); continue if pass
Stage 2: (0, 4) if take; continue if pass
Stage 3: (6, 2) if take; continue if pass
Stage 4: (4, 8) if take; end

The game tree looks like this:

Start by analyzing best responses for each subgame using backward induction. A subgame starts at a single node (dot) and includes all of that node’s successors. The first subgame:

In this subgame, player B makes the decision but has only one option available: to take and receive a payoff of 8, while player A gets a payoff of 4. So if this final node is reached, we can expect player B to “take”.

Now consider the second subgame:

Question 1

If player A “takes” in subgame 2, what will their payoff be? If player A “passes”, what will their payoff be? Based on these responses, what is player A’s best response for this subgame? Draw a strike through the path player A will not choose.

Now consider subgame 3:

Question 2

If player B “takes” in subgame 3, what will their payoff be, considering player A’s best response? If player B “passes”, what will their payoff be, considering player A’s best response? Based on these responses, what is player B’s best response for this subgame? Draw a strike through the path player B will not choose.

The final subgame is always the entire game:

Question 3

If player A “takes” in subgame 4, what will their payoff be, considering player B’s best response? If player A “passes”, what will their payoff be, considering player B’s best response? Based on these responses, what is player A’s best response for this subgame? Draw a strike through the path player A will not choose.

Question 4

Final analysis: what is the subgame perfect equilibrium (each player’s strategy, that is, a complete plan of action in any situation)? What payoffs does this indicate each player will receive if both are behaving rationally?

Question 5

Suppose the payoffs for taking in stage 3 are (X, Y) rather than (6, 2). Find the conditions on X and Y under which the players will collaborate and grow the pot until stage 4.

Discount Rate

Which would you rather have: $50K today, or $50K in 20 years?

If you said $50K today, you’re revealing something important: a dollar now feels more valuable than a dollar later. That idea is the foundation of discounting.

Finance people justify discount rates in their models by saying: if you get $50K today, you can invest it immediately and earn interest. So even if the future payment is “the same” $50K, it’s actually less valuable because you’re giving up 20 years of growth.

In Economics, we justify discount rates by saying: people are impatient and want to smooth consumption over time. A dollar today helps you buy groceries, pay rent, or reduce stress right now. Waiting 20 years is costly, even if the dollar amount is identical.

In Engineering, discounting is sometimes explained as risk of interruption: at any stage, there’s some probability the “game ends” unexpectedly. For example, you might not be alive (or not in the same situation) to enjoy that $50K in 20 years, so future rewards get discounted simply because they are less likely to be received.

Question 6

Suppose I offer you a choice between $1 a year from now and $X right now. If money grows by 5% annually, the X that makes the two offers equally valuable is less than $1, because it satisfies $X \times (1.05) = 1$. Solve for X.

What you just calculated is called the present value, given the interest rate of 5%.
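The same reasoning extends to payments more than one year away. As a sketch, using symbols introduced here only for illustration (F for the future amount, r for the annual interest rate, t for the number of years, and PV for the present value), money invested today grows by a factor of (1 + r) each year, so:

\[PV \times (1 + r)^t = F \quad\Longrightarrow\quad PV = \frac{F}{(1 + r)^t}\]

For instance, with made-up numbers, 100 dollars received in 2 years at an interest rate of 10% has a present value of

\[PV = \frac{100}{1.1^2} = \frac{100}{1.21} \approx 82.64\]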

Question 7

What is the present value of $50K in 1 year, still assuming an annual interest rate of 5%?

Question 8

What is the present value of $50K in 20 years, still assuming an annual interest rate of 5%?

Now consider a revenue stream of $20K per year forever, with the first payment arriving today. What is the present value of that revenue stream? If money a year from now is worth only 90% as much to you as money today (discount rate of 10%), we have an infinite geometric series (amounts in thousands of dollars):

\[P = 20 + 0.9(20) + 0.9^2 (20) + 0.9^3 (20) + \dots\]

Here’s a trick for solving for $P$: subtracting the first payment leaves $P - 20 = 0.9(20) + 0.9^2 (20) + \dots$, which is exactly $0.9 P$. Set them equal and solve:

\[\begin{aligned} P - 20 &= 0.9 P\\ 0.1 P &= 20\\ P &= 200 \end{aligned}\]
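One way to check this result numerically is to add up a long but finite run of payments and watch the total approach 200. The sketch below is in Python for illustration only (the course itself works in R); the helper name `present_value` and its parameters are assumptions made for this example.

```python
def present_value(payment, discount_factor, periods):
    """Sum payment * discount_factor**t for t = 0, 1, ..., periods - 1."""
    return sum(payment * discount_factor**t for t in range(periods))


# Partial sums of the series 20 + 0.9(20) + 0.9^2(20) + ...
for periods in (10, 50, 200, 500):
    print(periods, present_value(20, 0.9, periods))

# The algebra above gives P = 20 / (1 - 0.9).
closed_form = 20 / (1 - 0.9)
print(closed_form)  # approximately 200, up to floating-point rounding
```

The partial sums rise toward 200 without exceeding it, because each additional payment is worth only 90% of the one before it.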

Question 9

What is the present value of a revenue stream of $100 per year forever, assuming an annual discount rate of 5%?

Question 10: Post-Graduation Decision Problem

For this problem, let all amounts be in thousands of dollars, so 100 means $100,000.

Maria is graduating college and discounts future income at a rate of 5% per year.

Option A: She has a job offer earning 75 per year as a Consultant. Compute the present value of this revenue stream.

Option B: She’s also been accepted into a graduate program in Economics. It starts out with a Master’s program, and then if she chooses, she can continue it to pursue her PhD.

With a Master’s, she estimates there’s a 50% chance she can become a Government Data Analyst (earning 80 per year), and a 50% chance she can become a Machine Learning Engineer (earning 140 per year). Calculate the present value of each of these streams and then divide by $1.05^2$, because getting her Master’s will take 2 years to complete. Then calculate the expected present value of a Master’s given the probabilities she gets each of these jobs.
With a PhD, she estimates there’s a 30% chance she can become a Professor (earning 140 per year), and a 70% chance she can become a Government Economist (earning 100 per year). Calculate the present value of each of these streams and then divide by $1.05^5$, because getting her PhD will take 5 years to complete. Then calculate the expected present value of a PhD given the probabilities she gets each of these jobs.
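Both calculations follow the same pattern, which can be written compactly. In this sketch the symbols are introduced only for illustration: T is the number of years the degree takes, $p_j$ is the probability of landing job j, and $PV_j$ is the present value of job j’s income stream measured from the year she starts working.

\[\text{Expected PV} = \frac{1}{1.05^{T}} \sum_j p_j \, PV_j\]

Here T = 2 for the Master’s and T = 5 for the PhD, and the probabilities for each degree sum to 1.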

If she goes to graduate school, should she get her PhD? Should she even go to graduate school?

Autograder

Here’s a link to the autograder for this assignment.

Homework #7: Sequential Games and Present Values

Here’s a link to the homework due before next class.