Extra Credit Project: DDC Model on Taxi Drivers

This extra credit project can add up to +5 to your final grade. Note: this is not a group project. If two submissions are too similar, both students will receive a zero.

Option A: follow the outline below with the taxi driver problem.
Option B: follow the outline below, but with any DDC model idea you have. Make sure you can access data that record the choices experts make given some observed measure of the state, and that these decisions are made repeatedly over time to maximize long-run payoffs or minimize long-run costs.

Overview

In this project, you will build and estimate a simple dynamic discrete choice (DDC) model using New York City taxi trip data.

The idea:

After completing a trip, a taxi driver must decide where to go to look for the next passenger. Different neighborhoods offer different expected earnings and different future opportunities.
Drivers are assumed to be forward-looking. When deciding where to go next, they consider both:
- the expected fare from the next pickup location
- the future value of ending up in different parts of the city afterward.
Your goal is to estimate a structural model describing these decisions.

Step 1: Get NYC Taxi Trip Data

You will need taxi trip data that includes at least:

a taxi identifier (driver or cab ID)
pickup location
dropoff location
pickup timestamp
dropoff timestamp
trip fare

The data must allow you to observe consecutive trips by the same taxi.

Step 2: Prepare the Data

Restricting the Sample

To study driver decisions in a consistent environment, restrict the data to a specific time window. For example: “weekday trips between 7:00AM and 11:00AM”.

This avoids mixing together very different market conditions such as rush hour, late night, and weekends.
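
A minimal pandas sketch of this restriction follows. It assumes a hypothetical pickup_datetime column and applies the window to pickup times; the handout does not say which timestamp to use, so treat that choice as an assumption.

```python
import pandas as pd


def restrict_time_window(trips: pd.DataFrame,
                         start_hour: int = 7,
                         end_hour: int = 11) -> pd.DataFrame:
    """Keep weekday trips picked up in [start_hour, end_hour).

    Assumes a hypothetical column "pickup_datetime" (string or datetime).
    """
    ts = pd.to_datetime(trips["pickup_datetime"])
    is_weekday = ts.dt.dayofweek < 5  # Monday=0, ..., Friday=4
    in_window = (ts.dt.hour >= start_hour) & (ts.dt.hour < end_hour)
    return trips.loc[is_weekday & in_window].copy()


# Usage with three made-up timestamps
trips = pd.DataFrame({"pickup_datetime": [
    "2015-03-02 07:15",  # Monday morning: kept
    "2015-03-02 11:05",  # Monday, after 11:00 AM: dropped
    "2015-03-07 08:00",  # Saturday: dropped
]})
morning = restrict_time_window(trips)
```

The upper bound is exclusive, so a pickup at 11:05 AM falls outside a 7:00 to 11:00 AM window.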

Construct Consecutive Trips

Sort trips by taxi identifier and pickup time. For each trip, construct a new variable, next_pickup_location: the pickup location of the next trip by the same taxi. Construct next_dropoff_location the same way from that next trip's dropoff location.

You’ll also want to drop observations where:

the taxi has no subsequent trip
the gap between the dropoff and the same taxi's next pickup is very long (for example, more than 1 hour), which likely indicates the end of a shift rather than a repositioning decision.
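
A minimal pandas sketch of this step follows. The column names taxi_id, pickup_datetime, dropoff_datetime, pickup_location, and dropoff_location are hypothetical, and the gap is measured from a trip's dropoff to the same taxi's next pickup. Negative gaps, which would indicate bad timestamps, are also dropped here as an extra assumption.

```python
import pandas as pd


def build_consecutive_trips(trips: pd.DataFrame,
                            max_gap: pd.Timedelta = pd.Timedelta(hours=1)) -> pd.DataFrame:
    """Attach each trip's successor by the same taxi and drop unusable rows.

    Hypothetical columns: taxi_id, pickup_datetime, dropoff_datetime,
    pickup_location, dropoff_location.
    """
    df = trips.copy()
    df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"])
    df["dropoff_datetime"] = pd.to_datetime(df["dropoff_datetime"])
    df = df.sort_values(["taxi_id", "pickup_datetime"])

    # Shift within each taxi so every row sees that taxi's next trip.
    nxt = df.groupby("taxi_id")[["pickup_location", "dropoff_location",
                                 "pickup_datetime"]].shift(-1)
    df["next_pickup_location"] = nxt["pickup_location"]
    df["next_dropoff_location"] = nxt["dropoff_location"]
    df["next_pickup_datetime"] = nxt["pickup_datetime"]

    # Each taxi's last trip has no successor (NaT), so drop it.
    df = df.dropna(subset=["next_pickup_datetime"])
    # Drop long gaps (likely end of shift) and negative gaps (bad timestamps).
    gap = df["next_pickup_datetime"] - df["dropoff_datetime"]
    df = df[(gap >= pd.Timedelta(0)) & (gap <= max_gap)]
    return df.reset_index(drop=True)


# Usage with made-up trips: only taxi A's first trip has a usable successor.
trips = pd.DataFrame({
    "taxi_id": ["A", "A", "A", "B"],
    "pickup_datetime": ["2015-03-02 07:00", "2015-03-02 07:30",
                        "2015-03-02 09:30", "2015-03-02 08:00"],
    "dropoff_datetime": ["2015-03-02 07:20", "2015-03-02 07:50",
                         "2015-03-02 09:45", "2015-03-02 08:20"],
    "pickup_location": ["Harlem", "Upper East Side", "Midtown West", "Bronx"],
    "dropoff_location": ["Upper East Side", "Chelsea", "Airports", "Harlem"],
})
pairs = build_consecutive_trips(trips)
```

In this toy input, taxi A's second trip is followed by a 100-minute gap and its third trip is its last, while taxi B has only one trip, so a single row survives.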

Define the 15 taxi zones

Replace the detailed taxi locations with the following 15 zones:

Upper Manhattan (Washington Heights, Inwood, Hamilton Heights)
Harlem / Morningside Heights
Upper East Side
Upper West Side
Midtown East (Grand Central, Turtle Bay)
Midtown West / Times Square
Chelsea / Flatiron / Gramercy
Lower Manhattan (SoHo, Tribeca, Financial District, Chinatown)
North Brooklyn (Williamsburg, Greenpoint)
Downtown Brooklyn / DUMBO
South & Central Brooklyn
Long Island City / Astoria
Other Queens
Bronx
Airports (JFK and LaGuardia combined)

After mapping locations into these zones, drop any observation in which the dropoff, next pickup, or next dropoff location falls outside these 15 zones.
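
One way to implement the mapping in pandas is a lookup dictionary applied to all three location columns, followed by a dropna. The dictionary below is hypothetical and deliberately partial. Its keys are illustrative labels, not the exact names used by any particular dataset, and in practice it would list every raw location label in your data.

```python
import pandas as pd

# Hypothetical, partial lookup from raw location labels to the 15 study zones.
# Labels that are missing from the dictionary map to NaN.
LOCATION_TO_ZONE = {
    "Inwood": "Upper Manhattan",
    "Hamilton Heights": "Upper Manhattan",
    "Central Harlem": "Harlem / Morningside Heights",
    "Times Square": "Midtown West / Times Square",
    "Tribeca": "Lower Manhattan",
    "JFK": "Airports",
    "LaGuardia": "Airports",
}

LOCATION_COLUMNS = ["dropoff_location", "next_pickup_location",
                    "next_dropoff_location"]


def map_to_zones(df, lookup=LOCATION_TO_ZONE, columns=LOCATION_COLUMNS):
    """Replace raw labels with zones; drop rows that leave the 15-zone system."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].map(lookup)
    return out.dropna(subset=columns).reset_index(drop=True)


# Usage with made-up labels: the second row ends at Newark, outside the 15 zones.
pairs = pd.DataFrame({
    "dropoff_location": ["Inwood", "Times Square"],
    "next_pickup_location": ["Central Harlem", "Tribeca"],
    "next_dropoff_location": ["JFK", "Newark"],
})
zoned = map_to_zones(pairs)
```

Because unmapped labels become NaN, a typo in the dictionary silently drops rows. Comparing row counts before and after the mapping is a cheap check.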

Data set structure

After these steps, your data should look like this:

dropoff_location	next_pickup_location	next_dropoff_location
Upper Manhattan	Upper East Side	Upper East Side
Midtown West	Lower Manhattan	Bronx

The state is the dropoff_location.
The action is the next_pickup_location.
The transition probabilities come from next_dropoff_location: the zone where the next trip ends becomes the next state.

Step 3: Estimate Non-Structural Inputs

Before estimating the structural model, compute two objects directly from the data:

Average fare by pickup zone
Markov transition probabilities

Markov transition probabilities

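Estimate the probability that a trip beginning in zone \(a\) ends in zone \(z'\) by its sample frequency:

\[\hat{P}(z' \mid a) = \frac{N(a, z')}{\sum_{z''} N(a, z'')}\]

where \(N(a, z')\) is the number of trips picked up in zone \(a\) and dropped off in zone \(z'\).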

Then construct a 15x15 transition matrix where:

rows = pickup zones
columns = dropoff zones
entries = transition probabilities

Each row must sum exactly to 1.

This matrix represents the part of the environment that the driver does not control, but might be able to predict or have an intuitive understanding of from experience.
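
Both Step 3 objects can be computed with a short pandas sketch. It assumes the zone-mapped consecutive-trip data from Step 2 plus a hypothetical next_fare column holding the fare of the trip that starts at next_pickup_location. Using that same next trip for both objects means a zone's average fare and its transition row describe the same set of pickups. Zones with no observed pickups are left as NaN rather than filled, because a row of zeros would violate the sum-to-1 requirement.

```python
import numpy as np
import pandas as pd

ZONES = [
    "Upper Manhattan", "Harlem / Morningside Heights", "Upper East Side",
    "Upper West Side", "Midtown East", "Midtown West / Times Square",
    "Chelsea / Flatiron / Gramercy", "Lower Manhattan", "North Brooklyn",
    "Downtown Brooklyn / DUMBO", "South & Central Brooklyn",
    "Long Island City / Astoria", "Other Queens", "Bronx", "Airports",
]


def estimate_inputs(df, zones=ZONES):
    """Return (average fare by pickup zone, 15 x 15 transition matrix).

    Assumes zone-mapped columns next_pickup_location and next_dropoff_location,
    plus a hypothetical next_fare column (fare of the trip starting at the pickup).
    """
    fare = df.groupby("next_pickup_location")["next_fare"].mean().reindex(zones)

    counts = pd.crosstab(df["next_pickup_location"], df["next_dropoff_location"])
    counts = counts.reindex(index=zones, columns=zones, fill_value=0)
    row_totals = counts.sum(axis=1).replace(0, np.nan)
    # Rows: pickup zone a; columns: dropoff zone z'; entries: count / row total.
    # Zones with no observed pickups stay NaN instead of becoming rows of zeros.
    P = counts.div(row_totals, axis=0)
    return fare, P


# Usage with four made-up trips
sample = pd.DataFrame({
    "next_pickup_location": ["Airports", "Airports", "Airports", "Bronx"],
    "next_dropoff_location": ["Midtown East", "Midtown East", "Airports", "Bronx"],
    "next_fare": [52.0, 48.0, 10.0, 12.0],
})
fare, P = estimate_inputs(sample)
```

In the toy sample, two of the three airport pickups end in Midtown East, so that entry of the airport row is 2/3. Any NaN rows in P need to be handled (for example, by merging sparse zones or dropping them) before the matrix is used in Step 4.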

Step 4: Estimate the Structural Model

We assume drivers maximize expected discounted utility, with discount factor \(\beta = 0.99\).

Payoff function

Assume the driver’s per-period utility from choosing zone \(a\) is:

\[u(a) = \theta_1 \cdot \text{Fare}(a) + \theta_2 \cdot 1\{a = \text{Airports}\}\]

where:

\(\text{Fare}(a)\) is the average fare of trips picked up in zone \(a\)
\(1\{a = \text{Airports}\}\) is an indicator equal to 1 if \(a\) is the Airports zone and 0 otherwise
\(\theta_1\) and \(\theta_2\) are parameters to estimate: \(\theta_1\) measures how strongly drivers prefer zones with higher expected fares, and \(\theta_2\) measures the extra attractiveness of the airport zone beyond its fare.
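
The payoff above is deterministic, so on its own it cannot produce choice probabilities. One common way to close the model, and an assumption not stated in this handout, is to add an i.i.d. Type 1 extreme value shock to each zone's payoff. Under that assumption, with discount factor \(\beta = 0.99\), transition probabilities \(P(z' \mid a)\), and Euler's constant \(\gamma \approx 0.5772\), the choice-specific value, the Bellman equation, and the implied choice probabilities in state \(s\) would be:

\[
\begin{aligned}
v(a) &= u(a) + \beta \sum_{z'} P(z' \mid a)\, V(z') \\
V(s) &= \gamma + \log \sum_{a'} \exp\big(v(a')\big) \\
\Pr(a \mid s) &= \frac{\exp\big(v(a)\big)}{\sum_{a'} \exp\big(v(a')\big)}
\end{aligned}
\]

With the payoff exactly as specified, neither \(u(a)\) nor \(P(z' \mid a)\) depends on the current state \(s\), so the right-hand side of the Bellman equation is identical for every \(s\). The solved \(V\) is then constant across zones, the continuation term \(\beta \sum_{z'} P(z' \mid a) V(z')\) is the same for every \(a\), and \(\Pr(a \mid s)\) collapses to a static logit in \(u(a)\). The future-value channel only affects choices once some payoff term depends on \(s\), for example a repositioning cost that grows with the distance between \(s\) and \(a\).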

Nested Fixed Point Estimation

Estimate the parameters \(\theta_1\) and \(\theta_2\) using the nested fixed point algorithm:

Guess values of \(\theta_1\) and \(\theta_2\)
Solve the Bellman equation for the value function
Compute model-implied choice probabilities
Evaluate the log likelihood of the observed actions
Update the parameters and repeat until the log likelihood is maximized
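
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. It assumes additive i.i.d. Type 1 extreme value shocks (an assumption not stated in this handout), which give the Bellman equation a log-sum-exp form with Euler's constant and make the choice probabilities logit. It solves the Bellman equation by plain value iteration and uses SciPy's Nelder-Mead search for the outer loop. All function names are hypothetical. Here, states and actions are integer zone indices for dropoff_location and next_pickup_location, fare is the length-15 array of average fares, P is the 15 × 15 matrix from Step 3 with no empty rows, and airport_idx is the position of the Airports zone in that ordering.

```python
import numpy as np
from scipy.optimize import minimize

EULER_GAMMA = 0.5772156649015329  # mean of a standard Type 1 extreme value shock


def flow_utility(theta, fare, airport_idx, n_states):
    """u(s, a) = theta1 * Fare(a) + theta2 * 1{a = Airports}, repeated for every state s."""
    u_a = theta[0] * np.asarray(fare, dtype=float)
    u_a[airport_idx] += theta[1]
    return np.tile(u_a, (n_states, 1))


def log_sum_exp(v):
    """Row-wise log(sum(exp(v))), computed stably."""
    m = v.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(v - m).sum(axis=1))


def solve_bellman(u, P, beta=0.99, tol=1e-10, max_iter=100_000):
    """Value iteration on V(s) = gamma + log sum_a exp(u(s, a) + beta * sum_z' P(z'|a) V(z')).

    u: (S, A) flow utilities; P: (A, S) transition matrix whose rows sum to 1.
    Returns V with shape (S,) and choice-specific values v with shape (S, A).
    """
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        v = u + beta * (P @ V)[None, :]
        V_new = EULER_GAMMA + log_sum_exp(v)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, u + beta * (P @ V)[None, :]


def neg_log_likelihood(theta, states, actions, fare, P, airport_idx, beta=0.99):
    """Inner loop: solve the Bellman equation at theta, then score the observed choices."""
    u = flow_utility(theta, fare, airport_idx, P.shape[1])
    _, v = solve_bellman(u, P, beta)
    log_probs = v - log_sum_exp(v)[:, None]  # log Pr(a | s), shape (S, A)
    return -log_probs[states, actions].sum()


def estimate(states, actions, fare, P, airport_idx, beta=0.99, theta0=(0.0, 0.0)):
    """Outer loop: search over (theta1, theta2) to maximize the log likelihood."""
    states, actions = np.asarray(states), np.asarray(actions)
    P = np.asarray(P, dtype=float)
    res = minimize(neg_log_likelihood, np.asarray(theta0, dtype=float),
                   args=(states, actions, fare, P, airport_idx, beta),
                   method="Nelder-Mead")
    return res.x, -res.fun
```

A call would look like estimate(states, actions, fare, P, airport_idx) and returns the estimated (θ1, θ2) and the maximized log likelihood. Nelder-Mead is used only because it needs no gradients. With β = 0.99, each inner solve takes on the order of a few thousand value-iteration steps, which is slow but manageable with 15 zones. Rescaling fares (for example, to tens of dollars) can make the search better conditioned. Because solve_bellman accepts any (S, A) utility array, a state-dependent term such as a repositioning cost could be added without changing the solver.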

Step 5: Your Final Report

Your project report should include the following sections.

1. Data

Explain:

which taxi dataset you used
how you restricted the sample
how you defined consecutive trips
how you mapped locations into the 15 zones

2. Descriptive Statistics

Provide tables and plots describing:

number of observations by zone
average fare by zone
transition frequencies across zones

3. Model

Clearly define:

the state space
the choice set
the transition matrix
the payoff function
the estimation strategy

4. Results

Report:

the estimated parameters
the interpretation of those parameters
the model-implied choice probabilities

5. Discussion

Discuss:

whether drivers appear to prefer zones with higher fares
how much extra value the airport parameter assigns to the airport zone relative to other zones
what the model captures well
what the model leaves out

Discuss limitations such as:

interpreting the next pickup zone as a deliberate decision
ignoring repositioning costs
ignoring congestion and travel time

Upload your final project to Canvas before Friday, March 20 at 11:59pm.