Extra Credit Project: DDC Model on Taxi Drivers

This extra credit project can add up to +5 to your final grade. Note: this is not a group project. If two submissions are too similar, both students will receive a zero.

Overview

In this project, you will build and estimate a simple dynamic discrete choice (DDC) model using New York City taxi trip data.

The idea:

  • After completing a trip, a taxi driver must decide where to go to look for the next passenger. Different neighborhoods offer different expected earnings and different future opportunities.
  • Drivers are assumed to be forward-looking. When deciding where to go next, they consider both:
    • the expected fare from the next pickup location
    • the future value of ending up in different parts of the city afterward.
  • Your goal is to estimate a structural model describing these decisions.

Step 1: Get NYC Taxi Trip Data

You will need taxi trip data that includes at least:

  • a taxi identifier (driver or cab ID)
  • pickup location
  • dropoff location
  • pickup timestamp
  • dropoff timestamp
  • trip fare

The data must allow you to observe consecutive trips by the same taxi.

Step 2: Prepare the Data

Restricting the Sample

To study driver decisions in a consistent environment, restrict the data to a specific time window, for example weekday trips between 7:00 AM and 11:00 AM.

This avoids mixing together very different market conditions such as rush hour, late night, and weekends.
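A minimal pandas sketch of this filter is below. It assumes the raw trips sit in a DataFrame with a hypothetical pickup_datetime column (rename it to match your dataset) and that the window is applied to the pickup time, keeping pickups from 7:00 up to but not including 11:00.

```python
import pandas as pd


def restrict_window(trips: pd.DataFrame, start_hour: int = 7, end_hour: int = 11) -> pd.DataFrame:
    """Keep weekday trips whose pickup time falls in [start_hour, end_hour)."""
    ts = pd.to_datetime(trips["pickup_datetime"])
    is_weekday = ts.dt.dayofweek < 5  # Monday=0, ..., Friday=4
    in_window = (ts.dt.hour >= start_hour) & (ts.dt.hour < end_hour)
    return trips.loc[is_weekday & in_window].copy()


# Tiny made-up sample: Monday 7:15, Monday 12:30, Saturday 8:00
trips = pd.DataFrame({
    "taxi_id": [1, 1, 2],
    "pickup_datetime": ["2024-03-04 07:15", "2024-03-04 12:30", "2024-03-09 08:00"],
})
sample = restrict_window(trips)
print(sample["taxi_id"].tolist())  # [1]
```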

Construct Consecutive Trips

Sort trips by taxi identifier and pickup time. For each trip, construct a new variable, next_pickup_location: the pickup location of the next trip by the same taxi. Construct next_dropoff_location (the dropoff location of that next trip) the same way.

You’ll also want to drop observations where:

  • the taxi has no subsequent trip
  • the gap between the dropoff and the next pickup is very long (for example, more than 1 hour), which likely indicates the end of a shift rather than a repositioning decision.
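One way to build consecutive trips and apply both drop rules in pandas is sketched below. The column names (taxi_id, pickup_datetime, dropoff_datetime, pickup_location, dropoff_location) are assumptions, and the gap is measured from a trip's dropoff to the same taxi's next pickup.

```python
import pandas as pd


def add_next_trip(trips: pd.DataFrame, max_gap=pd.Timedelta(hours=1)) -> pd.DataFrame:
    """Attach the same taxi's next trip to each trip and drop unusable rows."""
    df = trips.copy()
    df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"])
    df["dropoff_datetime"] = pd.to_datetime(df["dropoff_datetime"])
    df = df.sort_values(["taxi_id", "pickup_datetime"])

    # Shift within each taxi so row i sees trip i+1 of the same taxi
    cols = ["pickup_location", "dropoff_location", "pickup_datetime"]
    nxt = df.groupby("taxi_id")[cols].shift(-1)
    df["next_pickup_location"] = nxt["pickup_location"]
    df["next_dropoff_location"] = nxt["dropoff_location"]
    gap = nxt["pickup_datetime"] - df["dropoff_datetime"]  # idle time between trips

    # Drop each taxi's last trip (gap is NaT) and gaps longer than max_gap
    keep = gap.notna() & (gap <= max_gap)
    return df.loc[keep].reset_index(drop=True)


trips = pd.DataFrame({
    "taxi_id": [1, 1, 1, 2],
    "pickup_datetime": ["2024-03-04 07:00", "2024-03-04 07:30", "2024-03-04 09:30", "2024-03-04 08:00"],
    "dropoff_datetime": ["2024-03-04 07:20", "2024-03-04 07:50", "2024-03-04 09:45", "2024-03-04 08:10"],
    "pickup_location": ["Upper East Side", "Midtown West", "Bronx", "Bronx"],
    "dropoff_location": ["Midtown West", "Bronx", "Harlem", "Bronx"],
})
consec = add_next_trip(trips)
print(len(consec))  # 1: the 1h40m gap and both taxis' last trips are dropped
```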

Define the 15 taxi zones

Replace the detailed taxi locations with the following 15 zones:

  1. Upper Manhattan (Washington Heights, Inwood, Hamilton Heights)
  2. Harlem / Morningside Heights
  3. Upper East Side
  4. Upper West Side
  5. Midtown East (Grand Central, Turtle Bay)
  6. Midtown West / Times Square
  7. Chelsea / Flatiron / Gramercy
  8. Lower Manhattan (SoHo, Tribeca, Financial District, Chinatown)
  9. North Brooklyn (Williamsburg, Greenpoint)
  10. Downtown Brooklyn / DUMBO
  11. South & Central Brooklyn
  12. Long Island City / Astoria
  13. Other Queens
  14. Bronx
  15. Airports (JFK and LaGuardia combined)

After mapping locations into these zones, drop any observation in which dropoff_location, next_pickup_location, or next_dropoff_location falls outside these 15 zones.
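The mapping and the drop rule can be combined in one step, as in the sketch below. The lookup table is hypothetical and deliberately partial: its keys stand in for whatever location names or codes your dataset uses, and you would need to fill in every location yourself. Any label missing from the lookup is treated as outside the 15-zone system.

```python
import pandas as pd

# Hypothetical, partial lookup from raw location labels to the study zones.
# Replace the keys with the location codes or names used in your dataset.
ZONE_LOOKUP = {
    "Washington Heights": "Upper Manhattan",
    "Inwood": "Upper Manhattan",
    "Williamsburg": "North Brooklyn",
    "Greenpoint": "North Brooklyn",
    "JFK Airport": "Airports",
    "LaGuardia Airport": "Airports",
}
LOCATION_COLS = ["dropoff_location", "next_pickup_location", "next_dropoff_location"]


def map_to_zones(df: pd.DataFrame, lookup=ZONE_LOOKUP, cols=LOCATION_COLS) -> pd.DataFrame:
    """Recode raw locations to zones and drop rows that leave the zone system."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].map(lookup)  # labels missing from the lookup become NaN
    return out.dropna(subset=cols).reset_index(drop=True)


rows = pd.DataFrame({
    "dropoff_location": ["Inwood", "Williamsburg"],
    "next_pickup_location": ["JFK Airport", "Greenpoint"],
    "next_dropoff_location": ["Washington Heights", "Staten Island"],  # not in any zone
})
print(map_to_zones(rows).values.tolist())
# [['Upper Manhattan', 'Airports', 'Upper Manhattan']]
```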

Data set structure

After these steps, your data should look like this:

dropoff_location | next_pickup_location | next_dropoff_location
Upper Manhattan | Upper East Side | Upper East Side
Midtown West | Lower Manhattan | Bronx

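  • The state is the dropoff_location.
  • The action is the next_pickup_location.
  • The next state is the next_dropoff_location. The transition probabilities come from how often each next_pickup_location leads to each next_dropoff_location.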
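For estimation it is convenient to store states and actions as integer indices. The sketch below assumes zone labels of its own choosing (short versions of the Step 2 names) and numbers the zones 0 to 14 in the order of the Step 2 list, so Airports is index 14. Any label not in the list raises an error rather than passing silently.

```python
import pandas as pd

# Assumed labels; index 0 = zone 1 (Upper Manhattan), ..., 14 = zone 15 (Airports)
ZONES = [
    "Upper Manhattan", "Harlem / Morningside Heights", "Upper East Side",
    "Upper West Side", "Midtown East", "Midtown West / Times Square",
    "Chelsea / Flatiron / Gramercy", "Lower Manhattan", "North Brooklyn",
    "Downtown Brooklyn / DUMBO", "South & Central Brooklyn",
    "Long Island City / Astoria", "Other Queens", "Bronx", "Airports",
]
ZONE_INDEX = {name: i for i, name in enumerate(ZONES)}


def encode(df: pd.DataFrame):
    """Return integer arrays (state s, action a, next state s') for estimation."""
    def to_idx(col):
        # astype(int) fails loudly if any label was not found in ZONE_INDEX
        return df[col].map(ZONE_INDEX).astype(int).to_numpy()

    return to_idx("dropoff_location"), to_idx("next_pickup_location"), to_idx("next_dropoff_location")


df = pd.DataFrame({
    "dropoff_location": ["Upper Manhattan"],
    "next_pickup_location": ["Upper East Side"],
    "next_dropoff_location": ["Upper East Side"],
})
s, a, s_next = encode(df)
print(s, a, s_next)  # [0] [2] [2]
```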

Step 3: Estimate Non-Structural Inputs

Before estimating the structural model, compute two objects directly from the data:

  • Average fare by pickup zone
  • Markov transition probabilities
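The average fare by pickup zone is a grouped mean. A NumPy sketch, assuming pickup zones are already encoded as integers 0 to 14 and fares are in a parallel array (both names are illustrative):

```python
import numpy as np


def average_fare_by_zone(pickup_zone, fare, n_zones=15):
    """Mean fare of trips starting in each zone; NaN where a zone has no pickups."""
    pickup_zone = np.asarray(pickup_zone, dtype=int)
    fare = np.asarray(fare, dtype=float)
    totals = np.bincount(pickup_zone, weights=fare, minlength=n_zones)
    counts = np.bincount(pickup_zone, minlength=n_zones)
    with np.errstate(divide="ignore", invalid="ignore"):
        return totals / counts


avg = average_fare_by_zone([0, 0, 2], [10.0, 20.0, 30.0])
# avg[0] is 15.0, avg[1] is NaN (no pickups), avg[2] is 30.0
```

A NaN entry signals a zone with no pickups in your sample; Fare(a) is undefined there, so check for this before estimating.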

Markov transition probabilities

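Estimate the probability that a trip beginning in zone \(a\) ends in zone \(z'\) as the share of trips picked up in \(a\) that are dropped off in \(z'\):

\[\hat{P}(z' \mid a) = \frac{N(a, z')}{\sum_{z''} N(a, z'')}\]

where \(N(a, z')\) is the number of trips with pickup zone \(a\) and dropoff zone \(z'\).

Then construct a 15 × 15 transition matrix where: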

  • rows = pickup zones
  • columns = dropoff zones
  • entries = transition probabilities

Each row must sum exactly to 1.

This matrix describes the part of the environment that the driver does not control but can anticipate from experience.
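The frequency estimator can be computed by counting (pickup, dropoff) pairs and normalizing each row. The sketch below assumes integer-encoded zones, for example the encoded next_pickup_location and next_dropoff_location. It raises an error for a pickup zone with no trips, since that row has no estimate; how to handle such a zone is left to you.

```python
import numpy as np


def transition_matrix(pickup_zone, dropoff_zone, n_zones=15):
    """Frequency estimate of P(z' | a): rows = pickup zones, columns = dropoff zones."""
    counts = np.zeros((n_zones, n_zones))
    rows = np.asarray(pickup_zone, dtype=int)
    cols = np.asarray(dropoff_zone, dtype=int)
    np.add.at(counts, (rows, cols), 1.0)  # N(a, z'), counting repeated pairs
    row_totals = counts.sum(axis=1, keepdims=True)
    if np.any(row_totals == 0):
        raise ValueError("a pickup zone has no trips, so its row is undefined")
    P = counts / row_totals
    assert np.allclose(P.sum(axis=1), 1.0)  # each row is a probability distribution
    return P


P = transition_matrix([0, 0, 1], [1, 0, 1], n_zones=2)
# P is [[0.5, 0.5], [0.0, 1.0]]
```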

Step 4: Estimate the Structural Model

We assume drivers maximize expected discounted utility, with discount factor \(\beta = 0.99\).

Payoff function

Assume the driver’s per-period utility from choosing zone \(a\) is:

\[u(a) = \theta_1 \cdot \text{Fare}(a) + \theta_2 \cdot 1\{a = \text{Airports}\}\]

where:

  • \(\text{Fare}(a)\) is the average fare of trips picked up in zone \(a\)
  • \(1\{a=\text{Airports}\}\) is an indicator for the airport zone
  • \(\theta_1\) and \(\theta_2\) are parameters to estimate: \(\theta_1\) measures how strongly drivers prefer zones with higher expected fares, and \(\theta_2\) captures the extra attractiveness of the airport zone.
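Solving the Bellman equation and computing choice probabilities requires a stochastic choice component, which the payoff function alone does not pin down. A standard choice, assumed here rather than stated in the assignment, is that the driver's utility from zone \(a\) is \(u(a) + \varepsilon(a)\) with \(\varepsilon(a)\) i.i.d. Type I extreme value. With \(s\) the current dropoff zone, discount factor \(\beta = 0.99\), and \(P(z' \mid a)\) the Step 3 transition probabilities, this gives:

\[V(s) = \gamma + \log \sum_{a=1}^{15} \exp\big(v(a)\big), \qquad v(a) = u(a) + \beta \sum_{z'=1}^{15} P(z' \mid a)\, V(z')\]

\[\Pr(a \mid s) = \frac{\exp\big(v(a)\big)}{\sum_{b=1}^{15} \exp\big(v(b)\big)}\]

Here \(\gamma \approx 0.5772\) is Euler's constant. It raises every value by the same amount and leaves the choice probabilities unchanged.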

Nested Fixed Point Estimation

Estimate the parameters \(\theta_1\) and \(\theta_2\) using the nested fixed point algorithm:

  1. Guess values of \(\theta_1\) and \(\theta_2\)
  2. Solve the Bellman equation for the value function
  3. Compute model-implied choice probabilities
  4. Evaluate the log likelihood of the observed actions
  5. Update the parameters and repeat steps 2 to 4 until the log likelihood is maximized
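The five steps can be sketched in Python with NumPy and SciPy, under stated assumptions. Choice shocks are assumed i.i.d. Type I extreme value (the text does not specify this). Zones are assumed indexed 0 to 14 with Airports last. fare is assumed to be the 15-vector of average fares, P the Step 3 matrix, and s and a integer arrays of observed states and actions. All function names are illustrative. The inner loop uses Newton-Kantorovich steps instead of plain value iteration, which converges slowly at \(\beta = 0.99\). The outer loop uses Nelder-Mead because it needs no gradients; another optimizer would also work.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

BETA = 0.99   # discount factor from Step 4
AIRPORT = 14  # assumed 0-based index of zone 15 (Airports)


def flow_utility(theta, fare):
    """u(a) = theta1 * Fare(a) + theta2 * 1{a = Airports}, one entry per zone."""
    fare = np.asarray(fare, dtype=float)
    return theta[0] * fare + theta[1] * (np.arange(fare.size) == AIRPORT)


def solve_bellman(U, P, beta=BETA, tol=1e-10, max_iter=100):
    """Inner loop: solve V = Gamma(V) for the logit Bellman operator.

    U[s, a] is the flow payoff and P[a, z'] the transition matrix.
    Each Newton-Kantorovich step solves (I - beta * Pi @ P) dV = Gamma(V) - V.
    """
    n = U.shape[0]
    V = np.zeros(n)
    for _ in range(max_iter):
        v = U + beta * (P @ V)[None, :]   # choice-specific values
        lse = logsumexp(v, axis=1)
        gamma_V = np.euler_gamma + lse    # Bellman operator Gamma(V)
        prob = np.exp(v - lse[:, None])   # logit choice probabilities Pi
        jac = np.eye(n) - beta * prob @ P # I - dGamma/dV
        V_new = V + np.linalg.solve(jac, gamma_V - V)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    raise RuntimeError("Bellman solver did not converge")


def log_choice_probabilities(theta, fare, P, beta=BETA):
    """Model-implied log Pr(a | s) as an (n_states, n_actions) array."""
    u = flow_utility(theta, fare)
    U = np.broadcast_to(u, (P.shape[1], u.size))  # payoff does not vary with s
    V = solve_bellman(U, P, beta)
    v = U + beta * (P @ V)[None, :]
    return v - logsumexp(v, axis=1, keepdims=True)


def log_likelihood(theta, s, a, fare, P, beta=BETA):
    """Sum of log Pr(a_i | s_i) over observed (state, action) pairs."""
    logp = log_choice_probabilities(theta, fare, P, beta)
    return logp[np.asarray(s, dtype=int), np.asarray(a, dtype=int)].sum()


def estimate(s, a, fare, P, theta0=(0.0, 0.0), beta=BETA):
    """Outer loop: maximize the log likelihood over (theta1, theta2)."""
    def objective(theta):
        return -log_likelihood(theta, s, a, fare, P, beta)

    return minimize(objective, x0=np.asarray(theta0, dtype=float), method="Nelder-Mead",
                    options={"xatol": 1e-6, "fatol": 1e-6, "maxiter": 5000})
```

After res = estimate(s, a, fare, P), res.x holds \((\hat\theta_1, \hat\theta_2)\), and np.exp(log_choice_probabilities(res.x, fare, P)) gives the model-implied choice probabilities asked for in the Results section.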

Step 5: Your Final Report

Your project report should include the following sections.

1. Data

Explain:

  • which taxi dataset you used
  • how you restricted the sample
  • how you defined consecutive trips
  • how you mapped locations into the 15 zones

2. Descriptive Statistics

Provide tables and plots describing:

  • number of observations by zone
  • average fare by zone
  • transition frequencies across zones
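All three tables can come from one zone-level DataFrame. The sketch below assumes hypothetical columns pickup_location, dropoff_location, and fare holding already-mapped zone labels and counts observations by pickup zone. Passing normalize="index" to pd.crosstab turns the counts into row shares. For plots, pandas' plotting methods (which require matplotlib) work on these tables, for example by_zone["avg_fare"].plot.bar().

```python
import pandas as pd


def describe_zones(trips: pd.DataFrame):
    """Per-zone trip counts and average fares, plus pickup-to-dropoff counts."""
    by_zone = trips.groupby("pickup_location")["fare"].agg(n_trips="size", avg_fare="mean")
    transitions = pd.crosstab(trips["pickup_location"], trips["dropoff_location"])
    return by_zone, transitions


trips = pd.DataFrame({
    "pickup_location": ["Bronx", "Bronx", "Airports"],
    "dropoff_location": ["Bronx", "Airports", "Bronx"],
    "fare": [10.0, 20.0, 50.0],
})
by_zone, transitions = describe_zones(trips)
```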

3. Model

Clearly define:

  • the state space
  • the choice set
  • the transition matrix
  • the payoff function
  • the estimation strategy

4. Results

Report:

  • the estimated parameters
  • the interpretation of those parameters
  • the model-implied choice probabilities

5. Discussion

Discuss:

  • whether drivers appear to prefer zones with higher fares
  • how attractive the airport zone is relative to other zones, as measured by the airport parameter
  • what the model captures well
  • what the model leaves out

Discuss limitations such as:

  • interpreting the next pickup zone as a deliberate decision
  • ignoring repositioning costs
  • ignoring congestion and travel time

Upload your final project to Canvas before Friday, March 20 at 11:59pm.