Extra Credit Project: DDC Model on Taxi Drivers
This extra credit project can add up to +5 to your final grade. Note: this is an individual project, not a group project. If two submissions are too similar, both will receive a zero.
- Option A: follow the outline below with the taxi driver problem.
- Option B: follow the outline below, but with any DDC model idea you have. Make sure you can get data describing the choices that experts make in response to some measure of the state, and that those choices are made repeatedly over time to maximize long-run payoffs or minimize long-run costs.
Overview
In this project, you will build and estimate a simple dynamic discrete choice (DDC) model using New York City taxi trip data.
The idea:
- After completing a trip, a taxi driver must decide where to go to look for the next passenger. Different neighborhoods offer different expected earnings and different future opportunities.
- Drivers are assumed to be forward-looking. When deciding where to go next, they consider both:
  - the expected fare from the next pickup location
  - the future value of ending up in different parts of the city afterward.
- Your goal is to estimate a structural model describing these decisions.
Step 1: Get NYC Taxi Trip Data
You will need taxi trip data that includes at least:
- a taxi identifier (driver or cab ID)
- pickup location
- dropoff location
- pickup timestamp
- dropoff timestamp
- trip fare
The data must allow you to observe consecutive trips by the same taxi.
Step 2: Prepare the Data
Restricting the Sample
To study driver decisions in a consistent environment, restrict the data to a specific time window. For example: “weekday trips between 7:00AM and 11:00AM”.
This avoids mixing together very different market conditions such as rush hour, late night, and weekends.
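A minimal pandas sketch of this restriction follows. It assumes the trips are in a DataFrame with a datetime column named `pickup_time` (a hypothetical name; use whatever your dataset provides), and it treats the window as 7:00 inclusive to 11:00 exclusive, which is one possible reading of the example above.

```python
import pandas as pd

def restrict_sample(trips, time_col="pickup_time", start_hour=7, end_hour=11):
    """Keep weekday trips whose pickup hour falls in [start_hour, end_hour)."""
    t = trips[time_col]
    is_weekday = t.dt.dayofweek < 5  # Monday=0, ..., Friday=4
    in_window = (t.dt.hour >= start_hour) & (t.dt.hour < end_hour)
    return trips.loc[is_weekday & in_window]

# Toy timestamps (made up): 2024-01-01 is a Monday, 2024-01-06 a Saturday.
toy = pd.DataFrame({"pickup_time": pd.to_datetime([
    "2024-01-01 07:00", "2024-01-01 08:30",
    "2024-01-01 11:00", "2024-01-06 08:30",
])})
morning = restrict_sample(toy)  # keeps only the 07:00 and 08:30 Monday rows
```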
Construct Consecutive Trips
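Sort trips by taxi identifier and pickup time. For each trip, construct a new variable: next_pickup_location. This is the pickup location of the next trip by the same taxi. Also record next_dropoff_location, the dropoff location of that next trip; you will need it later for the transition probabilities.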
You’ll also want to drop observations where:
- the taxi has no subsequent trip
- the gap between trips is very long (for example more than 1 hour), which likely indicates the end of a shift rather than a repositioning decision.
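The sorting, shifting, and dropping steps above could be sketched in pandas as follows. The column names `taxi_id`, `pickup_time`, and `dropoff_time` are assumptions for illustration, and the gap is measured here from the dropoff of one trip to the pickup of the next, which is one reasonable definition rather than the only one.

```python
import pandas as pd

def add_next_trip(trips, max_gap="1h"):
    """Attach the next trip's pickup/dropoff zone for the same taxi, then filter."""
    trips = trips.sort_values(["taxi_id", "pickup_time"]).copy()
    cols = ["pickup_location", "dropoff_location", "pickup_time"]
    nxt = trips.groupby("taxi_id")[cols].shift(-1)  # next row within each taxi
    trips["next_pickup_location"] = nxt["pickup_location"]
    trips["next_dropoff_location"] = nxt["dropoff_location"]
    gap = nxt["pickup_time"] - trips["dropoff_time"]
    # Drop trips with no subsequent trip or with a gap longer than max_gap.
    keep = nxt["pickup_location"].notna() & (gap <= pd.Timedelta(max_gap))
    return trips.loc[keep]

# Toy rows (made up): taxi A has three trips, taxi B has one.
toy = pd.DataFrame({
    "taxi_id": ["A", "A", "A", "B"],
    "pickup_location": ["Bronx", "Upper East Side", "Midtown West", "Harlem"],
    "dropoff_location": ["Harlem", "Midtown East", "Chelsea", "Bronx"],
    "pickup_time": pd.to_datetime(["2024-01-01 08:00", "2024-01-01 08:20",
                                   "2024-01-01 10:30", "2024-01-01 09:00"]),
    "dropoff_time": pd.to_datetime(["2024-01-01 08:10", "2024-01-01 08:40",
                                    "2024-01-01 10:45", "2024-01-01 09:15"]),
})
pairs = add_next_trip(toy)
```

In the toy data only the first trip of taxi A survives: its second trip is followed by a gap of almost two hours, and the last trip of each taxi has no successor.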
Define the 15 taxi zones
Replace the detailed taxi locations with the following 15 zones:
- Upper Manhattan (Washington Heights, Inwood, Hamilton Heights)
- Harlem / Morningside Heights
- Upper East Side
- Upper West Side
- Midtown East (Grand Central, Turtle Bay)
- Midtown West / Times Square
- Chelsea / Flatiron / Gramercy
- Lower Manhattan (SoHo, Tribeca, Financial District, Chinatown)
- North Brooklyn (Williamsburg, Greenpoint)
- Downtown Brooklyn / DUMBO
- South & Central Brooklyn
- Long Island City / Astoria
- Other Queens
- Bronx
- Airports (JFK and LaGuardia combined)
After mapping locations into these zones, drop any observation whose dropoff, next pickup, or next dropoff location falls outside this 15-zone system.
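One possible way to do the mapping is a lookup table applied to each location column. The sketch below assumes your raw data identifies locations by neighborhood name, which may not be true (many taxi datasets use numeric zone IDs or coordinates, in which case the keys would differ). The lookup is deliberately partial and uses only neighborhoods named in the list above; `ZONE_OF` and `to_zones` are hypothetical names.

```python
import pandas as pd

# Partial lookup (illustration only): extend it to every location in your data.
ZONE_OF = {
    "Washington Heights": "Upper Manhattan",
    "Inwood": "Upper Manhattan",
    "Hamilton Heights": "Upper Manhattan",
    "Williamsburg": "North Brooklyn",
    "Greenpoint": "North Brooklyn",
    "JFK": "Airports",
    "LaGuardia": "Airports",
}

LOCATION_COLS = ("dropoff_location", "next_pickup_location", "next_dropoff_location")

def to_zones(df, cols=LOCATION_COLS):
    """Replace detailed locations with zones; drop rows that leave the zone system."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].map(ZONE_OF)  # locations not in the lookup become NaN
    return out.dropna(subset=list(cols))

# Toy rows (made up); "Staten Island" is not in the lookup, so row 2 is dropped.
toy = pd.DataFrame({
    "dropoff_location": ["Inwood", "Greenpoint"],
    "next_pickup_location": ["JFK", "Williamsburg"],
    "next_dropoff_location": ["Washington Heights", "Staten Island"],
})
zoned = to_zones(toy)
```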
Data set structure
After these steps, your data should look like this:
| dropoff_location | next_pickup_location | next_dropoff_location |
|---|---|---|
| Upper Manhattan | Upper East Side | Upper East Side |
| Midtown West | Lower Manhattan | Bronx |
- The state is the dropoff_location.
- The action is the next_pickup_location.
- The next state is the next_dropoff_location; use it to estimate the transition probabilities.
Step 3: Estimate Non-Structural Inputs
Before estimating the structural model, compute two objects directly from the data:
- Average fare by pickup zone
- Markov transition probabilities
Markov transition probabilities
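Estimate the probability that a trip beginning in zone \(a\) ends in zone \(z'\) as the share of trips picked up in \(a\) that are dropped off in \(z'\):
\[\hat{P}(z' \mid a) = \frac{\#\{\text{trips picked up in } a \text{ and dropped off in } z'\}}{\#\{\text{trips picked up in } a\}}\]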
Then construct a 15x15 transition matrix where:
- rows = pickup zones
- columns = dropoff zones
- entries = transition probabilities
Each row must sum exactly to 1.
This matrix represents the part of the environment that the driver does not control, but might be able to predict or have an intuitive understanding of from experience.
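Both non-structural inputs can be computed with a few lines of pandas. The sketch below assumes a trip-level DataFrame with columns `pickup_location`, `dropoff_location`, and `fare` (the name `fare` is an assumption); the column arguments can instead point at the next_pickup_location and next_dropoff_location variables if you work from the consecutive-trip data. Three zones and made-up numbers keep the example small.

```python
import numpy as np
import pandas as pd

def transition_matrix(trips, zones, pickup_col="pickup_location",
                      dropoff_col="dropoff_location"):
    """Row-normalized pickup-to-dropoff frequencies (rows = pickup zones)."""
    counts = pd.crosstab(trips[pickup_col], trips[dropoff_col])
    counts = counts.reindex(index=zones, columns=zones, fill_value=0)
    # A zone with no pickups would give a row of NaN; check for that in real data.
    return counts.div(counts.sum(axis=1), axis=0)

def average_fare(trips, zones, pickup_col="pickup_location", fare_col="fare"):
    """Mean fare of trips starting in each zone, ordered like `zones`."""
    return trips.groupby(pickup_col)[fare_col].mean().reindex(zones)

# Toy data (made up) with three zones instead of fifteen.
zones = ["Upper East Side", "Bronx", "Airports"]
toy = pd.DataFrame({
    "pickup_location": ["Upper East Side"] * 4 + ["Bronx"] * 2 + ["Airports"],
    "dropoff_location": ["Upper East Side", "Bronx", "Bronx", "Airports",
                         "Upper East Side", "Upper East Side", "Upper East Side"],
    "fare": [10.0, 20.0, 30.0, 40.0, 15.0, 25.0, 60.0],
})
P = transition_matrix(toy, zones)
fares = average_fare(toy, zones)
```

Because of floating-point rounding, it is safer to verify the row sums with a tolerance, for example `np.allclose(P.sum(axis=1), 1.0)`, than with an exact equality test.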
Step 4: Estimate the Structural Model
We assume drivers maximize expected discounted utility. Set the discount factor to \(\beta = 0.99\).
Payoff function
Assume the driver’s per-period utility from choosing zone \(a\) is:
\[u(a) = \theta_1 \cdot \text{Fare}(a) + \theta_2 \cdot 1\{a = \text{Airports}\}\]
where:
- \(\text{Fare}(a)\) is the average fare in zone \(a\)
- \(1\{a=\text{Airports}\}\) is an indicator for the airport zone
- \(\theta_1\) and \(\theta_2\) are parameters to estimate. They represent how strongly a driver prefers zones with higher expected fares, and the extra attractiveness of the airport zone.
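The payoff function alone does not pin down choice probabilities. One common way to close the model, which is an assumption here and not something stated in the assignment, is to add an i.i.d. Type I extreme value shock \(\varepsilon_a\) to the payoff of each option. Under that assumption, with \(\beta\) the discount factor and \(P(z' \mid a)\) the probability that a trip picked up in zone \(a\) ends in zone \(z'\), the choice-specific value \(v_\theta(a)\), the expected value \(\bar{V}_\theta(z)\) of being in state \(z\) (up to an additive constant that does not affect choices), and the choice probabilities would satisfy:

\[v_\theta(a) = u(a) + \beta \sum_{z'} P(z' \mid a)\, \bar{V}_\theta(z')\]
\[\bar{V}_\theta(z) = \log \sum_{a'} \exp\{v_\theta(a')\}\]
\[\Pr(a \mid z; \theta) = \frac{\exp\{v_\theta(a)\}}{\sum_{a'} \exp\{v_\theta(a')\}}\]

Note that in this specification neither \(u(a)\) nor \(P(z' \mid a)\) depends on the current state \(z\), so the right-hand sides are the same for every \(z\).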
Nested Fixed Point Estimation
Estimate the parameters \(\theta_1\) and \(\theta_2\) using the nested fixed point algorithm:
- Guess values of \(\theta_1\) and \(\theta_2\)
- Solve the Bellman equation for the value function
- Compute model-implied choice probabilities
- Evaluate the log likelihood of the observed actions
- Update parameters to maximize the log likelihood
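The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes i.i.d. Type I extreme value (logit) shocks, which the assignment does not specify; it codes zones as integers 0, 1, 2, ...; it takes `fare`, `airport`, and `P` as NumPy arrays; and it uses SciPy's Nelder-Mead optimizer as one possible choice for the outer loop. All function names and the toy numbers are made up.

```python
import numpy as np
from scipy.optimize import minimize

BETA = 0.99  # discount factor given in the assignment

def log_sum_exp(v):
    """Numerically stable log(sum(exp(v)))."""
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def log_choice_probs(theta, fare, airport, P, beta=BETA, tol=1e-8, max_iter=20_000):
    """Inner loop: solve the logit Bellman equation by value iteration."""
    u = theta[0] * fare + theta[1] * airport      # per-period payoff u(a)
    V = np.zeros(len(fare))                       # expected value by state
    for _ in range(max_iter):
        v = u + beta * (P @ V)                    # choice-specific value v(a)
        V_new = np.full(len(fare), log_sum_exp(v))  # same for every state here
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    v = u + beta * (P @ V)
    return v - log_sum_exp(v)                     # log Pr(a)

def neg_log_likelihood(theta, actions, fare, airport, P):
    """Minus the log likelihood of the observed actions (integer zone codes)."""
    return -log_choice_probs(theta, fare, airport, P)[actions].sum()

def estimate(actions, fare, airport, P, theta0=(0.0, 0.0)):
    """Outer loop: search over theta to maximize the log likelihood."""
    return minimize(neg_log_likelihood, np.array(theta0, dtype=float),
                    args=(actions, fare, airport, P), method="Nelder-Mead")

# Toy inputs (made up): three zones, the last one is the airport zone.
fare = np.array([10.0, 20.0, 30.0])
airport = np.array([0.0, 0.0, 1.0])
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.4, 0.4, 0.2]])
actions = np.repeat([0, 1, 2], [20, 30, 50])      # observed next-pickup zones
result = estimate(actions, fare, airport, P)
fitted = np.exp(log_choice_probs(result.x, fare, airport, P))
```

Here `result.x` holds the estimates of \(\theta_1\) and \(\theta_2\), and `fitted` holds the model-implied choice probabilities. Because the payoff and the transition row depend only on the chosen zone, this sketch produces the same choice probabilities in every state; that is a property of the specification, not of the code.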
Step 5: Your Final Report
Your project report should include the following sections.
1. Data
Explain:
- which taxi dataset you used
- how you restricted the sample
- how you defined consecutive trips
- how you mapped locations into the 15 zones
2. Descriptive Statistics
Provide tables and plots describing:
- number of observations by zone
- average fare by zone
- transition frequencies across zones
3. Model
Clearly define:
- the state space
- the choice set
- the transition matrix
- the payoff function
- the estimation strategy
4. Results
Report:
- the estimated parameters
- the interpretation of those parameters
- the model-implied choice probabilities
5. Discussion
Discuss:
- whether drivers appear to prefer zones with higher fares
- what the airport parameter implies about the attractiveness of the airport zone relative to other zones
- what the model captures well
- what the model leaves out
Discuss limitations such as:
- interpreting the next pickup zone as a deliberate decision
- ignoring repositioning costs
- ignoring congestion and travel time
Upload your final project to Canvas before Friday, March 20 at 11:59pm.