10 Hirshleifer and Shumway 2003 Part 1: Data

For this paper, we’ll need two sources of data:

  1. Weather Data from NOAA
  2. Daily returns on world stock indices from WRDS

Weather Data

We’ll start by getting the weather data, using the worldmet package to query NOAA.

install.packages("worldmet")  # run once
library(tidyverse)            # tidyverse 2.0.0+ also attaches lubridate
library(worldmet)

weather_ny <- importNOAA(
  code = "725030-14732",
  year = 2015:2025
)

weather_ny %>% write_csv("weather_ny.csv")
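
The NOAA query can be slow, so it helps to run it only once and read the saved CSV afterwards. Here is a minimal sketch of that pattern. The helper `read_or_fetch()` and the stand-in `fake_fetch()` are hypothetical names made up for illustration (they are not part of worldmet or readr), and the toy data replaces the real download so the sketch runs offline.

```r
library(readr)

# Hypothetical helper (not part of worldmet or readr): read `path` if it
# exists, otherwise call `fetch()` once and save the result for next time.
read_or_fetch <- function(path, fetch) {
  if (file.exists(path)) {
    return(read_csv(path, show_col_types = FALSE))
  }
  data <- fetch()
  write_csv(data, path)
  data
}

# Toy stand-in for the NOAA query so the sketch runs offline.
fake_fetch <- function() {
  data.frame(station = "TOY", air_temp = c(1.5, 2.0))
}

cache_file <- tempfile(fileext = ".csv")
first  <- read_or_fetch(cache_file, fake_fetch)  # calls fake_fetch() and writes the file
second <- read_or_fetch(cache_file, fake_fetch)  # reads the saved file instead
```

With the real data, the call would look something like `read_or_fetch("weather_ny.csv", function() importNOAA(code = "725030-14732", year = 2015:2025))`.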

Exploring the Weather Data

For each weather station, the NOAA data reports the following variables every hour:

  • ws: wind speed
  • wd: wind direction in degrees
  • air_temp: air temperature (Celsius)
  • atmos_pres: atmospheric pressure
  • visibility: visibility (meters)
  • RH: relative humidity (0 to 100%)
  • dew_point: dew point (Celsius)
  • ceil_hgt: height of the lowest cloud layer (meters)
  • cl: total cloud cover, reported on a scale from 0 to 10
  • cl_1, cl_2, cl_3: cloud cover for individual cloud layers
  • cl_1_height, cl_2_height, cl_3_height: heights of those cloud layers
  • precip_6: precipitation accumulated over the previous 6 hours, usually in millimeters
  • pwc: present weather code, a coded description of the weather
weather_ny <- read_csv("weather_ny.csv")
# Question 1: What was the hottest and coldest observation?


# Question 2: Visualize a time series of 5pm temperatures in 2015.


# Question 3: Visualize the distribution of ws (wind speed).


# Question 4: What are the most common present weather codes (pwc)?


# Question 5: Visualize the relationship between cloud cover
# and precipitation.


# Question 6: Visualize the seasonality of `cl`
# by calculating the average `cl` per day and
# plotting that over time.

Preparing the Weather Data

Let’s practice the steps for preparing the weather data described in the Hirshleifer and Shumway paper.

  • The NOAA data gives you dates in UTC (the time at the prime meridian). New York is UTC - 5 during standard time and UTC - 4 during daylight saving time. I’ll create the variables date_local, hour_local, and day_local using the correct time zone.
  • We’re interested in the average cloud cover and precipitation between 6AM and 4PM local time each day.
  • Weather is very seasonal, and so are stock returns. We don’t want the results to be driven by the fact that winter is cloudy and stock returns are lower in the winter. So we’ll deseasonalize the data as much as possible by subtracting from each day’s cloud cover the average daily cloud cover for that week of the year, taken across all years in our data. So if January 1, 2015 has a high cloud score, it had more cloud cover than a typical January 1-7 day across all years in our data.
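
To make the time zone step concrete, here is a small sketch on three made-up UTC timestamps (they are illustrative values, not rows from the NOAA data). It shows that converting to "America/New_York" picks up the UTC - 5 and UTC - 4 offsets automatically, and that a late-evening local observation belongs to the previous calendar day compared with its UTC timestamp.

```r
library(lubridate)

# Made-up UTC timestamps: a winter afternoon, a summer afternoon,
# and a winter observation shortly after midnight UTC.
utc <- ymd_hms(
  c("2015-01-15 15:00:00", "2015-07-15 15:00:00", "2015-01-16 03:00:00"),
  tz = "UTC"
)

# with_tz() keeps the same instant and changes only the clock it is shown on.
local <- with_tz(utc, tzone = "America/New_York")

hour(local)     # 10 (UTC - 5), 11 (UTC - 4), 22 (previous evening in New York)
as_date(local)  # calendar dates on the New York clock
```

Only the first two observations fall inside the 6AM to 4PM local window; the third is 10PM on January 15 in New York.
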
weather_ny <- weather_ny %>%
  mutate(
    date_local = with_tz(date, tzone = "America/New_York"),
    hour_local = hour(date_local),
    # as_date() takes the calendar date on the New York clock
    day_local = as_date(date_local)
  ) %>%
  # Question 7: Calculate the average cloud cover for each
  # day from 6AM to 4PM.
  filter(hour_local >= ___, hour_local <= ___) %>%
  select(station, day_local, cl, precip_6) %>%
  group_by(___) %>%
  summarize(
    clouds = mean(___, na.rm = TRUE),
    precip = mean(___, na.rm = TRUE)
  ) %>%
  # Question 8: deseasonalize according to the
  # instructions above.
  mutate(
    day_of_year = yday(___),
    wk = ((day_of_year - 1) %/% 7) + 1
  ) %>%
  group_by(___) %>%
  mutate(clouds = clouds - mean(clouds, na.rm = TRUE)) %>%
  ungroup() %>%
  select(-day_of_year, -wk)

# Question 9: Check to see whether we've
# deseasonalized cloudiness with a time series
# of clouds over days.
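
The week-of-year adjustment is easier to see on a tiny example. The sketch below uses three made-up days (the values and the column name `clouds_adj` are invented for illustration, not taken from the NOAA data): two fall in the first seven days of the year in different years, and one falls in the second week.

```r
library(dplyr)
library(lubridate)

# Made-up data: the same calendar day in two years, plus a day one week later.
toy <- tibble(
  day_local = as.Date(c("2015-01-03", "2016-01-03", "2015-01-10")),
  clouds    = c(8, 4, 5)
)

toy <- toy %>%
  # Days 1-7 of the year fall in bin 1, days 8-14 in bin 2, and so on.
  mutate(wk = ((yday(day_local) - 1) %/% 7) + 1) %>%
  group_by(wk) %>%
  # Subtract the bin's average, pooled across all years.
  mutate(clouds_adj = clouds - mean(clouds)) %>%
  ungroup()

toy
```

The two January 3 observations share bin 1, whose average is 6, so they become 2 and -2. The January 10 observation is alone in bin 2 and becomes 0. Note that with this formula day 365 (and day 366 in leap years) lands in a bin 53 that contains only one or two days per year.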

Global Weather

I’ve queried NOAA not just for New York, but for all cities with stock exchanges from the Hirshleifer and Shumway paper. For each city, I calculated the average cloud cover and precipitation for each day and then deseasonalized, following the steps above. Here’s the data I’ve prepared for you:

weather <- read_csv("https://raw.githubusercontent.com/cobriant/teaching-datasets/refs/heads/main/world_weather.csv")

# Question 10: How many observations are there per city?


# Question 11: Draw a time series of `clouds` for
# Amsterdam. Is it deseasonalized?

Global Daily Index Returns

Next, we need WRDS data on the stock market in each of these countries. You could use the WRDS web interface, but that would take 25 queries, one for each country with a stock exchange in the paper.

Instead, we’ll take the opportunity to learn how to query WRDS from inside R using DBI::dbConnect and DBI::dbGetQuery.
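
As a rough illustration of the connect-then-query pattern, here is a sketch that runs offline against an in-memory SQLite database instead of WRDS. It assumes the RSQLite package is installed, and the table `toy_returns` and its values are invented for the example (it is not a real WRDS table). WRDS itself is a PostgreSQL server, so the real connection would use a PostgreSQL driver and your WRDS credentials, which are not shown here.

```r
library(DBI)

# In-memory SQLite database as an offline stand-in for WRDS
# (assumes the RSQLite package is installed).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table of daily index returns; not a real WRDS table.
dbWriteTable(con, "toy_returns", data.frame(
  country = c("US", "US", "NL"),
  date    = c("2015-01-02", "2015-01-05", "2015-01-02"),
  ret     = c(0.001, -0.018, 0.004)
))

# dbGetQuery() sends SQL to the database and returns a data frame.
us <- dbGetQuery(
  con,
  "SELECT date, ret FROM toy_returns WHERE country = 'US' ORDER BY date"
)

dbDisconnect(con)
```

The same two calls apply to any DBI backend: only the driver and connection arguments passed to `dbConnect()` change.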

stocks <- read_csv("world-stocks.csv")

# Question 12: Visualize `stocks` by drawing
# time series of each country's stock market
# returns. Use `facet_wrap(~country)`.

Download this assignment

Here’s a link to download this assignment.