File this under: there-has-to-be-a-simpler-way-to-do-this-in-pandas-but-I-haven’t-found-what-that-is
Recently, I’ve been playing with some financial data to get a better understanding of the yield curve. Related to yield and inverted yield curves are the periods of recession in the US economy. In my work, I wanted to first build a chart that indicated the periods of recession and ultimately overlay that with yield curve data. Little did I realize the challenge of just coding that first part.
I downloaded a dataset of recession data, which contains a record for every calendar quarter from the 1960s to present day and a 0 or 1 to indicate whether the economy was in recession for that quarter–“1” indicating that it was. What I need to do was pull all the records with a “1” indicator and find the start and end times for each of those ranges so that I could paint them onto a chart.
I’ve heard it said before that any time you have to write a loop over your pandas dataframe, you’re probably doing it wrong. I’m certainly doing a loop here and I have a nagging suspicion there’s probably a more elegant way to achieve the solution. Nevertheless, here’s what I came up with to solve my recession chart problem:
Step 1: Bring in the necessary packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline # for easy chart display in jupyter notebook
Step 2: Load in my downloaded recession dataset and take a peek
# recession dates: https://fred.stlouisfed.org/series/JHDUSRGDPBR
df_recessions = pd.read_csv('./data/JHDUSRGDPBR_20220327.csv')
df_recessions['DATE'] = pd.to_datetime(df_recessions.DATE)
df_recessions.head()
df_recessions[df_recessions.JHDUSRGDPBR==1.0].head()
Step 3: Mark the start of every period of recession in the dataset
So, now I’m asking myself, “how do I extract the start and stop dates for every period of recession identified in the dataset? Let’s start with first just finding the start dates of recessions.” That shouldn’t be too difficult. If I can filter in just the recession quarters and calculate the date differences from one row to the next, if the difference is greater than three months (I estimated 93 days as three months), then I know there was a gap in quarters prior to the current record indicating that current record is the start of a new recession. Here’s what I came up with [one further note: my yield curve data only starts in 1990, so I filtered the recession data for 1990 to present]:
df_spans = df_recessions[(df_recessions.DATE.dt.year>=1990) & (df_recessions.JHDUSRGDPBR==1.0)].copy()
df_spans['days_elapsed'] = df_spans.DATE - df_spans.shift(1).DATE
df_spans['ind'] = df_spans.days_elapsed.dt.days.apply(lambda d: 's' if d > 93 else '')
df_spans.iloc[0, 3] = 's' # mark first row as a recession start
df_spans
Step 4: Find the end date of each recession
Here’s where my approach starts to go off the rails a little. The only way I could think to find the end dates of each recession is to:
- Loop through a list of the start dates
- In each loop, get the next start date and then grab the date of the record immediately before that one
- When I hit the last loop, just consider the last record to be the end date of the most recent recession
- With every stop date, add three months since the stop date is only the first day of the quarter and, presumably, the recession more or less lasts the entire quarter
Confusing? Here’s my code:
start_stop_dates = []
start_dates = df_spans.loc[df_spans.ind=='s', ].DATE.tolist()
for i, start_date in enumerate(start_dates):
if i < len(start_dates)-1:
stop_date = df_spans.loc[df_spans.DATE < start_dates[i+1]].iloc[-1].DATE
else:
stop_date = df_spans.iloc[-1].DATE
# add 3 months to the end of each stop date to stretch the value to the full quarter
start_stop_dates.append((start_date, stop_date + np.timedelta64(3,'M')))
start_stop_dates
Step 5: Build my chart
With that start/stop list, I can build my underlying recession chart:
fig, ax = plt.subplots(figsize=(12,6))
_ = ax.plot()
_ = ax.set_xlim([date(1990, 1, 1), date(2022, 4, 1)])
_ = ax.set_ylim([0, 10])
for st, sp in start_stop_dates:
_ = ax.axvspan(st, sp, alpha=0.2, color='gray')
Phew. All that work and I’m only at the starting point of my yield curve exploration, but that will have to wait for a future post. However, if you can think of a more elegant way to identify these date ranges without having to resort to looping, I’d love to hear it!
Recent Comments