When you’re dealing with event data, one neat visualization option is to plot it over a twenty-four-hour period on a polar chart built to look like the face of a clock. Matplotlib’s polar chart capabilities make this relatively simple.
As an example, I’ll chart crime incident data from the city of Cincinnati.
Step 1: Bring in the data
For starters, set up your standard package import statements and read in the dataset:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime, timedelta, date
# data from : https://data.cincinnati-oh.gov/Safety/PDI-Police-Data-Initiative-Crime-Incidents/k59e-2pvf
df_crime = pd.read_csv('./data/PDI__Police_Data_Initiative__Crime_Incidents.csv')
df_crime['DATE_REPORTED'] = pd.to_datetime(df_crime.DATE_REPORTED)
Let’s try to find the day with the highest number of incidents in 2019:
df_crime[df_crime.DATE_REPORTED.dt.year==2019][['DATE_REPORTED','INSTANCEID']].groupby(df_crime.DATE_REPORTED.dt.date).\
count().sort_values('INSTANCEID', ascending=False)
It looks like the most incidents took place on January 15. Now, when did those incidents occur over the course of the day?
df_crime[df_crime.DATE_REPORTED.dt.date==date(2019,1,15)][['DATE_REPORTED']].groupby(df_crime.DATE_REPORTED.dt.hour).count()
Grouping by hour, we can see that the vast majority of incidents occurred during the 9am hour. Now, let’s visualize that.
Step 2: Create a handy chart function
To make my work a little more portable, I created a “render_chart” function that takes as parameters the dataframe of hour data, the axis in which to place the chart, and the title:
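def render_chart(df, axis, title):
    # one angle per row (hour), evenly spaced around the full circle
    theta = np.arange(df.shape[0])/float(df.shape[0]) * 2 * np.pi
    # shift each bar by half a slot so it fills the space between two hour ticks
    _ = axis.bar(theta + theta[1]/2, df.event_count, width=theta[1], color='red')
    # hour labels: 12am, 1am, ..., 11pm
    ticklabels = [(timedelta(hours=h) + datetime(2021,1,1)).strftime('%#I%p').lower() for h in range(0,24)]
    _ = axis.set_xticks(theta)
    _ = axis.set_xticklabels(ticklabels)
    _ = axis.set_yticklabels([])
    _ = axis.set_title(title)
    # run clockwise with midnight at the top, like a clock face
    axis.set_theta_direction(-1)
    axis.set_theta_zero_location('N')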
Some things to note with my function:
- The function expects the dataframe to contain a column called “event_count” that is a count of events for each hour over the day
- Finding the right time format string so that I could display 1am instead of 01am was actually a bit difficult. That hash mark (#) did the trick on Windows. It is a platform-specific flag: on Linux and macOS, the equivalent is a hyphen (%-I).
- Matplotlib polar charts, by default, render counter-clockwise. Calling set_theta_direction with -1 lets you reverse that behavior. Calling set_theta_zero_location with North (N) puts the zero angle at the top, so the chart starts where a clock does.
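Since flags like %#I are not part of standard strftime and vary by platform, here is a minimal sketch of a portable way to build the same labels. The helper name hour_label is hypothetical (it is not part of the function above); it formats the hour with the standard %I code and strips the leading zero in Python instead:

```python
from datetime import datetime

def hour_label(hour):
    """Return a clock-style label such as '1am' or '12pm' for an hour 0-23."""
    # %I (zero-padded 12-hour clock) and %p (AM/PM) are standard strftime
    # codes, so no platform-specific flag is needed here.
    label = datetime(2021, 1, 1, hour).strftime('%I%p')
    # drop the leading zero ourselves, then lowercase: '01AM' -> '1am'
    return label.lstrip('0').lower()

# same 24 labels the chart function builds, starting at midnight
ticklabels = [hour_label(h) for h in range(24)]
```

This assumes the default C locale, where %p yields AM or PM. The resulting list could replace the ticklabels line in render_chart as-is.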
Step 3: Pad your data
For the January 15 data, there are several hours of the day with no reported incidents (e.g., from 4am to 6am). To get the chart to render correctly, I need to pad those empty hours with 0. I solved that problem by creating an “empty hours” dataframe (24 hourly rows, each holding a 0) and then joining the real data onto it:
# get my real data
df_chart = df_crime[df_crime.DATE_REPORTED.dt.date==date(2019,1,15)][['DATE_REPORTED']].\
groupby(df_crime.DATE_REPORTED.dt.hour).count().rename(columns={'DATE_REPORTED':'event_count'})
# create an "empty hour" dataframe
df_empty_hrs = pd.DataFrame(np.zeros(24), index=range(0, 24))
# merge the two together
df_chart = df_empty_hrs.join(df_chart, how='left').fillna(0).drop(columns=[0])
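As an alternative to the join, pandas’ reindex can do the same padding in one step. The sketch below uses made-up counts (not the Cincinnati data) and hypothetical variable names, just to show the idea:

```python
import pandas as pd

# Made-up hourly counts with gaps (illustrative values only)
df_sparse = pd.DataFrame({'event_count': [3, 12, 1]}, index=[8, 9, 17])

# reindex adds a row for every hour 0-23 and fills the missing ones with 0
df_padded = df_sparse.reindex(range(24), fill_value=0)
```

One small difference from the join approach: because no NaN values are ever introduced, the counts stay integers rather than being converted to floats.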
Step 4: Finally, render the chart
Now, we can produce the chart:
fig, ax = plt.subplots(figsize=(12, 7), subplot_kw={'projection': 'polar'})
render_chart(df_chart, ax, 'Cincinnati Crime Incidents: 15 Jan 2019')
It is interesting that an overwhelming majority of incidents on this day occurred at the 9am hour.
It would also be interesting to see what an average day in 2019 looked like: maybe weekday versus weekend, or maybe an average for each day, Monday through Sunday. Pandas makes these calculations pretty simple and, with my function, you can easily visualize the results!
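Here is one rough sketch of how the weekday-versus-weekend comparison might be computed. It runs on a handful of made-up timestamps rather than the real dataset, and the helper name average_hourly_counts is hypothetical. Note the simplifying assumption that the average is taken only over days that have at least one event:

```python
import pandas as pd

# Made-up timestamps standing in for DATE_REPORTED (not the real data)
stamps = pd.to_datetime([
    '2019-01-07 09:15',  # Monday
    '2019-01-07 09:40',  # Monday
    '2019-01-08 14:05',  # Tuesday
    '2019-01-12 23:30',  # Saturday
    '2019-01-13 23:10',  # Sunday
])
df = pd.DataFrame({'DATE_REPORTED': stamps})

def average_hourly_counts(df):
    """Average events per hour of day, over the distinct days present in df."""
    # assumption: days with no events at all are not counted in the average
    n_days = df.DATE_REPORTED.dt.date.nunique()
    counts = df.groupby(df.DATE_REPORTED.dt.hour).size()
    # pad hours with no events, then divide by the number of days
    avg = counts.reindex(range(24), fill_value=0) / n_days
    return avg.to_frame('event_count')

# dayofweek runs from Monday=0 to Sunday=6, so 5 and 6 are the weekend
is_weekend = df.DATE_REPORTED.dt.dayofweek >= 5
df_weekday = average_hourly_counts(df[~is_weekend])
df_weekend = average_hourly_counts(df[is_weekend])
```

Each result has the 24-row event_count shape that render_chart expects, so the two could be drawn on side-by-side polar axes for comparison.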