A chart may be worth 1000 words, but sometimes embedding a few words in your chart can convey additional, helpful information. For example, take this chart that I built from the incredible COVID-19 data collected by Johns Hopkins University:
Pretty telling as it is. Now, let’s add some words to it:
Adding some words to the chart 1) conveys additional, helpful information and 2) fills in some awkward whitespace. Seems like a win to me. For completeness’ sake, here’s what I did to build this chart:
Step 1: Import the packages
import pandas as pd
import matplotlib.pyplot as plt
Step 2: Load up the JHU dataset
df = pd.read_csv('./data/time_series_covid19_confirmed_US.csv')
Step 3: Trim the data down to just counties in Ohio
cols = [i for i, v in enumerate(df.columns) if v in ['Admin2', 'Province_State'] or v.endswith('2020')]
df_ohio = df[df.Province_State=='Ohio'].iloc[:,cols].copy()
df_ohio['county'] = df_ohio.Admin2 + ', ' + df_ohio.Province_State # combine county and State together in a field
df_ohio = df_ohio.drop(columns=['Admin2', 'Province_State']).set_index('county')
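If you want to see what Step 3 does without downloading the JHU file, here is a minimal sketch that runs the same filtering on a tiny made-up frame. The `toy` data, its values, and the `UID` column are assumptions for illustration only; the real file has many more columns and rows.

```python
import pandas as pd

# Toy stand-in for the JHU file; every value here is made up.
toy = pd.DataFrame({
    'UID': [1, 2, 3],
    'Admin2': ['Franklin', 'Cuyahoga', 'Wayne'],
    'Province_State': ['Ohio', 'Ohio', 'Michigan'],
    '3/29/2020': [10, 20, 30],
    '3/30/2020': [15, 25, 35],
})

# Positions of the two label columns plus every date column
cols = [i for i, v in enumerate(toy.columns)
        if v in ['Admin2', 'Province_State'] or v.endswith('2020')]

# Keep only Ohio rows and the selected columns
toy_ohio = toy[toy.Province_State == 'Ohio'].iloc[:, cols].copy()

# Build a "County, State" label and use it as the index
toy_ohio['county'] = toy_ohio.Admin2 + ', ' + toy_ohio.Province_State
toy_ohio = toy_ohio.drop(columns=['Admin2', 'Province_State']).set_index('county')

print(toy_ohio)
```

The result is one row per Ohio county, indexed by its label, with only the date columns left, which is the shape the plotting step expects.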
Step 4: Build the chart
fig, ax = plt.subplots(figsize=(12,10))
as_of = df.columns[-1]  # latest date column; '3/30/2020' in the file used here
cases = df_ohio[as_of]  # one value per county for that date
total_cases = cases.sum()
title = 'Top 10 Ohio Counties with Confirmed COVID-19 Cases as of ' + as_of
# county with the most confirmed cases, and its count
worst_county, worst_co_cases = cases.idxmax(), cases.max()
inset = """
There are {0} counties and other Ohio
entities in this dataset. As of {1},
there are {2:,} confirmed cases of COVID-19.
{3} represents {4:.1f}% of that total.
""".format(df_ohio.shape[0], as_of, total_cases, worst_county,
           (worst_co_cases / total_cases) * 100)
_ = cases.sort_values().tail(10).plot(kind='barh', ax=ax, title=title)
_ = ax.set_ylabel('Ohio counties')
_ = ax.set_xlabel('Confirmed Cases')
# you have to experiment a little with the x, y positioning to get your word inset positioned just right
text = fig.text(0.30, 0.35, inset, va='center', ha='left', size=18)
Pretty darn slick!
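A note on that x, y positioning: by default, `fig.text` reads its coordinates as fractions of the whole figure, so (0.30, 0.35) means 30% across and 35% up the figure. If you would rather position the words relative to the plot area, `ax.text` with `transform=ax.transAxes` is one alternative. The sketch below shows both on a throwaway chart; the bar values and label strings are made up, and the `Agg` backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 10))
ax.barh(['A', 'B', 'C'], [5, 12, 30])  # made-up values

inset = 'Line one of the inset\nLine two of the inset'

# fig.text: x and y are fractions of the whole figure (0 to 1)
fig_label = fig.text(0.30, 0.35, inset, va='center', ha='left', size=18)

# ax.text + transAxes: x and y are fractions of the plot area instead
ax_label = ax.text(0.95, 0.05, 'anchored to the axes',
                   transform=ax.transAxes, va='bottom', ha='right', size=12)

fig.canvas.draw()  # render once so any layout problem surfaces here
```

Axes-relative coordinates tend to survive a change in `figsize` or margins a bit better, but either way you will still end up nudging the numbers until the inset sits in the whitespace.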