Thinning out your tick labels

Have you ever rendered a chart with Pandas and/or Matplotlib where one or both of your axes (axises?) came out as a smear of overlapping, unreadable black text?

If you can read this, you don’t need glasses

As an example, let’s create a bar chart of COVID-19 data. [As an aside: I’ve noticed that line charts seem to automatically thin out any overlapping tick labels and tend not to fall prey to this problem.]

Load and clean up the data

After downloading the CSV data, I wrote the following code to load the data and prepare it for visualization:

import pandas as pd
import matplotlib.pyplot as plt

df_covid_confirmed_us = pd.read_csv('./data/time_series_covid19_confirmed_US_20200720.csv')
df_covid_deaths_us = pd.read_csv('./data/time_series_covid19_deaths_US_20200720.csv')

# keep the county (Admin2) and state columns, plus the date columns (their headers end in '20')
cols_to_keep1 = [i for i, v in enumerate(df_covid_confirmed_us.columns) if v in ['Admin2', 'Province_State'] or v.endswith('20')]
cols_to_keep2 = [i for i, v in enumerate(df_covid_deaths_us.columns) if v in ['Admin2', 'Province_State'] or v.endswith('20')]
# filter down to the Ohio rows and keep only the selected columns
df_covid_confirmed_ohio = df_covid_confirmed_us[df_covid_confirmed_us.Province_State=='Ohio'].iloc[:,cols_to_keep1].copy()
df_covid_deaths_ohio = df_covid_deaths_us[df_covid_deaths_us.Province_State=='Ohio'].iloc[:,cols_to_keep2].copy()

df_covid_confirmed_ohio.head()

Tidy up the dataframes

The data is still a bit untidy, so I wrote this additional code to transform it into a more proper format:

date_cols = df_covid_confirmed_ohio.columns.tolist()[2:]
rename_cols_confirmed = {'variable': 'obs_date', 'value': 'confirmed_cases'}
rename_cols_deaths = {'variable': 'obs_date', 'value': 'deaths'}

df_covid_confirmed_ohio = pd.melt(df_covid_confirmed_ohio.reset_index(), id_vars=['Admin2', 'Province_State'], 
                                  value_vars=date_cols).rename(columns=rename_cols_confirmed)
df_covid_deaths_ohio = pd.melt(df_covid_deaths_ohio.reset_index(), id_vars=['Admin2', 'Province_State'], 
                               value_vars=date_cols).rename(columns=rename_cols_deaths)

df_covid_confirmed_ohio['obs_date'] = pd.to_datetime(df_covid_confirmed_ohio.obs_date)
df_covid_deaths_ohio['obs_date'] = pd.to_datetime(df_covid_deaths_ohio.obs_date)

print(df_covid_confirmed_ohio.head())
print(df_covid_deaths_ohio.head())
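
To see what that melt step is doing, here's a minimal sketch on a toy dataframe. The counts are made-up numbers purely for illustration, the names wide_df and long_df are my own, and I'm assuming the date headers are in month/day/two-digit-year form; only the column layout is meant to mirror the real data.

```python
import pandas as pd

# Toy "wide" frame: one row per county, one column per date (values are made up)
wide_df = pd.DataFrame({
    'Admin2': ['Adams', 'Allen'],
    'Province_State': ['Ohio', 'Ohio'],
    '7/19/20': [10, 20],
    '7/20/20': [12, 25],
})

# Melt to "long" form: one row per county per date
long_df = pd.melt(
    wide_df,
    id_vars=['Admin2', 'Province_State'],
    value_vars=['7/19/20', '7/20/20'],
).rename(columns={'variable': 'obs_date', 'value': 'confirmed_cases'})

# Parse the former column headers into real dates (format is an assumption)
long_df['obs_date'] = pd.to_datetime(long_df['obs_date'], format='%m/%d/%y')

print(long_df)
```

Two counties times two dates gives four rows, each with its own obs_date and confirmed_cases value, which is the shape the groupby later on expects.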

Concatenate the two dataframes together

I’d like to do a nice, side-by-side comparison of these two datasets in bar chart form. One way to do that is to concatenate the two dataframes and then render the chart from the single result. Here’s the code I wrote to do that:

df_covid_confirmed_ohio['data_type'] = 'confirmed cases'
df_covid_confirmed_ohio['cnt'] = df_covid_confirmed_ohio.confirmed_cases
df_covid_deaths_ohio['data_type'] = 'deaths'
df_covid_deaths_ohio['cnt'] = df_covid_deaths_ohio.deaths
drop_cols = ['confirmed_cases', 'deaths', 'Admin2', 'Province_State']

df_combined_data = pd.concat([df_covid_confirmed_ohio[df_covid_confirmed_ohio.obs_date>='2020-5-1'], 
               df_covid_deaths_ohio[df_covid_deaths_ohio.obs_date>='2020-5-1']], sort=False).drop(columns=drop_cols)

Now, render the chart

Ok, I’m finally ready to create my chart:

fig, ax = plt.subplots(figsize=(12,8))
_ = df_combined_data.groupby(['obs_date', 'data_type']).sum().unstack().plot(kind='bar', ax=ax)

# draws the tick labels at an angle
fig.autofmt_xdate()

title = 'Number of COVID-19 cases/deaths in Ohio: {0:%d %b %Y} - {1:%d %b %Y}'.format(df_combined_data.obs_date.min(), 
                                                                                      df_combined_data.obs_date.max())
_ = ax.set_title(title)
_ = ax.set_xlabel('Date')
_ = ax.set_ylabel('Count')

# clean up the legend
original_legend = [t.get_text() for t in ax.legend().get_texts()]
new_legend = [t.replace('(cnt, ', '').replace(')', '') for t in original_legend]
_ = ax.legend(new_legend)

Wow! Those dates along the X axis are completely unreadable!

The X axis is a mess! Fortunately, there are a variety of ways to fix this problem. I particularly like the approach mentioned in this solution: thin out the labels at a designated frequency. In my case, I only show every fourth date label. So, here’s my new code; the label fix is the last block, under the "# tick label fix" comment:

fig, ax = plt.subplots(figsize=(12,8))
_ = df_combined_data.groupby(['obs_date', 'data_type']).sum().unstack().plot(kind='bar', ax=ax)

# draws the tick labels at an angle
fig.autofmt_xdate()

title = 'Number of COVID-19 cases/deaths in Ohio: {0:%d %b %Y} - {1:%d %b %Y}'.format(df_combined_data.obs_date.min(), 
                                                                                     df_combined_data.obs_date.max())
_ = ax.set_title(title)
_ = ax.set_xlabel('Date')
_ = ax.set_ylabel('Count')

# clean up the legend
original_legend = [t.get_text() for t in ax.legend().get_texts()]
new_legend = [t.replace('(cnt, ', '').replace(')', '') for t in original_legend]
_ = ax.legend(new_legend)

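# tick label fix
# grab the current labels, dropping the midnight timestamp from each one
tick_labels = [label.get_text().replace(' 00:00:00', '') for label in ax.get_xticklabels()]
# start with all-blank labels, then restore every fourth one
new_tick_labels = [''] * len(tick_labels)
new_tick_labels[::4] = tick_labels[::4]
_ = ax.set_xticklabels(new_tick_labels)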

That X axis is much more readable now thanks to the power of Python list slicing.
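
If the slicing trick feels a little magical, here's a small sketch of it in plain Python, with no chart involved. The thin_labels helper and the ten sample dates are my own hypothetical stand-ins, not part of the chart code above.

```python
from datetime import date, timedelta


def thin_labels(labels, step):
    """Return a copy of labels with all but every step-th entry blanked out."""
    thinned = [''] * len(labels)
    # Both slices select the same positions (0, step, 2*step, ...),
    # so the slice assignment copies just those labels back in.
    thinned[::step] = labels[::step]
    return thinned


# Hypothetical sample: ten consecutive days starting 1 May 2020
days = [date(2020, 5, 1) + timedelta(days=i) for i in range(10)]
labels = [d.strftime('%Y-%m-%d') for d in days]

thinned = thin_labels(labels, 4)
print(thinned)
```

Only positions 0, 4, and 8 keep their text; everything else becomes an empty string. Blanking the labels rather than deleting them matters because the list you pass to set_xticklabels should still have one entry per tick.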
