Here is the second installment in my mini-series on stacked bar charts.

Grouping in your stacked bar charts can be powerful and insightful. With time series data, grouping by the day of the week, by month, or even by year can provide an interesting perspective on your data.

Considering the email data I used in my previous post, I can use the following code to group my data by day of week:

fig, ax = plt.subplots(figsize=(12,8))
title = 'Email counts by day of week: {0:%d %b %Y} - {1:%d %b %Y}'.format(df_email.email_dt.min(), df_email.email_dt.max())

_ = df_email[['email_dt','category','dow']].groupby(['dow','category']).count().unstack().\
    plot(stacked=True, kind='barh', title=title, ax=ax)

Just what are those numbers on the y-axis?

Interesting: I certainly receive more email on days 2 and 3 but…wait…what are days 2 and 3?!

Days 2 and 3 correspond to Wednesday and Thursday, respectively. I know this because I used the pandas dayofweek attribute to get those values, and it numbers the days from 0 (Monday) through 6 (Sunday). I may know that, but the average viewer of my chart won’t. So, I need a way to change those labels to ones the viewer can understand. I can do that with the following code (the most pertinent part is at the end):

fig, ax = plt.subplots(figsize=(12,8))
title = 'Email counts by day of week: {0:%d %b %Y} - {1:%d %b %Y}'.format(df_email.email_dt.min(), df_email.email_dt.max())

df_email[['email_dt','category','dow']].groupby(['dow','category']).count().unstack().\
    plot(stacked=True, kind='barh', ax=ax)

_ = ax.set_title(title)
_ = ax.set_xlabel('Email Count')
_ = ax.set_ylabel('Day of Week')

# clean up the legend
original_legend = [t.get_text() for t in ax.legend().get_texts()]
new_legend = [t.replace('(email_dt, ', '').replace(')', '') for t in original_legend]
_ = ax.legend(new_legend, title='Category')

# now, replace the day numbers with their names
day_labels = {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 4: 'Friday', 5: 'Saturday', 6: 'Sunday'}
# read the current tick labels ('0' through '6'); Tick.label is gone from newer matplotlib releases
curr_ylabels = [t.get_text() for t in ax.get_yticklabels()]
new_ylabels = [day_labels[int(label)] for label in curr_ylabels]
_ = ax.set_yticklabels(new_ylabels)

Ahhh: much better!
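
If you want to convince yourself that the number-to-name mapping above is right, here is a minimal sketch. The date range is made up for illustration (it is not my email data); I picked a week that starts on a known Monday so the numbering is easy to eyeball.

```python
import pandas as pd

# Hypothetical sample: one week of dates starting on Monday, 1 January 2024.
dates = pd.Series(pd.date_range("2024-01-01", periods=7, freq="D"))

# dt.dayofweek numbers the days from 0 (Monday) through 6 (Sunday).
dow = dates.dt.dayofweek

# Same mapping as in the chart code above.
day_labels = {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday',
              4: 'Friday', 5: 'Saturday', 6: 'Sunday'}
names = dow.map(day_labels)

# Show date, number, and mapped name side by side.
print(pd.DataFrame({"date": dates, "dow": dow, "name": names}))
```

The rows for days 2 and 3 should line up with Wednesday and Thursday, matching the chart.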

Interestingly, pandas does have a day_name method that returns the name of the day instead of its number. The nice thing about my approach (using the dayofweek numbers and then replacing the numbers with the friendly names) is that pandas sorts the groupby keys, so with numeric keys my bars are already in a natural order. In this case: Monday through Sunday. Were I to group by day_name instead, the keys would sort alphabetically, from Friday to Wednesday. That would make for an oddly arranged bar chart.
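
To see the ordering difference for yourself, here is a small sketch on made-up data (two weeks of one "email" per day, not my real dataset). It also shows one possible alternative I did not use above: wrapping the day names in an ordered Categorical so they keep calendar order. The names `emails`, `day_order`, and `day_cat` are just placeholders for this example.

```python
import pandas as pd

# Hypothetical sample data: 14 consecutive days, one row per day.
emails = pd.DataFrame({"email_dt": pd.date_range("2024-01-01", periods=14, freq="D")})
emails["dow"] = emails["email_dt"].dt.dayofweek
emails["day_name"] = emails["email_dt"].dt.day_name()

# groupby sorts its keys by default, so numeric keys come out Monday-first...
by_number = emails.groupby("dow")["email_dt"].count()
print(by_number.index.tolist())

# ...while string keys come out in alphabetical order.
by_name = emails.groupby("day_name")["email_dt"].count()
print(by_name.index.tolist())

# Alternative: an ordered Categorical sorts by its declared category order.
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
emails["day_cat"] = pd.Categorical(emails["day_name"], categories=day_order, ordered=True)
by_cat = emails.groupby("day_cat", observed=False)["email_dt"].count()
print(by_cat.index.tolist())
```

With the categorical version the tick labels are already day names, so the relabeling step would not be needed. I still find the number-then-rename approach easy to follow, so take this as an option rather than a recommendation.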