I’m certainly a big fan of visualizing data. Often, I like to present multiple types of visualizations together to offer a variety of perspectives on the data. For example, I might pair a bar chart with a scatter plot to give deeper insight than a single visual would:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
%matplotlib inline
# data from: https://www.kaggle.com/mysarahmadbhat/imdb-top-1000-movies
df = pd.read_csv('./data/regex_imdb.csv').fillna(0)
fig, ax = plt.subplots(1, 2, figsize=(10,6))
d1 = df[df.Year==2019][['Name', 'Gross']].sort_values('Gross').tail(10)
_ = ax[0].barh(d1.Name, d1.Gross)
_ = ax[0].set_xlabel('Gross Earnings')
d2 = df[df.Year==2019][['Run_time', 'Gross', 'Genre']].copy()
d2['Genre'] = d2.Genre.apply(lambda g: g.split(',')[0])
_ = sns.scatterplot(data=d2, x='Run_time', y='Gross', hue='Genre', ax=ax[1])
_ = ax[1].set_xlabel('Runtime (minutes)')
_ = fig.suptitle('Analysis of Movies from 2019')
In this sort of work, I will target specific axes to display specific charts. Thus, in my above example, I explicitly pushed a bar chart to ax[0] and a scatter plot to ax[1].
However, on occasion, circumstances demand that I write the same type of chart to multiple subplots where I change one variable for each. For example, suppose I want to get a quick view of the top 10 movies by gross earnings from 2010 to 2019:
I could write code to target each of these axes explicitly, but that would mean a lot of code and a lot of copy/paste. Instead, I’d rather just write a loop to iterate through the years and write the appropriate bar chart to the appropriate axis.
Looping and rendering the charts comes relatively easily to me. What usually trips me up in these efforts is targeting the right row and column. I often spend most of my time trying to remember how I solved this problem in the past.
Well no more! Hopefully this post will serve as a reference any time I need to do this type of work in the future. Ultimately, my solution is just three lines of code:
nbr_of_rows = 5
nbr_of_cols = 2
coords = [(r, c) for r in range(nbr_of_rows) for c in range(nbr_of_cols)]
Here, I set the number of rows and columns I want in my visual and use a list comprehension to pair every row index with every column index. Now, I have a nice, pre-built list of coordinates to leverage in my loop:
fig, ax = plt.subplots(nbr_of_rows, nbr_of_cols, figsize=(12,12))
for i, yr in enumerate(range(2010, 2020)):
    r, c = coords[i]  # grab the pre-built coordinates
    d = df[df.Year==yr][['Name', 'Gross']].sort_values('Gross').tail(10)
    _ = ax[r][c].barh(d.Name, d.Gross)
    _ = ax[r][c].set_title('Top 10 grossing movies in {0}'.format(yr))
fig.tight_layout()
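As a quick sanity check on the coordinate list used in the loop above, here is a small sketch that prints the first few pairs. It also compares the list against an on-the-fly alternative using `divmod`; that alternative is my own assumption for illustration, not the approach the post relies on.

```python
nbr_of_rows = 5
nbr_of_cols = 2

# Same comprehension as in the post: (row, column) pairs in row-major order.
coords = [(r, c) for r in range(nbr_of_rows) for c in range(nbr_of_cols)]
print(coords[:4])  # [(0, 0), (0, 1), (1, 0), (1, 1)]

# Alternative (an assumption, not the post's method): derive the pair from
# the loop index with divmod instead of pre-building the list.
coords_divmod = [divmod(i, nbr_of_cols) for i in range(nbr_of_rows * nbr_of_cols)]
print(coords == coords_divmod)  # True
```

Either form gives ten pairs for a 5 x 2 grid, one per year from 2010 to 2019.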
For many years, as I’ve trolled the intertubes, I would occasionally run across creative résumés that would include timelines depicting different work events in the lives of those professionals. As I would happen upon these graphics, I would think to myself, “self, I’m no artist: is there a way to programmatically generate such timelines?”
Well, thanks to this recent article, here’s a neat way to use Python and matplotlib to do just that.
Step 1: Do your imports
import matplotlib.pyplot as plt
from datetime import date
import numpy as np
# since I'm doing this work in a Jupyter Notebook
%matplotlib inline
Step 2: Get your timeline data together
For simplicity, I’m just hard coding my dates and event labels in two different lists, but you could easily pull together data from a dataframe or other object. I’m also calculating a “minimum date” (min_date) where I get the earliest date from my dataset and subtract two years and a “maximum date” (max_date) where I get the newest date and add two years. I’m subtracting and adding years just to get some padding in my graphic. I’ll use these variables later on. (Note that I do use “\n” in my labels to wrap long text to a second line.)
# reference: https://mentalitch.com/key-events-in-rock-and-roll-history/
dates = [date(1954, 7, 19), date(1959, 2, 3), date(1964, 2, 9), date(1965, 7, 25), date(1967, 6, 1), date(1969, 8, 15)]
min_date = date(np.min(dates).year - 2, np.min(dates).month, np.min(dates).day)
max_date = date(np.max(dates).year + 2, np.max(dates).month, np.max(dates).day)
labels = ['Elvis appears on\nthe Ed Sullivan Show', 'Buddy Holly dies', 'The Beatles appear\non the Ed Sullivan Show',
          'Bob Dylan goes electric', 'The Beatles release\nSgt. Pepper', 'Woodstock']
# labels with associated dates
labels = ['{0:%d %b %Y}:\n{1}'.format(d, l) for l, d in zip(labels, dates)]
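One caveat with padding dates by rebuilding them from year, month, and day: a 29 February has no counterpart in a non-leap year. The sketch below shows one possible way to guard against that. The helper name `pad_years` and the fall-back to 28 February are my own assumptions, not something from the post, and the two dates are a subset of the list above.

```python
from datetime import date

def pad_years(d, years):
    """Shift a date by whole years (hypothetical helper, not from the post)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February does not exist in the target year; fall back to the 28th.
        return d.replace(year=d.year + years, day=28)

dates = [date(1954, 7, 19), date(1969, 8, 15)]
min_date = pad_years(min(dates), -2)  # two years before the earliest event
max_date = pad_years(max(dates), 2)   # two years after the latest event
print(min_date, max_date)
```

For the rock and roll dates in this post the guard never triggers, so the result matches the original calculation.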
Step 3: Set up my timeline and points
This is where it starts to get cool: I knew matplotlib had a horizontal line function, but it never occurred to me that I could use it as a timeline. Likewise, it never occurred to me to use the library’s scatter plot function to paint dots on a timeline.
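A minimal sketch of this step, assuming `axhline` draws the timeline and `scatter` paints the dots. The figure size, axis limits, colors, and marker size are my assumptions, and I use only three of the dates from Step 2 to keep it short.

```python
import matplotlib
matplotlib.use("Agg")  # lets the sketch run outside a notebook; drop this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
from datetime import date

dates = [date(1954, 7, 19), date(1959, 2, 3), date(1964, 2, 9)]
min_date = date(min(dates).year - 2, min(dates).month, min(dates).day)
max_date = date(max(dates).year + 2, max(dates).month, max(dates).day)

fig, ax = plt.subplots(figsize=(15, 4), constrained_layout=True)

# One dot per event, all sitting on y=0.
ax.scatter(dates, np.zeros(len(dates)), s=120, c='royalblue', zorder=2)

# Leave room above and below the line for labels, and pad the date range.
ax.set_ylim(-2, 1.75)
ax.set_xlim(min_date, max_date)

# The timeline itself: a horizontal line at y=0 spanning most of the axes.
ax.axhline(0, xmin=0.05, xmax=0.95, c='royalblue', zorder=1)
```

The `zorder` values keep the dots drawn on top of the line.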
Next, I can use the text function to add my event labels to the timeline. I did have to play around with my y-axis offsets for my labels to be nicely positioned above and below the timeline. I used Python list slicing to position labels with an even index above the line and labels with an odd index below.
label_offsets = np.zeros(len(dates))
label_offsets[::2] = 0.35
label_offsets[1::2] = -0.7
for i, (l, d) in enumerate(zip(labels, dates)):
    _ = ax.text(d, label_offsets[i], l, ha='center', fontfamily='serif', fontweight='bold', color='royalblue', fontsize=12)
Step 5: Add lollipops
What a clever way to use matplotlib’s stem plot function! Here, we can create stems to link our labels to their associated dots on the timeline.
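A minimal sketch of the lollipop step, assuming `ax.stem` is called with the event dates and a set of alternating stem heights. The heights of 0.3 and -0.3, the colors, and the figure size are my assumptions, and the dates are a subset of the list from Step 2.

```python
import matplotlib
matplotlib.use("Agg")  # lets the sketch run outside a notebook; drop this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
from datetime import date

dates = [date(1954, 7, 19), date(1959, 2, 3), date(1964, 2, 9)]

fig, ax = plt.subplots(figsize=(15, 4), constrained_layout=True)
ax.set_ylim(-2, 1.75)

# Alternate the stems above and below the line, mirroring the label offsets.
stems = np.zeros(len(dates))
stems[::2] = 0.3
stems[1::2] = -0.3

# stem returns the marker line, the stem lines, and the baseline.
markerline, stemline, baseline = ax.stem(dates, stems)
plt.setp(markerline, marker=',', color='royalblue')  # shrink the markers to pixels
plt.setp(stemline, color='royalblue')
```

Each stem runs from the timeline at y=0 toward the label that belongs to that date.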
# hide lines around chart
for spine in ["left", "top", "right", "bottom"]:
    _ = ax.spines[spine].set_visible(False)
# hide tick labels
_ = ax.set_xticks([])
_ = ax.set_yticks([])
_ = ax.set_title('Important Milestones in Rock and Roll', fontweight="bold", fontfamily='serif', fontsize=16,
                 color='royalblue')
And now, we have a pretty cool timeline:
This chart is using the default matplotlib style. I did try using other styles like XKCD, as the author highlighted in the article, but my chart just didn’t look very good. Your mileage may vary.
But, wait…there’s more!
What if I want to do a vertical timeline instead? Well, you can do that, as well, with some adjustments.
Additional import
To help better center my event labels, I’ll import the timedelta class:
from datetime import timedelta
Use the axvline function
For my vertical timeline, I’ll use the axvline function. I’ve also made a few other code adjustments you can see:
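A minimal sketch of the vertical setup, assuming the dates move to the y-axis and `axvline` replaces `axhline`. The figure size and the x-limits of -20 to 20 are my assumptions, chosen so that label offsets of 2.0 and -2.0 sit close to the line, and the dates are a subset of the original list.

```python
import matplotlib
matplotlib.use("Agg")  # lets the sketch run outside a notebook; drop this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
from datetime import date

dates = [date(1954, 7, 19), date(1959, 2, 3), date(1964, 2, 9)]
min_date = date(min(dates).year - 2, min(dates).month, min(dates).day)
max_date = date(max(dates).year + 2, max(dates).month, max(dates).day)

fig, ax = plt.subplots(figsize=(6, 10), constrained_layout=True)

# Dots now sit on x=0, with the dates running along the y-axis.
ax.scatter(np.zeros(len(dates)), dates, s=120, c='royalblue', zorder=2)
ax.set_xlim(-20, 20)
ax.set_ylim(min_date, max_date)

# The timeline itself: a vertical line at x=0 spanning most of the axes.
ax.axvline(0, ymin=0.05, ymax=0.95, c='royalblue', zorder=1)
```

With these limits the earliest event appears at the bottom; swapping the two arguments to `set_ylim` would put it at the top instead.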
Adjust the dates used to position the event labels
Without the timedelta adjustment, the label positioning still doesn’t look too bad, but subtracting about 90 days from each date helps sort-of vertically center the labels:
label_offsets = np.repeat(2.0, len(dates))
label_offsets[1::2] = -2.0
for i, (l, d) in enumerate(zip(labels, dates)):
    d = d - timedelta(days=90)
    align = 'right'
    if i % 2 == 0:
        align = 'left'
    _ = ax.text(label_offsets[i], d, l, ha=align, fontfamily='serif', fontweight='bold', color='royalblue', fontsize=12)
There doesn’t seem to be a stem function for horizontal lines
The documentation says you should be able to orient stem lines horizontally, but I never got that to work, so I opted to go with the hlines function, instead.
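A minimal sketch of the `hlines` approach, assuming one short horizontal segment per event that runs from the timeline at x=0 toward the label. The stem lengths of 1.5 and -1.5, the axis limits, and the figure size are my assumptions, and the dates are a subset of the original list.

```python
import matplotlib
matplotlib.use("Agg")  # lets the sketch run outside a notebook; drop this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np
from datetime import date

dates = [date(1954, 7, 19), date(1959, 2, 3), date(1964, 2, 9)]

fig, ax = plt.subplots(figsize=(6, 10), constrained_layout=True)
ax.set_xlim(-20, 20)

# Alternate the stems right and left of the line, stopping short of the labels.
stems = np.repeat(1.5, len(dates))
stems[1::2] = -1.5

# hlines(y, xmin, xmax): one segment per date, from x=0 out to the stem end.
lines = ax.hlines(dates, 0, stems, color='royalblue')
```

Because `hlines` takes the y positions first, the dates can be passed straight through, just as they were to `scatter`.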
The message included a statement that my account needed to be part of the docker-users group in order to use Docker.
No problem. I logged into my machine with my admin account and opened up Computer Management to access the Local Users and Groups section so that I could add my developer account to the group. Only…the Local Users and Groups section wasn’t there! Er, what?