I’ve used Matplotlib for years, yet I’m always discovering new features. Recently, I was working with data containing date ranges and thought that visualizing it in something like a Gantt chart might help me understand it better. When it came time to jazz up the chart with markers, I was really impressed by the options.
For illustration, suppose I have a dataset like this:
import pandas as pd
from datetime import timedelta
import matplotlib.pyplot as plt
data = [{'name': 'Larry', 'start_dt': '2023-01-05', 'end_dt': '2023-01-09'},
        {'name': 'Moe', 'start_dt': '2023-01-07', 'end_dt': '2023-01-12'},
        {'name': 'Curly', 'start_dt': '2023-01-02', 'end_dt': '2023-01-07'},
        {'name': 'Al', 'start_dt': '2023-01-12', 'end_dt': '2023-01-15'},
        {'name': 'Peggy', 'start_dt': '2023-01-04', 'end_dt': '2023-01-09'},
        {'name': 'Kelly', 'start_dt': '2023-01-08', 'end_dt': '2023-01-12'},
        {'name': 'Bud', 'start_dt': '2023-01-11', 'end_dt': '2023-01-14'}]
df = pd.DataFrame(data)
df['start_dt'] = pd.to_datetime(df.start_dt)
df['end_dt'] = pd.to_datetime(df.end_dt)
Is there an elegant way in pandas to expand the dataset so it has one record for each day a person worked? I couldn’t think of one, so I just looped over the dataframe and did it the hard way:
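fig, ax = plt.subplots(figsize=(10, 6))
ylabels = []
# One timeline per person, sorted by name; row i is drawn at height i+1
for i, r in df.sort_values('name').reset_index(drop=True).iterrows():
    work_dates = pd.date_range(start=r['start_dt'], end=r['end_dt'], freq='1D')
    # Scalars are repeated for every date in the index
    df_temp = pd.DataFrame({'name': r['name'], 'ypos': i+1}, index=work_dates)
    # Only the numeric 'ypos' column is plotted; mark just the first and last day
    _ = df_temp.plot(marker='d', markevery=[0,-1], markersize=5.0, ax=ax)
    ylabels.append(r['name'])
_ = ax.get_legend().remove()
# Pin one tick per row so each name lines up with its timeline
_ = ax.set_yticks(range(1, len(ylabels) + 1))
_ = ax.set_yticklabels(ylabels)
_ = ax.set_title('Lame gantt-type chart')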
This produced a pretty nifty Gantt-type chart with the timelines from my dataset.
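On the earlier question of expanding the dataset without a loop: one possible approach is to build a list of dates per row and then call DataFrame.explode(). This is only a sketch, not what the chart code above uses. The "work_date" column name is made up for illustration, and the dataset is trimmed to two rows to keep it short.

```python
import pandas as pd

# Same columns as the dataset in this post, trimmed to two rows for brevity
df = pd.DataFrame({
    'name': ['Larry', 'Moe'],
    'start_dt': pd.to_datetime(['2023-01-05', '2023-01-07']),
    'end_dt': pd.to_datetime(['2023-01-09', '2023-01-12']),
})

# Build the list of working days for each row
# ('work_date' is a hypothetical column name, not from the original code)
df['work_date'] = [
    list(pd.date_range(start=s, end=e, freq='D'))
    for s, e in zip(df['start_dt'], df['end_dt'])
]

# explode() gives each list element its own row and repeats the other columns
daily = df.explode('work_date', ignore_index=True)

# The exploded column comes back as generic objects, so restore the datetime dtype
daily['work_date'] = pd.to_datetime(daily['work_date'])

print(daily[['name', 'work_date']])
```

The result has one row per person per day, which could then be grouped or pivoted for plotting instead of building a temporary dataframe on each pass of a loop.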
What I want to highlight in this post is the call to df_temp.plot() in the code above. I used three marker properties to craft the chart I was after:
- marker
- markersize
- markevery
Marker
You set the type of marker you want with the “marker” property, and there are tons of choices. I chose a lowercase “d” to get a “thin diamond”; an uppercase “D” gives a full-width diamond.
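If you want to see what a marker code stands for without rendering anything, Matplotlib keeps a lookup table on Line2D. A small sketch, with the handful of codes below picked arbitrarily:

```python
from matplotlib.lines import Line2D

# Line2D.markers maps each marker code to a short description
for code in ('d', 'D', 'o', 's', '^'):
    print(repr(code), '->', Line2D.markers[code])
```

Iterating over all of Line2D.markers instead of a hand-picked tuple lists every built-in choice.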
Markersize
The “markersize” property does what it says: it sets the size of the marker, measured in points. In my experience, I’ve just had to set a value, render the chart, and then adjust and repeat until I get the size I want.
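One way to shorten the adjust-and-repeat cycle is to draw several candidate sizes on a single throwaway figure and compare them side by side. A minimal sketch, assuming the sizes listed are reasonable candidates (they are arbitrary):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

sizes = [3, 5, 8, 12]  # candidate marker sizes in points, chosen arbitrarily

fig, ax = plt.subplots(figsize=(6, 3))
for row, size in enumerate(sizes, start=1):
    # One short horizontal line per candidate size
    ax.plot([0, 1, 2], [row] * 3, marker='d', markersize=size,
            label=f'markersize={size}')
ax.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
```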
Markevery
I was already pretty familiar with the “marker” and “markersize” properties, having used them extensively in the past, but I was excited to learn about the “markevery” property. By default, a marker appears at every data point in your chart. However, Gantt charts normally mark only the beginning and end of a time range, not every point in between. With the “markevery” property, all I needed to do was pass the list [0, -1] to mark only the first and last points in each time range.
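A list of indices is only one of the forms “markevery” accepts. The sketch below puts a few of them next to each other on dummy data; the labels and the ten-point series are invented for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

x = list(range(10))  # ten dummy data points

# A few of the forms markevery accepts
options = {
    'first and last': [0, -1],                # explicit list of indices
    'every 3rd point': 3,                     # integer step, starting at index 0
    'every 3rd, from index 1': (1, 3),        # (start, step) tuple
    'last three points': slice(-3, None),     # slice over the data
}

fig, ax = plt.subplots(figsize=(8, 4))
for row, (label, every) in enumerate(options.items(), start=1):
    ax.plot(x, [row] * len(x), marker='d', markevery=every, label=label)
ax.legend()
```

For the Gantt-style chart, the list form is the natural fit because each timeline has a different number of days and only its two ends matter.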
These properties really helped render the chart I wanted. It’s always great to learn more about the versatility of Matplotlib!