I’m certainly a big fan of visualizing data. Often, I like to present multiple types of visualizations together to offer a variety of perspectives on the data. For example, I might present a bar chart and a scatter plot side by side to provide deeper insight than a single visual would:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
%matplotlib inline
# data from: https://www.kaggle.com/mysarahmadbhat/imdb-top-1000-movies
df = pd.read_csv('./data/regex_imdb.csv').fillna(0)
fig, ax = plt.subplots(1, 2, figsize=(10,6))
d1 = df[df.Year==2019][['Name', 'Gross']].sort_values('Gross').tail(10)
_ = ax[0].barh(d1.Name, d1.Gross)
_ = ax[0].set_xlabel('Gross Earnings')
d2 = df[df.Year==2019][['Run_time', 'Gross', 'Genre']].copy()
d2['Genre'] = d2.Genre.apply(lambda g: g.split(',')[0])
_ = sns.scatterplot(data=d2, x='Run_time', y='Gross', hue='Genre', ax=ax[1])
_ = ax[1].set_xlabel('Runtime (minutes)')
_ = fig.suptitle('Analysis of Movies from 2019')
In this sort of work, I will target specific axes to display specific charts. Thus, in my above example, I explicitly pushed a bar chart to ax[0] and a scatter plot to ax[1].
However, on occasion, circumstances demand that I write the same type of chart to multiple subplots where I change one variable for each. For example, suppose I want to get a quick view of the top 10 movies by gross earnings from 2010 to 2019:
I could write code to target each of these axes explicitly, but that would mean a lot of code and a lot of copy/paste. Instead, I’d rather just write a loop to iterate through the years and write the appropriate bar chart to the appropriate axis.
Looping and rendering the charts comes relatively easy to me. What usually trips me up in these efforts is targeting the right row and column. I often spend most of my time trying to remember how I solved this problem in the past.
Well no more! Hopefully this post will serve as a reference any time I need to do this type of work in the future. Ultimately, my solution is just three lines of code:
nbr_of_rows = 5
nbr_of_cols = 2
coords = [(r, c) for r in range(nbr_of_rows) for c in range(nbr_of_cols)]
Here, I set the number of rows and columns I want in my visual and do some list comprehension to pair those coordinates together in a list. Now, I have a nice, pre-built list of coordinates to leverage in my loop:
fig, ax = plt.subplots(nbr_of_rows, nbr_of_cols, figsize=(12,12))
for i, yr in enumerate(range(2010, 2020)):
    r, c = coords[i]  # grab the pre-built coordinates
    d = df[df.Year==yr][['Name', 'Gross']].sort_values('Gross').tail(10)
    _ = ax[r][c].barh(d.Name, d.Gross)
    _ = ax[r][c].set_title('Top 10 grossing movies in {0}'.format(yr))
fig.tight_layout()
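For my own future reference, here is a minimal sketch of two other ways to get the same row/column pairs, assuming the same nbr_of_rows and nbr_of_cols values as above. itertools.product builds the identical list, and divmod computes each pair straight from the loop index, so the pre-built list becomes optional. The pairs variable is just a name I made up for the comparison.

```python
from itertools import product

nbr_of_rows = 5
nbr_of_cols = 2

# same list as the comprehension: (0, 0), (0, 1), (1, 0), (1, 1), ...
coords = list(product(range(nbr_of_rows), range(nbr_of_cols)))

# or derive (row, column) directly from a running index
pairs = [divmod(i, nbr_of_cols) for i in range(nbr_of_rows * nbr_of_cols)]

print(coords[:3])
print(pairs == coords)
```

Inside the loop, r, c = divmod(i, nbr_of_cols) would replace the coords[i] lookup.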
For many years, as I’ve trolled the intertubes, I would occasionally run across creative résumés that would include timelines depicting different work events in the lives of those professionals. As I would happen upon these graphics, I would think to myself, “self, I’m no artist: is there a way to programmatically generate such timelines?”
Well, thanks to this recent article, here’s a neat way to use Python and matplotlib to do just that.
Step 1: Do your imports
import matplotlib.pyplot as plt
from datetime import date
import numpy as np
# since I'm doing this work in a Jupyter Notebook
%matplotlib inline
Step 2: Get your timeline data together
For simplicity, I’m just hard coding my dates and event labels in two different lists, but you could easily pull together data from a dataframe or other object. I’m also calculating a “minimum date” (min_date) where I get the earliest date from my dataset and subtract two years and a “maximum date” (max_date) where I get the newest date and add two years. I’m subtracting and adding years just to get some padding in my graphic. I’ll use these variables later on. (Note that I do use “\n” in my labels to wrap long text to a second line.)
# reference: https://mentalitch.com/key-events-in-rock-and-roll-history/
dates = [date(1954, 7, 19), date(1959, 2, 3), date(1964, 2, 9), date(1965, 7, 25), date(1967, 6, 1), date(1969, 8, 15)]
min_date = date(np.min(dates).year - 2, np.min(dates).month, np.min(dates).day)
max_date = date(np.max(dates).year + 2, np.max(dates).month, np.max(dates).day)
labels = ['Elvis appears on\nthe Ed Sullivan Show', 'Buddy Holly dies', 'The Beatles appear\non the Ed Sullivan Show',
          'Bob Dylan goes electric', 'The Beatles release\nSgt. Pepper', 'Woodstock']
# labels with associated dates
labels = ['{0:%d %b %Y}:\n{1}'.format(d, l) for l, d in zip(labels, dates)]
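One caveat on the padding: rebuilding a date with a different year raises a ValueError if the earliest or latest event happens to fall on 29 February. Here is a minimal sketch of a safer way to add the two-year padding. The shift_years helper is a hypothetical name of my own, not something from the article, and it simply falls back to 28 February when the target year has no leap day.

```python
from datetime import date

def shift_years(d, years):
    """Return d moved by `years` years; 29 Feb falls back to 28 Feb if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # only reachable when d is 29 Feb and the target year is not a leap year
        return d.replace(year=d.year + years, day=28)

dates = [date(1954, 7, 19), date(1959, 2, 3), date(1969, 8, 15)]
min_date = shift_years(min(dates), -2)  # two years of padding on the left
max_date = shift_years(max(dates), 2)   # two years of padding on the right
print(min_date, max_date)
```

The built-in min and max work on date objects directly, so numpy is optional for this step.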
Step 3: Set up my timeline and points
This is where it starts to get cool: I knew matplotlib had a horizontal line function, but it never occurred to me that I could use it as a timeline. Likewise, it never occurred to me to use the library’s scatter plot function to paint dots on a timeline.
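Here is a minimal sketch of what that setup might look like, using axhline for the line and scatter for the dots. The figure size, axis limits, marker size and colors are my assumptions for illustration, and I've shortened the date list and hard coded the padded dates to keep the example self-contained.

```python
import matplotlib.pyplot as plt
import numpy as np
from datetime import date

dates = [date(1954, 7, 19), date(1959, 2, 3), date(1964, 2, 9)]
min_date = date(1952, 7, 19)  # earliest date minus two years
max_date = date(1966, 2, 9)   # latest date plus two years

fig, ax = plt.subplots(figsize=(15, 4))
ax.set_ylim(-2, 1.75)            # assumed: leaves room for labels above and below
ax.set_xlim(min_date, max_date)

# the timeline itself: a horizontal line at y=0 spanning most of the axes width
ax.axhline(0, xmin=0.05, xmax=0.95, c='royalblue', zorder=1)

# one dot per event, drawn on top of the line
points = ax.scatter(dates, np.zeros(len(dates)), s=120, c='royalblue', zorder=2)
```

Note that xmin and xmax in axhline are fractions of the axes width (0 to 1), not dates.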
Step 4: Add my event labels
Next, I can use the text function to add my event labels to the timeline. I did have to play around with my y-axis offsets for my labels to be nicely positioned above and below the timeline. I used Python list slicing to position labels with an even index above the line and labels with an odd index below.
label_offsets = np.zeros(len(dates))
label_offsets[::2] = 0.35
label_offsets[1::2] = -0.7
for i, (l, d) in enumerate(zip(labels, dates)):
    _ = ax.text(d, label_offsets[i], l, ha='center', fontfamily='serif', fontweight='bold', color='royalblue', fontsize=12)
Step 5: Add lollipops
What a clever way to use matplotlib’s stem plot function! Here, we can create stems to link our labels to their associated dots on the timeline.
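A minimal sketch of the stem call, assuming the same label_offsets pattern as in the previous step. The colors and the choice to hide the baseline are my assumptions; the article's exact styling may differ.

```python
import matplotlib.pyplot as plt
import numpy as np
from datetime import date

dates = [date(1954, 7, 19), date(1959, 2, 3), date(1964, 2, 9),
         date(1965, 7, 25), date(1967, 6, 1), date(1969, 8, 15)]
label_offsets = np.zeros(len(dates))
label_offsets[::2] = 0.35    # even-indexed labels sit above the line
label_offsets[1::2] = -0.7   # odd-indexed labels sit below it

fig, ax = plt.subplots(figsize=(15, 4))
ax.axhline(0, c='royalblue')

# each stem runs from the timeline (y=0) to the label's offset
markerline, stemlines, baseline = ax.stem(dates, label_offsets)
plt.setp(markerline, marker=',', color='royalblue')  # shrink the stem-end markers
plt.setp(stemlines, color='royalblue')
baseline.set_visible(False)  # the axhline already serves as the baseline
```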
# hide lines around chart
for spine in ["left", "top", "right", "bottom"]:
    _ = ax.spines[spine].set_visible(False)
# hide tick labels
_ = ax.set_xticks([])
_ = ax.set_yticks([])
_ = ax.set_title('Important Milestones in Rock and Roll', fontweight="bold", fontfamily='serif', fontsize=16,
                 color='royalblue')
And now, we have a pretty cool timeline:
This chart is using the default matplotlib style. I did try using other styles like XKCD, as the author highlighted in the article, but my chart just didn’t look very good. Your mileage may vary.
But, wait…there’s more!
What if I want to do a vertical timeline instead? Well, you can do that, as well, with some adjustments.
Additional import
To help better center my event labels, I’ll import the timedelta class:
from datetime import timedelta
Use the axvline function
For my vertical timeline, I’ll use the axvline function. I’ve also made a few other code adjustments you can see:
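A minimal sketch of the vertical setup, assuming the dates now go on the y-axis and the line sits at x=0. The figure size and x-axis limits are my guesses, and I've shortened the date list and hard coded the padded dates to keep the example self-contained.

```python
import matplotlib.pyplot as plt
import numpy as np
from datetime import date

dates = [date(1954, 7, 19), date(1959, 2, 3), date(1964, 2, 9)]
min_date = date(1952, 7, 19)  # earliest date minus two years
max_date = date(1966, 2, 9)   # latest date plus two years

fig, ax = plt.subplots(figsize=(6, 10))
ax.set_xlim(-20, 20)             # assumed: leaves room for labels left and right
ax.set_ylim(min_date, max_date)

# the timeline: a vertical line at x=0 spanning most of the axes height
ax.axvline(0, ymin=0.05, ymax=0.95, c='royalblue', zorder=1)

# one dot per event; the dates are now the y values
points = ax.scatter(np.zeros(len(dates)), dates, s=120, c='royalblue', zorder=2)
```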
Adjust the dates used to position the event labels
Without the timedelta adjustment, the label positioning still doesn’t look too bad, but subtracting about 90 days from each date helps sort-of vertically center the labels:
label_offsets = np.repeat(2.0, len(dates))
label_offsets[1::2] = -2.0
for i, (l, d) in enumerate(zip(labels, dates)):
    d = d - timedelta(days=90)
    align = 'right'
    if i % 2 == 0:
        align = 'left'
    _ = ax.text(label_offsets[i], d, l, ha=align, fontfamily='serif', fontweight='bold', color='royalblue', fontsize=12)
There doesn’t seem to be a stem function for horizontal lines
The documentation says you should be able to orient stem lines horizontally, but I never got that to work, so I opted to go with the hlines function, instead.
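A minimal sketch of the hlines approach, assuming the same ±2.0 label offsets as above. Each line runs from the timeline at x=0 out to its label's x position. The stems variable name and the styling are my own choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from datetime import date

dates = [date(1954, 7, 19), date(1959, 2, 3), date(1964, 2, 9), date(1965, 7, 25)]
label_offsets = np.repeat(2.0, len(dates))
label_offsets[1::2] = -2.0   # odd-indexed labels go to the left of the line

fig, ax = plt.subplots(figsize=(6, 10))
ax.axvline(0, c='royalblue')

# hlines(y, xmin, xmax): one horizontal segment per event date
stems = ax.hlines(dates, 0, label_offsets, colors='royalblue')
```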
Simple Object Access Protocol, or SOAP, was the new hotness in web service technology…some 15 or 20 years ago. It was built around XML, Web Services Description Language (WSDL), XML namespaces and other complex ideas.
But some things never die and, recently, I found myself elbow-deep into a number of SOAP APIs while trying to pull data from a vendor product. I wrote a Python client to interface with those APIs. While Python has a number of packages designed to work with the technology, I wanted to stick with just the requests package to keep my dependencies minimal. Ultimately, my client worked well and I wanted to share a few tidbits here that I learned along the way to get my requests code to successfully call SOAP web services.
1. The header can be tricky
Getting your header right is critical to successful service calls. I found two header elements essential for my code: SOAPAction and Content-Type. It was important that I set SOAPAction to a URL corresponding with the particular web method I wished to call. The vendor documentation was pretty important here to determine what that URL should be.
What’s interesting about Content-Type is that the web is full of valid suggestions for the proper value: text/xml and application/soap+xml are two that I’ve seen bandied about. In my case, neither value worked. Again from the vendor documentation, the value that made my calls work was application/x-www-form-urlencoded. So, my header dictionary looked roughly like this:
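A minimal sketch of such a header dictionary. The SOAPAction URL below is a made-up placeholder (vendor.example.com and GetReport are hypothetical); the real value has to come from the vendor documentation.

```python
# hypothetical namespace and web method name; take the real URL from the vendor docs
headers = {
    'SOAPAction': 'http://vendor.example.com/webservices/GetReport',
    'Content-Type': 'application/x-www-form-urlencoded',
}
print(headers)
```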
2. The post data doesn’t necessarily need to be XML
Crazy notion, right? Posting non-XML to a SOAP API? Early on in my work, I kept trying to format all my post arguments into a single XML document and tried to push that document to the web method with my requests call, but the code would never work. At some point, I stumbled upon a forum or discussion thread where one of the participants posted code that actually used a dictionary for his post data object–what you would normally do with a REST API. I was taken aback but gave it a go and, to my astonishment, it worked! Some web methods required simple parameters like strings and integers, ready made for Python dictionaries. A few did have a parameter or two of XML. For those, I simply had to push a string representation of a properly formatted XML document. My code looked something like this:
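A minimal sketch of that kind of post data, with made-up parameter names (userName, reportId, filterXml) and a hypothetical endpoint. When requests receives a dictionary through its data argument, it form-encodes it, so I use urlencode from the standard library here to show roughly what goes over the wire without making a real call.

```python
from urllib.parse import urlencode, parse_qs

ws_url = 'https://vendor.example.com/Service.asmx/GetReport'  # hypothetical endpoint

post_data = {
    'userName': 'svc_account',  # simple string parameter
    'reportId': 42,             # simple integer parameter
    # a parameter the web method expects as XML: just pass the document as a string
    'filterXml': '<filter><status>open</status></filter>',
}

# approximately the request body that a form-encoded post would carry
body = urlencode(post_data)
print(body)

# the actual call, as used in the next section:
# resp = requests.post(ws_url, data=post_data, headers=headers)
```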
3. The response needs one more step to become an XML document
I believe the most appropriate way to deal with XML responses in the response object is through the content property. But, since the response is supposed to be XML, I wanted to run the content through ElementTree to get a proper XML document I could more easily process. In my early attempts, I passed the content value to ElementTree’s fromstring function to get back a proper XML document that I could process like any other XML document. Or so I thought.
The rub is that fromstring returns an XML element, not an XML document. You have to add one more line, a call to the ElementTree constructor itself, to get the proper XML document object you can use in the rest of your code. My response processing code then looked like this:
import xml.etree.ElementTree as ET
resp = requests.post(ws_url, data=post_data, headers=headers)
resp_elem = ET.fromstring(resp.content)
resp_doc = ET.ElementTree(resp_elem)
# now, you can use functions like find and findall with the resp_doc object
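Here is a minimal sketch of that last step against a made-up response body, since I can't share the vendor's real one. The namespace URL and the element names are hypothetical. SOAP responses are usually namespaced, so I pass a prefix map to find and findall. Worth noting: the element returned by fromstring also has find and findall, so the ElementTree wrapper matters mostly when the rest of your code expects document-level calls such as getroot or write.

```python
import xml.etree.ElementTree as ET

# stand-in for resp.content; the namespace and tags are hypothetical
content = b"""<?xml version="1.0" encoding="utf-8"?>
<Report xmlns="http://vendor.example.com/webservices">
  <Item><Name>alpha</Name></Item>
  <Item><Name>beta</Name></Item>
</Report>"""

ns = {'v': 'http://vendor.example.com/webservices'}  # prefix map for the searches

resp_elem = ET.fromstring(content)    # an Element (the root)
resp_doc = ET.ElementTree(resp_elem)  # the document wrapper around it

names = [n.text for n in resp_doc.findall('v:Item/v:Name', ns)]
first = resp_doc.find('v:Item/v:Name', ns).text
print(names, first)
```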
So, the next time you find yourself having to work with SOAP APIs–and hopefully you don’t–there are some handy tips and tricks to consider.