Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Month: May 2019

Python bingo

Have a road trip planned this summer? Want to keep the kids from driving you crazy as you drive to Walley World? How about playing the ol’ standard License Plate game but with a twist: License Plate Bingo!

Run this code a couple of times and print out the chart on separate pieces of paper. There are your bingo cards. Give one to each kid and/or adult. Now, hit the road! If you see a license plate from, say, Texas, and you have a “Texas” square, mark it off on your bingo card. If you can mark off a row horizontally, vertically, or diagonally, you win!

Step 1: Import your packages

import matplotlib.pyplot as plt
import matplotlib.style as style
import numpy as np
import random

%matplotlib inline
style.use('seaborn-poster')

Step 2: Get your State names

# state list courtesy of this gist: https://gist.github.com/JeffPaine/3083347
states = ["AL - Alabama", "AK - Alaska", "AZ - Arizona", "AR - Arkansas", "CA - California", "CO - Colorado",
"CT - Connecticut", "DC - Washington DC", "DE - Deleware", "FL - Florida", "GA - Georgia",
"HI - Hawaii", "ID - Idaho", "IL - Illinios", "IN - Indiana", "IA - Iowa",
"KS - Kansas", "KY - Kentucky", "LA - Louisiana", "ME - Maine", "MD - Maryland",
"MA - Massachusetts", "MI - Michigan", "MN - Minnesota", "MS - Mississippi",
"MO - Missouri", "MT - Montana", "NE - Nebraska", "NV - Nevada", "NH - New Hampshire",
"NJ - New Jersey", "NM - New Mexico", "NY - New York", "NC - North Carolina",
"ND - North Dakota", "OH - Ohio", "OK - Oklahoma", "OR - Oregon", "PA - Pennsylvania",
"RI - Rhode Island", "SC - South Carolina", "SD - South Dakota", "TN - Tennessee",
"TX - Texas", "UT - Utah", "VT - Vermont", "VA - Virgina", "WA - Washington", "WV - West Virginia",
"WI - Wisconsin", "WY - Wyoming"]
state_names = [s.split('-')[1].strip() for s in states]
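A quick check that the parsing worked as expected (the printed values are simply what this slicing should return for the list above):

# peek at the parsed names
print(len(state_names))   # 51 entries: 50 states plus DC
print(state_names[:3])    # ['Alabama', 'Alaska', 'Arizona']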

Step 3: Generate your bingo card

random.shuffle(state_names)
rowlen = 4  # make any size card you'd like

fig = plt.figure()
ax = fig.gca()
ax.set_xticks(np.arange(0, rowlen + 1))
ax.set_yticks(np.arange(0, rowlen + 1))
plt.grid()

for i, word in enumerate(state_names[:rowlen**2]):
    x = (i % rowlen) + 0.4
    y = int(i / rowlen) + 0.5
    ax.annotate(word, xy=(x, y), xytext=(x, y))
    
plt.show()

Python Bingo, FTW!
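
If you'd rather not re-run the notebook once per player, you could wrap the card-drawing code in a small helper and save one image file per player. Here's a minimal sketch along those lines; the make_card helper and the card_N.png file names are my own additions, not part of the original code:

def make_card(names, rowlen=4, filename='card.png'):
    # shuffle a copy so every card gets a different layout
    card_names = names.copy()
    random.shuffle(card_names)

    fig = plt.figure()
    ax = fig.gca()
    ax.set_xticks(np.arange(0, rowlen + 1))
    ax.set_yticks(np.arange(0, rowlen + 1))
    ax.grid()

    for i, word in enumerate(card_names[:rowlen**2]):
        x = (i % rowlen) + 0.4
        y = int(i / rowlen) + 0.5
        ax.annotate(word, xy=(x, y), xytext=(x, y))

    fig.savefig(filename)
    plt.close(fig)

# one card per player
for n in range(1, 4):
    make_card(state_names, filename='card_{0}.png'.format(n))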

Grab my full source code here.

Choosing the best coffee

Here’s another post in my quest to recreate many of the charts from Machine Learning Plus’s Top 50 matplotlib visualizations:

The perfect cup of coffee

Back in March, TowardsDataScience.com published an article that analyzed a coffee dataset from the Coffee Quality Institute (sounds like a great place to work!). Since I’m always looking for cool datasets to work with and since I love coffee, I thought this would be a great dataset to pull down and visualize in some fashion.

In the article, the author visualizes median coffee data from several countries around the world in polar charts. The polar charts worked well to get all 11 features on the chart at the same time, but every polar chart, from Ethiopia to the United States, looked much the same, making it difficult to see how one country's coffee differed from another's. I wondered if there might be a better way to show the subtle variations among each country's coffee. Enter another article I talked about previously: Top 50 matplotlib Visualizations. I thought one chart in particular from that article, the Diverging Bars Chart, might do the trick.

Since each country can produce tens of different brands of coffee, I followed the lead of the original article and grabbed the median value from each country. I then applied the Diverging Bars technique to plot how far each country’s coffee varied from the mean.

One thing that puzzles me, though: in several of the categories, Papua New Guinea comes out on top. Yet if you look at the original article, the author lists the median Ethiopian coffee as coming out on top more often than not. What’s the reason for this discrepancy? I’m not really sure. I think I calculated the medians correctly–my Ethiopian values certainly match the author’s. Perhaps I’m working from a newer dataset than he did?

At any rate, I accomplished my main goal of creating some cool diverging bar charts. Enjoy with your favorite cup of java!

Step 1: Load the data

# https://github.com/jldbc/coffee-quality-database
import pandas as pd
import matplotlib.pyplot as plt

df_coffee = pd.read_csv('./data/arabica_data_cleaned.csv')
df_coffee.head()
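
As a side note on the Papua New Guinea versus Ethiopia question above, here's a quick way to eyeball the medians for a couple of countries. This is just a sanity check of my own and assumes those country names appear exactly as written in the Country.of.Origin column:

# compare median Flavor ratings for two countries of interest
medians = df_coffee.groupby('Country.of.Origin')['Flavor'].median()
print(medians.loc[['Ethiopia', 'Papua New Guinea']])
print(medians.sort_values(ascending=False).head())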

Step 2: Code the chart

Since the dataset has multiple features, each of which I’d like to chart, I decided to place my chart-generation code in a function so that I could easily reuse it from feature to feature:

def generate_chart(feature_to_chart, xlabel, title):
    df_chart = df_coffee.groupby('Country.of.Origin').median().loc[:, [feature_to_chart]].reset_index()
    df_chart['z'] = (df_chart[feature_to_chart] - df_chart[feature_to_chart].mean()) / df_chart[feature_to_chart].std()

    df_chart['colors'] = ['red' if x < 0 else 'green' for x in df_chart['z']]
    df_chart.sort_values('z', inplace=True)
    df_chart.reset_index(inplace=True)

    # draw plot
    plt.figure(figsize=(14,10), dpi=80)
    plt.hlines(y=df_chart.index, xmin=0, xmax=df_chart.z, color=df_chart.colors, alpha=0.4, linewidth=5)

    # decorations
    plt.gca().set(ylabel='$Country$', xlabel=xlabel)
    plt.yticks(df_chart.index, df_chart['Country.of.Origin'], fontsize=12)
    plt.title(title, fontdict={'size':20})
    plt.grid(linestyle='--', alpha=0.5)
    plt.show()

Step 3: Generate the chart

Finally, I can call my function and generate the chart:

feature_to_chart = 'Flavor'
xlabel = '${0}$ $Variation$'.format(feature_to_chart)
title = 'Diverging Bars of Median Coffee {0} Rating'.format(feature_to_chart)

generate_chart(feature_to_chart, xlabel, title)

Median Coffee Flavors

Two other interesting charts:

Divergence of the “balance” feature
Divergence of the “acidity” feature
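
Since the chart-generation code lives in a function, producing those extra charts is just a loop over feature names. A short sketch, assuming the cleaned dataset names those rating columns 'Balance' and 'Acidity':

for feature in ['Balance', 'Acidity']:
    generate_chart(feature,
                   '${0}$ $Variation$'.format(feature),
                   'Diverging Bars of Median Coffee {0} Rating'.format(feature))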

Check out my complete code here and look for more cool charts to come!

Bash in your notebook

#!/bin/bash

At work recently, I had to call an internal REST API for some data I needed to process. To try the API out, I fired up Ubuntu in my Windows Subsystem for Linux (WSL) and ran a cURL command against it. That went well, so I created a new Jupyter Notebook in which to call the API: I wanted to take the response data, load it into a pandas dataframe, and create a chart. Easy, right?

So, I called the API with requests and promptly received an SSL “bad handshake” error. Like many others, I struggled to resolve this error. Clearly, the server hosting that API was misconfigured in some way. However, I didn’t own the server and had no real recourse to get the issue fixed, so I decided to call cURL directly from my notebook instead; that led me to the bash magic command.

With the bash magic command, you can tell Jupyter Notebook to run all the commands in your cell as if you were executing them at a bash command prompt…even if you’re running Jupyter Notebook on a Windows operating system. How cool is that?! Furthermore, with the --out argument, you can pipe all your cell output to a variable for easy processing. Check this out:

(Note: I’m using the free lyrics API from lyrics.ovh in my examples below)

%%bash --out lyrics1
curl https://private-anon-cd823708f8-lyricsovh.apiary-proxy.com/v1/the%20beatles/here%20comes%20the%20sun

The bash magic command returns its output as a string. Since that output is really JSON, I can easily parse it with the json.loads function:

import json
json_lyrics1 = json.loads(lyrics1)
print(json_lyrics1['lyrics'])

That JSON string parses into a standard Python dictionary

What if you want to call Bash commands in a loop?

Suppose I wanted to get the lyrics of multiple Beatles songs…now what do I do? Well, here’s one hack I came up with: do in-line calls to the shell.

song_list = ['yesterday', 'yellow%20submarine', 'eleanor%20rigby']
song_lyrics = []

for song in song_list:
    lyric = !wsl curl -s 'https://private-anon-cd823708f8-lyricsovh.apiary-proxy.com/v1/the%20beatles/{song}'
    song_lyrics.append(lyric)
    
song_lyrics

Here are a few things to note with my shell operation:

  • Since my operating system is Windows 10, I’m actually shelling out to the Windows command shell, not bash. However, since I have WSL installed on my machine, I can use wsl.exe to run commands in that shell. So, I’m basically calling a shell within a shell to ultimately execute my bash operation.
  • With the braces syntax, I can pass the value of my song variable to my shell command.
  • I pass the silent argument (-s) to cURL to suppress the noise cURL would normally send back to Jupyter Notebook. This allows me to pass just the JSON response to my variable lyric.

One challenge with this approach is that the shell command returns an SList, which is basically IPython’s list-of-strings type. I should be able to join each list back into a single string, though, and then parse it with json.loads:

song_list = ['yesterday', 'yellow%20submarine', 'eleanor%20rigby']
song_lyrics = []

for song in song_list:
    lyric = !wsl curl -s 'https://private-anon-cd823708f8-lyricsovh.apiary-proxy.com/v1/the%20beatles/{song}'
    json_lyrics = json.loads(''.join(lyric))
    song_lyrics.append(json_lyrics)
    
song_lyrics

And now I have a list of JSON objects (or dictionaries) to work with. Awesome!
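
From there, getting to the pandas dataframe I was after in the first place is straightforward. Here's a minimal sketch, assuming each response contains a 'lyrics' key like the single-song example above:

import pandas as pd

# pair each (URL-encoded) song title with the lyrics text from its API response
df_songs = pd.DataFrame({
    'song': song_list,
    'lyrics': [d.get('lyrics', '') for d in song_lyrics]
})
df_songs.head()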

For more on the bash magic command, check out this excellent article. Go here to get all my example code.
