I admit that I have what you might call a short fuse, and when that powder keg blows, I can let loose with some pretty colorful language. This is certainly not a good example for my family, so I need to do everything I can to change this behavior.
One way I’ve attempted to moderate my vocabulary is to replace some of the more modern expressions of profanity I’m tempted to use with old-fashioned phrases, ones more likely to be accepted in polite society. So, the next time you’re tempted to shout out something indecent, try using one of these phrases instead:
Ain’t that the berries (a phrase my dad still uses)
By all the saints
Cheese and crackers
Crimeny / Crime-a-nitly
Cripes
Dangnabit / dad-gummit
Dash it all / blast it all
Drats
Fiddlesticks
For all that’s holy
For crying out loud
For Pete’s sake
Fudgesicle
Gee whillikers
Geez / Geez-peez-o / Geez-o-Pete
Good golly / good gracious / good grief (commonly used by Charlie Brown) / good heavens / good lord
Great day in the mornin’
Heavens to Betsy / heavens to Murgatroyd (popularized by Snagglepuss)
Some of what I write about here is inspired by challenges I encounter at work. Often the hardest part in describing those challenges is substituting public data and scenarios for my work-specific ones. Sites like kaggle.com and wikipedia.org really come to the rescue.
Recently, I had a circumstance where I needed to process a dataframe in which one of the columns contained a list of values for each field. I think I came up with a clever way of dealing with that obstacle and would like to discuss it in these pages, but first…what sort of public data can I collect to replicate my scenario? How about some English football?! Let’s take some football titles–say, the UEFA Champions League champion, the UEFA Europa League champion, and the UEFA Super Cup champion–and build a dataframe that shows which English football clubs have ever won any of these titles.
Step 1: Collect the raw data for each of these titles
I’ve written a few times before on how to use Python to collect public data from the Internet. This time around, I went with a slightly more manual approach. I went to each of the three Wikipedia pages, hit F12 to bring up my developer tools, and used those tools to copy just the HTML code for the data tables I was interested in–the ones listing each club team and the number of times it won the particular title. I simply copied the HTML to three different data files for subsequent processing.
Step 2: Read in the data files
The pandas read_html function is such a time saver here. Here’s the code I came up with to read in my data files:
import pandas as pd
from bs4 import BeautifulSoup
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
# for jupyter notebook (the %matplotlib magic won't accept a trailing comment)
%matplotlib inline
matplotlib.style.use('seaborn')  # on matplotlib >= 3.6, use 'seaborn-v0_8'
df_champions = pd.read_html('./data/uefa_champions.txt')[0]
df_champions = df_champions[df_champions.Titles>0]
df_champions['title'] = 'UEFA Champions League champ'
df_europa = pd.read_html('./data/uefa_europa_league.txt')[0]
df_europa = df_europa[df_europa.Winners>0]
df_europa['title'] = 'UEFA Europa League champ'
df_super = pd.read_html('./data/uefa_super.txt')[0]
df_super = df_super[df_super.Winners>0]
df_super['title'] = 'UEFA Super Cup champ'
The df_champions dataframe looks like this:
These dataframes are looking good, but how do I know which teams are the English teams? Wikipedia identifies each team’s nationality with a flag icon, but pandas isn’t pulling in that data. Time for a little HTML parsing with BeautifulSoup.
Step 3: Collect the names of the English teams
Since pandas didn’t pull in the nationality information, I had to revisit each of the HTML data files and parse out that information with BeautifulSoup:
epl_teams = []
for filepath in ['./data/uefa_champions.txt', './data/uefa_europa_league.txt', './data/uefa_super.txt']:
    with open(filepath, 'r') as f:
        soup = BeautifulSoup(f, 'html.parser')
    # each club name lives in a <th>; English clubs carry an England flag icon
    for th in soup.find_all('th'):
        span = th.find('span', {'class': 'flagicon'})
        if span:
            a = span.find('a', {'title': 'England'})
            if a:
                epl_teams.append(th.text.strip())
Step 4: Filter each title list down to just the English teams
With the names of the English teams, I can filter my dataframes down accordingly:
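The filter itself is one isin call per dataframe. Here’s a sketch of that step using tiny hand-built stand-ins for the Step 2 dataframes (the club names are just illustrative examples, not the full Wikipedia data); it also shows the two outer merges on Club that presumably produced the df_combo used in Step 6 below, since that is where its title_x and title_y columns come from:

```python
import pandas as pd

# Hypothetical stand-ins for the dataframes built in Step 2; the real
# ones come from read_html on the saved Wikipedia tables.
df_champions = pd.DataFrame({'Club': ['Liverpool', 'Real Madrid'],
                             'title': ['UEFA Champions League champ'] * 2})
df_europa = pd.DataFrame({'Club': ['Liverpool', 'Chelsea'],
                          'title': ['UEFA Europa League champ'] * 2})
df_super = pd.DataFrame({'Club': ['Liverpool'],
                         'title': ['UEFA Super Cup champ']})
epl_teams = ['Liverpool', 'Chelsea']  # from the BeautifulSoup pass in Step 3

# Step 4: keep only the English clubs in each title dataframe
df_champions = df_champions[df_champions.Club.isin(epl_teams)]
df_europa = df_europa[df_europa.Club.isin(epl_teams)]
df_super = df_super[df_super.Club.isin(epl_teams)]

# Two successive outer merges on 'Club' yield one row per club, with the
# overlapping 'title' columns suffixed title_x / title_y by pandas.
df_combo = (df_champions.merge(df_europa, on='Club', how='outer')
                        .merge(df_super, on='Club', how='outer'))
print(df_combo.columns.tolist())  # ['Club', 'title_x', 'title_y', 'title']
```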
Step 6: Combine the “title” columns together into one
The final step, just to get this public data into a shape that replicates my original problem, is to merge the three “title” columns into a single one. Two lines do the deed:
df_combo['title'] = df_combo.apply(lambda r: [t for t in [r.title_x, r.title_y, r.title] if t is not np.nan], axis=1)
df_combo = df_combo[['Club', 'title']]
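To see those two lines in action, here’s a self-contained rerun on a tiny hand-built df_combo (club names and titles chosen purely for illustration): the NaNs left behind by the outer merges drop out, leaving one list-valued title column per club.

```python
import numpy as np
import pandas as pd

# A small stand-in for df_combo after the outer merges;
# np.nan marks a title a club never won.
df_combo = pd.DataFrame({
    'Club': ['Liverpool', 'Chelsea'],
    'title_x': ['UEFA Champions League champ', np.nan],
    'title_y': ['UEFA Europa League champ', 'UEFA Europa League champ'],
    'title': ['UEFA Super Cup champ', np.nan],
})

# Collapse the three columns into one list per row, skipping the NaNs
df_combo['title'] = df_combo.apply(
    lambda r: [t for t in [r.title_x, r.title_y, r.title] if t is not np.nan],
    axis=1)
df_combo = df_combo[['Club', 'title']]
print(df_combo.loc[1, 'title'])  # ['UEFA Europa League champ']
```

Note that the `t is not np.nan` identity test works here because the missing values are the np.nan singleton itself; a dtype-agnostic alternative would be pd.notna(t).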
Phew. That was a fair amount of work just to pull together some public data to replicate one of my work datasets. And the actual code I wrote to analyze a dataframe containing lists in a column? Well, that will have to wait for a future post.