Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Tag: python (Page 17 of 26)

Slope Charts in Python

I continue to explore the different charts from Machine Learning Plus’s Top 50 matplotlib visualizations post and look for good opportunities to recreate them with data sets I care about. Recently, I thought it might be interesting to create a slope chart where I simply match objects on one side of the chart to objects on the other side, without using the Y axis to convey any meaning. For my data set, I grabbed CollegeChoice.net’s 25 Best Colleges in Ohio. I didn’t dig into how they decide one college is better than another, although they do provide a description of their methodology. What I thought was interesting was that they provide the 4-5 most popular majors at each of the colleges. So, I thought I could create a slope chart where I write the top 10 Ohio Colleges on one side (all 25 would make the chart too cluttered), their most popular majors on the other side, and draw lines in between. How common are these majors among the top 10? My chart should be able to tell that story.

Step 1: Bring in all the packages I’ll need

Since I’m pulling in and parsing a web page for its data, requests and BeautifulSoup are in. numpy and math will help with spacing out the points in my chart and, of course, matplotlib will render the chart:

import requests
from bs4 import BeautifulSoup
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from matplotlib import cm
import math

Step 2: Grab the page and parse out the data

# grab the page
result = requests.get("https://www.collegechoice.net/rankings/best-colleges-in-ohio/")
soup = BeautifulSoup(result.content, 'lxml')

# parse out just the top 10 schools and their popular majors
ranking_divs = soup.find_all('div', 'ranking-box')
top_10_schools = {}

for ranking_div in ranking_divs[:10]:
    school = ranking_div.select_one('div.rb-list-title h3').text
    majors = [maj.text for maj in ranking_div.select('div.rb-ranking-body ul li')]
    top_10_schools[school] = majors

Step 3: Do some data cleanup

As I scanned the results, I noticed that CollegeChoice used two slightly different names for the same major: Visual and Performing Arts and Visual & Performing Arts. I had to write some code to clean that up:

for school, majors in top_10_schools.items():
    if 'Visual and Performing Arts' in majors:
        top_10_schools[school] = ['Visual & Performing Arts' if maj=='Visual and Performing Arts' else maj for maj in majors]
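
If more duplicate spellings turn up, a lookup table may scale better than one-off checks. Here is a minimal sketch of that idea; the MAJOR_ALIASES table, the normalize_majors helper, and the sample data are all made up for illustration and are not part of the original notebook:

```python
# Hypothetical alias table: variant spelling -> canonical spelling.
MAJOR_ALIASES = {
    'Visual and Performing Arts': 'Visual & Performing Arts',
}


def normalize_majors(schools, aliases=MAJOR_ALIASES):
    """Return a new dict with every major stripped and mapped to its canonical name."""
    return {
        school: [aliases.get(maj.strip(), maj.strip()) for maj in majors]
        for school, majors in schools.items()
    }


# Made-up sample data in the same shape as top_10_schools.
sample = {
    'School A': ['Visual and Performing Arts', 'Biology'],
    'School B': ['Visual & Performing Arts', ' Finance '],
}
cleaned = normalize_majors(sample)
print(cleaned)
```

Building a new dict instead of editing in place also avoids reassigning values while looping over the original.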

Step 4: Finally, build the chart

Now, I can build the chart, and that Machine Learning Plus article really helped out. The one difference was that, in their slope chart, they used GDP dollars on their Y axis. My Y axis wouldn’t have any sort of meaning: just a list of colleges. So, I used the numpy and math packages to help me evenly space out my points along the axis. Here’s what I came up with:

# if, say, you have a count of 11 and you want to round up to the nearest 5, this will return 15
def roundupto(your_count, round_up_to_nearest):
    return int(math.ceil(your_count / round_up_to_nearest)) * round_up_to_nearest

# draws a line between points
def newline(p1, p2, color='black'):
    ax = plt.gca()
    l = mlines.Line2D([p1[0], p2[0]], [p1[1], p2[1]], color=color, marker='o', markersize=6)
    ax.add_line(l)
    return l
    

fig, ax = plt.subplots(1, 1, figsize=(14, 14))

# get school and major lists and calculate the scale of the chart
school_list = list(top_10_schools.keys())
school_list.reverse()  # matplotlib will then put the #1 school at the top of the chart
major_list = list(set(sum(top_10_schools.values(), [])))
major_list.sort(reverse=True)  # reverse-alphabetical, so matplotlib lists majors A-Z down the chart
scale = roundupto(max(len(school_list), len(major_list)), 5)

# write the vertical lines
ax.vlines(x=1, ymin=0, ymax=scale, color='black', alpha=0.7, linewidth=1, linestyles='dotted')
ax.vlines(x=3, ymin=0, ymax=scale, color='black', alpha=0.7, linewidth=1, linestyles='dotted')

# plot the points; unlike the slope chart in the MachineLearningPlus.com article, my Y axis has no meaning, so
# I use numpy's linspace function to help me evenly space each point
school_y_vals = np.linspace(1, scale-1, num=len(school_list))
major_y_vals = np.linspace(1, scale-1, num=len(major_list))
ax.scatter(y=school_y_vals, x=np.repeat(1, len(school_list)), s=10, color='black', alpha=0.7)
ax.scatter(y=major_y_vals, x=np.repeat(3, len(major_list)), s=10, color='black', alpha=0.7)

# write the lines and annotation
for school, school_y_val in zip(school_list, school_y_vals):
    ax.text(1-0.05, school_y_val, school, horizontalalignment='right', verticalalignment='center', fontdict={'size':10})
    for major in top_10_schools[school]:
        major_y_val = major_y_vals[major_list.index(major)]
        # plt.get_cmap works across matplotlib versions; cm.get_cmap was removed in matplotlib 3.9
        newline([1, school_y_val], [3, major_y_val], color=plt.get_cmap('tab20')(school_list.index(school)))
        
for major, major_y_val in zip(major_list, major_y_vals):
    ax.text(3+0.05, major_y_val, major, horizontalalignment='left', verticalalignment='center', fontdict={'size':10})
    
# vertical line annotations
ax.text(1-0.05, scale-0.25, 'College', horizontalalignment='right', verticalalignment='center', 
        fontdict={'size':14, 'weight':700})
ax.text(3+0.05, scale-0.25, 'Major', horizontalalignment='left', verticalalignment='center', 
        fontdict={'size':14, 'weight':700})

# misc cleanup
ax.set(xlim=(0, 4), ylim=(0, scale))
ax.axis('off')

plt.title("Most Popular Majors at Ohio's Top 10 Colleges")
plt.show()
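
To see the spacing logic on its own, without any plotting, here is a small sketch. The counts of 10 schools and 23 majors are assumed purely for illustration; the real numbers depend on what the page returns:

```python
import math

import numpy as np


def roundupto(your_count, round_up_to_nearest):
    """Round a count up to the nearest multiple, e.g. 11 -> 15 for multiples of 5."""
    return int(math.ceil(your_count / round_up_to_nearest)) * round_up_to_nearest


# Assumed counts, for illustration only.
n_schools, n_majors = 10, 23

# The chart height is the longer list, rounded up to a multiple of 5.
scale = roundupto(max(n_schools, n_majors), 5)

# Both columns share the same top and bottom (1 and scale - 1), so the
# shorter list is simply spread out with wider gaps between its points.
school_y = np.linspace(1, scale - 1, num=n_schools)
major_y = np.linspace(1, scale - 1, num=n_majors)

print(scale)
print(school_y)
print(major_y)
```

Because both linspace calls use the same endpoints, the first and last labels in each column line up horizontally even though the columns hold different numbers of items.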

And the result is at the top of this post. Get my full code here.

Conclusions?

Well, I did note that over half of the popular majors are popular at only one of the Top 10 schools. I expected to see many of the same majors appear repeatedly across multiple schools. I guess maybe that’s a good thing: if, say, you want to study Finance, it would seem The Ohio State University and only The Ohio State University is the best place to study the discipline.
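
A claim like "over half of the majors appear at only one school" can be checked with a quick count rather than by eyeballing the chart. This is a rough sketch on made-up data in the same shape as top_10_schools; the school and major names below are placeholders, not the scraped results:

```python
from collections import Counter

# Made-up sample: school -> list of popular majors.
sample_schools = {
    'School A': ['Biology', 'Finance', 'Nursing'],
    'School B': ['Biology', 'Psychology'],
    'School C': ['Biology', 'Nursing', 'History'],
}

# Count how many schools list each major (set() guards against repeats within a school).
major_counts = Counter(
    maj for majors in sample_schools.values() for maj in set(majors)
)

# Majors that show up at exactly one school, and their share of all majors.
single_school = sorted(maj for maj, n in major_counts.items() if n == 1)
share = len(single_school) / len(major_counts)

print(major_counts.most_common())
print(single_school, f"{share:.0%}")
```

Swapping sample_schools for the real top_10_schools dict should give the actual share.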

More importantly, the slope chart is now another cool visual I (and you) can add to the toolbox. Happy sloping!

Graduation Party media

My eldest graduated high school this year and, to celebrate, the wife and I threw her a graduation party. I’ll spare you the to-dos and checklists for throwing a graduation party–I left those largely to my wife as it is. Instead, I’d like to focus this post on the media aspects of the party.

I’m a big proponent of taking pictures and videotaping those magic moments–and even mundane moments–throughout the lives of your children and family. So, I figured this was the perfect time to pull out those embarrassing pictures and video from the past so that all her friends could see.

For the party, we rented an outdoor facility at a local park. We would have access to electricity and I decided to bring two large flat screen monitors to display the media, but I did not want to bring along expensive laptops or desktops to play the media as I didn’t want to risk damaging that equipment. So, what to do? I know: run my media on cheap Raspberry Pis!

The Slideshow

For my picture slideshow, I pulled out about 700 pictures I had of my child from over the years. However I was going to run the slideshow, I knew I wanted to show the pictures in chronological order. So, I came up with this quick PowerShell script to rename the pictures in increasing numbers so that sorting the pictures alphabetically would effectively sort them chronologically:

$pics = gci "C:\grad_party\data\*" | sort {$_.LastWriteTime}

$count = 0
foreach($pic in $pics){
    $count = $count + 5
    # zero-pad the number (00005, 00010, ...) so an alphabetical sort matches the numeric order
    mv $pic.FullName ("C:\grad_party\data\{0:D5}{1}" -f $count, $pic.Extension)
}

(I increased the count by 5 with each loop so that I could easily, manually reorder pictures if needed.)
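
The same renaming idea can be sketched in Python, which may be handy if the pictures already live on the Pi. This is only an illustration: the rename_chronologically helper, the zero-padding width, and the demo file names are assumptions, not part of the original script. Zero-padding matters because a plain alphabetical sort would otherwise put "10" ahead of "5".

```python
import os
import tempfile
from pathlib import Path


def rename_chronologically(folder, step=5, width=5):
    """Rename files in `folder` to zero-padded numbers ordered by modification time."""
    folder = Path(folder)
    pics = sorted(
        (p for p in folder.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    # First pass: move everything to temporary names so that a new numeric
    # name can never clash with a file that has not been renamed yet.
    temps = []
    for i, pic in enumerate(pics):
        tmp = pic.with_name(f"__tmp_{i}{pic.suffix}")
        pic.rename(tmp)
        temps.append(tmp)
    # Second pass: assign 00005, 00010, 00015, ... keeping each extension.
    renamed = []
    for i, tmp in enumerate(temps, start=1):
        target = folder / f"{i * step:0{width}d}{tmp.suffix}"
        tmp.rename(target)
        renamed.append(target)
    return renamed


# Demo on throwaway files with hand-set modification times.
with tempfile.TemporaryDirectory() as tmpdir:
    for age, name in enumerate(['zebra.jpg', 'apple.png', 'mango.jpg']):
        path = Path(tmpdir) / name
        path.write_bytes(b'')
        os.utime(path, (1_000_000 + age, 1_000_000 + age))
    demo_names = [p.name for p in rename_chronologically(tmpdir)]

print(demo_names)
```

Try it on a copy of the folder first, since the rename is not reversible.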

Now the big question was, on a Raspberry Pi, what are my options to run a snazzy slideshow?

feh

feh is a Linux utility for displaying images, and it’s relatively easy to get started with on a Raspberry Pi. The challenge I encountered was getting it to do anything other than hard transitions from slide to slide. Ideally, I wanted a slideshow where the slides are in motion and have nice cross-fade transitions: something like this. I don’t know if feh has those capabilities, but I certainly couldn’t get it to produce those effects, so I continued my search.

Python and Pi3D

A solution using Python?! Tell me more! I happened upon a post from TheDigitalPictureframe.com that used a home-grown Python script leveraging a package called Pi3D. The demo video looked close to the solution I was pursuing, so I gave it a try. Unfortunately, despite all the parameter tweaking I tried, I just couldn’t get the final product I wanted. Nevertheless, I would like to circle back someday and do some more experimentation with this approach.

Make the slideshow myself

In the end, I simply used PowerDirector–the software I use to make my annual family movies–to make a video of my slideshow. Then, I just dropped the video on the Raspberry Pi and played it with the built-in VLC player. I even set VLC to continuously loop the video and set the player to full screen to take up the entire monitor. It worked out pretty well and even garnered a few compliments.
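
Looping and full screen can also be requested when VLC is launched from the command line, which could be useful for starting playback from a script. The sketch below is an assumption about how that might look, not a record of how the party setup was configured; the video path and both helper functions are hypothetical, and it relies on VLC's --fullscreen and --loop options:

```python
import subprocess


def build_vlc_command(video_path, player='vlc'):
    """Build the argument list for looping, full-screen playback."""
    return [player, '--fullscreen', '--loop', str(video_path)]


def play_looped(video_path):
    """Start VLC in the background (only call this on a machine with VLC installed)."""
    return subprocess.Popen(build_vlc_command(video_path))


# Hypothetical location of the slideshow video on the Pi.
command = build_vlc_command('/home/pi/slideshow.mp4')
print(' '.join(command))
```

Only the command is printed here; calling play_looped would actually start the player.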

The video montage

On a second flat screen monitor and second Raspberry Pi, I played a video montage of nearly 18 years of video highlights of my child’s life. As I’ve been making annual family movies for most of my daughter’s life, it was pretty easy collecting a few minutes of video of her from each year. Here again, I turned to PowerDirector for my solution. I spliced together the video from over the years separating each year with a “year” title and then created a single mp4 video that I copied to my Raspberry Pi and played with VLC player. Like the slideshow, it turned out pretty well.

Background music

“Dad, your music is too old!”

the kid

For the party background music, I pulled out my reliable portable speaker. Initially, my plan was to copy a bunch of my mp3 files to a third Raspberry Pi, play those files with VLC player, and run the audio out into my speaker. Then, the kid complained that my music was too old.

As an Amazon Prime member, I get access to lots of free music. So, I downloaded a few “modern pop” playlists to my phone. My speaker is Bluetooth-enabled, so I paired my phone to my speaker and then just played those playlists from my phone through the speaker. That worked out pretty well and I was surprised that both my speaker and phone batteries outlasted the three-hour party.

So, there are your media suggestions for your child’s graduation party:

  1. Create a slick slideshow “video” and run it from a Raspberry Pi into a large flat screen monitor,
  2. Create a video montage “video” and do the same thing on a second monitor and Pi, and
  3. Download some music for offline play to your phone and bluetooth your phone to a nice, portable speaker.

The slideshow and video both presume that you’re actively photographing and videoing all your family activities. I highly recommend you do that: family vacations, basketball games, and banquets might feel mundane now but they’ll be gold at your child’s graduation party and beyond.

Python bingo

Have a road trip planned this summer? Want to keep the kids from driving you crazy as you drive to Walley World? How about playing the ol’ standard License Plate game but with a twist: License Plate Bingo!

Run this code a couple of times and print out the chart on separate pieces of paper. There are your bingo cards. Give one to each kid and/or adult. Now, hit the road! If you see a license plate from, say, Texas, and you have a “Texas” square, mark it off on your bingo card. If you can mark off a row horizontally, vertically, or diagonally, you win!

Step 1: Import your packages

import matplotlib.pyplot as plt
import matplotlib.style as style
import numpy as np
import random

%matplotlib inline
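# 'seaborn-poster' was renamed 'seaborn-v0_8-poster' in matplotlib 3.6, so use whichever name is available
style.use('seaborn-v0_8-poster' if 'seaborn-v0_8-poster' in style.available else 'seaborn-poster')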

Step 2: Get your State names

# compliments of this forum: https://gist.github.com/JeffPaine/3083347
states = ["AL - Alabama", "AK - Alaska", "AZ - Arizona", "AR - Arkansas", "CA - California", "CO - Colorado",
"CT - Connecticut", "DC - Washington DC", "DE - Delaware", "FL - Florida", "GA - Georgia",
"HI - Hawaii", "ID - Idaho", "IL - Illinois", "IN - Indiana", "IA - Iowa",
"KS - Kansas", "KY - Kentucky", "LA - Louisiana", "ME - Maine", "MD - Maryland",
"MA - Massachusetts", "MI - Michigan", "MN - Minnesota", "MS - Mississippi",
"MO - Missouri", "MT - Montana", "NE - Nebraska", "NV - Nevada", "NH - New Hampshire",
"NJ - New Jersey", "NM - New Mexico", "NY - New York", "NC - North Carolina",
"ND - North Dakota", "OH - Ohio", "OK - Oklahoma", "OR - Oregon", "PA - Pennsylvania",
"RI - Rhode Island", "SC - South Carolina", "SD - South Dakota", "TN - Tennessee",
"TX - Texas", "UT - Utah", "VT - Vermont", "VA - Virginia", "WA - Washington", "WV - West Virginia",
"WI - Wisconsin", "WY - Wyoming"]
state_names = [s.split('-')[1].strip() for s in states]

Step 3: Generate your bingo card

random.shuffle(state_names)
rowlen = 4  # make any size card you'd like

fig = plt.figure()
ax = fig.gca()
ax.set_xticks(np.arange(0, rowlen + 1))
ax.set_yticks(np.arange(0, rowlen + 1))
plt.grid()

for i, word in enumerate(state_names[:rowlen**2]):
    x = (i % rowlen) + 0.4
    y = (i // rowlen) + 0.5
    ax.annotate(word, xy=(x, y), xytext=(x, y))
    
plt.show()
Python Bingo, FTW!
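
The win rule described earlier (a full row, column, or diagonal) can also be checked in code, say, to settle a back-seat dispute. A minimal sketch, assuming squares are tracked as (row, column) pairs; the has_bingo helper and the sample cards are made up for illustration:

```python
def has_bingo(marked, rowlen=4):
    """Return True if `marked`, a set of (row, col) squares, covers a full line."""
    lines = []
    # every row
    lines += [[(r, c) for c in range(rowlen)] for r in range(rowlen)]
    # every column
    lines += [[(r, c) for r in range(rowlen)] for c in range(rowlen)]
    # the two diagonals
    lines.append([(i, i) for i in range(rowlen)])
    lines.append([(i, rowlen - 1 - i) for i in range(rowlen)])
    return any(all(square in marked for square in line) for line in lines)


# Sample cards: one complete diagonal, and one with no complete line.
diagonal = {(0, 0), (1, 1), (2, 2), (3, 3)}
almost = {(0, 0), (0, 1), (0, 2), (1, 3)}

print(has_bingo(diagonal), has_bingo(almost))
```

With the card code above, the square at position i in the shuffled list sits at row i // rowlen and column i % rowlen, so a spotted state can be turned into a (row, column) pair the same way.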

Grab my full source code here.


© 2025 DadOverflow.com
