Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Author: Brad

Dad. Technologist. Fan of English poet John Lillison.

Slope Charts in Python

I continue to explore the different charts from Machine Learning Plus’s Top 50 matplotlib visualizations post and look for good opportunities to recreate them with data sets I care about. Recently, I thought it might be interesting to create a slope chart where I simply match objects on one side of the chart to objects on the other side, without using the Y axis to convey any meaning. For my data set, I grabbed CollegeChoice.net’s 25 Best Colleges in Ohio. I didn’t dig into how they decide one college is better than another, although they do provide a description of their methodology. What I thought was interesting was that they provide the 4-5 most popular majors at each of the colleges. So, I thought I could create a slope chart where I write the top 10 Ohio Colleges on one side (all 25 would make the chart too cluttered), their most popular majors on the other side, and draw lines in between. How common are these majors among the top 10? My chart should be able to tell that story.

Step 1: Bring in all the packages I’ll need

Since I’m pulling in and parsing a web page for its data, requests and BeautifulSoup are in. numpy and math will help with spacing out the points in my chart and, of course, matplotlib will render the chart:

import requests
from bs4 import BeautifulSoup
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
from matplotlib import cm
import math

Step 2: Grab the page and parse out the data

# grab the page
result = requests.get("https://www.collegechoice.net/rankings/best-colleges-in-ohio/")
soup = BeautifulSoup(result.content, 'lxml')

# parse out just the top 10 schools and their popular majors
ranking_divs = soup.find_all('div', 'ranking-box')
top_10_schools = {}

for ranking_div in ranking_divs[:10]:
    school = ranking_div.select_one('div.rb-list-title h3').text
    majors = [maj.text for maj in ranking_div.select('div.rb-ranking-body ul li')]
    top_10_schools[school] = majors

Step 3: Do some data cleanup

As I scanned the results, I noticed that CollegeChoice used two slightly different names for the same major: Visual and Performing Arts and Visual & Performing Arts. I had to write some code to clean that up:

for school, majors in top_10_schools.items():
    if 'Visual and Performing Arts' in majors:
        top_10_schools[school] = ['Visual & Performing Arts' if maj=='Visual and Performing Arts' else maj for maj in majors]
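
If more duplicate names turn up, a lookup table might scale better than one-off checks. Here's a minimal sketch of that idea, assuming a hypothetical aliases dictionary and made-up sample data (neither is part of my original code):

```python
# Hypothetical sample data; the school names are placeholders, not scraped results
sample_schools = {
    'School A': ['Visual and Performing Arts', 'Engineering'],
    'School B': ['Visual & Performing Arts', 'Finance'],
}

# Map each variant spelling to the one name I want to keep
aliases = {'Visual and Performing Arts': 'Visual & Performing Arts'}


def normalize_majors(schools, aliases):
    """Return a new dict with every aliased major replaced by its preferred name."""
    return {school: [aliases.get(maj, maj) for maj in majors]
            for school, majors in schools.items()}


cleaned = normalize_majors(sample_schools, aliases)
print(cleaned['School A'])
```

Adding another cleanup rule is then just one more entry in the dictionary.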

Step 4: Finally, build the chart

Now, I can build the chart, and that Machine Learning Plus article really helped out. The one difference was that, in their slope chart, they used GDP dollars on their Y axis. My Y axis wouldn’t have any sort of meaning: just a list of colleges. So, I used the numpy and math packages to help me evenly space out my points along the axis. Here’s what I came up with:

# if, say, you have a count of 11 and you want to round up to the nearest 5, this will return 15
def roundupto(your_count, round_up_to_nearest):
    return int(math.ceil(your_count / round_up_to_nearest)) * round_up_to_nearest

# draws a line between points
def newline(p1, p2, color='black'):
    ax = plt.gca()
    l = mlines.Line2D([p1[0], p2[0]], [p1[1], p2[1]], color=color, marker='o', markersize=6)
    ax.add_line(l)
    return l
    

fig, ax = plt.subplots(1, 1, figsize=(14, 14))

# get school and major lists and calculate the scale of the chart
school_list = list(top_10_schools.keys())
school_list.reverse()  # matplotlib will then put the #1 school at the top of the chart
major_list = list(set(sum(top_10_schools.values(), [])))
major_list.sort(reverse=True)  # to help matplotlib list majors alphabetically down the chart
scale = roundupto(max(len(school_list), len(major_list)), 5)

# write the vertical lines
ax.vlines(x=1, ymin=0, ymax=scale, color='black', alpha=0.7, linewidth=1, linestyles='dotted')
ax.vlines(x=3, ymin=0, ymax=scale, color='black', alpha=0.7, linewidth=1, linestyles='dotted')

# plot the points; unlike the slope chart in the MachineLearningPlus.com article, my Y axis has no meaning, so
# I use numpy's linspace function to help me evenly space each point
school_y_vals = np.linspace(1, scale-1, num=len(school_list))
major_y_vals = np.linspace(1, scale-1, num=len(major_list))
ax.scatter(y=school_y_vals, x=np.repeat(1, len(school_list)), s=10, color='black', alpha=0.7)
ax.scatter(y=major_y_vals, x=np.repeat(3, len(major_list)), s=10, color='black', alpha=0.7)

# write the lines and annotation
for school, school_y_val in zip(school_list, school_y_vals):
    ax.text(1-0.05, school_y_val, school, horizontalalignment='right', verticalalignment='center', fontdict={'size':10})
    for major in top_10_schools[school]:
        major_y_val = major_y_vals[major_list.index(major)]
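        # one color per school from the tab20 palette (plt.get_cmap replaces cm.get_cmap, which newer matplotlib releases dropped)
        newline([1, school_y_val], [3, major_y_val], color=plt.get_cmap('tab20')(school_list.index(school)))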
        
for major, major_y_val in zip(major_list, major_y_vals):
    ax.text(3+0.05, major_y_val, major, horizontalalignment='left', verticalalignment='center', fontdict={'size':10})
    
# vertical line annotations
ax.text(1-0.05, scale-0.25, 'College', horizontalalignment='right', verticalalignment='center', 
        fontdict={'size':14, 'weight':700})
ax.text(3+0.05, scale-0.25, 'Major', horizontalalignment='left', verticalalignment='center', 
        fontdict={'size':14, 'weight':700})

# misc cleanup
ax.set(xlim=(0, 4), ylim=(0, scale))
ax.axis('off')

plt.title("Most Popular Majors at Ohio's Top 10 Colleges")
plt.show()

And the result is at the top of this post. Get my full code here.
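
To get a feel for how the spacing in Step 4 works out, here's a small sketch you can run without matplotlib. The evenly_spaced function is a hypothetical pure-Python stand-in for np.linspace, and the counts of 10 schools and 11 majors are assumptions for illustration only:

```python
import math


def roundupto(your_count, round_up_to_nearest):
    # same helper as in Step 4
    return int(math.ceil(your_count / round_up_to_nearest)) * round_up_to_nearest


def evenly_spaced(start, stop, num):
    # hypothetical stand-in for np.linspace(start, stop, num=num), endpoints included
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# assumed counts: 10 schools on the left, 11 majors on the right
scale = roundupto(max(10, 11), 5)
school_y_vals = evenly_spaced(1, scale - 1, 10)
major_y_vals = evenly_spaced(1, scale - 1, 11)

print(scale)
print(school_y_vals)
```

Both lists start at 1 and end at scale-1, so the two columns of points line up at the top and bottom even though they hold different numbers of items.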

Conclusions?

Well, I did note that over half of the popular majors are popular at only one of the Top 10 schools. I expected to see many of the same majors appear repeatedly across multiple schools. I guess maybe that’s a good thing: if, say, you want to study Finance, it would seem The Ohio State University and only The Ohio State University is the best place to study the discipline.
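
If you'd rather count than eyeball the chart, something like the sketch below might do it. The data here is hypothetical (placeholder school names in the same shape as my top_10_schools dictionary), so the numbers it prints say nothing about the real rankings:

```python
from collections import Counter

# Hypothetical data in the same shape as top_10_schools; names are placeholders
sample_schools = {
    'School A': ['Engineering', 'Finance', 'Biology'],
    'School B': ['Engineering', 'Psychology'],
    'School C': ['Engineering', 'Biology', 'Nursing'],
}

# Count how many schools list each major (set() guards against a repeat within one school)
major_counts = Counter(maj for majors in sample_schools.values() for maj in set(majors))

# Majors that show up at exactly one school
single_school_majors = sorted(maj for maj, n in major_counts.items() if n == 1)
share = len(single_school_majors) / len(major_counts)

print(single_school_majors)
print(f"{share:.0%} of majors appear at only one school")
```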

More importantly, the slope chart is now another cool visual you (and I) can add to the toolbox. Happy sloping!

Graduation Songs

Now that the hustle of the graduation season is dying down, I’m trying to write up a few notes that might save the rest of you some time planning around such momentous times in your family. One note is on different songs I’ve chosen in the past to help celebrate my kids’ academic accomplishments. Here is my list of favorite graduation songs, but unlike a lot of the other lists out there, I’m going 80s and 90s old school with this one:

School’s Out – Alice Cooper

I remember singing this song with my friends on the bus as it carted me home on many last days of school. Given all the security tensions around schools these days, though, you may be better off skipping this one or relegating it to background music.

Time After Time – Cyndi Lauper

A big song from my youth, although I much preferred her Goonies song over this one.

Forever – Kenny Loggins

Sure, Forever seems to be primarily a love song, but with lines like, “even when I’m gone, you’ll be here with me,” you can’t help noting that, after years of forging friendships, your graduate will be leaving those friends for new experiences and likely only taking fond memories with him.

Don’t You Forget About Me – Simple Minds

It’s interesting how my kids have picked up on certain notables of my youth including the movie The Breakfast Club. The movie and soundtrack were big back in the day and this song has certainly held up well.

The Future’s So Bright, I Got to Wear Shades – Timbuk 3

I suspect this song’s a subtly cynical observation on the future of humanity, but on its surface, it’s a fun song about reaping the benefits of your hard work.

Friends – Michael W. Smith

This one holds with the theme that, even though your new phase in life might have you moving far away from your friends, you’ll always take them with you in your heart. A big hit in Christian youth groups everywhere back in the day.

The Time of My Life – Bill Medley and Jennifer Warnes

Part of the soundtrack of the smash movie Dirty Dancing, The Time of My Life, like a true graduation song, stops to appreciate that these last few moments–days, months, years?–have been fantastic experiences for the singer and it’s all because of the person(s) that he surrounded himself with.

I Will Remember You – Amy Grant

I find the opening verse of this song particularly touching:

I will be walking one day
Down a street far away
And see a face in the crowd and smile
Knowing how you made me laugh
Hearing sweet echoes of you from the past
I will remember you.

These Are The Days – 10,000 Maniacs

Underneath, this song seems to celebrate the process of youth maturing in their sexuality. On the surface, it’s just a fun song asking its listeners to appreciate the present moment.

I Will Remember You – Sarah McLachlan

Two years after Amy Grant’s I Will Remember You, Sarah McLachlan came out with another song of the same title hitting on that same theme of friends who find their lives diverging from one another.

Good Riddance – Green Day

Taking a substantial departure from their punk inclinations, Green Day came out with this acoustic ballad in 1997 to help graduates–and likely all others embarking on major life changes–take stock and appreciate the good events in their pasts as they head out on new and separate paths.

Graduation – Vitamin C

This is probably the quintessential graduation song and should definitely be in your playlist for such events. It even borrows from Pachelbel’s Canon, a piece commonly played at graduations today.

Ten things I like to do in Jupyter Markdown

One of the great things about Jupyter Notebook is how you can intersperse your code blocks with markdown blocks that you can use to add comments or simply more context around your code. Here are ten ways I like to use markdown in my Jupyter Notebooks.

1. Use hashes for easy titles

In your markdown cell, enter a line like this:

# This becomes a H1 header/title

That line will render as a header (h1 element).

2. Use asterisks and hyphens for bullet points

Try these lines in your markdown cell:

* this is one way to do a bullet point
- this is another way to do a bullet point

Both render as bullet point lists.

3. Use asterisks and underscores for emphasis

Next, try this:

*these words become italicized*
__these words become bold__

Wait…that didn’t render quite as expected

The phrase I wanted to italicize italicized and the phrase I wanted to bold went bold, but both phrases rendered on the same line. What gives? I’ve noticed that some markdown behaves like this, but here’s a simple solution: add a <br> (HTML for line break) at the end of each line where you want a line break. So, write this in your markdown cell:

*these words become italicized*<br>
__these words become bold__

4. Center my headers with some HTML

Instead of using the hash shortcut, code your header elements directly and style them to center:

<h1 style="text-align: center">This header is centered</h1>

Interestingly, I’ve noticed that my centering works in Jupyter Notebook, but not in Jupyter Lab.

5. Create thick dividing lines with HTML

My notebooks that do a lot of exploratory data analysis before jumping into data modeling can get quite lengthy. I find that a nice, thick dividing line between sections can be a great visual indicator of the changing focus of my notebook. In a markdown cell, give this a try:

<hr style="border-top: 5px solid purple; margin-top: 1px; margin-bottom: 1px">

6. Write mathematical formulas

I’m more coder than math guy, but a formula or two can sometimes be helpful in explaining your solution to a problem. Jupyter markdown cells support LaTeX, so give this a whirl:

linear regression: $y = ax + b$
two dimensions: $y = a_{1}x_{1} + a_{2}x_{2} + b$
Ridge Regression: standard OLS loss function + $\alpha \times \sum_{i=1}^{n} a^{2}_i$
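
Spelled out in full, that ridge regression line might look something like this in a markdown cell. I'm assuming $m$ observations with predictions $\hat{y}_j$; the symbols $m$, $j$, and $\hat{y}$ are my own additions here, while $\alpha$, $n$, and $a_i$ match the line above:

```latex
$$\text{Ridge loss} = \sum_{j=1}^{m} (y_j - \hat{y}_j)^2 + \alpha \sum_{i=1}^{n} a_i^{2}$$
```

The double dollar signs put the formula on its own centered line instead of inline with the text.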

7. Create hyperlinks

Hyperlinks are easy in markdown:

[Google](https://google.com)

8. Drop in images with HTML

A picture is worth a thousand words:

<img src="mind_blown.gif" style="max-width:50%; max-height:50%">

9. Create nice tables

Use pipes and dashes to create a table in your markdown:

|| sepal length (cm) | sepal width (cm) |
|----|----|----|
|0|5.1|3.5|
|1|4.9|3.0|
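
Typing pipes by hand gets tedious for anything bigger than a few rows. Here's a minimal sketch of a helper that builds the table text from Python lists. Note that to_markdown_table is a hypothetical name of mine, not a Jupyter or pandas function:

```python
def to_markdown_table(headers, rows):
    """Build a pipe-and-dash markdown table from a header list and a list of rows."""
    lines = ['| ' + ' | '.join(str(h) for h in headers) + ' |',
             '|' + '|'.join('----' for _ in headers) + '|']
    for row in rows:
        lines.append('| ' + ' | '.join(str(cell) for cell in row) + ' |')
    return '\n'.join(lines)


# Same two rows as the table above, with an empty header for the index column
table = to_markdown_table(['', 'sepal length (cm)', 'sepal width (cm)'],
                          [[0, 5.1, 3.5], [1, 4.9, 3.0]])
print(table)
```

Run that in a code cell, then paste the printed text into a markdown cell.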

10. Escape text with three tick marks

Occasionally, I’ll want to show a code snippet in my markdown or other kind of escaped text. You can do that by surrounding your snippet with three back-tick characters:

```
sample code goes here
```

Bonus: change the background color of your markdown cells

It never occurred to me until recently, but Notebooks bring with them a variety of style classes that you can leverage in your own markdown. Here are four examples (note: this is yet another markdown trick that works in Jupyter Notebook, but not in Jupyter Lab…at least the version I’m presently running):

<div class="alert alert-block alert-info">
This is a blue background
</div>
<div class="alert alert-block alert-warning">
This is a yellow background
</div>
<div class="alert alert-block alert-success">
This is a green background
</div>
<div class="alert alert-block alert-danger">
This is a red background
</div>

For all of this code, check out my notebook here. Also, here are two other great posts on more markdown tips and tricks.


© 2024 DadOverflow.com
