Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Tag: jupyter_notebook (Page 3 of 17)

Making timelines with Python

For many years, as I’ve trolled the intertubes, I would occasionally run across creative résumés that would include timelines depicting different work events in the lives of those professionals. As I would happen upon these graphics, I would think to myself, “self, I’m no artist: is there a way to programmatically generate such timelines?”

Well, thanks to this recent article, here’s a neat way to use Python and matplotlib to just that.

Step 1: Do your imports

import matplotlib.pyplot as plt
from datetime import date
import numpy as np

%matplotlib inline  # since I'm doing this work in a Jupyter Notebook

Step 2: Get your timeline data together

For simplicity, I’m just hard coding my dates and event labels in two different lists, but you could easily pull together data from a dataframe or other object. I’m also calculating a “minimum date” (min_date) where I get the earliest date from my dataset and subtract two years and a “maximum date” (max_date) where I get the newest date and add two years. I’m subtracting and adding years just to get some padding in my graphic. I’ll use these variables later on. (Note that I do use “\n” in my labels to wrap long text to a second line.)

# reference: https://mentalitch.com/key-events-in-rock-and-roll-history/
dates = [date(1954, 7, 19), date(1959, 2, 3), date(1964, 2, 9), date(1965, 7, 25), date(1967, 6, 1), date(1969, 8, 15)]
min_date = date(np.min(dates).year - 2, np.min(dates).month, np.min(dates).day)
max_date = date(np.max(dates).year + 2, np.max(dates).month, np.max(dates).day)

labels = ['Elvis appears on\nthe Ed Sullivan Show', 'Buddy Holly dies', 'The Beatles appear\non the Ed Sullivan Show', 
          'Bob Dylan goes electric', 'The Beatles release\nSgt. Pepper', 'Woodstock']
# labels with associated dates
labels = ['{0:%d %b %Y}:\n{1}'.format(d, l) for l, d in zip (labels, dates)]

Step 3: Set up my timeline and points

This is where it starts to get cool: I knew matplotlib had a horizontal line function, but it never occurred to me that I could use it as a timeline. Likewise, it never occurred to me to use the library’s scatter plot function to paint dots on a timeline.

fig, ax = plt.subplots(figsize=(15, 4), constrained_layout=True)
_ = ax.set_ylim(-2, 1.75)
_ = ax.set_xlim(min_date, max_date)
_ = ax.axhline(0, xmin=0.05, xmax=0.95, c='deeppink', zorder=1)

_ = ax.scatter(dates, np.zeros(len(dates)), s=120, c='palevioletred', zorder=2)
_ = ax.scatter(dates, np.zeros(len(dates)), s=30, c='darkmagenta', zorder=3)

Step 4: Add my labels

Next, I can use the text function to add my event labels to the timeline. I did have to play around with my y-axis offsets for my labels to be nicely positioned above and below the timeline. I used Python list slicing to position labels with an even index above the line and labels with an odd index below.

label_offsets = np.zeros(len(dates))
label_offsets[::2] = 0.35
label_offsets[1::2] = -0.7
for i, (l, d) in enumerate(zip(labels, dates)):
    _ = ax.text(d, label_offsets[i], l, ha='center', fontfamily='serif', fontweight='bold', color='royalblue',fontsize=12)

Step 5: Add lollipops

What a clever way to use matplotlib’s stem plot function! Here, we can create stems to link our labels to their associated dots on the timeline.

stems = np.zeros(len(dates))
stems[::2] = 0.3
stems[1::2] = -0.3    
markerline, stemline, baseline = ax.stem(dates, stems, use_line_collection=True)
_ = plt.setp(markerline, marker=',', color='darkmagenta')
_ = plt.setp(stemline, color='darkmagenta')

Step 6: Sundry cleanup and titling

# hide lines around chart
for spine in ["left", "top", "right", "bottom"]:
    _ = ax.spines[spine].set_visible(False)

# hide tick labels
_ = ax.set_xticks([])
_ = ax.set_yticks([])

_ = ax.set_title('Important Milestones in Rock and Roll', fontweight="bold", fontfamily='serif', fontsize=16, 
                 color='royalblue')

And now, we have a pretty cool timeline:

Timeline in Matplotlib

This chart is using the default matplotlib style. I did try using other styles like XKCD, as the author highlighted in the article, but my chart just didn’t look very good. Your mileage may vary.

But, wait…there’s more!

What if I want to do a vertical timeline instead? Well, you can do that, as well, with some adjustments.

Additional import

To help better center my event labels, I’ll import the timedelta function:

from datetime import timedelta

Use the axvline function

For my vertical timeline, I’ll use the axvline function. I’ve also made a few other code adjustments you can see:

fig, ax = plt.subplots(figsize=(6, 10), constrained_layout=True)
_ = ax.set_xlim(-20, 20)
_ = ax.set_ylim(min_date, max_date)
_ = ax.axvline(0, ymin=0.05, ymax=0.95, c='deeppink', zorder=1)

_ = ax.scatter(np.zeros(len(dates)), dates, s=120, c='palevioletred', zorder=2)
_ = ax.scatter(np.zeros(len(dates)), dates, s=30, c='darkmagenta', zorder=3)

Adjust the dates used to position the event labels

Without the timedelta adjustment, the label positioning still doesn’t look too bad, but subtracting about 90 days from each date helps sort-of vertically center the labels:

label_offsets = np.repeat(2.0, len(dates))
label_offsets[1::2] = -2.0
for i, (l, d) in enumerate(zip(labels, dates)):
    d = d - timedelta(days=90)
    align = 'right'
    if i % 2 == 0:
        align = 'left'
    _ = ax.text(label_offsets[i], d, l, ha=align, fontfamily='serif', fontweight='bold', color='royalblue',fontsize=12)

There doesn’t seem to be a stem function for horizontal lines

The documentation says you should be able to orient stem lines horizontally, but I never got that to work, so I opted to go with the hlines function, instead.

stems = np.repeat(2.0, len(dates))
stems[1::2] *= -1.0    
x = ax.hlines(dates, 0, stems, color='darkmagenta')

Apply the same sundry cleanup

# hide lines around chart
for spine in ["left", "top", "right", "bottom"]:
    _ = ax.spines[spine].set_visible(False)

# hide tick labels
_ = ax.set_xticks([])
_ = ax.set_yticks([])

_ = ax.set_title('Important Milestones in Rock and Roll', fontweight="bold", fontfamily='serif', fontsize=16, 
                 color='royalblue')

And now you have a vertical timeline!

Learning Guitar with Python, Part 3

Past posts in this sorta-series: Part 1 and Part 2.

Pentatonic scales are quite popular among guitar players. If you’re rusty on your Greek, “Penta” means “five”. Thus, these scales only have five notes instead of seven–the fourth and seventh notes are dropped from their modal counterparts.

The major and minor pentatonic scales that I learned are truncated versions of the Ionian and Aeolian modal scales, respectively. My guitar teacher also showed me a longer form of both the major and minor pentatonic scales: scales that traverse more frets than the normal four-fret span. Furthermore, the long-form minor scale includes a feature called “The House,” where the notes on the fretboard resemble a house:

Bad rendering of how notes in the minor pentatonic scale resemble a house

“The House” is, apparently, a popular shape used by lots of famous guitar players.

To help me learn these scales, as I have with previous scales, I coded them up in Python in a Jupyter notebook and displayed them in HTML. Here’s the code I wrote:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display_html

%matplotlib inline

major_pentatonic = {'Low E':['','1','','2'], 'A':['3','','','5'], 'D':['6','','','1'], 'G':['2','','3',''], 
                    'B':['','5','','6'], 'High E':['','1','','2']}
minor_pentatonic = {'Low E':['6','','','1'], 'A':['2','','3',''], 'D':['5','','6',''], 'G':['1','','2',''], 
                    'B':['3','','','5'], 'High E':['6','','','1']}
major_pent_long = {'Low E':['1','','2','','3','','',''], 'A':['','','5','','6','','',''], 
                   'D':['','','1','','2','','3',''], 'G':['','','','','5','','6',''], 
                   'B':['','','','','','1','','2'], 'High E':['','','','','3','','','5']}
minor_pent_long = {'Low E':['5','','6','','','','',''], 'A':['1','','2','','3','','',''], 
                   'D':['','','5','','6','','',''], 'G':['','','1','','2','','3',''], 
                   'B':['','','','','','5','','6'], 'High E':['','','','','','1','','2']}

pent_scales = {'Major Pentatonic':major_pentatonic, 'Minor Pentatonic':minor_pentatonic, 
               'Major Pentatonic (Long)':major_pent_long, 'Minor Pentatonic (The House)': minor_pent_long}
pent_html = '<h3>Pentatonic Scale Shapes</h3>'

for i, scale_name in enumerate(pent_scales):
    nbr_of_frets = len(pent_scales[scale_name]['Low E'])
    df_scale = pd.DataFrame(pent_scales[scale_name], index=np.arange(1, nbr_of_frets+1))
    # https://stackoverflow.com/a/50899244
    df_scale_styler = df_scale.style.set_table_attributes("style='display:inline'").set_caption(scale_name)
    pent_html += df_scale_styler._repr_html_() + '&nbsp;&nbsp;&nbsp;'
    if (i+1) % 2 == 0:
        pent_html += '<p></p>'

display_html(pent_html, raw=True)

And that code rendered this graphic:

Pentatonic scales rendered in a Jupyter notebook

So now I can show my notebook in a computer monitor and play along to the scale.

More to come!

Parsing Oddly Formatted Spreadsheets

Python and pandas works well with conventionally formatted spreadsheets like this:

Conventional spreadsheet easily parsed in Python

But how do you deal with spreadsheets formatted in unconventional ways, like this?

Can you use pandas to parse an oddly formatted spreadsheet?

Here’s my approach to massaging this data into a dataframe I can work with.

Step 1: Go ahead and read in the wonky spreadsheet

Go ahead and read in the spreadsheet, warts and all, into a dataframe. I went ahead and skipped rows 0 and 1 as they were unnecessary:

import pandas as pd

df_raw = pd.read_excel('./data/odd_format.xlsx', skiprows=1).fillna('')

As you’d expect, the results are not very pretty:

Step 2: Figure out where each record starts and stops

Looking at the spreadsheet, I determined that each record starts with a field named “First Name:” and ends with a field named “State:”. If I can put together a list of row indexes that lets me know where each record begins and ends, I should be able to iterate through that list and reformat each record uniformly. Pandas filtering can help with that. To get a list of each “start” row, I can use this code:

df_raw[df_raw['Unnamed: 1']=='First Name:'].index.tolist()

To get a list of each “end” row, I can do this:

df_raw[df_raw['Unnamed: 1']=='State:'].index.tolist()

Finally, I can use Python’s handy zip function to glue both together in a list of tuples that I can easily loop through:

for start_row, end_row in zip(df_raw[df_raw['Unnamed: 1']=='First Name:'].index.tolist(), df_raw[df_raw['Unnamed: 1']=='State:'].index.tolist()):
    # loop through each record

Step 3: Collect all key/value pairs per record

Now that I’m able to iterate over each record, I need to be able to capture each key/value pair in each record: each person’s first name, middle name (if available), last name, etc. I can use Python’s range function to loop from the starting row to the ending row of the record and pandas iloc function to zero in on each key and associated value:

person = {}  # I need some place to store the keys/values, so let's use a dictionary
for i in range(start_row, end_row+1):
    k = df_raw.iloc[i, 1]  # the keys are in column 1
    v = df_raw.iloc[i, 2]  # the values are in column 2

Each record has an empty row in the middle of it, separating “name” properties from “address” properties. I don’t need those empty rows, so I do a quick check before writing the keys and values to my dictionary object:

if len(k.strip()) > 0:
    person[k.strip().replace(':', '')] = v

Of course, I need to be writing each of these person objects to a master list, so I do that by appending each object:

people_list.append(person)

Step 4: Create a new dataframe from the people list

Finally, I can take that clean list of dictionaries and generate a new dataframe from it:

df_clean = pd.DataFrame(people_list)

Which renders a nice dataframe from which I can start my analysis:

That’s a little more like it!

So, putting it all together, my full code looks like this:

import pandas as pd

df_raw = pd.read_excel('./data/odd_format.xlsx', skiprows=1).fillna('')
people_list = []
for start_row, end_row in zip(df_raw[df_raw['Unnamed: 1']=='First Name:'].index.tolist(), df_raw[df_raw['Unnamed: 1']=='State:'].index.tolist()):
    person = {}
    for i in range(start_row, end_row+1):
        k = df_raw.iloc[i, 1]
        v = df_raw.iloc[i, 2]
        
        if len(k.strip()) > 0:
            person[k.strip().replace(':', '')] = v
            
    people_list.append(person)
    
df_clean = pd.DataFrame(people_list)

So, should you encounter similarly unconventionally formatted spreadsheets in the future, hopefully this code will help you find a solution to deal with them!

« Older posts Newer posts »

© 2024 DadOverflow.com

Theme by Anders NorenUp ↑