Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Tag: jupyter_notebook

Learning Guitar with Python, Part 1

After many years of just messing around, I've started formal guitar lessons this year. Much of my instruction covers learning the notes on the fretboard, the different keys of music, scales, some basic music theory, and so forth. I've taken a lot of handwritten notes during my lessons and recently started transcribing many of them digitally. It occurred to me that Jupyter Notebook and Python might be a fantastic way to depict some of the concepts I'm learning. So, here is Part 1 of my guitar notes, with the help of Jupyter Notebook and Python.

The 12 Keys

I won't take the time to explain the notes and basic patterns in music, as that information can be found all over the internet. The first thing I wanted to build was a grid of the 12 keys and the notes within each key. My instructor and I have also talked a lot about the relative minor of each major key, so I wanted my graphic to convey that point, too. I put together this code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

# make up my list of notes
chromatic_scale_ascending = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
# since I usually start on the low E string, rearrange the notes starting on E
scale_from_e = (chromatic_scale_ascending + chromatic_scale_ascending)[4:16]

# the scale pattern:
# root, whole step, whole step, half step, whole step, whole step, whole step, half step
key_steps = [2, 2, 1, 2, 2, 2]  # on the guitar, a whole step is two frets
major_keys = []
for root in scale_from_e:
    three_octaves = scale_from_e * 3
    steps_from_root = three_octaves.index(root)
    major_scale = [root]
    # construct the unique notes in the scale
    for step in key_steps:
        steps_from_root += step
        major_scale.append(three_octaves[steps_from_root])
        
    # repeat the scale so it spans two octaves, ending on the root
    major_keys.append(major_scale * 2 + [root])
    
df_major_keys = pd.DataFrame(major_keys)
df_major_keys.columns = df_major_keys.columns + 1  # start counting notes at 1 instead of 0

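# use this function to highlight the relative minor scales in light gray
def highlight_natural_minor(data):
    df = data.copy()
    # default style for every cell
    df.iloc[:,:] = 'font-size:20px;height:30px'
    # columns 6 through 13: from the 6th note of the key to the 6th note an octave higher
    df.iloc[:,5:13] = 'background-color: lightgray; font-size:20px'
    return df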

print('The 12 keys and the notes within them:')
df_major_keys.style.apply(highlight_natural_minor, axis=None)

Which produced this handy graphic:

The 12 keys and their notes

For simplicity, I used sharps in my keys instead of flats. The highlighted part of the table marks the relative minor portion of each major key, running from the 6th note of the key up to the 6th note an octave higher.
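The highlighted region rests on one rule: a major key's relative minor starts on the 6th note of its major scale. Here's a minimal sketch of that rule, assuming sharps-only note names as in the table. It swaps the tripled list used above for modular arithmetic (the % 12 wraps from B back around to C), and the function names are my own, for illustration only.

```python
# Chromatic scale spelled with sharps, as in the table above
CHROMATIC = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

# Major scale pattern in half steps (frets): W W H W W W; the final H returns to the root
MAJOR_STEPS = [2, 2, 1, 2, 2, 2]


def major_scale(root):
    """Return the seven notes of the major scale that starts on `root`."""
    pos = CHROMATIC.index(root)
    notes = [root]
    for step in MAJOR_STEPS:
        pos = (pos + step) % 12  # wrap past B back to C
        notes.append(CHROMATIC[pos])
    return notes


def relative_minor(root):
    """The relative minor starts on the 6th degree of the major scale."""
    return major_scale(root)[5]


def natural_minor_scale(root):
    """Same notes as the major key, read starting from its 6th degree."""
    scale = major_scale(root)
    return scale[5:] + scale[:5]


for key in ['C', 'G', 'E']:
    print(key, major_scale(key), '-> relative minor:', relative_minor(key))
```

For example, major_scale('G') gives G, A, B, C, D, E, F#, so its relative minor is E, and the E natural minor scale uses exactly those seven notes.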

The notes of the fretboard

Probably one of the best ways to learn the notes on your guitar's fretboard is to trace out the fretboard on a blank piece of paper and start filling in each note by hand. Do that a few hundred times and you'll probably start remembering the notes. Being lazy, though, I wanted my computer to do that work for me. Here's the code I came up with to write out the fretboard:

standard_tuned_strings = ['E', 'A', 'D', 'G', 'B', 'E']
chromatic_scale_ascending = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
col_names = ['Low E', 'A', 'D', 'G', 'B', 'High E']
fretboard_notes = []

for string in standard_tuned_strings:
    start_pos = chromatic_scale_ascending.index(string)
    fretboard_notes.append((chromatic_scale_ascending + chromatic_scale_ascending)[start_pos+1:start_pos+13])

df_fretboard = pd.DataFrame(np.array(fretboard_notes).T, index=np.arange(1, 13), columns=col_names)
df_fretboard.index.name = 'fret'

def highlight_select_frets(data):
    # 0-based row positions of frets 3, 5, 7, 9, and 12 (the frets usually marked with inlay dots)
    fret_markers = [2, 4, 6, 8, 11]
    df = data.copy()
    df.iloc[:,:] = 'font-size:20px'
    df.iloc[fret_markers,:] = 'background-color: lightgray; font-size:20px'
    return df

df_fretboard.style.apply(highlight_select_frets, axis=None)
Notes on the guitar fretboard (1st through 12th fret, standard tuning)
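The fretboard table boils down to simple arithmetic: each fret raises the open string's note by one half step, and the note names repeat every 12 frets. A small sketch of that idea, assuming standard tuning and sharps only; note_at and frets_for_note are hypothetical helpers of mine, handy for quizzing yourself.

```python
CHROMATIC = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
STANDARD_TUNING = ['E', 'A', 'D', 'G', 'B', 'E']  # low E to high E


def note_at(open_string, fret):
    """Each fret adds one half step; names wrap around every 12 frets."""
    return CHROMATIC[(CHROMATIC.index(open_string) + fret) % 12]


def frets_for_note(open_string, note, max_fret=12):
    """Every fret (0 = open string) on `open_string` that plays `note`."""
    return [f for f in range(max_fret + 1) if note_at(open_string, f) == note]


print(note_at('A', 3))            # C
print(frets_for_note('E', 'G'))   # [3]
```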

More notes to come, so stay tuned! (Puns intended)

Thinning out your tick labels

Have you ever rendered a chart with Pandas and/or Matplotlib where one or both of your axes (axises?) came out as a smear of overlapping, unreadable black text?

https://youtu.be/d9kuDizrBPc?t=70
If you can read this, you don’t need glasses

As an example, let’s create a bar chart of COVID-19 data. [As an aside: I’ve noticed that line charts seem to automatically thin out any overlapping tick labels and tend not to fall prey to this problem.]

Load and clean up the data

After downloading the CSV data, I wrote the following code to load the data and prepare it for visualization:

df_covid_confirmed_us = pd.read_csv('./data/time_series_covid19_confirmed_US_20200720.csv')
df_covid_deaths_us = pd.read_csv('./data/time_series_covid19_deaths_US_20200720.csv')

cols_to_keep1 = [i for i, v in enumerate(df_covid_confirmed_us.columns) if v in ['Admin2', 'Province_State'] or v.endswith('20')]
cols_to_keep2 = [i for i, v in enumerate(df_covid_deaths_us.columns) if v in ['Admin2', 'Province_State'] or v.endswith('20')]
df_covid_confirmed_ohio = df_covid_confirmed_us[df_covid_confirmed_us.Province_State=='Ohio'].iloc[:,cols_to_keep1].copy()
df_covid_deaths_ohio = df_covid_deaths_us[df_covid_deaths_us.Province_State=='Ohio'].iloc[:,cols_to_keep2].copy()

df_covid_confirmed_ohio.head()
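The raw file is wide: a handful of ID columns followed by one column per date. The two list comprehensions above keep Admin2, Province_State, and any column whose name ends in '20' (dates like 7/20/20). Here's a sketch on a tiny made-up frame (the column names mimic the file; the values are invented), along with an equivalent boolean-mask version. One caveat: the suffix test only works while every date falls in 2020; a column like 1/1/21 would be silently dropped.

```python
import pandas as pd

# Hypothetical stand-in for the wide layout; values are made up
df = pd.DataFrame({
    'UID': [1, 2],
    'Admin2': ['Franklin', 'Cuyahoga'],
    'Province_State': ['Ohio', 'Ohio'],
    'Lat': [39.9, 41.4],
    '7/19/20': [10, 20],
    '7/20/20': [12, 25],
})

# Positional approach, as in the post: ID columns plus names ending in '20'
cols_to_keep = [i for i, v in enumerate(df.columns)
                if v in ['Admin2', 'Province_State'] or v.endswith('20')]
print(df.iloc[:, cols_to_keep].columns.tolist())
# ['Admin2', 'Province_State', '7/19/20', '7/20/20']

# Same selection with a boolean mask and .loc, no positions to track
mask = df.columns.isin(['Admin2', 'Province_State']) | df.columns.str.endswith('20')
print(df.loc[:, mask].columns.tolist())
```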

Tidy up the dataframes

The data is still a bit untidy, so I wrote this additional code to transform it into a more proper format:

date_cols = df_covid_confirmed_ohio.columns.tolist()[2:]
rename_cols_confirmed = {'variable': 'obs_date', 'value': 'confirmed_cases'}
rename_cols_deaths = {'variable': 'obs_date', 'value': 'deaths'}

df_covid_confirmed_ohio = pd.melt(df_covid_confirmed_ohio.reset_index(), id_vars=['Admin2', 'Province_State'], 
                                  value_vars=date_cols).rename(columns=rename_cols_confirmed)
df_covid_deaths_ohio = pd.melt(df_covid_deaths_ohio.reset_index(), id_vars=['Admin2', 'Province_State'], 
                               value_vars=date_cols).rename(columns=rename_cols_deaths)

df_covid_confirmed_ohio['obs_date'] = pd.to_datetime(df_covid_confirmed_ohio.obs_date)
df_covid_deaths_ohio['obs_date'] = pd.to_datetime(df_covid_deaths_ohio.obs_date)

print(df_covid_confirmed_ohio.head())
print(df_covid_deaths_ohio.head())
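pd.melt turns the wide layout (one column per date) into a long one (one row per county per date). A sketch on made-up numbers: passing var_name and value_name to melt names the new columns directly, so the separate rename step isn't needed, and an explicit format string (assuming date columns named like 7/20/20) keeps pandas from having to guess the date format.

```python
import pandas as pd

# Hypothetical wide data: one row per county, one column per date (made-up counts)
wide = pd.DataFrame({
    'Admin2': ['Franklin', 'Cuyahoga'],
    'Province_State': ['Ohio', 'Ohio'],
    '7/19/20': [10, 20],
    '7/20/20': [12, 25],
})

# Unpivot: each (county, date) pair becomes its own row
long = pd.melt(wide, id_vars=['Admin2', 'Province_State'],
               value_vars=['7/19/20', '7/20/20'],
               var_name='obs_date', value_name='confirmed_cases')

# Parse month/day/two-digit-year strings explicitly
long['obs_date'] = pd.to_datetime(long.obs_date, format='%m/%d/%y')
print(long)
```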

Concatenate the two dataframes together

I’d like to do a nice, side-by-side comparison, in bar chart form, of these two datasets. One way to do that is to concatenate both dataframes together and then render your chart from the single result. Here’s the code I wrote to concatenate both datasets together:

df_covid_confirmed_ohio['data_type'] = 'confirmed cases'
df_covid_confirmed_ohio['cnt'] = df_covid_confirmed_ohio.confirmed_cases
df_covid_deaths_ohio['data_type'] = 'deaths'
df_covid_deaths_ohio['cnt'] = df_covid_deaths_ohio.deaths
drop_cols = ['confirmed_cases', 'deaths', 'Admin2', 'Province_State']

df_combined_data = pd.concat([df_covid_confirmed_ohio[df_covid_confirmed_ohio.obs_date>='2020-5-1'], 
               df_covid_deaths_ohio[df_covid_deaths_ohio.obs_date>='2020-5-1']], sort=False).drop(columns=drop_cols)
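Why does the legend need cleaning up in the first place? After the groupby, unstack moves data_type into the columns, so each column is labeled by a two-level tuple such as ('cnt', 'deaths'), and those tuples end up as legend text. A small sketch with invented numbers; dropping the outer level with droplevel(0) is an alternative I'm suggesting to the legend string replacement in the chart code, not what the post itself does.

```python
import pandas as pd

# Hypothetical long-format frames shaped like the ones above (made-up numbers)
confirmed = pd.DataFrame({'obs_date': pd.to_datetime(['2020-05-01', '2020-05-01', '2020-05-02']),
                          'cnt': [100, 50, 120]})
confirmed['data_type'] = 'confirmed cases'
deaths = pd.DataFrame({'obs_date': pd.to_datetime(['2020-05-01', '2020-05-02']),
                       'cnt': [3, 4]})
deaths['data_type'] = 'deaths'

combined = pd.concat([confirmed, deaths], sort=False)

# One row per date, one column per data_type: the shape the bar chart needs
wide = combined.groupby(['obs_date', 'data_type']).sum().unstack()
print(wide.columns.tolist())
# [('cnt', 'confirmed cases'), ('cnt', 'deaths')]

# Drop the outer 'cnt' level so the columns (and legend labels) are plain names
wide.columns = wide.columns.droplevel(0)
print(wide)
```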

Now, render the chart

OK, I'm finally ready to create my chart:

fig, ax = plt.subplots(figsize=(12,8))
_ = df_combined_data.groupby(['obs_date', 'data_type']).sum().unstack().plot(kind='bar', ax=ax)

# draws the tick labels at an angle
fig.autofmt_xdate()

title = 'Number of COVID-19 cases/deaths in Ohio: {0:%d %b %Y} - {1:%d %b %Y}'.format(df_combined_data.obs_date.min(), 
                                                                                      df_combined_data.obs_date.max())
_ = ax.set_title(title)
_ = ax.set_xlabel('Date')
_ = ax.set_ylabel('Count')

# clean up the legend
original_legend = [t.get_text() for t in ax.legend().get_texts()]
new_legend = [t.replace('(cnt, ', '').replace(')', '') for t in original_legend]
_ = ax.legend(new_legend)
Wow! Those dates along the X axis are completely unreadable!

The X axis is a mess! Fortunately, there are a variety of ways to fix this problem: I particularly like the approach mentioned in this solution. Basically, I'm going to thin out the labels at a designated frequency. In my solution, I only show every fourth date/label. So, here's my new code, with the label fix at the bottom under the "# tick label fix" comment:

fig, ax = plt.subplots(figsize=(12,8))
_ = df_combined_data.groupby(['obs_date', 'data_type']).sum().unstack().plot(kind='bar', ax=ax)

# draws the tick labels at an angle
fig.autofmt_xdate()

title = 'Number of COVID-19 cases/deaths in Ohio: {0:%d %b %Y} - {1:%d %b %Y}'.format(df_combined_data.obs_date.min(), 
                                                                                     df_combined_data.obs_date.max())
_ = ax.set_title(title)
_ = ax.set_xlabel('Date')
_ = ax.set_ylabel('Count')

# clean up the legend
original_legend = [t.get_text() for t in ax.legend().get_texts()]
new_legend = [t.replace('(cnt, ', '').replace(')', '') for t in original_legend]
_ = ax.legend(new_legend)

# tick label fix
tick_labels = [l.get_text().replace(' 00:00:00', '') for l in ax.get_xticklabels()]
new_tick_labels = [''] * len(tick_labels)
new_tick_labels[::4] = tick_labels[::4]
_ = ax.set_xticklabels(new_tick_labels)
Much better!

That X axis is much more readable now thanks to the power of Python list slicing.
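The slicing trick is easy to check in isolation. A minimal sketch, where thin_labels and its every parameter are hypothetical names of mine: it keeps the first label and every fourth one after it and blanks the rest, mirroring the slicing under the "# tick label fix" comment. The extended-slice assignment always lines up because both lists have the same length.

```python
def thin_labels(labels, every=4):
    """Keep every `every`-th label, starting with the first; blank out the rest."""
    thinned = [''] * len(labels)
    thinned[::every] = labels[::every]
    return thinned


dates = ['2020-05-0{}'.format(d) for d in range(1, 10)]
print(thin_labels(dates))
```

With the chart above, ax.set_xticklabels(thin_labels(tick_labels)) would do the same job as the last two lines of the fix.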

Cleaning up Stacked Bar Charts, Part 3

In this final post of my mini-series on cleaning up stacked bar charts (see Part 1 and Part 2, in case you missed them), let's talk about how you might order the bars of your chart.

In my last post, each bar in my chart represented a different day of the week, and I let the bars follow that natural order:

The bars are ordered Monday – Sunday (starting at the bottom left)

Most people would probably expect this sort of ordering. However, what if your groups don’t have an inherent order like day-of-the-week?

For my example, I generated some random email data for five fake email accounts:

import numpy as np
from datetime import date, timedelta
import pandas as pd


# names compliments of: https://frightanic.com/goodies_content/docker-names.php
email_accounts = ['fervent_saha@test.com', 'serene_cori@test.com', 'agitated_pike@test.com', 
                  'cocky_turing@test.com', 'sad_babbage@test.com']
email_data = []

for acct in email_accounts:
    for cat in ['primary', 'promotions', 'social']:
        nbr_of_email = np.random.randint(50, high=100)
        for i in range(0, nbr_of_email):
            email_dt = date(2020, 6, 1) + timedelta(days=np.random.randint(0, high=30))
            email_data.append([email_dt, acct, cat])
            
df_email_accts = pd.DataFrame(email_data, columns=['email_date', 'email_account', 'email_category'])
df_email_accts['email_date'] = pd.to_datetime(df_email_accts.email_date)
df_email_accts.head()
A bunch of random, fake email data

Now, let's use a stacked bar chart to compare the email counts, by category, of the five different email accounts:

fig, ax = plt.subplots(figsize=(12,8))
_ = df_email_accts.groupby(['email_account', 'email_category']).count().unstack().plot(kind='barh', stacked=True, ax=ax)

_ = ax.set_title('Email counts by category, June 2020')
_ = ax.set_xlabel('Email Count')
_ = ax.set_ylabel('Email Account')
Bar chart chaos!

Technically, the email accounts are ordered alphabetically (pandas' groupby sorts its keys by default), from agitated_pike@test.com at the bottom to serene_cori@test.com at the top, but most folks probably don't care about that: they'll likely want the chart ordered either from greatest count to least or from least to greatest.

How, then, can you order your stacked bar chart by the total count? There may be a more elegant way to do this in pandas, but I came up with three lines of code to get the order right.

To start with, take a look at the dataframe we get with my standard groupby and unstack approach:

df_email_accts.groupby(['email_account', 'email_category']).count().unstack()

What I need is a way to total the counts of the three categories (primary, promotions, and social) for each of the five email accounts and then sort the dataframe by that total.

No problem! I can use the pandas sum function with axis=1 (meaning, sum across the columns) to get that total:

df_rpt = df_email_accts.groupby(['email_account', 'email_category']).count().unstack()
df_rpt['total'] = df_rpt.sum(axis=1)
df_rpt.head()
The sum function gives me a “total” value I can use for sorting
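If axis=1 versus axis=0 is ever confusing, a toy frame makes it concrete. The counts below are invented, with accounts as rows and categories as columns, like df_rpt: axis=1 collapses the columns, giving one total per row (account), while axis=0 collapses the rows, giving one total per column (category).

```python
import pandas as pd

# Hypothetical counts: rows are accounts, columns are categories
df = pd.DataFrame({'primary': [60, 80], 'promotions': [70, 55], 'social': [90, 65]},
                  index=['acct_a', 'acct_b'])

print(df.sum(axis=1))  # across the columns: one total per account
print(df.sum(axis=0))  # down the rows: one total per category

# Store the per-account total and sort by it, smallest first
df['total'] = df.sum(axis=1)
print(df.sort_values('total'))
```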

Putting it all together, then, here's the code I came up with to sort my stacked bar chart in a meaningful way:

# two lines of code to provide a "total" column that can be used for sorting
df_rpt = df_email_accts.groupby(['email_account', 'email_category']).count().unstack()
df_rpt['total'] = df_rpt.sum(axis=1)

fig, ax = plt.subplots(figsize=(12,8))

# sort the dataframe by the "total" column, then drop it before rendering the chart
_ = df_rpt.sort_values('total')[df_rpt.columns.tolist()[:-1]].plot(kind='barh', stacked=True, ax=ax)
_ = ax.set_title('Email counts by category, June 2020')
_ = ax.set_xlabel('Email Count')
_ = ax.set_ylabel('Email Account')

# and, of course, clean up the legend
original_legend = [t.get_text() for t in ax.legend().get_texts()]
new_legend = [t.replace('(email_date, ', '').replace(')', '') for t in original_legend]
_ = ax.legend(new_legend, title='Category')
A nicely sorted, stacked bar chart where the high and low counts are immediately apparent
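As for a more elegant way: one option (a sketch I'm suggesting, not the approach above) is to compute the row totals on the fly and reorder the rows with .loc, so no helper column has to be added and then sliced off. The frame below is a made-up stand-in for df_rpt with single-level columns; the same two lines should also work with the real two-level columns, since sum(axis=1) returns one value per row either way.

```python
import pandas as pd

# Hypothetical stand-in for df_rpt: counts per account and category (made up)
df_rpt = pd.DataFrame({'primary': [60, 80, 50],
                       'promotions': [70, 55, 95],
                       'social': [90, 65, 85]},
                      index=['agitated_pike', 'cocky_turing', 'serene_cori'])

# Row totals computed on the fly, sorted ascending; their index gives the new row order
order = df_rpt.sum(axis=1).sort_values().index
df_sorted = df_rpt.loc[order]
print(df_sorted)
```

df_sorted can then go straight into .plot(kind='barh', stacked=True, ax=ax). Because barh draws the first row at the bottom, an ascending sort puts the largest total at the top; sort_values(ascending=False) flips that.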

© 2024 DadOverflow.com

Theme by Anders Noren