If only Wayne and Garth had a thumb drive of music to listen to

In the past, I’ve written about using PowerShell to help build a thumb drive of music to listen to in the car. Recently, I took a crack at converting that work to Python with the help of the pymediainfo package. Here’s what I did:

Load the requisite packages

import os
import pandas as pd
import json
import shutil
from pymediainfo import MediaInfo

Build your music inventory

The to_data method on pymediainfo’s track objects makes it very easy to gather all the important properties of your music files. I also wrote code to save that inventory out to a JSON file for later analysis, but that step is optional: you don’t need it to build your thumb drive. I hard-coded the path to my music folder (D:\music_backup), and my code assumes that you only want to process mp3 files (line 4).

music_col = []
for dirpath, dirs, files in os.walk("D:\\music_backup"):
    for filename in files:
        if filename.lower().endswith('.mp3'):  # mp3 files only
            fname = os.path.join(dirpath, filename)
            mi = MediaInfo.parse(fname)
            # keep only the 'General' track, which holds the file-level properties
            music_col.append([t for t in mi.tracks if t.track_type == 'General'][0].to_data())

# save collection to file if needed
with open('music_col.json', 'w') as f:
    json.dump(music_col, f) 
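
If you do save the inventory, reading it back later is roughly the reverse of the step above. Here’s a minimal sketch using a made-up two-song inventory; sample_col and json_path are placeholder names of mine, and the temporary folder is only there so the example doesn’t touch real files:

```python
import json
import os
import tempfile

import pandas as pd

# Toy inventory standing in for music_col (the real one comes from pymediainfo)
sample_col = [
    {"title": "Song A", "genre": "Rock", "file_size": 4000000},
    {"title": "Song B", "genre": "Pop", "file_size": 5000000},
]

# Write to a temporary location so the sketch does not touch real files
json_path = os.path.join(tempfile.mkdtemp(), "music_col.json")
with open(json_path, "w") as f:
    json.dump(sample_col, f)

# In a later session: read the inventory back and rebuild the dataframe
with open(json_path) as f:
    reloaded = json.load(f)

df_reloaded = pd.DataFrame(reloaded)
```

That way you can skip the slow walk over the music folder whenever the library hasn’t changed.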

Build a pandas dataframe

Yes, pandas is my go-to “hammer” to solve most of my coding problems. I use the fillna function to replace any null values with empty strings, which makes filtering easier later on.

df_music = pd.DataFrame(music_col)
df_music = df_music.fillna('')
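
To show why the empty strings help, here’s a small sketch with a made-up three-row dataframe (the titles are invented for illustration). Depending on your pandas version, a null title can leave a missing value in the filter mask instead of a plain True or False, and that gets in the way when you try to negate or combine masks:

```python
import pandas as pd

# Toy data: the middle row has no title tag
df = pd.DataFrame({"title": ["Intro Speech", None, "Real Song"]})

# Depending on the pandas version, the null row may come back as a
# missing value here rather than True/False
with_nulls = df.title.str.contains("speech", case=False)

# After fillna(''), every row yields a plain True/False
filled = df.fillna("")
mask = filled.title.str.contains("speech", case=False)

# Keep everything that does not match the pattern
kept = filled[~mask].title.tolist()
```

The untitled row survives the filter as an empty string rather than causing trouble.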

Filter on just the music I want to listen to in the car

As I’ve said before, I have a lot of music in my library but technical limits with my car stereo, so I have to decide which music to copy. Dataframe filtering makes that fast and easy. To make things interesting, I’m using the pandas sample function with frac=1 to shuffle the filtered songs into a random order. Here’s the code I came up with:

genres_to_include = ["Pop", "Rock", "Hard Rock & Metal"]

album_artists_to_exclude = ["ABBA", "Disney", "Vanilla Ice"]
albums_to_exclude = ["Frozen [Original Motion Picture Soundtrack]", "High School Musical 2 [Original Soundtrack]", "The Smurfs 2- Music from and Inspired By"]
# exclude any "songs" that might actually be talking of some sort (regex: matches either word)
bad_titles = 'interview|speech'

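# duration is in milliseconds, so 30000 drops anything under 30 seconds;
# to_numeric guards against the empty strings that fillna('') may have left behind
df_usb = df_music[(df_music.genre.isin(genres_to_include)) & ~(df_music.performer.isin(album_artists_to_exclude)) &
                  ~(df_music.album.isin(albums_to_exclude)) & ~(df_music.title.str.contains(bad_titles, case=False)) &
                  (pd.to_numeric(df_music.duration, errors='coerce')>30000)].sample(frac=1)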
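
That one-liner packs in a lot, so here’s the same idea pulled apart on a made-up five-row dataframe. This is only a sketch: the rows and the variable names (wanted_genre, spoken, long_enough) are mine, and I’ve added random_state, an optional argument to sample, so the shuffle comes out the same on every run:

```python
import pandas as pd

# Toy library; durations are in milliseconds
df = pd.DataFrame({
    "title": ["Song A", "Radio Interview", "Song B", "Song C", "Short Clip"],
    "genre": ["Rock", "Rock", "Pop", "Jazz", "Pop"],
    "duration": [215000, 600000, 187000, 240000, 12000],
})

# One boolean mask per rule
wanted_genre = df.genre.isin(["Pop", "Rock"])
spoken = df.title.str.contains("interview|speech", case=False)
long_enough = df.duration > 30000

# Naming each condition keeps the combined filter readable
filtered = df[wanted_genre & ~spoken & long_enough]

# frac=1 returns every row in random order; random_state makes it repeatable
shuffled = filtered.sample(frac=1, random_state=42)
```

Only “Song A” and “Song B” pass all three rules here. Drop random_state if you want a different order each time you rebuild the drive.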

Don’t forget about the size constraints of the thumb drive

I’m using a 16 GB thumb drive and I have well over 50 GB of music, so I need to make sure I only copy over enough files to fill up the drive and nothing more. The pandas cumsum function will help me easily figure that out:

df_usb['file_size_cumsum'] = df_usb.file_size.cumsum()
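
Here’s how the running total works out on a toy example. The file names, sizes, and the limit of 13 are all invented so the arithmetic is easy to follow:

```python
import pandas as pd

# Four pretend files, already in shuffled order
sizes = pd.DataFrame({
    "complete_name": ["a.mp3", "b.mp3", "c.mp3", "d.mp3"],
    "file_size": [4, 5, 3, 6],
})

# Running total of the sizes: 4, 9, 12, 18
sizes["file_size_cumsum"] = sizes.file_size.cumsum()

# Keep only the files whose running total stays under the limit
limit = 13
fits = sizes[sizes.file_size_cumsum < limit].complete_name.tolist()
```

The first three files fit and the fourth is left out. Because the running total only ever grows, nothing after the first file that crosses the limit gets picked, even if it’s small enough to squeeze in.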

Finally, write to the thumb drive

Now, I’m ready to write my randomized music, filtered just how I want, to my thumb drive:

# cap the copy at about 15.7 GB (15.7 billion bytes)
max_bytes = 15700000000
usb_drive = 'E:\\.'

for f in df_usb[df_usb.file_size_cumsum<max_bytes].complete_name.tolist():
    shutil.copy(f, usb_drive)
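
I hard-coded max_bytes, but you could also ask the drive how much room it has. Here’s a rough sketch using shutil.disk_usage from the standard library; the usable_bytes helper and its 50 MB reserve are my own assumptions, and a temporary folder stands in for the thumb drive:

```python
import shutil
import tempfile


def usable_bytes(target_dir, reserve=50_000_000):
    """Return free bytes on the drive holding target_dir, minus a safety reserve."""
    free = shutil.disk_usage(target_dir).free
    # Never report a negative budget
    return max(free - reserve, 0)


# A temporary folder stands in for the thumb drive in this sketch
demo_target = tempfile.mkdtemp()
max_bytes_demo = usable_bytes(demo_target)
```

In the real script, the result could take the place of the hard-coded max_bytes, with the drive path passed in as target_dir.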