Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Month: August 2021 (Page 1 of 2)

Filtering dataframes with tuples

Often when I’m working with data, I need to filter the data down into sub-groups for more detailed analysis. Usually, my needs are simple, where I only need to parse on fields independently. However, sometimes I need to filter on field combinations and that work can get complicated.

Even Mr. Hitchcock finds filtering by field combinations challenging

For example, suppose I’m working with a movie dataset and I want to compare Francis Ford Coppola dramas to Alfred Hitchcock mysteries to Martin Scorsese biography pictures. Initially, I might try something like this:

import pandas as pd


# data from: https://www.kaggle.com/mysarahmadbhat/imdb-top-1000-movies
df = pd.read_csv('./data/regex_imdb.csv').fillna(0)
# for simplicity, I'm just going to use the first, comma-delimited genre value
df['Genre'] = df.Genre.apply(lambda g: g.split(',')[0])

directors = ['Francis Ford Coppola', 'Alfred Hitchcock', 'Martin Scorsese']
genres = ['Drama', 'Mystery', 'Biography']
df[(df.Director.isin(directors)) & (df.Genre.isin(genres))]
This code allowed in Scorsese “mysteries”, for example, when I only wanted Scorsese “biographies”

Ok. Let’s then filter the hard way:

df[((df.Director=='Francis Ford Coppola') & (df.Genre=='Drama')) | 
   ((df.Director=='Alfred Hitchcock') & (df.Genre=='Mystery')) | 
   ((df.Director=='Martin Scorsese') & (df.Genre=='Biography'))]

This code does get me the data I want, but it’s long and a little unreadable. Is there a better way? Yes: tuples!

First, let’s take the director/genre combinations we want and put them together in a single list of tuples:

dirs_genres = [('Francis Ford Coppola', 'Drama'), ('Alfred Hitchcock', 'Mystery'), ('Martin Scorsese', 'Biography')]

Now, we can use the apply function to create an on-the-fly director/genre tuple field that we can parse on:

df[df.apply(lambda row: (row.Director, row.Genre), axis=1).isin(dirs_genres)]

We get the same results as before, but with slightly more readable code. Here’s a neat alternative option using Python’s tuple function:

df[df[['Director', 'Genre']].apply(tuple, axis=1).isin(dirs_genres)]

If you anticipate that you might use this director/genre combination a lot in your work, you could consider adding this tuple as a new column to your dataframe:

df['pg'] = df[['Director', 'Genre']].apply(tuple, axis=1)
df.head()

Then, you could do all your filtering directly against that single field.

Happy filtering!

Ideas to improve your college move-in experience

Tis that time of year when the doorbell rings a little more often marking delivery of item after item your child must possess to make her college experience even more positive. But after you’ve amassed a whole bunch of new paraphernalia–some of which she’ll likely never use–how do you pack your child’s gear together, load your vehicles, and move in those items to her new home-away-from-home as optimally as possible? Well, after a few of these such moves, here are some tips and tricks I’ve come up with to try to make the move-in experience as comfortable as possible.

Box up the gear

It may go without saying, but the more loose items you can box up, the better. I have found 18 gallon storage bins quite helpful because they’re easy to carry and stack up well. Ikea–and I’m sure others–sell handy bags with straps that do well to keep together your lighter items.

Bundle your clothes together

Most of your child’s clothes can be folded and boxed in some fashion, but your child will likely attend a few formal gatherings in her time at school and will want to bring more formal attire on hangers such as suits and dresses. For these, I like to group, say, 7-8 items together in dry cleaning bags. If you don’t have a handy supply of those bags, garbage bags work well as a substitute. Rubberband the hangers together and you can even put some duct tape over the tips to prevent the hangers from scratching your car or other possessions.

Label your items

Most schools will have some sort of welcoming team and some may even help you carry your gear to your room. As a precaution, you should label your bins, bags, and other items with masking tape and sharpie. Write the name of your child, her dorm, and her room number if you know it.

Bring a small dolly or cart

We have a couple of small dollies that easily fold up in the car and help to move your heavier items. I highly recommend bringing one along.

Tools

In my last move, I had to assemble two lamps, a TV stand, and a couple of storage shelves. My socket set and screw drivers made easy work of these. In the past, I’ve had to adjust the height of my kid’s bed frame that was well locked into place, but my hammer helped the adjustment. If you have even more components to assemble, you may consider bringing a charged, cordless drill, but, thankfully, I’ve yet to have to resort to that level of assembly. If you have plastic wrapped items, a knife or boxcutter will be a tool you’ll want available.

Bungees, door stops, etc.

You may need to prop open doors in your path. You’ll appreciate having a bungee cord or door stop on hand to wedge those open. Consider also including zip ties, duct tape, and even rope in your arsenal to help clear your path.

Assess the room

Most of the time, your child’s school will want you to fill out a move-in form where you note any damage to the room before you take possession of it. That way, you won’t have to pay for such damage when your child moves out. Either way, before you start hauling in your gear, take some time to survey the room and note any pre-existing damage. Taking pictures or video may even be advisable.

Don’t lose your key!

As soon as your child receives her dorm key, make sure she secures it: puts it on her keychain, lanyard, or secures it as she would her car keys or other similar items. Losing your key costs money and causes inconvenience. Ask me how I know!

Bring a change of clothes

August is usually a hot month no matter where you are in these United States. You’ll likely build up a nice “shine” carrying your child’s supplies to her room. Consider bringing a change of clothes to put on once the move is done.

Think about a time limit

After you’ve finished moving your child’s gear to her room, it will be time to unpack, organize, and decorate the room. You’ll certainly want to help her with all those tasks and absolutely should, but there will be a time let go and let her take charge of her own domicile. These moves can be emotionally tough for us parents and hard to walk away. It may help to establish some sort of time limit or milestone like an end-of-move meal to mark the time to leave your child and let her fly on her own.

Positioning Subplots

I’m certainly a big fan of visualizing data. Often, I like to present multiple types of visualizations together to offer a variety of perspectives on the data. For example, I might provide both a bar chart and scatter plot together to provide deeper insight than a single visual would:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

%matplotlib inline


# data from: https://www.kaggle.com/mysarahmadbhat/imdb-top-1000-movies
df = pd.read_csv('./data/regex_imdb.csv').fillna(0)

fig, ax = plt.subplots(1, 2, figsize=(10,6))

d1 = df[df.Year==2019][['Name', 'Gross']].sort_values('Gross').tail(10)
_ = ax[0].barh(d1.Name, d1.Gross)
_ = ax[0].set_xlabel('Gross Earnings')

d2 = df[df.Year==2019][['Run_time', 'Gross', 'Genre']].copy()
d2['Genre'] = d2.Genre.apply(lambda g: g.split(',')[0])

_ = sns.scatterplot(data=d2, x='Run_time', y='Gross', hue='Genre', ax=ax[1])
_ = ax[1].set_xlabel('Runtime (minutes)')

_ = fig.suptitle('Analysis of Movies from 2019')

In this sort of work, I will target specific axes to display specific charts. Thus, in my above example, I explicitly pushed a bar chart to ax[0] and a scatter plot to ax[1].

However, on occasion, circumstances demand that I write the same type of chart to multiple subplots where I change one variable for each. For example, suppose I want to get a quick view of the top 10 movies by gross earnings from 2010 to 2019:

I could write code to target each of these axes explicitly, but that would mean a lot of code and a lot of copy/paste. Instead, I’d rather just write a loop to iterate through the years and write the appropriate bar chart to the appropriate axis.

Looping and rendering the charts comes relatively easy to me. What usually trips me up in these efforts is targeting the right row and column. I often spend most of my time trying to remember how I solved this problem in the past.

Well no more! Hopefully this post will serve as a reference any time I need to do this type of work in the future. Ultimately, my solution is just three lines of code:

nbr_of_rows = 5
nbr_of_cols = 2

coords = [(r, c) for r in range(nbr_of_rows) for c in range(nbr_of_cols)]

Here, I set the number of rows and columns I want in my visual and do some list comprehension to pair those coordinates together in a list. Now, I have a nice, pre-built list of coordinates to leverage in my loop:

fig, ax = plt.subplots(nbr_of_rows, nbr_of_cols, figsize=(12,12))

for i, yr in enumerate(range(2010, 2020)):
    r,c = coords[i]  # grab the pre-built coordinates
    d = df[df.Year==yr][['Name', 'Gross']].sort_values('Gross').tail(10)
    _ = ax[r][c].barh(d.Name, d.Gross)
    _ = ax[r][c].set_title('Top 10 grossing movies in {0}'.format(yr))
    
fig.tight_layout()

Awesome: one less thing I have to think about!

« Older posts

© 2024 DadOverflow.com

Theme by Anders NorenUp ↑