DadOverflow.com

Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Month: April 2019

Sorting your teammates randomly

You go first? No, you go first.

Every week, my team has a status meeting where we spend part of the time going around our virtual meeting room getting updates from each team member. To try to keep things fair, my manager attempts to randomly pick the next team member to give his or her update. If she really wanted to be fair, she’d simply run this PowerShell command every week:

"Larry","Moe","Curly","Doc","Sneazy","Grumpy"|sort {random}

Here, I start with an array of my team members: Larry, Moe, Curly, Doc, Sneazy, and Grumpy. I pipe that array to the Sort-Object cmdlet (sort is its alias on Windows). The “random” script block runs once per name; PowerShell resolves the bare word random to Get-Random, so each name gets its own large random number and the list is sorted by those numbers. This doesn’t feel completely random to me and I’m not sure it would work well for teams larger than nine people, but it certainly works better than drawing names out of a hat, or out of the manager’s head, for that matter.

Hat tip to this blog post for the inspiration.

Ctrl + Shift + N

At work, I had the task of moving a lot of files into new folders in Windows Explorer and, unfortunately, it was not the kind of work I could script out.

After about the tenth time of right-clicking in Explorer, moving my mouse to “New” in the context menu, and waiting for the second menu to appear so that I could create a New Folder…I had had about enough. Surely there’s a quicker way to create a new folder in Windows Explorer. Well, there is: Ctrl + Shift + N

I do use several Windows shortcut keys, but, boy, are there a ton more out there! Check them all out here.

College Tuition vs. Starting Salary

A few months ago, Machine Learning Plus published a great article demonstrating the power of matplotlib by showcasing 50 cool visuals you can accomplish with the package. Inspired, I wanted to see if I could replicate some of these visuals, but with data I’m interested in.

So, I started with their bubble chart, but instead of using the strange, Midwest data they used, I thought I’d work in a space that’s been preoccupying my time of late: college tuition. What sort of bubble chart could I craft that depicted college tuition in some way? What about a bubble chart depicting the intersection of college tuitions and their corresponding average starting salaries? That might help parents and students better understand the return on investment associated with various colleges. Here’s what I came up with:

First, I decided to narrow down my work to just Ohio colleges. At Payscale.com, I found a dataset of median starting salaries by Ohio college for 2018.

Unfortunately, the Payscale.com dataset did not include college tuition prices. However, CollegeCalc.org did have a dataset of Ohio college tuition prices for 2018-2019.

Much of my work revolved around cleaning up these data sources and merging them together for the final visual. As you might imagine, the two datasets often used slightly different names for the same school. For example, the Payscale.com dataset had an entry for Kettering College whereas the CollegeCalc.org site calls that school Kettering College of Medical Arts. So, I had to do a fair amount of work making sure both datasets used the same name for each school so that I could properly match on those names.
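A minimal sketch of that kind of name cleanup and merge, assuming pandas. The column names and every number below are made-up placeholders rather than values from Payscale.com or CollegeCalc.org, and "Example State University" is a hypothetical school; only the Kettering name mismatch comes from the text above.

```python
import pandas as pd

# Placeholder rows: column names and numbers are assumptions for illustration,
# not real Payscale.com or CollegeCalc.org values.
salaries = pd.DataFrame({
    "School Name": ["Kettering College", "Example State University"],
    "Median Starting Salary": [50000, 45000],
})
tuition = pd.DataFrame({
    "School Name": ["Kettering College of Medical Arts", "Example State University"],
    "Tuition": [12000, 9000],
})

# Hand-built map from one source's spelling to the other's.
name_fixes = {"Kettering College of Medical Arts": "Kettering College"}
tuition["School Name"] = tuition["School Name"].str.strip().replace(name_fixes)

# An outer merge with indicator=True keeps every row and adds a "_merge"
# column, so schools that still fail to match are easy to spot.
merged = salaries.merge(tuition, on="School Name", how="outer", indicator=True)
unmatched = merged[merged["_merge"] != "both"]
print(unmatched["School Name"].tolist())
```

Anything that shows up in `unmatched` is a candidate for another entry in `name_fixes`. Once that list is empty, an inner merge would give the same rows.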

The Payscale.com dataset included some language to differentiate public schools from private, which I used to color my bubbles blue and red, respectively. The CollegeCalc.org dataset included the school size, which I used to size each bubble.
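The color and size mapping might look something like the sketch below, assuming pandas and matplotlib. The schools, column names, and numbers are hypothetical placeholders, and the scaling factor for bubble size is an arbitrary choice, not necessarily the one used for the chart in this post.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows: schools, column names, and numbers are all made up.
df = pd.DataFrame({
    "School Name": ["School A", "School B", "School C"],
    "Type": ["Public", "Private", "Public"],
    "Tuition": [9000, 40000, 11000],
    "Median Starting Salary": [45000, 52000, 48000],
    "Enrollment": [20000, 2500, 8000],
})

# Public schools are blue and private schools are red, as described above.
colors = df["Type"].map({"Public": "tab:blue", "Private": "tab:red"})

# Scale enrollment so the largest school gets a marker area of 1000 points^2.
sizes = df["Enrollment"] / df["Enrollment"].max() * 1000

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(
    df["Tuition"],
    df["Median Starting Salary"],
    s=sizes.to_numpy(),
    c=colors.tolist(),
    alpha=0.5,
    edgecolors="black",
)
ax.set_xlabel("Annual tuition ($)")
ax.set_ylabel("Median starting salary ($)")
ax.set_title("Tuition vs. starting salary (placeholder data)")
```

Calling `fig.savefig("bubble_chart.png")` afterward would write the chart to disk. The `s` argument of `scatter` is a marker area, so scaling enrollment linearly keeps bubble area proportional to school size.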

Machine Learning Plus’s bubble chart includes a cool “encircling” device that draws a circle around certain datapoints to draw the user’s attention to those points. Instead of doing that, I thought it’d be interesting to draw a “break even” line. All things being equal, if you pay, say, $10,000 a year in tuition for 4 years, your tuition investment would break even if your first job out of school paid $40,000. I drew a line to that effect on the graph: datapoints above that line would have a positive return on investment whereas datapoints below that line would have a negative return on investment. I didn’t want to muddy up the chart by labeling each bubble with the name of the college, but I still thought it’d be fun to calculate which schools are above and below the line, so I found a way to do that, added the calculation as a column to the dataframe, and printed out the Top 5 “Best” returns on investment and the Top 5 “Worst” returns on investment.
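One simple way to do the above/below calculation, which is not necessarily the approach used for this post, is to treat the break-even line as salary = 4 × tuition and measure how far each school's salary sits above or below it. The sketch assumes pandas; the schools, column names, and numbers are hypothetical placeholders.

```python
import pandas as pd

YEARS = 4  # from the text: four years of tuition

# Placeholder rows: schools, column names, and numbers are all made up.
df = pd.DataFrame({
    "School Name": ["School A", "School B", "School C"],
    "Tuition": [9000, 40000, 12000],
    "Median Starting Salary": [45000, 52000, 48000],
})

# Salary needed for the first job to equal four years of tuition.
df["Break Even Salary"] = YEARS * df["Tuition"]

# Positive margin: the point sits above the break-even line; negative: below.
df["ROI Margin"] = df["Median Starting Salary"] - df["Break Even Salary"]
df["Above Line"] = df["ROI Margin"] > 0

best = df.nlargest(5, "ROI Margin")["School Name"]
worst = df.nsmallest(5, "ROI Margin")["School Name"]

print("Top 5 biggest ROI schools:")
print(best)
print("Top 5 least ROI schools:")
print(worst)
```

With tuition on the x-axis, the line itself could be drawn on an existing matplotlib axes with something like `ax.plot(x, YEARS * x)`, where `x` spans the range of tuition values.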

Top 5 biggest ROI schools: 
68              Central State University
40        Kent State University at Salem
45     Kent State University at Trumbull
56    Kent State University at Ashtabula
46              Shawnee State University
Name: School Name, dtype: object

Top 5 least ROI schools: 
8              Oberlin College
1               Kenyon College
3           Denison University
18      The College of Wooster
6     Ohio Wesleyan University
Name: School Name, dtype: object

Obviously, my “break even” assessment is very simplistic. There are many other variables I don’t account for: room and board, fees, financial aid, merit scholarships, taxes, and the like. The median starting salaries are across all graduates from a given school, from Philosophy majors to Computer Science majors. So, your mileage will certainly vary. For me, the bigger take-aways were 1) the challenge of obtaining, cleaning, and merging the datasets, 2) charting out the results in a cool way, and 3) calculating the datapoints above and below my break-even line. All my work is here in case you want to check it out. Look for more matplotlib charts inspired by the Machine Learning Plus article in the future!

© 2019 DadOverflow.com
