Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Month: December 2019

One big 2019 accomplishment

I’ve blogged in the past about the importance of recording your family events and periodically consolidating that work into a polished product for the world–or at least your family and friends–to see.

To that end, I’m now on my twelfth year of creating an annual family video. I painstakingly go through a year’s worth of family pictures, video, artwork, and awards to highlight my family’s accomplishments in an 80-minute montage of clips and fun segues. Here’s a short summary of that work:

My videos average about 84 minutes year after year. You may also notice the missing years 2007 and 2008. After all the effort I expended compiling my 2006 video, I spent the next two years saying, “never again”! I convinced myself to jump back into the fray in 2009 and have kept at it ever since.

What sort of effort are we talking about?

I haven’t tabulated the total hours I invest in culling my media for the few minutes of baskets, goals, and solos, but the effort does span several weeks and consumes much of my end-of-year vacation.

Typically, I start with an outline in a plain text file. I go through the year’s worth of media I’ve collected and note the major scenes or sections: for example, basketball season, soccer season, instrumental/choir recitals, family vacation, birthday celebrations, etc. Once I’ve decided on my scenes, I decide how to order them in my outline. For the most part, I stick to chronological order, but if I find that one of my children is the focus of multiple back-to-back scenes, I intersperse those scenes with others that focus on another child, so as to dispel any sense of favoritism in the final product.

Once I have my outline, I wrap it with intro and outro scenes. These scenes allow me a small measure of creativity. For my intros, I try to mimic one you might see in a television show or movie. I find a snazzy music bed, play a quick sequence of photos of the family members with fancy transitions over it, and end the 30-second intro with a nifty title for the video: something about our family and our goings-on over the year.

For the outro, I again try to mimic mainstream conventions: I find an upbeat, family-friendly song and run a bunch of photos of the family members from the year on top of it. I do forgo the credits, though, since I tend to be the producer, director, key grip, and best boy all in one.

With my outline set, the real work begins: watching hours of footage to pull out just the highlights. In years past, I worked through the outline in order, building my intro scene first and working all the way through to my outro scene. This year, though, I decided to work out of order, knocking out some of the harder scenes first. That approach proved pretty successful, and I’ll probably take it again next year.

How much media do you really have?

Maybe next year, I’ll try to tally up the hours and minutes I spend assembling the final product. This year, though, I decided I wanted to at least tally up the hours and minutes of raw footage I must sift through to do my work. To do this, I needed an easy way to collect the durations of all my media files.

When I wrote my Music to Drive By solution, I wrote a PowerShell script that, in part, tallied up the duration of all the MP3 files it wrote to my thumb drive as an output to the console. That script is soooo slow. Surely there have been new, speedier innovations in capturing media file durations in both PowerShell and Windows since then. Nope. None that I can find, anyway.

WSL, ftw

So, I decided to see what I could do in the Windows Subsystem for Linux (WSL). There are many options out there, and I decided to give the MediaInfo utility a try based on this helpful post. MediaInfo can only look at one media file at a time, but that same post included some very helpful Bash code to loop through directories of media files and total their durations in milliseconds. Here’s, roughly, the Bash script I came up with:

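#!/usr/bin/env bash
# Total the durations (in milliseconds) of all audio/video files in the 2019 folders
total_duration_ms=0
for media_file in /mnt/extdrv/qsync_backup/Videos/2019/*/*/*.{mp3,mp4,mov,wav,MP3,MP4,MOV,WAV}; do
        if [ -f "$media_file" ]; then  # skip any pattern that matched no files
                duration=$(mediainfo --Inform="General;%Duration%" "$media_file")
                duration=${duration%.*}  # drop a fractional part, if MediaInfo reports one
                total_duration_ms=$(( total_duration_ms + ${duration:-0} ))  # count a missing duration as 0
        fi
done
echo "$total_duration_ms"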
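For anyone who would rather stay in Python, here is a minimal sketch of the same tally. It assumes the `mediainfo` command-line tool is installed and on the PATH, and it reuses the folder layout and extensions from the Bash loop; the function and constant names are hypothetical, my own choices for illustration. It also shows the final conversion: one hour is 60 × 60 × 1000 = 3,600,000 ms, so 25.4 hours works out to about 91.4 million milliseconds.

```python
import glob
import os
import subprocess

# Hypothetical constants mirroring the Bash loop's folder layout and extensions
MEDIA_GLOB = "/mnt/extdrv/qsync_backup/Videos/2019/*/*/*"
EXTENSIONS = (".mp3", ".mp4", ".mov", ".wav")


def parse_duration_ms(raw: str) -> int:
    """Turn MediaInfo's duration text into whole milliseconds (0 if empty)."""
    raw = raw.strip()
    return int(float(raw)) if raw else 0


def media_duration_ms(path: str) -> int:
    """Ask the mediainfo CLI (assumed to be on PATH) for a file's duration."""
    result = subprocess.run(
        ["mediainfo", "--Inform=General;%Duration%", path],
        capture_output=True, text=True, check=False,
    )
    return parse_duration_ms(result.stdout)


def ms_to_hours(ms: int) -> float:
    """Convert milliseconds to hours: 1 hour = 3,600,000 ms."""
    return ms / 3_600_000


def total_duration_ms(pattern: str = MEDIA_GLOB) -> int:
    """Sum the durations of every audio/video file matching the glob pattern."""
    total = 0
    for path in glob.glob(pattern):
        # Like [ -f ] in Bash, skip directories; compare extensions case-insensitively
        if os.path.isfile(path) and path.lower().endswith(EXTENSIONS):
            total += media_duration_ms(path)
    return total


if __name__ == "__main__":
    ms = total_duration_ms()
    print(f"{ms} ms = {ms_to_hours(ms):.1f} hours")
```

Lowercasing the path before checking the extension covers the MP4/mp4 pairs that the Bash glob lists explicitly.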

One other item I should note: I house my media files on external hard drives. Getting WSL to see my external hard drive was simple once I mounted it. This post aided in that regard.

So, how much raw footage did I have to work with this year?

25.4 hours

I trimmed over twenty-five hours down to an 82-minute family video. Well, less than that, actually, since a good 7 to 8 minutes of the video was probably still shots and transitions. So, yes, I consider this one of my accomplishments this year.

Assessing my Posts

The end of the year is a traditional time to reflect on and assess one’s actions over the past twelve months. So, what better time to do a little analysis of what I’ve been posting on this blog?

Getting my blog data

As far as I can tell, I have no way to download summary information on my posts from the WordPress console; however, some information–title, category, tags, publishing date, etc.–is available in a table in the Posts section of the console. So, I used the handy Table-to-Excel browser extension to copy the contents of the table to a CSV file that I could later process with Python.

Parsing the raw data

The blog data from my administration console didn’t copy down so nicely. Here’s some code I wrote to clean up the data and get it into a dataframe for cleaner work later:

import pandas as pd

blog_data = []

with open('./data/raw_post_data.txt', 'rb') as f:
    for raw_line in f:
        line = raw_line.decode("utf-8")
        title = line.split('false')[0]  # do some initial trimming of the row
        data_part = line[line.find('Brad')+4:]  # splitting on the "author" value
        data_list = data_part.split('\t')  # the remaining columns are tab-separated
        # keep the title, categories, tags, and published-date columns
        blog_data.append([title.strip(), data_list[1].strip(), data_list[2].strip(), data_list[5].strip()])

df_blog_data = pd.DataFrame(blog_data[1:], columns=['title', 'categories', 'tags', 'published'])
df_blog_data = df_blog_data[df_blog_data.title != 'All']  # remove the header row from the dataframe

Afterward, I cleaned up my dataframe a little and added a few more columns:

from datetime import datetime

# the second whitespace-separated token of "published" holds the YYYY/MM/DD date
df_blog_data['publish_date'] = df_blog_data.published.apply(lambda p: datetime.strptime(p.split()[1], '%Y/%m/%d'))
df_blog_data['year'] = df_blog_data.publish_date.apply(lambda p: p.year)
df_blog_data['month'] = df_blog_data.publish_date.apply(lambda p: p.month)
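The `apply` calls above work row by row. pandas can also parse the whole column at once with `pd.to_datetime` and expose the date parts through the `.dt` accessor. This is a sketch that assumes, as the `strptime` call does, that the date is the second whitespace-separated token of the `published` text; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataframe
df_blog_data = pd.DataFrame({
    "title": ["Post A", "Post B"],
    "published": ["Published 2018/07/04", "Published 2019/12/28"],
})

# Take the second token of each string, then parse the whole column in one call
df_blog_data["publish_date"] = pd.to_datetime(
    df_blog_data.published.str.split().str[1], format="%Y/%m/%d"
)
# The .dt accessor returns the year and month as whole columns
df_blog_data["year"] = df_blog_data.publish_date.dt.year
df_blog_data["month"] = df_blog_data.publish_date.dt.month
print(df_blog_data[["publish_date", "year", "month"]])
```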

Time for some analysis

With a relatively manageable dataframe, I can generate some charts and do a little analysis. With the following code, I take a look at how prolific I’ve been with blogging:

import matplotlib.pyplot as plt

width = 0.3
fig, ax = plt.subplots(figsize=(10, 6))

# Count posts per month, then reindex to all 12 months so the two years' bars line up
all_months = range(1, 13)
posts_2019 = df_blog_data[df_blog_data.year == 2019].groupby('month').size().reindex(all_months, fill_value=0)
posts_2018 = df_blog_data[df_blog_data.year == 2018].groupby('month').size().reindex(all_months, fill_value=0)

posts_2019.plot(kind='bar', ax=ax, width=width, position=0, color='orange', label='2019')
posts_2018.plot(kind='bar', ax=ax, width=width, position=1, color='blue', label='2018')

_ = ax.set_title('Number of Blog Posts: 2018 - 2019')
_ = ax.set_ylabel('Number of Blog Posts')
_ = ax.legend()

…and the results:

The number of blog posts I’ve written over the last two years

Well, I clearly peaked six months into the life of this website and it’s been downhill from there. At least in 2019 I think I’ve pretty consistently delivered three posts a month.
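To put a number on “three posts a month,” one option is to tabulate posts per month and year with `pd.crosstab`, then average a year’s column. This is a minimal sketch on hypothetical sample data that stands in for the `year` and `month` columns built earlier; note that averaging over all twelve months understates a partial year such as the blog’s first.

```python
import pandas as pd

# Hypothetical stand-in for df_blog_data: just the year/month columns
df_blog_data = pd.DataFrame({
    "year":  [2018, 2018, 2018, 2019, 2019, 2019, 2019],
    "month": [7, 7, 8, 1, 1, 2, 3],
})

# Rows are months, columns are years, cells are post counts
monthly = pd.crosstab(df_blog_data.month, df_blog_data.year)
# Include every month, even ones with no posts in either year
monthly = monthly.reindex(range(1, 13), fill_value=0)
print(monthly)

# Average posts per month in 2019, over all twelve months
print(monthly[2019].mean())
```

The same table can also drive the chart: `monthly.plot(kind='bar', ax=ax)` draws one bar per year for each month, already aligned.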

So, what sort of content have I been delivering? Categories and tags should tell this story. For the most part, I’ve tried to assign only one category per blog post, but not always. So, to try to get an idea of how often I’ve used each category on the site, I had to do a little gymnastics to pull out each category separately and report each count. Here’s the code I came up with:

# Flatten every post's comma-separated categories (spaces removed) into one column, one category per row
df_cats = pd.DataFrame(','.join(df_blog_data.categories.tolist()).replace(' ', '').split(','), columns=['category'])
fig, ax = plt.subplots(figsize=(10, 6))

_ = df_cats.groupby('category').size().plot(kind='barh', ax=ax, color='mediumpurple')
_ = ax.set_title('Categories used for blog posts: 2018 - 2019')

This blog is clearly weighted heavily toward technology. I also have an Uncategorized category in there, which means I forgot to categorize one of my previous posts. I definitely need to work on adding more general and genealogy-type posts just to keep things interesting.
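The join-and-split trick above works, but `.replace(' ', '')` also removes spaces inside a name, so a multi-word category such as the hypothetical “Family Life” would be counted as “FamilyLife.” Here is a sketch of an alternative using pandas’ `str.split` and `explode`, which trims only the spaces around each comma; `count_terms` is a hypothetical helper name and the sample data is made up.

```python
import pandas as pd


def count_terms(column: pd.Series) -> pd.Series:
    """Count comma-separated terms (categories or tags) across all posts."""
    terms = (
        column.fillna("")
        .str.split(",")   # one list of terms per post
        .explode()        # one row per term
        .str.strip()      # trim surrounding spaces, keep multi-word names intact
    )
    return terms[terms != ""].value_counts()


# Hypothetical sample data
categories = pd.Series(["Technology", "Technology, Genealogy", "Parenting", "Family Life"])
print(count_terms(categories))
```

The same helper works for tags, e.g. `count_terms(df_blog_data.tags).sort_values().plot(kind='barh', ax=ax)`.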

To analyze my use of tags, I wrote roughly the same sort of code:

# Flatten every post's comma-separated tags (spaces removed) into one column, one tag per row
df_tags = pd.DataFrame(','.join(df_blog_data.tags.tolist()).replace(' ', '').split(','), columns=['tag'])
fig, ax = plt.subplots(figsize=(10, 6))

_ = df_tags.groupby('tag').size().sort_values().plot(kind='barh', ax=ax, color='green')
_ = ax.set_title('Tags used for blog posts: 2018 - 2019')

Well, I do like tools, especially the software kind! I had feared that Python would be a dominating topic, but it’s not as bad as I thought, and the parenting topic is a close fourth. In the future, I would like to write more about the college experience, as I have recently become the parent of a college student and will add another to that list in the not-too-distant future. I must also write more on the podcast topic, as I make much use of that medium during my lengthy commutes to and from work. And so, here’s to more quality posts in 2020!

© 2024 DadOverflow.com
