Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.


Parsing unconventional data files

One of these data files is not like the others

Recently, a friend presented me with an interesting challenge. He had a data file that he wanted to pull into a pandas dataframe; however, the file was substantially different from the CSVs and TSVs he normally parses. Here’s a representation of his data file:

Example of the unconventional data file

So, how would someone such as myself go about parsing such a file into a dataframe? Well, I would just do some parsing with Python first. Here’s the solution I came up with.

Step 1: Import the packages

import re
import datetime
import pandas as pd

Step 2: Import the file as text and parse it into a list

To start with, I split the file on those dotted lines. Then, I iterate over each entry line-by-line. With each iteration, I use a regular expression to find the timestamp value, and then I look for the other properties. Ultimately, I append a list of the timestamp, the price, and the commodity name to my master list.

delim = '-----------------------------------------------'
ld = re.compile(r'\d{4}-\d{2}-\d{2}')
log_list = []

with open('commodities.txt', 'r') as f:
    log = f.read()
    
for entry in log.split(delim):
    for line in entry.split('\n'):
        if ld.match(line):
            d = datetime.datetime.strptime(line, '%Y-%m-%d %H:%M:%S')
        elif len(line.strip()) > 0:
            price = line.strip().split()[0]
            commodity = line.strip().split()[1]
            log_list.append([d, price, commodity])
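To try the parsing logic without the original file, here is a minimal sketch that wraps the same steps in a function and runs them on a small made-up sample. The sample text, the function name parse_log, and the "price, then commodity" line layout are assumptions for illustration only; the real file may look different.

```python
import datetime
import re

# Same separator and date pattern as the loop above
DELIM = '-----------------------------------------------'
DATE_RE = re.compile(r'\d{4}-\d{2}-\d{2}')


def parse_log(text):
    """Return [timestamp, price, commodity] rows from the log text."""
    rows = []
    stamp = None
    for entry in text.split(DELIM):
        for line in entry.splitlines():
            line = line.strip()
            if DATE_RE.match(line):
                # Timestamp line: remember it for the records that follow
                stamp = datetime.datetime.strptime(line, '%Y-%m-%d %H:%M:%S')
            elif line:
                # Data line: first token is the price, second the commodity
                price, commodity = line.split()[:2]
                rows.append([stamp, price, commodity])
    return rows


# Hypothetical sample data, invented for this sketch
sample = (
    "2019-06-01 09:00:00\n"
    "1.25 corn\n"
    "3.10 wheat\n"
    + DELIM + "\n"
    "2019-06-02 09:00:00\n"
    "1.30 corn\n"
)

rows = parse_log(sample)
for row in rows:
    print(row)
```

The rows come back in the same shape as log_list, so they could be handed to pd.DataFrame in the same way.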

Step 3: Read the master list into a new dataframe

Once I finish iterating over the file and building out my log list, I can then properly pull it into a dataframe.

df1 = pd.DataFrame(log_list, columns=['log_date', 'price', 'commodity'])
df1.head()

But, wait, there’s more!

My friend happens to really like awk. While I was off coding my Python solution, he was busy writing an awk script to do the same. It occurred to me, though, that even if he wanted to solve his problem with awk, he could code it up and run it in Jupyter Notebook. Here’s how you might solve this same problem with awk.

Step 1: Develop the AWK script and write to disk

The writefile magic word basically turns your Jupyter Notebook cell into a text editor where you can easily save your work to a file. Here, I’m coding the awk script and then writing it to the file my_awk_script.awk.

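%%writefile my_awk_script.awk
BEGIN {
    FS = " "
    OFS = ","
}
{
    if ( /^20/ )
    {
        # Timestamp line: remember it for the records that follow
        dtstamp = $0
    }
    else if ( NF == 2 )
    {
        # "==" compares; a single "=" would assign 2 to NF instead
        price = $1
        commodity = $2
        if ( commodity ~ /^[0-9]/ )
        {
            print dtstamp,price,commodity
        }
    }
}
END {}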

Step 2: Run the awk file in a bash shell

With my awk script done, I can execute it right from Jupyter Notebook with the help of the bash magic word. One cool thing about this magic word is that you can pipe the cell output to a variable for later processing. Here, I’m piping the results of the awk script to the variable awk_output.

%%bash --out awk_output
gawk -f my_awk_script.awk commodities.txt

Step 3: Clean up the output and load it into a dataframe

The output is one long string with return and newline characters denoting each new line. I can do some list comprehension work on that string, though, and easily get it ready for reading into a new dataframe:

parsed_log = [l.split(',') for l in awk_output.splitlines() if l.strip()]
df2 = pd.DataFrame(parsed_log, columns=['log_date', 'price', 'commodity'])
df2.head()
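One thing to keep in mind is that every value parsed out of the awk output is still a string. Here is a rough sketch of converting the columns to richer types before building a dataframe. The function name typed_rows and the sample output are made up for illustration, and the sketch assumes the second column holds a plain decimal number, which may not match the real data.

```python
from datetime import datetime


def typed_rows(output, stamp_format='%Y-%m-%d %H:%M:%S'):
    """Turn comma-separated output lines into [datetime, float, str] rows."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines, such as a trailing one
        stamp, price, commodity = line.split(',')
        rows.append([datetime.strptime(stamp, stamp_format), float(price), commodity])
    return rows


# Hypothetical awk output, invented for this sketch
sample_output = "2019-06-01 09:00:00,1.25,corn\r\n2019-06-01 09:00:00,3.10,wheat\r\n"

converted = typed_rows(sample_output)
print(converted[0])
```

With typed values in place, sorting by date or averaging prices works without any further conversion.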

And there you have it: two ways to parse unconventional data files into a pandas dataframe. Check out my complete Jupyter Notebook here!

Graduation Party media

My eldest graduated high school this year and, to celebrate, the wife and I threw her a graduation party. I’ll spare you the to-dos and checklists for throwing a graduation party–I left those largely to my wife as it is. Instead, I’d like to focus this post on the media aspects of the party.

I’m a big proponent of taking pictures and videotaping those magic moments–and even mundane moments–throughout the lives of your children and family. So, I figured this was the perfect time to pull out those embarrassing pictures and video from the past so that all her friends could see.

For the party, we rented an outdoor facility at a local park. We would have access to electricity and I decided to bring two large flat screen monitors to display the media, but I did not want to bring along expensive laptops or desktops to play the media as I didn’t want to risk damaging that equipment. So, what to do? I know: run my media on cheap Raspberry Pis!

The Slideshow

For my picture slideshow, I pulled out about 700 pictures I had of my child from over the years. However I was going to run the slideshow, I knew I wanted to show the pictures in chronological order. So, I came up with this quick PowerShell script to rename the pictures in increasing numbers so that sorting the pictures alphabetically would effectively sort them chronologically:

$pics = gci "C:\grad_party\data\*" | sort {$_.LastWriteTime}

$count = 0
foreach($pic in $pics){
    $count = $count + 5
    mv $pic.FullName ("C:\grad_party\data\{0}{1}" -f ($count), $pic.Extension)
}

(I increased the count by 5 with each loop so that I could easily, manually reorder pictures if needed.)
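For anyone without PowerShell handy, here is a rough Python sketch of the same renaming idea. The function name and the demo file names are made up for illustration. The zero-padding is an addition that is not in the PowerShell script: it keeps names like 00005 and 00010 in the right order even under a plain alphabetical sort.

```python
import os
import tempfile
from pathlib import Path


def rename_chronologically(folder, step=5, width=5):
    """Rename files in folder to zero-padded numbers ordered by modification time."""
    folder = Path(folder)
    # Build the full sorted list first so renaming does not disturb the iteration
    pics = sorted((p for p in folder.iterdir() if p.is_file()),
                  key=lambda p: p.stat().st_mtime)
    new_names = []
    for count, pic in enumerate(pics, start=1):
        target = folder / f"{count * step:0{width}d}{pic.suffix}"
        if target != pic:
            if target.exists():
                # Refuse to overwrite a file that already has the target name
                raise FileExistsError(target)
            pic.rename(target)
        new_names.append(target.name)
    return new_names


# Demo on throwaway files with hand-set modification times
with tempfile.TemporaryDirectory() as tmp:
    for i, name in enumerate(["beach.jpg", "birthday.png", "recital.jpg"]):
        path = Path(tmp) / name
        path.write_bytes(b"")
        os.utime(path, (1_500_000_000 + i, 1_500_000_000 + i))
    demo_names = rename_chronologically(tmp)

print(demo_names)
```

As with the PowerShell version, the gaps of 5 leave room to slot a picture in by hand later.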

Now the big question was, on a Raspberry Pi, what are my options to run a snazzy slideshow?

feh

feh is a Linux utility for displaying images, and it’s relatively easy to get started on a Raspberry Pi. The challenge I encountered was getting it to do anything other than hard transitions from slide to slide. Ideally, I wanted a slideshow where the slides are in motion and have nice cross-fade transitions: something like this. I don’t know if feh has those capabilities, but I certainly couldn’t get it to produce those effects, so I continued my search.

Python and Pi3D

A solution using Python?! Tell me more! I happened upon a post from TheDigitalPictureframe.com that used a home-grown Python script leveraging a package called Pi3D. The demo video looked close to the solution I was pursuing, so I gave it a try. Unfortunately, despite all the parameter tweaking I tried, I just couldn’t get the final product I wanted. Nevertheless, I would like to circle back someday and do some more experimentation with this approach.

Make the slideshow myself

In the end, I simply used PowerDirector–the software I use to make my annual family movies–to make a video of my slideshow. Then, I just dropped the video on the Raspberry Pi and played it with the built-in VLC player. I even set VLC to continuously loop the video and set the player to full screen to take up the entire monitor. It worked out pretty well and even garnered a few compliments.

The video montage

On a second flat screen monitor and second Raspberry Pi, I played a video montage of nearly 18 years of video highlights of my child’s life. As I’ve been making annual family movies for most of my daughter’s life, it was pretty easy collecting a few minutes of video of her from each year. Here again, I turned to PowerDirector for my solution. I spliced together the video from over the years separating each year with a “year” title and then created a single mp4 video that I copied to my Raspberry Pi and played with VLC player. Like the slideshow, it turned out pretty well.

Background music

“Dad, your music is too old!”

the kid

For the party background music, I pulled out my reliable portable speaker. Initially, my plan was to copy a bunch of my mp3 files to a third Raspberry Pi, play those files with VLC player, and run the audio out into my speaker. Then, the kid complained that my music was too old.

As an Amazon Prime member, I get access to lots of free music. So, I downloaded a few “modern pop” playlists to my phone. My speaker is Bluetooth-enabled, so I paired my phone to my speaker and then just played those playlists from my phone through the speaker. That worked out pretty well, and I was surprised that both my speaker and phone batteries outlasted the three-hour party.

So, there are your media suggestions for your child’s graduation party:

  1. Create a slick slideshow “video” and run it from a Raspberry Pi into a large flat screen monitor,
  2. Create a video montage “video” and do the same thing on a second monitor and Pi, and
  3. Download some music for offline play to your phone and bluetooth your phone to a nice, portable speaker.

The slideshow and video all presume that you’re actively photographing and videoing all your family activities. I highly recommend you do that: family vacations, basketball games, and banquets might feel mundane now but they’ll be gold at your child’s graduation party and beyond.

Bash in your notebook

#!/bin/bash

At work recently, I had to call an internal REST API for some data I needed to process. To try the API out, I fired up Ubuntu in my Windows Subsystem for Linux and ran a cURL command to try out the interface. That went well, so I created a new Jupyter Notebook in which to call the API–I wanted to take the response data, load them into a pandas dataframe, and create a chart. Easy, right?

So, I called the API with requests and promptly received an SSL “bad handshake” error. Like many others, I struggled to resolve this error. Clearly, the server hosting that API was misconfigured in some way. However, I didn’t own the server and had no real recourse to get the issue fixed, so I decided to call cURL directly from my notebook; this led me to the bash magic command.

With the bash magic command, you can tell Jupyter Notebook to run all the commands in your cell as if you were executing them at a bash command prompt…even if you’re running Jupyter Notebook on a Windows operating system. How cool is that?! Furthermore, with the out argument, you can pipe all your cell output to a variable for easy processing. Check this out:

(Note: I’m using the free lyrics API from lyrics.ovh in my examples below)

%%bash --out lyrics1
curl https://private-anon-cd823708f8-lyricsovh.apiary-proxy.com/v1/the%20beatles/here%20comes%20the%20sun

The bash magic command returns its output as a string. Since that string is really JSON, I can easily parse it into a Python dictionary with the loads function:

import json
json_lyrics1 = json.loads(lyrics1)
print(json_lyrics1['lyrics'])
That JSON String easily converts to standard JSON
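To make the string-versus-dictionary distinction concrete, here is a tiny sketch that runs without calling the API. The response body is a made-up placeholder, not real output from lyrics.ovh.

```python
import json

# Hypothetical response body, invented for this sketch
raw = '{"lyrics": "la la la"}'

parsed = json.loads(raw)            # JSON text -> Python dict
print(type(raw).__name__)           # the captured cell output is a str
print(type(parsed).__name__)        # after loads, it is a dict
print(parsed['lyrics'])

round_trip = json.dumps(parsed)     # dict -> JSON text again
```

Once the data is a dictionary, normal key lookups work, and json.dumps goes back the other way if the text form is needed again.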

What if you want to call Bash commands in a loop?

Suppose I wanted to get the lyrics of multiple Beatles songs…now what do I do? Well, here’s one hack I came up with: do in-line calls to the shell.

song_list = ['yesterday', 'yellow%20submarine', 'eleanor%20rigby']
song_lyrics = []

for song in song_list:
    lyric = !wsl curl -s 'https://private-anon-cd823708f8-lyricsovh.apiary-proxy.com/v1/the%20beatles/{song}'
    song_lyrics.append(lyric)
    
song_lyrics

Here are a few things to note with my shell operation:

  • Since my operating system is Windows 10, I’m actually shelling out to the Windows command shell, not bash. However, since I have WSL installed on my machine, I can use wsl.exe to run commands in that shell. So, I’m basically calling a shell within a shell to ultimately execute my bash operation.
  • With the braces syntax, I can pass the value of my song variable to my shell command.
  • I pass the silent argument (-s) to cURL to suppress the noise cURL would normally send back to Jupyter Notebook. This allows me to pass just the JSON response to my variable lyric.

One challenge with this approach is that the shell command returns an SList, which is basically a list of strings. I should be able to join those strings together, though, and then parse the result with the loads function:

song_list = ['yesterday', 'yellow%20submarine', 'eleanor%20rigby']
song_lyrics = []

for song in song_list:
    lyric = !wsl curl -s 'https://private-anon-cd823708f8-lyricsovh.apiary-proxy.com/v1/the%20beatles/{song}'
    json_lyrics = json.loads(''.join(lyric))
    song_lyrics.append(json_lyrics)
    
song_lyrics

And now I have a list of JSON objects (or dictionaries) to work with. Awesome!
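Outside of a notebook, where the ! syntax isn't available, a similar capture-and-parse step can be sketched with Python's subprocess module. This is a swapped-in technique, not what the loop above does. The helper name run_json_command is made up, and the demo command is a stand-in (Python printing a small JSON document) so the sketch runs without network access; in the scenario above, the command would be something like ['curl', '-s', url].

```python
import json
import subprocess
import sys


def run_json_command(cmd):
    """Run a command, capture its stdout as text, and parse it as JSON."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Stand-in commands, one per hypothetical song, each printing a JSON document
songs = ['yesterday', 'yellow submarine']
responses = []
for song in songs:
    code = f'import json; print(json.dumps({{"song": {song!r}, "lyrics": "la la la"}}))'
    responses.append(run_json_command([sys.executable, '-c', code]))

print(responses)
```

Passing the command as a list avoids shell quoting issues, and check=True raises an error if the command exits with a failure code instead of silently returning nothing.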

For more on the bash magic command, check out this excellent article. Go here to get all my example code.


© 2024 DadOverflow.com
