Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Tag: tools (Page 11 of 35)

End of year gotcha in Java

Some of the data I work with sends dates in epoch or unix time. This works fine for software, but it is often helpful to convert those numbers to year, month, and day values for human consumption.

Consider the epoch time value: 1608552000.000 (the decimal places being for milliseconds)

Epoch Converter converts this value to: December 21, 2020 12:00:00 PM (GMT)

In Java, I might write code like this to convert the value:

import java.time.*;
import java.time.format.DateTimeFormatter;


class MyApplication{
    public static void main(String[] args){
        DateTimeFormatter dtf = DateTimeFormatter.ofPattern("YYYY-MM-dd");
        String uTime = "1608552000.000";
        Float fTime = Float.parseFloat(uTime);
        String sTime = LocalDateTime.ofInstant(Instant.ofEpochMilli((long) (fTime*1000)), ZoneOffset.ofHours(-5)).format(dtf);

        System.out.println(sTime);
    }
}

This code writes out the date: 2020-12-21

Job well done, right? Not so fast.

Consider this epoch value: 1609070400.000

Epoch Converter converts this value to: December 27, 2020 12:00:00 PM (GMT)

However, when I run this number through the code above, the output is: 2021-12-27

The year is wrong! What’s going on here? Well, it turns out the problem is with the date formatting pattern (line 7). In the code, I’m using “Y” (capital Y) to convert the year. The documentation defines this format as a “week-based-year”. December 27 was a Sunday. The first day of 2021 was the following Friday. My best guess is that, since part of that week changed to the year 2021, use of that particular format pattern must apply the year 2021 to all days in the week. Interesting.

Thankfully, if you want to get the actual year for that date, just use “y” (lowercase y). In code, that looks like this:

import java.time.*;
import java.time.format.DateTimeFormatter;


class MyApplication{
    public static void main(String[] args){
        DateTimeFormatter dtf = DateTimeFormatter.ofPattern("yyyy-MM-dd");
        String uTime = "1609070400.000";
        Float fTime = Float.parseFloat(uTime);
        String sTime = LocalDateTime.ofInstant(Instant.ofEpochMilli((long) (fTime*1000)), ZoneOffset.ofHours(-5)).format(dtf);

        System.out.println(sTime);
    }
}

And now the output is as I expect: 2020-12-27

Music to Drive By: Python Edition

If only Wayne and Garth had a thumb drive of music to listen to

In the past, I’ve written about using PowerShell to help build a thumb drive of music to listen to in the car. Recently, I took a crack at converting that work to Python with the help of the pymediainfo package. Here’s what I did:

Load the requisite packages

import os
import pandas as pd
import json
import shutil

Build your music inventory

The to_data function of pymediainfo makes it very easy to gather all the important properties of your music files. Optionally, I wrote code to save that inventory out to a json file for later analysis, but you don’t have to do that to build your thumb drive. I hard coded the path to my music folder (D:\music_backup) and my code does assume that you only want to process mp3 files (line 4).

music_col = []
for dirpath, dirs, files in os.walk("D:\\music_backup"):
    for filename in files:
        if filename.lower().endswith('mp3'):
            fname = os.path.join(dirpath,filename)
            mi = MediaInfo.parse(fname)
            music_col.append([t for t in mi.tracks if t.track_type == 'General'][0].to_data())

# save collection to file if needed
with open('music_col.json', 'w') as f:
    json.dump(music_col, f) 

Build a pandas dataframe

Yes, pandas is my go-to “hammer” to solve most of my coding problems. I use the fillna function to replace any null values with empty strings–makes filtering easier later on.

df_music = pd.DataFrame(music_col)
df_music = df_music.fillna('')

Filter on just the music I want to listen to in the car

As I’ve said before, I have a lot of music in my library but technical limits with my car stereo. So, I have to make certain decisions on what music to copy. Dataframe filtering makes that fast and easy. To make things interesting, I’m leveraging the pandas sample function to randomly sort my music. Here’s the code I came up with:

genres_to_include = ["Pop", "Rock", "Hard Rock & Metal"]

album_artists_to_exclude = ["ABBA", "Disney", "Vanilla Ice"]
albums_to_exclude = ["Frozen [Original Motion Picture Soundtrack]", "High School Musical 2 [Original Soundtrack]", "The Smurfs 2- Music from and Inspired By"]
# excluded any "songs" that might actually be talking of some sort
bad_titles = 'interview|speech'

df_usb = df_music[(df_music.genre.isin(genres_to_include)) & ~(df_music.performer.isin(album_artists_to_exclude)) & 
                  ~(df_music.album.isin(albums_to_exclude)) & ~(df_music.title.str.contains(bad_titles, case=False)) & 
                  (df_music.duration>30000)].sample(frac=1)

Don’t forget about the size constraints of the thumb drive

I’m using a 16Gb thumb drive and I have well over 50Gb of music, so I need to make sure I only copy over enough files to fill up the drive and nothing more. The pandas cumsum function will help me easily figure that out:

df_usb['file_size_cumsum'] = df_usb.file_size.cumsum()

Finally, write to the thumb drive

Now, I’m ready to write my randomized music, filtered just how I want, to my thumb drive:

# set a max bytes of about 15.7 Gb
max_bytes = 15700000000
usb_drive = 'E:\\.'

for f in df_usb[df_usb.file_size_cumsum<max_bytes].complete_name.tolist():
    shutil.copy(f, usb_drive)

Lists in your Dataframes

I had a challenge not long ago where I had a dataframe of users and a list of different security groups to which each belonged. I wanted to do some simple analysis on how many groups were represented in the dataframe and how many users belonged to each group. A simple horizontal bar chart would suffice.

To provide a more real life example of my problem and solution, imagine you wanted to do some analysis on the three main UEFA titles–Champions League, Europa League, and UEFA Super Cup–and wanted to know how many English teams won each. You might first start by collecting the title winners for each of the contests into a single dataframe. Following that approach, we now have a dataframe that looks like this:

Our dataframe with “Club” as a string and “title” as a list of strings

Start with a unique set of titles

Since I want my chart to show each UEFA title, let’s get a list of those titles like so:

unique_title_list = list(set([item for sublist in df_combo.title.tolist() for item in sublist if len(item)>0]))

This code performs several operations in a single line:

  1. It converts the title column into a list. Since each value is already a list, the result is a list of lists.
  2. Next, I use some clever list comprehension to iterate into each sublist and then interate into each item in that sublist. The result is one large list of all titles won. Note that I also add a “length greater than 0” test just to make sure I avoid empty strings.
  3. Next, I use Python’s set function to produce a group of just the unique titles.
  4. Finally, I cast the set back to a list.

Count the teams that have won each title

To get the count of teams winning each title, I iterate across my unique list, filter down the dataframe by each title, and count the results:

title_counts = {}

for u in unique_title_list:
	winner_count = df_combo[df_combo.title.apply(lambda t: u in t)].shape[0]
	title_counts[u] = winner_count

Nicely sort your results

A good looking bar chart usually sorts the bars low-to-high or high-to-low, so I take this additional step to sort my results:

title, c = [], []
for k,v in sorted(title_counts.items(), key=lambda x: x[1]):
	title.append(k)
	c.append(v)

Finally, chart the results

Last, I wrote this code to produce a horizontal bar charts showing a count of the English teams winning a UEFA title:

fig, ax = plt.subplots(figsize=(10,6))
_ = ax.barh(title, c)
_ = ax.set_xlabel('Number of English Teams')
_ = ax.set_ylabel('Title')
_ = ax.set_title('Number of English football teams winning UEFA titles')

So, this chart is a little lackluster, but what an accomplishment to have five different English teams winning these titles!

« Older posts Newer posts »

© 2024 DadOverflow.com

Theme by Anders NorenUp ↑