Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Month: April 2020 (Page 1 of 2)

Coding Google Apps Script

Like most people, I use a great many of the free applications provided by Google. Unlike a lot of people, I like to occasionally collect data on how I’m using some of these applications, especially Gmail.

Unfortunately, there do not appear to be easy ways to collect usage data from Gmail; however, Google does provide a scripting platform, Google Apps Script, that allows you to code against Google’s various APIs and get at most of the data you might be interested in.

Recently, I found myself interested in analyzing the categories of email coming into my inbox. How could I collect simple email properties like date, subject line, and category and write the results to a CSV for further analysis?

Step 1: Launch a new spreadsheet

Navigate to Google Docs in your browser and launch a new spreadsheet. This tutorial was an excellent help.

Step 2: Launch the script editor

Following that tutorial, open up a new instance of the Script Editor under the Tools menu. This step is pretty important because it seems to “bind” your scripting work to the new spreadsheet. Initially, I simply started a new scripting project in Apps Script apart from any spreadsheet and found no way to get that work to write to any of my spreadsheets.

Step 3: Code away

In this endeavor, my goal was pretty simple: grab particular email metadata and write the data to a spreadsheet. This required coding against both the GmailApp and SpreadsheetApp services. Here’s the code I ultimately came up with (Apps Script is based on JavaScript, so I believe this is essentially vanilla JavaScript plus Google’s service objects):

function write_gmail_stats() {
  // get a reference to the active sheet of the bound spreadsheet
  var sheet = SpreadsheetApp.getActiveSheet();
  // the three Gmail categories to report on, in the order they are written
  var categories = ['promotions', 'social', 'primary'];
  for (var c = 0; c < categories.length; c++) {
    var category = categories[c];
    // GmailApp.search returns threads (conversations), not individual messages;
    // this query matches threads in the category with mail after 31 Mar 2020
    var threads = GmailApp.search('category:' + category + ' after:2020/03/31');
    // write the date, subject, and category of each thread to the spreadsheet
    for (var i = 0; i < threads.length; i++) {
      var thread = threads[i];
      var d = thread.getLastMessageDate();
      var s = thread.getFirstMessageSubject();
      sheet.appendRow([d, s, category]);
    }
  }
}

This code takes the date, subject, and category name of each conversation in Gmail’s three default categories and writes the results to my spreadsheet. To keep my results modest, I crafted a search query to only look at email from April (this support page was very helpful in figuring out the query I needed). The results worked out pretty well:

Sweet! This certainly beats screen scraping or manual data collecting.
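
For the "further analysis" part, here is a minimal Python sketch of what a first pass over the exported data might look like. It assumes the sheet has been downloaded as a CSV with the three columns written by the script (date, subject, category) and no header row. The sample rows and the column names are invented for illustration, and a real export may format its dates differently.

```python
import io

import pandas as pd

# Invented sample rows standing in for a CSV downloaded from the spreadsheet;
# the script writes no header row, so column names are supplied here.
sample_csv = io.StringIO(
    "2020-04-01 08:15:00,Spring sale starts today,promotions\n"
    "2020-04-01 12:30:00,You have a new follower,social\n"
    "2020-04-02 09:00:00,Lunch on Friday?,primary\n"
    "2020-04-02 17:45:00,Last chance: 20% off,promotions\n"
)

df_mail = pd.read_csv(
    sample_csv,
    names=["date", "subject", "category"],
    parse_dates=["date"],
)

# Count conversations per category, most frequent first
category_counts = df_mail["category"].value_counts()
print(category_counts)

# Count conversations per day and category
daily_counts = df_mail.groupby([df_mail["date"].dt.date, "category"]).size()
print(daily_counts)
```

With a real export, replacing the StringIO object with the path to the downloaded file should be the only change needed, provided the date column parses cleanly.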

Fun with SVG

Andrew Wang-Hoyer’s work of just SVG and CSS

This amazing collection of SVG animations really piqued my interest the other day. Andrew has over 200 of these on his site with the code on his GitHub page. This work is so cool, I decided to take a crack at it:

(Sorry, but the only way I could figure out how to get this HTML to appear in the blog page without messing up the rest of the page is through the dreaded iframe tag.)

In any event, I certainly pose no threat to Andrew, but SVG is hard! Here’s the code I came up with for this gem:

<!DOCTYPE html>
<html lang="en">
    <head>
        <link href='https://fonts.googleapis.com/css?family=Dekko' rel='stylesheet'>
    </head>
    <svg
    preserveAspectRatio="xMidYMid meet"
    version="1.1"
    viewBox="0 0 50 10"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns="http://www.w3.org/2000/svg">
        <defs>
            <linearGradient id="water-color" x1="0%" y1="0%" x2="0%" y2="100%">
            <stop offset="0%" style="stop-color:white;stop-opacity:1" />
            <stop offset="100%" style="stop-color:#add8e6;stop-opacity:1" />
            </linearGradient>
        </defs>
        <style>
            /* <![CDATA[ */
                .logo-frame {
                    /* no CSS border here: border properties do not apply to SVG shapes such as rect */
                    fill: url(#water-color);
                }
                .logo {
                    font: italic 6px 'Dekko';
                    fill: white;
                    stroke: black;
                    stroke-width: 0.2;
                    font-stretch: semi-condensed;
                }
                @keyframes water-line {
                    0%  {transform: translateX(0px)}
                    25% {transform: translateX(0.5px)}
                    50% {transform: translateX(-0.5px)}
                    100%{transform: translateX(0px)}
                }
                .water-line {
                    stroke: #add8e6;
                    fill: transparent;
                    stroke-width: 0.1;
                    animation: water-line 10s linear infinite;
                }
            /* ]]> */
        </style>
        <rect
            class="logo-frame"
            x="1.0"
            y="3.0"
            height="6"
            width="46"
        />
        <text x="1.5" y="8" class="logo">DadOverflow.com</text>
        <path d="M 1 2 C 1 3, 5 3, 5 2 C 5 3, 9 3, 9 2 C 9 3, 13 3, 13 2 C 13 3, 17 3, 17 2 C 17 3, 21 3, 21 2 C 21 3, 25 3, 25 2 C 25 3, 29 3, 29 2 C 29 3, 33 3, 33 2 C 33 3, 37 3, 37 2 C 37 3, 41 3, 41 2 C 41 3, 45 3, 45 2" class="water-line"/>
    </svg>
</html>

Figuring out that water line “path” was especially hard, but once I figured out the first wave or two, the rest weren’t too difficult. I get the impression that a lot of SVG developers use tools like Adobe Illustrator to solve those problems in easier ways. Here’s another cool post I found on animating SVG with CSS. Take a look at both posts and get animating on SVG today!
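
The water line path repeats one cubic Bézier wave eleven times, so its d attribute can also be generated rather than typed by hand. Below is a rough Python sketch of that idea. The function name and its parameters are my own invention for illustration; the defaults are chosen to reproduce the same eleven waves as the path above.

```python
def wave_path(start_x=1, y=2, wave_width=4, depth=1, waves=11):
    """Build an SVG path string of repeated cubic Bezier waves."""
    parts = [f"M {start_x} {y}"]
    x = start_x
    for _ in range(waves):
        end_x = x + wave_width
        # Both control points sit `depth` units below the line (SVG y grows
        # downward), directly under the start and end of the wave.
        parts.append(f"C {x} {y + depth}, {end_x} {y + depth}, {end_x} {y}")
        x = end_x
    return " ".join(parts)


path_d = wave_path()
print(path_d)
```

Changing depth or wave_width is a quick way to experiment with taller or wider waves before pasting the string into the path element.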

Grouping moving averages with Pandas

A friend of mine posed a challenge to me recently: how do you calculate a moving average on a field, by group, and add the calculation as a new column back to the original dataframe?

A moving average is the average of a value over a fixed-size window of time, recalculated as that “window” shifts forward. Moving averages are often used to smooth out fluctuations in real data.
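
To make the idea of a shifting window concrete, here is a tiny sketch using the pandas rolling function on invented numbers (the values and dates are made up purely for illustration):

```python
import pandas as pd

# Invented daily values, purely for illustration
values = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=pd.date_range("2020-04-01", periods=6, freq="D"),
)

# 3-day moving average: each entry is the mean of that day and the two days before it
moving_avg = values.rolling(3).mean()
print(moving_avg)
```

The first two entries come out as NaN because a full three-day window is not yet available for them.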

For example, let’s take a look at the COVID-19 data I used in my last post. Recall what my Ohio dataframe (df_ohio) looked like:

df_ohio.head()

Before I can even think about calculating moving averages on this data, I need to first tidy it up a bit, but pandas makes that pretty easy:

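import pandas as pd

# df_ohio (built in my last post) is indexed by county, with one column per observation date
date_cols = df_ohio.columns.tolist()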
rename_cols = {'variable': 'obs_date', 'value': 'confirmed_cases'}

df_ohio_tidy = pd.melt(df_ohio.reset_index(), id_vars=['county'], value_vars=date_cols).rename(columns=rename_cols)
df_ohio_tidy['obs_date'] = pd.to_datetime(df_ohio_tidy.obs_date)

df_ohio_tidy = df_ohio_tidy.set_index('obs_date')
df_ohio_tidy

Now, I’m ready to calculate moving averages. The pandas rolling function is generally used for that purpose. It’s quite a powerful and versatile function, so be sure to check out the documentation. Normally, I just draw the moving average values in a chart alongside the actual observations:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8,8))
rename_col = {'confirmed_cases': '7 day moving avg'}
title = 'Confirmed COVID-19 cases in Cuyahoga, Ohio as of {0:%d %b %Y}'.format(df_ohio_tidy.index.max())

_ = df_ohio_tidy[df_ohio_tidy.county=='Cuyahoga, Ohio'][['confirmed_cases']].plot(ax=ax, title=title)
_ = df_ohio_tidy[df_ohio_tidy.county=='Cuyahoga, Ohio'][['confirmed_cases']].rolling(7).mean().\
    rename(columns=rename_col).plot(ax=ax, color='gray')

_ = ax.set_ylabel('Confirmed Cases')

But in this case, I need to calculate moving averages for each county in Ohio and add those calculations to the dataframe as a new column. For this, I use a combination of the rolling function and the equally powerful transform function. With help from this post, pandas has no issue doing that (in one line, no less):

df_ohio_tidy['7ma'] = df_ohio_tidy.groupby('county').confirmed_cases.transform(lambda c: c.rolling(7).mean())
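
To see what that one-liner does without the full COVID-19 dataset, here is a small sketch on invented data: two hypothetical counties, four days each, and a 2-day window instead of 7. The frame and column name "2ma" are assumptions made up for this example.

```python
import pandas as pd

# Invented counts for two hypothetical counties, four days each
dates = pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03", "2020-04-04"] * 2)
df_demo = pd.DataFrame(
    {
        "county": ["A"] * 4 + ["B"] * 4,
        "confirmed_cases": [1, 2, 3, 4, 10, 20, 30, 40],
    },
    index=dates,
)
df_demo.index.name = "obs_date"

# Same pattern with a 2-day window: the rolling mean is computed
# separately within each county and aligned back to the original rows
df_demo["2ma"] = df_demo.groupby("county").confirmed_cases.transform(
    lambda c: c.rolling(2).mean()
)
print(df_demo)
```

The first row of each county is NaN, and county B's second row averages 10 and 20 rather than mixing in county A's last value, which is the per-group behavior we are after.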

Now, let’s do some spot checking to make sure the results are as expected:

df_ohio_tidy.sort_values(['county', 'obs_date']).iloc[1170:1190,:]

Above, we can see that the 7-day moving average for Crawford County stops at the last entry for Crawford County on March 30 and resets to start calculating for Cuyahoga County.

df_ohio_tidy.sort_values(['county', 'obs_date']).iloc[1235:1250,:]

Above, we spot check the change from Cuyahoga County to Darke County. Again, the calculation for Cuyahoga County stops with the last entry on March 30 and starts over calculating on Darke County.
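
Eyeballing slices works, but the same check can be sketched in code by recomputing the rolling mean one group at a time and comparing. The helper below is hypothetical (its name and arguments are my own), and it is shown on invented data rather than the Ohio dataframe.

```python
import numpy as np
import pandas as pd


def check_grouped_rolling(df, group_col, value_col, ma_col, window):
    """Return True if ma_col matches a rolling mean computed one group at a time."""
    for _, group in df.groupby(group_col):
        expected = group[value_col].rolling(window).mean()
        # equal_nan=True treats NaN in the same position as a match
        if not np.allclose(group[ma_col], expected, equal_nan=True):
            return False
    return True


# Invented data: two hypothetical counties, three days each
df_check = pd.DataFrame({
    "county": ["A"] * 3 + ["B"] * 3,
    "confirmed_cases": [1, 2, 3, 10, 20, 30],
})
df_check["2ma"] = df_check.groupby("county").confirmed_cases.transform(
    lambda c: c.rolling(2).mean()
)
grouped_ok = check_grouped_rolling(df_check, "county", "confirmed_cases", "2ma", 2)

# An ungrouped rolling mean leaks across the county boundary, so the check fails
df_check["leaky"] = df_check.confirmed_cases.rolling(2).mean()
leaky_ok = check_grouped_rolling(df_check, "county", "confirmed_cases", "leaky", 2)

print(grouped_ok, leaky_ok)
```

Against the real data, the equivalent call would presumably pass 'county', 'confirmed_cases', '7ma', and a window of 7.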

So, yes, he can both calculate and group the moving average, Mr. Waturi! All the code behind my posts on the COVID-19 data can be found here.


© 2024 DadOverflow.com
