Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.


Fun with SVG

Andrew Wang-Hoyer’s work of just SVG and CSS

This amazing collection of SVG animations really piqued my interest the other day. Andrew has over 200 of these on his site, with the code on his GitHub page. This work is so cool that I decided to take a crack at it:

(Sorry, but the only way I could get this HTML to render on the blog page without messing up the rest of the page was the dreaded iframe tag.)

In any event, I certainly pose no threat to Andrew, but SVG is hard! Here’s the code I came up with for this gem:

<!DOCTYPE html>
<html lang="en">
    <head>
        <link href='https://fonts.googleapis.com/css?family=Dekko' rel='stylesheet'>
    </head>
    <svg
    preserveAspectRatio="xMidYMid meet"
    version="1.1"
    viewBox="0 0 50 10"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns="http://www.w3.org/2000/svg">
        <defs>
            <linearGradient id="water-color" x1="0%" y1="0%" x2="0%" y2="100%">
            <stop offset="0%" style="stop-color:white;stop-opacity:1" />
            <stop offset="100%" style="stop-color:#add8e6;stop-opacity:1" />
            </linearGradient>
        </defs>
        <style>
            /* <![CDATA[ */
                .logo-frame {
                    /* CSS border properties don't apply to SVG shapes (use stroke/stroke-width for an outline) */
                    fill: url(#water-color);
                }
                .logo {
                    font: italic 6px 'Dekko';
                    fill: white;
                    stroke: black;
                    stroke-width: 0.2;
                    font-stretch: semi-condensed;
                }
                @keyframes water-line {
                    0%  {transform: translateX(0px)}
                    25% {transform: translateX(0.5px)}
                    50% {transform: translateX(-0.5px)}
                    100%{transform: translateX(0px)}
                }
                .water-line {
                    stroke: #add8e6;
                    fill: transparent;
                    stroke-width: 0.1;
                    animation: water-line 10s linear infinite;
                }
            /* ]]> */
        </style>
        <rect
            class="logo-frame"
            x="1.0"
            y="3.0"
            height="6"
            width="46"
        />
        <text x="1.5" y="8" class="logo">DadOverflow.com</text>
        <path d="M 1 2 C 1 3, 5 3, 5 2 C 5 3, 9 3, 9 2 C 9 3, 13 3, 13 2 C 13 3, 17 3, 17 2 C 17 3, 21 3, 21 2 C 21 3, 25 3, 25 2 C 25 3, 29 3, 29 2 C 29 3, 33 3, 33 2 C 33 3, 37 3, 37 2 C 37 3, 41 3, 41 2 C 41 3, 45 3, 45 2" class="water-line"/>
    </svg>
</html>

The water-line "path" was the hardest part. Once I had the first wave or two worked out, though, the rest came easily: each wave is the same cubic Bézier curve, shifted 4 units to the right. I get the impression that many SVG developers use tools like Adobe Illustrator to draw shapes like this more easily. Here's another cool post I found on animating SVG with CSS. Take a look at both posts and start animating with SVG today!
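Because every wave repeats the same curve, the `d` attribute can be generated rather than typed by hand. Here's a minimal Python sketch that assumes the same geometry as the SVG above: waves run from x=1 to x=45, each is 4 units wide, the line sits at y=2, and the control points sit at y=3. The function name `wave_path` and its parameters are my own hypothetical choices.

```python
def wave_path(start_x=1, end_x=45, step=4, line_y=2, ctrl_y=3):
    """Build an SVG path 'd' string of repeated cubic Bezier waves (hypothetical helper)."""
    parts = [f"M {start_x} {line_y}"]  # move to the start of the water line
    x = start_x
    while x + step <= end_x:
        nxt = x + step
        # two control points at ctrl_y pull the curve downward (SVG y grows downward);
        # the end point returns to line_y, ready for the next wave
        parts.append(f"C {x} {ctrl_y}, {nxt} {ctrl_y}, {nxt} {line_y}")
        x = nxt
    return " ".join(parts)

print(wave_path())
```

With the defaults, this reproduces the 11-wave path used in the `<path>` element above. Changing `step` or `end_x` resizes the waves without any hand editing.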

Grouping moving averages with Pandas

A friend of mine posed a challenge to me recently: how do you calculate a moving average on a field, by group, and add the calculation as a new column back to the original dataframe?

A moving average is the average of a value over a fixed window of time (say, the last 7 days), recalculated as that window slides forward one step at a time. Moving averages are often used to smooth out day-to-day fluctuations in real data.
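To make the sliding window concrete, here's a minimal sketch on made-up numbers. The series `s` and its values are hypothetical, not from the COVID-19 data. It computes a 3-day moving average by hand and then with pandas' `rolling(3).mean()`. The first two results are NaN because the window isn't full yet.

```python
import pandas as pd

# Hypothetical daily values, one per day
s = pd.Series([1, 2, 3, 4, 5, 6, 7], index=pd.date_range("2020-03-01", periods=7))
window = 3

# By hand: average each value with the two before it; undefined until the window is full
manual = [
    sum(s.iloc[i - window + 1 : i + 1]) / window if i >= window - 1 else float("nan")
    for i in range(len(s))
]

# The same calculation with pandas
ma3 = s.rolling(window).mean()
print(ma3.tolist())  # [nan, nan, 2.0, 3.0, 4.0, 5.0, 6.0]
```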

For example, let’s take a look at the COVID-19 data I used in my last post. Recall what my Ohio dataframe (df_ohio) looked like:

df_ohio.head()

Before I can even think about calculating moving averages on this data, I need to first tidy it up a bit, but pandas makes that pretty easy:

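# every column in df_ohio is an observation date, so melt all of them into long format
date_cols = df_ohio.columns.tolist()
rename_cols = {'variable': 'obs_date', 'value': 'confirmed_cases'}

# one row per (county, date) pair; convert the date strings to real datetimes
df_ohio_tidy = pd.melt(df_ohio.reset_index(), id_vars=['county'], value_vars=date_cols).rename(columns=rename_cols)
df_ohio_tidy['obs_date'] = pd.to_datetime(df_ohio_tidy.obs_date)

# index by observation date so rolling windows and plots work off the dates
df_ohio_tidy = df_ohio_tidy.set_index('obs_date')
df_ohio_tidy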
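If `melt` is new to you, here's what it does to a tiny hypothetical frame shaped like `df_ohio`. The frame has two made-up county rows and three date columns. Every date column becomes a row, so the result has one row per county per date.

```python
import pandas as pd

# Hypothetical toy data shaped like df_ohio: one row per county, one column per date
wide = pd.DataFrame(
    {"3/28/2020": [10, 1], "3/29/2020": [12, 2], "3/30/2020": [15, 4]},
    index=pd.Index(["Cuyahoga, Ohio", "Darke, Ohio"], name="county"),
)

long = (
    pd.melt(wide.reset_index(), id_vars=["county"], value_vars=wide.columns.tolist())
    .rename(columns={"variable": "obs_date", "value": "confirmed_cases"})
)
long["obs_date"] = pd.to_datetime(long["obs_date"])
print(long)  # 6 rows: 2 counties x 3 dates
```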

Now I'm ready to calculate moving averages. The pandas rolling method is the usual tool for that. It's a powerful and versatile method, so be sure to check out the documentation. Normally, I just plot the moving average alongside the actual observations:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8,8))
rename_col = {'confirmed_cases': '7 day moving avg'}
title = 'Confirmed COVID-19 cases in Cuyahoga, Ohio as of {0:%d %b %Y}'.format(df_ohio_tidy.index.max())

# select one county's observations once, then plot the raw counts and their 7-day rolling mean
cuyahoga = df_ohio_tidy[df_ohio_tidy.county=='Cuyahoga, Ohio'][['confirmed_cases']]
_ = cuyahoga.plot(ax=ax, title=title)
_ = cuyahoga.rolling(7).mean().rename(columns=rename_col).plot(ax=ax, color='gray')

_ = ax.set_ylabel('Confirmed Cases')
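One detail worth knowing when you dig into the rolling documentation: `rolling(7)` counts *rows*, not days. On a `DatetimeIndex` you can instead pass an offset string such as `'7D'` to get a calendar-based window, and the two behave differently when a date is missing. Here's a small sketch on a hypothetical series `s` with one skipped day, comparing a 2-row window to a 2-day window.

```python
import pandas as pd

# Hypothetical series with no entry for 2020-03-29
s = pd.Series(
    [1.0, 2.0, 4.0, 5.0],
    index=pd.to_datetime(["2020-03-27", "2020-03-28", "2020-03-30", "2020-03-31"]),
)

by_rows = s.rolling(2).mean()     # last 2 observations, whatever their dates
by_time = s.rolling("2D").mean()  # observations in the trailing 2 calendar days

print(pd.DataFrame({"by_rows": by_rows, "by_time": by_time}))
```

On 3/30, the row-based window averages 3/28 and 3/30 (3.0), while the 2-day window only sees 3/30 (4.0). Offset windows also default to `min_periods=1`, so they don't start with NaNs. For gap-free daily data like the JHU series, both approaches agree once the window is full.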

But in this case, I need to calculate moving averages for each county in Ohio and add those calculations to the dataframe as a new column. For this, I use a combination of the rolling function and the equally powerful transform function. With help from this post, pandas has no issue doing that (in one line, no less):

df_ohio_tidy['7ma'] = df_ohio_tidy.groupby('county').confirmed_cases.transform(lambda c: c.rolling(7).mean())
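To see why `transform` fits here, here's a toy version with two hypothetical counties, "A" and "B". Their rows are interleaved by date, much like the melted data, and the window is 2 rows instead of 7 so the effect shows on tiny data. `transform` returns a result aligned row for row with the original frame, so it can be assigned straight back as a column, and each county's window starts fresh.

```python
import pandas as pd

# Hypothetical toy data: each date appears once per county, interleaved like melt output
df = pd.DataFrame(
    {
        "obs_date": pd.date_range("2020-03-27", periods=4).repeat(2),
        "county": ["A", "B"] * 4,
        "confirmed_cases": [1, 10, 2, 20, 3, 30, 4, 40],
    }
).set_index("obs_date")

# rolling mean computed within each county, then realigned to the original rows
df["2ma"] = df.groupby("county").confirmed_cases.transform(lambda c: c.rolling(2).mean())
print(df)
```

The first row for each county is NaN, which is the "reset" the spot checks look for.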

Now, let’s do some spot checking to make sure the results are as expected:

df_ohio_tidy.sort_values(['county', 'obs_date']).iloc[1170:1190,:]

Above, we can see that the 7-day moving average for Crawford County ends with Crawford's last entry on March 30 and then resets for Cuyahoga County: the first six Cuyahoga rows are NaN because a 7-row window isn't full yet.

df_ohio_tidy.sort_values(['county', 'obs_date']).iloc[1235:1250,:]

Above, we spot-check the change from Cuyahoga County to Darke County. Again, the calculation for Cuyahoga County stops with its last entry on March 30 and starts over for Darke County.
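Eyeballing `iloc` slices works, but a programmatic check covers every county at once. This is a minimal sketch; it assumes the grouped column was built with `groupby(...).transform(lambda c: c.rolling(window).mean())`, as in the one-liner above. The helper name `check_grouped_ma` is hypothetical. It recomputes each group's rolling mean independently and compares the result with `pandas.testing.assert_series_equal`, which raises an `AssertionError` on any mismatch.

```python
import pandas as pd


def check_grouped_ma(df, group_col, value_col, ma_col, window):
    """Hypothetical helper: verify a grouped rolling-mean column group by group."""
    for _, grp in df.groupby(group_col):
        # recompute this group's moving average on its own rows only
        expected = grp[value_col].rolling(window).mean()
        # column names differ ('7ma' vs 'confirmed_cases'), so skip the name check
        pd.testing.assert_series_equal(grp[ma_col], expected, check_names=False)
    return True
```

For the Ohio data, the call would be `check_grouped_ma(df_ohio_tidy, 'county', 'confirmed_cases', '7ma', window=7)`.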

So, yes, he can both calculate and group the moving average, Mr. Waturi! All the code behind my posts on the COVID-19 data can be found here.

Pictures and Words

A chart may be worth 1000 words, but sometimes embedding a few words in your chart can convey additional, helpful information. For example, take this chart that I built from the incredible COVID-19 data collected by Johns Hopkins University:

Pretty telling as it is. Now, let’s add some words to it:

Adding some words to the chart 1) conveys additional, helpful information and 2) fills in some awkward whitespace. Seems like a win to me. For completeness' sake, here's what I did to build this chart:

Step 1: Import the packages

import pandas as pd
import matplotlib.pyplot as plt

Step 2: Load up the JHU dataset

df = pd.read_csv('./data/time_series_covid19_confirmed_US.csv')

Step 3: Trim the data down to just counties in Ohio

cols = [i for i, v in enumerate(df.columns) if v in ['Admin2', 'Province_State'] or v.endswith('2020')]
df_ohio = df[df.Province_State=='Ohio'].iloc[:,cols].copy()
df_ohio['county'] = df_ohio.Admin2 + ', ' + df_ohio.Province_State  # combine county and State together in a field
df_ohio = df_ohio.drop(columns=['Admin2', 'Province_State']).set_index('county')

Step 4: Build the chart

fig, ax = plt.subplots(figsize=(12,10))
as_of = df.columns[-1]  # most recent date column in the JHU file
title = 'Top 10 Ohio Counties with Confirmed COVID-19 Cases as of ' + as_of

# use the same (latest) date column everywhere so the title and the numbers agree
latest = df_ohio[as_of]
worst_county, worst_co_cases = latest.idxmax(), latest.max()
total_cases = latest.sum()

inset = """
There are {0} counties and other Ohio 
entities in this dataset.  As of {1}, 
there are {2:,} confirmed cases of COVID-19.  
{3} represents {4:.1f}% of those cases.
""".format(df_ohio.shape[0], as_of, total_cases, worst_county, 
           (worst_co_cases/total_cases)*100)

_ = latest.sort_values().tail(10).plot(kind='barh', ax=ax, title=title)
_ = ax.set_ylabel('Ohio counties')
_ = ax.set_xlabel('Confirmed Cases')

# you have to experiment a little with the x, y positioning (figure coordinates, 0 to 1) to get your word inset positioned just right
text = fig.text(0.30, 0.35, inset, va='center', ha='left', size=18)

Pretty darn slick!
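One possible alternative to trial-and-error positioning with `fig.text` is `ax.text` with `transform=ax.transAxes`. That places the text in axes coordinates, where (0, 0) is the lower-left corner of the plot area and (1, 1) is the upper-right. The note then stays anchored to the plot area even if the figure size changes. Here's a minimal sketch on hypothetical toy counts; the county names and numbers are made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-county totals standing in for the real data
cases = pd.Series({"County A": 120, "County B": 45, "County C": 30})

fig, ax = plt.subplots(figsize=(8, 5))
cases.sort_values().plot(kind="barh", ax=ax)

share = cases.max() / cases.sum() * 100
note = f"{cases.idxmax()} accounts for {share:.1f}% of these cases."

# transAxes: x and y are fractions of the plot area, so (0.95, 0.05) is the lower-right corner
label = ax.text(0.95, 0.05, note, transform=ax.transAxes, ha="right", va="bottom", fontsize=12)
```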


© 2024 DadOverflow.com
