I’ve alluded to my interest in reading a few times in the past. Several years ago, I made the switch from physical books to digital and use an Amazon Kindle as my main reading vehicle.

One frustration I have with the Kindle, though, is that it either doesn’t track the reading time metrics I’m interested in or doesn’t share them in a form that’s useful to data nerds like me.

Earlier in the year, I decided to spend more than five minutes solving this problem and found out about Kindle FreeTime. Kindle FreeTime is an application on Kindle devices whose primary focus is getting kids to read. Parents can use FreeTime to decide which books their children can read and what minimum daily reading goals they want their children to meet. A side benefit of FreeTime, though, is that it captures a lot of the metrics I’m interested in, and it stores them in a SQLite database: all you have to do is plug your Kindle into your workstation, download the database at system\freetime\freetime.db, and start exploring.
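Before querying anything, it helps to see which tables and columns the database actually contains. Here is a minimal sketch of one way to do that with the standard sqlite3 module. The helper name describe_tables is made up for this example, and the in-memory database is only a stand-in so the snippet runs without a Kindle attached; its two columns are the ones used later in this post, and the real dayinfo table may well have more.

```python
import sqlite3


def describe_tables(conn):
    """Return {table_name: [column names]} for every table in the database."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
    )
    tables = [row[0] for row in cur.fetchall()]

    schema = {}
    for table in tables:
        # PRAGMA table_info returns one row per column; the name is at index 1
        cols = conn.execute('PRAGMA table_info("{0}");'.format(table)).fetchall()
        schema[table] = [col[1] for col in cols]
    return schema


# Stand-in database (an assumption for illustration, not the real FreeTime schema)
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE dayinfo (accessdate TEXT, timeread INTEGER);')

demo_schema = describe_tables(demo_conn)
demo_conn.close()

print(demo_schema)
```

To try it on the real file, open a connection to your own copy of freetime.db with sqlite3.connect and pass that connection to describe_tables instead.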

Dayinfo

One of the tables in the FreeTime database is dayinfo. This is probably a good place to start gathering some general reading metrics. Here’s how I went about digging into the data.

Load all the standard packages

In my notebook, I started by loading all the normal packages I use including the sqlite3 package:

import pandas as pd
import numpy as np
import sqlite3
from datetime import datetime
import matplotlib.pyplot as plt

%matplotlib inline

Load and clean the data from the table

Next, I queried the data from the dayinfo table and added a few helpful columns:

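# open the copy of the FreeTime database pulled off the Kindle
conn = sqlite3.connect('./data/freetime.db')
query = "SELECT * FROM dayinfo;"

df_dayinfo = pd.read_sql_query(query, conn)
conn.close()  # the rows are in the DataFrame now, so the connection can be released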

# clean up fields and do some feature engineering
df_dayinfo['accessdate'] = pd.to_datetime(df_dayinfo.accessdate)
df_dayinfo['access_month'] = df_dayinfo.accessdate.dt.month
df_dayinfo['access_dow'] = df_dayinfo.accessdate.dt.dayofweek
df_dayinfo['read_mins'] = df_dayinfo.timeread / 60
df_dayinfo['read_hours'] = df_dayinfo.timeread / 3600

Calculate some preliminary metrics

Finally, I wanted to calculate my total reading time for the year 2019 and my average daily reading time. I only started using FreeTime in March 2019, so I had to pro-rate some of my calculations. Here’s what I came up with:

df_dayinfo_2019 = df_dayinfo[(df_dayinfo.accessdate >= datetime(2019, 1, 1)) & (df_dayinfo.accessdate < datetime(2020, 1, 1))]
days_in_2019 = (df_dayinfo_2019.accessdate.max() - df_dayinfo_2019.accessdate.min()).days

print('From {0:%d %b %Y} to {1:%d %b %Y} ({2} days):'.format(df_dayinfo_2019.accessdate.min(), 
                                                             df_dayinfo_2019.accessdate.max(), days_in_2019))
print('I read {0:.2f} hours'.format(df_dayinfo_2019.read_hours.sum()))
print("That's an average of {0:.2f} minutes per day".format((df_dayinfo_2019.read_mins.sum())/days_in_2019))
From 10 Mar 2019 to 29 Dec 2019 (294 days):
I read 111.18 hours
That's an average of 22.69 minutes per day
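As a sanity check on the numbers above, here is a minimal sketch that redoes the pro-rated average by hand from the printed totals. The helper name average_minutes_per_day and its inclusive flag are made up for this example. One thing it makes visible: subtracting the two dates counts the gaps between them, which gives 294, whereas counting both the first and last day gives 295 and a slightly lower average.

```python
from datetime import date


def average_minutes_per_day(total_hours, first_day, last_day, inclusive=False):
    """Average minutes read per day between two dates.

    With inclusive=False the day count is (last_day - first_day).days,
    which matches the calculation in this post. With inclusive=True
    both end dates are counted.
    """
    days = (last_day - first_day).days
    if inclusive:
        days += 1
    return total_hours * 60 / days


first_day = date(2019, 3, 10)
last_day = date(2019, 12, 29)

avg_as_in_post = average_minutes_per_day(111.18, first_day, last_day)
avg_inclusive = average_minutes_per_day(111.18, first_day, last_day, inclusive=True)

print('{0:.2f} minutes per day over {1} days'.format(
    avg_as_in_post, (last_day - first_day).days))
print('{0:.2f} minutes per day counting both end dates'.format(avg_inclusive))
```

Either way the answer lands in the same place, so the choice of day count doesn't change the conclusion.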

Bah! Only 22 minutes of reading time per day on average?! Well, I know one goal I’ll need to work on for 2020. Let’s see what this data looks like in some charts:

fig, ax = plt.subplots(figsize=(10, 8))
df_dayinfo_2019[['access_month', 'read_hours']].sort_values('access_month').groupby('access_month').sum().plot.bar(ax=ax)
_ = ax.set_title('Hours Read by Month: {0:%d %b %Y} to {1:%d %b %Y}'.format(df_dayinfo_2019.accessdate.min(), 
                                                                            df_dayinfo_2019.accessdate.max()))
_ = ax.set_xlabel('Month')
_ = ax.set_ylabel('Hours')

My monthly reading totals starting in March: May was a good month

fig, ax = plt.subplots(figsize=(10, 8))
df_dayinfo_2019[['access_dow', 'read_hours']].sort_values('access_dow').groupby('access_dow').sum().plot.bar(ax=ax)
_ = ax.set_title('Hours Read by Day of Week: {0:%d %b %Y} to {1:%d %b %Y}'.format(df_dayinfo_2019.accessdate.min(), 
                                                                                  df_dayinfo_2019.accessdate.max()))
_ = ax.set_xlabel('Day of Week')
_ = ax.set_ylabel('Hours')

_ = ax.set_xticklabels(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])

I read the most on Wednesdays. That makes sense: I spend most Wednesday evenings in the spring sitting in the parking lot outside a band room while my kid practices with his middle school orchestra, so I get a lot of reading time in on those days.
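One caveat with the day-of-week chart: passing seven tick labels only lines up correctly if all seven weekdays appear in the grouped data. If, say, no reading ever happened on a Tuesday, the labels would shift by one. Here is a minimal sketch of one way to guard against that by reindexing the grouped totals to a full week before plotting. The helper name hours_by_weekday and the sample rows are made up for illustration; the column names match the ones used above.

```python
import pandas as pd

WEEKDAY_LABELS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']


def hours_by_weekday(df):
    """Sum read_hours per weekday, always returning all seven days."""
    totals = df.groupby('access_dow')['read_hours'].sum()
    # dayofweek runs 0 (Monday) to 6 (Sunday); fill missing days with zero
    totals = totals.reindex(range(7), fill_value=0.0)
    totals.index = WEEKDAY_LABELS
    return totals


# Made-up sample: one Monday, two Wednesdays, and one Saturday
sample = pd.DataFrame({
    'accessdate': pd.to_datetime(
        ['2019-03-11', '2019-03-13', '2019-03-20', '2019-03-23']),
    'read_hours': [0.5, 1.0, 1.5, 0.25],
})
sample['access_dow'] = sample.accessdate.dt.dayofweek

weekday_totals = hours_by_weekday(sample)
print(weekday_totals)
```

Because the result is already labeled, calling weekday_totals.plot.bar(ax=ax) would put the right name under each bar without a separate set_xticklabels call.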

There are other tables in the database, including details on each of the books that I’ve read over the year. Hopefully, at some point, I’ll dig into those details as well. But for now, this data is sufficient to get me motivated to read more in 2020.