DadOverflow.com

Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Page 47 of 57

Roots Magic and Jupyter Notebook: like peas and carrots


via GIPHY

Roots Magic seems to be a popular tool among genealogists. At least, the Genealogy Guys certainly recommend it.

I’ve been using the tool for about a year now. My previous genealogy database tool seemed to lock away my data in its own proprietary database confining me to the queries and views exposed only in its user interface. Before I switched away, though, I wanted to make sure my next tool would give me a little more latitude with regard to accessing my data. At a genealogy conference in 2017, I actually had a short conversation with the head honcho himself, Bruce Buzbee, and voiced this concern. Bruce briefly mentioned Roots Magic’s connection with sqlite. Interest piqued, I bought his software and made the switch.

It seems like most of the work exploring the sqlite foundation of Roots Magic has been captured at the site SQLiteToolsForRootsMagic. [Side note: Wikispaces, the platform on which SQLiteToolsForRootsMagic is built is going away, so, by the time you read this, the SQLiteTools site might be no more. Fortunately, the site owners are developing migration plans, so stay tuned.] These folks tend to use clients like SQLiteSpy to run their queries. Nothing wrong with that, but since my favorite development canvas lately is Jupyter Notebook, I asked myself, “Self, could you query a Roots Magic database in Jupyter Notebook?” The answer is: absolutely!  Here are some of the steps I took to query a Roots Magic database in Jupyter Notebook.

Step 1: Import the necessary packages

Load all my go-to packages including pandas and matplotlib as well as sqlite3 and Pivotal’s SQL Magic (SQL Magic isn’t necessary, but it makes writing SQL queries a little nicer):


1
2
3
4
5
6
7
8
9
10
import sqlite3
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.style as style

%load_ext sql_magic
%matplotlib inline
plt.style.use('fivethirtyeight')

Step 2: Connect to my Roots Magic file

For demonstration purposes, I grabbed a copy of George Washington’s family tree and saved it off in Roots Magic.  Connecting to the file is pretty darn easy:


1
2
3
4
# Note that sqlite seems to require a full path to the file you wish to load
conn = sqlite3.connect("C:\\data_files\\qsync_laptop\\jupyter_notebooks\\query_gen_dbs\\GeorgeWashingtonFamilyBig.rmgc")
%config SQL.conn_name='conn'  # useful when using sql_magic
cur = conn.cursor()

Step 3: Go to town!

At this point, the only real challenge is dealing with the COLLATE NOCASE issue, but that’s just a minor inconvenience.  After that, it’s just spending time understanding the Roots Magic database schema and the relationships between tables.  SQLiteToolsForRootsMagic has really blazed the trail here, so I encourage you to spend some time on the site looking over the hundreds of posted queries to get a better understanding of the database schema.

With direct access to the database, you can print out facts about your data that may not be exposed in the Roots Magic user interface:


1
2
3
4
5
6
7
8
9
10
11
12
cur.execute("SELECT OwnerID FROM NameTable")
nbr_of_people = len(cur.fetchall())
cur.execute("SELECT FamilyID FROM FamilyTable")
nbr_of_families = len(cur.fetchall())
cur.execute("SELECT FamilyID FROM FamilyTable")
nbr_of_families = len(cur.fetchall())
cur.execute("SELECT FamilyID FROM EventTable")
nbr_of_events = len(cur.fetchall())

print('This database contains {0} individuals.'.format(nbr_of_people))
print('It includes {0} families.'.format(nbr_of_families))
print('It includes {0} events.'.format(nbr_of_events))

Output:

This database contains 529 individuals.
It includes 114 families.
It includes 2679 events.

Or this:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
%%read_sql df_ages -d
SELECT OwnerID
    ,Surname COLLATE NOCASE AS Surname
    ,Given COLLATE NOCASE AS Given
    ,BirthYear AS BirthYear
    ,DeathYear AS DeathYear
    ,(DeathYear - BirthYear) AS age
FROM NameTable n
WHERE COALESCE(BirthYear, 0) > 0 AND COALESCE(DeathYear, 0) > 0
    AND (age BETWEEN 0 AND 110) --remove anyone over 110 years of age or under 0 as that's a likely error

oldest = df_ages.sort_values('
age', ascending=False).head(1)
youngest = df_ages.sort_values('
age').head(1)
print('
This family tree contains {0} individuals with recorded birth and death dates.'.format(df_ages.shape[0]))
print('
The oldest person in the tree is {0} {1} at age {2}.'.format(oldest.Given.values[0], oldest.Surname.values[0],
                                                                   oldest.age.values[0]))
print('
The youngest person in the tree is {0} {1} at age {2}.'.format(youngest.Given.values[0], youngest.Surname.values[0],
                                                                      youngest.age.values[0]))
print('
The average age for family members in this tree is {0:.1f} years.'.format(df_ages.age.mean()))
print('
The median age for family members in this tree is {0:.1f} years.'.format(df_ages.age.median()))

Output:

This family tree contains 192 individuals with recorded birth and death dates.
The oldest person in the tree is John WASHINGTON at age 99.
The youngest person in the tree is Mildred WASHINGTON at age 1.
The average age for family members in this tree is 52.9 years.
The median age for family members in this tree is 52.0 years.

How about some charts?

You can find this work and more on my Github page.

Don’t forget about Linux on Windows

I love articles on “Top X tools for…whatever”.  Recently, I was reading an article on essential tools for data scientists.  Tool #1: wget.  I’ve used wget in the past at work and suddenly realized I actually had a need for wget at home.  Unfortunately, wget is a tool for Linux operating systems, not Windows, the system I typically use at home.  What to do?  I’ve been using Git Bash quite a bit lately and that shell includes several Linux utilities.  Maybe it has wget?  No dice.  You can add more utilities to Git Bash, but I wonder if there’s a better solution.

Again at work I occasionally use Cygwin and was just about to download it to my home workstation when I suddenly said to myself, “Self, you silly goose!  Microsoft just released a Linux sub-system for Windows!  Why don’t you install that?”  So, that’s what I did!

Step 1: Enable the Linux Subsystem

Obviously, you’ll want to make sure you’re version of Windows 10 supports the Linux subsystem.  If it does, fire up a PowerShell command shell as Administrator.  The easiest way to do that is to right-click on your Start Menu icon on your taskbar.  A context menu will pop up with the option Windows PowerShell (Admin).  In the shell, execute this statement:


1
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

Step 2: Install the Linux “app” of your choosing

I rarely use the Windows Store, but once you enable your system to support the Linux sub-system, you’ll have to jump into the store and install the particular flavor or flavors of Linux you prefer.  Yes, that’s right: apparently you can install more than one flavor of Linux.  HowToGeek.com has a detailed article on installing the Linux app from the Windows Store that I highly recommend.  I was particularly excited to see Kali Linux as an option!

Note well: It seems like Microsoft recommends not letting Windows software interact with files you manage in you Linux sub-system.  For my immediate purposes, I simply wanted to download some web resources with wget.  I would, though, like to read that downloaded content in one of my Windows browsers, but where the heck are those files?

To find your home directory in your Linux sub-system, follow these steps:

  1. Open Windows Explorer, and drop this in the address bar: %LOCALAPPDATA%\Packages
  2. That takes you to a directory of installed packages.  Find the directory of the Linux sub-system you installed (hint: the folder should start with the name of the distribution like “KaliLinux” and the timestamp of the folder should match the date and time you installed the sub-system).
  3. Double-click into the appropriate Linux folder then navigate to LocalState\rootfs\home and look for your user.

Now, you have access to the folders and files you created in your home directory.  Again, though, treat those resources with kid gloves.

So now you have wget and a host of other cool Linux tools all on Windows!

 

Exploring chess tournament results

Back in March, my son competed in the 2018 Queen City Classic Chess Tournament. The tournament coordinators graciously provided the player results online, although those files no longer appear on the site. At the time, I posted on the challenge of downloading the match results and parsing the values. After that, I had intended to do some exploratory data analysis (EDA) on the data and, ideally, see what sort of machine learning models I might want to build against the data.

Well, I did do some EDA work, but since grew a little restless and moved on to other projects; so, I want to go ahead and publish the little bit of work I did do on the data. Maybe next year, I’ll get to more interesting data modeling.

The tournament was composed of 699 players from 134 teams. Kindergarteners through high school 12th graders competed. Rated and non-rated players competed. Here’s a visual of that distribution across the grades:

The largest team, Detroit City Chess Club, brought almost 100 players! Here’s a look at the top 10 largest teams:

The average team size, though, was 5.2 players:

There were 14 competition categories by age and rated and non-rated. Dragon Chess Center dominated most categories:

That’s all I’ll post here, but be sure to check out the notebook I put together that has a lot more analysis.

« Older posts Newer posts »

© 2025 DadOverflow.com

Theme by Anders NorenUp ↑