Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Category: technology

Exploring chess tournament results

Back in March, my son competed in the 2018 Queen City Classic Chess Tournament. The tournament coordinators graciously provided the player results online, although those files no longer appear on the site. At the time, I posted on the challenge of downloading the match results and parsing the values. After that, I had intended to do some exploratory data analysis (EDA) on the data and, ideally, see what sort of machine learning models I might want to build against the data.

Well, I did do some EDA work, but I have since grown a little restless and moved on to other projects, so I want to go ahead and publish the little bit of work I did on the data. Maybe next year I'll get to more interesting data modeling.

The tournament drew 699 players from 134 teams, ranging from kindergarteners to 12th graders and including both rated and non-rated players. Here's a visual of that distribution across the grades:

The largest team, Detroit City Chess Club, brought almost 100 players! Here’s a look at the top 10 largest teams:

The average team size, though, was 5.2 players:

There were 14 competition categories, split by age group and by rated versus non-rated status. Dragon Chess Center dominated most of them:

That’s all I’ll post here, but be sure to check out the notebook I put together that has a lot more analysis.
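For anyone curious how summary figures like the top-10 team list and the average team size fall out of a player list, here's a minimal Python sketch. It assumes each player is a simple (name, team) pair; the records below are invented for illustration and are not the tournament data, and this is not necessarily how the notebook computes things.

```python
from collections import Counter

# Hypothetical (player_name, team_name) records, made up for illustration.
players = [
    ("Player A", "Team 1"),
    ("Player B", "Team 1"),
    ("Player C", "Team 2"),
    ("Player D", "Team 3"),
    ("Player E", "Team 1"),
]

def team_sizes(records):
    """Count how many players each team brought."""
    return Counter(team for _, team in records)

def top_teams(sizes, n=10):
    """Return the n largest teams as (team, player_count) pairs, largest first."""
    return sizes.most_common(n)

def average_team_size(sizes):
    """Mean players per team: total players divided by number of teams."""
    return sum(sizes.values()) / len(sizes)

sizes = team_sizes(players)
print(top_teams(sizes, n=1))               # [('Team 1', 3)]
print(round(average_team_size(sizes), 2))  # 5 players / 3 teams = 1.67

# The same arithmetic with the tournament's totals: 699 players / 134 teams
print(round(699 / 134, 1))                 # 5.2
```

The average is just total players over total teams, which is why 699 players across 134 teams works out to roughly 5.2.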

Parsing my DataCamp.com Accomplishments

I'm a big fan of DataCamp.com. I'm in my second year with the training site and am learning valuable data analysis skills all the time.

When you complete a course on the site, DataCamp usually sends you a congratulatory email with a link to a certificate of your accomplishment and a handy link for adding that certificate to your LinkedIn profile. I've completed multiple courses and added several to my profile, but I know I've missed a few here and there. Your DataCamp profile page lists the topics your training has covered so far, the tracks you've completed, and the courses you've completed. Each completed course includes a LinkedIn button that lets you easily attach that course to your LinkedIn profile. That's all well and good, but I'd also like to download my certificate of completion for each course. It'd be great if DataCamp had a single “download” button that grabbed all my certificates of accomplishment at once. No matter: I can use Python to do that. Here's how I solved the problem:

Step 1: Download my profile page

I could write Python to log into DataCamp.com and download my profile page for me, but for this step I'll just do it manually: on the site, navigate to the “My Learning Progress” link and save the profile page to disk (the code in Step 3 expects it to be saved as DataCamp.html).

Step 2: Load the packages we’ll need

For this work, I'll use BeautifulSoup (from the bs4 package) plus a few standard-library pieces: urllib.parse, urlretrieve from urllib.request, and csv:


from bs4 import BeautifulSoup
import urllib.parse
from urllib.request import urlretrieve
import csv

Step 3: Open my saved profile page and load it into a “soup” object:


# The 'lxml' parser requires the lxml package to be installed
with open('DataCamp.html', encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'lxml')

Step 4: Do the hard work

The first thing I need to do is figure out where in the HTML the list of completed courses lives. After some digging, I determined that I need to look for a section element with a profile-courses class. Underneath that element are article nodes, one for each completed course, so I'll use BeautifulSoup to get the list of those article nodes. Next, I'll iterate through that node list and peel off the two values I'm interested in: the course title and the link to the statement of accomplishment. The course title is easy enough to find: it's in an h4 tag under the article. The link to the statement of accomplishment is a little dodgier, though: it's actually part of the query string in the LinkedIn link. No problem. I'll just grab that link and split out the accomplishment link part. Because that link is part of a query string, it's URL-encoded, so to turn it back into a real boy, er, URL, I'll use the unquote function of urllib.parse. I'll write these values to a list for easier processing later:


# Each completed course is an <article> under <section class="profile-courses">
completed_courses = soup.find('section', {'class': 'profile-courses'}).find_all('article')
completed_courses_list = [['course_name', 'certificate_url']]  # header row

for completed_course in completed_courses:
    # The course title lives in the article's <h4> tag
    course_name = completed_course.find('h4').get_text(strip=True)
    # The certificate URL sits, URL-encoded, in the LinkedIn button's query string
    linkedin_url = completed_course.find('a', {'class': 'dc-btn--linkedin'})['href']
    cert_url = linkedin_url.split('&url=')[1]
    completed_courses_list.append([course_name, urllib.parse.unquote(cert_url)])
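Splitting on '&url=' works as long as the parameter appears in exactly that form; anything after it in the query string would ride along. A slightly sturdier alternative is the standard library's parse_qs, which finds a parameter by name wherever it sits and URL-decodes it for you. This sketch assumes the parameter is named url (as the split above implies); the LinkedIn link in the example is hypothetical, not a real DataCamp link.

```python
from urllib.parse import urlparse, parse_qs

def certificate_url_from_linkedin(linkedin_url):
    """Pull the decoded 'url' query parameter out of a LinkedIn share link.

    parse_qs splits the query string on '&' and URL-decodes each value,
    so no separate unquote() call is needed. Assumes the parameter is 'url'.
    """
    params = parse_qs(urlparse(linkedin_url).query)
    values = params.get("url")
    if not values:
        raise ValueError("no 'url' parameter in: " + linkedin_url)
    return values[0]

# Hypothetical link for illustration; the real format may differ.
example = ("https://www.linkedin.com/profile/add?name=Some%20Course"
           "&url=https%3A%2F%2Fexample.com%2Fstatement%2Fabc123.pdf")
print(certificate_url_from_linkedin(example))
# https://example.com/statement/abc123.pdf
```

One small difference from unquote: parse_qs also turns '+' into a space, which is the usual convention for query strings.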

Step 5: Download all my statements of accomplishment

Now that I have an easy list to work from, I’ll download all my certificates in one fell swoop:


# Skip the header row and save each certificate as "<course name>.pdf"
for course_name, cert_url in completed_courses_list[1:]:
    urlretrieve(cert_url, '{0}.pdf'.format(course_name))
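One caveat with using course titles as filenames: a title containing a colon or slash will make the save fail, on Windows especially. Here's a hedged sketch of a helper (safe_filename is my own hypothetical name, not part of any library) that swaps out the characters Windows forbids in filenames.

```python
import re

def safe_filename(course_name, extension=".pdf"):
    """Make a course title safe to use as a filename.

    Replaces the characters Windows forbids in filenames with underscores,
    then trims trailing dots and spaces, which Windows also rejects.
    """
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", course_name).strip().rstrip(". ")
    return cleaned + extension

print(safe_filename("Cleaning Data in Python: Part 1/2"))
# Cleaning Data in Python_ Part 1_2.pdf
```

To use it, pass safe_filename(course_name) to urlretrieve in place of the '{0}.pdf'.format(...) filename.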

Easy peasy!
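Step 2 imports csv, but none of the code above uses it. Since completed_courses_list already starts with a header row, one natural use (my assumption about the intent) is saving the list as a CSV index of the certificates. The two-row list below is a hypothetical stand-in in the same shape.

```python
import csv

def save_course_list(rows, path):
    """Write rows (header row first) to a CSV file."""
    # newline="" keeps the csv module from adding blank lines on Windows
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

# Hypothetical data in the same shape as completed_courses_list
courses = [
    ["course_name", "certificate_url"],
    ["Intro to Python", "https://example.com/statement/abc123.pdf"],
]
save_course_list(courses, "completed_courses.csv")
```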

More handy PowerShell snippets

In another installment of “handy PowerShell snippets”, I offer a few more I've used on occasion:

Comparing documents in PowerShell

WinMerge is a great tool for identifying differences between files, but if you want to automate such a process, PowerShell’s Compare-Object is an excellent choice.

Step 1: Load the documents you wish to compare


# Get-Content (aliased as cat on Windows) returns each file as an array of lines
$first_doc = Get-Content "c:\somepath\file1.txt"
$second_doc = Get-Content "c:\somepath\file2.txt"

Step 2: Perform your comparison
Compare-Object tags each value with a SideIndicator: “<=” means the value was found in the first file but not the second, and “=>” means it was found in the second file but not the first. By default, Compare-Object returns only these differences; add the -IncludeEqual switch if you also want “==” entries for values found in both files.


$items_in_first_list_not_found_in_second = ( Compare-Object -ReferenceObject $first_doc -DifferenceObject $second_doc |
    Where-Object { $_.SideIndicator -eq "<=" } |
    ForEach-Object { $_.InputObject } )
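If the side indicators feel abstract, here's a rough Python analogue of the same idea that treats each document as a set of lines. It's an illustration only: it ignores line order and duplicate lines and sorts its output, so it won't reproduce Compare-Object's results exactly. The sample lines are made up.

```python
def compare_lines(first, second):
    """Rough, set-based analogue of Compare-Object's SideIndicator groups.

    Returns (only_in_first, only_in_second, in_both), mirroring "<=", "=>"
    and "==". Order and duplicates are ignored, unlike a line-by-line diff.
    """
    a, b = set(first), set(second)
    return sorted(a - b), sorted(b - a), sorted(a & b)

first_doc = ["apple", "banana", "cherry"]
second_doc = ["banana", "cherry", "date"]
only_first, only_second, both = compare_lines(first_doc, second_doc)
print(only_first)   # ['apple']
print(only_second)  # ['date']
print(both)         # ['banana', 'cherry']
```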

 Step 3: Analyze your results and profit!

One note of warning: in my experience, Compare-Object doesn't handle null values well. To avoid trouble, I explicitly remove such values when I import the files I want to compare:


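# Blank CSV fields usually arrive as empty strings, not $null, so check for both
$filtered_doc = ( Import-Csv "c:\somepath\somedoc.csv" |
    Where-Object { -not [string]::IsNullOrEmpty($_.SomeCol) } |
    ForEach-Object { $_.SomeCol } )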
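The same blank-versus-missing distinction shows up when reading CSVs in Python: csv.DictReader returns an empty string for an empty field and None (its default restval) for a field missing from a short row. Here's a sketch of the equivalent filter; the column name SomeCol and the sample data are made up to mirror the snippet above.

```python
import csv
import io

def non_empty_values(csv_text, column):
    """Return the values of `column`, skipping blank ('') and missing (None) cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader if row.get(column) not in (None, "")]

# Hypothetical data: row B has a blank SomeCol, row C is missing it entirely
sample = "Name,SomeCol\nA,x\nB,\nC\nD,y\n"
print(non_empty_values(sample, "SomeCol"))  # ['x', 'y']
```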

Join a list of items into a single, comma-delimited line

Sometimes I’ll have a list of items in a file that I’ll need to collapse into a single, delimited line. Here’s a one-liner that will do that:


# Use Get-Content rather than cat: in PowerShell on Linux, cat runs the native binary
(Get-Content "c:\somepath\somefile.csv") -join ","

Use a configuration file with a PowerShell script

A lot of times, PowerShell devs will either declare all their variables at the top of their scripts or put them in some sort of custom configuration file that they load into their scripts. Here's another option: how about leveraging the .NET Framework's configuration system?

If you’ve ever developed a .NET application, you’re already well aware of how to use configuration files. You can actually use that same strategy with PowerShell. For example, suppose you’ve built up a configuration file like so:


<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key="test_key" value="dadoverflow.com is awesome and i'm going to tell my friends all about it"/>
  </appSettings>
</configuration>

You can then load that config file into your PowerShell script with the following:


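# Run this at script scope; inside a function, $MyInvocation describes the function
$script_path = $MyInvocation.MyCommand.Path

# OpenExeConfiguration opens the file named "<script path>.config"
$my_config = [System.Configuration.ConfigurationManager]::OpenExeConfiguration($script_path)
$my_config_val = $my_config.AppSettings.Settings.Item("test_key").Value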

One note: your config file must sit in the same folder as your PowerShell script and share its name with “.config” appended, because OpenExeConfiguration looks for a file named after the given path plus “.config”. If your PowerShell script is called dadoverflow_is_awesome.ps1, name your config file dadoverflow_is_awesome.ps1.config.
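To make the file layout and the naming rule concrete (or to sanity-check a config file outside PowerShell), here's a minimal Python sketch using the standard library's ElementTree. It's only a reader for plain appSettings entries; it knows nothing about .NET's encrypted configuration sections, and the script path in the example is hypothetical.

```python
import xml.etree.ElementTree as ET

def config_path_for(script_path):
    """The config file lives next to the script, named '<script>.config'."""
    return script_path + ".config"

def read_app_settings(xml_text):
    """Return the <appSettings> key/value pairs from a .NET-style config file."""
    root = ET.fromstring(xml_text)
    return {add.get("key"): add.get("value") for add in root.findall("./appSettings/add")}

config_xml = """<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key="test_key" value="dadoverflow.com is awesome"/>
  </appSettings>
</configuration>"""

# Hypothetical script location, for illustration
print(config_path_for(r"C:\scripts\dadoverflow_is_awesome.ps1"))
print(read_app_settings(config_xml)["test_key"])
```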

Here's a bonus: yes, it might be easier to just declare your variables at the top of your file and forgo the extra work of crafting a config file. But what if one of your configuration values is a password? By leveraging .NET's configuration system, you also get the power to encrypt values in your config file and hide them from prying eyes…but that's a discussion that merits its own blog post, so stay tuned.


© 2025 DadOverflow.com
