I’m a big fan of DataCamp.com. I’m in my second year with the training site and learning valuable data analysis skills all the time.

When you complete a course on the site, they usually send you a congratulatory email with a link to a certificate of your accomplishment and a handy link for adding that certificate to your LinkedIn profile. I’ve completed multiple courses and added several to my profile; however, I know I’ve missed a few here and there. If you go to your profile page on DataCamp, you’ll see a page listing the topics your training has covered so far, the tracks you’ve completed, and the courses you’ve completed. Each completed course includes a LinkedIn button allowing you to easily attach that course to your LinkedIn profile.

That’s all well and good, but I’d also like to be able to download my certificate of completion for each course. It’d be great if DataCamp had a single “download” button that would let me grab all my certificates of accomplishment at once. No matter: I can use Python to do that. Here’s how I solved the problem:

Step 1: Download my profile page

I could write Python to log into DataCamp.com for me and download my profile page, but for this step, I’ll just do it manually. On the site, navigate to the “My Learning Progress” link and then save the profile page to disk.
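If you’d rather script this step too, a minimal sketch using the requests package could look something like the one below, assuming you copy your session cookie out of your browser’s developer tools. The profile URL and cookie name here are placeholders of my own, not values DataCamp publishes.

import requests

# Placeholders: substitute your own profile URL and the session cookie
# copied from your browser's developer tools.
profile_url = 'https://www.datacamp.com/profile/your-username'
cookies = {'your_session_cookie_name': 'your_session_cookie_value'}

response = requests.get(profile_url, cookies=cookies)
response.raise_for_status()

# Save the page under the same name the rest of this post expects.
with open('DataCamp.html', 'w', encoding='utf-8') as f:
    f.write(response.text)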

Step 2: Load the packages we’ll need

For this work, I’ll use the BeautifulSoup, urllib.parse, and csv packages, plus the urlretrieve function from urllib.request:


from bs4 import BeautifulSoup
import urllib.parse
from urllib.request import urlretrieve
import csv

Step 3: Open my saved profile page and load it into a “soup” object:


# The 'lxml' parser requires the lxml package; BeautifulSoup's built-in
# 'html.parser' works here too if lxml isn't installed.
with open('DataCamp.html') as f:
    soup = BeautifulSoup(f, 'lxml')

Step 4: Do the hard work

The first thing I need to do is figure out where in the HTML the list of completed courses I care about lives. After some digging around in the HTML, I determined that I need to look for a section element with a profile-courses class. Underneath that element will be article nodes, one for each completed course. So, I’ll use BeautifulSoup to get me the list of those article nodes.

Next, I’ll iterate through that node list and peel off the two values I’m interested in: the course title and the link to the statement of accomplishment. The course title is easy enough to find: it’s in an h4 tag under the article. The link to the statement of accomplishment is a little dodgier, though: it’s actually part of the query string in the LinkedIn link. No problem. I’ll just grab that link and split out the accomplishment link part. Since the accomplishment link is part of the query string, it’s URL-encoded. So, to turn it back into a real boy, er, URL, I’ll use the unquote function of urllib.parse. I’ll write these values to a list for easier processing later:


# Each completed course is an article element under the section
# with the profile-courses class.
completed_courses = soup.find('section', {'class': 'profile-courses'}).findAll('article')
completed_courses_list = [['course_name', 'certificate_url']]

for completed_course in completed_courses:
    # The course title sits in the h4; the certificate link is buried,
    # URL-encoded, in the LinkedIn button's query string after '&url='.
    course_name = completed_course.find('h4').string
    linkedin_url = completed_course.find('a', {'class': 'dc-btn--linkedin'})['href']
    cert_url = linkedin_url.split('&url=')[1]
    completed_courses_list.append([course_name, urllib.parse.unquote(cert_url)])
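To make that split-and-unquote step concrete, here’s what it does to a made-up LinkedIn share link (the URL below is purely illustrative; the real href on the page will look different):

linkedin_url = ('https://www.linkedin.com/shareArticle?mini=true'
                '&url=https%3A%2F%2Fwww.datacamp.com%2Fstatement-of-accomplishment%2Fexample')

cert_url = linkedin_url.split('&url=')[1]
# cert_url is now 'https%3A%2F%2Fwww.datacamp.com%2Fstatement-of-accomplishment%2Fexample'

print(urllib.parse.unquote(cert_url))
# prints: https://www.datacamp.com/statement-of-accomplishment/example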

Step 5: Download all my statements of accomplishment

Now that I have an easy list to work from, I’ll download all my certificates in one fell swoop:


# Skip the header row, then save each certificate as "<course name>.pdf".
for completed_course in completed_courses_list[1:]:
    urlretrieve(completed_course[1], '{0}.pdf'.format(completed_course[0]))
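
The csv import from Step 2 isn’t needed for the downloads themselves; one thing you could use it for is writing the course list out to a file for safekeeping, along these lines:

# Optional: write the course/certificate list (header row included) to a CSV file.
with open('completed_courses.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(completed_courses_list)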


Easy peasy!