Often in my notebooks, I will connect to a relational database or other data store, query the system for data, and then do all sorts of amazing operations with said data. Many times, these data stores are restricted to select users and I must authenticate myself to the system–usually with an id and password. One might be inclined to code such connection strings inline in a Jupyter Notebook. However, I usually check my notebooks in to source control and/or hand them in to management as reports or documentation. Thus, any number of people might see my notebooks, and my personal id and password would be compromised were I to code the credentials inline.
So, how can I hide my secrets–my connection strings and other sensitive information–so I can still safely share the good work I do in my notebooks? The way I do it is by moving my connection strings to configuration files. Allow me to demonstrate:
Step 1: Import my packages
from sqlalchemy import create_engine
import pandas as pd
from configparser import ConfigParser
I import the usual suspects–SQLAlchemy for database management and pandas for my dataframe work–but I’m also loading in configparser. It’s this last package that will help me pull out my secret stuff to a separate file that I can protect.
Step 2: Create my configuration file
Now, I need to create that separate configuration file. In the same directory as my notebook, I’ll create a text file. I usually name my file nb.cfg–as in, notebook config. For my example, storing the connection string to my SQLite database, my configuration file looks like so:
[my_db]
conn_string: sqlite:///mwc.db
Although SQLite databases don’t have authentication requirements, you can imagine, say, a connection string to a PostgreSQL database that would contain an id and password.
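To make that PostgreSQL case concrete, here is a minimal sketch of what such a configuration might look like and how its pieces could be assembled into a connection string. The [pg_db] section, its field names, and every value in it are hypothetical, made up purely for illustration. Two assumptions worth flagging: I turn off ConfigParser's interpolation so a literal % in a password is read as-is, and I percent-encode the id and password so characters like @ do not break the URL.

```python
from configparser import ConfigParser
from urllib.parse import quote

# Hypothetical contents of nb.cfg for a PostgreSQL database.
# Every value here is made up for illustration.
EXAMPLE_CFG = """
[pg_db]
user: report_user
password: p@ss%word
host: localhost
port: 5432
database: mwc
"""

def build_pg_conn_string(cfg_text, section="pg_db"):
    """Assemble a PostgreSQL connection URL from separate config fields."""
    # interpolation=None keeps a literal '%' in the password from being
    # treated as ConfigParser interpolation syntax.
    parser = ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    db = parser[section]
    # Percent-encode the credentials so characters such as '@' and '%'
    # cannot be mistaken for URL delimiters.
    user = quote(db["user"], safe="")
    password = quote(db["password"], safe="")
    return f"postgresql://{user}:{password}@{db['host']}:{db['port']}/{db['database']}"

conn_string = build_pg_conn_string(EXAMPLE_CFG)
```

The resulting string could then be handed to create_engine in the same way as the SQLite example, assuming a PostgreSQL driver is installed.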
Step 3: Load the configuration file
Back in your notebook, load your configuration file:
parser = ConfigParser()
_ = parser.read('nb.cfg')
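One thing to watch: read() does not complain if nb.cfg is missing. It returns the list of files it managed to parse and quietly skips the rest, so a typo in the file name only surfaces later as a missing section. A small sketch of a stricter loader follows; load_config is a hypothetical helper name of my own, not part of configparser.

```python
from configparser import ConfigParser

def load_config(path="nb.cfg"):
    """Read a config file, failing loudly if it is missing."""
    parser = ConfigParser()
    # read() returns the list of files it parsed successfully and
    # silently skips any it cannot open.
    files_read = parser.read(path)
    if not files_read:
        raise FileNotFoundError(f"Could not read config file: {path}")
    return parser
```

With this in place, parser = load_config('nb.cfg') behaves like the two lines above but stops immediately when the file is not where the notebook expects it.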
Step 4: Access the secrets in your configuration file
Now you’re ready to access those secrets! In this example, I’ll pass my secret connection string to my database engine object:
engine = create_engine(parser.get('my_db', 'conn_string'))
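If the section or option name is misspelled, parser.get raises NoSectionError or NoOptionError. When a missing entry is acceptable, the fallback keyword returns a default instead. The sketch below uses read_string with inline text only so it runs without a file on disk; the other_db section is a hypothetical name that deliberately does not exist.

```python
from configparser import ConfigParser

parser = ConfigParser()
# Inline text stands in for nb.cfg so the example is self-contained.
parser.read_string("[my_db]\nconn_string: sqlite:///mwc.db\n")

# Present entry: the same lookup used when creating the engine.
conn_string = parser.get("my_db", "conn_string")

# Missing section: fallback is returned instead of raising NoSectionError.
missing = parser.get("other_db", "conn_string", fallback=None)
```

Checking for None before calling create_engine gives a clearer error than letting a bad value reach the database layer.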
Step 5: Profit!
That’s basically it. In my example, I can now use my database engine object to query a table in my database and load the results into a dataframe:
qry = """
SELECT *
FROM people
"""
df_mwc_people = pd.read_sql(qry, engine)
Check out the complete code example here.
Postscript
You might ask yourself, “self, do I need to do anything else to protect my config file from getting into the hands of my enemies?” Well, since I often use Git for source control, I do want to make sure I don’t accidentally check my configuration file into my source code repository. To avoid that problem, I create a .gitignore file and add the name of my configuration file to it. Then, every time I commit a change, Git will leave my configuration file out of the commit. One caveat: .gitignore only applies to files Git is not already tracking, so the entry needs to be in place before the configuration file is ever committed.
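Adding the entry by hand is a one-line edit, but it can also be done from Python, for example in a setup cell. This is only a sketch: ensure_ignored is a hypothetical helper of my own, and it assumes the .gitignore lives in the directory the code runs from unless another path is passed.

```python
from pathlib import Path

def ensure_ignored(name="nb.cfg", gitignore=".gitignore"):
    """Add `name` to the .gitignore file unless it is already listed."""
    path = Path(gitignore)
    # Start from the existing entries, or an empty list for a new file.
    lines = path.read_text().splitlines() if path.exists() else []
    if name not in lines:
        lines.append(name)
        path.write_text("\n".join(lines) + "\n")
    return lines
```

Calling ensure_ignored() more than once is harmless, since the name is only appended when it is not already on its own line in the file.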