Notepad++ was #7 on my list of awesome, free tools and for good reason: it just rocks!
The other day, I found myself working through some online training with Pandas dataframes and friends. Part of the course included working exercises in the site’s web-based Python console. When I work such exercises, I like to copy off the code I come up with to local Jupyter Notebooks that I can easily reference in the future. I also like to copy down whatever test data the exercises have me working with, so I can make sure I calculate the same results locally that I do in the exercises.
Here’s the challenge: how do you copy the dataframes from the online exercises to your local Jupyter Notebook? Usually, when I want to copy off a dataframe, I’ll call the to_csv() function to save the contents of the dataframe to a CSV file that I can easily transport. That’s not really an option with online exercises, though. Here’s a thought: what about the to_dict() function to write the contents of the dataframe to the standard out of the online console as a dictionary? Then, I can copy that dictionary over to my Notebook. Let’s see what that looks like:
The view from the online training console
Let’s pretend for a moment that we’re in the online training console and we’re working some dataset (for simplicity, I’m using the Iris dataset). The dataframe in the console might look like this:
Now, we can use to_dict() to write the dataframe out to the console (note: for brevity, I’m only writing out the first 5 records):
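As a rough sketch of that step, the snippet below builds a tiny stand-in dataframe and dumps it with to_dict(). The column names and values are illustrative assumptions, not the actual exercise data.

```python
import pandas as pd

# A small stand-in for the course dataframe (illustrative values only).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "species": ["setosa", "setosa", "setosa"],
})

# By default, to_dict() returns {column -> {index -> value}}.
as_dict = df.head(3).to_dict()
print(as_dict)
```

Note that the inner keys are the dataframe's integer row labels, which matters later when the output is treated as JSON.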
Copy the dictionary locally
With the dictionary written out to the online console, let’s copy that output to our clipboard, paste it into a local Notebook, and see if we can now load it into a local dataframe:
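A minimal sketch of the local side, assuming the pasted dictionary has the column-to-rows layout that to_dict() produces by default (the values here are made up for illustration):

```python
import pandas as pd

# Dictionary pasted from the online console (illustrative values).
copied = {
    "sepal_length": {0: 5.1, 1: 4.9},
    "species": {0: "setosa", 1: "setosa"},
}

# The DataFrame constructor accepts this nested layout directly.
df_local = pd.DataFrame(copied)
print(df_local)
```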
Well, waddya know? That worked! Wait a minute…where’s Notepad++ in all of this?
Yes, this approach works when you’re copying the dictionary directly into a code cell in a local Jupyter Notebook, but what if, instead, you copy that dictionary into a JSON file that you then load into a dataframe?
That approach doesn’t end well. The reason is that the copied text isn’t valid JSON, so Pandas refuses to load it. Specifically, JSON requires every key name to be surrounded by quotation marks, and the row-index keys in the copied dictionary are bare numbers.
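The failure can be reproduced without Pandas at all. The sketch below feeds a made-up fragment with bare numeric keys to Python's standard json module and captures the parser's complaint:

```python
import json

# Illustrative fragment: the numeric keys are not quoted.
pasted = '{"sepal_length": {0: 5.1, 1: 4.9}}'

try:
    json.loads(pasted)
    error_message = None
except json.JSONDecodeError as exc:
    # The parser rejects the unquoted key.
    error_message = str(exc)

print(error_message)
```

One related caveat: if the copied text wraps strings in single quotes, as Python's own dictionary printout does, JSON rejects those as well, so they would need converting to double quotes too.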
Incidentally, you might be asking, why would I copy the dictionary to a file when I’ve already demonstrated that it can be copied directly into a local Notebook? The big reason is size: copying a dictionary of five rows is fine, but what if the dataframe you’re working with has 200 rows? That becomes a very long dictionary that really muddies up your local Notebook. To keep things clean, I find it best to copy such dictionaries to a JSON file.
So, how do you format this copied dictionary so Pandas can load it successfully? Notepad++! Notepad++ has a great find/replace feature that lets you use regular expressions. What I need to do is find all numbers that serve as key names in my dictionary and make sure these keys are surrounded by quotation marks.
For my “find” regular expression, I’ll use this: (\d+):
With this expression, I look for digits that are followed by a colon. I’ll group those digits so that I can reference them in the “replace”.
For my “replace” expression, I’ll use this: \"\1\":
The \1 refers to my group of digits. I’m surrounding that group with quotation marks and then making sure the colon follows. That yields the following:
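For anyone without Notepad++ at hand, the same find/replace can be sketched with Python's re module. This is an equivalent I'm swapping in for illustration, not part of the Notepad++ workflow, and the sample text is made up:

```python
import json
import re

# Illustrative pasted text with bare numeric keys.
pasted = '{"sepal_length": {0: 5.1, 1: 4.9}, "species": {0: "setosa", 1: "setosa"}}'

# Same idea as the Notepad++ expressions: capture digits that are
# followed by a colon, then wrap the captured group in double quotes.
fixed = re.sub(r'(\d+):', r'"\1":', pasted)
print(fixed)

# The result now parses as JSON.
data = json.loads(fixed)
```

As in Notepad++, the pattern is blunt: it would also rewrite any digits followed by a colon inside a value (a timestamp such as 12:30, say), so it is worth eyeballing the result.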
And when we load that local JSON file into our local dataframe:
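A minimal sketch of that final load, using a temporary file and made-up contents in place of the real JSON file (the file name and values are assumptions):

```python
import os
import tempfile

import pandas as pd

# Cleaned-up JSON text with quoted keys (illustrative values).
fixed = '{"sepal_length": {"0": 5.1, "1": 4.9}, "species": {"0": "setosa", "1": "setosa"}}'

# Write it to a temporary file standing in for the local JSON file.
with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, "iris_sample.json")
    with open(json_path, "w", encoding="utf-8") as handle:
        handle.write(fixed)

    # read_json's default layout for a dataframe is column -> {index -> value}.
    df_from_file = pd.read_json(json_path)

print(df_from_file)
```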
…we get success! So, just one clever way Notepad++ has really helped me out.