Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.


LaTeX: inline versus display

I’m continually trying to strengthen my data science skills, in particular making heavy use of the excellent DataCamp.com. Obviously, data science is steeped in math and any discussion of a machine learning algorithm will inevitably touch on the underlying mathematical concepts.

As I progress in my learning, I’ve been taking notes in Jupyter Notebook because…why not? With its markdown and code capabilities, Jupyter Notebook is a fantastic medium for taking notes on machine learning topics, as those topics are full of both prose explaining the concepts and code executing the algorithms.

In my note taking, when my training displays a formula, I’ve been trying to re-write the formula in LaTeX in my notebook. For the most part, I’ve been successful reproducing those formulas thanks in large part to sites like Overleaf.com. However, my LaTeX still didn’t quite represent the formulas I would see in the training slides. Here’s how I represented the Ridge Regression calculation:

My LaTeX notation looked like this:

$\alpha * \sum_{i=1}^{n}a^{2}_i$

The lower and upper bounds of sigma aren’t…quite…right.

As I was looking through some of Overleaf.com’s content, though, I ran across this statement:

Note, that integral expression may seems a little different in inline and display math mode – in inline mode the integral symbol and the limits are compressed.

I also noted that some of their LaTeX expressions used double dollar signs. So, I changed my Ridge Regression expression to this:

$$\alpha * \sum_{i=1}^{n}a^{2}_i$$

And my expression rendered as such:

This fixed my lower and upper bounds problem, but it shifted the expression to the center of the cell. If you look at the rendered HTML, you’ll see that Jupyter Notebook adds a “text-align: center” style to the expression by default. Changing that style to “left” makes the formula a little more readable:

But not a whole lot. At any rate, it’s interesting to note these two different behaviors of LaTeX and the pros and cons of each option. If you’re so inclined, you can find my LaTeX examples here.
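A possible middle ground, which I'm offering as a sketch rather than something from the training material: keep the single dollar signs so the formula stays inline and left-aligned, and ask for display-style limits explicitly. The first line below uses `\limits` to push the bounds above and below the sigma; the second uses `\displaystyle` to typeset the whole expression as if it were in display mode. Both are standard TeX commands that MathJax understands, though how they look in any particular notebook theme is something to check for yourself.

```latex
$\alpha * \sum\limits_{i=1}^{n}a^{2}_i$

$\displaystyle \alpha * \sum_{i=1}^{n}a^{2}_i$
```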

The goodness of Notepad++

Notepad++ was #7 on my list of awesome, free tools and for good reason: it just rocks!

The other day, I found myself working through some online training with Pandas dataframes and friends.  Part of the course included working exercises in the site’s web-based Python console.  When I work such exercises, I like to copy off the code I come up with to local Jupyter Notebooks that I can easily reference in the future.  I also like to copy down whatever test data the exercises have me working with, so I can make sure I calculate the same results locally that I do in the exercises.

Here’s the challenge: how do you copy the dataframes from the online exercises to your local Jupyter Notebook?  Usually, when I want to copy off a dataframe, I’ll call the to_csv() function to save the contents of the dataframe to a CSV file that I can easily transport.  That’s not really an option with online exercises, though.  Here’s a thought: what about the to_dict() function to write the contents of the dataframe to the standard out of the online console as a dictionary?  Then, I can copy that dictionary over to my Notebook.  Let’s see what that looks like:

The view from the online training console

Let’s pretend for a moment that we’re in the online training console and we’re working some dataset (for simplicity, I’m using the Iris dataset).  The dataframe in the console might look like this:

Now, we can use to_dict() to write the dataframe out to the console (note: for brevity, I’m only writing out the first 5 records):

Copy the dictionary locally

With the dictionary written out to the online console, let’s copy that output to our clipboard, paste it into a local Notebook, and see if we can now load it into a local dataframe:
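A minimal sketch of that round trip, assuming a tiny hand-typed stand-in for the Iris rows (the column names and values here are illustrative, not the course's actual data):

```python
import pandas as pd

# Stand-in for the dataframe in the online console (hand-typed, illustrative values)
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "species": ["setosa", "setosa", "setosa"],
})

# In the online console: print the first rows as a plain dictionary.
# The default layout is {column name: {row label: value}}.
as_dict = df.head(3).to_dict()
print(as_dict)

# In the local notebook: paste the printed dictionary into a cell
# (here it is simply reused) and hand it back to the DataFrame constructor.
rebuilt = pd.DataFrame(as_dict)
print(rebuilt)
```

Passing the dictionary straight to `pd.DataFrame` works because the constructor accepts the same column-to-rows layout that `to_dict()` produces by default.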

Well, waddya know?  That worked!  Wait a minute…where’s Notepad++ in all of this?

Yes, this approach works when you’re copying the dictionary directly into a code cell in a local Jupyter Notebook, but what if, instead, you copy that dictionary into a JSON file that you then load into a dataframe?

That approach doesn’t end well.  The reason is that the copied dictionary isn’t valid JSON, so Pandas can’t parse it.  Specifically, JSON requires every key name to be a string surrounded by double quotation marks, and the row labels in the copied dictionary are bare numbers.
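To see the complaint in isolation, here's a small sketch using Python's built-in `json` module instead of Pandas (an assumption on my part for illustration; the exact error Pandas raises may read differently). The `pasted` string is a hypothetical fragment of copied dictionary text.

```python
import json

# Hypothetical fragment of pasted dictionary text: the row labels 0 and 1 are bare integers
pasted = '{"sepal_length": {0: 5.1, 1: 4.9}}'

try:
    json.loads(pasted)
    parsed_ok = True
except json.JSONDecodeError as err:
    # JSON only allows double-quoted strings as object keys
    parsed_ok = False
    print("Not valid JSON:", err)

# The same data with the row labels quoted parses without complaint
fixed = '{"sepal_length": {"0": 5.1, "1": 4.9}}'
print(json.loads(fixed))
```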

Incidentally, you might be asking, why would you copy the dictionary to a file when I’ve already demonstrated that you can copy it directly into a local Notebook?  The big reason is size: copying a dictionary of five rows is fine, but what if the dataframe you’re working with has 200 rows?  That becomes a very long dictionary that really muddies up your local Notebook.  To keep things clean, I find it best to copy such dictionaries to a JSON file.

So, how do you format this copied dictionary so Pandas can load it successfully?  Notepad++!  Notepad++ has a great find/replace feature that lets you use regular expressions.  What I need to do is find all numbers that serve as key names in my dictionary and make sure these keys are surrounded by quotation marks.

For my “find” regular expression, I’ll use this: (\d+):

With this expression, I look for digits that are followed by a colon.  I’ll group those digits so that I can reference them in the “replace”.

For my “replace” expression, I’ll use this: \"\1\":

The \1 refers to my group of digits.  I’m surrounding that group with quotation marks and then making sure the colon follows.  That yields the following:
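The same find/replace can be sketched outside of Notepad++ with Python's `re` module. This is just an illustration of the pattern, not a replacement for the editor workflow, and it assumes the column names and string values in the pasted text are already double-quoted. The `pasted` string is a made-up sample.

```python
import json
import re

# Made-up sample of pasted dictionary text with bare integer row labels
pasted = '{"sepal_length": {0: 5.1, 1: 4.9}, "species": {0: "setosa", 1: "setosa"}}'

# Find: one or more digits followed by a colon, with the digits captured as group 1.
# Replace: the captured digits wrapped in double quotes, then the colon.
quoted = re.sub(r'(\d+):', r'"\1":', pasted)
print(quoted)

# Sanity check: the result now parses as JSON
data = json.loads(quoted)
print(data["species"]["0"])
```

One caveat with a pattern this simple: it would also quote digits that sit in front of a colon inside a string value (a time like 12:30, say), so it's worth eyeballing the result when the data contains text like that.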

And when we load that local JSON file into our local dataframe:
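A minimal sketch of that load, assuming `pd.read_json` is the loader and using a hypothetical file name, `iris_sample.json`, written to a temporary folder so nothing real gets overwritten:

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical contents of the cleaned-up file: every key is a double-quoted string
cleaned = {
    "sepal_length": {"0": 5.1, "1": 4.9},
    "species": {"0": "setosa", "1": "setosa"},
}

# Write the file to a temporary folder (the file name is made up for this sketch)
path = os.path.join(tempfile.mkdtemp(), "iris_sample.json")
with open(path, "w") as f:
    json.dump(cleaned, f)

# read_json's default layout for a dataframe is {column name: {row label: value}},
# which matches what to_dict() printed in the console
df_local = pd.read_json(path)
print(df_local)
```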

…we get success!  So, just one clever way Notepad++ has really helped me out.

Documenting your Jupyter Notebooks

A recent episode of the excellent podcast Talk Python to Me discussed an effort to collect and analyze some one million Jupyter Notebooks on GitHub.  Unsurprisingly, one conclusion drawn by the analyst is that notebook authors are not good at documenting their work.  I find that a little sad, given how rich Jupyter markdown is.

I have found the markdown syntax to be a little confusing, but recently I found this great “cheatsheet” that has helped:

I haven’t had a whole lot of opportunity to work with LaTeX, but when I have, it has been a challenge.  Here’s a cheatsheet that’s been helpful in the past:


© 2024 DadOverflow.com
