DadOverflow.com

Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.


Data driven investing

I found a recent episode of the Cashflow Academy podcast rather interesting and wanted to share some notes I took on the show. Host Andy Tanner interviewed Keith McCullough, a professional investor and author of the book Diary of a Hedge Fund Manager.

The interview centered on Mr. McCullough’s data-driven approach to investing. Many of Mr. McCullough’s algorithms for predicting the path of financial markets are based on the second derivative from calculus: not just how fast something is changing, but whether that rate of change is itself speeding up or slowing down.

Everything that we do is based on the second derivative or the rate of change.

Keith McCullough
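To make the “rate of change of the rate of change” idea concrete, here’s a minimal sketch. It is not McCullough’s actual model. It assumes the input is already a growth rate (say, quarterly GDP growth in percent, one value per line, oldest first), so the change from one quarter to the next plays the role of the second derivative. The function name `flag_acceleration` and the numbers are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one growth rate per line and label each quarter
# by the sign of its change versus the prior quarter.
# Level (e.g. GDP) -> growth rate is the 1st derivative;
# the change in that growth rate is the 2nd derivative.
flag_acceleration() {
  awk 'NR > 1 {
         delta = $1 - prev
         if (delta > 0)      label = "accelerating"
         else if (delta < 0) label = "decelerating"
         else                label = "flat"
         printf "%s -> %s (%+.1f)\n", $1, label, delta
       }
       { prev = $1 }'
}

# Made-up growth rates, oldest first.
printf '%s\n' 2.2 2.6 3.0 2.8 | flag_acceleration
# 2.6 -> accelerating (+0.4)
# 3.0 -> accelerating (+0.4)
# 2.8 -> decelerating (-0.2)
```

The first value only seeds the comparison, so it gets no label of its own.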

The two predominant categories McCullough factors into his second derivative calculations are growth data and inflation data. As an aside, investment guru Ray Dalio, in his book Principles, also stated that growth and inflation are the two most important challenges to solve for in investing.

What I’ve learned over the years is that if we get those two very basic things right (growth and inflation and whether it’s accelerating or decelerating, getting better or worse), then we’re in a pretty good place.

Keith McCullough

GDP appears to be one of his “growth data” datapoints. GDP just set a new record, accelerating for nine straight quarters.

According to his calculations, though, that streak won’t continue: he’s predicting a deceleration in Q1 2019. Consequently, he sold all his fast-growth investments and bought slower-growing ones that do well when GDP and inflation are slowing, like long-term bonds, the US dollar, and utility stocks.

Interest rates are another datapoint McCullough seems keenly interested in, given the tight relationship between interest rates and inflation. Interest rates tend to rise when growth and inflation are both accelerating at the same time.

McCullough then transitioned into discussing his Four Quad Model: growth and inflation, each modeled on a second-derivative basis (accelerating or decelerating), combine into one of four possible outcomes, or “quads.” Quad 2 is when growth and inflation are both accelerating at the same time. Quad 4 is the opposite: both growth and inflation are slowing at the same time. Interest rates also usually fall in a Quad 4. Interest rates have been on the rise for about the last three years, but Mr. McCullough now thinks that, in the next 3-4 quarters, inflation will start falling and interest rates will follow suit.
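Here’s a rough sketch of the quad logic described above, assuming you already have the quarter-over-quarter change in growth and in inflation (positive means accelerating). The `quad` function is hypothetical. Because the interview notes only spell out Quads 2 and 4, the two mixed cases are labeled “Mixed” rather than given numbers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a quarter from the direction of change
# (the second-derivative signal) in growth and in inflation.
# Arguments: growth change, inflation change (positive = accelerating).
# Only Quads 2 and 4 are defined in the notes; mixed cases stay unnumbered.
quad() {
  awk -v g="$1" -v i="$2" 'BEGIN {
    g += 0; i += 0   # force numeric comparison
    if (g > 0 && i > 0)      print "Quad 2: growth and inflation accelerating"
    else if (g < 0 && i < 0) print "Quad 4: growth and inflation slowing"
    else                     print "Mixed: one accelerating, one slowing (or flat)"
  }'
}

quad 0.4 0.3     # Quad 2: growth and inflation accelerating
quad -0.2 -0.1   # Quad 4: growth and inflation slowing
```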

Throughout the conversation, Mr. Tanner also injected some interesting thoughts. Mr. Tanner underscored four important principles in managing your finances:

  1. Gather fundamental data
  2. Gather technical data
  3. Maintain a position for cashflow
  4. Manage your risk when investing

In a brief tangent on real estate investing, Tanner mentioned two datapoints he valued highly:

  1. Net Operating Income (NOI)
  2. Capitalization Rate (aka Cap Rate)
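For reference, cap rate is conventionally NOI divided by the property’s value (or purchase price). NOI is the property’s income minus its operating expenses, before any mortgage payments. Here’s a small sketch with made-up figures; `cap_rate` is a hypothetical helper, not something from the show.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cap rate = net operating income (NOI) / property value.
# NOI = annual income minus annual operating expenses; mortgage payments are
# financing costs, not operating expenses, so they are left out.
cap_rate() {
  awk -v inc="$1" -v cost="$2" -v val="$3" 'BEGIN {
    noi = inc - cost
    printf "NOI: %.0f  Cap rate: %.2f%%\n", noi, 100 * noi / val
  }'
}

# Made-up numbers: $30,000/yr income, $12,000/yr expenses, $250,000 value.
cap_rate 30000 12000 250000   # NOI: 18000  Cap rate: 7.20%
```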

Mr. Tanner also mentioned a few other terms I must research:

  • Factor exposures — popular quantitative strategies/predictive tracking algorithms
  • Momentum — I believe one type of “factor exposure”
  • High Beta
  • Technology Sector Factor
  • Delta Hedging

Some of my favorite Linux commands

At work, I administrate several Linux systems. That, plus the bundling of the Windows Subsystem for Linux (WSL) with Windows 10, has me in the Linux environment quite a lot. Here is a list of several of my go-to Linux commands:

ssh

I use ssh to remotely connect to endpoints that I administrate. A command like:

ssh -i ~/my_key.pem someuser@someendpoint

lets me use a private key to quickly gain access to a server I need to work on.

scp

Once I connect to a remote system, I often have to upload files to it or download files from it. The “secure copy” (scp) utility does the trick! To, say, download a log file from a remote system to my workstation for analysis, I can run a command like this:

scp -i ~/my_key.pem someuser@someendpoint:my_app.log ./my_app.log

ps

I often administrate long-running processes, so getting a report of the running processes with the ps command comes in quite handy. Piping that report to grep is even better. Here’s what I do to check for running Python processes:

ps -ef|grep python
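One wrinkle with that pipeline: the grep process usually shows up in its own results, because its command line contains the word “python.” A common workaround is to wrap the first letter of the pattern in brackets. The sketch below packages that trick in a hypothetical `find_procs` function (bash-specific, since it uses `${name:0:1}` substring expansion).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list processes matching a name without grep matching itself.
# The regex "[p]ython" still matches "python", but grep's own command line
# now contains the literal text "[p]ython", which the regex does not match.
find_procs() {
  local name="$1"
  ps -ef | grep -- "[${name:0:1}]${name:1}"
}
```

Calling `find_procs python` should list only the real Python processes. On systems with the procps `pgrep` tool, `pgrep -af python` gives similar output without the extra pipe.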

df/du

Even in this age of disk space abundance, I still have to pay close attention to disk space on the systems I manage (even my own workstations). Commands like df and du help in this regard. With df, I’ll run a simple command like this to get a quick snapshot of the space used and available on each filesystem mounted on my system:

df -h

Occasionally, one or a few directories or files will be much larger than the rest, and those are the ones to focus on when freeing up disk space. The du command helps me drill down to the problem areas with a command like this:

du -h --max-depth=1
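To see the biggest offenders at a glance, du’s output can be piped to sort. GNU sort’s `-h` flag understands the K/M/G suffixes that `du -h` prints, so 2.0M sorts above 900K. A minimal sketch, with a hypothetical `largest_dirs` wrapper whose directory and line-count arguments are illustrative additions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: size each immediate subdirectory of a path and sort
# the human-readable sizes so the largest entries end up at the bottom.
# Arguments: directory (default .), number of lines to show (default 10).
largest_dirs() {
  local dir="${1:-.}"
  du -h --max-depth=1 "$dir" | sort -h | tail -n "${2:-10}"
}
```

Something like `largest_dirs /var/log 5` would show the bottom five lines. Note that du also prints a line for the directory itself (its grand total), which normally lands last.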

find

Find is a great command for locating directories or files that meet certain criteria. Some of the systems I manage write hundreds or thousands of data files to a single directory. I usually archive such data files by month in case I ever need to refer back to them. I’ll use find to locate all the files last modified in a particular month and pipe their names to a command like tar to archive them. For example, suppose I need to archive all the CSV and TSV files produced in January 2019. I’ll run a command like so:

find . \( -name "*.csv" -o -name "*.tsv" \) -type f -newermt "2019-01-01" ! -newermt "2019-02-01" -print0 | tar -czvf Jan2019.tar.gz --null -T -

For a nice explanation of some of those arguments, check out this article.

Of course, now that I’ve archived those files, I don’t want to leave the originals lying around taking up disk space, so I need to remove them. I will now reuse my find command, this time piping it to xargs and the remove (rm) command. Because -print0 separates the file names with null characters, xargs needs its -0 option to read them correctly:

find . \( -name "*.csv" -o -name "*.tsv" \) -type f -newermt "2019-01-01" ! -newermt "2019-02-01" -print0 | xargs -0 rm
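The archive and delete steps can be combined into one reusable function. This is a sketch under a few assumptions: GNU find and GNU tar, files identified by extension, and the month passed as YYYY-MM. It swaps in find’s `-delete` action for `xargs rm` and only deletes if tar reports success. The function name `archive_month` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive every CSV/TSV file under the current directory
# whose modification time falls in the given month, then delete the originals
# only if tar succeeded.
# Usage: archive_month 2019-01   -> writes 2019-01.tar.gz
archive_month() {
  local month="$1"
  local year=${month%-*} mon=$((10#${month#*-}))
  local start="$month-01" end

  # First day of the following month, rolling over into the next year.
  if (( mon == 12 )); then
    end=$(printf '%04d-01-01' $((year + 1)))
  else
    end=$(printf '%04d-%02d-01' "$year" $((mon + 1)))
  fi

  # NUL-separated names (-print0 / --null) survive spaces and newlines.
  find . \( -name '*.csv' -o -name '*.tsv' \) -type f \
       -newermt "$start" ! -newermt "$end" -print0 |
    tar -czf "$month.tar.gz" --null -T - || return 1

  find . \( -name '*.csv' -o -name '*.tsv' \) -type f \
       -newermt "$start" ! -newermt "$end" -delete
}
```

Running the same find expression twice keeps both steps selecting exactly the same files, which makes the function easy to test with a dry run (swap `-delete` for `-print`) before trusting it.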

cp/mv/rm/touch

Working with files means creating, copying, moving, renaming, and deleting them…among other tasks. Linux has commands for all these tasks and more: copy (cp), move (mv, which also handles renaming), remove (rm), and touch, a handy command for quickly creating a new, blank file.
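A quick tour of those four commands, run inside a throwaway directory from `mktemp -d` so nothing real gets touched (the file names are made up):

```shell
#!/usr/bin/env bash
# Work in a scratch directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch notes.txt                 # create a new, empty file
cp notes.txt notes_backup.txt   # copy it
mv notes_backup.txt old.txt     # rename (move) the copy
rm notes.txt                    # delete the original
ls                              # old.txt
```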

cat/head/tail/less/more

There are plenty of tools in Linux for viewing files. For example, cat writes the entire contents of a file to the screen. That’s fine if the file is small, but if it’s large, you might spend the next minute or two watching data fill up and scroll across your screen. If I’m looking for a particular word or phrase, I might pipe cat to grep like so:

cat my_app.log|grep error

I tend to use tail a lot, too, for looking at the last few lines of a log file. With this command, I can look at the last 20 lines of my log file:

tail -20 my_app.log
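Here are head, tail, and grep side by side on a small made-up log file. grep can also read a file directly, so the cat in front of it isn’t strictly needed:

```shell
#!/usr/bin/env bash
# A small made-up log file to try the viewing commands on.
log=$(mktemp)
printf '%s\n' "INFO start" "ERROR disk full" "INFO retry" "error: timeout" "INFO done" > "$log"

head -n 2 "$log"         # first two lines
tail -n 2 "$log"         # last two lines
grep -in error "$log"    # case-insensitive match, with line numbers
# 2:ERROR disk full
# 4:error: timeout
```

For a log that’s still being written, `tail -f my_app.log` keeps printing new lines as they arrive (Ctrl+C to stop), and `less my_app.log` lets you page through a large file and search it with `/`.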

The great thing now with WSL is that you can use all these powerful Linux commands against your own Windows file system, although Microsoft does caution about getting too crazy with that.

For more interesting actions you can do with Linux commands, check out this article.

LaTeX: inline versus display

I’m continually trying to strengthen my data science skills, in particular making heavy use of the excellent DataCamp.com. Obviously, data science is steeped in math and any discussion of a machine learning algorithm will inevitably touch on the underlying mathematical concepts.

As I progress in my learning, I’ve been taking notes in Jupyter Notebook because…why not? With its markdown and code capabilities, Jupyter Notebook is a fantastic medium for taking notes on machine learning topics, as those topics are full of both prose explaining the concepts and code executing the algorithms.

In my note taking, when my training displays a formula, I’ve been trying to rewrite the formula in LaTeX in my notebook. For the most part, I’ve been successful at reproducing those formulas, thanks largely to sites like Overleaf.com. However, my LaTeX still didn’t quite match the formulas I would see in the training slides. Here’s how I represented the penalty term in the Ridge Regression loss function:

My LaTeX notation looked like this:

$\alpha * \sum_{i=1}^{n}a^{2}_i$

The lower and upper bounds of sigma aren’t…quite…right.

As I was looking through some of Overleaf.com’s content, though, I ran across this statement:

Note, that integral expression may seems a little different in inline and display math mode – in inline mode the integral symbol and the limits are compressed.

I also noticed that some of their LaTeX expressions used double dollar signs, which put an expression in display math mode instead of inline mode. So, I changed my Ridge Regression expression to this:

$$\alpha * \sum_{i=1}^{n}a^{2}_i$$

And my expression rendered as such:

This fixed my lower and upper bounds problem, but it shifted the expression to the center of the cell. If you look at the rendered HTML, you’ll see that Jupyter Notebook adds a “text-align: center” style to the expression by default. Changing that style to “left” makes the formula a little more readable:

But not a whole lot. At any rate, it’s interesting to note these two different behaviors of LaTeX and the pros and cons of each option. If you’re so inclined, you can find my LaTeX examples here.
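One more option, not covered above, may offer a middle ground: keep the expression inline but ask for display-style limits. In LaTeX, and in MathJax, which Jupyter uses to render math, `\displaystyle` switches the rest of the formula to display style, while `\limits` applies only to the operator it follows:

```latex
$\alpha * \displaystyle\sum_{i=1}^{n}a^{2}_i$

$\alpha * \sum\limits_{i=1}^{n}a^{2}_i$
```

Either form should put the bounds above and below the sigma without centering the expression, at the cost of a taller line of text.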


© 2025 DadOverflow.com

Theme by Anders Noren