Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Author: Brad

Dad. Technologist. Fan of English poet John Lillison.

Loguru, ftw

In virtually all the applications and operations I write, I try to incorporate some level of logging so that my code can be adequately supported, particularly in Production environments. Some time ago, I wrote about how I generally log in my Python applications. Well, lately, I’ve switched from that approach to using Loguru and I must say I’m rather satisfied with its ease of use. Here’s a quick example of the package that I put together recently:

Step 1: Do your standard imports

As I explained in other posts on logging, I like adding a “run id” to each log line so that I can easily group lines together belonging to a single instance of my application, so I import the uuid package to help in that regard:

import os
import sys
import uuid
from loguru import logger

Step 2: Set up/customize my logging context

In one line, I can customize how each log line is written and set logging behavior like rolling the file when it hits 10 MB in size:

runid = str(uuid.uuid4()).split('-')[-1]
logger.add('loguru_example.log', format='{time}|{extra[runid]}|{level}|{message}', level='INFO', rotation='10 MB')
logger_ctx = logger.bind(runid=runid)
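For a closer look at what that run id actually is, here is a minimal sketch using only the standard library. The helper name `make_run_id` is made up for illustration; the expression inside it is the same one used above. A UUID string has five dash-separated groups, and the last one is 12 hex characters long.

```python
import uuid


def make_run_id() -> str:
    """Return the last dash-separated group of a random UUID (12 hex characters).

    Hypothetical helper name; the expression is the one from the post.
    """
    return str(uuid.uuid4()).split('-')[-1]


run_id = make_run_id()
print(run_id)       # a 12-character hex string, different on every run
print(len(run_id))  # 12
```

Twelve hex characters is shorter than a full UUID, so it keeps log lines compact while still making collisions between runs very unlikely.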

Step 3: Start logging

def main(argv):
    logger_ctx.info('Starting run of the loguru_example.py script')
    # do some stuff
    logger_ctx.info('Completing run of the loguru_example.py script')


if __name__ == '__main__':
    main(sys.argv[1:])

And now you have a nice and easy log for your application:

Pretty darn simple! So now there’s no excuse: start logging today!
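Because every line carries the run id in its second pipe-delimited field, grouping lines by run is straightforward. The following is only a sketch under stated assumptions: the sample lines are made up (with simplified timestamps, not necessarily what Loguru writes for `{time}`), and `group_by_run_id` is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up sample lines in the '{time}|{runid}|{level}|{message}' layout used above.
SAMPLE_LOG = """\
2024-01-01T10:00:00|aaaaaaaaaaaa|INFO|Starting run of the loguru_example.py script
2024-01-01T10:00:01|bbbbbbbbbbbb|INFO|Starting run of the loguru_example.py script
2024-01-01T10:00:02|aaaaaaaaaaaa|INFO|Completing run of the loguru_example.py script
"""


def group_by_run_id(log_text):
    """Map each run id to the list of (time, level, message) tuples logged under it."""
    runs = defaultdict(list)
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # maxsplit=3 keeps any '|' characters inside the message intact
        parts = line.split('|', 3)
        if len(parts) != 4:
            continue  # skip lines that do not match the expected layout
        timestamp, run_id, level, message = parts
        runs[run_id].append((timestamp, level, message))
    return dict(runs)


runs = group_by_run_id(SAMPLE_LOG)
for run_id, entries in runs.items():
    print(run_id, len(entries))
```

Two runs interleaved in the same file come back as two separate lists, which is the whole point of stamping each line with a run id.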

Logging in PowerShell

As far as I can tell, there are no logging APIs built into the PowerShell framework. So, when I have to write a code solution in PowerShell, I just add this snippet to meet some of my basic logging needs.

Set up some helper variables

I need to set the name and location of my log file. In addition, I like to add a “run id” to my logs so that I can easily group all the log entries associated with a given execution together for analysis or performance reporting:

$ExecutionDir = Split-Path $MyInvocation.MyCommand.Path
$RunId = ([guid]::NewGuid()).ToString().Replace("-", "")
$logFile = "$ExecutionDir\my_log.log"

Add my helper function

This simple helper function meets my basic logging needs:

function Write-Log($logMsg){
    ("{0}|{1}|{2}" -f ("{0:yyyy-MM-dd HH:mm:ss}" -f (Get-Date).ToUniversalTime()), $RunId, $logMsg) | Out-File $logFile -Append
}

Now, log away

Write-Log -logMsg "Starting up some script"
# do some work here
Start-Sleep -Seconds 2
Write-Log -logMsg "Finishing some script"

This produces a simple log like the one below. You can see my “run id” in the second column that helps me group log lines together by unique executions:

Sure, there are no log levels, no file rotations, no easy way to change the format…but in lieu of a built-in API, this does the trick. One way it could be improved would be to stick this work into a module that I could simply import from script to script instead of copy/pasting this code around.
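As for the performance reporting mentioned earlier, here is a rough sketch in Python (standard library only) of how the run id and timestamp columns could be used to work out how long each run took. The sample lines and the `run_durations` name are made up for illustration, and the sketch assumes lines appear in chronological order in the `yyyy-MM-dd HH:mm:ss|run id|message` layout that Write-Log produces.

```python
from datetime import datetime

# Made-up lines in the 'yyyy-MM-dd HH:mm:ss|run id|message' layout written by Write-Log.
SAMPLE_LOG = """\
2024-01-01 10:00:00|0f8fad5bd9cb469fa16570867728950e|Starting up some script
2024-01-01 10:00:02|0f8fad5bd9cb469fa16570867728950e|Finishing some script
2024-01-01 11:30:00|7c9e6679742540de944be07fc1f90ae7|Starting up some script
2024-01-01 11:30:05|7c9e6679742540de944be07fc1f90ae7|Finishing some script
"""


def run_durations(log_text):
    """Return {run_id: seconds between the first and last line of that run}."""
    first_seen, last_seen = {}, {}
    for line in log_text.splitlines():
        # maxsplit=2 keeps any '|' characters inside the message intact
        parts = line.split('|', 2)
        if len(parts) != 3:
            continue  # skip lines that do not match the expected layout
        stamp = datetime.strptime(parts[0], '%Y-%m-%d %H:%M:%S')
        run_id = parts[1]
        first_seen.setdefault(run_id, stamp)  # only the first timestamp is kept here
        last_seen[run_id] = stamp             # overwritten until the last line of the run
    return {rid: (last_seen[rid] - first_seen[rid]).total_seconds() for rid in first_seen}


durations = run_durations(SAMPLE_LOG)
for run_id, seconds in durations.items():
    print(run_id, seconds)
```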

Wordclouds and Domains

I’m not a big fan of wordclouds, but management seems to like them. Recently, I was working on a wordcloud of domains and generated some unexpected results. For demonstration purposes, I grabbed the domains of stories Firefox Pocket recommended to me and shoved them into a dataframe:

import pandas as pd

# domains: the list of domain strings gathered beforehand
df_domains = pd.DataFrame(domains, columns=['domain'])
df_domains.head()

Then, I took a list of the domains and preprocessed them in the conventional way you do for the package: you join them together like words of text with spaces in between:

text = ' '.join(df_domains.domain.tolist())

Finally, I loaded those words into a wordcloud object:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

wordcloud = WordCloud().generate(text)
_ = plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
_ = plt.title('Random Domains')

Which produced this wordcloud:

Anything missing?

Excellent…except…where are the top-level domains? All the .coms, .nets, etc? By Jove, they’re not there! If you check out the frequency map the wordcloud created, you can start to get a clue about what happened:

wordcloud.words_
{'theatlantic': 1.0,
 'theguardian': 0.6666666666666666,
 'outsideonline': 0.6666666666666666,
 'bbc': 0.6666666666666666,
 'theoutline': 0.6666666666666666,
 'washingtonpost': 0.3333333333333333,
 'mentalfloss': 0.3333333333333333,
 'citylab': 0.3333333333333333,
 'bloomberg': 0.3333333333333333,
 'popsci': 0.3333333333333333,
 'espn': 0.3333333333333333,
 'nytimes': 0.3333333333333333,
 'rollingstone': 0.3333333333333333,
 'inverse': 0.3333333333333333,
 'livescience': 0.3333333333333333,
 'newyorker': 0.3333333333333333,
 'nautil': 0.3333333333333333,
 'us': 0.3333333333333333,
 'theconversation': 0.3333333333333333,
 'vox': 0.3333333333333333,
 'hbr': 0.3333333333333333,
 'org': 0.3333333333333333,
 'wired': 0.3333333333333333,
 'lifehacker': 0.3333333333333333,
 'dariusforoux': 0.3333333333333333,
 'atlasobscura': 0.3333333333333333}
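One clue in that output is that “org” and “us” appear as words of their own. A quick way to see how that can happen is to run a word-matching regular expression over a few of the domains. This sketch uses only Python's `re` module; the pattern is the one wordcloud appears to use by default, and the sample string is made up for illustration. It only shows the splitting step, not anything else `generate` may do afterwards (stopword filtering, for instance).

```python
import re

# The pattern wordcloud appears to use by default: one word character followed by
# one or more word characters or apostrophes. Dots are never matched.
WORD_PATTERN = r"\w[\w']+"

# Made-up sample: three domains joined with spaces, the same way the text was built.
text = "theatlantic.com hbr.org nautil.us"

tokens = re.findall(WORD_PATTERN, text)
print(tokens)
# ['theatlantic', 'com', 'hbr', 'org', 'nautil', 'us']
```

Each domain comes back as two separate tokens, so the cloud never sees “hbr.org” as a single word.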

The “generate” function split each domain at its period when it built the frequency map: “org” and “us” show up as standalone words, and the “com” pieces are gone altogether (possibly filtered out as stopwords). A little more digging and we can see that the default regular expression is the problem: \w[\w']+. It’s looking for words (and even apostrophes), but stopping at punctuation marks like periods.

Now, you can futz around with providing your own regular expression that will include the full domain (I tried that), but regular expressions are hard and there’s actually a better way: the pandas value_counts function. The value_counts function will let you generate your own frequency map that you can provide to the wordcloud package directly.

First, let’s just take a look at what value_counts produces. We’ll convert the results to a dictionary so that the data is in a form the wordcloud package needs:

df_domains.domain.value_counts().to_dict()
{'theatlantic.com': 3,
 'theoutline.com': 2,
 'outsideonline.com': 2,
 'bbc.com': 2,
 'theguardian.com': 2,
 'bloomberg.com': 1,
 'rollingstone.com': 1,
 'theconversation.com': 1,
 'wired.com': 1,
 'inverse.com': 1,
 'popsci.com': 1,
 'atlasobscura.com': 1,
 'mentalfloss.com': 1,
 'newyorker.com': 1,
 'espn.com': 1,
 'nytimes.com': 1,
 'hbr.org': 1,
 'nautil.us': 1,
 'washingtonpost.com': 1,
 'lifehacker.com': 1,
 'livescience.com': 1,
 'vox.com': 1,
 'citylab.com': 1,
 'dariusforoux.com': 1}

True, the values are integers and not floats, but wordcloud doesn’t care:

freqs = df_domains.domain.value_counts().to_dict()

wordcloud = WordCloud().generate_from_frequencies(freqs)
_ = plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
_ = plt.title('Random Domains')

And now we have a wordcloud of domains, complete with their top-level domain parts. Nice!
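If pandas isn't handy, the same kind of frequency map can be sketched with the standard library's `collections.Counter`. The short `domains` list here is made up to stand in for the real column. Dividing each count by the largest count gives relative weights that follow the same 1.0 / 0.67 / 0.33 pattern seen in `wordcloud.words_` earlier, which suggests the package scales counts relative to the most frequent word.

```python
from collections import Counter

# Made-up sample list standing in for the domain column.
domains = [
    'theatlantic.com', 'theatlantic.com', 'theatlantic.com',
    'bbc.com', 'bbc.com',
    'hbr.org',
]

# Count whole domain strings, so the dots and top-level domains stay intact.
freqs = dict(Counter(domains))
print(freqs)
# {'theatlantic.com': 3, 'bbc.com': 2, 'hbr.org': 1}

# Scale each count by the largest one to get relative weights.
top = max(freqs.values())
relative = {domain: count / top for domain, count in freqs.items()}
print(relative)
```

A plain dictionary like `freqs` is the same shape as the value_counts result above, so it could be handed to generate_from_frequencies in the same way.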


© 2024 DadOverflow.com
