Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.


Logging in PowerShell

As far as I can tell, there is no logging API built into PowerShell. So, when I have to write a script in PowerShell, I just add this snippet to meet some of my basic logging needs.

Set up some helper variables

I need to set the name and location of my log file. In addition, I like to add a “run id” to my logs so that I can easily group all the log entries associated with a given execution together for analysis or performance reporting:

$ExecutionDir = Split-Path $MyInvocation.MyCommand.Path
$RunId = ([guid]::NewGuid()).ToString().Replace("-", "")
$logFile = "$ExecutionDir\my_log.log"

Add my helper function

This simple helper function meets my basic logging needs:

# Appends one pipe-delimited line to the log: UTC timestamp | run id | message
function Write-Log($logMsg){
    ("{0}|{1}|{2}" -f ("{0:yyyy-MM-dd HH:mm:ss}" -f (Get-Date).ToUniversalTime()), $RunId, $logMsg) | Out-File $logFile -Append
}

Now, log away

Write-Log -logMsg "Starting up some script"
# do some work here
Start-Sleep -Seconds 2
Write-Log -logMsg "Finishing some script"

This produces a simple log like below. You can see my “run id” in the second column that helps me group log lines together by unique executions:

Sure, there are no log levels, no file rotation, no easy way to change the format…but in lieu of a built-in API, this does the trick. One way it could be improved would be to stick this work into a module that I could simply import from script to script instead of copying and pasting this code around.
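The run id earns its keep when it's time to analyze the log. Here's a rough sketch, in Python rather than PowerShell, of how the lines might be grouped by run id and timed. The sample lines, the run ids, and the function names are all made up for illustration; only the timestamp|run id|message layout comes from the helper above.

```python
from collections import defaultdict
from datetime import datetime


def group_by_run(lines):
    """Group 'timestamp|run_id|message' lines by their run id."""
    runs = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first two pipes only, so a message may contain '|'.
        stamp, run_id, msg = line.split("|", 2)
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        runs[run_id].append((when, msg))
    return runs


def run_durations(runs):
    """Seconds between the first and last entry of each run."""
    durations = {}
    for run_id, entries in runs.items():
        times = [when for when, _ in entries]
        durations[run_id] = (max(times) - min(times)).total_seconds()
    return durations


# Made-up sample lines in the same layout Write-Log produces.
sample = [
    "2020-01-01 12:00:00|aaaa1111|Starting up some script",
    "2020-01-01 12:00:02|aaaa1111|Finishing some script",
    "2020-01-01 13:00:00|bbbb2222|Starting up some script",
]

print(run_durations(group_by_run(sample)))
```

To run this against a real log file, you'd read the lines with open(); keep in mind that the encoding Out-File writes depends on your PowerShell version, so you may need to pass an explicit encoding.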

Wordclouds and Domains

I’m not a big fan of wordclouds, but management seems to like them. Recently, I was working on a wordcloud of domains and generated some unexpected results. For demonstration purposes, I grabbed the domains of stories Firefox Pocket recommended to me and shoved them into a dataframe:

import pandas as pd

# domains is a plain list of domain strings, e.g. 'theatlantic.com'
df_domains = pd.DataFrame(domains, columns=['domain'])
df_domains.head()

Then, I took a list of the domains and preprocessed them in the conventional way you do for the package: you join them together like words of text, with spaces in between:

text = ' '.join(df_domains.domain.tolist())

Finally, I loaded those words into a wordcloud object:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

wordcloud = WordCloud().generate(text)
_ = plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
_ = plt.title('Random Domains')

Which produced this wordcloud:

Anything missing?

Excellent…except…where are the top-level domains? All the .coms, .nets, etc? By Jove, they’re not there! If you check out the frequency map the wordcloud created, you can start to get a clue about what happened:

wordcloud.words_
{'theatlantic': 1.0,
 'theguardian': 0.6666666666666666,
 'outsideonline': 0.6666666666666666,
 'bbc': 0.6666666666666666,
 'theoutline': 0.6666666666666666,
 'washingtonpost': 0.3333333333333333,
 'mentalfloss': 0.3333333333333333,
 'citylab': 0.3333333333333333,
 'bloomberg': 0.3333333333333333,
 'popsci': 0.3333333333333333,
 'espn': 0.3333333333333333,
 'nytimes': 0.3333333333333333,
 'rollingstone': 0.3333333333333333,
 'inverse': 0.3333333333333333,
 'livescience': 0.3333333333333333,
 'newyorker': 0.3333333333333333,
 'nautil': 0.3333333333333333,
 'us': 0.3333333333333333,
 'theconversation': 0.3333333333333333,
 'vox': 0.3333333333333333,
 'hbr': 0.3333333333333333,
 'org': 0.3333333333333333,
 'wired': 0.3333333333333333,
 'lifehacker': 0.3333333333333333,
 'dariusforoux': 0.3333333333333333,
 'atlasobscura': 0.3333333333333333}

The “generate” function broke every domain apart at its period when it built the frequency map. A little more digging and we can see that the default regular expression is the problem: “\w[\w']+”. It matches runs of word characters (and even apostrophes), but stops at punctuation marks like periods. That's why hbr.org shows up above as the separate entries hbr and org. The com pieces are missing altogether, most likely because the package's default stopword list filters them out.

Now, you can futz around with providing your own regular expression that will include the full domain (I tried that), but regular expressions are hard and there’s actually a better way: the pandas value_counts function. The value_counts function will let you generate your own frequency map that you can provide to the wordcloud package directly. First, let’s just take a look at what value_counts produces. We’ll convert the results to a dictionary so that the data is in a form the wordcloud package needs:

df_domains.domain.value_counts().to_dict()
{'theatlantic.com': 3,
 'theoutline.com': 2,
 'outsideonline.com': 2,
 'bbc.com': 2,
 'theguardian.com': 2,
 'bloomberg.com': 1,
 'rollingstone.com': 1,
 'theconversation.com': 1,
 'wired.com': 1,
 'inverse.com': 1,
 'popsci.com': 1,
 'atlasobscura.com': 1,
 'mentalfloss.com': 1,
 'newyorker.com': 1,
 'espn.com': 1,
 'nytimes.com': 1,
 'hbr.org': 1,
 'nautil.us': 1,
 'washingtonpost.com': 1,
 'lifehacker.com': 1,
 'livescience.com': 1,
 'vox.com': 1,
 'citylab.com': 1,
 'dariusforoux.com': 1}

True, the values are integers and not floats, but wordcloud doesn’t care:

freqs = df_domains.domain.value_counts().to_dict()

wordcloud = WordCloud().generate_from_frequencies(freqs)
_ = plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
_ = plt.title('Random Domains')

And now we have a wordcloud of domains, complete with their top-level domain parts. Nice!
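As a postscript, here's a small sketch of why the two approaches differ, using only the standard library. It is not the wordcloud package's actual code (the real thing also handles stopwords and other details, so its output won't match exactly); it just applies the default pattern quoted earlier with plain re and then scales counts so the most common item gets 1.0, which is what the 1.0, 0.67, and 0.33 values in words_ appear to be. The relative_frequencies name and the short domain list are mine, for illustration.

```python
import re
from collections import Counter

# Pattern quoted in the post as the package default (straight apostrophe).
DEFAULT_PATTERN = r"\w[\w']+"


def relative_frequencies(tokens):
    """Count tokens and scale so the most common one gets 1.0."""
    counts = Counter(tokens)
    top = max(counts.values())
    return {token: n / top for token, n in counts.items()}


# A few of the domains from the post, repeated to match their counts.
domains = ['theatlantic.com'] * 3 + ['bbc.com'] * 2 + ['hbr.org', 'nautil.us']
text = ' '.join(domains)

# Tokenizing the joined text splits every domain at its period.
tokens = re.findall(DEFAULT_PATTERN, text)
print(tokens[:2])  # ['theatlantic', 'com']

# Counting whole domains instead keeps the top-level part attached.
print(relative_frequencies(domains))
```

Counting the whole strings is essentially what value_counts does for us, which is why handing its result to generate_from_frequencies sidesteps the tokenizer entirely.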

Converting file size values

Lately, I’ve been challenged with performing calculations and charting of file size values in different units of measure. For example, I’ll have file size values in gigabytes but will have to plot those values against terabytes of disk capacity. I’m a little surprised that Python doesn’t have a ready way to solve this problem. There is the hurry.filesize package, but that requires that you pass a bytes value into the function. What if you only have a gigabytes value to pass? Well, I came up with my own solution, largely inspired by similar solutions. Here’s my function:

import re


def convert_filesize(size, desired_uom, factor=1024):
    """Converts a provided computer data storage value to a different unit of measure.
    
    Keyword arguments:
    size -- a string of the current size (ex. '1.5 GB')
    desired_uom -- a string of the new unit of measure (ex. 'TB')
    factor -- the factor used in the conversion (default 1024)
    """
    uom_options = ['B', 'KB', 'MB', 'GB', 'TB']
    supplied_uom = re.search(r'[a-zA-Z]+', size)
    if supplied_uom:
        supplied_uom = supplied_uom.group()
    else:
        raise ValueError('size argument did not contain expected unit of measure')
        
    supplied_size = float(size.replace(supplied_uom, ''))
    supplied_size_in_bytes = supplied_size * (factor ** (uom_options.index(supplied_uom)))
    converted_size = supplied_size_in_bytes / (factor ** (uom_options.index(desired_uom)))
    return converted_size, '{0:,.2f} {1}'.format(converted_size, desired_uom)

Then, you’ll use it like so:

# binary (non-SI) conversion: each unit step is a factor of 1024
print('Using default conversion factor of 1024:')
print(convert_filesize('1024 B', 'KB'))
print(convert_filesize('1.5 GB', 'MB'))
print(convert_filesize('59.3 GB', 'TB'))

print('\nUsing this IEC/SI conversion factor of 1000:')
# conversion recommended by IEC (https://www.convertunits.com/from/MB/to/GB)
print(convert_filesize('1024 B', 'KB', factor=1000))
print(convert_filesize('1.5 GB', 'MB', factor=1000))
print(convert_filesize('59.3 GB', 'TB', factor=1000))

Which produces the following results:

Using default conversion factor of 1024:
(1.0, '1.00 KB')
(1536.0, '1,536.00 MB')
(0.05791015625, '0.06 TB')

Using this IEC/SI conversion factor of 1000:
(1.024, '1.02 KB')
(1500.0, '1,500.00 MB')
(0.0593, '0.06 TB')

I’m sure there’s much room for improvement, but this routine seems to meet my needs for now.
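For the charting case I mentioned at the top, where values arrive in mixed units and need to land on one axis, a compact variant might look something like the sketch below. The to_unit name is hypothetical, and the case-insensitive unit matching and the comma handling are my own assumptions about what would be convenient, not part of the function above.

```python
import re

UNITS = ['B', 'KB', 'MB', 'GB', 'TB']


def to_unit(size, unit, factor=1024):
    """Hypothetical variant of convert_filesize that returns only the number."""
    match = re.fullmatch(r'\s*([\d.,]+)\s*([a-zA-Z]+)\s*', size)
    if not match:
        raise ValueError('could not parse size: {0!r}'.format(size))
    value = float(match.group(1).replace(',', ''))
    src, dst = match.group(2).upper(), unit.upper()
    for uom in (src, dst):
        if uom not in UNITS:
            raise ValueError('unknown unit of measure: {0!r}'.format(uom))
    # Same arithmetic as above: go to bytes first, then to the target unit.
    size_in_bytes = value * (factor ** UNITS.index(src))
    return size_in_bytes / (factor ** UNITS.index(dst))


# Mixed-unit values normalized to terabytes so they can share one axis.
sizes = ['512 GB', '2 TB', '1,024 gb']
sizes_in_tb = [to_unit(s, 'TB') for s in sizes]
print(sizes_in_tb)
```

Returning just the float keeps the result easy to drop into a list or dataframe column; the formatted string from the original function is still the better choice for labels.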


© 2025 DadOverflow.com
