Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Tag: tools (Page 9 of 35)

Maintaining a Positive Sentiment

For various reasons, companies like to know the emotional dispositions of their customers and, sometimes, their employees. One common way to determine these states is through a technique called Sentiment Analysis.

One challenge with sentiment analysis is simply knowing how to score words, one way or another. Our brains know that the word “happy” is usually associated with a positive sentiment because we have a dictionary of sorts in our brains that associates “happy” to a favorable emotional state. Computers don’t immediately know these associations, so you usually have to supply them such mappings, often in the form of simple lists of words.

Now, suppose you want to maintain the appearance of positivity with an organization with whom you interact–maybe through online reviews, email, chat, or some other written medium. Here’s an idea you can try.

Step 1: Downlist a list of positive words

Positive (and negative) word lists are easily available on the Internet. Here’s one I downloaded.

Step 2: Write some PowerShell to extract some random, positive words you can use in your correspondence

If you’re running Microsoft Windows, you’ll likely have PowerShell installed, which is a convenient tool to parse the file you downloaded and provide back a few random words. Here’s a one-liner I wrote:

((cat c:\positive-words.txt|? {!$_.startswith(';') -and $_.length -gt 0}|get-random -count 3) -join '. ') + '.'|set-clipboard

In this code, I load in the word file into memory and then filter out lines that start with semicolons–in the file I downloaded, the author used semicolons to denote comment lines I need to exclude–and empty lines. Next, I select three random lines as each word is on a new line. Next, I put a period between words. Finally, I send the results to the Windows clipboard for easy pasting into the editor in which you are communicating–email, chat window, etc. The result is something like this:

pleasantly. enthralled. idolize.

But wait, there’s more

The above is great, but to get your words, you’ll have to execute the code in a PowerShell command shell. Who has time for that? How about executing your PowerShell from a batch file? Add the following to a new bat file:

set c="((cat c:\positive-words.txt|? {!$_.startswith(';') -and $_.length -gt 0}|get-random -count 3) -join '. ') + '.'|set-clipboard "
powershell -command %c%

Now, place that bat file on your desktop or in a convenient area where you can double-click on it to get your words easily. Even better: make a shortcut to it with a tool like Slickrun.

Now, you can easily leverage positive words in your correspondence and maintain a positive persona to the watchers.

Parsing Oddly Formatted Spreadsheets

Python and pandas works well with conventionally formatted spreadsheets like this:

Conventional spreadsheet easily parsed in Python

But how do you deal with spreadsheets formatted in unconventional ways, like this?

Can you use pandas to parse an oddly formatted spreadsheet?

Here’s my approach to massaging this data into a dataframe I can work with.

Step 1: Go ahead and read in the wonky spreadsheet

Go ahead and read in the spreadsheet, warts and all, into a dataframe. I went ahead and skipped rows 0 and 1 as they were unnecessary:

import pandas as pd

df_raw = pd.read_excel('./data/odd_format.xlsx', skiprows=1).fillna('')

As you’d expect, the results are not very pretty:

Step 2: Figure out where each record starts and stops

Looking at the spreadsheet, I determined that each record starts with a field named “First Name:” and ends with a field named “State:”. If I can put together a list of row indexes that lets me know where each record begins and ends, I should be able to iterate through that list and reformat each record uniformly. Pandas filtering can help with that. To get a list of each “start” row, I can use this code:

df_raw[df_raw['Unnamed: 1']=='First Name:'].index.tolist()

To get a list of each “end” row, I can do this:

df_raw[df_raw['Unnamed: 1']=='State:'].index.tolist()

Finally, I can use Python’s handy zip function to glue both together in a list of tuples that I can easily loop through:

for start_row, end_row in zip(df_raw[df_raw['Unnamed: 1']=='First Name:'].index.tolist(), df_raw[df_raw['Unnamed: 1']=='State:'].index.tolist()):
    # loop through each record

Step 3: Collect all key/value pairs per record

Now that I’m able to iterate over each record, I need to be able to capture each key/value pair in each record: each person’s first name, middle name (if available), last name, etc. I can use Python’s range function to loop from the starting row to the ending row of the record and pandas iloc function to zero in on each key and associated value:

person = {}  # I need some place to store the keys/values, so let's use a dictionary
for i in range(start_row, end_row+1):
    k = df_raw.iloc[i, 1]  # the keys are in column 1
    v = df_raw.iloc[i, 2]  # the values are in column 2

Each record has an empty row in the middle of it, separating “name” properties from “address” properties. I don’t need those empty rows, so I do a quick check before writing the keys and values to my dictionary object:

if len(k.strip()) > 0:
    person[k.strip().replace(':', '')] = v

Of course, I need to be writing each of these person objects to a master list, so I do that by appending each object:

people_list.append(person)

Step 4: Create a new dataframe from the people list

Finally, I can take that clean list of dictionaries and generate a new dataframe from it:

df_clean = pd.DataFrame(people_list)

Which renders a nice dataframe from which I can start my analysis:

That’s a little more like it!

So, putting it all together, my full code looks like this:

import pandas as pd

df_raw = pd.read_excel('./data/odd_format.xlsx', skiprows=1).fillna('')
people_list = []
for start_row, end_row in zip(df_raw[df_raw['Unnamed: 1']=='First Name:'].index.tolist(), df_raw[df_raw['Unnamed: 1']=='State:'].index.tolist()):
    person = {}
    for i in range(start_row, end_row+1):
        k = df_raw.iloc[i, 1]
        v = df_raw.iloc[i, 2]
        
        if len(k.strip()) > 0:
            person[k.strip().replace(':', '')] = v
            
    people_list.append(person)
    
df_clean = pd.DataFrame(people_list)

So, should you encounter similarly unconventionally formatted spreadsheets in the future, hopefully this code will help you find a solution to deal with them!

Easy window positioning with PowerToys

A few years ago, I wrote about a solution I developed for neatly positioning windows–especially command shell windows–in a particular monitor of my multi-monitor setup. The script I wrote positioned windows evenly across the width of the screen. Recently, though, I bought one of those rather wide, curvy screens and decide that, instead of stretching my windows evenly across that width, I’d rather place my windows in a grid pattern. I set down to re-write my script and then remembered Microsoft PowerToys.

When I wrote about PowerToys in the past, it was still pretty fledgling. For example, the FancyZones tool didn’t play well with monitors that sat to the left of your primary monitor (the X coordinate was a negative number and that likely threw off the tool). To my delight, though, these issues have been addressed and now PowerToys and FancyZones in particular is my tool of choice for positioning windows on all my monitors.

The other option worth mentioning is Windows Terminal. Windows Terminal houses most/all the command shells you probably use: the standard command prompt, PowerShell, Windows Sub-System for Linux, etc. It also lets you layout these shells however you wish–sort of like a FancyZones for just command shells. I’ve yet to experiment with Windows Terminal, though, so until then, PowerToys will do.

« Older posts Newer posts »

© 2024 DadOverflow.com

Theme by Anders NorenUp ↑