Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.


Scanning slides

My dad was a big fan of slides as a film medium back in the day.  I have boxes of these suckers that span at least the last four decades of the Twentieth Century.

A few years ago, I bought one of those slide scanner gizmos and started in on the overwhelming task. One thing I immediately found frustrating was the timestamp: my scanner stamped every slide with the same date, June 1, 2013. I searched every nook and cranny of the menu for a way to set the current date and time and found none.

To me, having an accurate timestamp on my scanned photos is important: it tells me, and anyone else I share my files with, when I actually did the work. It can help me group together images that may all be part of the same set. It might even help me identify unknown individuals in the photos. So it’s important to get the date right on the files.

What can I do?  I know: PowerShell can solve this problem!  Here’s a short code snippet I now use to get the dates right on my scanned slides:


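$dir = "C:\my_path\slides"  # set the filepath to my slides
$now = Get-Date              # capture one timestamp so both properties match
Get-ChildItem -Path $dir -File | Where-Object { $_.Extension -eq '.jpg' } |  # -eq is case-insensitive
    ForEach-Object { $_.CreationTime = $now; $_.LastWriteTime = $now }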

This sets the files’ timestamps to the current date and time. Assuming you scanned the slides the same day, you should be set. Setting the timestamp to an earlier date is just as easy, but I won’t go into it here. Also note that a file carries several date properties: creation time, last-write (modified) time, and last-access time. I’m using a “hammer” approach to this problem by setting both the CreationTime and LastWriteTime properties. These are file-system properties, so any date the scanner may have written inside the JPEG’s own metadata is left untouched.
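For anyone who would rather do this in Python, here is a rough cross-platform sketch of the same idea (an illustration, not a translation of the script above). The helper name `stamp_jpgs` is hypothetical. It relies on `os.utime`, which sets only the access and modification times; the standard library has no call for changing a file’s creation time on Windows, so that half of the “hammer” is not covered. Passing a specific `datetime` covers the earlier-date case.

```python
import os
from datetime import datetime
from pathlib import Path


def stamp_jpgs(folder, when=None):
    """Set the access and modification times of every .jpg in `folder`.

    `when` is a datetime and defaults to the current date and time.
    Creation time is NOT changed (os.utime cannot set it).
    Returns the sorted names of the files that were touched.
    """
    ts = (when or datetime.now()).timestamp()
    touched = []
    for path in Path(folder).iterdir():
        # case-insensitive extension match, like the PowerShell filter
        if path.is_file() and path.suffix.lower() == ".jpg":
            os.utime(path, (ts, ts))  # (access time, modification time)
            touched.append(path.name)
    return sorted(touched)


# usage (hypothetical folder and date): stamp every slide with an earlier date
# stamp_jpgs(r"C:\my_path\slides", datetime(2018, 5, 12, 14, 0))
```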

My scanner also has a preset naming convention for the slides it scans. In general, that’s fine with me, but you can also use PowerShell to easily rename your scanned images to your own convention. Here, I want to indicate that all the slides in a given directory are part of the same group, so I add a “_grp001” suffix to the end of each file name, just before the extension:


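$dir = "C:\data_files\slides2"
$suffix = '_grp001.jpg'  # group tag plus the extension
# the parentheses list every file up front, so renamed files aren't picked up a second time
(Get-ChildItem -Path $dir -File) | Where-Object { $_.Extension -eq '.jpg' } |
    Rename-Item -NewName { $_.BaseName.ToLower() + $suffix }  # e.g. IMG_0001.JPG -> img_0001_grp001.jpg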
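The same rename can be sketched in Python, again as an illustration rather than a line-for-line port. The helper name `add_group_suffix` is hypothetical. It lowercases the base name as the PowerShell version does, builds the full file list before renaming anything, and skips files that already carry the tag, so running it twice won’t produce `_grp001_grp001`.

```python
from pathlib import Path


def add_group_suffix(folder, tag="_grp001"):
    """Rename NAME.jpg to name<tag>.jpg for every .jpg file in `folder`.

    The base name is lowercased, mirroring ToLower() in the PowerShell version.
    Files whose base name already ends with `tag` are skipped.
    Returns the new file names in the order they were renamed.
    """
    renamed = []
    # sorted() materializes the listing first, so no file is visited twice
    for path in sorted(Path(folder).iterdir()):
        if not (path.is_file() and path.suffix.lower() == ".jpg"):
            continue
        stem = path.stem.lower()
        if stem.endswith(tag):
            continue  # already tagged on an earlier run
        target = path.with_name(stem + tag + ".jpg")
        path.rename(target)
        renamed.append(target.name)
    return renamed
```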

 

Documenting your Jupyter notebooks

A recent episode of the excellent podcast Talk Python to Me discussed an effort to collect and analyze some one million Jupyter notebooks on GitHub. Unsurprisingly, one conclusion the analyst drew is that notebook authors are not good at documenting their work. I find that a little sad, given how rich Jupyter’s Markdown support is.
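One crude way to gauge how documented a notebook is would be the share of its cells that are Markdown. The sketch below does that with only the standard library, relying on the fact that an nbformat 4 `.ipynb` file is JSON with a top-level `cells` list whose entries carry a `cell_type` of `"markdown"`, `"code"`, or `"raw"`. The function name and the metric itself are assumptions for illustration, not necessarily what the study measured.

```python
import json


def markdown_share(notebook_json):
    """Return the fraction of cells in a notebook (nbformat 4 JSON text) that are Markdown.

    Returns 0.0 for a notebook with no cells.
    """
    notebook = json.loads(notebook_json)
    cells = notebook.get("cells", [])
    if not cells:
        return 0.0
    markdown_cells = sum(1 for cell in cells if cell.get("cell_type") == "markdown")
    return markdown_cells / len(cells)


# usage (hypothetical file name):
# with open("analysis.ipynb", encoding="utf-8") as f:
#     print(markdown_share(f.read()))
```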

I’ve found Markdown syntax a little confusing at times, but this great “cheatsheet” has helped:

I haven’t had a whole lot of opportunity to work with LaTeX (which Jupyter Markdown cells can render for math), but when I have, it has been a challenge. Here’s a cheatsheet that’s been helpful in the past:
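Independent of any cheatsheet, here is a small taste of that LaTeX support: in a Jupyter Markdown cell, math placed between double dollar signs renders as a displayed equation (single dollar signs give inline math). The sample-mean formula below is just an illustrative example:

```latex
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
```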

 

Annotating the War on Poverty

The other day, I was listening to the Contra Krugman episode entitled “How to Unwind the Welfare State.” Toward the end of the discussion, the hosts began listing examples of private organizations in the free market solving social problems, only to be stymied when the federal government inserted itself into the situation. Host Bob Murphy referenced an article he wrote for FEE in which he discussed how, in the 1950s and ’60s, the free market was already lifting people out of poverty at a pretty good clip, only to have Lyndon Johnson and the federal government jump on the bandwagon halfway through and claim that their legislation, not the free market, did all the heavy lifting.

I couldn’t find the article Bob was referencing (maybe it was this?); nevertheless, it occurred to me this might be an opportunity to improve my matplotlib skills. Maybe I could find the official US poverty numbers, plot them, then annotate the plot with markers indicating when key War on Poverty legislation was enacted. Would that convey the point Bob was making? Here are highlights of what I did (the full code is available on my GitHub page):

Step 1: Get the data

Is it just me, or is downloading the data you want from the US government confusing? The US Census Bureau publishes the poverty numbers, but I had a hard time figuring out which numbers I needed and which table covered the time period I wanted. I finally found a dataset I could use on the page Historical Poverty Tables: People and Families – 1959 to 2016.

Step 2: Load the data

Here’s a snippet of the spreadsheet I downloaded:

Makes sense, I guess, but it took me a while to figure out an optimal way to load the spreadsheet into a dataframe with Pandas.  In the end, though, it only took two lines of code:


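import pandas as pd  # reading legacy .xls files also requires the xlrd package
# rows 3-5 form a three-level column header; column 0 (the year) becomes the index
df_pov = pd.read_excel('./hstpov9.xls', header=[3, 4, 5], index_col=0)
df_pov = df_pov.iloc[:-1]  # drop the last row, which is just a footnote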
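Because `header=[3, 4, 5]` turns three spreadsheet rows into a three-level column header, columns are addressed by tuples. The sketch below builds a tiny stand-in frame by hand to show the two kinds of selection used later in this post. The column labels mirror the ones in the plotting code; the numbers are placeholders, not Census figures.

```python
import pandas as pd

# hand-built stand-in for df_pov: three-level column labels, years as the index
# (placeholder numbers, NOT real poverty data)
columns = pd.MultiIndex.from_tuples([
    ("Total", "Below poverty", "Number"),
    ("Total", "Below poverty", "Percent"),
])
df = pd.DataFrame([[100, 10.0], [200, 20.0]], index=[1970, 1969], columns=columns)

# a full three-level tuple selects a single column (a Series)
percent = df.loc[:, ("Total", "Below poverty", "Percent")]

# a two-level prefix selects a sub-frame whose columns are the remaining level
below = df.loc[[1969], ("Total", "Below poverty")]
print(list(below.columns))
```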

Step 3: Get some legislation dates

Wikipedia to the rescue! It calls out four major pieces of legislation in the War on Poverty:

  • The Economic Opportunity Act of 1964 – August 20, 1964
  • Food Stamp Act of 1964 – August 31, 1964
  • Elementary and Secondary Education Act – April 11, 1965
  • Social Security Amendments of 1965 (created Medicare and Medicaid) – July 30, 1965
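If you want the plotting code’s `(name, year)` pairs to come straight from signing dates, a small sketch like the following would do it. Both helpers, `to_year_pairs` and `decimal_year`, are hypothetical and not part of the original code; `decimal_year` converts a date into a fractional year, which could place a marker at the actual signing date rather than at the start of the year.

```python
from datetime import date


def decimal_year(d):
    """Convert a date to a fractional year; January 1 maps to YYYY.0."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days  # 365 or 366
    return d.year + (d - start).days / days_in_year


def to_year_pairs(signed):
    """Turn {law name: signing date} into the (name, year) pairs the plot code uses."""
    return [(name, d.year) for name, d in signed.items()]


# two of the dates listed above
signed = {
    "Economic Opportunity Act": date(1964, 8, 20),
    "Food Stamp Act": date(1964, 8, 31),
}
laws = to_year_pairs(signed)
```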

Step 4: Plot time?  Not so fast!

So, the major pieces of legislation happened in 1964 and 1965. Now I can plot the poverty rate from the dataset I have and then add annotations at 1964 and 1965. Er, wait a minute…the dataset is missing the poverty rate for those years! In fact, it’s missing every year between 1960 and 1969. Weird! How will I know where on the plot to place my annotations? Well, Pandas can estimate the missing values with its handy interpolate method, which interpolates linearly by default. Only two lines of code to do the calculation!


# pull the two years that bracket the gap
df_gap_data = df_pov.loc[[1960, 1969], ('Total', 'Below poverty')]
# add a row for each missing year in between, then let Pandas interpolate
# (linearly, by default) a best guess at the poverty rate for those years
df_gap_data = df_gap_data.reindex(pd.RangeIndex(df_gap_data.index.min(), df_gap_data.index.max() + 1)).interpolate()
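To see what `reindex` plus `interpolate` is doing, here is a stripped-down sketch on a plain Series. The `fill_year_gap` helper and the two endpoint values are made up for illustration and are not Census figures. With the default linear method, each missing year moves an equal step between the known endpoints.

```python
import pandas as pd


def fill_year_gap(known):
    """Return one row per year between the first and last known years,
    filling the missing years by linear interpolation (the interpolate default)."""
    full_years = pd.RangeIndex(known.index.min(), known.index.max() + 1)
    return known.reindex(full_years).interpolate()


# made-up placeholder percentages, NOT real Census figures
known = pd.Series({1960: 20.0, 1969: 11.0})
filled = fill_year_gap(known)
print(filled.loc[1964], filled.loc[1965])  # 16.0 15.0
```

Because the endpoints differ by 9 points over 9 years, each missing year drops by exactly 1 point.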

Step 5: Now, plot time!

Now that I know where to place my annotations, here’s what I came up with for the plot:


import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

laws = [('Education Act', 1965), ('Social Security Act', 1965),
        ('Economic Opportunity Act', 1964), ('Food Stamp Act', 1964)]
title = 'Total Below Poverty Percentage, United States, with annotations'
y_offset = 0  # offset counter so the text boxes don't stack on top of each other

# plot the poverty rate
ax = df_pov.sort_index().loc[:, ('Total', 'Below poverty', 'Percent')].plot(title=title, figsize=(12, 10))
ax.set_xlabel('Year')
ax.set_ylabel('Percent below poverty')

# loop through the legislation so I can add those annotations
for name, year in laws:
    y_offset += 30
    percent = df_gap_data.loc[year, 'Percent']  # interpolated rate for that year
    # small black marker on the line at the year the law was enacted
    ci = Ellipse((year, percent), width=0.5, height=0.1, color='black', zorder=5)
    ax.add_patch(ci)

    # text box placed in axes points, with an arrow pointing at the marker
    ax.annotate(name,
                xy=(year, percent), xycoords='data',
                xytext=(175, 300 + y_offset), textcoords='axes points',
                size=20,
                bbox=dict(boxstyle="round", fc="0.8"),
                arrowprops=dict(arrowstyle="->", color='black', patchB=ci,
                                connectionstyle="angle3,angleA=0,angleB=-90"))

plt.show()

 

And that rendered the plot at the top of this post.  Does that chart illustrate the point Bob Murphy was trying to make in the podcast?  I think so, but take a listen for yourself and let me know.  The big takeaway is all the cool annotations you can do in matplotlib.
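Arrows are only one option. A lighter-weight alternative (a different technique from the annotate approach above, sketched here with placeholder data rather than the poverty series) is to draw a dashed vertical line at each law’s year with `axvline` and label it with `ax.text`. Because the line spans the full height of the axes, this approach needs no y-value for the year, so it would sidestep the interpolation step entirely. The `mark_years` helper name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt


def mark_years(ax, events):
    """Draw a dashed vertical line and a rotated label for each (name, year) event."""
    for name, year in events:
        ax.axvline(year, color="gray", linestyle="--", linewidth=1)
        # x is in data coordinates, y in axes coordinates (0 = bottom, 1 = top)
        ax.text(year, 0.98, name, transform=ax.get_xaxis_transform(),
                rotation=90, va="top", ha="right", fontsize=8)
    return ax


fig, ax = plt.subplots()
ax.plot([1959, 1970, 1980], [1, 2, 3])  # placeholder line, not real poverty data
mark_years(ax, [("Economic Opportunity Act", 1964), ("Education Act", 1965)])
```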


© 2024 DadOverflow.com
