Musings of a dad with too much time on his hands and not enough to do. Wait. Reverse that.

Tag: powershell (Page 5 of 7)

Comparing files for backup

For particular reasons I won’t go into here, when it comes to backing up family photos–photos taken by my family’s digital cameras and various phones–I periodically pull all those files together and back them up under one common folder on my NAS, organized in year and month sub-directories.  The process works out well for the most part, but I occasionally have to deal with the problem of duplicate files–files I’ve already backed up once but that weren’t removed from the device in question–or, worse, files that share the same name but are entirely different pictures.
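That pull-together step can itself be scripted. Here’s a minimal sketch of how photos might be sorted into year/month sub-directories by their last-write time; the paths and folder layout are just illustrative placeholders, not my actual setup:

```powershell
# Copy photos from a device folder into year\month subfolders on the NAS
# (illustrative paths; adjust to your own layout)
$source = "J:\104APPLE"
$dest_root = "C:\my_nas_backup\Pictures"

gci $source -File | % {
    $year = $_.LastWriteTime.ToString("yyyy")
    $month = $_.LastWriteTime.ToString("MMMM")   # e.g. "July"; note this is culture-dependent
    $target = Join-Path $dest_root "$year\$month"
    if (-not (Test-Path $target)) { New-Item -ItemType Directory -Path $target | Out-Null }
    Copy-Item $_.FullName -Destination $target
}
```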

Compare-Object

One way to identify these problems is with PowerShell’s Compare-Object cmdlet.  Imagine this scenario: I have a folder of pictures I downloaded from one of my kids’ phones.  That folder contains pictures taken in both June and July, but I only want to determine if there are any new July pictures that I need to back up.  I can run the following at the PowerShell command prompt:


PS > $backed_up_files = gci "C:\my_nas_backup\Pictures\2018\July"
PS > $files_to_be_backed_up = gci "J:\104APPLE" | ? {$_.LastWriteTime -ge "2018-07-01" -and $_.LastWriteTime -lt "2018-08-01"}
PS > $c = compare -ReferenceObject $files_to_be_backed_up -DifferenceObject $backed_up_files
PS > $c | ? {$_.SideIndicator -eq "<="}

This produces a list of files that are not presently backed up, but I have two problems with this approach:

  1. The Compare-Object cmdlet can be slow, often taking several seconds to run, particularly if you have a few thousand photos to examine and
  2. More importantly, if you have photos that share the same filename yet are completely different photos, Compare-Object doesn’t seem to be smart enough to see the difference and point this out.
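One way around that second problem is to compare files by content rather than by name. PowerShell 4.0 and later ship a Get-FileHash cmdlet for exactly this; here’s a rough sketch using the same folder paths as above:

```powershell
# Hash both folders, then compare on the hash value instead of the file name
$backed_up = gci "C:\my_nas_backup\Pictures\2018\July" | Get-FileHash
$candidates = gci "J:\104APPLE" | Get-FileHash

# Files whose *content* isn't in the backup, regardless of file name
compare -ReferenceObject $candidates -DifferenceObject $backed_up -Property Hash |
    ? { $_.SideIndicator -eq "<=" }
```

Hashing every file adds time, of course, but it catches same-name/different-content photos that a name-based comparison misses.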

Enter Robocopy, FTW

There are lots of hidden gems in Windows, not the least of which is Robust File Copy (robocopy for short).  Robocopy is a fast and clever command line tool for copying folders and files and a great solution for some of your more challenging backup needs.  Leveraging some help from this post, I constructed the following robocopy command to see what July photos still needed backing up:


C:\>robocopy "J:\104APPLE" "C:\my_nas_backup\Pictures\2018\July" /l /ns /ndl /njs /njh /xx /minage:20180801 /maxage:20180701 /log:rc_july.log

Lots of command line arguments there!  Here are their explanations:

  • /l – List only; don’t copy (this just lists out the files robocopy would normally want to copy over)
  • /ns – don’t log file sizes
  • /ndl – don’t log directory names (I’m only looking at files, not directories, anyway)
  • /njs – No Job Summary (trying to keep my log file trim)
  • /njh – No Job Header
  • /xx – eXclude eXtra files and directories; “extra” files are those present in the destination (my backup folder) but not in the source.  For this particular problem, I only want to know about files I’ve yet to back up.
  • /minage – exclude files newer than n days/date
  • /maxage – exclude files older than n days/date (I only want photos taken in July, so I set these two values to bracket the month)
  • /log – output status to LOG file (the results of the comparison will be written to this file)

Robocopy applies different labels to the files it examines.  In my case, it applied the “extra” label to photos already in my backup directory.  Since I’m not concerned about these, I used the “xx” argument to suppress that information.  Next, it applied the label “new” to files in the phone directory but not in the backup directory.  Those are photos I definitely need to add to the backup.  Finally, it applied the label “newer” to photos in both directories that share the same file names but are completely different photos.  Sweet!
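Since the results land in a log file, PowerShell can pick out the interesting lines afterward. Here’s a small sketch; it assumes the log tags files with strings like “New File” and “Newer”, which is what I see on recent Windows versions, though the exact text may vary:

```powershell
# Pull the "New File" and "Newer" entries out of the robocopy log
Select-String -Path "rc_july.log" -Pattern "New File|Newer" | % { $_.Line.Trim() }
```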

All this and robocopy ran in under a second.  I think this will be my go-to tool going forward when comparing photos for backup.

More handy PowerShell snippets

In another installment of “handy PowerShell snippets”, I offer a few more I’ve used on occasion:

Comparing documents in PowerShell

WinMerge is a great tool for identifying differences between files, but if you want to automate such a process, PowerShell’s Compare-Object is an excellent choice.

Step 1: Load the documents you wish to compare


$first_doc = cat "c:\somepath\file1.txt"
$second_doc = cat "c:\somepath\file2.txt"

Step 2: Perform your comparison.
Note that Compare-Object returns a “<=” side indicator when a given value is found in the first file but not the second, a “=>” when a value is found in the second file but not the first, and, if you add the -IncludeEqual switch, a “==” when a value is found in both files.


$items_in_first_list_not_found_in_second = ( Compare-Object -ReferenceObject $first_doc -DifferenceObject $second_doc | where { $_.SideIndicator -eq "<=" } | % { $_.InputObject } )

Step 3: Analyze your results and profit!

One note of warning: In my experience, Compare-Object doesn’t do well comparing nulls. To avoid these circumstances, when I import the files I wish to compare, I’ll explicitly remove such troublesome values.


$filtered_doc = ( Import-Csv "c:\somepath\somedoc.csv" | where { $null -ne $_.SomeCol } | % { $_.SomeCol } )

Join a list of items into a single, comma-delimited line

Sometimes I’ll have a list of items in a file that I’ll need to collapse into a single, delimited line. Here’s a one-liner that will do that:


(cat "c:\somepath\somefile.csv") -join ","

Use a configuration file with a PowerShell script

A lot of times, PowerShell devs will either declare all their variables at the top of their scripts or in some sort of a custom configuration file that they load into their scripts. Here’s another option: how about leveraging the .NET framework’s configuration system?

If you’ve ever developed a .NET application, you’re already well aware of how to use configuration files. You can actually use that same strategy with PowerShell. For example, suppose you’ve built up a configuration file like so:


<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key="test_key" value="dadoverflow.com is awesome and i'm going to tell my friends all about it"/>
  </appSettings>
</configuration>

You can then load that config file into your PowerShell script with the following:


Add-Type -AssemblyName System.Configuration  # the ConfigurationManager type lives in this assembly, which isn't loaded by default

$script_path = $MyInvocation.MyCommand.Path
$my_config = [System.Configuration.ConfigurationManager]::OpenExeConfiguration($script_path)
$my_config_val = $my_config.AppSettings.Settings.Item("test_key").Value

One note: your PowerShell script and config file will need to share the same name. If your PowerShell script is called dadoverflow_is_awesome.ps1, then you’ll want to name your config file dadoverflow_is_awesome.ps1.config.

Here’s a bonus: Yes, it might be easier to just declare your variables at the top of your file and forgo the extra work of crafting such a config file. However, what if one of your configuration values is a password? By leveraging .NET’s configuration system you also get the power to encrypt values in your config file and hide them from prying eyes…but that’s a discussion that merits its own blog post, so stay tuned.

Handy PowerShell snippets

I code a fair amount with PowerShell at work and home and find myself reusing different snippets of code from script-to-script.  Here are a few handy ones I like to keep around.

Get the directory of the executing script

Having the directory of the executing script can be handy to load adjacent resources or as a location to which to write logs or other data:


$ExecutionDir = Split-Path $MyInvocation.MyCommand.Path  # in PowerShell 3.0 and later, the automatic variable $PSScriptRoot gives you this directly

Dynamically add a new column to a CSV imported into a collection

Many times you need to add one or more columns to a data file you’re working on. Here’s a way to load your data file and add those other columns in one line:


Import-Csv "C:\somepath\some.csv" | select *, @{Name='my_new_column'; Expression={'some value'}}

Test whether an object is an object or an array

One thing I find frustrating with PowerShell is that when you retrieve an object, say through a web request or by filtering a collection, you don’t necessarily know the datatype of the result set: you could have an array of objects or a single object. The problem is that the available properties differ between arrays and single objects. If you try to read “Count” on a single object, for example, PowerShell will throw an exception. To keep my scripts from crashing, then, I’ll use code like what I have below to test the datatype of my object before continuing on:


if($null -ne $myObj){
    if($myObj.GetType().IsArray){
        # $myObj is a collection, so deal with it as such
    }else{
        # $myObj is a single object
    }
}
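An alternative, if you’d rather not branch at all, is to force the result into an array up front with the array subexpression operator @(), which wraps a single object in an array and leaves an existing array alone. A quick sketch with placeholder paths:

```powershell
# @() guarantees an array, so .Count and indexing always work
$results = @( gci "C:\somepath" -Filter "*.csv" | ? { $_.Length -gt 0 } )
"Found {0} file(s)" -f $results.Count
```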

Add attributes to an XML document

Manipulating XML documents can be a real pain. Here’s an easy way to add an attribute to an XML element:


$x = [xml]"<top_level_element/>"
$x.DocumentElement.SetAttribute("my_attribute", "some value")
$x.OuterXml

Upload a document to a Sharepoint document library

I suspect there are probably easier ways to do this with the Sharepoint web APIs, but here’s a technique I’ve used in the past to upload a document to a document library in Sharepoint:


$web_client = New-Object System.Net.WebClient
$web_client.Credentials = [System.Net.CredentialCache]::DefaultCredentials
$my_file = gci "C:\somepath\somefile.txt"
$web_client.UploadFile( ("http://some_sharepoint_domain/sites/some_site/Shared Documents/{0}" -f $my_file.Name), "PUT", $my_file.FullName )

Use the HTML Agility Pack to parse an HTML document

Parsing HTML is the worst! In the Microsoft world, some genius came up with the HTML Agility Pack allowing you to effectively convert your HTML page into XML and then use XPath query techniques to easily find the data you’re interested in:


Add-Type -Path "C:\nuget_packages\HtmlAgilityPack.1.8.4\lib\Net40\HtmlAgilityPack.dll"

$hap_web = New-Object HtmlAgilityPack.HtmlWeb
$html_doc = $hap_web.Load("https://finance.yahoo.com/")
$xpath_qry = "//a[contains(@href, 'DJI')]"
$dow_data = $html_doc.DocumentNode.SelectNodes($xpath_qry)
$dow_stmt = ($dow_data.Attributes | ? {$_.Name -eq "aria-label"}).Value

Convert one collection to another (and guarantee column order)

First, imagine you have a collection of complex objects, say, a JSON document with lots of nesting. You want to try to pull out just the relevant data elements and flatten the collection to a simple CSV. This snippet will allow you to iterate through that collection of complex objects and append simplified records into a new collection. Another problem I’ve found is that techniques like Export-Csv don’t always guarantee that the columns in the resulting CSV will be in the same order you added them in your PowerShell script. If order is important, the pscustomobject is the way to go:


$col2 = @()
$col1 | %{ $col2 += [pscustomobject]@{"column1"=$_.val1; "column2"=$_.val2} }

Load multiple CSV files into one collection

It’s not uncommon to have multiple data files that you need to load into one collection to work on. Here’s a technique I use for that situation:


$col = @()
dir "C:\somepath" -Filter "somefile*.csv" | % { $col += Import-Csv $_.FullName }

# If you need to filter out certain files, try this:
dir "C:\somepath" -Filter "somefile*.csv" | ? { $_.Name -notmatch "excludeme" } | % { $col += Import-Csv $_.FullName }

Parse a weird date/time format

It’s inevitable that you’ll run into a non-standard date/time format that you’ll have to parse. Here’s a way to handle that:


$date = [datetime]::ParseExact( "10/Jun/18", "dd/MMM/yy", $null )


© 2024 DadOverflow.com
