For particular reasons I won’t go into here, when it comes to backing up family photos (photos taken by my family’s digital cameras and various phones), I try to pull all those files together periodically and back them up under one common folder on my NAS, organized in year and month sub-directories.  The process works out well for the most part, but I occasionally have to deal with duplicate files (files I’ve already backed up once but that weren’t removed from the device in question) or, worse, files that share the same name but are entirely different pictures.

Compare-Object

One way to identify these problems is with PowerShell’s Compare-Object cmdlet.  Imagine this scenario: I have a folder of pictures I downloaded from one of my kids’ phones.  That folder contains pictures taken in both June and July, but I only want to determine whether there are any new July pictures that I need to back up.  I can run the following at the PowerShell command prompt:


PS > $backed_up_files = gci "C:\my_nas_backup\Pictures\2018\July"
PS > $files_to_be_backed_up = gci "J:\104APPLE" | ? {$_.LastWriteTime -ge "2018-07-01" -and $_.LastWriteTime -lt "2018-08-01"}
PS > # Compare on Name explicitly rather than on each object's default string form
PS > $c = compare -ReferenceObject $files_to_be_backed_up -DifferenceObject $backed_up_files -Property Name
PS > $c | ? {$_.SideIndicator -eq "<="}

This produces a list of files that are not presently backed up, but I have two problems with this approach:

  1. The Compare-Object cmdlet can be slow, often taking several seconds to run, particularly if you have a few thousand photos to examine.
  2. More importantly, if two photos share the same filename yet are completely different pictures, Compare-Object won’t point this out: a comparison based on file names has no way of seeing that the contents differ.
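If you’d rather stay in PowerShell, one way to tackle both problems is to skip Compare-Object, index the backup folder by file name in a hashtable (fast lookups), and compare content hashes with Get-FileHash (SHA256 by default) only for names that exist in both folders. This is a minimal sketch of that swapped-in technique under stated assumptions, not a drop-in replacement for the robocopy approach; the function name Find-PhotosToBackUp and its parameters are hypothetical, and hashing large photo files still takes some time.

```powershell
# Hypothetical helper (not a built-in cmdlet): reports candidate files as
# "New" (name not in the backup folder) or "Conflict" (same name, different
# content, judged by SHA256 hash). Identical files are not reported.
function Find-PhotosToBackUp {
    param(
        [Parameter(Mandatory)] [string]   $SourceDir,  # e.g. the phone folder
        [Parameter(Mandatory)] [string]   $BackupDir,  # e.g. the NAS month folder
        [Parameter(Mandatory)] [datetime] $From,       # inclusive lower bound
        [Parameter(Mandatory)] [datetime] $To          # exclusive upper bound
    )

    # Index backed-up files by name; PowerShell hashtables are case-insensitive,
    # which matches how Windows treats file names.
    $backedUp = @{}
    Get-ChildItem -LiteralPath $BackupDir -File | ForEach-Object { $backedUp[$_.Name] = $_ }

    $candidates = Get-ChildItem -LiteralPath $SourceDir -File |
        Where-Object { $_.LastWriteTime -ge $From -and $_.LastWriteTime -lt $To }

    foreach ($file in $candidates) {
        $existing = $backedUp[$file.Name]
        if ($null -eq $existing) {
            [pscustomobject]@{ Status = 'New'; Name = $file.Name }
        }
        else {
            # Same name in both folders: only hash these pairs.
            $srcHash = (Get-FileHash -LiteralPath $file.FullName).Hash
            $dstHash = (Get-FileHash -LiteralPath $existing.FullName).Hash
            if ($srcHash -ne $dstHash) {
                [pscustomobject]@{ Status = 'Conflict'; Name = $file.Name }
            }
        }
    }
}

# Usage with the folders from the example above:
# Find-PhotosToBackUp -SourceDir 'J:\104APPLE' -BackupDir 'C:\my_nas_backup\Pictures\2018\July' -From '2018-07-01' -To '2018-08-01'
```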

Enter Robocopy, FTW

There are lots of hidden gems in Windows, not the least of which is Robust File Copy (robocopy for short).  Robocopy is a fast and clever command line tool for copying folders and files and a great solution for some of your more challenging backup needs.  Leveraging some help from this post, I constructed the following robocopy command to see what July photos still needed backing up:


C:\>robocopy "J:\104APPLE" "C:\my_nas_backup\Pictures\2018\July" /l /ns /ndl /njs /njh /xx /minage:20180801 /maxage:20180701 /log:rc_july.log

That’s a lot of command-line arguments!  Here are their explanations:

  • l – List only – don’t copy (this will just list out the files robocopy would normally want to copy over)
  • ns – don’t log file sizes
  • ndl – don’t log directory names (I’m only looking at files, not directories, anyway)
  • njs – No Job Summary (trying to keep my log file trim)
  • njh – No Job Header
  • xx – eXclude eXtra files and directories; “extra” files are those that exist in the destination (my backup folder) but not in the source (the phone folder).  For this particular problem, I only want to know about files I’ve yet to back up.
  • minage – exclude files newer than n days/date
  • maxage – exclude files older than n days/date (I only want photos taken in July, so I need to set these values accordingly)
  • log – output status to LOG file (the results of the comparison will be written to this file)
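Because /maxage and /minage take dates in YYYYMMDD form, checking another month means editing both values (and the log name) by hand. A small PowerShell sketch that builds the same argument list for any month might look like the following; the function name Get-MonthRobocopyArgs, its parameters, and the rc_YYYY_MM.log naming are hypothetical, and it assumes the same flags as the command above.

```powershell
# Hypothetical helper: builds the robocopy argument list used above for one month.
function Get-MonthRobocopyArgs {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination,
        [Parameter(Mandatory)] [int] $Year,
        [Parameter(Mandatory)] [ValidateRange(1, 12)] [int] $Month
    )
    $inv   = [cultureinfo]::InvariantCulture   # avoid culture-specific calendars
    $first = [datetime]::new($Year, $Month, 1)
    $next  = $first.AddMonths(1)               # December rolls over to January

    # /maxage excludes files older than the first of the month;
    # /minage excludes files newer than the first of the following month.
    @(
        $Source, $Destination,
        '/l', '/ns', '/ndl', '/njs', '/njh', '/xx',
        "/minage:$($next.ToString('yyyyMMdd', $inv))",
        "/maxage:$($first.ToString('yyyyMMdd', $inv))",
        "/log:rc_$($first.ToString('yyyy_MM', $inv)).log"
    )
}

# Usage (array splatting passes each element as a separate argument):
# $rcArgs = Get-MonthRobocopyArgs -Source 'J:\104APPLE' -Destination 'C:\my_nas_backup\Pictures\2018\July' -Year 2018 -Month 7
# robocopy @rcArgs
```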

Robocopy applies different labels to the files it examines.  In my case, it applied the “extra” label to photos in my backup directory that aren’t on the phone.  Since I’m not concerned about these, I used the “xx” argument to suppress that information.  Next, it applied the label “new” to files in the phone directory but not in the backup directory.  Those are photos I definitely need to add to the backup.  Finally, it applied the label “newer” to photos in both directories that share the same file names but are completely different photos.  (Robocopy makes this call from file timestamps and sizes rather than contents, so a same-name conflict can also show up as “older” or “changed”.)  Sweet!
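To act on the results, you could group the log entries by label. This sketch assumes each relevant log line consists of a label (New File, Newer, Older, Changed, or *EXTRA File), whitespace, and then the file name, which is roughly what the flags above produce; the exact spacing may differ on your system, so treat the regular expression as a starting point. The function name Get-RobocopyLogSummary is hypothetical.

```powershell
# Hypothetical helper: turns robocopy /l log lines into Label/Name objects.
function Get-RobocopyLogSummary {
    param([string[]] $Lines)   # not Mandatory, so blank log lines are accepted

    # Assumed line shape: optional whitespace, a label, whitespace, the file name.
    $pattern = '^\s*(New File|Newer|Older|Changed|\*EXTRA File)\s+(\S.*?)\s*$'

    foreach ($line in $Lines) {
        if ($line -match $pattern) {
            [pscustomobject]@{ Label = $Matches[1]; Name = $Matches[2] }
        }
        # Lines that don't match (blank lines, headers) are skipped.
    }
}

# Usage with the log from the command above:
# Get-RobocopyLogSummary -Lines (Get-Content rc_july.log) | Group-Object Label
```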

All this, and robocopy ran in well under a second.  I think this will be my go-to tool going forward when comparing photos for backup.