When I upload a photo to this blog, WordPress creates lower-resolution versions of the photo and stores them along with the original. It serves one of the lower-resolution versions when the browser window is small enough that the full-size photo isn’t needed, which saves bandwidth.
As an example, this photo:
is named IMG_0185.jpg. When I uploaded it, WordPress created files with names like IMG_0185-150x150.jpg and IMG_0185-1024x768.jpg for versions with those resolutions.
I wanted to get a list of all of the original photos in the library, without the low-resolution variants, and use exiftool to extract some information. It’s a fairly straightforward process.
Step 1: create a list of all the photos in the WordPress directory by using the Linux find command, like so:
find .
Step 2: use a regular expression to filter out all of the variants, all of which have a string of the form - followed by a number, followed by x, followed by a number, followed by . and jpeg, jpg, or png. Easy enough:
find . | egrep -v '.*-[0-9]+x[0-9]+\.(jpe?g|png)'
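The filter can be checked without touching the real library by running it over a few sample names. This is only a sketch: the paths in the sample are made up, and it uses grep -E (the current spelling of egrep) with a trailing $ so the pattern only matches at the end of a name.

```shell
# Hypothetical sample of find output: one original, its generated variants,
# and a file whose name contains a dash and a number but is not a variant.
sample='./2020/05/IMG_0185.jpg
./2020/05/IMG_0185-150x150.jpg
./2020/05/IMG_0185-1024x768.jpg
./2020/05/diagram-2.png'

# -v keeps the lines that do NOT match; -- stops grep from reading the
# leading dash of the pattern as an option.
originals=$(printf '%s\n' "$sample" | grep -Ev -- '-[0-9]+x[0-9]+\.(jpe?g|png)$')

# Prints the two names that are not size variants.
printf '%s\n' "$originals"
```

Only IMG_0185.jpg and diagram-2.png survive, since neither ends in the -NUMBERxNUMBER suffix.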
Step 3: feed that result into exiftool, collect my information, and profit!
There was only one problem: exiftool processed every file in the directory, even the ones that weren’t in the output of Step 2.
I couldn’t figure out what was wrong. Eventually, I realized that the output of find included a line with a single dot, meaning the current directory. That line doesn’t match the variant pattern, so the filter passed it straight through, and when exiftool saw it, it processed all the files in the current directory.
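The stray dot is easy to reproduce in a scratch directory. The sketch below assumes a POSIX shell with mktemp available; the file name is just a placeholder.

```shell
# Build a throwaway directory holding a single empty file.
demo_dir=$(mktemp -d)
touch "$demo_dir/IMG_0185.jpg"

# find prints its starting point as well as what is inside it,
# so "." shows up as a line of its own.
find_output=$(cd "$demo_dir" && find . | sort)
printf '%s\n' "$find_output"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

The output has two lines: the dot for the directory itself, and ./IMG_0185.jpg.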
I spent quite a while trying to improve my regular expression to remove the line with the single dot before I realized there was a simple solution: add -type f to the find command so that it lists only regular files, not directories. In particular, it no longer produces the line with the single dot, so I don’t have to filter it out to keep exiftool from getting confused.
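Put together, the whole thing might look like the sketch below. The function name list_originals is made up for illustration, the demo file names are placeholders, and the exiftool line at the end is an assumption about how the list gets handed over (the tags shown are only examples), so treat it as one possible arrangement rather than the exact commands used here.

```shell
# List regular files only, then drop the WordPress-generated size variants.
list_originals() {
    find "${1:-.}" -type f | grep -Ev -- '-[0-9]+x[0-9]+\.(jpe?g|png)$'
}

# Demo on a throwaway directory with placeholder file names.
demo_dir=$(mktemp -d)
touch "$demo_dir/IMG_0185.jpg" \
      "$demo_dir/IMG_0185-150x150.jpg" \
      "$demo_dir/IMG_0185-1024x768.jpg"
found=$(cd "$demo_dir" && list_originals .)
printf '%s\n' "$found"
rm -rf "$demo_dir"

# One way to hand the list to exiftool (assumes no newlines in file names):
#   list_originals . | tr '\n' '\0' | xargs -0 exiftool -FileName -ImageSize
```

With -type f in place there is no dot line to leak through, so the only name left in the demo is ./IMG_0185.jpg.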
Regular expressions are a wonderful tool, but they’re not always the right tool for the job.