2020-07-06 Auto renaming image files

Like probably everyone else, I have lots of pictures from digital camera(s). Cataloging them is basically a nightmare, and I lost any hope for doing that manually a long time ago.

But why not make the computer do as much work as possible? First of all, my camera has a stupid habit of naming my pictures like IMG1234.JPG etc. While it puts pictures from different days in different directories, they have similarly nonsensical names.

Some time ago, I set out to fix that, and came up with a somewhat Rube-Goldberg-ish solution – however, it works well enough for me. I share it here in the hope that someone might make use of it (or parts of it). Also, I learned quite a bit about shell scripting while writing it, so why not teach others the tricks I learned?

So, first the script. It accepts a directory name, and it assumes that it contains some pictures (and possibly videos). It does not descend recursively into the directory. Also, I don’t guarantee anything about it – it might rename your files the wrong way, delete them, make your computer explode or even install Windows.

OTOH, I ran ShellCheck on it, and it almost passed, so I hope it should work.

#!/bin/bash
# Accepts a directory name and renames all image files within that
# directory to a format like
# YYYY-MM-DD--HH-MM-SS[-number][-place].jpg
# Then, renames the directory to YYYY-MM-DD or YYYY-MM-DD--YYYY-MM-DD.

shopt -s nocaseglob
unset CDPATH

exiftool '-FileName<CreateDate' -d %Y-%m-%d--%H-%M-%S%%-c.%%e -progress "$1"
cd "$1" || exit
for f in *.jpg
do
    LAT=$(exiftool -s3 -c "%+.6f" "$f" -GPSLatitude)
    LON=$(exiftool -s3 -c "%+.6f" "$f" -GPSLongitude)
    BNAME=${f%.*}
    EXT=${f##*.}
    GEO=$(cli-rg.sh "$LAT" "$LON" | iconv -f utf-8 -t ascii//translit)
    echo LAT="$LAT", LON="$LON", GEO="$GEO", BNAME="$BNAME", EXT="$EXT".
    if [ -n "$GEO" ]; then
	mv -n "$f" "$BNAME"-"$GEO"."$EXT"
    fi
done
cd ..
mv -n "$1" "$(ls -f "$1" | \
	 grep '^[[:digit:]]\{4\}-[[:digit:]]\{2\}-[[:digit:]]\{2\}' | \
	 sort | \
	 cut -d- -f1-3 | \
	 sed -n '1p;$p' | \
	 uniq | \
	 paste -d% - - | \
	 sed 's/%$//' | \
	 sed 's/%/--/')"

I really hope you like it (especially the one-liner at the end, which was broken into several lines for convenience) ;-). Here’s a bit of explanation.

The shopt -s nocaseglob makes *.jpg match both a.jpg and B.JPG (and also possible variations, like c.JpG, which I guess nobody uses).
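Here is a quick way to see the effect, as a minimal sketch in a scratch directory. The file names and the count_jpgs helper are made up for illustration, and nullglob is my addition here (the script above does not set it); it only makes the "no match" case count as zero.

```shell
#!/bin/bash
# Counts the files matching *.jpg in directory "$1".
# Pass "nocase" as "$2" to turn on nocaseglob first.
count_jpgs() {
    # Runs in a subshell so the cd and the shopt settings do not leak out.
    (
        cd "$1" || exit
        shopt -s nullglob            # an unmatched pattern expands to nothing
        if [ "$2" = nocase ]; then
            shopt -s nocaseglob      # make the glob case-insensitive
        fi
        files=(*.jpg)
        echo "${#files[@]}"
    )
}

demo_dir=$(mktemp -d)
touch "$demo_dir/a.jpg" "$demo_dir/B.JPG" "$demo_dir/c.JpG" "$demo_dir/notes.txt"
count_jpgs "$demo_dir"           # case-sensitive: only a.jpg matches, prints 1
count_jpgs "$demo_dir" nocase    # case-insensitive: all three images match, prints 3
rm -r "$demo_dir"
```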

Since we will use cd in a minute, we unset CDPATH. (Yes, using cd in a shell script is probably not the best practice, but it’s convenient. It could be easily avoided here, but – frankly – I don’t bother.)
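To see why CDPATH matters, here is a small sketch (the directory names and the where_does_cd_go helper are hypothetical). When CDPATH is set and does not start with the current directory, cd with a relative name looks in the CDPATH directories first, so it can land somewhere unexpected.

```shell
# Prints the directory that "cd TARGET" ends up in.
#   $1: value for CDPATH (empty means: unset it)
#   $2: directory to start from
#   $3: relative TARGET passed to cd
where_does_cd_go() {
    (
        cd "$2" || exit
        if [ -n "$1" ]; then
            CDPATH=$1
        else
            unset CDPATH
        fi
        cd "$3" >/dev/null || exit   # cd prints the new path when CDPATH was used
        pwd
    )
}

demo_root=$(mktemp -d)
mkdir -p "$demo_root/elsewhere/photos" "$demo_root/here/photos"
where_does_cd_go "$demo_root/elsewhere" "$demo_root/here" photos   # ends up in elsewhere/photos
where_does_cd_go "" "$demo_root/here" photos                        # ends up in here/photos
rm -r "$demo_root"
```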

The exiftool invocation does the renaming of all the image (and video) files into the ISO date. (Well, it’s not 100% ISO 8601-compliant, because of the two hyphens between the date and the time, but this is much more legible.)

We then cd to our directory for convenience.

While exiftool can work on all files in a directory (it can even do it recursively with the -r option, which I don’t use here), the next part is much less efficient. What we do here is to run the cli-rg.sh script, which is a command-line interface to a reverse geocoding utility I found on the internet. Interestingly, after I wrote my script, I found another such utility, which at first glance seems slightly better. Also, it feels a bit embarrassing to use a NodeJS utility in a bash script like this. Actually, I would really like to rewrite my script in JS, Python or maybe even Lua, since now the overhead of starting the geocoder script for every single file is really annoying. Still, I decided to publish this proof-of-concept script anyway, since it has some really interesting parts.

One of them is the definition of the BNAME and EXT variables. The former, ${f%.*}, deletes the shortest substring matching .* (i.e., the extension) from the end of the string in the variable $f. The latter, ${f##*.}, deletes the longest substring matching *. (i.e., everything up to the last period) from the beginning, thus resulting in only the extension. (You can read about these and other string manipulation capabilities of bash in the relevant chapter of the famous Advanced Bash Scripting Guide under Substring Removal.)
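A minimal sketch to try these two expansions on a few names (split_name is a hypothetical helper, not part of the script above):

```shell
# Split a file name into base name and extension using only parameter expansion.
split_name() {
    f=$1
    bname=${f%.*}     # drop the shortest suffix matching ".*"
    ext=${f##*.}      # drop the longest prefix matching "*."
    printf '%s|%s\n' "$bname" "$ext"
}

split_name "2020-07-06--12-34-56.jpg"    # prints 2020-07-06--12-34-56|jpg
split_name "holiday.2020.JPG"            # prints holiday.2020|JPG
split_name "README"                      # prints README|README
```

Note the last case: when there is no period at all, neither pattern matches, so both expansions return the whole name. That is harmless in the script above, because the loop only sees names matching *.jpg.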

We then run a simple shell wrapper around the equally simple NodeJS wrapper of the abovementioned Offline Geocoder. D’oh. Renaming the file is now a breeze. (This is another way in which my script is inefficient – it runs the geocoder even if there are no lat/lon data. I don’t care about it, though, since I will rewrite it anyway. Some day.) By the way, here are these wrappers:

#!/bin/sh
# cli-rg.sh
cd /home/mbork/works/marcin/programming/cli-reverse-geocode/ || exit
node cli-rg.js "$@"

// cli-rg.js
const geocoder = require('offline-geocoder')({database: 'db.sqlite'});
var lat = parseFloat(process.argv[2]);
var lon = parseFloat(process.argv[3]);
geocoder.reverse(lat, lon)
  .then(function(result) {
    console.log(result.name || '')
  })
  .catch(function(error) {
    process.exit(1);
  });
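The inefficiency mentioned above (running the geocoder even for files without lat/lon data) could be avoided with a small guard. This is only a sketch: geocode_if_possible is a hypothetical helper, and fake_geocoder stands in for cli-rg.sh so that the example runs anywhere.

```shell
# Call a geocoder command only when both coordinates are present.
#   $1: the command to run (cli-rg.sh in the real script)
#   $2, $3: latitude and longitude as printed by exiftool (possibly empty)
geocode_if_possible() {
    if [ -z "$2" ] || [ -z "$3" ]; then
        return 0          # no GPS data: print nothing, skip the expensive call
    fi
    "$1" "$2" "$3"
}

# Stand-in for the real geocoder.
fake_geocoder() { echo "Somewhere"; }

geocode_if_possible fake_geocoder "+52.400000" "+16.900000"   # prints Somewhere
geocode_if_possible fake_geocoder "" ""                        # prints nothing
```

In the loop, the GEO line would then start with geocode_if_possible cli-rg.sh "$LAT" "$LON" and pipe into iconv as before.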

And now comes the fun part. We want to extract the first and last date of the files in the directory, and rename the directory to that date (if they are equal) or to a range of dates (otherwise). We first list all the files whose names begin with the YYYY-MM-DD pattern. (ShellCheck complains about the ls ... | grep ... idiom, but unless you have some strange ls or file names containing newlines, this should be fine.) We sort them and cut out only the “date” part. The sed invocation removes everything but the first and the last lines, effectively leaving only the first and last dates of all the files in our directory. If it happens that these dates are equal, uniq reduces them to one line.

Then, paste joins the two lines with a % sign (I could have used any character I knew was not in the dates). This is probably the trickiest part. paste can only join lines with a single-character delimiter (so we first use % and then replace it with a double hyphen). Also, the primary use for paste is merging two or more files line by line. For instance, if we have two files, abcd and xyz, containing the respective letters in consecutive lines, the result of paste -d: abcd xyz is

a:x
b:y
c:z
d:

The trick is that when we give - as both files, paste reads alternating lines from stdin. (This is not a peculiarity of paste, but apparently the way stdin works with other coreutils commands, too – if - is given more than once, only one file descriptor is used, so reading a line for one - consumes it, and the next - gets the following line.) Thus, - - lets us read pairs of lines.
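This is easy to try out without any files. In the sketch below, pair_lines is just a hypothetical name for the paste call used in the script:

```shell
# Join consecutive pairs of stdin lines with "%".
pair_lines() {
    paste -d% - -
}

printf '%s\n' 2020-07-01 2020-07-06 | pair_lines   # prints 2020-07-01%2020-07-06
printf '%s\n' 2020-07-06 | pair_lines              # prints 2020-07-06%
printf '%s\n' a b c | pair_lines                   # prints a%b and then c%
```

When the number of lines is odd, the last one has no partner, so paste still emits the delimiter after it. That is where the trailing % in the single-date case comes from.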

The rest is easy: we delete the % signs at the ends of lines (together with uniq, this ensures there won’t be stuff like 2000-01-01--2000-01-01 or 2000-01-01--, only 2000-01-01), replace the remaining %’s with double hyphens, and finally rename our directory to the date or the range of dates.
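To experiment with the whole pipeline without touching any real directory, it can be wrapped in a function that reads file names from stdin instead of calling ls. This is a sketch with the same commands as in the script; date_range and the sample file names are made up.

```shell
# Reads file names on stdin, prints YYYY-MM-DD or YYYY-MM-DD--YYYY-MM-DD.
date_range() {
    grep '^[[:digit:]]\{4\}-[[:digit:]]\{2\}-[[:digit:]]\{2\}' |
        sort |
        cut -d- -f1-3 |      # keep only the YYYY-MM-DD part
        sed -n '1p;$p' |     # first and last line
        uniq |               # collapse them if they are equal
        paste -d% - - |      # join the (one or two) lines with %
        sed 's/%$//' |       # single date: drop the trailing %
        sed 's/%/--/'        # two dates: turn % into --
}

# Several days: prints 2020-07-01--2020-07-06
printf '%s\n' 2020-07-06--12-00-00.jpg 2020-07-01--08-00-00.jpg IMG1234.JPG | date_range

# A single day: prints 2020-07-06
printf '%s\n' 2020-07-06--12-00-00.jpg 2020-07-06--13-30-00-Poznan.jpg | date_range
```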

As I said above, this may (and should) be optimized in a few ways. But at least it is a nice starting point. Have fun!
