Blog

For the English part of the blog, see Content AND Presentation.

2020-07-06 Auto renaming image files

Like probably everyone else, I have lots of pictures from digital camera(s). Cataloging them is basically a nightmare, and I gave up any hope of doing that manually a long time ago.

But why not make the computer do as much work as possible? First of all, my camera has a stupid habit of naming my pictures like IMG1234.JPG etc. While it puts pictures from different days in different directories, they have similarly nonsensical names.

Some time ago, I set out to fix that, and came up with a somewhat Rube-Goldberg-ish solution – however, it works well enough for me. I share it here in the hope that someone might make use of it (or parts of it). Also, I learned quite a bit about shell scripting while writing it, so why not teach others the tricks I learned?

So, first the script. It accepts a directory name, and it assumes that it contains some pictures (and possibly videos). It does not descend recursively into the directory. Also, I don’t guarantee anything about it – it might rename your files in the wrong way, delete them, make your computer explode or even install Windows.

OTOH, I ran ShellCheck on it, and it almost passed, so I hope it should work.

#!/bin/bash
# Accepts a directory name and renames all image files within that
# directory to a format like
# YYYY-MM-DD--HH-MM-SS[-number][-place].jpg
# Then, renames the directory to YYYY-MM-DD or YYYY-MM-DD--YYYY-MM-DD.

shopt -s nocaseglob
unset CDPATH

exiftool '-FileName<CreateDate' -d %Y-%m-%d--%H-%M-%S%%-c.%%e -progress "$1"
cd "$1" || exit
for f in *.jpg
do
    LAT=$(exiftool -s3 -c "%+.6f" "$f" -GPSLatitude)
    LON=$(exiftool -s3 -c "%+.6f" "$f" -GPSLongitude)
    BNAME=${f%.*}
    EXT=${f##*.}
    GEO=$(cli-rg.sh "$LAT" "$LON" | iconv -f utf-8 -t ascii//translit)
    echo LAT="$LAT", LON="$LON", GEO="$GEO", BNAME="$BNAME", EXT="$EXT".
    if [ -n "$GEO" ]; then
	mv -n "$f" "$BNAME"-"$GEO"."$EXT"
    fi
done
cd ..
mv -n "$1" "$(ls -f "$1" | \
	 grep '^[[:digit:]]\{4\}-[[:digit:]]\{2\}-[[:digit:]]\{2\}' | \
	 sort | \
	 cut -d- -f1-3 | \
	 sed -n '1p;$p' | \
	 uniq | \
	 paste -d% - - | \
	 sed 's/%$//' | \
	 sed 's/%/--/')"

I really hope you like it (especially the one-liner at the end, which was broken into several lines for convenience) ;-). Here’s a bit of explanation.

The shopt -s nocaseglob makes *.jpg match both a.jpg and B.JPG (and also possible variations, like c.JpG, which I guess nobody uses).
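
Here is a minimal sketch of that effect, assuming bash (shopt is not available in plain sh). The function name demo_nocaseglob and the file names are made up for illustration; everything happens in a throwaway temporary directory.

```shell
# Hypothetical demo: count what *.jpg matches before and after nocaseglob.
demo_nocaseglob() (
    # The parentheses run the body in a subshell, so cd and shopt do not leak out.
    dir=$(mktemp -d) || exit
    cd "$dir" || exit
    touch a.jpg B.JPG c.JpG notes.txt
    shopt -s nullglob            # an unmatched glob expands to nothing
    set -- *.jpg
    echo "case-sensitive: $#"
    shopt -s nocaseglob          # from now on, globs ignore case
    set -- *.jpg
    echo "nocaseglob: $#"
    cd / && rm -rf "$dir"
)

demo_nocaseglob
# case-sensitive: 1
# nocaseglob: 3
```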

Since we will use cd in a minute, we unset CDPATH. (Yes, using cd in a shell script is probably not the best practice, but it’s convenient. It could be easily avoided here, but – frankly – I don’t bother.)

The exiftool invocation does the renaming of all the image (and video) files into the ISO date. (Well, it’s not 100% ISO 8601-compliant, because of the two hyphens between the date and the time, but this is much more legible.)

We then cd to our directory for convenience.

While exiftool can work on all files in a directory (it can even do it recursively with the -r option, which I don’t use here), the next part is much less efficient. What we do here is to run the cli-rg.sh script, which is a command-line interface to a reverse geocoding utility I found on the internet. Interestingly, after I wrote my script, I found another such utility, which at first glance seems slightly better. Also, it feels a bit embarrassing to use a NodeJS utility in a bash script like this. Actually, I would really like to rewrite my script in JS, Python or maybe even Lua, since now the overhead of starting the script for every single file is really annoying. Still, I decided to publish this proof-of-concept script anyway, since it has some really interesting parts.

One of them is the definition of the BNAME and EXT variables. The former, ${f%.*}, deletes the shortest substring matching .* (i.e., the extension) from the end of the string in the variable $f. The latter, ${f##*.}, deletes the longest substring matching *. (i.e., everything up to the last period) from the beginning, thus resulting in only the extension. (You can read about these and other string manipulation capabilities of bash in the relevant chapter of the famous Advanced Bash Scripting Guide, under Substring Removal.)
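
To see these expansions in isolation, here is a small sketch. The file names are invented for the example; the second one has two extensions to show the difference between the shortest and the longest match.

```shell
# A made-up file name in the format the script produces.
f="2020-07-06--12-30-00.JPG"
BNAME=${f%.*}     # shortest match of ".*" removed from the end
EXT=${f##*.}      # longest match of "*." removed from the beginning
echo "$BNAME"     # prints 2020-07-06--12-30-00
echo "$EXT"       # prints JPG

# With two extensions, % and %% (and # and ##) give different results.
g="archive.tar.gz"
echo "${g%.*}"    # prints archive.tar
echo "${g%%.*}"   # prints archive
echo "${g#*.}"    # prints tar.gz
echo "${g##*.}"   # prints gz
```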

We then run a simple shell wrapper around the equally simple NodeJS wrapper of the abovementioned Offline Geocoder. D’oh. Renaming the file is now a breeze. (This is another way in which my script is inefficient – it runs the geocoder even if there are no lat/lon data. I don’t care about it, though, since I will rewrite it anyway. Some day.) By the way, here are these wrappers:

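#!/bin/sh
# cli-rg.sh: run the NodeJS wrapper from its own directory,
# passing the latitude and longitude through unchanged.
cd /home/mbork/works/marcin/programming/cli-reverse-geocode/ || exit
node cli-rg.js "$@"

// cli-rg.js: print the place name for the given latitude and longitude.
const geocoder = require('offline-geocoder')({database: 'db.sqlite'});
const lat = parseFloat(process.argv[2]);
const lon = parseFloat(process.argv[3]);
geocoder.reverse(lat, lon)
  .then(function(result) {
    // Print an empty line when the result has no name.
    console.log(result.name || '')
  })
  .catch(function(error) {
    // Fail silently; the calling script then gets an empty place name.
    process.exit(1);
  });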
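
The inefficiency mentioned above (running the geocoder even when a picture has no GPS data) could probably be avoided with a small guard. This is only a sketch, not part of the script: geocode_if_present is a hypothetical helper, and echo stands in for the real lookup command so that the example runs anywhere.

```shell
# Hypothetical helper: run the given command with lat and lon appended,
# but only when both coordinates are non-empty.
geocode_if_present() {
    lat=$1
    lon=$2
    shift 2
    if [ -n "$lat" ] && [ -n "$lon" ]; then
        "$@" "$lat" "$lon"
    fi
}

# echo is a stand-in for the real geocoder wrapper here.
geocode_if_present "+52.406374" "+16.925168" echo   # prints +52.406374 +16.925168
geocode_if_present "" "" echo                       # prints nothing
```

In the loop, the call would then look something like GEO=$(geocode_if_present "$LAT" "$LON" cli-rg.sh | iconv -f utf-8 -t ascii//translit).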

And now comes the fun part. We want to extract the first and last date of the files in the directory, and rename the directory to that date (if they are equal) or to a range of dates (otherwise). We first list all the files whose names begin with the YYYY-MM-DD pattern. (ShellCheck complains about the ls ... | grep ... idiom, but unless you have some strange ls, this should be fine.) We sort them and cut out only the “date” part. The sed invocation removes everything but the first and the last lines, effectively leaving only the first and last dates of all the files in our directory. If it happens that these dates are equal, uniq reduces them to one line. Then, paste joins the two lines with a % sign (I could have used any character I knew was not in the dates).

This is probably the trickiest part. paste can only join lines with a single-character delimiter (so we first use % and then replace it with a double hyphen). Also, the primary use for paste is merging two or more files line by line. For instance, if we have two files, abcd and xyz, containing the respective letters on consecutive lines, the result of paste -d: abcd xyz is

a:x
b:y
c:z
d:

The trick is, when we give - as both files, paste reads alternating lines from stdin. (This is not a peculiarity of paste, but apparently the way stdin works with other coreutils commands, too – if - is given more than once, only one file descriptor is created, so every read consumes a line and moves on to the next one in the file.) Thus, - - lets us read pairs of lines.
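
A quick way to see this pairing is to feed paste a few lines from printf. The wrapper function pair_lines is just a name I made up for the example.

```shell
# Hypothetical wrapper: join consecutive pairs of stdin lines with a % sign.
pair_lines() {
    paste -d% - -
}

printf '%s\n' one two three four five | pair_lines
# one%two
# three%four
# five%
```

Note the dangling % after the unpaired last line; the same thing happens in the script when uniq leaves only a single date.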

The rest is easy: we delete the % signs at the ends of lines (so that a single date comes out as plain 2000-01-01, without a dangling separator), replace the remaining %’s with double dashes, and finally rename our directory to the date or the range of dates.
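
To try the whole pipeline without touching any real directory, it can be wrapped in a function that reads file names from stdin instead of from ls. This is a sketch for experimenting: date_range is a made-up name, and the file names are invented.

```shell
# Hypothetical wrapper around the pipeline from the script:
# reads file names on stdin, prints YYYY-MM-DD or YYYY-MM-DD--YYYY-MM-DD.
date_range() {
    grep '^[[:digit:]]\{4\}-[[:digit:]]\{2\}-[[:digit:]]\{2\}' |
        sort |
        cut -d- -f1-3 |      # keep only the date part
        sed -n '1p;$p' |     # first and last line only
        uniq |               # one line if both dates are equal
        paste -d% - - |      # join the (up to) two lines with %
        sed 's/%$//' |       # drop the dangling % of a single date
        sed 's/%/--/'        # turn the remaining % into a double hyphen
}

printf '%s\n' 2020-07-06--09-15-00.jpg 2020-07-04--18-02-11.jpg notes.txt | date_range
# 2020-07-04--2020-07-06
printf '%s\n' 2020-07-06--09-15-00.jpg 2020-07-06--21-40-05.jpg | date_range
# 2020-07-06
```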

As I said above, this may (and should) be optimized in a few ways. But at least it is a nice starting point. Have fun!

CategoryEnglish, CategoryBlog

2020-06-27 Selective display

A few days ago I was working with some large JSON files. The top level of the file contained a large array, and each of its elements was a large object consisting of numbers, strings and other objects. I needed to work with a few elements at the top level of those objects, and did not want the lower-level objects to get in the way. In other words, I needed to hide them somehow.

Some modes allow for hiding e.g. function bodies, leaving only their headers. (Org-mode is probably the best-known example.) Unfortunately, JSON mode is not one of them.

Selective display to the rescue. This is a very generic feature of Emacs, probably not known to everyone, which I have found useful on more than one occasion. It hides everything indented more than some level. The UI is not very good – instead of using the point to designate the threshold level, you need to use a numeric prefix argument – but since I use it very rarely anyway, I can live with that. So I just moved the point to the last column I still wanted to see, pressed C-x = to see which column it was (it was the 12th one), and then said C-u 13 C-x $. (Pressing C-x $ with no argument shows everything again.)

CategoryEnglish, CategoryBlog, CategoryEmacs

2020-06-22 lodash iteratee shorthand

As is widely known, JavaScript is a language with good semantics, not-so-good syntax and a terrible standard library. There are a few modules that aim to help with the last part, and lodash is one of them that I happen to use. It is a very nice thing, but is not necessarily easy to learn for newcomers. One of its nice features is so-called “iteratee shorthand”, mentioned many times in the docs. What is that? Well, Lodash has (among many others) the _.iteratee function. It accepts one argument and returns a function. If the argument is a function, _.iteratee just returns the same function – nothing interesting (and if given null, it returns the identity function).

The first interesting thing happens when the argument is a string or an integer. (This also happens with e.g. Booleans, although I’m not sure whether this should ever be used…) The _.iteratee function then creates a function which, handed an object, returns its property with the given name (or, when handed an array, returns its element with the given index). The string can also be a “path”, e.g. something of the form 'prop1.prop2.prop3', and this can be used to create functions which reach deeper into the object structure.

Things get even more interesting when _.iteratee is given some object obj. In such a case, it generates a function which returns true if its sole argument “matches” obj. (Here, “matches” means “has the same properties with the same values”. Of course, this is not a symmetric relation: {a: 1, b: 2} matches {a: 1}, but not the other way round.)

Unfortunately, we can’t use this “matching” technique to reach deeper into the object.

Now it is clear that _.iteratee can be useful in a lot of places. For instance, if we have an array A of arrays, and we want to get their first elements, we can say (in pure JS) A.map(a => a[0]). The direct lodash equivalent is _.map(A, a => a[0]) (not really better than vanilla JS), or – using _.iteratee – _.map(A, _.iteratee(0)) (which is even worse, since it is too verbose, at least for my taste). And here is the gist: many lodash functions (including _.map) implicitly wrap the relevant argument in _.iteratee. Thus, the idiomatic lodash version is actually _.map(A, 0), which is (finally) better (that is, shorter) than the vanilla JS solution.

CategoryEnglish, CategoryBlog, CategoryJavaScript
