2023-07-08 Finding a common prefix of a set of strings

Some time ago I needed a very specific thing. Given two strings, I wanted to find their longest common (initial) substring. For example, given abc and abd, I wanted ab. I thought that Emacs Lisp must have something like this already, and I wasn’t wrong. It turns out there are even two functions which can do that! One is fill-common-string-prefix, which accepts two strings and returns, well, their longest common prefix. I found it by searching the Elisp reference for the phrase “initial substring”, finding fill-context-prefix and looking at its source. Interestingly, you can also look for “common substring” in the reference and find try-completion. This function is even better, since it can easily find the longest common initial substring for a whole list of strings. Note that its docstring is a tiny bit misleading, since it first says about the “longest common substring”, but later explains that it really is the “the longest initial [common] substring”, which is not really the same.

When I have two functions doing (roughly) the same (well, they don’t really – try-completion does much more, but it can do what fill-common-string-prefix does), I figured that it’s worthwhile to see which one is faster. My guess was that try-completion should be the faster one, since it is written in C and fill-common-string-prefix is written in Elisp – but the difference seems negligible. In fact, fill-common-string-prefix is just a wrapper around compare-strings, yet another C function. I ran C-u 1000000 M-x benchmark on simple expressions involving each of them, and it turned out that finding the longest common initial substring takes about 350 nanoseconds on my (not very modern) laptop using either method. Since I need to do this once, in an interactive function doing a lot of I/O and calling several dozen (possibly even several hundred) external processes, this (of course) does not matter at all. (And if you are now wondering why I would call several hundred external processes in one Elisp command… well, be patient, I will blog about it when it’s done.)

Anyway, that’s it for today. As usual, when you want to do something string-related, Elisp has you covered, even if sometimes the functions are not necessarily easily discoverable – at least not by the name, since isearching through the Elisp reference really did help.

CategoryEnglish, CategoryBlog, CategoryEmacs