2018-12-03 looking-back-p

Warning: this was meant to be a short tip about a simple thing I needed (looking-back-p), but while I was writing it, it got way out of hand and became a long, technical post. Be prepared.

Emacs has a very useful function looking-at, which says whether the text right after point matches a given regex. Unfortunately, it modifies the match data, which are global state describing the last successful search. Because of that, I often prefer to use looking-at-p, which takes care not to mess with the match data.
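Here is a minimal sketch of the difference. The helper demo-preserves-match-data-p is hypothetical, written only for this illustration: it sets up some match data with looking-at, calls the given matcher at a place where it succeeds, and compares the match data before and after.

```elisp
;; Sketch: does a matcher leave the match data alone?
;; `demo-preserves-match-data-p' is a hypothetical helper.
(defun demo-preserves-match-data-p (matcher)
  "Return non-nil if calling MATCHER keeps the current match data.
MATCHER is called with the regexp \"bar\" at a place where it matches."
  (with-temp-buffer
    (insert "foo bar")
    (goto-char (point-min))
    (looking-at "foo")                  ; match data now describes "foo"
    ;; With INTEGERS non-nil, `match-data' returns positions (plus the
    ;; buffer) instead of markers, which are easy to compare.
    (let ((before (match-data t)))
      (goto-char 5)                     ; point is right before "bar"
      (funcall matcher "bar")           ; this match succeeds
      (equal before (match-data t)))))

(demo-preserves-match-data-p #'looking-at)    ; => nil
(demo-preserves-match-data-p #'looking-at-p)  ; => t
```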

Occasionally, I also need looking-back, which does a similar thing in the opposite direction: it checks whether the text before point matches (of course, it is much slower, but sometimes it is just what is needed). There is no looking-back-p, however. It is easy to write one, though, and here it is.

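(defun looking-back-p (regexp &optional limit greedy)
  "Same as `looking-back' but without modifying the match data."
  ;; Pass LIMIT and GREEDY through: since Emacs 25.1, calling
  ;; `looking-back' without LIMIT triggers a byte-compiler warning.
  (let ((inhibit-changing-match-data t))
    (looking-back regexp limit greedy)))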

It uses the same obvious approach as looking-at-p, and works very well.
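To check the mechanism looking-back-p relies on, here is a small self-contained sketch that binds inhibit-changing-match-data around looking-back directly. The helper name demo-looking-back-keeps-match-data-p is hypothetical, and the LIMIT argument is passed explicitly so the backward search stays short.

```elisp
;; Sketch: `looking-back' with `inhibit-changing-match-data' bound to t
;; still finds the match, but leaves the old match data in place.
;; `demo-looking-back-keeps-match-data-p' is a hypothetical helper.
(defun demo-looking-back-keeps-match-data-p ()
  "Return (FOUND . UNCHANGED) for a `looking-back' search for \"bar\"."
  (with-temp-buffer
    (insert "foo bar")
    (goto-char (point-min))
    (looking-at "foo")                  ; match data now describes "foo"
    (goto-char (point-max))             ; point is right after "bar"
    (let* ((before (match-data t))
           (found (let ((inhibit-changing-match-data t))
                    ;; LIMIT keeps the backward search within the line.
                    (looking-back "bar" (line-beginning-position)))))
      (cons found (equal before (match-data t))))))

(demo-looking-back-keeps-match-data-p)  ; => (t . t)
```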

This is not the whole story, though. There is another approach to the same problem: the macro save-match-data. It works by, well, saving the match data into a temporary variable and restoring them afterwards. So why doesn’t looking-at-p use that? Let’s look into this.
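A minimal sketch of what save-match-data buys you, using string-match on a string. The helper demo-group-after-inner-search is hypothetical: it performs an outer match with a group, then an inner search, and finally asks for group 1 of the outer match.

```elisp
;; Sketch: an inner search clobbers the outer match data unless it is
;; wrapped in `save-match-data'.  The helper name is hypothetical.
(defun demo-group-after-inner-search (protect)
  "Match \"b\\(a\\)r\", run an inner search, return group 1 of the outer match.
If PROTECT is non-nil, the inner search runs inside `save-match-data'."
  (let ((s "foo bar"))
    (string-match "b\\(a\\)r" s)        ; group 1 is the "a" in "bar"
    (if protect
        (save-match-data (string-match "o" s))
      (string-match "o" s))             ; this regexp has no group 1
    (match-string 1 s)))

(demo-group-after-inner-search t)    ; => "a"
(demo-group-after-inner-search nil)  ; => nil
```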

Here is the original definition of looking-at-p, taken from the file lisp/subr.el from the Emacs sources:

(defsubst looking-at-p (regexp)
  "\
Same as `looking-at' except this function does not change the match data."
  (let ((inhibit-changing-match-data t))
    (looking-at regexp)))

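Before we go further, let’s mention the backslash at the very beginning of the docstring. The Lisp reader ignores a backslash followed by a newline inside a string, so the docstring itself does not begin with a newline. As far as I understand, this is a convention connected with autoloading and the etc/DOC file (it tells make-docfile to put the docstring there); you may read about it in (info "(elisp) Autoload"). (I think I will try to dive into this and blog about it at some point.)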
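A tiny sketch confirming the reader behavior: the two constants below (hypothetical names) hold the same text, one written with the backslash-newline convention and one without.

```elisp
;; The reader drops a backslash-newline inside a string literal, so
;; both constants are the same string, and neither starts with "\n".
(defconst demo-doc-with-backslash "\
Same as `looking-at' except this function does not change the match data.")

(defconst demo-doc-plain
  "Same as `looking-at' except this function does not change the match data.")

(string= demo-doc-with-backslash demo-doc-plain)  ; => t
```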

Next, defsubst is a way of defining an inline function in Elisp. This means that if you use it in some other function and byte-compile that other function, the byte compiler puts the body of our defsubst in place of the call (which is slightly faster). See the node (info "(elisp) Inline Functions") in the Elisp reference for details, and for warnings about potential problems with inline functions. The bottom line is that, as usual, premature optimization is the root of all evil, as Prof. Knuth said; in other words, only inline a function when you really need the speedup.
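A small sketch of inlining in action, with hypothetical names demo-square and demo-use-square. After byte-compiling the caller, its disassembly should contain the multiplication itself rather than a call to demo-square; I have not checked the exact listing on every Emacs version.

```elisp
;; Sketch: `demo-square' and `demo-use-square' are hypothetical names.
(defsubst demo-square (x)
  "Return X squared."
  (* x x))

(defun demo-use-square (y)
  "Return Y squared, via the inline function `demo-square'."
  (demo-square y))

;; Compile the caller in place; the body of `demo-square' should be
;; inlined into it.
(byte-compile 'demo-use-square)

;; Inspect it with: M-x disassemble RET demo-use-square RET
(demo-use-square 3)  ; => 9
```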

Now I’m going to show you a trick: first, we execute compile-defun with point on the looking-at-p defsubst above, and then call disassemble and type looking-at-p at the prompt. This way we can see the byte code in a legible form. Here it is:

byte code for looking-at-p:
  doc:  Same as `looking-at' except this function does not change the match data.
  args: (regexp)
0	constant  t
1	varbind	  inhibit-changing-match-data
2	constant  looking-at
3	varref	  regexp
4	call	  1
5	unbind	  1
6	return

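I have to admit that my experience with Emacs byte code is next to zero, but I can more or less guess what the above code does. (Notice that the byte code uses a stack: operands are pushed first and the operation consuming them comes afterwards, in postfix order! For example, constant looking-at and varref regexp push the function and its argument, and call 1 then calls that function with one argument.)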
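If you prefer to get the disassembly from Lisp rather than interactively, here is a sketch. It disassembles demo-looking-at-p, a hypothetical copy of looking-at-p, so the result does not depend on how the Emacs binary itself was built; demo-disassembly is also a hypothetical helper. According to its docstring, disassemble compiles a function that is not compiled yet, without redefining it, and it accepts a buffer to print into.

```elisp
;; Sketch: `demo-looking-at-p' and `demo-disassembly' are hypothetical.
(defun demo-looking-at-p (regexp)
  "Same as `looking-at' except this function does not change the match data."
  (let ((inhibit-changing-match-data t))
    (looking-at regexp)))

(defun demo-disassembly (function)
  "Return the disassembly of FUNCTION as a string."
  (with-temp-buffer
    ;; Insert the listing into the temporary buffer instead of
    ;; popping up the *Disassemble* buffer.
    (disassemble function (current-buffer))
    (buffer-string)))

(message "%s" (demo-disassembly 'demo-looking-at-p))
```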

Now, let us define and byte-compile the following function.

(defun looking-at-with-save-match-data (regex)
  "Same as `looking-at-p`, but using `save-match-data`."
  (save-match-data
    (looking-at regex)))

And here is the result of disassembling it.

byte code for looking-at-with-save-match-data:
  doc:  Same as `looking-at-p`, but using `save-match-data`.
  args: (regex)
0	constant  match-data
1	call	  0
2	varbind	  save-match-data-internal
3	constant  <compiled-function>
      args: nil
    0	    constant  set-match-data
    1	    varref    save-match-data-internal
    2	    constant  evaporate
    3	    call      2
    4	    return

4	unwind-protect
5	constant  looking-at
6	varref	  regex
7	call	  1
8	unbind	  2
9	return

As you can see, there is a lot more to do for Emacs here. In order to understand a bit more of it, let us first use emacs-lisp-macroexpand. After positioning the point before the save-match-data form and calling that command, we get this:

(defun looking-at-with-save-match-data (regex)
  "Same as `looking-at-p`, but using `save-match-data`."
  (let
      ((save-match-data-internal
	(match-data)))
    (unwind-protect
	(progn
	  (looking-at regex))
      (set-match-data save-match-data-internal 'evaporate))))

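As you can see, save-match-data introduces some overhead: calling match-data to take a snapshot, setting up the unwind-protect, and restoring the match data with set-match-data afterwards. (The byte code also contains a separate “compiled function”, which puzzled me at first. As far as I can tell, the byte compiler wraps the cleanup forms of unwind-protect, here the set-match-data call, in a lambda with no arguments, and the unwind-protect instruction registers it to be run on exit; the final unbind 2 then undoes both the variable binding and the unwind-protect.) By the way, I am also wondering why there is an unwind-protect in save-match-data.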
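One likely reason for the unwind-protect is non-local exits. If the body signals an error (or throws), the match data should still be restored. The sketch below checks exactly that; demo-restored-after-error-p is a hypothetical helper.

```elisp
;; Sketch: the body changes the match data and then signals an error;
;; the cleanup form of `save-match-data' still restores the old data.
;; `demo-restored-after-error-p' is a hypothetical helper.
(defun demo-restored-after-error-p ()
  "Return non-nil if `save-match-data' restores match data after an error."
  (with-temp-buffer
    (insert "foo bar")
    (goto-char (point-min))
    (looking-at "foo")                  ; match data now describes "foo"
    (let ((before (match-data t)))
      (ignore-errors
        (save-match-data
          (goto-char 5)
          (looking-at "bar")            ; overwrites the match data...
          (error "Simulated failure"))) ; ...then exits non-locally
      (equal before (match-data t)))))

(demo-restored-after-error-p)  ; => t
```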

A natural question now is whether we could use inhibit-changing-match-data in an alternative version of save-match-data. Let’s try.

(defmacro save-match-data-icmd (&rest body)
  "An alternative definition of `save-match-data'."
  (declare (indent 0) (debug t))
  `(let ((inhibit-changing-match-data t))
     ,@body))

Let us now define yet another version of looking-at-p, using the above macro.

(defun looking-at-with-save-match-data-icmd (regex)
  "Same as `looking-at-p`, but using `save-match-data`."
  (save-match-data-icmd
    (looking-at regex)))

After compiling and disassembling the above function, we get

byte code for looking-at-with-save-match-data-icmd:
  doc:  Same as `looking-at-p`, but using `save-match-data`.
  args: (regex)
0	constant  t
1	varbind	  inhibit-changing-match-data
2	constant  looking-at
3	varref	  regex
4	call	  1
5	unbind	  1
6	return

Hooray! We get exactly the same byte code as for looking-at-p at the beginning (apart from the name of the argument).

Now the obvious question remains: why is this definition of save-match-data-icmd not used in the Emacs sources? I asked this question on the Emacs mailing list, and the answer was so obvious that I laughed at myself. If the code inside save-match-data wants to use the match data (for instance, by calling looking-at and then doing something with the results, like calling match-string), such an approach obviously fails: with inhibit-changing-match-data bound to t, the searches in the body do not set the match data at all.
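Here is a sketch of that failure. Since save-match-data-icmd only binds inhibit-changing-match-data, the code binds the variable directly to stay self-contained; demo-group-start is a hypothetical helper. With the binding in effect, looking-at still succeeds, but match-beginning reports stale data from the earlier match.

```elisp
;; Sketch: binding `inhibit-changing-match-data' (all that
;; `save-match-data-icmd' does) also stops searches *inside* the body
;; from setting the match data.  `demo-group-start' is hypothetical.
(defun demo-group-start (inhibit)
  "Match \"\\(b\\)ar\" at \"bar\" and return where group 1 seems to start.
INHIBIT is used as the value of `inhibit-changing-match-data' around
the search and the following `match-beginning' call."
  (with-temp-buffer
    (insert "foo bar")
    (goto-char (point-min))
    (looking-at "\\(f\\)oo")            ; old match data: group 1 at 1
    (goto-char 5)                       ; point is right before "bar"
    (let ((inhibit-changing-match-data inhibit))
      (when (looking-at "\\(b\\)ar")    ; succeeds in both cases
        (match-beginning 1)))))

(demo-group-start nil)  ; => 5, the "b" of "bar"
(demo-group-start t)    ; => 1, stale data from the old "foo" match
```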

As a final note, remember that Emacs functions are allowed to modify the match data unless they are explicitly documented not to, which sometimes leads to unexpected results. Since I don’t like my code producing unexpected results, I try to avoid doing this in my own functions. This way my code is probably slightly slower, which usually doesn’t bother me for interactive commands: an overhead of a few microseconds in an interactive command doesn’t really matter.

CategoryEnglish, CategoryBlog, CategoryEmacs