2016-01-09 mrr-auto-replace

It is often the case that I need to do some heavy search-and-replace in LaTeX files. Usually, this is when I get a file from someone else, and the person who wrote it does not know LaTeX very well. Fortunately, the errors in such files are usually quite predictable and repeatable. For instance, very common problems include using $$...$$ instead of \[...\] (or other constructs), {\em ...} instead of \emph{...}, and getting dashes wrong.

Some of these errors are quite easy (and safe) to correct automatically. For example, if the file compiles at all, it is fairly safe to assume that the double dollar signs are correctly paired, so we may replace the odd-numbered ones (first, third and so on) with \[ and the even-numbered ones with \].
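A minimal sketch of that idea as a plain search loop. The function name my-fix-double-dollars is hypothetical, and, as above, it assumes that every $$ after point is correctly paired and that none of them sits in a comment or verbatim text:

```elisp
(defun my-fix-double-dollars ()
  "Replace paired $$ after point with opening and closing display-math brackets.
Hypothetical helper; assumes every $$ in the buffer is correctly paired."
  (interactive)
  (let ((opening t))                    ; the first $$ found opens display math
    (while (search-forward "$$" nil t)
      ;; LITERAL is t, so the backslash is inserted verbatim.
      (replace-match (if opening "\\[" "\\]") t t)
      ;; Alternate between opening and closing delimiters.
      (setq opening (not opening)))))
```

Because the third argument (LITERAL) of replace-match is t, the backslash in the replacement is inserted as is rather than being interpreted as replacement syntax.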

There is a function in Emacs which does (more or less) this. It is called map-query-replace-regexp. However, it is not exactly what I wanted. First of all, it seems it was not around when I first needed this (around Emacs 23 or so) ;-). Then, it is a bit cumbersome to call it interactively all the time. Of course it is trivial to write an Elisp wrapper that calls it with the right arguments and with point at the right place, but it will still ask annoying questions you’ll have to answer with ! if you want all your matches replaced. And if you peek into its source code, it turns out that the heavy lifting is done by perform-replace. If you look that up, here’s a quote from its docstring:

Don't use this in your own program unless you want to query and set the mark
just as `query-replace' does.  Instead, write a simple loop like this:

  (while (re-search-forward "foo[ \t]+bar" nil t)
    (replace-match "foobar" nil nil))

which will run faster and probably do exactly what you want.

And so I did. The loop was not quite that simple, especially since there was one more feature I needed: I wanted some replacements to take place only in math mode (or only outside math mode). For instance, in Polish typography you should not end a line with a one-letter word (and we have a few of them), so instead of i inni you should type i~inni. On the other hand, „correcting” $x + i y$ to $x + i~y$ would be disastrous (never mind that there is no good reason for those spaces to be there at all – $x+iy$ is much more compact and legible to me – but some people are very generous with spacing). Determining whether we are in math mode is easy, but I needed to be able to plug this detection into my search-and-replace engine.
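To make the idea concrete, here is a deliberately naive sketch of a math-mode predicate and of a replacement loop that consults it. Both function names are hypothetical. The predicate only checks whether the nearest \( \) \[ or \] before point is an opening delimiter, so it knows nothing about $...$, math environments or comments; a real setup would rather use texmathp from AUCTeX.

```elisp
(defun my-naive-in-math-p ()
  "Return non-nil if point seems to be inside inline or display math.
Hypothetical and naive: it only checks whether the nearest math
delimiter before point is an opening one."
  (save-excursion
    ;; Find the closest \( \) \[ or \] before point.
    (and (re-search-backward "\\\\[][()]" nil t)
         ;; We are in math iff that delimiter opens math.
         (memq (char-after (1+ (point))) '(?\( ?\[)))))

(defun my-tie-one-letter-words ()
  "Put a tie after each one-letter word from point on, outside math.
Hypothetical helper built on `my-naive-in-math-p'."
  (interactive)
  (while (re-search-forward "\\<\\([aeiouwzAEIOUWZ]\\)\\s +" nil t)
    ;; The predicate searches too, so keep the match data for replace-match.
    (unless (save-match-data (my-naive-in-math-p))
      (replace-match "\\1~" t))))
```

The save-match-data around the predicate matters: the predicate runs its own search, which would otherwise overwrite the match data that replace-match relies on.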

And this is one of the two problems solved by my package, multi-replace-regex (soon to be released). One of its entry points is the command mrr-auto-replace. It doesn’t take any parameters – the patterns are supposed to be fixed, so they sit safely in the option mrr-auto-substitutions – and just operates on the visible (possibly narrowed) portion of the buffer, starting at point. The function is not complicated (fewer than 20 SLOC), but it is very useful. Here’s how you can set up mrr-auto-substitutions:

(setq mrr-auto-substitutions
  '(("\\$\\$" "\\\\[" "\\\\]")
    ("\\$" "\\\\(" "\\\\)")
    ("~" texmathp "")
    ("\\<\\([aeiouwzAEIOUWZ]\\)\\s +" (lambda () (not (texmathp))) "\\1~")
    ("\\s +\\\\cite" "~\\\\cite")))

The first two patterns change double and single dollars to \[...\] and \(...\), respectively. The third one deletes all tildes in math mode (I haven’t yet encountered a single situation where a tilde in math mode was not a mistake!). The fourth one puts tildes after one-letter words (but only outside math mode!), and the last one puts a tilde before the \cite command, where it belongs. Note that I didn’t bother to add the not-in-math-mode check to this one, since if a \cite appears in math mode, you have worse problems than hard spaces.

Under the hood, mrr-auto-replace makes a separate pass over the text for each regex, in order. This makes no real difference unless the result of one replacement can be matched by a later pattern. Such behavior was not only easier to code, but probably also beneficial; my use case excludes such situations, but I can imagine someone actually leveraging such “sequential replacing”.
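For illustration, here is a sketch of how an engine with this behavior could look. It reads entries of the form used in mrr-auto-substitutions: a regex, an optional predicate (anything that is not a string), and one or more replacements used in turn. This is not the package’s actual code; the name my-auto-replace and the choice to pass the substitution list as an argument are assumptions.

```elisp
(defun my-auto-replace (substitutions)
  "Apply SUBSTITUTIONS from point to the end of the accessible region.
Hypothetical sketch.  Each element has the form
\(REGEXP [PREDICATE] REPLACEMENT...).  PREDICATE, if present, is a
non-string function called with no arguments at the end of each match;
the match is replaced only if it returns non-nil.  Successive replaced
matches use the REPLACEMENTs in turn, starting over after the last one."
  (let ((start (point)))
    (dolist (sub substitutions)
      (let* ((regexp (car sub))
             (predicate (unless (stringp (cadr sub)) (cadr sub)))
             (replacements (if predicate (cddr sub) (cdr sub)))
             (next replacements))
        ;; One full pass per entry, so later entries see earlier results.
        (goto-char start)
        (while (re-search-forward regexp nil t)
          (when (or (null predicate)
                    ;; Predicates such as texmathp search the buffer, which
                    ;; would clobber the match data and possibly move point.
                    (save-match-data (save-excursion (funcall predicate))))
            ;; LITERAL is nil, so backreferences like \\1 in the config work.
            (replace-match (car next) t)
            ;; Cycle to the next replacement, wrapping around at the end.
            (setq next (or (cdr next) replacements))))))))
```

Each entry gets its own pass starting from the original position of point, which is what makes the replacing sequential. A regex that can match the empty string would make such a loop spin forever, so every entry should match at least one character.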

In a future post, we’ll look into mrr-auto-replace’s brother, which is way cooler: mrr-replace-mode, which turns on a minor mode for interactively replacing matches of a bunch of regexen with one of several possible replacements. But that is another story.

CategoryEnglish, CategoryBlog, CategoryEmacs, CategoryTeX