2016-11-07 Displaying nonexistent text in Emacs buffers

Some time ago, I received yet another email containing a date in the MM/DD/YY format, which is probably the least reasonable format in existence. (Here in Poland, the customary date format is DD.MM.YYYY, which makes much more sense. Personally, however, I very much prefer the ISO-8601-sanctioned YYYY-MM-DD format.) Since it is quite difficult to deal with date formats one is unfamiliar with, I decided to do something about it.

There are several ways I could go about it. One of them is overlays, which might be the way to go provided not too many are needed. As the Emacs Lisp Reference dutifully records, /overlays generally don’t scale well (many operations take a time that is proportional to the number of overlays in the buffer)/. That means that overlays could work for an email, but imagine a long Org-mode document with lots of dates in an unwieldy format. (Of course, actually modifying the buffer contents is a bad idea for many reasons. For instance, I don’t want to risk my code changing something that looks like a date but actually is not. Also, I might want to preserve the contents so as not to introduce noise into the diffs. Yet another reason might be that the buffer is read-only and I am not allowed to, or do not want to, mess with it, even though this could be overcome.)

That leaves me with one option I know: text properties. Text properties are an incredibly powerful mechanism within Emacs. Basically, you can attach a property list to any single character (or a contiguous range of characters) in an Emacs buffer. (I do not know about the actual implementation, but you can at least think of text properties like that.) And a property list is a poor man’s hash map or dictionary (i.e., it is slower than a real hash map, but that doesn’t matter for short lists anyway). And that plist can actually affect the way Emacs displays the text in many ways (like changing the color or the face, or even making Emacs display something completely different from the actual contents of the buffer).
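To make the “plist attached to characters” picture a bit more concrete, here is a minimal sketch. The function name my-text-property-demo and the property my-note are made up for illustration; only face has a meaning to Emacs here.

```elisp
(defun my-text-property-demo ()
  "Attach a small plist to part of a temporary buffer and read it back.
`my-note' is an arbitrary, made-up property used only for illustration."
  (with-temp-buffer
    (insert "Hello, world")
    ;; Attach two properties to characters 1..5, i.e. to "Hello".
    (add-text-properties 1 6 '(face bold my-note "greeting"))
    (list (get-text-property 1 'my-note)             ; inside the range
          (get-text-property 8 'my-note)             ; outside the range
          (plist-get (text-properties-at 1) 'face)))) ; the plist itself
```

Evaluating (my-text-property-demo) should give ("greeting" nil bold): the property is there on “Hello”, absent on “world”, and text-properties-at hands back an ordinary plist.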

Just imagine all these cool things you could do with that, like word processing, using Emacs to handle interactive forms and many others. In fact, these two examples are real, and Emacs has been able to do those for many years.

So, let’s get to business.

First of all, what we need here is the “display” property. There are numerous text properties in Emacs, and since they are stored in a plist, they are all identified by a symbol. The symbols are arbitrary, but some of them (so-called “special properties”) have special meaning, and display is one of them. It can do a few weird things, like displaying an image or some other text instead of the text having that property. If the value of the display property is a string, the portion of the buffer that the property is applied to disappears, and that string is displayed instead.
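As a quick illustration of the string case (a sketch only; my-display-demo is a made-up name), the following puts a display string over one word in a temporary buffer and then checks what the buffer really contains:

```elisp
(defun my-display-demo ()
  "Return (BUFFER-TEXT . DISPLAY-VALUE) after covering one word.
The word \"world\" gets a `display' property whose value is a string."
  (with-temp-buffer
    (insert "Hello, world")
    ;; Characters 8..12 are "world"; ask Emacs to show "Emacs" instead.
    (put-text-property 8 13 'display "Emacs")
    (cons (buffer-substring-no-properties (point-min) (point-max))
          (get-text-property 8 'display))))
```

Evaluating (my-display-demo) returns ("Hello, world" . "Emacs"). The buffer text is untouched; only what would be shown on screen differs.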

So, what we are going to do now is to write a function that scans the buffer looking for US-formatted dates and adds a suitable property to them. Pretty straightforward, right? Searching for US-formatted dates can be done with a regex: \(0?[1-9]\|1[012]\)/\(0?[1-9]\|[12][0-9]\|3[01]\)/\([0-9]\{2,4\}\) (note that it is a bit simplistic in the year part). Notice also that I used groups, so that after a successful search for the above regex, the match data will contain the month, day and year separately.
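To see what the groups capture, here is a small sketch that applies the same regex to a string instead of a buffer. The helper name my-parse-us-date is hypothetical and is not used in the rest of this post.

```elisp
(defun my-parse-us-date (string)
  "Return (MONTH DAY YEAR) as numbers if STRING contains a US-style date.
Return nil otherwise.  The year is returned exactly as written."
  (when (string-match
         "\\(0?[1-9]\\|1[012]\\)/\\(0?[1-9]\\|[12][0-9]\\|3[01]\\)/\\([0-9]\\{2,4\\}\\)"
         string)
    ;; Groups 1, 2 and 3 hold the month, the day and the year.
    (list (string-to-number (match-string 1 string))
          (string-to-number (match-string 2 string))
          (string-to-number (match-string 3 string)))))
```

For instance, (my-parse-us-date "test 10/31/16") gives (10 31 16), and a string without any date gives nil.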

And now the fun part – actually setting the display property. Here’s one way to do it:

(defvar us-date-regex "\\(0?[1-9]\\|1[012]\\)/\\(0?[1-9]\\|[12][0-9]\\|3[01]\\)/\\([0-9]\\{2,4\\}\\)"
  "Regex matching date in the US format (M/D/Y), with groups
capturing the month, day and year.")

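(defun sanitize-us-dates-display ()
  "Display an ISO 8601 date in brackets after each US-formatted date."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward us-date-regex nil t)
      ;; Only the display changes; the buffer text itself stays the same.
      (put-text-property
       (match-beginning 0)
       (match-end 0)
       'display
       (format "%s [%.4d-%.2d-%.2d]"
               (match-string-no-properties 0)
               (org-small-year-to-year (string-to-number (match-string-no-properties 3)))
               (string-to-number (match-string-no-properties 1))
               (string-to-number (match-string-no-properties 2)))))))

Here is some text to test it on: 10/31/16, 8/8/18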

Notice that we used the org-small-year-to-year function, which converts small numbers to big ones, like changing 16 into 2016. If you don’t use Org-mode, you might need to say (require 'org) first (or write your own version of that useful function).
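If you would rather not depend on Org, a home-grown replacement could look like the sketch below. Note that my-small-year-to-year is a made-up name, and the fixed pivot of 70 is my assumption, not necessarily the rule Org itself applies.

```elisp
(defun my-small-year-to-year (year &optional pivot)
  "Expand a two-digit YEAR to four digits; leave larger years unchanged.
Years below PIVOT (default 70) become 20xx, the rest become 19xx.
The pivot rule is an assumption of this sketch, not Org's exact rule."
  (let ((pivot (or pivot 70)))
    (cond ((>= year 100) year)            ; already a full year
          ((< year pivot) (+ 2000 year))  ; e.g. 16 -> 2016
          (t (+ 1900 year)))))            ; e.g. 99 -> 1999
```

With the default pivot, 16 becomes 2016, 99 becomes 1999, and 2016 stays 2016.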

Also, see how this is not really ideal: we can’t move point into either of the dates. Also, the converted dates are only displayed; they are not part of the buffer text, so you can’t e.g. copy them to the kill ring. (I’m not sure why you would want to, though.) Another issue is that you can’t isearch for them either.

The biggest issue, however, is the fact that what we did cannot be easily undone. We could of course just get rid of all display properties in the buffer, but how do we know there are no other ones, not introduced by our function?
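One possible workaround, which the function in this post does not use, would be to tag our own ranges with a second, private property and later remove display only where that tag is present. The sketch below assumes a made-up tag my-us-date and two hypothetical helpers. It still cannot bring back a display property that was already on the text before we overwrote it.

```elisp
(defun my-put-tagged-display (start end string)
  "Show STRING in place of the text between START and END.
Also tag that range with the made-up property `my-us-date'."
  (add-text-properties start end (list 'display string 'my-us-date t)))

(defun my-remove-tagged-display ()
  "Remove `display' only from ranges tagged with `my-us-date'."
  (let ((pos (point-min))
        start)
    ;; Jump from one tagged run to the next until none are left.
    (while (setq start (text-property-any pos (point-max) 'my-us-date t))
      (let ((end (or (next-single-property-change start 'my-us-date)
                     (point-max))))
        (remove-list-of-text-properties start end '(display my-us-date))
        (setq pos end)))))
```

Display properties set by other code carry no my-us-date tag, so the cleanup leaves them alone.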

All these reasons imply that our function is really a toy. It can be useful for reading email, which you don’t edit anyway (and which is usually discarded – i.e., its buffer killed – rather quickly), but for anything more serious, this solution is not very satisfying.

What would probably be more satisfying is to use overlays rather than text properties after all. That is, however, another topic for another post. For now, that’s all – for me, the function above is enough anyway.
