2021-02-13 Copying to clipboard with single spaces

I sometimes need to transfer some text from Emacs to another program, like a web browser or terminal. A few weeks ago I thought that it would be nice if I could somehow transform that text – my use-case is changing double spaces (which I habitually put after every sentence, so that I can use Emacs’ sentence-aware commands) to single spaces (which is what most people expect, and some people treat as the only correct option).

Well, it turns out that there is no hook in Emacs to do that – but it’s not really a problem, since there are two ways around it. For starters, I could change the interprogram-cut-function variable (which points to a function doing the actual copying to the system clipboard – gui-select-text by default). Or even better, I can advise gui-select-text so that it begins by replacing multiple spaces with single ones. (Note that this means in particular that Emacs’ own kill ring won’t be affected – killing and yanking within the same Emacs session will preserve all the spaces. Going to a separate Emacs process will not, of course, since then the text travels through the system clipboard.)
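The first of these options could look more or less like the sketch below. The names my-collapse-spaces and my-cut-with-single-spaces are made up for illustration, and I assume that delegating to gui-select-text (the default value of the variable) is what is wanted.

```elisp
;; Hypothetical helper: collapse every run of two or more spaces.
(defun my-collapse-spaces (text)
  "Return TEXT with runs of spaces replaced by single spaces."
  (replace-regexp-in-string "  +" " " text))

;; Hypothetical wrapper: clean up TEXT, then hand it to the default
;; function that talks to the system clipboard.
(defun my-cut-with-single-spaces (text)
  "Send TEXT to the system clipboard with runs of spaces collapsed."
  (gui-select-text (my-collapse-spaces text)))

(setq interprogram-cut-function #'my-cut-with-single-spaces)
```

The rest of this post pursues the advice-based variant instead.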

Now one might think this is going to work:

(defun single-spacify (args)
  "Convert all consecutive spaces to a single one in the first element of ARGS."
  (list (replace-regexp-in-string " +" " " (car args))))

(advice-add 'gui-select-text :filter-args #'single-spacify)

but it doesn’t. Before I explain why, let me mention two things which were not apparent to me at first. One is that I initially thought that I should use the :before advice combinator – but that was only because I was too lazy to read the manual. That one is only useful when you want to run some code, well, before the advised function, and both the piece of advice you provide and the function being advised receive exactly the same arguments. In order to modify the arguments the advised function sees, you need to use :filter-args as above (or some more general combinator like :around, of course).

Another gotcha is that the “advisor” (i.e., my single-spacify function) receives a list of all arguments provided to the “advisee” (i.e., gui-select-text in the example above) – hence the car. And the advice mechanism expects it to also return a list of arguments which are then fed into the “advisee” – hence the list. (This is of course documented in the manual, although partially in code and not in prose – but that’s ok.)
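To see this calling convention in isolation, here is a toy example. The functions my-demo-greet and my-demo-upcase-args are made up for the illustration; only advice-add and the :filter-args combinator are the real thing.

```elisp
;; A toy "advisee" taking a single argument.
(defun my-demo-greet (name)
  "Return a greeting for NAME."
  (concat "Hello, " name))

;; The "advisor": it gets the list of all arguments and must return
;; the (possibly modified) list of arguments for the advisee.
(defun my-demo-upcase-args (args)
  "Upcase the first element of ARGS and return it as a one-element list."
  (list (upcase (car args))))

(advice-add 'my-demo-greet :filter-args #'my-demo-upcase-args)

(my-demo-greet "emacs")  ; => "Hello, EMACS"
```

Returning a bare string instead of a list from the advisor would break the call, since the result is spread over the advisee’s parameters.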

Now, unfortunately this is not the best idea. The reason is fairly obvious – if I decide to transfer some code (as opposed to natural-language text) from Emacs to somewhere else via the clipboard, and the code uses spaces and not tabs for indentation, things will break horribly. (Notice that the single-spacify function does not touch tabs.) And before you start to complain that tabs should be used exclusively for indentation, please note that e.g. YAML explicitly disallows tabs for indenting (whether using YAML is a good idea itself is a separate topic).
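A small demonstration of the breakage, using the same regex on a made-up YAML snippet (my-naive-collapse is a hypothetical name used only here):

```elisp
;; The naive approach: collapse every run of spaces, wherever it is.
(defun my-naive-collapse (string)
  "Return STRING with every run of spaces replaced by one space."
  (replace-regexp-in-string " +" " " string))

(my-naive-collapse "server:\n    port: 8080\n    host: localhost")
;; => "server:\n port: 8080\n host: localhost"
```

The four-space indentation shrinks to one space, which changes the nesting a YAML parser would see.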

So, what I think I should do instead is only “compress” multiple spaces to one when they are not at the beginning of a line.

In a typical programming language doing something like this with a (possibly multiline) string would follow a typical pattern: split the string into an array/list of strings on newlines, iterate over all the single lines and then join them back together. I could do this in Elisp, too, using the split-string and mapconcat functions. However, Emacs is a text editor, and a much more natural (and idiomatic, I think) solution is to put the string into a temporary buffer, use Emacs’ editing functions to work on it and then convert it back into a string.

So, here is one possible solution.

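(defun single-spacify (args)
  "Collapse runs of spaces not at BOL in the first element of ARGS.
Return a one-element list, as expected from `:filter-args' advice."
  (list (with-temp-buffer
          (insert (car args))
          (goto-char 0)                 ; i.e., the beginning of the buffer
          (while (not (eobp))
            ;; Leave the indentation of the current line alone.
            (skip-chars-forward " \t")
            ;; Collapse every run of 2+ spaces up to the end of this line.
            (while (re-search-forward "  +" (line-end-position) t)
              (replace-match " " t t))
            ;; Move on to the next line (or stop at the end of the buffer).
            (end-of-line)
            (unless (eobp)
              (forward-char 1)))
          (buffer-string))))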

Note that I tried to make this function slightly optimized – this might be premature optimization, and since this is going to be used as advice to a function called from interactive commands anyway, these optimizations probably don’t matter much, but they don’t really make the code less readable, so why not. For instance, (goto-char 0) is perhaps a tiny bit faster than (goto-char (point-min)), and since we call it right after inserting into a freshly created (and hence not narrowed) temporary buffer, both end up in the same place: buffer positions start at 1, so (point-min) is 1 there, and goto-char treats any smaller position as the beginning of the buffer. Also, the two t’s in the replace-match call make it skip two branches in its code – but again, this function is written in C, so most probably nobody would ever notice.
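A practical aside: since the advice was added under the symbol single-spacify, re-evaluating the defun should be enough for gui-select-text to pick up the new definition. If the advice ever gets in the way, it can be inspected and removed with the standard advice-member-p and advice-remove functions, roughly like this:

```elisp
;; Non-nil if the advice is currently installed on `gui-select-text'.
(advice-member-p #'single-spacify 'gui-select-text)

;; Uninstall it, restoring the original clipboard behavior.
;; (This is harmless if the advice was never added.)
(advice-remove 'gui-select-text #'single-spacify)
```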

Now what would be interesting is this: how does this compare to a split-string/mapconcat solution? I didn’t actually write it (though it would be fairly easy, I guess), but I have a strong suspicion that the more lines we have, and the more actual replacements, the faster the buffer version is going to be. I base this guess on my earlier experiences with strings and buffers in Elisp, one of their differences being that most string functions return freshly allocated strings while buffers are modified in place – so if we perform a lot of string operations, the garbage collector will intervene and slow things down considerably.
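For reference, here is roughly what such a string-based version might look like. It is only a sketch and has not been benchmarked; the names my-single-spacify-line and my-single-spacify-string are made up, and it works on a plain string rather than on an argument list, so it would need a thin wrapper to serve as :filter-args advice.

```elisp
(defun my-single-spacify-line (line)
  "Collapse runs of spaces in LINE, leaving leading whitespace alone."
  ;; This regex always matches (possibly the empty string) at the
  ;; very beginning of LINE, so `match-end' gives the indentation width.
  (string-match "\\`[ \t]*" line)
  (let ((indent-end (match-end 0)))
    (concat (substring line 0 indent-end)
            (replace-regexp-in-string "  +" " "
                                      (substring line indent-end)))))

(defun my-single-spacify-string (string)
  "Apply `my-single-spacify-line' to every line of STRING."
  ;; With an explicit separator and no OMIT-NULLS argument,
  ;; `split-string' keeps empty lines, so joining restores them.
  (mapconcat #'my-single-spacify-line
             (split-string string "\n")
             "\n"))

(my-single-spacify-string "Foo.  Bar.\n    indented  code\n")
;; => "Foo. Bar.\n    indented code\n"
```

Every line here costs a few fresh strings (two substrings, the replacement result and the concatenation), which is exactly the kind of allocation the guess above is about; timing both versions on realistic input, e.g. with benchmark-run, would be needed to confirm or refute it.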

Anyway, this is it for today. Happy hacking!
