Blog

For the English part of the blog, see Content AND Presentation.

2019-03-25 Using benchmark to measure speed of Elisp code

Some time ago I promised that I would write something about measuring the efficiency of Elisp code.

Now, my guess was that the string version (string-expand-template) would be faster for short templates (due to the overhead of creating buffers), but that the longer the template, the bigger the advantage of the buffer version (expand-template).

That was right, though not for the reasons I expected.

Let me first present the methodology. It is completely non-scientific, though I think still fairly accurate. I used Emacs’ built-in benchmark library. It provides (among others) the benchmark command, which takes the number of repetitions and the form to run as arguments. There are also two macros: benchmark-run (the benchmark function is basically a fairly thin wrapper around that macro) and benchmark-run-compiled (which does exactly what it says on the tin: it compiles the given form first). I first ran the benchmark-run-compiled macro so that I wouldn’t measure interpretation overhead, but it turned out that in my case the difference was very small. For the first test, I decided to use a template of

Lorem ipsum dolor sit {{{beep}}}.

with the values parameter equal to (("beep" . "amet")). So I ran this code:

(benchmark-run
    100000
  (expand-template
   "Lorem ipsum dolor sit {{{beep}}}"
   '(("beep" . "amet"))))
(benchmark-run
    100000
  (string-expand-template
   "Lorem ipsum dolor sit {{{beep}}}"
   '(("beep" . "amet"))))

and got these results:

(5.100522955 7 1.1810125730000038)
(0.9608260630000001 2 0.3009825070000005)

(By the way, the compiled version turned out not to be noticeably faster.) What to make of it? The returned list (which is displayed as a message with an explanation if you use the benchmark command instead) consists of three elements: the elapsed time (in seconds), the number of times garbage collection kicked in, and the time garbage collection took (also in seconds). As you can see, the string version is between five and six times faster, both including GC (about 5.3 times) and excluding it (about 5.9 times).
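The “excluding GC” part of that comparison can be checked by subtracting the GC time from the elapsed time. Here is a minimal sketch of that arithmetic; my-benchmark-non-gc-time is a made-up helper name (it is not part of the benchmark library), and the two lists are simply the results quoted above.

```elisp
;; Hypothetical helper, not part of benchmark.el: time spent outside GC.
(defun my-benchmark-non-gc-time (result)
  "Return the elapsed time minus the GC time for RESULT.
RESULT is a list (ELAPSED GC-COUNT GC-ELAPSED), as returned by
`benchmark-run'."
  (- (nth 0 result) (nth 2 result)))

(let ((buffer-version '(5.100522955 7 1.1810125730000038))
      (string-version '(0.9608260630000001 2 0.3009825070000005)))
  (list
   ;; How many times faster the string version is, including GC.
   (/ (nth 0 buffer-version) (nth 0 string-version))
   ;; The same ratio, excluding GC.
   (/ (my-benchmark-non-gc-time buffer-version)
      (my-benchmark-non-gc-time string-version))))
```

Evaluating the let form gives a list of the two ratios, roughly 5.3 and 5.9.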

Next, I took a longer version of Lorem ipsum: more than a hundred kilobytes, and about 15k words. I sprinkled a hundred {{{beep}}} placeholders throughout it and ran this:

(benchmark-run
    100
  (expand-template
   super-long-template
   '(("beep" . "amet"))))
(benchmark-run
    100
  (string-expand-template
   super-long-template
   '(("beep" . "amet"))))

And here are the results:

(1.097874253 2 0.13561446199999994)
(29.576419415 417 28.299135672000013)

Wow. Everything proceeded as I have foreseen. Almost. I was right that the buffer-based expansion would be faster for long templates, but I misjudged the reasons. The gain from using buffers instead of strings is there (unlike strings, buffers need not be reallocated each time something is added to them, and the buffer version is slightly faster even excluding GC: about 0.96 s versus 1.28 s), but the real difference comes from the fact that GC went off like crazy for the string-based version: 417 collections, taking over 28 of the 29.6 seconds.

What’s the lesson from that? I guess it is this: if you operate on long strings in Emacs, especially if you mangle them a lot, use buffers rather than strings. More generally, if you want to write efficient code, take various factors (like GC) into consideration, and run tests to see whether your predictions about speed were right.
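To get a feel for the effect without the template functions, here is a minimal sketch that builds the same long string in two ways. The function names are made up for this illustration, and this is not the code of expand-template or string-expand-template; it only assumes that growing a string with concat allocates a new string on every step, while inserting into a temporary buffer does not.

```elisp
(require 'benchmark)

;; Hypothetical example functions, not the template expanders from the post.
(defun my-build-with-concat (piece n)
  "Return PIECE repeated N times, growing a string with `concat'.
Every iteration allocates a fresh, longer string, and the previous
one becomes garbage."
  (let ((result ""))
    (dotimes (_ n)
      (setq result (concat result piece)))
    result))

(defun my-build-with-buffer (piece n)
  "Return PIECE repeated N times, built by inserting into a temp buffer."
  (with-temp-buffer
    (dotimes (_ n)
      (insert piece))
    (buffer-string)))

;; Each element is (ELAPSED GC-COUNT GC-ELAPSED); compare the GC counts.
(list (benchmark-run 10 (my-build-with-concat "lorem ipsum " 2000))
      (benchmark-run 10 (my-build-with-buffer "lorem ipsum " 2000)))
```

The exact numbers will depend on the machine and on gc-cons-threshold, so treat them as a rough indication only.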

A final reminder: if you use the benchmark command non-interactively, you have to quote the form you are timing! If you don’t, then, according to normal evaluation rules, the form is evaluated once before benchmark is even called, and benchmark then repeatedly evaluates the result of that evaluation, which is probably not what you want. In our case, that would actually measure the evaluation time of a string constant. benchmark-run, however, is a macro, so normal evaluation rules need not (and do not) apply to it. (If you use benchmark interactively, you must not quote the form. Otherwise benchmark would measure the time of evaluating the quoted form, which is basically negligible, since it evaluates to the s-expression itself!)
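The difference between the two calling conventions, side by side. The timed form (make-string 1000 ?x) is just a stand-in chosen for this sketch.

```elisp
(require 'benchmark)

;; `benchmark' is a function: its arguments are evaluated before the call,
;; so the form must be quoted to arrive as an unevaluated s-expression.
(benchmark 1000 '(make-string 1000 ?x))

;; `benchmark-run' is a macro: it receives the form unevaluated,
;; so it must not be quoted.
(benchmark-run 1000 (make-string 1000 ?x))
```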

Happy hacking!

CategoryEnglish, CategoryBlog, CategoryEmacs

2019-03-18 Free Emacs key bindings

As we all know, most Emacs users customize Emacs in various ways. Usually, at some point in time, the built-in customization options cease to suffice. Then you proceed to write your own functions and commands. Then, you want to bind them to some keys.

The purpose of this post is to list some default Emacs bindings which may be useless for some people, and so could be reused as command-invoking or even prefix keys. Of course, YMMV: for me, for instance, C-t is a very useful keybinding for a command I use quite often, but many will disagree.

As I mentioned some time ago, one of the keys which seems completely useless for anyone using Emacs in a GUI is C-z. Personally, I bound it to a prefix keymap so that I can launch a host of things using C-z combos, like turning on various modes or starting a few apps (like an email client or Beeminder). At some point I will probably turn my C-z keymap into a hydra, too.
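A prefix keymap on C-z could be set up along these lines. This is only a sketch: the map name my-ctrl-z-map and the two commands bound in it are examples picked for illustration, not my actual configuration.

```elisp
;; Hypothetical names: `my-ctrl-z-map' and the keys below are examples only.
(define-prefix-command 'my-ctrl-z-map)
(global-set-key (kbd "C-z") 'my-ctrl-z-map)

;; C-z e starts Eshell, C-z f toggles Auto Fill mode.
(define-key my-ctrl-z-map (kbd "e") #'eshell)
(define-key my-ctrl-z-map (kbd "f") #'auto-fill-mode)
```

After evaluating this, C-z no longer suspends or iconifies the frame; it waits for a second key instead.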

Also, there are quite a few C-x bindings which just waste good key combinations. Here is a (non-exhaustive) list:

  • C-x C-l (downcase-region),
  • C-x C-n (set-goal-column),
  • C-x C-o (delete-blank-lines),
  • C-x C-p (mark-page),
  • C-x C-r (find-file-read-only).

Also, if you use Home and End to move to the beginning and end of a line, C-a and C-e may be worth rebinding. Depending on your usage, C-o (open-line) and a lot of movement keybindings which are available elsewhere (C-p, C-n, C-f, C-b, C-j, C-v, M-v) could be rebound. (The same applies to C-d, though I personally prefer it to Delete.)

And then the meta commands follow. The ones that are probably useless for (almost?) anyone are:

  • M-i (tab-to-tab-stop),
  • M-o (a prefix for font-related commands),
  • M-p and M-n (astonishingly, left undefined, though used heavily in minibuffer to traverse its history, and also in some other modes).

And then there are keys which some of you may use every day and others never:

  • M-a and M-e (moving by sentences, I use them a lot),
  • M-r (move-to-window-line-top-bottom, I use it sometimes),
  • M-j (indent-new-comment-line, I basically never use it).

Finally, let’s not forget about function keys. I use three of them: F12 for Emms, F10 for Org-mode related stuff (mainly clocking) and F8 for various other stuff, from displaying the battery state, to magit-blaming, to turning on various minor modes, to running Eshell.
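Binding two-key sequences that start with a function key turns that key into a prefix automatically, as long as it is not already bound to a command. A small sketch, assuming F8 is free (it is unbound in a default setup, as far as I know); the two commands are examples only.

```elisp
;; Example bindings only: F8 b shows the battery state in the mode line,
;; F8 e starts Eshell.
(global-set-key (kbd "<f8> b") #'display-battery-mode)
(global-set-key (kbd "<f8> e") #'eshell)
```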

CategoryEnglish, CategoryBlog, CategoryEmacs

2019-03-11 Name-based UUID generation

Some time ago, I had a very specific need. I had some data which had to be anonymized before being sent somewhere. For example, assume that you have a CSV file with people’s names in one column, and some (possibly sensitive) data in the rest of the columns. I’d like to change all the names to some (pseudo)random stuff, but with the caveat that if some name appears more than once, it should be changed to the same thing every time.

Basically, I wanted my data to be hashed. And since I’m a bit paranoid, I wanted my hashing to be salted. And the need arose at around 15:50, and I wanted to do that before I left the office that day, so I needed a quick solution.

Enter uuidgen. It turns out that it has exactly what I needed: the -N parameter (the string to be hashed), -n (the “namespace”, i.e., my salt), -s (since I wanted SHA1 and not MD5). The “namespace” is another, fixed UUID, which can also be generated by uuidgen, e.g., with the -r argument.
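For the curious, here is roughly what a SHA-1 name-based (version 5) UUID is, as I understand the UUID specification: the SHA-1 hash of the namespace’s 16 bytes followed by the name, cut down to 16 bytes, with the version and variant bits overwritten. The sketch below does that in Elisp. The function names are made up, it assumes the name is encoded as UTF-8, and it is an illustration of the idea, not the code uuidgen runs.

```elisp
;; Hypothetical helpers illustrating name-based (version 5) UUIDs.
(defun my-uuid-to-bytes (uuid)
  "Convert the textual UUID into a 16-byte unibyte string."
  (let ((hex (replace-regexp-in-string "-" "" uuid))
        (bytes nil))
    (dotimes (i 16)
      (push (string-to-number (substring hex (* 2 i) (+ 2 (* 2 i))) 16)
            bytes))
    (apply #'unibyte-string (nreverse bytes))))

(defun my-uuid5 (namespace name)
  "Return the SHA-1 name-based UUID of NAME within NAMESPACE.
NAMESPACE is a UUID string (the salt); NAME is encoded as UTF-8."
  (let* ((digest (secure-hash 'sha1
                              (concat (my-uuid-to-bytes namespace)
                                      (encode-coding-string name 'utf-8))
                              nil nil t))   ; t: return raw bytes
         (bytes (make-vector 16 0)))
    ;; Keep only the first 16 of the 20 digest bytes.
    (dotimes (i 16)
      (aset bytes i (aref digest i)))
    ;; Set the version (5) and the variant bits.
    (aset bytes 6 (logior (logand (aref bytes 6) #x0f) #x50))
    (aset bytes 8 (logior (logand (aref bytes 8) #x3f) #x80))
    (let ((hex (mapconcat (lambda (b) (format "%02x" b)) bytes "")))
      (concat (substring hex 0 8) "-" (substring hex 8 12) "-"
              (substring hex 12 16) "-" (substring hex 16 20) "-"
              (substring hex 20 32)))))

;; The same name and salt always give the same UUID.
(my-uuid5 "6ba7b810-9dad-11d1-80b4-00c04fd430c8" "python.org")
```

The namespace used in the last line is the standard DNS namespace UUID, picked here only as an example salt.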

The last piece of the puzzle is how to perform this operation on each cell in one column of a CSV. I’m sure there are better ways, but what I wanted was a quick solution. Since my hammer of choice is Emacs, here is what I did.

First, I opened my CSV in LibreOffice (this takes care of all the quoting madness CSV supports), selected my column and copied it to the clipboard. Then, I pasted it into Emacs (in some temporary buffer). Then, I selected the first line and performed C-u M-| uuidgen --sha1 -n <my_salt> -N "$(cat)" (to replace it inline with the generated UUID; the quotes around $(cat) keep a name containing spaces as a single argument). Rinse and repeat. Actually, I recorded a keyboard macro, so that part was easy.
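The keyboard macro could also be replaced with a small function that rewrites every line of the buffer. This is a sketch with made-up names; my-uuidgen-name assumes that uuidgen is installed and on the PATH, and that SALT is a UUID string.

```elisp
;; Hypothetical helpers; the names are made up for this sketch.
(defun my-replace-each-line (fn)
  "Replace every line of the current buffer with the result of FN.
FN is called with the text of the line (without the newline) and
must return a string."
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      (let* ((start (line-beginning-position))
             (end (line-end-position))
             (line (buffer-substring-no-properties start end)))
        (delete-region start end)
        (insert (funcall fn line))
        (forward-line 1)))))

(defun my-uuidgen-name (salt name)
  "Return the SHA-1 name-based UUID of NAME in namespace SALT.
Assumes the uuidgen program is available."
  (car (process-lines "uuidgen" "--sha1" "-n" salt "-N" name)))
```

In the temporary buffer with the pasted column, one would then evaluate something like (my-replace-each-line (lambda (name) (my-uuidgen-name salt name))), with salt bound to the namespace UUID. Since process-lines passes the arguments directly to the program, no shell quoting is involved.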

Job done. Had I had more time, I’d probably have used something more elegant: instead of doing my transformation in Emacs, I’d probably have used xargs (the syntax of which I tend to always forget), and instead of LibreOffice, I could probably have used xsv to extract the column (and then put it back again in my file).

CategoryEnglish, CategoryBlog