Blog

For the English part of the blog, see Content AND Presentation.

2024-11-18 Discovering functions and variables in Elisp files

Sometimes I have an Elisp file which I suspect contains some useful functions. Even if the file is well-documented (for example, it belongs to Emacs itself), that does not mean that every function in it is described in the manual. What I need in such a case is a list of functions and variables (possibly also macros) defined in this file.

My usual solution was to isearch for defun (or (defun, or (def, etc.) and just skim the file, pressing C-s repeatedly. (Sometimes I look for the string (interactive, too, for obvious reasons.) It occurred to me recently (pun intended;-)) that I could use Occur for that. Typing M-s o, then (def and RET gives me a list of all defuns, defvars, defcustoms, defmacros etc. What’s even better, pressing n and p in the Occur buffer immediately moves point to the corresponding place in the searched buffer, so that I can easily see the docstrings of the things I found!

I can do better, though. This simplistic approach does not take into account the fact that some functions and variables are “internal” or “private” and are explicitly not part of the “official” API of the package in question. These “private” entities are easy to recognize, since the convention is to use a double dash in their names.

At first, I wanted to write some custom Elisp to generate the Occur buffer and then remove the double-dashed lines from it, or maybe not put them there at all, although that would probably require me to construct that buffer completely by hand, without relying on the occur command. (Emacs regular expressions have no negative lookahead, so there is no obvious way to write one matching “every line that starts with (def, but excluding lines containing --”, and it seemed that a bit of Elisp was necessary here.) Don’t get me wrong – I have nothing against writing Elisp;-) – but why work if you don’t have to? I had a bright idea and wrote this regex instead:

^(def[^ ]+ -?\(?:[^ -]+-?\)+\_>

Let’s analyze it. It first matches the string (def at the beginning of a line, followed by one or more non-space characters and then a space – this is just any (defun, (defvar etc. Then the magic happens. The regex in the shy group matches one or more characters other than a space or a dash, followed by an optional dash, and the whole group may repeat one or more times. This means that any valid Elisp symbol except ones that contain double dashes should match it. (Well, a valid symbol could also begin with a dash – that’s why there is -? before the group. Also, some things which are not valid symbols match this regex, too, for example some strings containing parentheses – but they should not appear right after a def-something in a syntactically correct Elisp buffer, and even if they did, I don’t care about extremely rare false positives.) I ended the regex with \_> to make sure that the repeated shy group covers the whole symbol. Otherwise, when there is a double dash in it, the regex would just match the part of the symbol up to that double dash.
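To see how the pieces of the regex behave, here is a small sketch that tries it on a few sample lines. The names my/public-def-regexp, my/public-def-regexp-no-end and my/public-def-match are hypothetical, introduced only for this illustration. Since \_> depends on syntax, the sketch assumes the Emacs Lisp syntax table (where - and / are symbol constituents) and selects it explicitly with with-syntax-table.

```elisp
;; Hypothetical helpers for trying out the regex from the text.
(defconst my/public-def-regexp
  "^(def[^ ]+ -?\\(?:[^ -]+-?\\)+\\_>"
  "The regex from the text, written as an Elisp string literal.")

(defconst my/public-def-regexp-no-end
  "^(def[^ ]+ -?\\(?:[^ -]+-?\\)+"
  "The same regex without the trailing symbol-end construct.")

(defun my/public-def-match (regexp line)
  "Return the part of LINE matched by REGEXP, or nil if there is no match.
Symbol boundaries are computed using the Emacs Lisp syntax table."
  (with-syntax-table emacs-lisp-mode-syntax-table
    (when (string-match regexp line)
      (match-string 0 line))))

;; Example calls:
;; (my/public-def-match my/public-def-regexp "(defun foo-bar ()")
;; (my/public-def-match my/public-def-regexp "(defvar foo--internal nil)")
;; (my/public-def-match my/public-def-regexp "(defun -leading-dash ()")
;; (my/public-def-match my/public-def-regexp-no-end "(defun foo--bar ()")
```

With the trailing \_>, the line defining foo--internal does not match at all, while foo-bar and -leading-dash are matched up to the end of the symbol. Without it, the last call matches "(defun foo-" and simply stops at the double dash, which is exactly the problem the \_> is there to prevent.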

I have to admit that I am quite proud of this – I think it is a pretty clever hack. That doesn’t mean I didn’t write any Elisp at all – I decided to wrap it in the following command:

(defun discover-public-api ()
  "Show all public functions, variables etc. in the `*Occur*' buffer."
  (interactive "" emacs-lisp-mode)
  (list-matching-lines "^(\\(?:cl-\\)?def[^ ]+ -?\\(?:[^ -]+-?\\)+\\_>")
  (select-window (get-buffer-window "*Occur*"))
  (message (substitute-command-keys "\\<occur-mode-map>Press \\[next-error-no-select] and \\[previous-error-no-select] to move around.")))

(I added one thing to the regex – an optional cl- prefix, for things like cl-defun and friends. Also, I used the mode indication for interactive.)

Note that the \<occur-mode-map> part is needed, because at the point when message is evaluated, the *Occur* buffer is not the current buffer – it will be made so by the command loop after the function finishes. Therefore I need to tell substitute-command-keys to use occur-mode-map explicitly. My approach here – to use select-window and count on the command loop to switch to the *Occur* buffer – is definitely not good practice, but at least it allows me to show the \<...> construct;-). Also, this will be changed in a few minutes anyway.
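The effect of the \<occur-mode-map> prefix can be checked in isolation. The sketch below (my/occur-hint is a hypothetical name) builds the same hint with or without that prefix. With a non-nil argument it should name the keys bound in occur-mode-map (n and p by default); without it, in a buffer where those commands are not bound, substitute-command-keys falls back to spelling them out as M-x next-error-no-select and so on.

```elisp
;; Hypothetical helper: build the navigation hint, optionally forcing
;; `substitute-command-keys' to look the commands up in `occur-mode-map'.
(defun my/occur-hint (&optional use-occur-map)
  "Return a navigation hint for the Occur buffer.
With USE-OCCUR-MAP non-nil, look the keys up in `occur-mode-map';
otherwise use the keymaps active in the current buffer."
  (substitute-command-keys
   (concat (if use-occur-map "\\<occur-mode-map>" "")
           "Press \\[next-error-no-select] and "
           "\\[previous-error-no-select] to move around.")))

;; In a fresh buffer (Fundamental mode, no local keymap):
;; (with-temp-buffer (my/occur-hint t))  ; keys taken from occur-mode-map
;; (with-temp-buffer (my/occur-hint))    ; likely falls back to M-x ...
```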

Now, this is enough for my use, but definitely not something production-grade and nice – for example, the lines in the occur buffer have the part matching the regex highlighted. For this regex, this includes everything from the opening parenthesis up to the end of the symbol, which looks a bit weird. (This could be remedied by adding .*$ to the regex, but it would mean that the highlighting face – which by default has a bright yellow background – would be used all over the place. While arguably more “consistent”, it would look even worse.) This can be fixed, too, of course – this is Emacs, after all – but it is slightly more work than one could expect.

The highlighting is done by adding the match face to the face property. This means that the value of that property is either just match or a list of faces beginning with match. This way, all the fontification is carried over from the Elisp buffer to the Occur buffer, which is a desirable behavior. For example, faces like font-lock-keyword-face, used in Elisp buffers, are still used in the *Occur* buffer. It is fairly easy to remove all faces from text properties:

(let ((inhibit-read-only t))
  (remove-text-properties (point-min)
                          (point-max)
                          '(face nil)))

(the *Occur* buffer is normally read-only, hence the inhibit-read-only), but this is not what I would like to do, since it also removes the faces installed there by font locking. It turns out that removing only the match face from the whole buffer is a surprising amount of work. Elisp has the add-face-text-property function, which can add a face to a text (possibly resulting in combining more than one face), but no remove-face-text-property one. Of course, it’s not impossible to write it, but it is a bit tricky. One of the reasons is that the face text property can be either a symbol (denoting a face) or a list of such symbols. This is not difficult to deal with, but it adds complexity, and it helps to have some functions dealing with it:

(defun mbork/contains-or-equals (needle haystack)
  "Return non-nil if NEEDLE is `eq' to HAYSTACK or HAYSTACK contains NEEDLE."
  (or (eq needle haystack)
      (and (listp haystack)
           (memq needle haystack))))

(defun mbork/remove-face (face face-prop)
  "Remove FACE from FACE-PROP and return the result.
If FACE-PROP is `eq' to FACE, return nil.  If FACE-PROP is a list,
return the result of `(remq face face-prop)'.  Otherwise, return
FACE-PROP."
  (cond ((eq face face-prop)
         nil)
        ((listp face-prop)
         (remq face face-prop))
        (t face-prop)))

If the face property were always a list (possibly containing just one element), it would suffice to use memq and remq in place of those functions. (Note that mbork/remove-face can output a list consisting of just one element, which could then be turned into just that element in the context of the face property – but I don’t think the added complexity would be worth it.)
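For comparison, here is a minimal sketch of the normalization idea: turn the property value into a list, use plain remq on it, and collapse a one-element result back into a single face. The helper names are hypothetical. Like the functions above, it assumes the property holds either a face symbol or a list of face symbols.

```elisp
;; Hypothetical helpers: normalize the `face' property to a list,
;; operate on it with list functions, then convert it back.
(defun my/face-list (face-prop)
  "Return FACE-PROP as a list of faces.
A single face symbol becomes a one-element list; nil stays nil."
  (if (listp face-prop) face-prop (list face-prop)))

(defun my/face-prop (faces)
  "Turn the list FACES back into a value for the `face' property.
A one-element list collapses to its only element; nil stays nil."
  (if (cdr faces) faces (car faces)))

(defun my/remove-face-normalized (face face-prop)
  "Return FACE-PROP with FACE removed, going through the list form."
  (my/face-prop (remq face (my/face-list face-prop))))

;; (my/remove-face-normalized 'match 'match)              ; nil
;; (my/remove-face-normalized 'match '(match bold))       ; bold
;; (my/remove-face-normalized 'match '(match bold italic)) ; (bold italic)
;; (my/remove-face-normalized 'match 'bold)               ; bold
```

The price is one extra conversion in each direction, in exchange for being able to use memq and remq directly in between.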

Now, the code below employs even more trickery. First of all, keep in mind that while-let is more akin to let* in the sense that the bindings are evaluated one after another, so you can use the variable bound in the first one when creating the second one. Next, the very handy function text-property-search-forward called this way searches for a region where the face text property is the same across that region and not nil. (See its docstring for more details.) This means that if prop-match is nil, the while-let will end, but if it is not nil, neither is prop-match-value, so the body is evaluated and put-text-property removes the match face wherever it is present.

(defun mbork/remove-face-text-property (start end face)
  "Remove FACE from properties in the current buffer between START and END."
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (goto-char (point-min))
      (while-let
          ((prop-match (text-property-search-forward 'face))
           (prop-match-value (prop-match-value prop-match)))
        (when (mbork/contains-or-equals face prop-match-value)
          (put-text-property (prop-match-beginning prop-match)
                             (prop-match-end prop-match)
                             'face
                             (mbork/remove-face face prop-match-value)))))))
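The behavior of text-property-search-forward when called with just the property name is easy to observe on a tiny buffer. The sketch below (the my/ names are hypothetical) collects every region the function returns; it uses a plain while loop with setq, so it does not need while-let.

```elisp
(require 'text-property-search)

;; Hypothetical helper: list the regions `text-property-search-forward'
;; reports for the `face' property in the current buffer.
(defun my/face-regions ()
  "Return (BEG END VALUE) for each region of the current buffer
in which the `face' property is non-nil and does not change."
  (save-excursion
    (goto-char (point-min))
    (let (match regions)
      ;; With no VALUE and no PREDICATE, each call finds the next run of
      ;; text whose `face' is non-nil, stopping where the value changes.
      (while (setq match (text-property-search-forward 'face))
        (push (list (prop-match-beginning match)
                    (prop-match-end match)
                    (prop-match-value match))
              regions))
      (nreverse regions))))

(defun my/face-regions-demo ()
  "Run `my/face-regions' on a small propertized buffer."
  (with-temp-buffer
    (insert (propertize "def" 'face 'match)
            " "                                   ; no face at all
            (propertize "foo" 'face '(match bold))
            (propertize "bar" 'face 'bold))
    (my/face-regions)))
```

For this buffer, (my/face-regions-demo) should return ((1 4 match) (5 8 (match bold)) (8 11 bold)): the unpropertized space is skipped, and a boundary is reported at position 8 even though both neighbors contain bold, because the property value changes there.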

So, to wrap it up, here is the “nicer” version of discover-public-api. Notice that we no longer need the \<occur-mode-map> part – since mbork/remove-face-text-property operates on the current buffer, we need to switch to *Occur* using set-buffer, so occur-mode-map becomes the current local keymap anyway.

(defun discover-public-api ()
  "Show all public functions, variables etc. in the `*Occur*' buffer."
  (interactive "" emacs-lisp-mode)
  (list-matching-lines "^(\\(?:cl-\\)?def[^ ]+ -?\\(?:[^ -]+-?\\)+\\_>")
  (select-window (get-buffer-window "*Occur*"))
  (set-buffer "*Occur*")
  (let ((inhibit-read-only t))
    (mbork/remove-face-text-property (point-min) (point-max) 'match))
  (message (substitute-command-keys "Press \\[next-error-no-select] and \\[previous-error-no-select] to move around.")))

This is still not ideal – there are lots of ways this code could be improved (as is usually the case). For example, it shows all the opening parens but not the closing ones (unless the (def...) fits in one line, which it almost never does). Also, the pretty complicated regex is still visible at the top of the *Occur* buffer, and that is in fact an implementation detail the user should not be concerned with. Moreover, the *Occur* buffer name itself is hardcoded, which is not very good, either (this could be fixed by using occur-1 instead of occur or its alias list-matching-lines, but that function is a bit weird as it does not have a docstring (!), which might mean that it should be considered an “internal” or “private” function). But right now the code is definitely very useful (at least to me) and nice enough to use without the garish yellow, and this post is already long enough, so let’s just stop here. The takeaway here is not that the code is 100% polished, but rather that Emacs has so many useful (and usually pretty general) features that even if you have a need which is somewhat atypical, you don’t always have to code it from scratch.

CategoryEnglish, CategoryBlog, CategoryEmacs


2024-11-11 A situated approach to passwords

It is a well-known mantra that when writing a web application or a similar thing, you should never store your users’ passwords unencrypted.

Well, I’m now going to challenge this idea (a bit). Note: I’m definitely not a security expert by any means, and it’s quite possible that I’m completely wrong. But I think I encountered two cases where storing passwords in plain text is actually pretty fine.

I guess what I’m really trying to say here is that security is just a bunch of trade-offs. If your program is somehow exposed on the network (for example, you can access the database from another machine), it is never “absolutely secure”. And even if you only allow access to your data from the machine the data is on, and that machine is air-gapped, and all data reside on an encrypted partition with a very strong password, it is still not “absolutely secure”.

That said, an obligatory disclaimer. Let me repeat: it’s quite possible that I am wrong, and if you do not know a lot about security and you use this very article as a justification for storing passwords in plain text, you are doing it wrong. This is just some rambling, food for thought, written by someone who doesn’t like people stating what they think are absolute rules without any justification. And in fact, encrypting, or hashing, or better, key-stretching passwords is so cheap that there is almost no reason not to do it anyway, so even the use cases I describe below are rather contrived.

So, when do I personally consider storing passwords in plain text an acceptable trade-off? Well, some time ago I read an absolutely fascinating article about a software equivalent of a home-cooked meal, also known as situated software. That resonated with me a lot, and I will write about it more in the following weeks. The idea is that some software isn’t meant for the general public (which almost inevitably will include some malicious actors), but for small groups of people instead. (In a special case, this small group is actually a group of one.) The cases in which I used the simplistic, naive approach of not encrypting passwords fall into exactly this category.

Here is the first case. I once built a very simple app to render a visual representation of my home budget. I use ledger, and while I like it a lot, I wanted something visual to show me whether I am keeping within my budget or not. Also, I wanted it on the web so that I could open it on my phone while shopping and decide – in a rational way – whether I can afford that fancy thing or not. On the other hand, I wanted some kind of authorization so that only my family and I could access it. This is what I did. First and foremost, the app is 100% read-only: the only thing it does with the ledger file is read it (actually, transform it using ledger itself). Secondly, even if this app had a bug which allowed someone to somehow modify the ledger file, it wouldn’t really matter, since it only has access to a copy of it. (This works in a very simple way: after I update my ledger file, I commit it to Git, and a post-commit hook copies it with scp to the server where the app resides.) Now, what could happen if someone somehow got my password? Nothing could be modified, or even if it could, I wouldn’t bother – the worst case would be that my fancy budget chart would be wrong. Equally importantly, I used a unique password I don’t use anywhere else (this is actually the most important thing!), so even if someone gained access to that password, they would only be able to see my monthly budget (and the current state of my monthly expenses). None of it is secret enough to be a problem for me (in fact, I probably could make this app totally open to everyone with the right URL and still nothing bad would happen). The method I settled on is trivially simple – the password is part of a config file, stored in JSON alongside the app. (Interestingly, there are no usernames at all – just a list of passwords, and any of them can be used to log in. It turned out that this created an unexpected problem – I used Passport to implement authentication, and Passport requires a username to operate. I decided to inject a one-line middleware into the POST /login endpoint which just inserts the key username with the value username into the body of the request.) By the way, this approach is very similar to what Oddmuse does. In Oddmuse’s case, the rationale for being apparently lax with even admin passwords is that even an admin usually cannot permanently damage an Oddmuse site.

Another case is similar. Almost a decade ago, I built another app, this time not for myself, but for someone I know who needed such a thing. This app also needed some kind of authorization, but it didn’t contain anything crucial like financial or medical data. Again, the set of users would be very limited (this time a bit more than single digits, but still no more than about a dozen). One argument I had against encrypting passwords was that if I did that, I would have to provide the whole infrastructure for dealing with forgotten passwords. That means that I would have to store email addresses and provide “password reset” links via email. This in turn means that I would have to support sending emails, which is simple, but still needs work and perhaps some maintenance. All of that means work and time I didn’t necessarily want to spend on this. (After I wrote this, I discussed this idea with a friend, who suggested a more secure approach without the overhead – I’ll explain it in a minute.)

Instead, I settled on another approach. First of all, I decided to store the passwords unencrypted. (Encrypting passwords is actually very little work, so I could change that.) More importantly, even if I decided to encrypt passwords, I would not hash them. Why? Because I decided to use usernames which are explicitly not email addresses, and even more importantly, I didn’t let the users choose their usernames and passwords at all. Yes, you heard that right. The username (aka the login) for every user was set by the admin. The password for every user was just set to a random string of characters. There was an option to reset the password to another random string, and that’s it. Here, the idea is that the administrator (for example me) can physically contact all users and give them their passwords via some other channel. Hence one of the main reasons for encrypting (or hashing) passwords – that a breach might expose the users’ data on other systems where they reuse the password – was gone. And even if someone gained access to this system, again – it didn’t contain any crucial (or personal) data.

After I wrote most of this, I consulted a friend who – unlike me – is a security expert, and he tried very hard to convince me that my approach is wrong. He partially succeeded – I admit that in the latter case I could do better with very little effort. There are two things I could have done to increase security (though let me stress that I still think these gains would be marginal). Firstly, I could avoid storing plain-text passwords and key-stretch them instead (which is obviously better than just hashing). Of course, that would mean that the password reset procedure would have to be a bit more involved. The admin would initiate it, the system would show them the new password (just this one time), but then it would completely “forget” the password and only store the “enhanced key” (the result of key stretching) from then on. This way the time the system even has the plain-text password would be reduced to a minimum. (I would still disallow setting the passwords by the users.) Secondly, it is much better to concatenate a few words instead of selecting characters at random, since the resulting passphrase can have very high entropy while being easier to remember. (Predictably, a quick search turned up a lot of tools – from web-based, to command-line, to libraries – for generating diceware-like passwords. Also predictably, the quality of these tools seems to have, so to speak, extreme variability.)
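To put rough numbers on the passphrase argument, assume every word (or character) is picked independently and uniformly at random, so the entropy in bits is the number of picks times the base-2 logarithm of the number of possible choices. The 7776-word list of classic Diceware (6^5 words, one per five dice rolls) and a 62-character alphanumeric alphabet serve here only as examples.

```latex
H = k \log_2 N \\
\text{6 words from a 7776-word list: } 6 \log_2 7776 = 30 \log_2 6 \approx 77.5 \text{ bits} \\
\text{12 characters from a 62-symbol alphabet: } 12 \log_2 62 \approx 71.5 \text{ bits}
```

So six random words carry somewhat more entropy than twelve random alphanumeric characters, while being considerably easier to remember and type. The estimate only holds if the words are really chosen at random, not by the user.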

So, this is it for today. I hope no security expert died of a heart attack while reading this, and I’m curious whether someone can prove me wrong (which, as I said, is quite possible, although I really did think my approach through, so I would still be a bit surprised). Generally, I think the main argument for not storing passwords in plain text is “but it is so little work to do it «correctly»”, and while I agree with that, I am aware that it’s still work. (For the record, I decided to no longer store passwords in plain text in my future projects, just in case and to promote better security practices.)

CategoryEnglish, CategoryBlog


2024-11-04 Persisting variables across Emacs sessions

Today, I have a short tip for everyone who writes Elisp and wants to preserve some variable other than user customizations between Emacs sessions. (For user settings, configuring them manually in init.el is the standard way to go.) A classic example would be the history of the user’s inputs.

In fact, I already mentioned the persist package a long time ago. It is a fairly small but extremely useful tool. What it does is, well, persist a variable across Emacs sessions. As I said above, I find it especially useful for variables keeping various completion histories. If you write any Emacs tool which could benefit from being able to remember things even after quitting Emacs and starting it again, persist is definitely your friend!
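A minimal sketch of what this looks like, assuming the persist package from GNU ELPA is installed and that its persist-defvar macro takes a symbol, an initial value and a docstring, like defvar. The command my-search and the variable my-search-history are hypothetical; the only point is that the history variable passed to read-string is declared with persist-defvar instead of defvar.

```elisp
;; Assumes the `persist' package (GNU ELPA) is installed.
(require 'persist)

;; Declared like `defvar', but the value is kept between Emacs sessions.
(persist-defvar my-search-history nil
  "Minibuffer history for `my-search', kept across Emacs sessions.")

(defun my-search (query)
  "Hypothetical command that records QUERY in `my-search-history'."
  (interactive (list (read-string "Search for: " nil 'my-search-history)))
  (message "Searching for %s..." query))
```

Nothing else about the history variable has to change: read-string keeps updating it as usual, and persist takes care of writing it out and restoring it in the next session.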

CategoryEnglish, CategoryBlog, CategoryEmacs

