2024-11-18 Discovering functions and variables in Elisp files

Sometimes I have an Elisp file which I suspect contains some useful functions. Even if the file is well-documented (for example, it belongs to Emacs itself), that does not mean that every function in it is described in the manual. What I need in such a case is a list of functions and variables (possibly also macros) defined in this file.

My usual solution was to isearch for defun (or (defun, or (def, etc.) and just skim the file, pressing C-s repeatedly. (Sometimes I look for the string (interactive, too, for obvious reasons.) It occurred to me recently (pun intended;-)) that I could use Occur for that. Typing M-s o, then (def and RET gives me a list of all defun​s, defvar​s, defcustom​s, defmacro​s etc. What’s even better, pressing n and p in the Occur buffer immediately moves point to the corresponding place in the searched buffer, so that I can easily see the docstrings of the things I found!

I can do better, though. This simplistic approach does not take into account the fact that some functions and variables are “internal” or “private” and are explicitly not the part of the “official” API of the package in question. These “private” entities are easily recognized, since the convention is to use a double dash in their names.

At first, I wanted to write some custom Elisp to generate the occur buffer first, then remove the double-dashed lines from it, or maybe don’t put them there at all, although that would probably require me to construct that buffer completely by hand, not relying on the occur command. (Of course, you can’t construct a regular expression matching “every line that starts with (def, but excluding lines containing --”, so it seemed that a bit of Elisp is necessary here.) Don’t get me wrong – I have nothing against writing Elisp;-) – but why work if you don’t have to? I had a bright idea and wrote this regex instead:

^(def[^ ]+ -?\(?:[^ -]+-?\)+\_>

Let’s analyze it. It matches first the string (def at the beginning of line, followed by one on more non-space characters and then a space – this is just any (defun, (defvar etc. Then the magic happens. The regex in the shy group matches one or more occurrences of one or more characters other than a space or a dash, followed by an optional dash. This means that any valid Elisp symbol except ones that contain double dashes should match it. (Well, a valid symbol could also begin with a dash – that’s why there is -? before the group. Also, there are things which are not valid symbols matching this regex, too, for example some strings containing parentheses – but they should not appear right after a def-something in a syntactically correct Elisp buffer, and even if they did, I don’t care about extremely rare false positives.) I ended the regex with \_> to make sure that the repeated shy group captures the whole symbol. Otherwise, when there is a double dash in it, the regex would just match its part until that double dash.

I have to admit that I am pretty proud of this – I think it is a pretty clever hack. It doesn’t mean that I won’t write any Elisp at all – I decided to wrap in in the following command:

(defun discover-public-api ()
  "Show all public functions, variables etc. in the `*Occur*' buffer."
  (interactive "" emacs-lisp-mode)
  (list-matching-lines "^(\\(?:cl-\\)?def[^ ]+ -?\\(?:[^ -]+-?\\)+\\_>")
  (select-window (get-buffer-window "*Occur*"))
  (message (substitute-command-keys "\\<occur-mode-map>Press \\[next-error-no-select] and \\[previous-error-no-select] to move around.")))

(I added one thing to the regex – an optional cl- prefix, for things like cl-defun and friends. Also, I used the mode indication for interactive.)

Note that the \<occur-mode-map> part is needed, because at the point when message is evaluated, the *Occur* buffer is not the current buffer – it will be made so by the command loop after the function finishes. Therefore I need to tell substitute-command-keys to use occur-mode-map explicitly. My approach here – to use select-window and count on the command loop to switch to the *Occur* buffer – is definitely not a good practice, but at least allows me to show the \[...] construct;-). Also, this will be changed in a few minutes anyway.

Now, this is enough for my use, but definitely not something production-grade and nice – for example, the lines in the occur buffer have the part matching the regex highlighted. For this regex, this includes everything from the opening parenthesis up to the end of the symbol, which looks a bit weird. (This could be remedied by adding .*$ to the regex, but it would mean that the highlighting face – which by default has a bright yellow background – would be used all over the place. While arguably more “consistent”, it would look even worse.) This can be fixed, too, of course – this is Emacs, after all – but it is slightly more work than one could expect.

The highlighting is done by adding the match face to the face property. This means that the value of that property is either just match or a list of faces beginning with match. This way, all the fontification is carried over from the Elisp buffer to the Occur buffer, which is a desirable behavior. For example, faces like font-lock-keyword-face, used in Elisp buffers, are still used in the *Occur* buffer. It is fairly easy to remove all faces from text properties:

(let ((inhibit-read-only t))
  (remove-text-properties (point-min)
                          (point-max)
                          '(face nil)))

(the *Occur* buffer is normally read-only, hence the inhibit-read-only), but this is not what I would like to do, since it also removes the faces installed there by font locking. It turns out that removing only the match face from the whole buffer is a surprising amount of work. Elisp has the add-face-text-property function which can add a face to a text (possibly resulting in combining more than one face), but no remove-face-text-property one. Of course, it’s not impossible to write it, but it is a bit tricky. One of the reasons is that the face text property can be either a symbol (denoting a face) or a list of such symbols. This is nothing difficult to deal with, but it adds complexity, and it helps to have some functions dealing with it:

(defun mbork/contains-or-equals (needle haystack)
  "Return non-nil if NEEDLE is `eq' to HAYSTACK or HAYSTACK cotains NEEDLE."
  (or (eq needle haystack)
      (and (listp haystack)
           (memq needle haystack))))

(defun mbork/remove-face (face face-prop)
  "Remove FACE from FACE-PROP and return the result.
If FACE-PROP is `eq' to FACE, return nil.  If FACE-PROP is a list,
return the result of `(remq face face-prop)'.  Otherwise, return
FACE-PROP."
  (cond ((eq face face-prop)
         nil)
        ((listp face-prop)
         (remq face face-prop))
        (t face-prop)))

If the face property were always a list (possibly containing just one element), it would suffice to use memq and remq in place of those functions. (Note that mbork/remove-face can output a list consisting of just one element, which could then be turned into just that element in the context of the face property – but I don’t think the added complexity would be worth it.)

Now, the code below employs even more trickery. First of all, keep in mind that while-let is more akin to let* in the sense that the bindings are evaluated one after another, so you can use the variable bound in the first one when creating the second one. Next, the very handy function text-property-search-forward called this way searches for a region where the face text property is the same across that region and not nil. (See its docstring for more details.) This means that if prop-match is nil, the while-let will end, but if it is not nil, neither is prop-match-value and the put-text-property will be evaluated, removing the match face whenever it finds one.

(defun mbork/remove-face-text-property (start end face)
  "Remove FACE from properties in the current buffer between START and END."
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (goto-char (point-min))
      (while-let
          ((prop-match (text-property-search-forward 'face))
           (prop-match-value (prop-match-value prop-match)))
        (when (mbork/contains-or-equals face prop-match-value)
          (put-text-property (prop-match-beginning prop-match)
                             (prop-match-end prop-match)
                             'face
                             (mbork/remove-face face prop-match-value)))))))

So, to wrap it up, here is the “nicer” version of discover-public-api. Notice that we no longer need the \<occur-mode-map> part – as mbork/remove-face-text-property operates on the current buffer, we needed to switch to it using set-buffer, so occur-mode-map became the current map anyway.

(defun discover-public-api ()
  "Show all public functions, variables etc. in the `*Occur*' buffer."
  (interactive "" emacs-lisp-mode)
  (list-matching-lines "^(\\(?:cl-\\)?def[^ ]+ -?\\([^ -]+-?\\)+\\_>")
  (select-window (get-buffer-window "*Occur*"))
  (set-buffer "*Occur*")
  (let ((inhibit-read-only t))
    (mbork/remove-face-text-property (point-min) (point-max) 'match))
  (message (substitute-command-keys "Press \\[next-error-no-select] and \\[previous-error-no-select] to move around.")))

This is still not ideal – there are lots of ways this code could be improved (as is usually the case). For example, it shows all the opening parens but not the closing ones (unless the (def...) fits in one line, which it almost never does). Also, the pretty complicated regex is still visible at the top of the *Occur* buffer, and that is in fact an implementation detail the user should not be concerned with. In fact, the *Occur* buffer name itself is hardcoded, which is not very good, either (this could be fixed by using occur-1 instead of occur or its alias list-matching-lines, but that function is a bit weird as it does not have a docstring (!), which might mean that it should be considered an “internal” or “private” function). But right now the code is definitely very useful (at least to me) and nice enough to use without the garish yellow, and this post is already long enough, so let’s just stop here. The takeaway here is not that the code is 100% polished, but rather that Emacs has so many useful (and usually pretty general) features that even if you have a need which is somewhat atypical, you don’t always have to code it from scratch.

CategoryEnglish, CategoryBlog, CategoryEmacs