2018-01-22 Info-edit

There was an interesting (and sometimes quite amusing) discussion about Info on the help-gnu-emacs mailing list. Apart from learning that there are people who do not even want to use the amazing Info mode, I learned about a little-known Emacs command, Info-edit. It used to be bound to e in Info, but it was deprecated some time ago, and now the only way to invoke it seems to be M-: (Info-edit). It puts the current Info buffer into an editing mode; when you are done, you can press C-c C-c and you will be asked where to save the edited file.

I do not plan to actually use it, but it is interesting to know that such a function exists, possibly inspired by the American habit of making notes in the margins of books. During the discussion, I also learned that before texinfo, Info files were created and edited by hand. (At that time, Info-edit made more sense; nowadays Info files get overwritten by each install of a new Emacs version, which is exactly why I don’t expect to use Info-edit.)

Of course, if you want to save the Info file in its usual place (this is /usr/local/share/info), you need root privileges. This is not a problem in contemporary Emacsen – you just prepend /sudo:: to the (absolute) path, and Tramp kicks in: it asks for the password and sudo-saves the file (assuming you have permissions to run sudo, of course).
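
To make the /sudo:: trick concrete, here is a minimal sketch of a helper that builds such a Tramp file name. The name my-sudo-file-name is made up for this illustration, and the function only constructs the string; it does not open or save anything.

```elisp
(defun my-sudo-file-name (file)
  "Return FILE as a Tramp file name accessed through sudo.
FILE is made absolute first.  This is a hypothetical helper: it
only builds the name and does not contact Tramp."
  ;; Tramp treats names starting with /sudo:: as "this file, as root
  ;; on the local machine".
  (concat "/sudo::" (expand-file-name file)))
```

Typing the resulting name at the save prompt (or passing it to a command like write-file) is what should make Tramp ask for the password.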

This is yet another example of Emacs having many hidden gems sprinkled throughout its codebase. It is really a pity that this function is not useful anymore (both because each reinstall would overwrite these changes and because the command is deprecated). I wonder whether it would be possible to modify Info-edit so that it would jump to the texinfo source instead. In my case (I have full Emacs sources in the local clone of the Git repo), that could allow me to e.g. instantly fix some mistake and send a patch to the developers!

2018-01-15 Counting LaTeX commands in a bunch of files

I hope that I won’t bore anyone to death with blog posts related to the journal I’m working for, but here’s another story about my experiences with that.

I am currently writing a manual for authors wanting to prepare a paper for Wiadomości Matematyczne. We accept LaTeX files, of course, but we have our own LaTeX class (not yet public), and adapting what others wrote (usually using article) is sometimes a lot of work. Having the authors follow our guidelines could make that slightly less work, which is something I’d be quite happy with. (Of course, making a bunch of university mathematicians do something reasonable would be an achievement in itself.)

When I presented (the current version of) the manual to my colleagues on the editorial board, we agreed that nobody would read it anyway. And then I had an idea of preparing a TL;DR version, just a few sentences, where I could mention the one thing I want to get across: dear authors, please do not do anything fancy, just stick with plain ol’ LaTeX. And one component of that message could be a list of LaTeX commands people should stick to. (If you have never worked for a journal or somewhere where you get to look at other people’s LaTeX files, you probably have no idea about what they are capable of doing.)

So here I am, having 200+ LaTeX files (there are twice as many, but I had only about 200 on my current laptop), meticulously converted to our template (which means our class and our local customs, like special commands for various dashes or avoiding colons at all costs), and I want to prepare a list of LaTeX commands used throughout together with the information about the frequency of using them.

In ye olden days, people would use Perl for that. Nowadays, Python would probably be a more common choice. But if you learn to use a hammer, everything starts to look like a nail, no? Enter Emacs Lisp.

Actually, I decided to use it also because I have already written some stuff for parsing LaTeX files. (I’ll blog about it some day; the coolest thing I have there is the analogue of show-paren-mode for “pairs” like \bigl( ... \bigr], and the ability to change this into e.g. \Bigl( ... \Bigr] etc. with one command.) In the end, it turned out that I didn’t need those features that much anyway. The only thing I used was the TeX+-info-about-token-beginning-at-point function, which returns a cons cell whose car is the TeX token starting at point and whose cdr is a symbol describing its type.
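
If you do not have tex+ at hand, a rough stand-in for that function could look like the sketch below. This is not the real implementation: the name my-tex-token-at-point is made up, the type other is my own placeholder (the code in this post only checks for control-word and control-symbol), and the regexps assume that a control word consists of ASCII letters only.

```elisp
(defun my-tex-token-at-point ()
  "Return (TOKEN . TYPE) for the TeX token starting at point.
TYPE is `control-word', `control-symbol' or `other'.  Point is
not moved.  Return nil at the end of the buffer.
This is a hypothetical stand-in, not the real tex+ function."
  (let ((case-fold-search nil))
    (save-match-data
      (cond
       ((eobp) nil)
       ;; A backslash followed by one or more letters, e.g. \emph.
       ((looking-at "\\\\[a-zA-Z]+")
        (cons (match-string-no-properties 0) 'control-word))
       ;; A backslash followed by a single non-letter, e.g. \, or \\.
       ((looking-at "\\\\[^a-zA-Z]")
        (cons (match-string-no-properties 0) 'control-symbol))
       ;; Anything else: treat the character at point as the token.
       (t (cons (string (char-after)) 'other))))))
```

With point just before \emph{x}, this returns ("\\emph" . control-word). Since the counting code calls only this one function from tex+, pointing the original name at such a stand-in (say, with defalias) should be enough to experiment with it, though I have not compared the two on corner cases.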

I approached the problem in a truly Lispy, bottom-up style. I started with a count-TeX-macros-in-current-buffer function, receiving and returning an alist of macros and their frequencies. Then count-TeX-macros-in-file followed, which first visited a file (using with-temp-buffer and insert-file-contents-literally, of course). Finally, count-TeX-macros-recursively received a directory and a regex and performed the count in all files whose names matched the regex in and below the given directory. Sorting (by descending frequency) and displaying the results were just the topping.

The thing that astonished me the most was the speed of this. Since I did not attempt any premature optimization, I expected my code to run for anywhere between maybe ten seconds and a few minutes. I certainly did not expect it to take less than one second, which was really cool.

Also, please note that this is quick-and-dirty, one-shot code, and therefore not very clean. I don’t intend to waste too much time polishing it; it’s simple enough that if you want to play with it, you should be able to understand the code in ten minutes or so.

Finally, I didn’t bother to count environments, only commands. I might extend my code to environments one day, too, but I do not expect any surprises. (document, enumerate, maybe itemize, a sprinkling of figure, table and tikzpicture, and the obvious math stuff – that would pretty much be it, I guess.)

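(require 'cl-lib)
(require 'tex+)

(defun count-TeX-macros-in-current-buffer (histogram)
  "Return an alist of macros in the current buffer.
HISTOGRAM is the input we should add to."
  (save-mark-and-excursion
    (save-restriction
      (widen)
      (goto-char (point-min))
      (while (search-forward "\\" nil t)
        (backward-char)
        ;; TOKEN is a cons cell (TOKEN-STRING . TYPE); FREQ is the
        ;; existing histogram entry for that token, if any.
        (let* ((token (TeX+-info-about-token-beginning-at-point))
               (freq (assoc (car token) histogram)))
          (if (memq (cdr token) '(control-symbol control-word))
              (if freq
                  (cl-incf (cdr freq))
                (setq histogram (cons (cons (car token) 1) histogram)))))
        ;; Step over the backslash(es) just examined, so that a
        ;; double backslash is not counted twice.
        (skip-chars-forward "\\\\" (+ (point) 2)))
      histogram)))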

(defun count-TeX-macros-in-file (file histogram)
  "Count TeX macros in FILE and add that info to HISTOGRAM."
  (with-temp-buffer
    (insert-file-contents-literally file)
    (setq histogram (count-TeX-macros-in-current-buffer histogram))))

(defun count-TeX-macros-recursively (directory regex)
  "Count TeX macros in files in DIRECTORY (recursively) whose
names match REGEX."
  ;; Collect the matching files in and below DIRECTORY.
  (let ((files (directory-files-recursively directory regex))
        (histogram '()))
    (while files
      (message "Analyzing %s..." (file-name-nondirectory (car files)))
      (setq histogram (count-TeX-macros-in-file (car files) histogram))
      (message "Analyzing %s...done" (file-name-nondirectory (car files)))
      (setq files (cdr files)))
    histogram))

(defun sort-histogram (histogram)
  "Sort HISTOGRAM (destructively) by frequency."
  (sort histogram (lambda (a b) (> (cdr a) (cdr b)))))

(defun insert-histogram (histogram)
  "Insert frequency data from HISTOGRAM in a human-readable
format."
  (setq histogram (sort-histogram histogram))
  (newline)
  (while histogram
    (insert (format "%-24s %d\n" (car (car histogram)) (cdr (car histogram))))
    (setq histogram (cdr histogram))))
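
The alist above is searched with assoc for every token, which was clearly fast enough for my files. For a much larger corpus, a hash table would be the usual alternative. Here is a self-contained sketch of that variant, working on a string and using a plain regexp instead of tex+. It is not the code that produced the results in this post: the name my-count-tex-macros-in-string is made up, and the regexp assumes that a macro is a backslash followed either by a run of ASCII letters or by one other character.

```elisp
(require 'cl-lib)

(defun my-count-tex-macros-in-string (string)
  "Return an alist of (MACRO . COUNT) for the TeX macros in STRING.
The alist is sorted by descending COUNT.  This is a hypothetical,
regexp-based sketch that does not use tex+."
  (let ((table (make-hash-table :test #'equal))
        (case-fold-search nil)
        (result '()))
    (with-temp-buffer
      (insert string)
      (goto-char (point-min))
      ;; Match either a control word (\emph) or a control symbol (\,).
      (while (re-search-forward "\\\\\\(?:[a-zA-Z]+\\|[^a-zA-Z]\\)" nil t)
        ;; Missing keys start at 0, so every macro ends up with its count.
        (cl-incf (gethash (match-string 0) table 0))))
    ;; Turn the table into an alist and sort it by frequency.
    (maphash (lambda (macro count) (push (cons macro count) result)) table)
    (sort result (lambda (a b) (> (cdr a) (cdr b))))))
```

For example, calling it on the string \emph{a} \emph{b}\, gives an alist whose first element is ("\\emph" . 2).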


And the winner is, of course, the results. And they did surprise me. It turns out that the most common macros are \( and \) (which is not surprising, since we automatically convert $...$ to them). The silver medal goes to \emph (again, no surprise here). Then, we have (in roughly this order):

• \cite and \bib
• \begin and \end
• \ppauza (which is a Polish version of an en-dash, with proper spacing around and a non-breakable space before the dash; this one is defined in the polski package)
• \, (used in math a lot)
• \' (the first surprise)
• \item
• \dywiz (a Polish version of a hyphen, which, when the word is actually hyphenated, should be repeated at the end of the former line and at the beginning of the latter one; also defined in the polski package)
• \\
• \polishendash (which is a stupid name, but this is our macro which acts more or less like \dywiz, but has the length of an en-dash; this is different than \ppauza, since there is no spacing around it and it is repeated when hyphenated, just like \dywiz)
• \!, which we use quite a lot
• \label (which is – surprisingly – used more often than \ref!)
• \usepackage (on average, twice per document, and every one of them uses inputenc!)
• \newcommand (which was another surprise)
• \[ and \]
• \" – for some strange reason
• \citelist
• \ref, promptly followed by \eqref
• stuff like \documentclass and \footnote
• \section
• metadata like \author

All that is interspersed with some of our internal macros, and a lot of stuff used in math, like \in and \int and \ln and \left and so on.

The bottom line of this research is this: if you are an author of a paper for Wiadomości Matematyczne (or most other math journals, I presume), you should not use any fancy TeX stuff. Basically, the only commands you will most probably need (outside the preamble/template, of course – I don’t count stuff like \author here), are \emph (never \em or even \textit!), \section (and sometimes \subsection), \label and \ref, probably \cite and an occasional \footnote or \item. And, of course, various math symbols. Anything above that and you may safely assume that you are a troublemaker for the editors. (And by the way, if you claim that “LaTeX is too hard”, here’s my (a bit unpleasant) answer: if you are a mathematician and can’t learn how to use about ten commands, probably another ten environments plus the math symbols you actually need, please stop whining about “difficulty” and choose another profession.)

2018-01-07 A small editing tool for work with AMSrefs

As I mentioned many times, I often edit LaTeX files written by someone else for a journal. One thing which is notoriously difficult to get right when writing academic papers is bibliographies. At Wiadomości Matematyczne, we use AMSrefs, which is really nice (even if it has some rough edges here and there). (BTW, BibLaTeX was not as mature as it is today when we settled on our tool; also, AMSrefs might be a tad easier to customize, though I’m not sure about that anymore…) One of the commands AMSrefs offers is \citelist. Instead of writing things like papers \cite{1}, \cite{2} and~\cite{3}, you write papers \citelist{\cite{1}\cite{2}\cite{3}}, and AMSrefs sorts these entries and compresses runs into ranges (like in [1-3]).

The only problem is that most authors have no idea that this exists, and we often have to convert “manual” lists of citations into \citelist commands.

Well, as usual, Emacs to the rescue. Here’s what I have written.

(defun skip-cite-at-point ()
  "Move point to the end of the \\cite at point."
  (when (looking-at "\\\\cite")
    (forward-char 5)
    (cond ((= (char-after) ?\[)
           (forward-sexp 2))
          ((= (char-after) ?\{)
           (forward-sexp)
           (when (and (not (eobp))
                      (= (char-after) ?*))
             (forward-char)
             (forward-sexp)))
          (t (error "Malformed \\cite")))))

(defun cite-to-citelist ()
  "Convert region to a \\citelist command.
All \\cite's are preserved and things between them deleted.
This command will be fooled by things like \"\\\\cite\"."
  (interactive)
  (if (use-region-p)
      (let ((end (copy-marker (region-end))))
        (goto-char (region-beginning))
        (insert "\\citelist{")
        (while (< (point) end)
          (skip-cite-at-point)
          (delete-region (point)
                         (if (search-forward "\\cite" end t)
                             (progn
                               (backward-char 5)
                               (point))
                           end)))
        (insert "}"))
    (message "Region not active")))


It might contain some subtle bug, but I really hope it doesn’t – and it will get thoroughly tested very soon.

Notice how nice it is to craft such little editing tools in Emacs. You basically mimic your editing process, i.e., tell the machine what you do by hand to accomplish the goal. And not only do you have obvious things like forward-char, but also more complicated building blocks like forward-sexp.

Also, in case you wonder about the intricacies of the skip-cite-at-point function, AMSrefs’s \cite supports the traditional \cite[p. 123]{1} syntax, but also introduces its own: \cite{1}*{p. 123}. While quite unorthodox for a LaTeX command, it makes life easier for all people who want to put a \cite in an optional argument to things like \begin{theorem} ... \end{theorem} (which is a very common use case). Since LaTeX does not do proper pairing of brackets when parsing optional parameters, normally you need to enclose the whole \cite[...]{...} in additional curly braces – AMSrefs’ syntax makes that unnecessary.
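
For comparison, both \cite syntaxes can also be recognized with a single regexp. The sketch below is a different technique from the command above: it works on a string rather than on the region, does not use forward-sexp, and assumes that there are no nested braces or brackets inside the arguments. The names my-cite-regexp and my-cites-to-citelist are made up for this illustration.

```elisp
(defconst my-cite-regexp
  ;; \cite, an optional [...] argument, the mandatory {...} argument,
  ;; and an optional AMSrefs-style *{...} suffix.
  "\\\\cite\\(?:\\[[^]]*]\\)?{[^}]*}\\(?:\\*{[^}]*}\\)?"
  "Hypothetical regexp matching one \\cite without nested braces.")

(defun my-cites-to-citelist (text)
  "Return a \\citelist command built from all \\cite's in TEXT.
Everything outside the \\cite's is dropped.  If TEXT contains no
\\cite, return TEXT unchanged."
  (let ((case-fold-search nil)
        (start 0)
        (cites '()))
    ;; Collect the matches from left to right.
    (while (string-match my-cite-regexp text start)
      (push (match-string 0 text) cites)
      (setq start (match-end 0)))
    (if cites
        (concat "\\citelist{" (mapconcat #'identity (nreverse cites) "") "}")
      text)))
```

Given \cite{1}, \cite{2} and~\cite{3}, this returns \citelist{\cite{1}\cite{2}\cite{3}}. The version based on forward-sexp copes with nested braces, which this regexp does not.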

Anyway, in case anyone needs something like that, here it is. And even if nobody does, maybe this can be an encouragement to write your own snippets like this to help automate your common tasks.