2019-12-23 Counting sentences in emails

Some time ago I wrote about sentence counting in Emacs buffers (or regions). I promised a sequel, and here it is.

The real reason I wanted to count sentences is that I’d like to automatically put a signature in my emails that refers to one of the http://sentenc.es/ webpages.

It turns out, however, that counting sentences in email messages is much more difficult than it seems. First of all, you don’t want to count the headers and the signature – but that is trivial to accomplish. What is definitely non-trivial is how to exclude the quotations.

The reason behind the difficulties lies in the fact that quotations are line-based and sentences are not. In fact, while an Emacs “sentence” cannot transcend paragraph boundaries (i.e., blank lines), Emacs does not consider a line consisting only of the quotation prefix (“>”) to be blank.
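A small experiment illustrates the difference. This is only a sketch: my-first-sentence is a hypothetical helper (not part of Emacs), and it assumes a fresh temporary buffer with the default paragraph and sentence settings.

```elisp
(defun my-first-sentence (text)
  "Return the part of TEXT that `forward-sentence' treats as its first sentence.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (forward-sentence 1)
    (buffer-substring-no-properties (point-min) (point))))

;; A truly blank line ends the paragraph, so the sentence stops there.
(my-first-sentence "first part\n\nsecond part.")
;; => "first part"

;; A line holding only ">" is not blank, so the sentence runs straight through it.
(my-first-sentence "first part\n>\nsecond part.")
;; => "first part\n>\nsecond part."
```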

There are two possible solutions to that problem. One would be to change the rules by which paragraphs are separated. It might work, though I didn’t try that. One problem I’d expect with this approach is this: do we check whether we are in a quotation at the beginning or at the end of each sentence? Either way, some partial sentences would be excluded.

I went with another – maybe a bit more complicated – approach. I loop over all parts of the email between quotations and count the sentences in each of them separately. I also take care of all the blank lines and some other things, like the singular/plural issue I mentioned in the previous post. Also, the citation line (which usually says something like “Aunt Milly wrote this on that day:”) is counted among the quotations.
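To see the idea in isolation, here is a minimal sketch that works on a plain string instead of a message buffer. It is not the implementation used in this post: all the my- names are hypothetical, the regex value is an assumption, and blank lines are treated as chunk separators (which should not change the count, since sentences do not cross blank lines anyway).

```elisp
;; Hypothetical stand-in for the quotation regex; the value is an assumption.
(defvar my-quote-line-regex "> \\|>$\\|On.*wrote:$"
  "Regex matching at the start of a quoted line or a citation line.")

(defun my-unquoted-chunks (body)
  "Return the runs of consecutive unquoted, non-blank lines in BODY."
  (let ((quoted (concat "\\`\\(?:" my-quote-line-regex "\\)"))
        (chunks nil)
        (current nil))
    ;; The extra empty string is a sentinel that flushes the last chunk.
    (dolist (line (append (split-string body "\n") '("")))
      (if (or (string-match-p quoted line)
              (string-match-p "\\`[ \t]*\\'" line))
          ;; A quoted or blank line closes the chunk collected so far.
          (when current
            (push (mapconcat #'identity (nreverse current) "\n") chunks)
            (setq current nil))
        (push line current)))
    (nreverse chunks)))

(defun my-count-sentences-in-string (text)
  "Count sentences in TEXT with `forward-sentence' in a temporary buffer."
  (with-temp-buffer
    (insert text)
    ;; Drop trailing whitespace so it is not counted as an extra sentence.
    (skip-chars-backward " \t\n")
    (delete-region (point) (point-max))
    (goto-char (point-min))
    (let ((count 0))
      (while (not (eobp))
        (forward-sentence 1)
        (setq count (1+ count)))
      count)))

(defun my-count-reply-sentences (body)
  "Sum the sentence counts of all unquoted chunks in BODY."
  (apply #'+ (mapcar #'my-count-sentences-in-string
                     (my-unquoted-chunks body))))

(my-count-reply-sentences
 "Hi Milly,\n\nOn Monday, Aunt Milly wrote:\n> How are you?\n>\n> Will you come?\n\nI am fine.\nI will come.\n")
;; => 3
```

The greeting counts as one sentence and the two reply lines as one each. The reply sentences sit on separate lines on purpose: with the default value of sentence-end-double-space, a period followed by a single space does not end a sentence.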

One thing I’m not particularly happy about is that the user has to customize the variable message-quotation-regex. (I define it using a defcustom so that the user can use the Customize feature to set it.) On the other hand, I see no way of generating it programmatically, especially since the citation line may look very different across setups. (I could probably use a regex matching just the citation line and attach the "> \\|>$\\|" part myself. Since there are other possible quotation styles, e.g. using indentation instead of the greater-than sign, I decided it’s not worth it.)

;; `message-signature-separator' (used below) is defined in message.el.
(require 'message)

(defcustom message-quotation-regex
  "> \\|>$\\|On.*wrote:$"
  "A regular expression matching at the beginning of a quotation line.
Most probably should be an alternative of a quotation prefix (usually
\"> \"), an empty quotation line (usually \">$\") and a citation line
(e.g., \"On.*wrote:$\")."
  :type 'regexp
  :group 'message)

(defun message-in-quotation-p ()
  "Return t if point is on a quotation line or on the citation line."
  ;; We can't use `message-yank-prefix', since the quotation line may
  ;; be just the single ">", and the default value of
  ;; `message-yank-prefix' is "> ".
  (save-excursion
    ;; The regex is meant to match at the beginning of a line.
    (beginning-of-line)
    (looking-at-p message-quotation-regex)))

(defun message-count-sentences (&optional print-message)
  "Count the sentences in the current message.
Exclude headers, signature and quotation lines.  Print the
resulting number if PRINT-MESSAGE is non-nil."
  (interactive "p")
  (save-excursion
    (save-restriction
      ;; Narrow to the body: after the headers, before the signature.
      (narrow-to-region
       (progn
         (goto-char (point-min))
         (search-forward (concat "\n" mail-header-separator "\n") nil t)
         (point))
       (progn
         (goto-char (point-max))
         (re-search-backward message-signature-separator nil t)
         (skip-chars-backward " \t\n")
         (point)))
      (goto-char (point-min))
      (let ((sentences 0))
        (while (not (eobp))
          ;; Count the sentences in the unquoted part starting at point.
          (save-restriction
            (narrow-to-region
             (point)
             (progn
               (while (not (or (message-in-quotation-p)
                               (eobp)))
                 (forward-line 1))
               (skip-chars-backward " \t\n")
               (point)))
            (goto-char (point-min))
            (while (not (eobp))
              (forward-sentence 1)
              (setq sentences (1+ sentences))))
          ;; Skip the quotation (and any blank lines) that follows.
          (unless (eobp)
            (forward-line 1))
          (while (and (not (eobp))
                      (or (message-in-quotation-p)
                          (looking-at-p "^[ \t]*$")))
            (forward-line 1)))
        (if print-message
            (message
             "%s sentence%s in this message."
             sentences
             (if (= 1 sentences) "" "s"))
          sentences)))))
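
The idea mentioned before the code, where the user supplies only a citation-line regex and the quotation-prefix alternatives get attached automatically, could be sketched roughly as follows. Both function names are hypothetical and are not part of the code above; the sketch also assumes the “>” quoting style only.

```elisp
(defun my-quotation-regex-from-citation (citation-line-regex)
  "Return a quotation regex built from CITATION-LINE-REGEX.
Hypothetical helper: it prepends the \"> \" and bare \">\" alternatives."
  (concat "> \\|>$\\|" citation-line-regex))

(defun my-quotation-line-p (line regex)
  "Return t if the string LINE starts with a match for REGEX."
  ;; Anchor at the start of the string, like `looking-at' at line start.
  (and (string-match-p (concat "\\`\\(?:" regex "\\)") line) t))

(let ((regex (my-quotation-regex-from-citation "On.*wrote:$")))
  (list (my-quotation-line-p "> quoted text" regex)
        (my-quotation-line-p ">" regex)
        (my-quotation-line-p "On Monday, Aunt Milly wrote:" regex)
        (my-quotation-line-p "My own reply." regex)
        (my-quotation-line-p ">> nested without spaces" regex)))
;; => (t t t nil nil)
```

The last result shows why some customization is hard to avoid: a nested quotation written as “>>” (with no space after the first “>”) does not match this pattern, while “> >” would.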

This is not the end (yet), though. In the future, I’m going to show how I used this function to automatically insert a suitable signature in my email. Also, I’ll most probably define a hydra for insertion of boilerplate stuff like greetings, dividing the quotation into blank-separated paragraphs and moving between them. This is not yet written, but I have a strong incentive to do that: I spend some time reading and writing/responding to email (sometimes on the order of half an hour a day – I know, businesspeople would laugh at this, but I consider it twice as much as I’d like…), and every bit of Elisp which could streamline that is worth quite a bit for me.
