Some time ago I wrote about sentence counting in Emacs buffers (or regions). I promised a sequel, and here it is.
The real reason I wanted to count sentences was to be able to automatically put a signature in my email referring to one of the http://sentenc.es/ webpages.
It turns out, however, that counting sentences in email messages is much more difficult than it seems. First of all, you don’t want to count the headers and the signature – but that is trivial to accomplish. What is definitely non-trivial is how to exclude the quotations.
The reason behind the difficulties lies in the fact that quotations are line-based and sentences are not. In fact, while an Emacs “sentence” cannot transcend paragraph boundaries (i.e., blank lines), Emacs does not consider a line consisting only of the quotation prefix (“>”) blank.
There are two possible solutions to that problem. One may be changing the rules by which paragraphs are separated. It might work, though I didn’t try that. One problem I’d expect with this approach is this: do we check whether we are in a quotation at the beginning or at the end of each sentence? Either way, some partial sentences would be excluded.
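To make the first option a bit more concrete, here is a minimal sketch of what changing the paragraph rules could start from. It only checks which lines a given regex would treat as paragraph separators; it does not attempt the sentence counting itself, so the partial-sentence problem remains open. The helper `my/separator-line-p` and the extended regex are hypothetical names made up for this illustration, and the first constant is assumed to be the stock default of `paragraph-separate`.

```elisp
;; Sketch only: everything prefixed with my/ is hypothetical.

(defconst my/stock-paragraph-separate "[ \t\f]*$"
  "Assumed to be the default value of `paragraph-separate'.")

(defconst my/quote-aware-paragraph-separate
  (concat my/stock-paragraph-separate "\\|>+[ \t]*$")
  "Like the stock value, but a bare quotation prefix also counts as blank.")

(defun my/separator-line-p (regex line)
  "Return t if LINE, taken as a whole line, matches REGEX at its start.
This mirrors how paragraph regexps are matched at line beginnings."
  (with-temp-buffer
    (insert line)
    (goto-char (point-min))
    (looking-at-p regex)))

;; A bare ">" is not blank under the stock rule...
(my/separator-line-p my/stock-paragraph-separate ">")             ; => nil
;; ...but it is under the extended one, while quoted text is not.
(my/separator-line-p my/quote-aware-paragraph-separate ">")       ; => t
(my/separator-line-p my/quote-aware-paragraph-separate "> Hello") ; => nil
```

To experiment further, one could let-bind `paragraph-separate` (and probably `paragraph-start`) to such a value around the counting code.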
I went with another – maybe a bit more complicated – approach. I loop over all parts of the email between quotations and count the sentences in each of them separately. I also take care of all the blank lines and some other things, like the singular/plural issue I mentioned in the previous post. Also, the citation line (which usually says something like “Aunt Milly wrote this on that day:”) is counted among the quotations.
One thing I’m not particularly happy about is that the user has to customize the variable message-quotation-regex. (I define it using a defcustom so that the user can use the Customize feature to set it.) On the other hand, I see no way of generating it programmatically, especially since the citation line may look very different across setups. (Probably I could use a regex matching just the citation line and attach the "> \\|>$\\|" part myself. Since there are other possible quotation styles, e.g. using indentation instead of the greater-than sign, I decided it’s not worth it.)
(defcustom message-quotation-regex
  "> \\|>$\\|On.*wrote:$"
  "A regular expression matching at the beginning of a quotation line.
Most probably should be an alternative of a quotation prefix (usually
\"> \"), an empty quotation line (usually \">$\") and a citation line
(e.g., \"On.*wrote:$\")."
  :type 'regexp
  :group 'message)

(defun message-in-quotation-p ()
  "Return t if the point is within a quotation, including the
citation line."
  ;; We can't use `message-yank-prefix', since the quotation line may
  ;; be just the single ">", and the default value of
  ;; `message-yank-prefix' is "> ".
  (save-excursion
    (beginning-of-line)
    (looking-at-p message-quotation-regex)))

(defun message-count-sentences (&optional print-message)
  "Count the sentences in the current message.
Exclude headers, signature and quotation lines. Print the
resulting number if PRINT-MESSAGE is non-nil."
  (interactive "p")
  (save-excursion
    (save-restriction
      ;; Restrict to the body: after the headers, before the signature.
      (narrow-to-region
       (progn
         (goto-char (point-min))
         (search-forward (concat "\n" mail-header-separator "\n") nil t)
         (point))
       (progn
         (goto-char (point-max))
         (re-search-backward message-signature-separator nil t)
         (skip-chars-backward " \t\n")
         (point)))
      (goto-char (point-min))
      (let ((sentences 0))
        (while (not (eobp))
          ;; Narrow to the text up to the next quotation (or the end)
          ;; and count the sentences there.
          (save-restriction
            (narrow-to-region
             (point)
             (save-excursion
               (while (not (or (message-in-quotation-p)
                               (eobp)))
                 (forward-line 1))
               (skip-chars-backward " \t\n")
               (point)))
            (while (not (eobp))
              (forward-sentence 1)
              (setq sentences (1+ sentences))))
          (unless (eobp)
            (forward-char))
          ;; Skip the quotation and any blank lines around it.
          (while (and (not (eobp))
                      (or (message-in-quotation-p)
                          (looking-at-p "^[ \t]*$")))
            (forward-line 1)))
        (if print-message
            (message
             "%s sentence%s in this message."
             sentences
             (if (= 1 sentences) "" "s"))
          sentences)))))
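The idea of supplying only the citation-line regex and attaching the prefix part automatically could be sketched roughly as follows. Both helpers, `my/make-quotation-regex` and `my/quoted-lines`, are hypothetical and not part of the code above; the second one exists only to show which lines of a sample message such a regex classifies as quotation. The sketch assumes the ">" quotation style and would not cover indentation-based quoting.

```elisp
;; Sketch only: both helpers are hypothetical illustrations.

(defun my/make-quotation-regex (citation-regex)
  "Return a regex for quotation lines, built around CITATION-REGEX.
The result matches a \"> \" prefix, a bare \">\" line, or a
citation line matching CITATION-REGEX."
  (concat "> \\|>$\\|" citation-regex))

(defun my/quoted-lines (text regex)
  "Return a list with one element per line of TEXT.
An element is t if its line starts with a match for REGEX,
and nil otherwise."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (flags)
      (while (not (eobp))
        (push (looking-at-p regex) flags)
        (forward-line 1))
      (nreverse flags))))

(my/quoted-lines
 "Hi,\nOn Monday Aunt Milly wrote:\n> How are you?\n>\nFine, thanks."
 (my/make-quotation-regex "On.*wrote:$"))
;; => (nil t t t nil)
```

Note that the citation line is flagged together with the quoted lines, which is the behaviour the counting function relies on.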
This is not the end (yet), though. In the future, I’m going to show how I used this function to automatically insert a suitable signature in my email. Also, I’ll most probably define a hydra for inserting boilerplate stuff like greetings, dividing the quotation into blank-separated paragraphs and moving between them. This is not yet written, but I have a strong incentive to do that: I spend some time reading and writing/responding to email (sometimes on the order of half an hour a day – I know, businesspeople would laugh at this, but I consider it twice as much as I’d like…), and every bit of Elisp which could streamline that is worth quite a bit to me.