Home About me Blog Po polsku

Content AND Presentation

(Note to English-speaking readers: the links entitled Komentarze na tej stronie lead to comment pages.)

2015-01-21 Tree Edit Distance

Today I attended a very interesting lecture on so-called tree edit distance. The lecturer, Mateusz Pawlik, who graduated from my university, has recently earned his PhD at Free University of Bozen-Bolzano in Italy and is currently working at Salzburg.

As I am not very well versed in combinatorial algorithms, the technicalities were a bit overwhelming for me (even though the presentation was really very well done – the balance between ideas and formality seemed just about right). The basic idea is that tree edit distance (as the name suggests) is an analogue of the Levenshtein metric, but for trees. Here, the basic operations are node deletion and insertion, and relabeling (we work on trees with ordered and labeled nodes).

It seems that Mateusz and the team he’s a member of are doing a very good job. What was especially interesting for me were two things. Since mathematical formulae are a good example of the numerous things with tree-like structure, one could apply the algorithm discussed to measure similarity between equations. This might help with the problem on searching for equations in e.g. databases of mathematical papers. (Apparently, some work has been done on exactly that; I will look into this in some Spare Time™.)

The second thing – when I realized that this might be used for that – immediately sent shivers down my spine. As all Lispers know, Lisp programs are basically abstract syntax trees without any syntactic sugar (well, almost any). What if we were able to find the (minimal) set of basic operations to get from tree A to tree B, and use it to display a diff between two versions of some Lisp code? It might be extremely useful (not that line-based diffs are useless, but they don’t survive any major reformatting too well). Of course, this is not trivial: their algorithm computes the distance and not the edit script (as Mateusz explained to me during the Q&A session after the lecture). But from what I understood, in principle it could be done. Of course, there remains the question of the exact repertoire of the “basic operations”: they might not be e.g. paredit's slurping and barfing, for instance. Nevertheless, it looks promising.

Now the “official” implementation (i.e. the one done by the researchers) was done in Java. Just wondering whether redoing it in a more reasonable language would be an interesting project (not one I would like to tackle, though, at least not in the foreseeable future: neither my Lisp-fu nor my combinatorics-fu are sufficient for that, I’m afraid.) Definitely, using Emacs Lisp might not be the best idea (even though being able to diff ASTs in Emacs would be so cool): even though their algorithm is fast, it would probably choke on any medium-sized Lisp project (a wc on one of my projects – and a small one, for that matter – showed 1240 words for about 200 SLOC. Even taking into account that many of these are in the docstrings, this amounts to, say, five or so nodes of the AST per one line of code. Now take simple.el‘s circa 8000 lines…)

As a side note: many of my fellows working in mathematical analysis seem to be thinking that algorithms/programming/graphs/combinatorics are trivial because everything is, well, finite. You know what? It bugs me. It’s really a pity that they did not attend the lecture. Let me dedicate to them this quote of E. W. Dijkstra:

From a bit to a few hundred megabytes, from a microsecond to a half an hour of computing confronts us with completely baffling ratio of 10^9! The programmer is in the unique position that his is the only discipline and profession in which such a gigantic ratio, which totally baffles our imagination, has to be bridged by a single technology. He has to be able to think in terms of conceptual hierarchies that are much deeper than a single mind ever needed to face before. Compared to that number of semantic levels, the average mathematical theory is almost flat. By evoking the need for deep conceptual hierarchies, the automatic computer confronts us with a radically new intellectual challenge that has no precedent in our history.

Summing up: it seems that a lot of fascinating research is being done. We live in an interesting world!

CategoryEnglish, CategoryBlog, CategoryLaTeX, CategoryEmacs

Komentarze na tej stronie

2015-01-17 A quick-and-dirty hack with a dollar sign

I sometimes yank some fragments of text into my LaTeX files. This creates some problems, which can be more or less easily solved by query-replace and friends. The real problem is with math formulae. In general, anyone expecting that copying a formula from a pdf and yanking it into a LaTeX file will Just Work™, will be seriously disappointed. Converting such yanked formulae to correct LaTeX syntax is a painful process.

One can, however, use Emacs to ease the pain a bit. After a handful of such situations, being quite frustrated, I came up with this solution:

(defun TeX-smart-dollar (count)
  "Insert a dollar sign (first skipping any spaces).  If not at end of
line, assume we are at the beginning of a formula with missing dollars
and insert a closing one after a given number of words (one by
  (interactive "p")
  (skip-syntax-forward " " (line-end-position))
  (insert "$")
  (unless (eolp)
    (forward-word count)
    (insert "$")))

(eval-after-load 'tex '(define-key TeX-mode-map (kbd "$") 'TeX-smart-dollar))

The “not at eol” condition, BTW, is here so that this function does not interefere with ordinary typing of TeX formulae (usually when I type, I am at the end of line!).

This isn’t very robust, but it isn’t supposed to be – if the closing dollar is not at the right place, it is usually earlier in the buffer, and then I just press C-t and hold it until the dollar flies to the right spot. (Also, I could have written a piece of code to disable this binding, but I can just use local-set-key and that’s it.) So it is what it is: a quick-and-dirty hack, but still one that can make life easier. And – more importantly – this is again an example of a hack which can be implemented in just a few minutes.

CategoryEnglish, CategoryBlog, CategoryTeX, CategoryEmacs

Komentarze na tej stronie

2015-01-10 A few random Emacs tips

I’ve quite busy recently, so instead of a full-blown post I’ll share just a few random Emacs tips. Hopefully someone will find them useful.

  1. Don’t know where you can find your init.el? (Or – like me – you want to help some poor Windows guy set up his Emacs?) Try this: C-h v user-init-file.
  2. Do you want to stay in Emacs, even if you accidentally hit C-x C-c (yes, this does happen)? Try putting this in your config: (setq confirm-kill-emacs (quote yes-or-no-p)) (You can also check y-or-n-p and y-or-n-p, or – if you want to be fancy – use a lambda expression to perform partial application on y-or-n-p-with-timeout.
  3. Do you use emacsclient? If yes, you must admit that typing C-x # is a pain in the neck. Why don’t you try (global-set-key (kbd "C-x C-3") 'server-edit)?
  4. Wouldn’t you like AUCTeX’s C-c C-c not to ask whether to save the file you are editing? Put (setq TeX-save-query nil) in your config.
  5. This is trivial, but useful. Let’s create some shortcuts for commands you use often enough to be bothered by their long names, but not often enough to give them keybindings. For instance:
    (defun cal () "Shortcut for CALENDAR (without any argument)" (interactive) (calendar)) (Of course, you can use a more elaborate version to allow for an argument, like M-x calendar does.)
  6. If you happen to create shell (or Perl, or Python) scripts etc. in Emacs, adding this to your init.el might come handy:
(add-hook 'after-save-hook

It checks (on saving) whether the file you edit contains a shebang, and if yes, makes it executable.

In fact, there’s nothing here that is not described in the respective manual. You did read them, didn’t you;-)?

CategoryEnglish, CategoryBlog, CategoryEmacs

Komentarze na tej stronie


(Więcej means More in Polish; click it to see older entries.)

CategoryEnglish, CategoryBlog