I’ve been using Borg for many years now. Today, however, I had a very specific need. I needed to back up files in some directory (and its subdirectories, of course) but with the exception of “large” files. Usually I exclude files based on their names, but since I started using gocryptfs, I didn’t really know the names of the files I wanted to omit from the backup.
Borg’s manual contains an example of exactly that:
find ~ -size -1000k | borg create --paths-from-stdin /path/to/repo::small-files-only
but it didn’t work for me. The problem was simple. Borg accepts the list of files to back up either on the command line or on stdin, but not both:
borg: error: Must not pass PATH with ``--paths-from-stdin``.
On the other hand, I needed to provide the explicit list of files and directories in my home directory to back up. What to do, then?
It turns out that there exists a simple solution. The borg create command has the --exclude and --exclude-from parameters, which let you exclude given files, either by giving a list of paths (possibly using wildcards) or by giving a file containing such a list.
So, here is my solution. I used process substitution to create a “temporary filename” to pass the names of all the “big” files to Borg’s --exclude-from. Assuming that I want to back up the directory to-backup, I did this:
borg create --exclude-from <(find to-backup/ -size +100k) repository::archive to-backup/
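For shells without process substitution, the same idea can be sketched in Python. This is only an illustration under a few assumptions: the function names are made up for this example, only regular files are considered (find would also test directories), and the borg command line is built but not run. The threshold mirrors find's -size +100k, which matches files larger than 100 KiB.

```python
import os
import tempfile


def large_files(root, min_bytes=100 * 1024):
    """Yield paths of regular files under ROOT that are larger than MIN_BYTES."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Ignore symlinks; only real files count as "big" here.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) > min_bytes:
                yield path


def write_exclude_file(root, min_bytes=100 * 1024):
    """Write the list of large files to a temporary file and return its name."""
    with tempfile.NamedTemporaryFile("w", suffix=".exclude", delete=False) as f:
        for path in sorted(large_files(root, min_bytes)):
            f.write(path + "\n")
        return f.name


def borg_command(repo_archive, root, exclude_file):
    """Build (but do not run) the borg invocation as an argument list."""
    return ["borg", "create", "--exclude-from", exclude_file, repo_archive, root]
```

The resulting list could be handed to subprocess.run. Note that Borg treats the lines of the exclude file as patterns, so file names containing wildcard characters may need extra care.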
That’s it for today, see you next time!
A few days ago I was scripting a Node.js project and I had a very specific need. I wrote a script run as postinstall, but I wanted to launch it only in specific circumstances. When I found out that npm clean-install has an --ignore-scripts option, I was really glad – until I tried it out.
This is what the documentation says about this option:
If true, npm does not run scripts specified in package.json files.
Note that commands explicitly intended to run a particular script, such as npm start, npm stop, npm restart, npm test, and npm run-script will still run their intended script if ignore-scripts is set, but they will not run any pre- or post-scripts.
Does that imply that – for example – the postinstall script is not run? Yes, it does. Does it imply that when installing individual packages into node_modules, their postinstall scripts are not run? Hard to say, but that is exactly what happens! In my case, the project contained the esbuild package, which relies on postinstall to do some rather mysterious stuff related to installing its binary components. (I tried to fathom its postinstall script, but gave up after a few minutes – it’s certainly doable, but I don’t need this knowledge that much.)
One way to solve this is to issue the npm rebuild command, either in every subdirectory of node_modules (which, I’m afraid, might take ages…) or just in those which actually need it. The question is, how do I know which ones do? It occurred to me later that I can easily write some code to parse the package.json of every installed module, but this is definitely a case of fighting tools which were supposed to help you…
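Such a scan might look roughly like the following Python sketch. It is not something I actually ran against the project. The function name is made up, and the sketch assumes that a package “needs” a rebuild exactly when its package.json declares a preinstall, install or postinstall script. It also walks every package.json below node_modules, so manifests bundled inside packages (test fixtures, for instance) may show up as well.

```python
import json
from pathlib import Path

# Lifecycle scripts that --ignore-scripts would have skipped at install time.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def packages_with_install_scripts(node_modules):
    """Return {package name: {hook: command}} for packages declaring install hooks."""
    found = {}
    for manifest in sorted(Path(node_modules).rglob("package.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest: skip it
        if not isinstance(data, dict):
            continue
        scripts = data.get("scripts")
        if not isinstance(scripts, dict):
            continue
        hooks = {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}
        if hooks and "name" in data:
            found[data["name"]] = hooks
    return found
```

The names found this way could then be passed to npm rebuild instead of rebuilding everything.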
(Warning: rant incoming, again…)
What’s even worse, while writing this post, I discovered that the npm documentation’s search feature is practically worthless. For example, typing postinstall (or post-install, for that matter) into the search box reveals no results. Interestingly, while I type postinstall, different things happen. After typing p, a bunch of 20 suggestions appears; funnily enough, one of them is a “page not found” link. Well, it actually starts with p, so it seems legit… When I type o (so I have po in the search box), another set of 20 suggestions appears. I have a hunch that there are fewer pages matching po than just p, so there must be a limit of 20 suggestions shown. Why someone decided that a 404 page is useful enough to include in the suggestions for the p prefix is of course beyond me.
But wait, there’s more. When I type the next few letters, arriving at posti, I get no suggestions at all. But typing n (so that now I have postin in the search box) yields a suggestion of Reporting malware in an npm package, a page which does not seem to contain the string postin anywhere! I have no idea how this can happen… Even assuming it has postin in some kind of metadata, how come posti didn’t reveal it as well?
Last but not least, as I said, typing postinstall (or post-install) does not reveal anything at all.
Now, I am pretty accustomed to using the Info documentation system present in Emacs. It works infinitely better than npm’s docs:
Compared to Emacs manuals in Info, npm docs are technologically inferior and have significantly lower quality. It appears that I’m better off using DuckDuckGo (or better, Kagi) with a site: filter.
Anyway, it’s time for a conclusion – except that I don’t have any, apart from the obvious one that both the design and documentation of npm are less than ideal. Just be warned that the npm CLI tool does not always do what you wish it were doing…
So, it’s been over a year since the previous part of my attempt to introduce “irregular, but recurring TODOs” into my workflow. This is something I really need, but it’s complicated enough that my inner procrastinator kept putting it off, unfortunately.
Let’s start differently now. First of all, let’s set the “boundary conditions”, or the design goals, more explicitly than before. I don’t want the number of reviews per day to fluctuate too much – that is a given. I also do not want the intervals between reviews to rise dramatically. My goal is not to learn the material, but to be periodically reminded about it, so I prefer reviews in fairly uniform (as opposed to exponentially increasing) intervals. (The first few intervals being larger and larger seems fine, but after that they should stabilize.) Let’s determine the minimum and maximum interval this time. I think that the first interval should be about one week long, the second one about a month, and every subsequent one should be about 2 months. (These numbers are pretty arbitrary, but “seem reasonable”.) Also, every one of these may be made longer, but not too much. So, here is another idea.
Let’s begin with setting the maximum daily number of reviews, and set it to one (initially). After every review of an item, we will schedule the next one for that item, according to the following algorithm:
- given the current date d, the maximum number of reviews per day m and the number of reviews n this item already had, the function i mapping the number of the reviews to the minimum interval to the next one, and a fixed quotient q,
- pick a random day in {d+i(n),...,d+qi(n)} when the number of reviews scheduled is less than m, and schedule the review then, if such a day exists,
- increase m and schedule the review on a random day in {d+i(n),...,d+qi(n)} otherwise.

(In the first draft, I wrote that the next review should happen on the first day of the respective set, but I changed it to be more random. The reason was that I was afraid that the order of reviewing the items would stay the same – when some item B is shown after some item A, it might always be shown after A, and I didn’t want that.)
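The scheduling rule described above can be sketched in a few lines of Python. This is an illustration only: the names schedule_next and min_interval are invented here, the interval values (7, 30, 60) are the ones suggested by the design goals, and the scheduled reviews are kept in a plain counter keyed by day number.

```python
import random
from collections import Counter


def min_interval(n):
    """Minimum interval i(n) after the n-th review (illustrative values)."""
    return {1: 7, 2: 30}.get(n, 60)


def schedule_next(d, n, scheduled, m, q=2, rng=random):
    """Schedule the next review of an item reviewed on day d for the n-th time.

    SCHEDULED maps a day to the number of reviews already planned for it.
    Return (chosen day, possibly increased daily maximum m).
    """
    first = d + min_interval(n)
    last = d + q * min_interval(n)
    while True:
        free = [day for day in range(first, last + 1) if scheduled[day] < m]
        if free:
            day = rng.choice(free)
            scheduled[day] += 1
            return day, m
        m += 1  # every candidate day is full: raise the daily limit and retry
```

For example, schedule_next(0, 1, Counter(), 1) picks some day between 7 and 14 and leaves the daily maximum at 1.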
The drawback of this idea is that I’ll need a way to easily answer the question: given a particular day, how many reviews are scheduled then? Assuming that the date of review is associated with the item (for example, stored in its properties), this means being able to scan all the items rather quickly to find out how many are scheduled for d+i(n), for d+i(n)+1 and so on. Org mode is not best suited for that, even though I suspect that the number of items will not be big enough to create a performance problem for the user. Still, maybe it would be better to have a real database for that.
Well, good thing Emacs comes with one, then! Since Emacs 29, the sqlite3 library is one of Emacs’ components. Why not utilize that? One reason is that it would be premature optimization. Let’s keep the idea of using SQLite in mind, but for now Org properties should be more than enough.
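Just to show how little the database variant would take, here is a rough sketch of the “how many reviews on a given day” question as a single SQL query. It uses Python’s sqlite3 module rather than Emacs’ built-in functions, and the table items with its next_review column (a day number) is a made-up schema, not something the actual setup uses.

```python
import sqlite3

# Hypothetical schema: one row per item, next_review is a day number.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, next_review INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX items_next_review ON items (next_review)")
conn.executemany(
    "INSERT INTO items (id, next_review) VALUES (?, ?)",
    [(0, 10), (1, 10), (2, 12)],
)


def reviews_per_day(conn, first_day, last_day):
    """Return {day: number of scheduled reviews} for every day in the range."""
    rows = conn.execute(
        "SELECT next_review, COUNT(*) FROM items"
        " WHERE next_review BETWEEN ? AND ?"
        " GROUP BY next_review",
        (first_day, last_day),
    )
    counts = dict(rows)
    # Days with no scheduled reviews do not appear in the query result.
    return {day: counts.get(day, 0) for day in range(first_day, last_day + 1)}
```

With an index on the date column, the query only touches the rows in the requested range, which is exactly the kind of scan that is awkward with Org properties.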
But first let’s make another simulation. This time, though, I think I can at least try to predict what is going to happen. Previously, I had no idea what intervals would be selected – my formula determined them only implicitly. Now, the interval lengths are going to be more or less predetermined, so the non-obvious variable is the number of daily reviews. But this time, it can be estimated. Let us assume the following “interval function”:
(defun recurring-next-interval (review-number)
  "Return the minimum interval for the next review."
  (cl-case review-number
    (1 7)
    (2 30)
    (t 60)))
and let us set q=2 (so that the second review of an item, that is, the first one after adding it to the system, will happen between 7 and 14 days later, for example). This means that every item will be reviewed at least once every 120 days and at most once every 60 days (after a few initial reviews which are going to be a bit more frequent). While the first few repetitions will happen more often, let’s assume that the average interval between repetitions is going to be 90 days. Assume also that I will add one item per 8 days to the system (and I think this is a safe upper bound). After a year, I’m going to have about 45 items. After two years that makes about 90 items, each reviewed once per 90 days on average, which is one review per day, so one review per day will stop being enough after two years. In other words, every two years of using the system will add roughly one review per day to my load. This seems acceptable to me – if it turns out it’s not, I can always increase the intervals after some time.
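The estimate is easy to check with a few lines of Python. The constants are the assumptions stated above (one new item per 8 days, an average interval of 90 days); the names are only for illustration.

```python
DAYS_PER_NEW_ITEM = 8    # assumed rate of adding items
AVERAGE_INTERVAL = 90    # assumed average number of days between reviews
DAYS_PER_YEAR = 365


def items_after(years):
    """Expected number of items in the system after the given number of years."""
    return years * DAYS_PER_YEAR / DAYS_PER_NEW_ITEM


def daily_load(years):
    """Expected number of reviews per day after the given number of years."""
    return items_after(years) / AVERAGE_INTERVAL


for years in (1, 2, 4):
    print(years, round(items_after(years)), round(daily_load(years), 2))
```

After one year this gives about 46 items and half a review per day, and the load crosses one review per day around the two-year mark.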
Ok, so let’s confirm these back-of-the-envelope calculations. Beware, a long piece of not-the-best-quality Elisp code follows! (Since this is throwaway code, I didn’t bother with good practices etc.)
;; Recurring TODOs - simulation, second attempt

(require 'cl-lib)

(defvar recurring-todos ()
  "A list of \"TODO items\" as plists -- the properties are :id (an
integer), :reviews (dates of review, integers, starting with the
most recent one) and :next (date of the next review).")

(defvar recurring-counter 0
  "The value of :id for the next item created.")

(defvar recurring-date 0
  "The \"date\" (number of days elapsed from the beginning of the
experiment).")

(defvar recurring-buffer-name "*Recurring TODOs simulation data*"
  "Data about recurring TODOs simulation as csv.
Every row corresponds to one review (including the first one,
i.e., addition of the item to the system).")

(get-buffer-create recurring-buffer-name)
(with-current-buffer recurring-buffer-name
  (insert "date,id,review,interval\n"))

(defun recurring-add-review-datapoint (date id review interval)
  "Add a datapoint about a review to buffer `recurring-buffer-name'."
  (with-current-buffer recurring-buffer-name
    (goto-char (point-max))
    (insert (format "%s,%s,%s,%s\n" date id review interval))))

(defun recurring-add-empty-row ()
  "Add an empty row to buffer `recurring-buffer-name', signifying
that the maximum number of repetitions per day was increased."
  (with-current-buffer recurring-buffer-name
    (goto-char (point-max))
    (insert "\n")))

(defun recurring-add-todo ()
  "Add a new recurring todo to `recurring-todos'."
  (let ((new-item (list :id recurring-counter :reviews () :next nil)))
    (recurring-review-item new-item)
    (push new-item recurring-todos)
    (cl-incf recurring-counter)))

(defun recurring-next-day ()
  "Increment `recurring-date'."
  (cl-incf recurring-date))

(defun recurring-last-review (todo)
  "The date of the last review of TODO."
  (car (plist-get todo :reviews)))

(defun recurring-number-of-reviews (todo)
  "The number of reviews of TODO so far."
  (length (plist-get todo :reviews)))

(defun recurring-next-review (todo)
  "The date of the next review."
  (plist-get todo :next))

(defun recurring-next-interval (review-number)
  "Return the minimum interval for the next review."
  (cl-case review-number
    (1 7)
    (2 30)
    (t 60)))

(defvar recurring-factor 2
  "The maximum factor an interval may be multiplied by.")

(defvar recurring-max-per-day 1
  "The maximum number of reviews per day. Initially 1.")

(defun recurring-number-of-reviews-on-day (date)
  "The number of reviews scheduled for DATE."
  (cl-reduce (lambda (count todo)
               (if (= date (recurring-next-review todo))
                   (1+ count)
                 count))
             recurring-todos
             :initial-value 0))

(defun recurring-compute-next-review (todo)
  "Return the date of the next review of TODO."
  (let* ((interval (recurring-next-interval
                    (recurring-number-of-reviews todo)))
         (min-date (+ recurring-date interval))
         (max-date (+ recurring-date
                      (ceiling (* recurring-factor interval))))
         (possible-dates
          (cl-remove-if-not
           (lambda (date)
             (< (recurring-number-of-reviews-on-day date)
                recurring-max-per-day))
           (number-sequence min-date max-date))))
    (if possible-dates
        (seq-random-elt possible-dates)
      (cl-incf recurring-max-per-day)
      (recurring-add-empty-row)
      (recurring-compute-next-review todo))))

(defun recurring-review-item (todo)
  "Review the TODO item."
  (recurring-add-review-datapoint
   recurring-date
   (plist-get todo :id)
   (1+ (length (plist-get todo :reviews)))
   (- recurring-date
      (or (car (plist-get todo :reviews))
          recurring-date)))
  (push recurring-date (plist-get todo :reviews))
  (setf (plist-get todo :next)
        (recurring-compute-next-review todo)))

(defun recurring-review-for-today ()
  "Review items for the current day."
  (mapc #'recurring-review-item
        (cl-remove-if-not
         (lambda (todo)
           (= (recurring-next-review todo) recurring-date))
         recurring-todos)))

(defun recurring-reset ()
  "Reset the recurring reviews simulation."
  (setq recurring-todos ()
        recurring-counter 0
        recurring-date 0
        recurring-max-per-day 1))

(defun recurring-simulate (iterations new-frequency)
  "Simulate ITERATIONS days of reviewing TODOs.
NEW-FREQUENCY is the probability of adding a new TODO every day.
Do not reset the variables, so that a simulation can be resumed."
  (dotimes-with-progress-reporter (_ iterations)
      "Simulating reviews..."
    (when (< (cl-random 1.0) new-frequency)
      (recurring-add-todo))
    (recurring-review-for-today)
    (recurring-next-day)))
This time I ran the simulation for 4 years assuming that I add one item every 8 days on average at first, just to see what happens. (In fact, I’ve actually been gathering items for repeating in this system for about one and a half years now, and I have 51 of them so far.) It turned out that I reached 3 repetitions per day (which is roughly consistent with my expectations), and the average interval between repetitions was about 70 days (almost 80 in the fourth year alone). This looks very promising. The second experiment involved 10 years with one item added to the system every 5 days, and the average interval turned out to be 82 days (87 in the last year); the maximum number of repetitions per day reached 9, which is a tiny bit worrying, but probably still acceptable. (Assuming many of my TODOs are of the form “read this article again to be reminded of it”, 9 potentially long articles per day doesn’t look very good – but it just occurred to me that as part of an earlier repetition I might want to summarize the article, which is also very good for keeping it in long-term memory.) Also, if I decide that the daily load is too high, I can just increase the intervals, or even drop some of my items if I decide I no longer need to be reminded of them. Either way, 10 years is long enough that I most probably don’t need to worry about it.
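For the record, averages like the ones above can be computed from the CSV data with something like the following Python sketch. It assumes the column layout written by the simulation (date,id,review,interval, with empty rows marking an increase of the daily maximum) and, as an assumption of this sketch, leaves out the first “review” of each item, whose interval is always 0. The function name is made up.

```python
import csv
import io


def average_interval(csv_text, since_day=0):
    """Average interval over reviews on or after SINCE_DAY.

    The first review of an item (its addition to the system) is skipped.
    Return None when no review qualifies.
    """
    # csv.DictReader silently skips the empty separator rows.
    rows = csv.DictReader(io.StringIO(csv_text))
    intervals = [
        int(row["interval"])
        for row in rows
        if int(row["review"]) > 1 and int(row["date"]) >= since_day
    ]
    return sum(intervals) / len(intervals) if intervals else None
```

Passing since_day=3 * 365 restricts the average to the fourth year of a simulation, for instance.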
So, the next time I write about this, I really hope to have a functional if minimal setup – in fact, I am (slowly) working on it.
That’s it for today, see you in a few days with the next article!