2018-03-26 Human-readable filesizes

A few days ago I needed to display a filesize (in one of the next blog posts I’ll show why I needed this!). The easiest way is to just display the integer – size in bytes – but of course this is not very human-friendly. The “right” way should use whichever units fit the number.

Therefore, we have two problems. The first one is which unit to choose. This is actually quite easy – my suggestion would be to use the largest unit that doesn’t make the size less than 1. For instance, if the file has a size of 1024×1024=1048576 bytes, you should use MiB, but for a size less than that, KiB (or just bytes) would be better. The second one is to decide on the exact precision we should use.

Let us solve the former one first. This question on StackOverflow contains a few nice ideas. As a mathematician, I decided to use a logarithm-based one.
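To make the logarithm idea concrete, here is a minimal sketch of the unit choice on its own. The helper name my-size-order is made up for this illustration; it only assumes the rule stated earlier, that is, picking the largest power of 1024 that does not exceed the size.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-size-order (size)
  "Return the largest N with 1024^N <= SIZE, or 0 if SIZE is below 1024."
  ;; `max' keeps the logarithm finite for an empty file (SIZE = 0).
  (floor (log (max size 1) 1024)))

(my-size-order 500)      ; 0, so plain bytes
(my-size-order 1536)     ; 1, so KiB
(my-size-order 5000000)  ; 2, so MiB
```

The returned number can then serve as an index into a vector of prefixes.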

(defconst iec-prefixes ["" "Ki" "Mi" "Gi"])

(defun human-readable-size (size)
  "Return SIZE as human-readable string, using IEC prefixes."
  (let* (;; Largest power of 1024 that does not exceed SIZE, capped at
         ;; the last known prefix so that `elt' stays within bounds.
         (order (min (floor (log (max size 1) 1024))
                     (1- (length iec-prefixes))))
         (prefix (elt iec-prefixes order))
         (size-in-unit (/ size (expt 1024.0 order)))
         ;; One more significant figure than there are digits before
         ;; the decimal point, but never fewer than 3.
         (precision
          (max 3 (+ 2 (floor (log (max size-in-unit 1) 10)))))
         (size-str
          (format (format "%%.%dg%%sB" precision)
                  size-in-unit prefix)))
    size-str))

Notice a few oddities here. First of all, I use 1024 as the base for measuring the order of magnitude (obviously, since I want to use the IEC binary prefixes). A trickier part is the precision setting: the larger the size, the more precision I want, except that I never want it to drop below 3 significant figures. (What I do in the above code is basically calculate the number of significant figures so that I get a fixed number of digits after the decimal point: one for sizes of 10 units or more. I could not use %f, however, since it leaves trailing zeros after the point, which I didn’t want.) The trickiest part is actually the invocations of max that guard the edge cases. While I do not expect to handle empty files often, I definitely do not want my function to crash in such a (possible) situation. Also, for size equal to 1, (log size 1024) is exactly 0, and I have to make sure the order does not come out as -1 there – not what I want.
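The difference between %f and %g mentioned above is easy to check with the built-in format alone. This is just a sketch, and the sample values are arbitrary.

```elisp
;; %f always prints the requested number of decimals...
(format "%.2f" 1.5)      ; "1.50"
;; ...while %g counts significant figures and drops trailing zeros.
(format "%.3g" 1.5)      ; "1.5"
(format "%.3g" 1.234)    ; "1.23"
(format "%.4g" 117.74)   ; "117.7"
```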

Finally, in order to put the computed precision into the format string, I use format inside format, which is kind of sweet.
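Evaluated step by step, the nested call looks as follows (a sketch with an arbitrary precision of 4 and made-up sample arguments). The inner format turns each %% into a literal % and fills in the precision, and its result becomes the format string of the outer call.

```elisp
;; Inner call: build the format string for a precision of 4.
(format "%%.%dg%%sB" 4)             ; "%.4g%sB"
;; Outer call: use that string to format the size and the prefix.
(format (format "%%.%dg%%sB" 4)
        117.74 "Mi")                ; "117.7MiB"
```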

As it turns out, the deceptively simple task of displaying a filesize in a “nice” way is quite complicated. I am sure my way is not the only possible one, but it looks satisfactory – I’ve been using it for some time now and it works well enough for me.
