2019-03-11 Name-based UUID generation

Some time ago, I had a very specific need. I had some data which had to be anonymized before being sent somewhere. For example, assume that you have a CSV file with people’s names in one column and some (possibly sensitive) data in the remaining columns. I wanted to replace all the names with some (pseudo)random strings, with the caveat that if a name appears more than once, it should be replaced with the same string every time.

Basically, I wanted my data to be hashed. And since I’m a bit paranoid, I wanted the hashing to be salted. The need arose at around 15:50 and I wanted it done before I left the office that day, so I needed a quick solution.

Enter uuidgen. It turns out that it has exactly what I needed: the -N option (the string to be hashed), -n (the “namespace”, i.e., my salt) and -s (short for --sha1, since I wanted SHA-1 rather than MD5). The “namespace” is just another fixed UUID, which can itself be generated by uuidgen, e.g., with the -r (random) option.
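For illustration, here is a minimal sketch in Python rather than the shell (a swapped-in tool, not what I used that afternoon). The standard library’s uuid.uuid5 implements the same RFC 4122 version 5 scheme as uuidgen --sha1: the SHA-1 hash of the namespace UUID’s bytes followed by the name, truncated to 128 bits with the version and variant bits set. The salt value and the pseudonymize name are hypothetical.

```python
import uuid

# Hypothetical salt: generate one once (e.g. with `uuidgen -r`) and keep it
# fixed, so that the same name always maps to the same UUID.
SALT = uuid.UUID("3d813cbb-47fb-4ba6-8b4f-3f1b1d5e5d8a")


def pseudonymize(name: str, namespace: uuid.UUID = SALT) -> uuid.UUID:
    """Return the RFC 4122 version 5 (SHA-1, name-based) UUID of `name`."""
    return uuid.uuid5(namespace, name)


if __name__ == "__main__":
    a = pseudonymize("Alice")
    b = pseudonymize("Alice")
    c = pseudonymize("Alice", uuid.uuid4())  # a different, random "salt"
    # Same name and salt give the same UUID; changing the salt changes it.
    print(a, a == b, a == c)
```

With the same namespace and a UTF-8 locale, uuidgen --sha1 -n 3d813cbb-47fb-4ba6-8b4f-3f1b1d5e5d8a -N Alice should print the same UUID as pseudonymize("Alice"), though it’s worth checking on a value or two.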

The last piece of the puzzle was how to perform this operation on every cell in one column of a CSV file. I’m sure there are better ways, but I wanted a quick solution. Since my hammer of choice is Emacs, here is what I did.

First, I opened my CSV in LibreOffice (which takes care of all the quoting madness CSV allows), selected my column and copied it to the clipboard. Then, I pasted it into Emacs (in some temporary buffer). Then, I selected the first line and performed C-u M-| uuidgen --sha1 -n <my_salt> -N $(cat). M-| pipes the selected text to the command’s standard input, where $(cat) picks it up as the name, and the C-u prefix replaces the selection inline with the output, i.e., the generated UUID. Rinse and repeat; actually, I recorded a keyboard macro, so that part was easy.
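The per-line replacement that the keyboard macro performs can be sketched as a small filter over the pasted column. This is a hypothetical Python stand-in (the function and script names are made up), not the Emacs setup itself; it hashes each line with uuid.uuid5 under a fixed salt, stripping trailing newlines first as $(cat) does.

```python
import sys
import uuid


def anonymize_lines(lines, namespace):
    """Map each input line to the version 5 (SHA-1) UUID of its text.

    Trailing newlines are stripped before hashing, mirroring what $(cat)
    does in the Emacs command, so "Alice" followed by a newline hashes the
    same as plain "Alice".
    """
    return [str(uuid.uuid5(namespace, line.rstrip("\n"))) for line in lines]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical invocation: python anonymize_lines.py <salt-uuid> < column.txt
    salt = uuid.UUID(sys.argv[1])
    for result in anonymize_lines(sys.stdin, salt):
        print(result)
```

Repeated names produce repeated UUIDs, which is exactly the “same thing every time” requirement.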

Job done. Had I had more time, I’d probably have used something more elegant: instead of doing the transformation in Emacs, I’d have used xargs (whose syntax I always tend to forget), and instead of LibreOffice, I could have used xsv to extract the column (and then put it back into my file).
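One way to skip both the LibreOffice round trip and the column extraction is to rewrite the column in place while parsing the CSV. Below is a minimal sketch in Python, assuming a zero-based column index and an optional header row; the anonymize_csv name and the sample data are hypothetical. Python’s csv module handles quoted fields with embedded commas, which is the “quoting madness” LibreOffice was taking care of.

```python
import csv
import io
import sys
import uuid


def anonymize_csv(src, dst, column, namespace, has_header=True):
    """Copy CSV rows from src to dst, replacing one column with v5 UUIDs.

    column is a zero-based index; the header row (if any) is copied as-is,
    and rows too short to have that column are passed through unchanged.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for i, row in enumerate(reader):
        if not (has_header and i == 0) and column < len(row):
            row[column] = str(uuid.uuid5(namespace, row[column]))
        writer.writerow(row)


if __name__ == "__main__":
    # Small in-memory demo with a hypothetical salt; for real files, open
    # them with newline="" as the csv module documentation recommends.
    salt = uuid.UUID("3d813cbb-47fb-4ba6-8b4f-3f1b1d5e5d8a")
    sample = io.StringIO('name,data\n"Doe, John",42\nAlice,7\n"Doe, John",13\n')
    anonymize_csv(sample, sys.stdout, 0, salt)
```

Both “Doe, John” rows come out with the same UUID, while the other columns are copied untouched.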

CategoryEnglish, CategoryBlog