2018-05-20 Collaborating with non-Git-users - workflow and basic setup

This is the first post in a three-part series describing my setup designed to work flawlessly and almost automatically when collaborating with people not using Git. It describes the premise and the basic elements of the machinery I use.

I am now engaged in a project involving a few collaborators. It turned out that I am the only one who knows how to use a version control system and is willing to do so.

Reminder to myself: never, ever work with people who insist on sending intermediate files via email and not learning a VCS.

(It’s not that I insist on Git. The fact is – as we will see later – Git is sometimes horrible. But it has Magit, and after some time using it, returning to the command line feels sort of like going back to a cave dwelling.)

So, there we are, a bunch of people who write emails of the form “Hey, I added something to the file such-and-such, here’s the new version, what d’ya think?”

Of course, every one of us works on a different schedule (and has different sleep patterns – for instance, I started writing this blog post before 5:30, when most people are still sleeping). And while technically we could just number our versions and lock files (like in RCS, only in a human version – “please do not touch this file, I’m working on it now”), this just doesn’t make sense in the twenty-first century.

So, I have a Git repo on my computer (backed up regularly and pushed to a server just in case), and I commit each and every change I get over email.

That alone doesn’t solve the problem yet. Assume that I send out version 5 today at 6:00. Then I leave for other projects for the rest of the day. Meanwhile, person X makes some edits (call them version 6x). Tomorrow at 5:20, I sit down to do some more work, and commit my version 6. Person X does a similar thing later (v7x), and only then sends me an email with a new version of some file. (They didn’t send me anything earlier, since there’s no point in sending work in progress, right?) So, here is what we should have in our repo:

      person-x
      6x---7x
     /
    /
---5---6
   master

(I assume that I am on the master branch, since I am the great Git mastermind here;-).)

Now I know that I should commit v7x somewhere, but how do I know where exactly? From my point of view, the situation looks like this:

  person-x
     7x

---5---6
   master

In other words, I got v7x, and it’s “floating in the air”. Remembering which version I had last emailed to the others is troublesome, and attaching v7x in the wrong place might result in data loss! If I add something in v6, copy v7x’s files on top of v6 and then commit, we will lose my changes, because v7x was based on v5 and knows nothing about v6. (This is not a problem when every commit is a small one. But in a culture of “let’s not send this WiP to everyone until I do a lot of work”, which is the opposite of Git best practices, carefully reviewing each and every line of a 100-line commit is no fun.)

I can see two possible ways of solving this problem (assuming that making people use Git is not a viable solution). One is to create a branch for every person in the project and use these branches to track which version I sent to whom. (The mess is made worse by the fact that if I collaborate on some part – which would be a feature branch in a normal workflow – with person Y, I might not even want to bother person X with emails regarding parts they are not very interested in.)

This, however, is not optimal, since it requires manual work, and manual work leads to errors. Also, you have no guarantee that people will start editing with the latest version you sent them – they may have already started with one of the previous ones (this actually happened to me the other day).

Happily, it turns out Git has you covered. Here is what I came up with.

First, I wanted each copy of each file to have some kind of stamp which would tell me which version it originated from.

Initially, I wanted Git to perform some trick so that the current commit SHA could itself be part of the commit (I think SVN does something similar with revision numbers). This is of course impossible (or at least extremely hard – and I mean getting-the-Turing-award-and-breaking-Git-for-everyone-level hard), since it would entail knowing the SHA beforehand, and making a commit containing this very SHA somehow hash to it. (Similar things have been accomplished with zip archives, although they are way easier, since zipping was never meant to be a cryptographically secure hash function.)

So, let’s forget about the fully automated way and do something less convenient, but possible.

Git has something called attributes. They can do quite a few things (filters are especially interesting, although, as mentioned above, they couldn’t solve my problem). There is, however, an attribute called export-subst. It works when using git archive, a Git command used seldom enough that there is no corresponding Magit command (!). With it, you can put a comment line containing the string $Format:...$ somewhere in a file, and use git-log format placeholders in place of the ellipsis. When git archive exports the file, the whole $Format:...$ string is replaced by the expanded text.

So, I decided to put this at the beginning of every file in the repo:

%
% Please never remove or alter the following line!
% $Format:This file is based on commit %h, authored by %an on %ai.$

and then edited .git/info/attributes to contain the line

*.tex export-subst
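
To make the effect concrete, here is a toy model (in Python, and not Git's actual implementation) of what happens to the stamp line on export, and of how the abbreviated hash could be read back from a file that returns by email. The function names, the sample hash, the author name and the date are all made up for illustration; only the three placeholders used in the stamp above are handled.

```python
import re
from typing import Optional

# The stamp line from the post, as it sits in the repository (unexpanded).
STAMP_TEMPLATE = "% $Format:This file is based on commit %h, authored by %an on %ai.$"

# Matches the stamp after expansion and captures the abbreviated hash.
STAMP_RE = re.compile(r"This file is based on commit ([0-9a-f]{4,40}),")


def expand_stamp(line: str, short_hash: str, author: str, iso_date: str) -> str:
    """Toy model of export-subst: replace $Format:...$ with its expansion.

    Only %h, %an and %ai are handled here; real Git supports the full
    set of pretty-format placeholders.
    """
    values = {"%h": short_hash, "%an": author, "%ai": iso_date}

    def substitute(match):
        body = match.group(1)
        return re.sub(r"%an|%ai|%h", lambda m: values[m.group(0)], body)

    return re.sub(r"\$Format:(.*?)\$", substitute, line)


def find_base_commit(text: str) -> Optional[str]:
    """Return the hash from the first expanded stamp line, or None."""
    for line in text.splitlines():
        found = STAMP_RE.search(line)
        if found:
            return found.group(1)
    return None


if __name__ == "__main__":
    # Hypothetical values standing in for what git archive would fill in.
    stamped = expand_stamp(STAMP_TEMPLATE, "1a2b3c4", "Jane Doe",
                           "2018-05-20 05:30:00 +0200")
    print(stamped)
    print(find_base_commit(stamped))
```

A file that was never exported still carries the literal %h, which is not a hexadecimal string, so find_base_commit returns None for it instead of a bogus hash.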

Of course, I now have to remember to use git archive every time I send files to other people in the project. This is not a problem, however, since it is no worse than manually zipping them.
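
For readers who would rather script this step, a minimal sketch of the export could look as follows. It is not my actual setup (that is Emacs-based); the function names, the zip format and the output file name are assumptions chosen for illustration. The git archive options used (--format, --output, --prefix) are standard ones.

```python
import subprocess
from typing import List


def archive_command(output: str, treeish: str = "HEAD", prefix: str = "") -> List[str]:
    """Build the git archive invocation for one outgoing snapshot."""
    cmd = ["git", "archive", "--format=zip", f"--output={output}"]
    if prefix:
        # git archive prepends the prefix verbatim, so a directory
        # prefix needs its trailing slash.
        cmd.append(f"--prefix={prefix.rstrip('/')}/")
    cmd.append(treeish)
    return cmd


def send_snapshot(repo_dir: str, output: str) -> None:
    """Run the command inside the repository (requires git on PATH)."""
    subprocess.run(archive_command(output), cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Prints the command only; nothing is executed here.
    print(" ".join(archive_command("paper.zip", prefix="paper")))
```

Because the export-subst attribute lives in .git/info/attributes, it applies to whichever commit is archived, so the stamp in the resulting zip always names the commit that was actually sent.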

In the next part, I will describe my Emacs setup which helps me with this, as well as committing stuff on people’s behalf.
