Marcin Borkowski: 2014-10-18 Version Control Systems

It is very often the case that one works on a certain document (a scientific paper, a book, a newspaper article, a master’s thesis – whatever) and from time to time wants to revert some changes (“this wasn’t the brilliant idea I thought it was”), highlight the changes made since the advisor last read it, check which parts of the file were last edited and when, or do something similar. In general, knowing which part of the file changed, from what, and when is occasionally quite useful.

Of course, you can save subsequent versions of your paper under filenames like paper-v1.tex, paper-v2.tex, …, paper-v23.tex and so on (actually, a better idea is to use paper-v01.tex – the leading zeros make lexicographic sorting match the version order). Or better, paper-2014-10-18.tex, with the date in ISO 8601 format. But this method is error-prone and cumbersome. And usually, if something is error-prone because it needs a lot of bookkeeping but no creative thinking, it is better to leave that task to a machine.
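To see why the zero-padded and date-based names sort better, here is a minimal Python sketch. The file names and dates are made up for illustration; the only thing it relies on is ordinary string sorting, which is what most file managers and `ls` do by default.

```python
from datetime import date

# Hypothetical version numbers, for illustration only.
versions = (1, 2, 10, 23)

unpadded = [f"paper-v{n}.tex" for n in versions]
padded = [f"paper-v{n:02d}.tex" for n in versions]  # two digits, leading zero

# Plain string sorting compares character by character,
# so "v10" ends up before "v2" when the numbers are not padded.
sorted_unpadded = sorted(unpadded)
sorted_padded = sorted(padded)

# ISO 8601 dates (YYYY-MM-DD) sort chronologically as plain strings.
days = (date(2014, 10, 18), date(2014, 9, 30), date(2013, 12, 1))
sorted_dated = sorted(f"paper-{d.isoformat()}.tex" for d in days)

print(sorted_unpadded)
print(sorted_padded)
print(sorted_dated)
```

With the unpadded names, version 10 lands between versions 1 and 2; the padded and dated names come out in the order in which the versions were written.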

What is needed in such situations is called a version control system, or VCS. There are plenty of such systems out there. Personally, I used to use Mercurial, but after a few years it became clear to me that Git is a better choice, and so I made the switch. (Mercurial is easier for beginners. Git is more powerful. Don’t bother with the others unless you really have to. For instance, I have heard rumors of Subversion (SVN) being a nasty beast, so you might want to avoid that one – in general, a so-called distributed VCS seems a more versatile tool anyway.)

What are the advantages of using such systems?

Since VCSs were devised by programmers for programmers, they are very good at dealing with text files. But TeX files are text files, so they work quite nicely with TeX.
Many such systems are free software, so you can download them from the Internet and use them at no charge.
Since there are some nice tutorials available on the web, you can learn to use the basic functions of Mercurial in less than half an hour, and maybe a bit longer for Git. And yes, it is easy (at least for typical tasks).
Of course, you can retrieve any version of the file(s) you are working on (and do much more).
Since you can configure a VCS to synchronize with a repository on a remote server, you get (remote) backups for free (well, almost – you usually have to issue the command to sync with the remote repo manually, and if you know how to automate this task, you most probably don’t have to read this text anyway).
In case you are unsure which direction some part of your book/paper/whatever is going to take, you can create /branches/. In other words, you don’t have to follow a linear sequence of changes – you can start with (say) version A, change some parts obtaining version B, then come back to A and change some other (or the same) parts obtaining version C. Later you might decide that the B variant was better and forget about C, or combine B and C into a new version D. (If the changes you made in B and C were in different part(s) of the file(s), this combining – called merging – is fully automatic!)
In particular, it is possible for more than one person to collaborate on some project. You can easily configure your setup so that if everybody starts their work by pulling the current state from a central repository, and finishes it by pushing their changes there, every project member always starts from the newest version of each file. What’s more, if there is an editing conflict (two or more people changing the same file at the same time), it is easy to merge their changes later into one version (of course, if they changed the same part of some file, merging needs manual intervention to tell the computer which version to keep and which to abandon). (In fact, there are different styles for working on collaborative projects.)
The most popular VCSs can be used from a command line or using various front-ends. (One of the best front-ends for Git is Magit, written on top of Emacs.)
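
The merging of versions B and C described above can be illustrated with a toy three-way merge in Python. This is only a sketch under a strong assumption: all three versions have the same number of lines, so lines are edited in place and never inserted or deleted. It is not how Git or Mercurial actually implement merging, and the function name and sample text are invented for the example.

```python
def merge_lines(base, ours, theirs):
    """Toy three-way merge of three equally long lists of lines.

    Returns (merged, conflicts): merged has None at every conflicting
    position, and conflicts lists the indices of those positions.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (a, b, c) in enumerate(zip(base, ours, theirs)):
        if b == c:
            merged.append(b)      # both sides agree (or neither changed)
        elif b == a:
            merged.append(c)      # only the second branch changed this line
        elif c == a:
            merged.append(b)      # only the first branch changed this line
        else:
            merged.append(None)   # both changed it differently: needs a human
            conflicts.append(i)
    return merged, conflicts


# Version A and two branches that touch different lines.
version_a = ["Intro.", "Method.", "Results."]
version_b = ["Better intro.", "Method.", "Results."]
version_c = ["Intro.", "Method.", "New results."]
version_d, no_conflicts = merge_lines(version_a, version_b, version_c)

# A branch that touches the same line as B produces a conflict.
version_c2 = ["Other intro.", "Method.", "Results."]
partial, clash = merge_lines(version_a, version_b, version_c2)

print(version_d, no_conflicts)
print(partial, clash)
```

The first merge succeeds without any help because B and C changed different lines; the second reports line 0 as a conflict, which corresponds to the case where you have to tell the computer which version to keep.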

What are the disadvantages? Well, I can think of one: there is some overhead connected with “committing” each change (that is, telling the VCS “hey, this is the next version, please remember it!”). Of course, you can commit only after some major rewrites, but it is usually better to do it more often.

To sum it up: if you work alone, on small projects (say, fewer than two pages of text each), and do backups regularly, you might not need a VCS (but it might come in handy anyway). In all other cases, you probably do need it.
