Recently, I had a very specific need. I wanted to move a directory to another Git repo, but I really needed to preserve its history.
There is a quite well-known instance of a similar thing – the famous coolest merge ever is basically importing one project into another, preserving its history. My use-case, however, was a bit more difficult because I wanted to simultaneously move things to a directory with another name. (I could, of course, start with making a temporary clone – or a branch – in the “source” repo, delete everything I do not want to merge in it, change the directory structure to reflect what I want in the “destination” repo and commit all these changes. I wanted to avoid that, though. One of the reasons is that file renaming, while supported by Git, introduces unnecessary complications when analysing history.)
So, let’s get started. Assume that we have two Git repos, source and dest. In the source repo we have (among other things) a subdirectory called directory, and we want to move all files from it to a directory called folder in dest.
To begin with, let us create the repos. (On my machine, /mem is a scratch directory, much like /tmp, with the advantage that it only contains stuff I put there instead of a whole lot of things some random programs decide to save in /tmp. Also, as the name suggests, it resides in a ramdisk, so nothing there sticks for too long.)
    cd /mem
    rm -rf source dest
    git init source
    git init dest
    cd source
    mkdir directory
    echo "some file in the root dir" > some-file.txt
    echo "another file in the directory dir" > directory/another-file.txt
    git add .
    git status
    git commit -m "Initial commit"
    echo "added line" >> directory/another-file.txt
    git commit -am "Add a line"
    echo "An unrelated commit" >> some-file.txt
    git commit -am "An unrelated commit"
    echo "A commit spanning everything" >> some-file.txt
    echo "A commit spanning everything" >> directory/another-file.txt
    git commit -am "Make huge changes"
    cd ../dest
    echo "The destination repo" > README.txt
    git add README.txt
    git commit -m "Add README.txt"
We now have two simple Git repos to experiment with. (Note that because of rm -rf, the snippet above will recreate them from scratch every time it is run, which is quite convenient for experimentation.)
If we were happy to just merge everything from source to dest, things would be very easy:
    git remote add source ../source
    git fetch source
    git merge source/master --no-edit --allow-unrelated-histories
Note the options for git merge. The man page says explicitly that usually you do not want --no-edit, but since I want a smooth presentation of the main ideas here and not manually crafted merge commit messages, this is exactly what I need. The option --allow-unrelated-histories (which was the default in older versions of Git) is pretty self-explanatory.
This approach works well if we just want to merge two repositories (solving conflicts should they arise, but this is another story), but it is not what we want here. The first problem is that it also imports some-file.txt, and we only want directory and its contents. (For bonus points, notice how the last commit in source, “Make huge changes”, touches both a file in directory and a file outside of it – we would like to perform some surgery on this commit to preserve only the modification to another-file.txt.)
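To see which commits fall on which side of that line, here is a small sketch. It assumes a throwaway directory from mktemp instead of /mem and a made-up "demo" identity (both are my assumptions, not part of the setup above), and it uses the ":(exclude)" pathspec magic to list commits that touch anything outside directory.

```shell
set -eu
work=$(mktemp -d) && cd "$work"   # assumption: scratch dir instead of /mem
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Recreate the "source" repo from the setup snippet.
git init -q source && cd source
mkdir directory
echo "some file in the root dir" > some-file.txt
echo "another file in the directory dir" > directory/another-file.txt
git add . && git commit -q -m "Initial commit"
echo "added line" >> directory/another-file.txt
git commit -q -am "Add a line"
echo "An unrelated commit" >> some-file.txt
git commit -q -am "An unrelated commit"
echo "A commit spanning everything" >> some-file.txt
echo "A commit spanning everything" >> directory/another-file.txt
git commit -q -am "Make huge changes"

# Commits touching something under directory/ (the history we want to keep).
inside=$(git log --format=%s -- directory)
# Commits touching something outside directory/.
outside=$(git log --format=%s -- . ':(exclude)directory')

echo "inside directory:";  echo "$inside"
echo "outside directory:"; echo "$outside"
```

Commits that appear in both lists are the ones that need "surgery"; commits that appear only in the second list should vanish from the imported history altogether.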
Well, this is Git, so all this is perfectly doable. There is even a dedicated Git command solving a very similar issue, called git-subtree. We will not resort to it, however (I will probably write another post on git-subtree one day), using the lower-level git-filter-branch, git-read-tree and a few other commands instead.
In fact, a ready solution for the hard part can be easily found on the Internet. What I aim to do here is to (try to) explain the meaning of the commands involved. Note that, like in my previous post, I found this out by careful experiments and studying the manual, not by reading the Git sources, so there may be mistakes. Please point them out in the comments should you spot any.
So, let’s get down to business. First, we will excise everything but directory from the source repo (this is actually the easy part):
    cd /mem
    rm -rf source-tmp
    git clone source source-tmp
    cd source-tmp
    git filter-branch --prune-empty --subdirectory-filter directory -- --all
Note how the contents of directory have just migrated to the root directory of our repo. Also, inspect the history and note how the “Make huge changes” commit now only touches another-file.txt (which is logical, since it has nothing else to touch now, but still nice).
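One way to do that inspection without eyeballing git log is sketched below. The mktemp scratch directory, the "demo" identity, the shortened file contents and the FILTER_BRANCH_SQUELCH_WARNING variable (which only silences the banner printed by newer Git versions) are my additions for the sake of a self-contained run.

```shell
set -eu
work=$(mktemp -d) && cd "$work"   # assumption: scratch dir instead of /mem
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

# Recreate "source" with the same four commits (file contents shortened).
git init -q source && cd source
git symbolic-ref HEAD refs/heads/master   # make sure the branch is "master"
mkdir directory
echo "root" > some-file.txt
echo "inner" > directory/another-file.txt
git add . && git commit -q -m "Initial commit"
echo "added line" >> directory/another-file.txt
git commit -q -am "Add a line"
echo "unrelated" >> some-file.txt
git commit -q -am "An unrelated commit"
echo "everything" >> some-file.txt
echo "everything" >> directory/another-file.txt
git commit -q -am "Make huge changes"
cd ..

git clone -q source source-tmp && cd source-tmp
git filter-branch --prune-empty --subdirectory-filter directory -- --all >/dev/null 2>&1

files=$(git ls-tree -r --name-only HEAD)    # every file in the rewritten tip
subjects=$(git log --format=%s)             # subjects of the surviving commits
touched=$(git diff-tree --no-commit-id --name-only -r HEAD)  # files changed by the tip

echo "files:    $files"
echo "touched:  $touched"
echo "subjects:"; echo "$subjects"
```

The "An unrelated commit" entry should be missing from the list of subjects, and the tip commit should report only another-file.txt as changed.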
Interestingly, we have given the --all parameter to make Git operate on all the references (not “all the commits”!), local and remote. Without it, the history on branches other than “master” would be unaffected, thus leaving a terrible mess. Also, a good thing to know is that git-filter-branch will create a directory called .git/refs/original, where it stores all references it has changed. This means that the whole operation is easily undoable – just move .git/refs/original/refs to .git/refs, overwriting everything in the process, and you are done. (In particular, we did not really have to create source-tmp – but throwing it away is easier than manipulating stuff within .git.) You may read more about --all in e.g. the manpage of the plumbing command git-rev-list.
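The backup references can also be looked at (and used) without touching files inside .git by hand. The sketch below is an alternative to moving the directory, not the procedure described above: it lists refs/original with git for-each-ref and restores the branch with git reset --hard. The repo name, the "demo" identity and the scratch directory are assumptions made for the example.

```shell
set -eu
work=$(mktemp -d) && cd "$work"   # assumption: scratch dir instead of /mem
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

# A tiny stand-in for the "source" repo, filtered in place this time.
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
mkdir directory
echo "root" > some-file.txt
echo "inner" > directory/another-file.txt
git add . && git commit -q -m "Initial commit"
echo "more" >> directory/another-file.txt
git commit -q -am "Add a line"

before=$(git rev-parse HEAD)
git filter-branch --subdirectory-filter directory -- --all >/dev/null 2>&1
after=$(git rev-parse HEAD)

# filter-branch keeps the pre-rewrite value of every ref it changed here.
backups=$(git for-each-ref --format='%(refname)' refs/original)
echo "backup refs: $backups"

# Undo the rewrite: point the branch, index and work tree back at the backup.
git reset -q --hard refs/original/refs/heads/master
restored=$(git rev-parse HEAD)
echo "before=$before after=$after restored=$restored"
```

With several rewritten branches, each of them would need the same treatment, which is one more reason why throwing away a temporary clone is simpler.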
Another thing worth mentioning is the --prune-empty switch. Here, things get a bit hazy for me. The manual says that its aim is to remove empty commits (apart from merge commits), but a quick experiment showed that the command seems to work the same way without it. (I asked about it on StackOverflow and learned that indeed, --prune-empty is superfluous in this case.)
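That quick experiment can be reproduced roughly as follows. This is a sketch under my own assumptions (scratch directory from mktemp, a "demo" identity, two plain copies of the repo named with-prune and without-prune); it simply counts the commits left after each variant.

```shell
set -eu
work=$(mktemp -d) && cd "$work"   # assumption: scratch dir instead of /mem
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

git init -q source && cd source
git symbolic-ref HEAD refs/heads/master
mkdir directory
echo "root" > some-file.txt
echo "inner" > directory/another-file.txt
git add . && git commit -q -m "Initial commit"
echo "added line" >> directory/another-file.txt
git commit -q -am "Add a line"
echo "unrelated" >> some-file.txt
git commit -q -am "An unrelated commit"
echo "everything" >> some-file.txt
echo "everything" >> directory/another-file.txt
git commit -q -am "Make huge changes"
cd ..

# Two identical copies, filtered with and without --prune-empty.
cp -R source with-prune
cp -R source without-prune
(cd with-prune &&
    git filter-branch --prune-empty --subdirectory-filter directory -- --all >/dev/null 2>&1)
(cd without-prune &&
    git filter-branch --subdirectory-filter directory -- --all >/dev/null 2>&1)

with=$(git -C with-prune rev-list --count HEAD)
without=$(git -C without-prune rev-list --count HEAD)
echo "commits with --prune-empty:    $with"
echo "commits without --prune-empty: $without"
```

Both counts come out the same because --subdirectory-filter only looks at the history that touches the given subdirectory, so the unrelated commit is never a candidate for becoming an empty one.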
Now we need to import (merge) our temporary repo into dest.
    cd /mem/dest
    git remote add -f source ../source-tmp
    git merge -s ours --no-commit --allow-unrelated-histories source/master
Now this is where the fun starts. First we add our temporary repo as a remote (and immediately fetch from it, using the -f option) – this is simple. Then, we prepare the merge. First of all, we supply the ours merge strategy. (Note: this must not be confused with the ours option for the recursive merge strategy. See the manpage of git-merge for more information.) This means that the “merge” will actually completely disregard the tree in the merged-in heads. In other words, after an ours merge, while the history will look as if two (or more) branches have been merged, all the changes from the merged-in branches will be completely lost. (This may be actually useful in rare cases, I guess.)
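To make the effect of the ours strategy tangible, the following sketch performs a complete (committed) ours merge between two tiny unrelated repos and compares the tree before and after. The repo contents, the from-source.txt file name and the "demo" identity are invented for the example.

```shell
set -eu
work=$(mktemp -d) && cd "$work"   # assumption: scratch dir instead of /mem
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two unrelated one-commit repos.
git init -q source
git -C source symbolic-ref HEAD refs/heads/master
echo "from source" > source/from-source.txt
git -C source add . && git -C source commit -q -m "Source commit"

git init -q dest
git -C dest symbolic-ref HEAD refs/heads/master
echo "The destination repo" > dest/README.txt
git -C dest add . && git -C dest commit -q -m "Add README.txt"

cd dest
git remote add -f source ../source >/dev/null 2>&1

tree_before=$(git rev-parse 'HEAD^{tree}')
git merge -q -s ours --allow-unrelated-histories -m "Merge source (ours)" source/master
tree_after=$(git rev-parse 'HEAD^{tree}')

# "commit parent1 parent2" is three words for a two-parent merge commit.
words=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')

echo "tree before: $tree_before"
echo "tree after:  $tree_after"
echo "words in 'rev-list --parents' line: $words"
```

The merge commit has two parents, yet its tree is the very same object as before the merge, and from-source.txt never shows up in dest.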
The next thing is the --no-commit option. It seems obvious, but actually it is not so in this case. I mean, with a “normal” merge, this just leaves the last step (the actual commit) to the user (much like in the case of conflicts). However, you might wonder what this does in the case of the ours strategy we have used. Turns out, the only thing our merge command does is update a few files in the .git directory: ORIG_HEAD (the reference to the head before the merge started – this reference is actually written by more operations in Git so that undoing is easier), MERGE_MSG (pretty obvious), MERGE_MODE (no-ff in our case, which is not surprising; this seems a bit, erm, underdocumented, but I found some information here), and – most importantly – MERGE_HEAD, which contains the reference to the branch we are merging in (source/master). (In the case of an octopus merge, this file contains more references, of course.)
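These state files are easy to look at directly. The sketch below sets up two small unrelated repos (contents and the "demo" identity are made up), runs the same merge command, and prints whichever of the four files exist; which files appear and what they contain may vary between Git versions, so treat the output as an observation rather than a specification.

```shell
set -eu
work=$(mktemp -d) && cd "$work"   # assumption: scratch dir instead of /mem
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q source-tmp
git -C source-tmp symbolic-ref HEAD refs/heads/master
echo "inner" > source-tmp/another-file.txt
git -C source-tmp add . && git -C source-tmp commit -q -m "Initial commit"

git init -q dest
git -C dest symbolic-ref HEAD refs/heads/master
echo "The destination repo" > dest/README.txt
git -C dest add . && git -C dest commit -q -m "Add README.txt"

cd dest
git remote add -f source ../source-tmp >/dev/null 2>&1
head_before=$(git rev-parse HEAD)
git merge -s ours --no-commit --allow-unrelated-histories source/master >/dev/null 2>&1

# Show the merge bookkeeping Git left behind (only files that exist).
for f in ORIG_HEAD MERGE_MSG MERGE_MODE MERGE_HEAD; do
    if [ -f ".git/$f" ]; then
        echo "== $f"
        cat ".git/$f"
        echo
    fi
done

merge_head=$(cat .git/MERGE_HEAD)
head_after=$(git rev-parse HEAD)
```

MERGE_HEAD holds the commit ID that source/master points to, and HEAD itself has not moved, which matches the description above: nothing is committed yet.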
The --allow-unrelated-histories option we already mentioned, and there is not much to explain here.
So, if we committed now, the commit would be “empty” (i.e., it would not introduce any changes), but the history would show that we have merged the source/master branch (and we would have two roots now). What we need to do is to put the contents of source/master (i.e., the current state of directory in the source repo) into the folder directory. This is the easier part and can be done with yet another plumbing Git command, read-tree.
    cd /mem/dest
    mkdir folder
    git read-tree --prefix=folder/ -u source/master
Now we could just copy the files from source-tmp and stage them instead. (This is a bit hazy again. I performed an experiment to check if copying and staging from source-tmp would lead to the same result. It did, in the sense that in both cases the .git directory contained almost exactly the same stuff (in particular, the objects were bit by bit the same). The “almost” part was the index file (i.e., the “staging area”). While git ls-files --stage also showed the same output, there were binary differences in .git/index. If some brave soul wants to perform a similar experiment and dig even deeper into this, here is the official description of the index file format. Also, to make sure the objects in both cases are the same so that the comparison is fair, you have to make sure that the timestamps of all commits are the same in both cases. One way to ensure that is to set the environment variables GIT_AUTHOR_DATE and GIT_COMMITTER_DATE (see e.g. here for some explanation) or use the datefudge utility with the -s option, which I did.)
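For the environment-variable route, a minimal sketch could look like this. The fixed timestamp, the repo names a and b, the helper name make_commit and the "demo" identity are arbitrary choices of mine; the date uses Git's internal "seconds-since-epoch plus timezone offset" format.

```shell
set -eu
work=$(mktemp -d) && cd "$work"   # assumption: scratch dir instead of /mem
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Pin both timestamps so they stop depending on the wall clock.
export GIT_AUTHOR_DATE="1577836800 +0000"
export GIT_COMMITTER_DATE="1577836800 +0000"

# Hypothetical helper: create repo $1 with one fixed commit, print its ID.
make_commit() {
    git init -q "$1"
    (
        cd "$1"
        echo "same content" > file.txt
        git add file.txt
        git commit -q -m "Same message"
        git rev-parse HEAD
    )
}

first=$(make_commit a)
second=$(make_commit b)
echo "first:  $first"
echo "second: $second"
```

With content, message, identity and both dates pinned, the two repos end up with the same commit ID, which is what makes a byte-for-byte comparison of the object stores meaningful.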
However, git-read-tree does this in one step instead of two: it puts everything from the source/master commit into the index (aka staging area), which is its basic aim, but the -u option makes it also update the working directory. (Again, the man page is not very precise here – it says what the -u switch does after a successful merge. In our case, we do not request a merge, but OTOH we will not have any conflicts, since we assume that folder is empty. I made a few experiments, and it seems that -u is only meaningful with -m, --prefix or --reset. That kind of makes sense, although it is not said explicitly in the manual.)
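Here is a small sketch of what read-tree leaves behind, checked from both sides (index and working directory). It uses a one-commit stand-in for source-tmp and a "demo" identity, both invented for the example, and it skips the mkdir step because the -u checkout creates the leading directory itself.

```shell
set -eu
work=$(mktemp -d) && cd "$work"   # assumption: scratch dir instead of /mem
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the already filtered source-tmp repo.
git init -q source-tmp
git -C source-tmp symbolic-ref HEAD refs/heads/master
echo "inner" > source-tmp/another-file.txt
git -C source-tmp add . && git -C source-tmp commit -q -m "Initial commit"

git init -q dest
git -C dest symbolic-ref HEAD refs/heads/master
echo "The destination repo" > dest/README.txt
git -C dest add . && git -C dest commit -q -m "Add README.txt"

cd dest
git remote add -f source ../source-tmp >/dev/null 2>&1
git merge -s ours --no-commit --allow-unrelated-histories source/master >/dev/null 2>&1

# Read the tree of source/master into the index under folder/,
# and (because of -u) check the files out into the work tree as well.
git read-tree --prefix=folder/ -u source/master

staged=$(git ls-files -- folder)                       # paths now in the index
pending=$(git diff --cached --name-only)               # staged changes vs. HEAD
blob_src=$(git rev-parse source/master:another-file.txt)
blob_idx=$(git rev-parse :folder/another-file.txt)     # blob recorded in the index

echo "staged:  $staged"
echo "pending: $pending"
```

The blob recorded in the index under folder/ is the same object as the one in source/master, so nothing is re-hashed or copied; only the path changes.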
Since we now have everything we want in the index, the only thing that’s left is to commit the changes and delete the temporary repo:
    git commit -m "Merge source into dest under folder"
    git remote rm source
    rm -rf ../source-tmp
And we are done!
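For convenience, the whole procedure can be collected into one script and sanity-checked at the end. This is only a sketch: it runs in a mktemp scratch directory rather than /mem, uses a made-up "demo" identity and shortened file contents, and silences the filter-branch banner of newer Git versions.

```shell
set -eu
work=$(mktemp -d) && cd "$work"   # assumption: scratch dir instead of /mem
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

# 1. The two repos.
git init -q source && cd source
git symbolic-ref HEAD refs/heads/master
mkdir directory
echo "root" > some-file.txt
echo "inner" > directory/another-file.txt
git add . && git commit -q -m "Initial commit"
echo "added line" >> directory/another-file.txt
git commit -q -am "Add a line"
echo "unrelated" >> some-file.txt
git commit -q -am "An unrelated commit"
echo "everything" >> some-file.txt
echo "everything" >> directory/another-file.txt
git commit -q -am "Make huge changes"
cd ..
git init -q dest
git -C dest symbolic-ref HEAD refs/heads/master
echo "The destination repo" > dest/README.txt
git -C dest add . && git -C dest commit -q -m "Add README.txt"

# 2. Keep only directory/ in a temporary clone.
git clone -q source source-tmp
(cd source-tmp &&
    git filter-branch --prune-empty --subdirectory-filter directory -- --all >/dev/null 2>&1)

# 3. Merge the filtered history into dest under folder/.
cd dest
git remote add -f source ../source-tmp >/dev/null 2>&1
git merge -s ours --no-commit --allow-unrelated-histories source/master >/dev/null 2>&1
git read-tree --prefix=folder/ -u source/master
git commit -q -m "Merge source into dest under folder"
git remote rm source
rm -rf ../source-tmp

# 4. Sanity checks.
roots=$(git rev-list --max-parents=0 HEAD | wc -l | tr -d ' ')  # root commits
total=$(git rev-list --count HEAD)                              # all commits
tracked=$(git ls-files)
echo "roots=$roots total=$total"
echo "$tracked"
```

The expected picture is two root commits, five commits in total (one from dest, three from the filtered source, one merge), and no trace of some-file.txt.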
Of course, again, this is Git, so this is definitely not the whole story. Instead of deleting source-tmp, we could pull further changes: under the assumption that source is being worked upon, we could repeat the filter-branch stuff in the future and pull the resulting changes into dest. In case you are afraid that this will mess up the history: no, it won’t. git-filter-branch generates perfectly deterministic commit hashes every time you repeat it (which is not surprising, taking into account what exactly goes into a commit hash). Also, Git has the subtree merge strategy, which is very useful in this case (I admit I haven’t experimented with it) and which apparently does not even require you to specify folder explicitly. Finally, there is the git-subtree command I mentioned. In any case, the above was enough for me, so I decided to share it.
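The determinism claim can be checked with a sketch like the one below: filter a clone, add one more commit to source, filter a fresh clone, and compare. The clone names tmp1 and tmp2, the helper name filter_copy, the "Later work" commit and the "demo" identity are all invented for the example.

```shell
set -eu
work=$(mktemp -d) && cd "$work"   # assumption: scratch dir instead of /mem
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

git init -q source
git -C source symbolic-ref HEAD refs/heads/master
mkdir source/directory
echo "root" > source/some-file.txt
echo "inner" > source/directory/another-file.txt
git -C source add . && git -C source commit -q -m "Initial commit"
echo "added line" >> source/directory/another-file.txt
git -C source commit -q -am "Add a line"

# Hypothetical helper: clone source into $1 and keep only directory/.
filter_copy() {
    git clone -q source "$1"
    (cd "$1" &&
        git filter-branch --prune-empty --subdirectory-filter directory -- --all >/dev/null 2>&1)
}

filter_copy tmp1
first=$(git -C tmp1 rev-parse HEAD)

# Work continues in source...
echo "later work" >> source/directory/another-file.txt
git -C source commit -q -am "Later work"

# ...and the filtering is repeated from scratch in a fresh clone.
filter_copy tmp2
parent_of_second=$(git -C tmp2 rev-parse 'HEAD~1')

echo "tip of first run:         $first"
echo "parent of second run tip: $parent_of_second"
```

If the two IDs agree, the second run has reproduced the old rewritten history exactly and merely added one commit on top, so fetching it into dest would bring in just that new commit.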