Sometimes I want to dig deep into a Git repo’s history and find some information about files that are no longer there, either because they were deleted or because they were renamed. This is easy if I know the name of the file in question, but what if I don’t? I tried to find some information about this on the Internet, and it turned out to be surprisingly difficult. For instance, this StackOverflow answer gives a nice way to obtain the list of all files that were ever added to the repo, but a file can be (or have been) present in the repo without ever being added to it. (How? It’s actually pretty easy: add a file to the repo, then rename it and commit the rename. Git then reports the new name as a rename, not as an addition.) Of course, you can use --diff-filter=AR (or even --diff-filter=D if you really only want to see the deleted files), but the whole thing, with the empty --pretty=format: and sort -u, seems like a kludge.
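For reference, here is roughly what that pipeline does, written as a small Python sketch. It is not the script discussed in this post: the function names are made up for illustration, it assumes git is on your PATH, and it approximates sort -u with a plain sort (so the ordering may differ from what your locale would give).

```python
import subprocess


def unique_paths(log_output: str) -> list[str]:
    """Drop blank lines and duplicates, roughly like `sort -u`."""
    return sorted({line for line in log_output.splitlines() if line.strip()})


def paths_ever_present(repo_dir: str = ".") -> list[str]:
    """Ask Git for every path that was ever added (A) or renamed to (R)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--pretty=format:",
         "--name-only", "--diff-filter=AR"],
        capture_output=True, text=True, check=True,
    )
    return unique_paths(result.stdout)


if __name__ == "__main__":
    # Run from inside a repository; prints one path per line.
    for path in paths_ever_present():
        print(path)
```

The empty --pretty=format: suppresses the commit headers, so only file names (and blank separator lines) are left, which is why the blank lines have to be filtered out afterwards.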
I found a nice Python script here. It looked great, but it didn’t work: it turned out not to be Python 3-compatible. (The Python 2/3 issue is IMHO in the top ten of the most moronic things in programming ever, right next to stuff like null-terminated strings and, yes, left-pad.) I have next to zero knowledge of Python, but I did write a couple of lines in it many years ago, and with the help of the interwebs I was apparently able to fix the issues of that script. If you install the pygit2 library and put the script somewhere in your $PATH, you can just say git ls-all-files (because of how Git implements its commands) and see all files that were ever present in the repo. Hopefully, it won’t break anything (it doesn’t look like it could, but this is IT, so you never know…). If it breaks in some scenario, let me know, and I’ll see if I can cargo-cult-code a fix.
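In case the “how Git implements its commands” bit is unclear: when you type git foo and foo is not a built-in command, Git looks for an executable called git-foo and runs that instead. The sketch below just scans a PATH-like string for such executables. The function name is hypothetical, and it only looks at the directories you give it (Git itself also checks its own exec path first).

```python
import os


def custom_git_subcommands(path_var: str) -> list[str]:
    """Return the <name> part of every executable git-<name> found in path_var."""
    found = set()
    for directory in path_var.split(os.pathsep):
        if not os.path.isdir(directory):
            continue  # skip empty or stale PATH entries
        for entry in os.listdir(directory):
            full = os.path.join(directory, entry)
            if (entry.startswith("git-")
                    and os.path.isfile(full)
                    and os.access(full, os.X_OK)):
                found.add(entry[len("git-"):])
    return sorted(found)


if __name__ == "__main__":
    # Lists every name you could type after `git` thanks to files on your PATH.
    print(custom_git_subcommands(os.environ.get("PATH", "")))
```

If the script is saved as git-ls-all-files and made executable, ls-all-files should show up in that list, and git ls-all-files should work.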