2019-09-30 diff and ignoring lines

One of the best-known command-line tools is the classic diff program. On my system, it is (of course) GNU diff, which is part of the GNU diffutils package.

Recently, I found out that GNU diff has an interesting option, -I (or --ignore-matching-lines). You can give it a regex and it will ignore added or deleted lines if they contain a match for this regex.

This may be useful in many circumstances. Consider, for instance, INI-style files, with sections and assignments, like this:

[section]
variable=setting
another=something

[default]
this=doesnt
make=sense

[third]
one=more

Assume that you have another one, with mostly the same sections but different settings:

[section]
variable=value
another=whatever


[default]
this=does
make=install

[different]
something=else

Of course, in real life, the files could be quite long, and we would like to know if they follow the same structure; in other words, we want to disregard the settings (and blank lines) and only compare the section names. This is quite easy: diff -I '^$' -I '^[^[]' 1.ini 2.ini. (In fact, ignoring blank lines has its own shortcut, -B, so the first -I can be replaced by it.)
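If you want to try this yourself, here is a minimal sketch that recreates the two files and runs the command. The scratch directory and the variable names (workdir, result) are my own choice for illustration, and it assumes that the diff on your PATH is GNU diff.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

cat > "$workdir/1.ini" <<'EOF'
[section]
variable=setting
another=something

[default]
this=doesnt
make=sense

[third]
one=more
EOF

cat > "$workdir/2.ini" <<'EOF'
[section]
variable=value
another=whatever


[default]
this=does
make=install

[different]
something=else
EOF

# Ignore hunks made up only of blank lines and lines not starting with "[".
# diff exits with status 1 when it reports differences, hence the "|| true".
result=$(cd "$workdir" && diff -I '^$' -I '^[^[]' 1.ini 2.ini || true)
printf '%s\n' "$result"
```

Run against the two files above, this should report only the hunk around [third] and [different]; the changed settings in the first two sections and the extra blank line are not shown.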

One caveat (which can be seen from the above example) is that diff only ignores the specified lines if the whole hunk consists of lines matching the regex. Here, the last hunk also contains the differing section names, so the one=more and something=else lines are still shown. This may or may not be what you want, but remember that you can always pipe the result of diff through grep.
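The grep idea could look more or less like this. It is only a sketch: the two shortened files, the variable names and the grep pattern are made up for illustration, and the pattern assumes diff's default output format, where changed lines start with "< " or "> ".

```shell
# Two cut-down INI files, just enough to produce one ignorable hunk
# and one hunk that mixes a section name with a setting.
workdir=$(mktemp -d)
printf '%s\n' '[section]' 'variable=setting' '' '[third]' 'one=more' > "$workdir/1.ini"
printf '%s\n' '[section]' 'variable=value' '' '[different]' 'something=else' > "$workdir/2.ini"

# Keep only the changed lines that are section headers, dropping the
# settings that diff still prints because they share a hunk with them.
sections=$(diff -I '^$' -I '^[^[]' "$workdir/1.ini" "$workdir/2.ini" | grep '^[<>] \[' || true)
printf '%s\n' "$sections"
```

This leaves just the two section lines, one marked with < and one with >. The hunk headers are filtered out too, so you lose the line numbers; whether that matters depends on what you need the result for.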

Consult the manpage of diff to learn more about its options. Various whitespace-ignoring possibilities may be of special interest.
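To give a taste of the whitespace options, here is a small sketch comparing -b (ignore changes in the amount of whitespace) with -w (ignore all whitespace). The file names and contents are invented for the purpose of the example.

```shell
workdir=$(mktemp -d)
printf 'key = value\n'    > "$workdir/a.txt"
printf 'key   =  value\n' > "$workdir/b.txt"   # same tokens, more spaces
printf 'key=value\n'      > "$workdir/c.txt"   # no spaces at all

# -b treats any run of whitespace as equal to any other run...
if diff -b "$workdir/a.txt" "$workdir/b.txt" >/dev/null; then
  b_more_spaces=same
else
  b_more_spaces=different
fi

# ...but it does not treat "some whitespace" as equal to "none".
if diff -b "$workdir/a.txt" "$workdir/c.txt" >/dev/null; then
  b_no_spaces=same
else
  b_no_spaces=different
fi

# -w ignores whitespace altogether.
if diff -w "$workdir/a.txt" "$workdir/c.txt" >/dev/null; then
  w_no_spaces=same
else
  w_no_spaces=different
fi

echo "-b, more spaces: $b_more_spaces"
echo "-b, no spaces:   $b_no_spaces"
echo "-w, no spaces:   $w_no_spaces"
```

The first and the last comparison should come out as "same", and the middle one as "different".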

As a side note, what I really miss is an AST-aware diff. Line-by-line comparison is nice, but often unsuitable for programs, which have an inherent tree structure.

CategoryEnglish, CategoryBlog