Bug report: sub-optimal diff alignment
(1) By Richard Hipp (drh) on 2022-01-20 12:35:14 [source]
A recent check-in on SQLite shows spectacularly bad diff alignment:
The change involves indenting a large block of code by 4 spaces and making a few small changes in the middle. The diff logic fails utterly to capture the simplicity of the change.
In fairness, GitHub doesn't do any better: https://github.com/sqlite/sqlite/commit/d6b44ec32a5709cb56cc6f9d9a80c1a336f1a53c?diff=split
(2.1) By ddevienne on 2022-01-20 13:18:06 edited from 2.0 in reply to 1 [link] [source]
Just toggling ignore whitespace seems to make things OK, no?
(3) By Martijn Coppoolse (vor0nwe) on 2022-01-20 12:54:50 in reply to 1 [link] [source]
It appears to work fine when ignoring whitespace.
Should the diff logic do a first pass ignoring whitespace, to identify significant change blocks?
(4) By Florian Balmer (florian.balmer) on 2022-01-20 13:15:00 in reply to 1 [link] [source]
Given that the effective changes are darkgreen/bold, it looks fine?
(5) By Martin Gagnon (mgagnon) on 2022-01-20 13:26:26 in reply to 4 [link] [source]
The problem is that the diff get miss-aligned at some point.
Like with “ignore whitespace” we see diff on 6 lines total. We see a lot of change other than whitespace change when “ignore whitespace” is off.
(6) By Richard Hipp (drh) on 2022-01-20 13:30:26 in reply to 3 [link] [source]
Yes, your point (and Florian's and ddevienne's) is well taken: Perhaps the alignment algorithm could first do an ignore-whitespace pass to help it figure out where the important changes are, then go back in and fill in the indentation changes.
(7) By Florian Balmer (florian.balmer) on 2022-01-21 05:22:53 in reply to 5 [link] [source]
Ah, now I see, thanks!
I've been using the JS Diff Match and Patch libraries for years. When pasting the contents of the two files modified by the referenced SQLite check-in to the Demo of Diff, the results seem to be very good, and the diff is reduced to the effective changes.
(The output on the demo page is a bit misleading, as it looks like it's ignoring
whitespace, but it's just invisible because the output element lacks
style="white-space:pre;" -- so the results really include whitespace, and
look very similar to what Fossil produces when ignoring whitespace.)
This is all way over my head, but maybe the documentation and implementation of the JS Diff algorithm has valuable inspiration.
(8) By Richard Hipp (drh) on 2022-01-23 21:37:12 in reply to 1 [link] [source]
(9) By Florian Balmer (florian.balmer) on 2022-01-24 06:18:56 in reply to 8 [link] [source]
Impressive, both in terms of the result and the time required!