Lockup during merge
(1) By andygoth on 2020-08-18 02:05:13 [link] [source]
Today I attempted a large merge which included a particular file that caused the merge process to lock up.
The baseline version of the file is 64,064 lines long and 4,384,682 bytes in size.
Both the local and contributor versions of the file are 53,203 lines long and 2,968,292 bytes in size. Yes, they are identical to each other. No, the merge code doesn't optimize for that special case.
GNU diff comparing from baseline to local/contrib. takes 0.44 seconds to run and finds 82 hunks, 53,014 adds, and 63,874 deletes. That's right, the file is almost totally changed.
(2) By andygoth on 2020-08-19 03:59:12 in reply to 1 [link] [source]
Forget about merges. Let's just start with diff performance.
First, make the two test files:
yes a | head -n 64000 > baseline yes a | head -n 1000 > final yes a | head -n 1000 > final yes $'a\nb' | head -n 48000 >> final yes a | head -n 1000 >> final
How big are these files?
$ wc -l * 64000 baseline 50000 final 114000 total
Now compare them with GNU diff:
$ time diff baseline final > /dev/null real 0m0.018s user 0m0.015s sys 0m0.003s
What about Fossil diff? (Thanks you drh for telling me about test-diff.)
$ time fossil test-diff baseline final > /dev/null
Umm, I gave up after five minutes.
(3) By Stephan Beal (stephan) on 2020-08-19 04:09:46 in reply to 2 [link] [source]
$ wc -l ... 114000 total
The curious thing about that is that we get side-by-side diffs of sqlite3.c via the timeline all the time, and that file has 230k lines, so we know it can handle inputs of that size. Your test dataset is apparently a pathological case for fossil's diff algo.
(4) By andygoth on 2020-08-19 13:13:17 in reply to 2 [source]
Nine hours later, it still hasn't completed. I'm killing it.
(5) By andygoth on 2020-08-19 13:53:07 in reply to 3 [link] [source]
The algorithm is right, the implementation is the problem. See https://fossil-scm.org/fossil/info/e2b7dca948da84b7.
(6) By Stephan Beal (stephan) on 2020-08-19 18:09:52 in reply to 5 [link] [source]
The algorithm is right, the implementation is the problem.
Nice catch :). Respect!