Fossil Forum

Lockup during merge
Login

Lockup during merge

Lockup during merge

(1) By andygoth on 2020-08-18 02:05:13 [link] [source]

Today I attempted a large merge which included a particular file that caused the merge process to lock up.

The baseline version of the file is 64,064 lines long and 4,384,682 bytes in size.

Both the local and contributor versions of the file are 53,203 lines long and 2,968,292 bytes in size. Yes, they are identical to each other. No, the merge code doesn't optimize for that special case.

GNU diff comparing from baseline to local/contrib. takes 0.44 seconds to run and finds 82 hunks, 53,014 adds, and 63,874 deletes. That's right, the file is almost totally changed.

(2) By andygoth on 2020-08-19 03:59:12 in reply to 1 [link] [source]

Forget about merges. Let's just start with diff performance.

First, make the two test files:

yes a | head -n 64000 > baseline
yes a | head -n 1000 > final
yes a | head -n 1000 > final
yes $'a\nb' | head -n 48000 >> final
yes a | head -n 1000 >> final

How big are these files?

$ wc -l *
 64000 baseline
 50000 final
114000 total

Now compare them with GNU diff:

$ time diff baseline final > /dev/null
real 0m0.018s
user 0m0.015s
sys  0m0.003s

What about Fossil diff? (Thanks you drh for telling me about test-diff.)

$ time fossil test-diff baseline final > /dev/null

Umm, I gave up after five minutes.

(3) By Stephan Beal (stephan) on 2020-08-19 04:09:46 in reply to 2 [link] [source]

$ wc -l ... 114000 total

The curious thing about that is that we get side-by-side diffs of sqlite3.c via the timeline all the time, and that file has 230k lines, so we know it can handle inputs of that size. Your test dataset is apparently a pathological case for fossil's diff algo.

(4) By andygoth on 2020-08-19 13:13:17 in reply to 2 [source]

Nine hours later, it still hasn't completed. I'm killing it.

(5) By andygoth on 2020-08-19 13:53:07 in reply to 3 [link] [source]

The algorithm is right, the implementation is the problem. See https://fossil-scm.org/fossil/info/e2b7dca948da84b7.

(6) By Stephan Beal (stephan) on 2020-08-19 18:09:52 in reply to 5 [link] [source]

The algorithm is right, the implementation is the problem.

Nice catch :). Respect!