More "infinite loop in DELTA table" "Aborted"
(1.1) By John Rouillard (rouilj) on 2021-09-21 17:02:39 edited from 1.0 [source]
I have a fossil server running:
This is fossil version 2.17 [5d9a7442fa] 2021-09-18 21:25:37 UTC
It hosts a fossil repo cloned from https://www.fossil-scm.org/home.
I cloned that repo to a Raspberry Pi using a tarball from the tip of the trunk. The version is:
This is fossil version 2.17 [898b8f2082] 2021-09-21 09:47:24 UTC
I can use 898b8f2082 to clone https://www.fossil-scm.org/home without an issue. I can open it and get the expected files.
If I clone from my copy hosted using 5d9a7442fa, it seems to clone ok, but trying to open it fails:
fossil open ~/.museum/fossil1.fossil
Autosync: https://host.name/fossil/fossil/home
Round-trips: 1 Artifacts sent: 0 received: 174
infinite loop in DELTA table
Aborted
I have rebuilt the fossil1 repo client side without any change. I have rebuilt the fossil repo on my server, deleted fossil1.fossil and recloned. Still fails to open.
Running fossil dbstat --db-verify -R ... on the repo cloned from fossil-scm.org shows no errors. Running it on the repo on my server reports no issues other than phantoms:
53610 non-phantom blobs (out of 53691 total) checked: 0 errors
However on the repo cloned from my server I see:
Full repository verification follows:
wrong hash on artifact 2
wrong hash on artifact 3
wrong hash on artifact 4
wrong hash on artifact 5
wrong hash on artifact 6
wrong hash on artifact 7
...
wrong hash on artifact 94
skip phantom 95 06decb89c665feb897540613d7d12527eb65dfe5
wrong hash on artifact 96
...
It looks like every artifact is reported as having the wrong hash.
Any idea on how I can fix this?
Thanks. -- rouilj
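For a long verification run like the one quoted in this post, it can help to count the kinds of lines instead of reading them one by one. The following is a minimal sketch in Python, assuming the output has been captured to a string and that its lines look like the "wrong hash on artifact N" and "skip phantom N HASH" lines quoted above. The function name `summarize_verify_output` is hypothetical and is not part of Fossil.

```python
import re
from collections import Counter


def summarize_verify_output(text: str) -> Counter:
    """Count 'wrong hash' and 'skip phantom' lines in captured output.

    The line formats are taken from the excerpt quoted in the post;
    any other line is ignored.
    """
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if re.fullmatch(r"wrong hash on artifact \d+", line):
            counts["wrong_hash"] += 1
        elif re.fullmatch(r"skip phantom \d+ [0-9a-f]+", line):
            counts["phantom"] += 1
    return counts


# A few lines copied from the post, used here as sample input.
sample = """\
wrong hash on artifact 2
wrong hash on artifact 3
skip phantom 95 06decb89c665feb897540613d7d12527eb65dfe5
wrong hash on artifact 96
"""

summary = summarize_verify_output(sample)
print("wrong hash lines:", summary["wrong_hash"])
print("phantom lines:", summary["phantom"])
```

Run against the full output, the two counters show at a glance whether literally every artifact fails or only a subset.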
(2) By Warren Young (wyoung) on 2021-09-21 17:21:04 in reply to 1.1 [link] [source]
What does a fresh clone-and-open operation do on each system?
$ cd /tmp
$ fossil clone https://fossil-scm.org/home
(3) By John Rouillard (rouilj) on 2021-09-21 21:41:40 in reply to 2 [link] [source]
Hi Warren:
Is this what you wanted?
time fossil clone https://fossil-scm.org/home
Round-trips: 8 Artifacts sent: 0 received: 53794
Clone done, wire bytes sent: 2129 received: 38011454 ip: 45.33.6.223
Rebuilding repository meta-data...
100.1% complete...
Extra delta compression...
Vacuuming the database...
project-id: CE59BB9F186226D80E49D1FA2DB29F935CCA0333
server-id: b5c9efc01b735e5f0bd3a67807ea7701a86b908c
admin-user: pi (password is "H5RiQFP7T9")
opening the new ./home.fossil repository in directory ./home...
Autosync: https://fossil-scm.org/home
Round-trips: 1 Artifacts sent: 0 received: 0
Pull done, wire bytes sent: 3294 received: 5074 ip: 45.33.6.223
.dockerignore
.editorconfig
.fossil-settings/binary-glob
.fossil-settings/clean-glob
.fossil-settings/crlf-glob
...
www/whyusefossil.wiki
www/wikitheory.wiki
www/xkcd-git.gif
project-name: Fossil
repository: /tmp/f/home.fossil
local-root: /tmp/f/home/
config-db: /home/pi/.config/fossil.db
project-code: CE59BB9F186226D80E49D1FA2DB29F935CCA0333
checkout: 60206ef512f1ef5a2f691298abb0fb1c00f242d6 2021-09-21 20:07:13 UTC
parent: 3524f72e6b3b44884d08449322abd812169c0983 2021-09-21 19:45:58 UTC
tags: trunk
comment: Internal cleanups of how /chat config area is built up. No
significant visible changes. (user: stephan)
check-ins: 16230
real 22m9.596s
user 16m2.090s
sys 3m55.740s
Clone from fossil-scm is fine. But...
$ time fossil clone https://host.name/fossil/fossil/home
Round-trips: 9 Artifacts sent: 0 received: 8360
Clone done, wire bytes sent: 3361 received: 59930584 ip: 172.25.1.10
Rebuilding repository meta-data...
100.0% complete...
Extra delta compression...
Vacuuming the database...
project-id: CE59BB9F186226D80E49D1FA2DB29F935CCA0333
server-id: 1aea26f7abd98ef4006b5c102adb1b89043c6fd0
admin-user: pi (password is "pGd5tHYLr6")
opening the new ./home.fossil repository in directory ./home...
Autosync: https://host.name/fossil/fossil/home
Round-trips: 1 Artifacts sent: 0 received: 174
infinite loop in DELTA table
Aborted
real 3m39.611s
user 2m31.561s
sys 0m31.556s
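The two clone transcripts in this post can be compared numerically. The sketch below, in Python, pulls the "received" figures out of the two summary lines and computes the average wire bytes per artifact. The helper name `clone_stats` is hypothetical, and the only input is text copied from the post. The arithmetic shows that the failing clone moved more bytes for far fewer artifacts, but it does not by itself explain why.

```python
import re
from typing import Tuple


def clone_stats(output: str) -> Tuple[int, int]:
    """Return (artifacts_received, wire_bytes_received) from clone output.

    Only the first 'Artifacts ... received' and the first
    'wire bytes ... received' figures in the text are used.
    """
    artifacts = re.search(r"Artifacts sent: \d+\s+received: (\d+)", output)
    wire = re.search(r"wire bytes sent: \d+\s+received: (\d+)", output)
    if artifacts is None or wire is None:
        raise ValueError("clone summary lines not found")
    return int(artifacts.group(1)), int(wire.group(1))


# Summary lines copied from the two transcripts above.
upstream = (
    "Round-trips: 8 Artifacts sent: 0 received: 53794\n"
    "Clone done, wire bytes sent: 2129 received: 38011454 ip: 45.33.6.223\n"
)
local = (
    "Round-trips: 9 Artifacts sent: 0 received: 8360\n"
    "Clone done, wire bytes sent: 3361 received: 59930584 ip: 172.25.1.10\n"
)

up_artifacts, up_bytes = clone_stats(upstream)
lo_artifacts, lo_bytes = clone_stats(local)

up_ratio = up_bytes / up_artifacts
lo_ratio = lo_bytes / lo_artifacts
print(f"upstream: {up_artifacts} artifacts, {up_ratio:.1f} bytes each")
print(f"local:    {lo_artifacts} artifacts, {lo_ratio:.1f} bytes each")
print(f"local clone received {lo_artifacts / up_artifacts:.1%} of the artifacts")
```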
(5) By Stephan Beal (stephan) on 2021-09-22 02:58:41 in reply to 3 [link] [source]
time fossil clone https://fossil-scm.org/home
real 22m9.596s
...
$ time fossil clone https://host.name/fossil/fossil/home
real 3m39.611s
Why the huge time difference there? A (clone + rebuild) time of less than 4 minutes on such a device seems suspiciously fast.
One of my pi zeroes takes just over 11 minutes to clone and rebuild on an external SSD drive over USB2, and a rebuild alone takes almost 8.5 minutes. My pi4 on a USB3 SSD takes a bit more than 2 minutes for a rebuild.
(4.1) By Stephan Beal (stephan) on 2021-09-22 02:00:27 edited from 4.0 in reply to 1.1 [link] [source]
I have rebuilt the fossil1 repo client side without any change. I have rebuilt the fossil repo on my server, deleted fossil1.fossil and recloned. Still fails to open.
Out of curiosity, did you do a full clean rebuild or a fossil update then run make? A small handful of times (not recently) i've seen the latter approach lead to mismatched object files, resulting in binary incompatibilities which lead to weird errors, e.g. running command X actually runs command Y. If you have not done a clean rebuild, please try that.
What model of pi are you on? i've got a couple of pi4's and several pi Zeroes here on which i regularly build fossil. i'll try out 5d9a7442fa on one of the zeroes in a bit (but those take aaaaagggggges to build fossil).
Please also try:
fossil test-integrity -R the-broken-clone
This is a huge mystery at this point and i'm just shooting in the dark. The infinite delta loop case is one of those which falls into the category of "cannot happen," so my very vague, largely unwarranted suspicion, is that your SD card (it is an SD, i presume?) is at the core of the problem. If your pi has enough memory, it "would be interesting" to repeat the clone into a ramdisk or /tmp, rather than on the SD, and see what happens. Edit: Warren already ruled that out for you.
(6) By John Rouillard (rouilj) on 2021-09-22 03:48:53 in reply to 4.1 [link] [source]
I was bootstrapping on the pi 2B+ since the precompiled copy on the download page doesn't work.
So I downloaded the tip tarball, unpack, configure, make/compile. I'll make clean and reconfig/remake.
The server fossil was an update/compile. I'll try a clean rebuild there too, as cloning to a windows laptop using "This is fossil version 2.17 [701c6dc284] 2021-09-08 17:23:37 UTC" fails the same way. db-verify also crashes, so it points to a server issue. I can however fossil pull an already cloned fossil repo, get updates, and dbstat --db-verify reports no errors.
So it just seems to be a new clone that is the problem.
My rebuild on the server just finished and I spun it up. Cloning on the windows box is now working. dbstat --db-verify completed without reporting an error. However, it threw a segfault:
53334 non-phantom blobs (out of 53415 total) checked: 0 errors
low-level database integrity-check: ok
Exception: STATUS_ACCESS_VIOLATION at rip=00100519629
rax=0000000000000000 rbx=00000000FFFFFFFF rcx=0000000800065910
rdx=00000000FFFFCAB8 rsi=00000000FFFFCAB8 rdi=00000000FFFFCAB4
r8 =00000000FFFFCAB4 r9 =0000000000000011 r10=0000000100000000
r11=0000000800067400 r12=0000000000000011 r13=00000000FFFFCAD0
r14=0000000100762EDD r15=0000000000000000
rbp=00000000FFFFCAB4 rsp=00000000FFFFC9E0
program=C:\cygwin64\usr\local\bin\fossil.exe, pid 1137, thread main
cs=0033 ds=002B es=002B fs=0053 gs=002B ss=002B
Stack trace:
Frame Function Args
000FFFFCAB4 00100519629 (00100762EDD, 0010084FBE0, 001008500C8, 001FFFFFFFF)
000FFFFCAB4 00100645EA1 (00000000000, 00000000001, 001008500C8, 000FFFFCCE0)
000FFFFCCE0 001006462A9 (00000000000, 00000000000, 00000003000, 00000000001)
000FFFFCCE0 001005B0CDB (00000000000, 000FFFFCCE0, 000FFFFCC70, 000FFFFCDF0)
000FFFFCCE0 001006723BE (00180363408, 000FFFFCDF0, 00000000030, 8080808080808080)
000FFFFCCE0 00180049EFD (00000000000, 00000000000, 00000000000, 00000000000)
000FFFFFFF0 00180047856 (00000000000, 00000000000, 00000000000, 00000000000)
000FFFFFFF0 00180047904 (00000000000, 00000000000, 00000000000, 00000000000)
End of stack trace
Segmentation fault (core dumped)
I'll try doing a new clean build on the windows box.... sigh.
Thanks for pointing to the build. Bummer that make isn't compiling things right.
(7) By Stephan Beal (stephan) on 2021-09-22 04:08:55 in reply to 6 [link] [source]
Thanks for pointing to the build. Bummer that make isn't compiling things right.
i don't know that that's the problem. It's been ages, perhaps years, since i've seen problems related to that in this tree, but have seen them before and they can lead to truly odd results. When in doubt, "make clean; make" is a safe guideline to follow when an oft-rebuilt binary seems to be doing completely inexplicable things.
As far as the cygwin stuff goes, i'm at a loss - that's far outside of my element.
(8) By John Rouillard (rouilj) on 2021-09-22 04:29:53 in reply to 6 [link] [source]
Ok still waiting for fossil to rebuild on the pi, but both the windows system and linux boxes are running 4888719588 and getting:
rouilj@uland:/tmp/z$ fossil clone https://host.name/fossil/fossil
Round-trips: 9 Artifacts sent: 0 received: 8360
Clone done, wire bytes sent: 3318 received: 59930575 ip: 172.25.1.10
Rebuilding repository meta-data...
100.0% complete...
Extra delta compression...
Vacuuming the database...
project-id: CE59BB9F186226D80E49D1FA2DB29F935CCA0333
server-id: 8eee686244d4056ab8d4d2cbfd051c87994c8b48
admin-user: rouilj (password is "S6PhzXRkoB")
opening the new ./fossil.fossil repository in directory ./fossil...
Autosync: https://host.name/fossil/fossil
Round-trips: 1 Artifacts sent: 0 received: 174
infinite loop in DELTA table
Aborted (core dumped)
SQLITE_NOTICE(539): recovered 192 pages from /tmp/z/fossil.fossil-journal
SQL: SELECT value FROM config WHERE name=$n
this is the same for both windows and linux (with different passwords and server-id obviously). I am no longer getting a segfault on windows, so that's something.
If you want to try cloning from my server let me know and we can figure out a way to get the url to you.
(9) By Stephan Beal (stephan) on 2021-09-22 04:35:49 in reply to 8 [link] [source]
If you want to try cloning from my server let me know and we can figure out a way to get the url to you.
That would be interesting, yes: sgbeal@googlemail.com
(12) By Stephan Beal (stephan) on 2021-09-22 05:23:16 in reply to 8 [link] [source]
Okay, this is weird:
Identical results on a pi4, pi0, and an x86/64 linux box except that the timings varied wildly:
Round-trips: 9 Artifacts sent: 0 received: 8360
Clone done, wire bytes sent: 3336 received: 59930575 ip: <SNIP>
Rebuilding repository meta-data...
100.0% complete...
Extra delta compression...
Vacuuming the database...
project-id: CE59BB9F186226D80E49D1FA2DB29F935CCA0333
server-id: b3aeefd865258b242f4cec45eb28ac8443837cac
admin-user: pi (password is "yG3QLzrtnP")
opening the new ./fossil.fossil repository in directory ./fossil...
Autosync: <SNIP>
Round-trips: 1 Artifacts sent: 0 received: 174
infinite loop in DELTA table
Aborted
...
(The final 20-ish artifacts took 95%+ of the time, which is curious in and of itself.)
But... that "artifacts received" value is way too low. There's "something horribly wrong" with that copy. If it's being hosted on an SD/microSD card then i'm 100% with Warren: "Pi corrupts data. News at 11."
(Sidebar: i build and use fossil regularly on pi units, and one of them runs my nightly fossil remote repo backups, but use it almost exclusively from external USB drives, not an SD.)
The current artifact value from the main repo is:
Round-trips: 8 Artifacts sent: 0 received: 53796
(13) By John Rouillard (rouilj) on 2021-09-22 13:23:57 in reply to 12 [link] [source]
Hi Stephan:
The server is a spinning rust disk on an x86 box (Warren note). Only the client is a pi. Maybe the fossil repo on the server is corrupted in some odd way that db-verify can't find?
I can successfully clone another repo on that server (Stephan, try roundup_sysadmin as the last element of the path rather than fossil).
So this is what I have currently (all https use reverse proxy):
```
server linux spinning rust x86 - using version 4888719588 (server and client)
  clone fossil (https) fails with DELTA error
  clone fossil (http) fails with DELTA error no proxy (new info)
  clone fossil using file:/// works (new info)
  clone other repo works

client windows ssd - using 4888719588
  clone fossil (https) fails with DELTA error
  clone (https) other repo works

client pi 2b+, raspian (flashed Tuesday) with 32 GB sd card - using 898b8f2082
  clone fossil (https) fails with DELTA error
  clone (https) other repo works
```
so looks like a fossil repo issue on the server. However, running:
fossil dbstat --db-verify -R ~/.museum/fossil.fossil
reports no errors. On the server running: fossil clone file:///home/rouilj/.museum/fossil.fossil
works. So something in the sync protocol??
I run the fossil server behind a reverse proxy (hiawatha web server) this has worked for literally years without issues.
Hmm this is interesting. Hiawatha has mitigation techniques for common issues: SQL injection, XSS, CSRF, etc. Looks like the SQL injection detection is being triggered when cloning the fossil repo for some reason. But that shouldn't cause a problem as it's not mitigating it, just detecting. Let me try cloning from the fossil server itself without hiawatha in the mix.
Still fails:
rouilj@uland:/tmp/z$ fossil clone http://localhost:8082/fossil
Round-trips: 9 Artifacts sent: 0 received: 8360
Clone done, wire bytes sent: 3179 received: 59930746 ip: 127.0.0.1
Rebuilding repository meta-data...
100.0% complete...
Extra delta compression...
Vacuuming the database...
project-id: CE59BB9F186226D80E49D1FA2DB29F935CCA0333
server-id: beea7b62ec8c0d7372153da892d2c64d748b33ff
admin-user: rouilj (password is "F8kHJLBx3Y")
opening the new ./fossil.fossil repository in directory ./fossil...
Autosync: http://localhost:8082/fossil
Round-trips: 1 Artifacts sent: 0 received: 174
infinite loop in DELTA table
Aborted (core dumped)
SQLITE_NOTICE(539): recovered 192 pages from /tmp/z/fossil.fossil-journal
SQL: SELECT value FROM config WHERE name=$n
Suggestions? I can send the fossil repo for dissection/necropsy if that would help.
(14) By Stephan Beal (stephan) on 2021-09-22 13:40:35 in reply to 13 [link] [source]
try roundup_sysadmin as the last element of the path rather than fossil.
That works fine for me.
i just tried re-cloning the failed fossil clone i pulled from you:
[pi@pi4b8:~/fossil/jpr]$ f server fossil.fossil --localauth
Listening for HTTP requests on TCP port 8080
...
[pi@pi4b8:~/tmp]$ f clone http://localhost:8080 x.f
Round-trips: 2 Artifacts sent: 0 received: 8336
Clone done, wire bytes sent: 520 received: 5003770 ip: 127.0.0.1
Rebuilding repository meta-data...
100.0% complete...
Extra delta compression...
Vacuuming the database...
project-id: CE59BB9F186226D80E49D1FA2DB29F935CCA0333
server-id: 9f4cdb1ae72f79f524e272fd494d5c7c153a3d18
admin-user: pi (password is "xGCfYcFAVe")
That works but there's no way it's a complete repo: it's missing more than 40k artifacts.
Trying to open that repo results in it opening only the initial empty checkin:
[pi@pi4b8:~/tmp]$ mkdir x
[pi@pi4b8:~/tmp]$ cd x
[pi@pi4b8:~/tmp/x]$ f open ../x.f
Autosync: http://localhost:8080
Round-trips: 3 Artifacts sent: 0 received: 1
Pull done, wire bytes sent: 14005 received: 8635 ip: 127.0.0.1
project-name: Fossil
repository: /home/pi/tmp/x.f
local-root: /home/pi/tmp/x/
config-db: /home/pi/.config/fossil.db
project-code: CE59BB9F186226D80E49D1FA2DB29F935CCA0333
checkout: a28c83647dfa805f05f3204a7e146eb1f0d90505 2007-07-21 14:09:59 UTC
tags: trunk
comment: initial empty baseline (user: drh)
check-ins: 1
What does this query say on the server:
fossil sql "select count(*) from blob" -R the-repo.fossil
If that number is only 8000-odd then that copy is broken. If it's 50k-odd then you may well have uncovered a bug in the sync protocol.
What happens if you try to fossil pull on your copy of the ostensibly broken repo, pulling from the original one at fossil-scm.org?
Just spitballing here - i have no clue what the problem is.
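The count query suggested in this post can also be run from a script. The sketch below uses Python's standard `sqlite3` module against a small in-memory stand-in table, so that it runs anywhere. The two-column `blob` table here is a simplification made up for the example and is not Fossil's real schema, and `blob_count` is a hypothetical helper name.

```python
import sqlite3


def blob_count(conn: sqlite3.Connection) -> int:
    """Run the count query from the post on an open connection."""
    (count,) = conn.execute("SELECT count(*) FROM blob").fetchone()
    return count


# In-memory stand-in for a repository; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT)")
conn.executemany(
    "INSERT INTO blob(uuid) VALUES (?)",
    [(f"{i:040x}",) for i in range(5)],
)

print("blob rows:", blob_count(conn))
```

To point the same helper at a repository file, a read-only connection such as `sqlite3.connect("file:the-repo.fossil?mode=ro", uri=True)` should work, on the assumption that the file is opened as an ordinary SQLite database and that nothing else is writing to it at the time.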
(16) By John Rouillard (rouilj) on 2021-09-22 13:56:28 in reply to 14 [link] [source]
i just tried re-cloning the failed fossil clone i pulled from you:
Ugh. I renamed the fossil repo to run Warren's testcase. I have put the fossil repo back. Try recloning again using .../fossil/fossil. It should fail/be incomplete.
fossil sql "select count(*) from blob" -R fossil.fossil
53771
so right size....
When I open the repo:
```
fossil open ~/.museum/fossil.fossil
.dockerignore
.editorconfig
[file list elided]
www/whyusefossil.wiki
www/wikitheory.wiki
www/xkcd-git.gif
project-name: Fossil
repository: /home/rouilj/.museum/fossil.fossil
local-root: /tmp/zz/
config-db: /home/rouilj/.fossil
project-code: CE59BB9F186226D80E49D1FA2DB29F935CCA0333
checkout: 48887195881200aea08df9050d467c3f9498cfc9 2021-09-22 03:04:05 UTC
parent: 60206ef512f1ef5a2f691298abb0fb1c00f242d6 2021-09-21 20:07:13 UTC
tags: trunk
comment: Diff context loading: replaced string.replaceAll() with a more portable construct, as reported in [forum:c1f198f6993cd603 | forum post c1f198f6993cd603]. (user: stephan)
check-ins: 16234
```
that seems correct as well (haven't pulled into it since yesterday).
Pulling updates:
```
$ fossil pull -R ~/.museum/fossil.fossil
Pull from https://www.fossil-scm.org/home
Round-trips: 3 Artifacts sent: 0 received: 11
Pull done, wire bytes sent: 1371 received: 9601 ip: 45.33.6.223
```
$ fossil time -R ~/.museum/fossil.fossil
=== 2021-09-22 ===
12:22:07 [62deb8f794] Micro-adjustments to /chat CSS to squeeze a tiny bit more
space from the bottom of the screen. (user: stephan tags: trunk)
11:15:21 [593d3a3a1e] Simplified and consolidated how /chat internally manages
its 3 separate main views, with an eye towards making it easy to add
additional views. No user-visible changes. (user: stephan tags: trunk)
08:46:07 [9c38d83547] Edit [15d58775a75f0946|15d58775a7]: Edit check-in
comment. (user: stephan)
so seems to be working.
(17) By Stephan Beal (stephan) on 2021-09-22 14:13:44 in reply to 16 [link] [source]
Try recloning again using .../fossil/fossil. It should fail/be incomplete.
Right. Fails at the 8360 mark, as before.
... so seems to be working.
Indeed.
Would it be possible to get a "raw" copy of that repo from you, as opposed to a clone of it? If we can reproduce that problem from a copy of it then it's just a matter of time before it gets narrowed down (with the caveat that we have only one person who's familiar enough with the sync code to be likely to spot whatever weirdness is being triggered (and that person ain't me)).
If you could put a copy of the repo where i can nab it over http, that would be ideal. Alternately, stick it on your cloud storage of choice and send a link here or via email.
i'm about to be away for probably the rest of the day (CET), but would love to have a copy to try to reproduce this with tomorrow.
(18) By John Rouillard (rouilj) on 2021-09-22 17:20:01 in reply to 17 [link] [source]
If you could put a copy of the repo where i can nab it over http, that would be ideal.
Links (compressed, uncompressed, sha1sums) sent via email.
(19) By Stephan Beal (stephan) on 2021-09-23 07:26:11 in reply to 18 [link] [source]
Links (compressed, uncompressed, sha1sums) sent via email.
Bug reproduced and Richard has been pinged with a reproducible case, in the hope that his intimate knowledge of the relevant internals will enable him to quickly spot the problem.
(10) By John Rouillard (rouilj) on 2021-09-22 04:35:52 in reply to 4.1 [link] [source]
Also fossil test-integrity -R the-broken-clone reports bad hash starting at artifact 2. Final report is
8208 non-phantom blobs (out of 8382 total) checked: 5117 errors
low-level database integrity-check: ok
Running the same on the source repo reports:
53690 non-phantom blobs (out of 53771 total) checked: 0 errors
low-level database integrity-check: ok
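The two summary lines in this post are easier to compare once the numbers are extracted. Below is a minimal Python sketch that parses the "N non-phantom blobs (out of M total) checked: K errors" line and reports the error rate and the number of blobs that were not checked. The names `IntegritySummary` and `parse_summary` are hypothetical, and the line format is assumed from the two examples quoted here.

```python
import re
from typing import NamedTuple


class IntegritySummary(NamedTuple):
    checked: int  # non-phantom blobs that were verified
    total: int    # all blob rows
    errors: int   # blobs reported with a bad hash

    @property
    def unchecked(self) -> int:
        """Blobs counted in the total but not verified."""
        return self.total - self.checked


def parse_summary(line: str) -> IntegritySummary:
    """Parse one summary line in the format quoted in the post."""
    m = re.search(
        r"(\d+) non-phantom blobs \(out of (\d+) total\) checked: (\d+) errors",
        line,
    )
    if m is None:
        raise ValueError("not a summary line")
    return IntegritySummary(*(int(g) for g in m.groups()))


broken = parse_summary(
    "8208 non-phantom blobs (out of 8382 total) checked: 5117 errors"
)
source = parse_summary(
    "53690 non-phantom blobs (out of 53771 total) checked: 0 errors"
)

for name, s in (("broken clone", broken), ("source repo", source)):
    rate = s.errors / s.checked
    print(f"{name}: {s.errors} errors ({rate:.1%}), {s.unchecked} unchecked")
```

For the broken clone the unchecked count works out to 174, the same number as the "received: 174" in the failing autosync step. The thread does not establish whether the two figures are related.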
(11) By Warren Young (wyoung) on 2021-09-22 05:02:06 in reply to 10 [link] [source]
And if you replace your internal server's broken clone with the one you just pulled for my test, do these problems go away?
I'm expecting the news report to be "Raspberry Pi corrupts data. Film at 11."
(15.1) By John Rouillard (rouilj) on 2021-09-22 13:57:08 edited from 15.0 in reply to 11 [link] [source]
Hi Warren:
If I do a fossil clone https://myserver/fossil/fossil with the fresh clone of fossil-scm.org as the repo, the clone works.
Note the server is not the pi. But it does seem to indicate some sort of undetected corruption/network sync issue with the original fossil repo on spinning disk on my linux box.
(20) By Richard Hipp (drh) on 2021-09-23 18:19:08 in reply to 1.1 [link] [source]
Please rebuild the client-side Fossil using check-in ea5afad31f478396, or later, redo the clone, and let us know if that clears your problem.
The ea5afad31f478396 checkin tries to resolve a problem in which many unversioned file transfers deceive the clone logic into stopping early, leaving some artifacts untransferred. The untransferred artifacts are used as baselines for other delta artifacts, thus rendering the repository unusable.
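The effect described here, where missing baselines make the artifacts built on them unusable, can be illustrated with a toy model. The Python sketch below is not Fossil's code. The `delta` mapping, the `stored` set, and the function `resolve_chain` are hypothetical names invented for the example. It shows why an artifact stored as a delta cannot be reconstructed if any baseline in its chain was never transferred, and it includes a separate check for a chain that loops back on itself. It does not attempt to reproduce how Fossil arrives at its "infinite loop in DELTA table" message.

```python
from typing import Dict, Set


def resolve_chain(rid: int, delta: Dict[int, int], stored: Set[int]) -> str:
    """Follow delta -> baseline links starting at rid.

    delta maps an artifact id to the id of the baseline it is a delta of.
    stored is the set of artifact ids whose content is actually present.
    Returns "ok", "missing baseline", or "loop".
    """
    seen = set()
    while True:
        if rid not in stored:
            return "missing baseline"
        if rid not in delta:
            return "ok"  # reached an artifact stored as full text
        if rid in seen:
            return "loop"
        seen.add(rid)
        rid = delta[rid]


# Artifact 3 is a delta of 2, which is a delta of 1 (stored in full).
delta = {3: 2, 2: 1}

complete = {1, 2, 3}  # everything was transferred
truncated = {2, 3}    # the clone stopped before artifact 1 arrived

print(resolve_chain(3, delta, complete))
print(resolve_chain(3, delta, truncated))
print(resolve_chain(4, {4: 5, 5: 4}, {4, 5}))
```

In the truncated case both artifacts 2 and 3 become unusable even though their own content arrived, because each one depends on artifact 1.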
(21) By John Rouillard (rouilj) on 2021-09-23 19:21:31 in reply to 20 [link] [source]
Hi Richard:
Rebuilt fossil on my pi using ea5afad as a patch on the tip of the trunk version I got Tuesday.
Round-trips: 17 Artifacts sent: 0 received: 53909
looks much better. The pi is still rebuilding meta-data (40%). I updated to the current trunk on my windows pc. That has finished cloning and checked out the fossil source. So I claim this is fixed.
I noticed all the unversioned files syncing when I ran clone -v during debugging. In retrospect, the transfer stopped after unversioned files were transferred. So more confirmation that you have the right diagnosis.
Why was my repo unable to be cloned while its parent fossil-scm.org was cloneable? What was special that caused a run of unversioned artifacts and killed the clone?
(22) By Richard Hipp (drh) on 2021-09-23 19:49:41 in reply to 21 [link] [source]
The original clone aborted early due to an incorrect determination that it had "finished" when it really had not. The incorrect determination was caused in part by the large number of unversioned files that were part of the clone.
(23) By John Rouillard (rouilj) on 2021-09-23 21:45:12 in reply to 22 [link] [source]
I understand that, what I don't understand is how the fossil repo got into that state. Why isn't fossil-scm.org having the same issue? All the uv files are from there. Why did my clone of the fossil repo differ from the repo it was cloned from?
Did I do something that caused the weird cloning issue?
(24) By Richard Hipp (drh) on 2021-09-23 22:16:40 in reply to 23 [link] [source]
The problem only comes up in a new database that is freshly cloned. The fossil repo at https://fossil-scm.org/home has not been cloned. It is the source of truth, not a clone of that source.
(25) By John Rouillard (rouilj) on 2021-09-23 22:21:38 in reply to 24 [link] [source]
So it would only show up in a clone of a clone? That's why I was able to clone fossil-scm.org but not my clone of fossil-scm.org?
How weird, I have multiple clones of that repo (from 4-5 years ago) and never had an issue. The pi was a new setup (had to wipe and reinstall), so this was the first time I had cloned my fossil repo in probably 2 years.
Anyway thanks for the fix. Have a great day.