Fossil Forum

getting error - there are unresolved deltas - the clone is probably incomplete and unusable

(1) By abhijit (abhijitnandy) on 2021-12-17 12:02:32 [source]

My fossil version on my server is -

2.16 [7aedd56758]

On my local machine, my fossil version is -

2.17 [f48180f2ff]

While cloning, I get the error -

there are unresolved deltas - the clone is probably incomplete and unusable

and the repository file gets removed.

Are there any breaking changes between these two versions? Should I update fossil on the server?

(2) By Richard Hipp (drh) on 2021-12-17 12:51:54 in reply to 1 [link] [source]

There are no breaking changes between those two versions, at least none that anybody knows about. I suspect that something else is going wrong.

Did you try the clone again? Could this be caused by an anomaly on the network?

What happens if you run "fossil test-integrity" on the server side?

(3.1) By abhijit (abhijitnandy) on 2021-12-17 14:21:05 edited from 3.0 in reply to 2 [link] [source]

I see some errors -

> fossil test-integrity repository.fossil
skip phantom 2818 bbab12b3d80963c2e4eadc81482191359c2d9c11
wrong hash on artifact 2819
skip phantom 2820 5988f42669fb887e20af3955b6dc76ec34eb7473
wrong hash on artifact 2821
skip phantom 2822 76c0ed28839168a39e75e7ab0df88d9c8d580b67
wrong hash on artifact 2823
wrong hash on artifact 2824
skip phantom 2825 c6552ae3f7d0a1b43e3463a96e30d4314ac2cef3
wrong hash on artifact 2826
wrong hash on artifact 2827
skip phantom 2828 154fc0c71f877d7ee6c1c4b7d2107e997d22c494
skip phantom 2830 de1efd4ae1986c4fe6273f9da9f98c0e9c07fa1d
3162 non-phantom blobs (out of 3168 total) checked:  6 errors
low-level database integrity-check: ok

How do I resolve these?

And what could have caused these?

(4) By Stephan Beal (stephan) on 2021-12-17 14:20:57 in reply to 3.0 [link] [source]

> How do I resolve these?

The phantoms are generally harmless: those are references to blobs which your repo has seen a hash for somewhere but for which it does not have content.
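
If you are curious which blobs those are, here is a minimal way to list them (a sketch, assuming the usual repository schema, where a phantom is a BLOB row recorded with a negative size):

$ echo "SELECT rid, uuid FROM blob WHERE size < 0;" | fossil sql -R repository.fossil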

Wrong hashes are invariably caused by data corruption...

> And what could have caused these?

... from the storage media.

When importing files, fossil reads them back from the database to ensure that it can read, with 100% fidelity, what it's about to commit to the database. Thus "wrong hashes" are essentially impossible to introduce from fossil itself because such an error would be caught during the import and the database transaction would be rolled back, ensuring the db integrity remains solid.

Hash errors are, insofar as we've ever witnessed, only ever caused by an error in the storage itself. i.e. hard drive/SD/etc. failure.

Restoration from a backup (or another clone) is the only solution there.

(5) By abhijit (abhijitnandy) on 2021-12-17 14:44:41 in reply to 4 [link] [source]

Ok.

Also, I think I know what happened here.

Some time back a new team member tried to check in a huge binary, which was taking forever, so they interrupted it. I think that resulted in these errors.

The question that remains, though, is this: if I use fossil version 2.16 (locally), I don't get any issues cloning this repository. Only if I use fossil version 2.17 do I get the error -

there are unresolved deltas - the clone is probably incomplete and unusable

If these phantoms are truly harmless, how come having fossil version 2.17 (locally) is not bypassing this?

(6) By Stephan Beal (stephan) on 2021-12-17 15:01:56 in reply to 5 [link] [source]

> Some time back a new team member tried to check in a huge binary, which was taking forever, so they interrupted it. I think that resulted in these errors.

It "shouldn't" unless the storage is faulty. sqlite is very reliable against being interrupted.

> The question that remains, though, is this: if I use fossil version 2.16 (locally), I don't get any issues cloning this repository. Only if I use fossil version 2.17 do I get the error -

i suspect that you do have issues with 2.16; you're just not seeing an error message.

If you would, please, show us the output of the following using each version:

  • The clone process
  • Running: fossil test-integrity -R the-clone.fossil
  • Running: fossil dbstat -R the-clone.fossil (that will tell us the artifact count in each copy)

test-integrity messages about "skip phantom" can be ignored/removed for this purpose (but see the notes below).
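
In other words, something along these lines with each binary (the output file name here is only an example):

$ fossil clone https://username@url/reponame the-clone.fossil
$ fossil test-integrity -R the-clone.fossil
$ fossil dbstat -R the-clone.fossil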

> If these phantoms are truly harmless, how come having fossil version 2.17 (locally) is not bypassing this?

They're harmless in the sense that they don't refer to corruption. They're not necessarily semantically harmless: they refer to "missing" content, but the important part is why they are missing. Often that's because a clone is incomplete, but it can also happen if you've used "shun" to remove things from the repository. (Pro tip: never shun anything unless it's absolutely necessary to eliminate "problematic" content.)

There was a change shortly before the 2.17 release which corrected a misdiagnosis of corruption which could be triggered by a repository having many "unversioned" files, and that change was closely related to the error message you're reporting. It's possible that the difference in messages you are seeing traces back to that fix.

(7) By abhijit (abhijitnandy) on 2021-12-17 15:12:58 in reply to 6 [link] [source]

The cloned repository doesn't get created at all.

D:\temp> fossil.2.17.exe clone https://username:password@url/reponame reponame.fossil
remember password (Y/n)? Y
Round-trips: 4   Artifacts sent: 0  received: 3190
Clone done, wire bytes sent: 1276  received: 11416350  ip: 139.59.70.48
there are unresolved deltas - the clone is probably incomplete and unusable.

D:\temp> dir

 Volume in drive D is Data              Serial number is 3608:bbc0
 Directory of  D:\temp

12/17/2021  20:36         <DIR>    .
12/17/2021  20:36         <DIR>    ..
                 0 bytes in 0 files and 2 dirs
   188,651,286,528 bytes free

(8) By Stephan Beal (stephan) on 2021-12-17 15:19:40 in reply to 7 [link] [source]

> The cloned repository doesn't get created at all.

Please show us the same with 2.16, followed by test-integrity.

If you can give us access to the failing repository, that would be very helpful. If you can't post a link to it here, but can send one privately, my contact details can be found at https://wanderinghorse.net/home/stephan/ (and you have my promise that the link would only be shared via private channels with other fossil devs).

(10) By abhijit (abhijitnandy) on 2021-12-17 15:28:39 in reply to 8 [link] [source]

Sharing a link won't be a problem. I can send it over by email.

I'll create a new user in this repository and set the password before I send you all the details.

Would it be fine if I give this user only clone access?

Also, something I forgot to mention while responding to your previous comment: this repository has a shunned artifact. The team member checked in an environment file which had database credentials.

(9) By abhijit (abhijitnandy) on 2021-12-17 15:22:43 in reply to 6 [link] [source]

However, when I clone with fossil version 2.16 -


D:\temp> fossil.2.16.exe clone https://username:password@url/reponame reponame.fossil
remember password (Y/n)? Y
Round-trips: 3   Artifacts sent: 0  received: 3190
Clone done, sent: 946  received: 11416036  ip: 139.59.70.48
Rebuilding repository meta-data...
  100.0% complete...
Extra delta compression...
Vacuuming the database...
project-id: f109b832d5c8f7937702a7527b074b056196efcd
server-id:  623e71678d706fcc1e41ec3e01bc4cea4007bb3e
admin-user: username (password is "WDgb473Uqn")



D:\temp> fossil.2.16.exe test-integrity -R reponame.fossil
skip phantom 2901 bbab12b3d80963c2e4eadc81482191359c2d9c11
wrong hash on artifact 2902
skip phantom 2903 5988f42669fb887e20af3955b6dc76ec34eb7473
wrong hash on artifact 2904
skip phantom 2905 76c0ed28839168a39e75e7ab0df88d9c8d580b67
wrong hash on artifact 2906
wrong hash on artifact 2907
skip phantom 2908 c6552ae3f7d0a1b43e3463a96e30d4314ac2cef3
wrong hash on artifact 2909
wrong hash on artifact 2910
3162 non-phantom blobs (out of 3166 total) checked:  6 errors
low-level database integrity-check: ok

D:\temp> fossil.2.16.exe dbstat -R reponame.fossil
project-name:      REPO NAME
repository-size:   12,582,912 bytes
artifact-count:    3,162 (stored as 442 full text and 2,720 deltas)
artifact-sizes:    34,989 average, 36,532,736 max, 110,390,684 total
compression-ratio: 8:1
check-ins:         642
files:             430 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            0
tag-changes:       4
latest-change:     2021-12-17 13:45:06 - about 0 days ago
project-age:       150 days or approximately 0.41 years.
project-id:        f109b832d5c8f7937702a7527b074b056196efcd
schema-version:    2015-01-24
fossil-version:    2021-07-02 12:46:01 [7aedd56758] [2.16] (msc-19.00)
sqlite-version:    2021-06-18 18:36:39 [5c9a6c0687] (3.36.0)
database-stats:    1,536 pages, 8192 bytes/pg, 0 free pages, UTF-8, delete mode

(11.1) By Stephan Beal (stephan) on 2021-12-17 15:45:35 edited from 11.0 in reply to 9 [link] [source]

> However, when I clone with fossil version 2.16 -

As suspected...

> wrong hash on artifact 2902 ...

Apparently that check was changed in 2.17 so that the problem is caught earlier than 2.16 would have caught it.

The phantom messages are not, in and of themselves, harmful, but the wrong-hash messages indicate corruption.

Please run test-integrity directly on the server copy (or have a sysadmin do so). My current guess is that that copy is physically corrupted. That's essentially always caused by storage-related issues. e.g. if the remote server is a Raspberry Pi hosting the repository from an SD card, that's a recipe for eventual disaster. If that copy is corrupted, and no non-corrupt clone or backup can be found with which to replace it, there is no recovery strategy. That doesn't mean all of the data are lost, though.

We can explore the available data recovery steps if the server-side copy is shown to be corrupt.

From your response which came in while i was typing this:

> Would it be fine if I give this user only clone access?

Absolutely.

> This repository has a shunned artifact. The team member checked in an environment file which had database credentials.

That explains the phantoms. Don't worry about those, though - they're not a db integrity problem, just a semantic hole in the project history.

(12) By abhijit (abhijitnandy) on 2021-12-17 16:07:05 in reply to 11.0 [link] [source]

I've sent the link.

> Please run test-integrity directly on the server copy

The server is an Ubuntu 18.04 server. I just checked - we have 86 repositories there and never faced any problems. Of course, this is the only one with a shunned artifact!

I ran test-integrity and dbstat on the repository on the server. The results -

$ > fossil test-integrity -R reponame.fossil
skip phantom 2818 bbab12b3d80963c2e4eadc81482191359c2d9c11
wrong hash on artifact 2819
skip phantom 2820 5988f42669fb887e20af3955b6dc76ec34eb7473
wrong hash on artifact 2821
skip phantom 2822 76c0ed28839168a39e75e7ab0df88d9c8d580b67
wrong hash on artifact 2823
wrong hash on artifact 2824
skip phantom 2825 c6552ae3f7d0a1b43e3463a96e30d4314ac2cef3
wrong hash on artifact 2826
wrong hash on artifact 2827
skip phantom 2828 154fc0c71f877d7ee6c1c4b7d2107e997d22c494
skip phantom 2830 de1efd4ae1986c4fe6273f9da9f98c0e9c07fa1d
3162 non-phantom blobs (out of 3168 total) checked:  6 errors
low-level database integrity-check: ok

$ > fossil dbstat -R reponame.fossil
project-name:      REPO NAME
repository-size:   13,099,008 bytes
artifact-count:    3,162 (stored as 445 full text and 2,717 deltas)
artifact-sizes:    34,989 average, 36,532,736 max, 110,390,684 total
compression-ratio: 8:1
check-ins:         642
files:             430 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            0
tag-changes:       4
latest-change:     2021-12-17 13:45:06 - about 0 days ago
project-age:       150 days or approximately 0.41 years.
project-id:        f109b832d5c8f7937702a7527b074b056196efcd
schema-version:    2015-01-24
fossil-version:    2021-07-02 12:46:01 [7aedd56758] [2.16] (gcc-5.4.0 20160609)
sqlite-version:    2021-06-18 18:36:39 [5c9a6c0687] (3.36.0)
database-stats:    1,599 pages, 8192 bytes/pg, 1 free pages, UTF-8, delete mode

(13) By Scott Robison (sdr) on 2021-12-17 16:25:20 in reply to 12 [link] [source]

I would want someone else to concur with this idea, but it seems to me that the corruption could also have been introduced on the client side rather than the server side (though I wouldn't expect it): for example, if someone committed data to a flaky hard drive and subsequently pushed the data up from that source. But I don't know that to be the case, and as I said, it seems unlikely to happen given the integrity checks fossil uses.

(14) By Stephan Beal (stephan) on 2021-12-17 16:25:29 in reply to 12 [link] [source]

> I ran test-integrity and dbstat on the repository on the server. The results -

Since the server-side copy is "bad," there will be no hope for cloning a working copy from it. i have cloned the link you sent me and will see what i can do about recovery steps. We've never needed to do any, but things i will try in my copy...

  • Shun the corrupted artifacts. That may get us to a point where we can cleanly clone again. If that fails...
  • Use the deconstruct command to recover as much content as possible (which "should" be all except the 6 broken blobs). Restoring the proper filenames and checkin order from that heap of blobs will be a major chore, so i'm hoping the first option will suffice. (See the sketch just below.)
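
In case the second option becomes necessary, roughly what it would involve (a sketch only; the destination directory name is made up, and deconstruct simply writes each artifact out as a file named after its hash):

$ mkdir artifact-dump
$ fossil deconstruct -R reponame.fossil artifact-dump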

i'll be in touch when there's something to report.

@Devs if you'd like to take your own whack at this, just say so in /chat and i'll post the link there.

(15) By Stephan Beal (stephan) on 2021-12-17 17:01:58 in reply to 12 [link] [source]

> I ran test-integrity and dbstat on the repository on the server. The results -

Here's what i know so far...

Here are the hashes of the 6 RIDs which are being reported:

55674073deac0bdeb768708b0b89694c1d026e87
f094a770e1f43de4e695783e301ec6435d0fdc50
60e73e9c1a8b989b04b27f76e03916d76a9a39cd
c413a30c26c29aeb84ca6a50597b2060e85536ea
053e335e9e6a463420e6f60afd1c3f163a8a5b6c
459b9cd17293541e5d03924caee5252a96534fe5

The interesting thing is that their sizes are all recorded as being zero:

rid	size	uuid
2902	0	55674073deac0bdeb768708b0b89694c1d026e87
2904	0	f094a770e1f43de4e695783e301ec6435d0fdc50
2906	0	60e73e9c1a8b989b04b27f76e03916d76a9a39cd
2907	0	c413a30c26c29aeb84ca6a50597b2060e85536ea
2909	0	053e335e9e6a463420e6f60afd1c3f163a8a5b6c
2910	0	459b9cd17293541e5d03924caee5252a96534fe5

(Sidebar: the RID values may very well differ in any given clone - they are not stable across different copies of a repository. The hashes are the definitive names of the blobs.)

All genuinely size-0 blobs would share one identical hash value (the hash of empty content), but these blobs all have different hashes, which indicates that the recorded sizes of these blobs somehow got screwed up.
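
For reference, since these are 40-character (SHA1-style) hashes: genuinely empty content always hashes to one well-known value, and none of the six hashes above match it, so the recorded size of 0 cannot be right.

$ printf '' | sha1sum
da39a3ee5e6b4b0d3255bfef95601890afd80709  -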

sqlite> create table h(u unique);
sqlite> insert into h select uuid from blob where rid in (2902,2904,2906,2907,2909,2910);

sqlite> select * from h;
'55674073deac0bdeb768708b0b89694c1d026e87'
'f094a770e1f43de4e695783e301ec6435d0fdc50'
'60e73e9c1a8b989b04b27f76e03916d76a9a39cd'
'c413a30c26c29aeb84ca6a50597b2060e85536ea'
'053e335e9e6a463420e6f60afd1c3f163a8a5b6c'
'459b9cd17293541e5d03924caee5252a96534fe5'

sqlite> select rid, size, length(content) from blob where uuid in h;
2909,0,32
2910,0,201
2902,0,166
2906,0,33
2907,0,240
2904,0,33

That shows us that there is some data there, but the small sizes imply (but do not necessarily prove) that those are fossil deltas. However, fossil is unable to extract the data from those:

sqlite> select content('c413a30c26c29aeb84ca6a50597b2060e85536ea');
NULL

For a "healthy" artifact that would have output a binary dump of the artifact, like so:

sqlite> select cast(content('rid:1') as text);
'C initial\sempty\scheck-in
D 2021-07-21T10:38:41.433
R d41d8cd98f00b204e9800998ecf8427e
T *branch * trunk
T *sym-trunk *
U douser
Z f66f8146cd108ccfce018f8e5e955be0
'

My current recommendation, after having tried it locally, is:

  1. Make a backup of the server-side repo directly on the server (as opposed to a clone).

  2. Shun the artifacts listed above. To do so, log in to the server repo via the fossil UI, go to /shun, and paste the following list of hashes into the top-most data entry field:

55674073deac0bdeb768708b0b89694c1d026e87
f094a770e1f43de4e695783e301ec6435d0fdc50
60e73e9c1a8b989b04b27f76e03916d76a9a39cd
c413a30c26c29aeb84ca6a50597b2060e85536ea
053e335e9e6a463420e6f60afd1c3f163a8a5b6c
459b9cd17293541e5d03924caee5252a96534fe5

then tap the "Shun" button. It will then provide instructions at the top of the screen about the next step, but they're about impossible to read with the darkmode skin, so: just tap the "rebuild" button at the bottom of the page. That will take a brief moment to complete, but will REMOVE those 6 artifacts from the repository.

That will get you back into a state where you can clone, but i cannot say which data those 6 blobs reflect so cannot be sure that your repository will be semantically sane afterwards.

So far i've been unable to figure out what filenames (if any) those 6 blobs refer to. They might be/have been wiki pages or tickets or anything at all. i'll continue to try to figure out why those blobs are unreadable, but i'm quickly reaching the end of my options.

(16) By Stephan Beal (stephan) on 2021-12-17 17:19:29 in reply to 12 [link] [source]

> I ran test-integrity and dbstat on the repository on the server. The results -

After discussing this with Warren and Richard via /chat, we have another option to propose:

It seems very possible that someone on your development team has a copy of the repo where these 6 blobs are intact. The way to find that out is a bit hacky, but if you can track down such a copy, it can be used to replace the broken server-side copy.

First, this note from Richard regarding replacing the server-side db:

> If replacing the server repo with one of the remotes, first save off a copy of the busted repo, of course. Don't overwrite anything. But also, do "fossil config export all SOMEFILE" on the old server repo first. Then bring in the clone. Then on the clone do "fossil config import SOMEFILE" in order to save all of the USER table.
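
Concretely, that part might look like this (a sketch; SOMEFILE and the repository file names are placeholders):

# on the old, broken server repo, before replacing anything:
$ fossil config export all SOMEFILE -R old-broken-server-copy.fossil
# ... replace the server's repo file with the good clone, then:
$ fossil config import SOMEFILE -R reponame.fossil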

Probably the easiest way to check each developer's copy of the repo is via this SQL:

select size,uuid from blob where uuid in (
'55674073deac0bdeb768708b0b89694c1d026e87',
'f094a770e1f43de4e695783e301ec6435d0fdc50',
'60e73e9c1a8b989b04b27f76e03916d76a9a39cd',
'c413a30c26c29aeb84ca6a50597b2060e85536ea',
'053e335e9e6a463420e6f60afd1c3f163a8a5b6c',
'459b9cd17293541e5d03924caee5252a96534fe5'
);

Put that into a file and run something like the following against your available copies of that repo:

$ fossil sql -R REPOFILE.fossil < X.sql 
0,'053e335e9e6a463420e6f60afd1c3f163a8a5b6c'
0,'459b9cd17293541e5d03924caee5252a96534fe5'
0,'55674073deac0bdeb768708b0b89694c1d026e87'
0,'60e73e9c1a8b989b04b27f76e03916d76a9a39cd'
0,'c413a30c26c29aeb84ca6a50597b2060e85536ea'
0,'f094a770e1f43de4e695783e301ec6435d0fdc50'

You're looking for a copy where the first column of the results is non-0. That's a non-corrupt copy. (Or not corrupted in this particular way. ;)

Can you try that out?

(17.1) By abhijit (abhijitnandy) on 2021-12-17 18:20:49 edited from 17.0 in reply to 16 [link] [source]

Thanks Stephan. Will try that and let you know by Monday (20 Dec).

Edit -

I found an older version of the repository in another folder on my computer (older by 17 days).

Running the SQL you provided gives me this output -

D:\temp> fossil.2.16.exe sql -R old.reponame.fossil < X.sql
3626,'053e335e9e6a463420e6f60afd1c3f163a8a5b6c'
25550,'459b9cd17293541e5d03924caee5252a96534fe5'
25678,'55674073deac0bdeb768708b0b89694c1d026e87'
3290,'60e73e9c1a8b989b04b27f76e03916d76a9a39cd'
25562,'c413a30c26c29aeb84ca6a50597b2060e85536ea'
523,'f094a770e1f43de4e695783e301ec6435d0fdc50'

So, this looks like an uncorrupted copy.

The test-integrity on this gives -

D:\temp> fossil.2.16.exe test-integrity -R old.reponame.fossil
2845 non-phantom blobs (out of 2845 total) checked:  0 errors
low-level database integrity-check: ok

The dbstat on this gives -

D:\temp> fossil.2.16.exe dbstat -R old.reponame.fossil
project-name:      REPO NAME
repository-size:   76,587,008 bytes
artifact-count:    2,845 (stored as 536 full text and 2,309 deltas)
artifact-sizes:    94,590 average, 41,563,648 max, 269,014,158 total
compression-ratio: 35:10
check-ins:         561
files:             407 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            0
tag-changes:       4
latest-change:     2021-12-01 06:58:49 - about 16 days ago
project-age:       150 days or approximately 0.41 years.
project-id:        f109b832d5c8f7937702a7527b074b056196efcd
schema-version:    2015-01-24
fossil-version:    2021-07-02 12:46:01 [7aedd56758] [2.16] (msc-19.00)
sqlite-version:    2021-06-18 18:36:39 [5c9a6c0687] (3.36.0)
database-stats:    9,349 pages, 8192 bytes/pg, 1 free pages, UTF-8, delete mode

However, since then, 81 checkins have happened and 23 new files have been added.

Is there anything I can do to get this information to the current repository?

(18.1) By Stephan Beal (stephan) on 2021-12-17 18:50:50 edited from 18.0 in reply to 17.1 [link] [source]

> So, this looks like an uncorrupted copy.

Definitely :). First...

Be aware that we don't get email notifications when you edit posts, so i was only aware of your update by accident. Please post new messages for new info so that we get email notifications.

Here's how you should be able to recover relatively painlessly...

First, make backups of both the server and that clone "just in case."

Secondly, make sure no other developers are going to try to commit while you're doing this. It won't take long, but we don't want any interference.

Then...

From the working clone, run:

fossil sync

That will pull in any checkins/etc which it might not yet have from the server. That won't pull the broken blobs into your repo because fossil considers all blobs with identical hashes to be identical blobs (and therefore has no reason to want to sync them).

To confirm that sync did not break it, run the SQL again on that repo after running sync. Also run test-integrity on it, just to be absolutely sure.
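
Using the file names from your earlier posts, that verification pass would look something like:

$ fossil sync -R old.reponame.fossil
$ fossil sql -R old.reponame.fossil < X.sql        # the six sizes should still be non-zero
$ fossil test-integrity -R old.reponame.fossil     # expect: 0 errors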

Next, from the server's repo (directly on that machine), perform a config dump as described above in post 177a574cfd7e6591.

Now replace the server's repo copy with a direct copy of the above clone, then re-import the config as described in that previous link.

That "should," unless i've overlooked something significant, get you back up and running. You'll need to re-clone (not just pull/update) from any clients which got the broken blobs.

If something goes horribly wrong, restore the server-side repo from the corrupted backup and we can try something else.

Either way, please let us know how this works. We'd be particularly interested in figuring out how those 6 blobs (which i suspect to have been part of a single checkin because of how their RIDs are grouped) got out of whack. We've never (in 14+ years) seen that happen.

(19) By Warren Young (wyoung) on 2021-12-17 19:50:09 in reply to 18.1 [link] [source]

(21) By abhijit (abhijitnandy) on 2021-12-17 20:00:49 in reply to 19 [link] [source]

Thanks Warren.

I'm setting up the SQL-level backup as a cron job right now!
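
Presumably that cron job will run something along the lines of sqlite3's online ".backup" command, which takes a consistent snapshot even while the repository is in use (a sketch; the paths are hypothetical):

$ sqlite3 /srv/fossil/reponame.fossil ".backup /backups/reponame.fossil"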

(20) By abhijit (abhijitnandy) on 2021-12-17 19:53:43 in reply to 18.1 [link] [source]

Thanks a lot Stephan.

I can also guess why this happened.

Our fossil repositories are behind an Nginx server which serves the SSL certs. The Nginx config for the repositories' URL limits request bodies to 5 MB.

The uncorrupted repository that I had was from the team member who had tried to check in multiple 40 MB binary executables. That checkin had failed.

When I tried the fossil sync command, I got this error -

D:\temp> fossil.2.16.exe sync -R reponame.fossil
Round-trips: 1   Artifacts sent: 0  received: 0
server says: 413 Request Entity Too Large

I'm guessing that, at that time too, the Nginx server stepped in and blocked the fossil checkin from completing.

This time around also, I had to change the nginx.conf to

client_max_body_size            50m;

for the fossil sync to work.

I've understood all the steps that you and Richard have mentioned.

However, if I do as you suggested, I'll end up with this huge repository, with all those unnecessary binary files which were not supposed to be checked in in the first place!

So, I'm going to reach out to my team tomorrow and see if I can get a similar uncorrupted copy and try to restore from that.

Will update here.

> Be aware that we don't get email notifications when you edit posts, so i was only aware of your update by accident. Please post new messages for new info so that we get email notifications.

That thought crossed my mind too after I replied. Will post new messages from now on.

(22) By Stephan Beal (stephan) on 2021-12-18 01:39:25 in reply to 20 [link] [source]

> However, if I do as you suggested, I'll end up with this huge repository, with all those unnecessary binary files which were not supposed to be checked in in the first place!

That is a conundrum :/. One option would be to let the sync run and then shun those particular binaries.
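
If the hashes of those binaries aren't already known, a quick way to find candidates to paste into /shun (a sketch, using the same BLOB columns queried earlier in this thread):

$ echo "SELECT size, uuid FROM blob ORDER BY size DESC LIMIT 10;" | fossil sql -R reponame.fossil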

(26) By abhijit (abhijitnandy) on 2021-12-18 20:11:31 in reply to 22 [link] [source]

> One option would be to let the sync run and then shun those particular binaries.

Thanks Stephan. That resolved it for the local repositories. The server repository will of course stay huge and I'm fine with that.

Thanks for all your help.

(27) By Stephan Beal (stephan) on 2021-12-18 20:20:14 in reply to 26 [link] [source]

> That resolved it for the local repositories. The server repository will of course stay huge and I'm fine with that.

It doesn't have to stay huge. Shun them there, then rebuild:

fossil -R the-repo.fossil rebuild --vacuum

(29) By abhijit (abhijitnandy) on 2021-12-18 20:32:47 in reply to 27 [link] [source]

fossil rebuild the-repo.fossil --vacuum

Woah! That's done then! Thanks a bunch!!

(25) By Richard Hipp (drh) on 2021-12-18 12:37:23 in reply to 20 [link] [source]

I don't believe that an interrupted transfer would cause this problem. One of the benefits of using SQLite for storage is that if a transaction is interrupted, it automatically rolls back, so interrupting a sync operation in progress should be harmless. Furthermore, the sync protocol is designed to be stateless (on the server side) so that you end up with a fully consistent repository after each HTTP request, even if multiple HTTP requests are required to complete the sync.

So, unless you have hard evidence to the contrary, I do not believe that a "client_max_body_size" or similar restriction on the web server will cause this problem.

Note that the althttpd.c webserver used by fossil-scm.org has a hard-coded "client_max_body_size" of 25MB. I have, once or twice, hit that limit for private repos, and had to complete the sync using ssh: rather than https:. But the https: operations simply failed - they didn't corrupt the repository.

I also don't believe that running "fossil shun" will cause this problem, because (IIRC) the shun command makes sure the artifact being shunned is not used as the source of a delta.

My theory: Somebody has gone into your repo using "fossil sql" or just a plain "sqlite3" command-line tool and run "DELETE" on the BLOB table to get rid of the oversized artifacts.

There could be some obscure bug in Fossil that is causing this problem. But until I have a repro case or other evidence, I'm going with the theory of manual human corruption of the BLOB table.
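
One way to look for evidence of that kind of manual deletion (a sketch, assuming the usual repository schema, where the DELTA table maps each delta-compressed blob to its source blob) is to check for delta rows whose source row no longer exists:

$ echo "SELECT d.rid FROM delta d LEFT JOIN blob b ON b.rid = d.srcid WHERE b.rid IS NULL;" | fossil sql -R reponame.fossil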

(28) By abhijit (abhijitnandy) on 2021-12-18 20:29:47 in reply to 25 [link] [source]

> I do not believe that a "client_max_body_size" or similar restriction on the web server will cause this problem.

I agree with you Richard. I was thinking about this afterwards, because otherwise the fossil sync command itself would have caused a similar problem.

> Somebody has gone into your repo using "fossil sql" or just a plain "sqlite3" command-line tool and run "DELETE" on the BLOB table to get rid of the oversized artifacts.

The team members don't have access to the server, so that wouldn't happen. I'll confirm, however. Either way, something like that would be an absolute no-no in our organization.

Also, when situations like this occur, we usually end up trying multiple things before we reach out for help. I think that is what happened in this case, before they approached me. It's probably a little difficult to trace back all the steps taken when things go wrong unexpectedly!

> There could be some obscure bug in Fossil that is causing this problem. But until I have a repro case or other evidence, I'm going with the theory of manual human corruption of the BLOB table.

I have no issues with sharing the repository. I've already shared the access details with Stephan and he has cloned the corrupted repository.

As of now though, since the corruption is fixed, cloning will not show any issues in the repository.

It would be great if you can look into the corrupted repository as well. I can email it to you directly if that's what you prefer.

Fossil has been my go-to version management system for over a decade now and I would be grateful if I can do anything to improve it in my own small way!

Thank you for the wonderful piece of software. Besides SQLite of course :)

(30) By Stephan Beal (stephan) on 2021-12-18 20:40:40 in reply to 28 [link] [source]

> I have no issues with sharing the repository. I've already shared the access details with Stephan and he has cloned the corrupted repository.

We have the corrupted clone but we'd ideally need access to the original broken server copy, as opposed to a clone. If you could set up a copy of that for us, you can purge any private info from it with:

$ fossil scrub --verily the-copy-of-the-repo.fossil

See...

$ fossil help scrub
Usage: fossil scrub ?OPTIONS? ?REPOSITORY?

The command removes sensitive information (such as passwords) from a
repository so that the repository can be sent to an untrusted reader.
...

(31.1) By abhijit (abhijitnandy) on 2021-12-18 21:23:44 edited from 31.0 in reply to 30 [link] [source]

I've sent the fossil repository in an email to you and Richard.

Thanks again.

(23) By anonymous on 2021-12-18 03:50:00 in reply to 16 [link] [source]

> Probably the easiest way to check each developer's copy of the repo is
> via this SQL:

Why not just use "fossil test-integrity" on all the developers' clones to
find out which one does not report the bad hash errors?

Andy

(24.1) By Stephan Beal (stephan) on 2021-12-18 04:03:48 edited from 24.0 in reply to 23 [link] [source]

> Why not just use "fossil test-integrity" on all the developers' clones to find out which one does not report the bad hash errors?

That would also be an option; it's just not the one which came to mind at the time.

Edit: never mind, that wouldn't be an option: test-integrity reports the RIDs of the corrupted blobs, not their hashes, and RIDs are not stable across different repository clones.
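
For completeness: translating those RIDs into hashes has to be done in the same clone that reported them, e.g. with the RIDs from post 15 (the repository file name is a placeholder):

$ echo "SELECT rid, uuid FROM blob WHERE rid IN (2902,2904,2906,2907,2909,2910);" | fossil sql -R the-clone.fossil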