Fossil produces invalid tarballs
(1.1) By ET. on 2022-04-10 06:51:54 edited from 1.0 [source]
$ fossil tar release fossil.tgz -R ~/sqlite3.fossil
$ tar axvf fossil.tgz
...
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/Assets
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/Assets/LockScreenLogo.scale-200.png
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/Assets/SplashScreen.scale-200.png
gzip: stdin: invalid compressed data--crc error
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/Assets/Square150x150Logo.scale-200.png
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/Assets/Square44x44Logo.scale-200.png
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/Assets/Square44x44Logo.targetsize-24_altform-unplated.png
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/Assets/StoreLogo.png
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/Assets/Wide310x150Logo.scale-200.png
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/MainPage.xaml
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/MainPage.xaml.cpp
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/MainPage.xaml.h
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/Package.appxmanifest
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/pch.cpp
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/pch.h
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/vsixtest.sln
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/vsixtest.tcl
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/vsixtest.vcxproj.data
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/vsixtest.vcxproj.filters
SQLite_2022-03-26_135110_d33c709cc0/vsixtest/vsixtest_TemporaryKey.pfx
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Either this or one of the png files causes this.
(2) By Stephan Beal (stephan) on 2022-04-11 06:35:28 in reply to 1.1 [link] [source]
tar axvf fossil.tgz
i'm unable to reproduce that.
[stephan@nuc:~/fossil]$ f tar release x.tgz -R sqlite3.fsl
[stephan@nuc:~/fossil]$ tar xzf x.tgz
(no output)
[stephan@nuc:~/fossil]$ rm -fr SQLite_2022-03-26_135110_d33c709cc0
Can you give us more details, e.g. what platform and fossil version?
(3) By ET. on 2022-04-11 11:35:03 in reply to 2 [link] [source]
$ f v
This is fossil version 2.17 [f48180f2ff] 2021-10-09 14:43:10 UTC
$ uname -a
Linux archlinux 5.17.1-zen1-1-zen #1 ZEN SMP PREEMPT Mon, 28 Mar 2022 21:56:46 +0000 x86_64 GNU/Linux
I've replicated it using the latest fossil version 2.18 and with tip as well to the same effect. I also made a fresh clone of sqlite only to produce the same tarball.
I'm at a loss right now as to how this can be. I get a different sha1 for the release tarball with each version of fossil.
My tar is not broken, because busybox tar and 7z also report the crc error. Interestingly, the crc error only occurs at a random point while extracting one of the png files, but I'm not sure what this tells me.
$ f v
This is fossil version 2.17 [f48180f2ff] 2021-10-09 14:43:10 UTC
$ fossil tar release sqlite.tgz
$ sha sqlite.tgz
a3dc5cad81de349f3faeba1455e7c24744dd4148 sqlite.tgz
$ ./fossil v
This is fossil version 2.18 [84f25d7eb1] 2022-02-23 13:22:20 UTC
$ fossil tar release sqlite.tgz
$ sha sqlite.tgz
feb62c845673e450bfaab7aa6f565f3102a958a9 sqlite.tgz
(4.2) By Warren Young (wyoung) on 2022-04-11 13:14:47 edited from 4.1 in reply to 3 [link] [source]
Doesn't repro on macOS 12.3.1, either with bsdtar or GNU tar. Tested with both the 2.18 release version and with version 2.19 [1bb4147fd2] 2022-03-28 08:34:25 UTC.
(7) By Andy Bradford (andybradford) on 2022-04-12 01:46:27 in reply to 3 [link] [source]
What does file(1) say that the tar.gz is? When I export from fossil using the same hash as you, I get:
$ file sqlite-d33c709cc0.tar.gz
sqlite-d33c709cc0.tar.gz: gzip compressed data, last modified: Sat Mar 26 13:51:10 2022, max compression
Also, regarding the fact that different versions of fossil produce different sha sums is interesting. What happens if you do two "fossil tar" using the same build? Do the sha sums differ then?
I observe the same difference in sha1 sums when I move between some different versions of Fossil:
version 2.19 [caba4b0188] produces (consistently):
SHA1 (sqlite-d33c709cc0.tar.gz) = c1ba6b32fd0ec5cf41d3d01903e065a239ee0a29
version 2.18 [84f25d7eb1] produces (consistently):
SHA1 (sqlite-d33c709cc0.tar.gz) = c1ba6b32fd0ec5cf41d3d01903e065a239ee0a29
version 2.17 [f48180f2ff] produces (consistently):
SHA1 (sqlite-d33c709cc0.tar.gz) = 3d35d3a06512700adf53340ad348f27392e552c9
So, I wonder why 2.17 produced a different hash than 2.18 and 2.19 as nothing changed between the builds (e.g. no difference in libz).
Also of note is that gzip(1) produces yet another sha sum:
$ file sqlite-d33c709cc0.tar.gz
sqlite-d33c709cc0.tar.gz: gzip compressed data, was "sqlite-d33c709cc0.tar", last modified: Tue Apr 12 01:34:54 2022, max compression, from Unix
$ sha1 sqlite-d33c709cc0.tar.gz
SHA1 (sqlite-d33c709cc0.tar.gz) = ba1da59343731eec7feb6cc2e8213437f5f40920
I don't know if it can be expected that the sha sum matches between various libz compression implementation outputs:
$ pigz -v9 -k sqlite-d33c709cc0.tar
sqlite-d33c709cc0.tar to sqlite-d33c709cc0.tar.gz
$ file sqlite-d33c709cc0.tar.gz
sqlite-d33c709cc0.tar.gz: gzip compressed data, was "sqlite-d33c709cc0.tar", last modified: Tue Apr 12 01:34:54 2022, max compression, from Unix
$ sha1 sqlite-d33c709cc0.tar.gz
SHA1 (sqlite-d33c709cc0.tar.gz) = 45a0fd91fc1aaaf75d7dd4b85c5dc42e41874cb2
In other words, I'm not sure that it matters that between different versions of fossil the sha sums are different.
Andy
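The file(1) output above hints at one reason the byte streams can differ even for an identical payload: the gzip header itself carries metadata such as a modification time, an optional original file name, and an OS byte. The following is a minimal sketch of that header layout as described in RFC 1952. The function name `gzip_header` and its parameters are made up for illustration; this is not the code used by Fossil, gzip, or pigz.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a gzip member header (RFC 1952) into out[].
** zName may be NULL, in which case no FNAME field is written.
** The caller must provide room for 10 + strlen(zName) + 1 bytes.
** Returns the number of bytes written. */
static size_t gzip_header(unsigned char *out, uint32_t mtime,
                          const char *zName, unsigned char os){
  size_t n = 0;
  out[n++] = 0x1f;                              /* ID1 */
  out[n++] = 0x8b;                              /* ID2 */
  out[n++] = 8;                                 /* CM: deflate */
  out[n++] = zName ? 0x08 : 0x00;               /* FLG: FNAME bit */
  out[n++] = (unsigned char)(mtime & 0xff);     /* MTIME, little-endian */
  out[n++] = (unsigned char)((mtime >> 8) & 0xff);
  out[n++] = (unsigned char)((mtime >> 16) & 0xff);
  out[n++] = (unsigned char)((mtime >> 24) & 0xff);
  out[n++] = 2;                                 /* XFL: max compression */
  out[n++] = os;                                /* OS: 3 = Unix, 255 = unknown */
  if( zName ){
    size_t len = strlen(zName) + 1;             /* include the NUL terminator */
    memcpy(out + n, zName, len);
    n += len;
  }
  return n;
}
```

Calling it once with no name and OS byte 255, and once with the name "sqlite-d33c709cc0.tar" and OS byte 3, gives two different headers before a single compressed byte has been written, so differing SHA1 sums are to be expected in that case.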
(8) By Stephan Beal (stephan) on 2022-04-12 02:43:32 in reply to 7 [link] [source]
Also, regarding the fact that different versions of fossil produce different sha sums is interesting.
That's a known issue and has come up before in the sqlite forum. Any single byte of difference changes the hash, and neither fossil nor (AFAIK) zlib guarantee that they will always produce an identical byte stream for the same inputs. They only guarantee that what you put in is what you'll get out. How it's stored is their business and they're free to change it.
(9) By Andy Bradford (andybradford) on 2022-04-12 02:51:50 in reply to 8 [link] [source]
Any single byte of difference changes the hash, and neither fossil nor (AFAIK) zlib guarantee that they will always produce an identical byte stream for the same inputs.
That's pretty much the conclusion that I was coming to near the end of my post. Not even gzip and pigz produce identical outputs.
Andy
(10) By Warren Young (wyoung) on 2022-04-12 03:33:31 in reply to 8 [link] [source]
I wasn’t even bothering with hashes since the test input can also change, depending on when you last synced the repo in question.
It didn’t help that “sha” is ambiguous. By the length I assume it’s SHA-1, but…
(11) By Andy Bradford (andybradford) on 2022-04-12 04:57:59 in reply to 10 [link] [source]
I wasn’t even bothering with hashes since the test input can also change
Of course using a sha1 hash to compare anything must necessarily assume a static input, and if one exports a tar from a hash with fossil it should always produce the exact same files coming from a Fossil repository that hasn't been altered. It's debatable whether or not the resulting tar will always have the same ordering of files, but I don't see why it would change between invocations, and even builds unless the code has changed sorting order of files input into the tar routine.
That being said, let's see what's brewing in Fossil:
Here are all the various "fossil tar" exports for SQLite hash d33c709cc0:
$ sha1 sqlite-d33c709cc0-2.1*.tar
SHA1 (sqlite-d33c709cc0-2.17.tar) = b56458ad783930f507d7aa53590c42bcf2224eaa
SHA1 (sqlite-d33c709cc0-2.18.tar) = 36ad7f9801dfb109890fc2610dd4fd94aee7c5e9
SHA1 (sqlite-d33c709cc0-2.19.tar) = 36ad7f9801dfb109890fc2610dd4fd94aee7c5e9
$ ls -l sqlite-d33c709cc0-2.1*.tar
-rw-r--r-- 1 amb amb 107231744 Apr 11 22:25 sqlite-d33c709cc0-2.17.tar
-rw-r--r-- 1 amb amb 107231744 Apr 11 22:20 sqlite-d33c709cc0-2.18.tar
-rw-r--r-- 1 amb amb 107231744 Apr 11 22:24 sqlite-d33c709cc0-2.19.tar
So it looks like the actual tar code, not the libz code is responsible for changing the tar file contents. Now I wonder if they actually contain the same files (which would be a bug in my opinion if they do not). It's really interesting that the resulting tar files all have the exact same size but only 2 of them share the same SHA1 hash.
Now, I output the file listing to make things easier to compare:
$ for t in sqlite-d33c709cc0-2.1*.tar; do tar tvf $t > $t.fs; done
And now look what a difference we find between 2.17 and 2.18:
$ diff -u sqlite-d33c709cc0-2.17.tar.fs sqlite-d33c709cc0-2.18.tar.fs
--- sqlite-d33c709cc0-2.17.tar.fs Mon Apr 11 22:33:45 2022
+++ sqlite-d33c709cc0-2.18.tar.fs Mon Apr 11 22:33:45 2022
@@ -1,5 +1,5 @@
 drwxr-xr-x 2 nobody nobody 0 Mar 26 07:51 SQLite_2022-03-26_135110_d33c709cc0
--rw-r--r-- 1 nobody nobody 151865 Mar 26 07:51 SQLite_2022-03-26_135110_d33c709cc0/manifest
+-rw-r--r-- 1 nobody nobody 151805 Mar 26 07:51 SQLite_2022-03-26_135110_d33c709cc0/manifest
 -rw-r--r-- 1 nobody nobody 65 Mar 26 07:51 SQLite_2022-03-26_135110_d33c709cc0/manifest.uuid
 drwxr-xr-x 2 nobody nobody 0 Mar 26 07:51 SQLite_2022-03-26_135110_d33c709cc0/.fossil-settings
 -rw-r--r-- 1 nobody nobody 7 Mar 26 07:51 SQLite_2022-03-26_135110_d33c709cc0/.fossil-settings/empty-dirs
The size of the manifest is different by 60 bytes. That's interesting. How is it possible for the manifest to have changed even when the repository is static and no commits have happened except that some code change introduced it? Guess we'll have to inspect the actual contents of the manifest.
$ for t in sqlite-d33c709cc0-2.1*.tar; do mkdir $t.d && tar xvf $t -C $t.d >/dev/null; done
$ diff -ur sqlite-d33c709cc0-2.17.tar.d sqlite-d33c709cc0-2.18.tar.d
diff -ur sqlite-d33c709cc0-2.17.tar.d/SQLite_2022-03-26_135110_d33c709cc0/manifest sqlite-d33c709cc0-2.18.tar.d/SQLite_2022-03-26_135110_d33c709cc0/manifest
--- sqlite-d33c709cc0-2.17.tar.d/SQLite_2022-03-26_135110_d33c709cc0/manifest Sat Mar 26 07:51:10 2022
+++ sqlite-d33c709cc0-2.18.tar.d/SQLite_2022-03-26_135110_d33c709cc0/manifest Sat Mar 26 07:51:10 2022
@@ -1950,4 +1950,3 @@
 T +sym-version-3.38.2 *
 U drh
 Z 1c2a1cc2a9288218f72e9d4db2ebcd7a
-# Remove this line to create a well-formed Fossil manifest.
So it looks like for some reason it was decided to just get rid of that bogus line in the exported manifest between 2.17 and 2.18. This explains why "fossil tar" in 2.17 and "fossil tar" in 2.18 do not produce the same output. And here is the reason:
https://www.fossil-scm.org/home/info/722c248d5381b3e8
Again, this just reaffirms the statement that the hash differences by themselves don't necessarily indicate a problem. Also, I fail to see how this change would be the cause of the originally reported problem that the tarball is invalid.
Andy
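A side note on the identical file sizes: tar stores each member's data in 512-byte blocks, padding the last block, so a 60-byte change in one file need not change the size of the archive. Here is a minimal sketch of that arithmetic; `tar_padded_size` is a name invented for this example and is not taken from Fossil's tar code.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes a member's data occupies inside a
** tar archive. File data is padded up to the next multiple of 512 bytes. */
static size_t tar_padded_size(size_t nByte){
  return (nByte + 511) / 512 * 512;
}
```

Both manifest sizes from the listing, 151865 and 151805, round up to 152064 bytes (297 blocks), which would explain why all three tar files come out at 107231744 bytes while their hashes differ.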
(12) By Stephan Beal (stephan) on 2022-04-12 08:33:45 in reply to 11 [link] [source]
Of course using a sha1 hash to compare anything must necessarily assume a static input, and if one exports a tar from a hash with fossil it should always produce the exact same files coming from a Fossil repository that hasn't been altered.
That would be ideal, but in practice we cannot guarantee that:
- It uses a 3rd-party library for the compression and that library is free to behave how it wants, so long as it does the job we pay it to do.
- Fossil is free to rearrange how it creates the zip files from one version to the next. Perhaps it changes their order or fiddles with timestamps or switches to upper-case hashes for the top-most dir name, or whatever.
If we impose the restriction that it must produce the same hashcode for the same repo version, ignoring for a moment that we can't because of the first point, fossil would be required to forever generate tar/zip files in exactly the same way it does today, with no wiggle room whatsoever for future improvements.
An example: we recently changed how the manifest files get added to a zip/tar, eliminating the "#" line which "artificially invalidated" the manifest. (That invalidation was initially done to avoid confusion if the zip's contents, including that manifest, were later re-imported into fossil.) Thus building a tar from a pre-Christmas fossil version will, with 100% likelihood, result in a different hash for the same project version. The hashes of the individual files within the tar are all recorded in that aforementioned manifest file, though, and they are 100% immutable.
(15) By Andy Bradford (andybradford) on 2022-04-12 14:44:55 in reply to 12 [link] [source]
It uses a 3rd-party library for the compression and that library is free to behave how it wants, so long as it does the job we pay it to do.
Understood. Again, this is pretty much what I already said. Also, in the post to which you replied I showed that it isn't the compression library that is responsible for the difference but rather a difference in the tar file, and more specifically a difference in one particular file in the tar (the manifest).
An example: we recently changed how the manifest files which get added to a zip/tar are added, eliminate the "#" line which "artificially invalidated" the manifest.
This again was pointed out in the very post to which you replied. Is your medium for viewing the posts not showing you everything? :-)
In that same post I also pointed out that a difference in hash isn't necessarily indicative of an "invalid tarball". Basically, we need more information or steps to reproduce it.
Finally, I'm not suggesting that Fossil should always produce the same exported hash---quite the contrary as I was basically demonstrating WHY it did not---however, it certainly does do that (except in cases where the code was intentionally changed like the manifest handling).
Thanks,
Andy
(17) By Stephan Beal (stephan) on 2022-04-12 15:18:55 in reply to 15 [link] [source]
Is your medium for viewing the posts not showing you everything? :-)
It's an admittedly bad habit of mine to stop reading and start replying at the first convenient point, and then forget entirely that there's still more to read :/.
(13) By ET. on 2022-04-12 10:50:56 in reply to 7 [link] [source]
file says it's a tarball
$ file sqlite.tgz
sqlite.tgz: gzip compressed data, last modified: Sat Mar 26 13:51:10 2022, max compression, original size modulo 2^32 107231744
But right now I'm going to absolve fossil of any blame and call it by saying something must be funky with my system because even if I test-tar the folder the crc-error still occurs.
$ f v
This is fossil version 2.18 [84f25d7eb1] 2022-02-23 13:22:20 UTC
$ f test-tar ../src.tgz *
$ tar axvf ../src.tgz
aclocal.m4
art
autoconf
compat
config.guess
config.h.in
config.sub
configure
configure.ac
contrib
doc
ext
.fossil-settings
.fslckout
install-sh
LICENSE.md
ltmain.sh
magic.txt
main.mk
Makefile.in
Makefile.linux-gcc
Makefile.msc
manifest
gzip: stdin: invalid compressed data--crc error
manifest.uuid
mkso.sh
mptest
README.md
spec.template
sqlite3.1
sqlite3.pc.in
sqlite.pc.in
src
test
tool
VERSION
vsixtest
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Previously it was complaining about the png files; now it's anywhere.
$ f test-tar ../vsixtest.tgz vsixtest/*
$ tar axvf ../vsixtest.tgz
gzip: stdin: invalid compressed data--crc error
vsixtest
vsixtest/App.xaml
vsixtest/App.xaml.cpp
vsixtest/App.xaml.h
vsixtest/Assets
vsixtest/MainPage.xaml
vsixtest/MainPage.xaml.cpp
vsixtest/MainPage.xaml.h
vsixtest/Package.appxmanifest
vsixtest/pch.cpp
vsixtest/pch.h
vsixtest/vsixtest.sln
vsixtest/vsixtest.tcl
vsixtest/vsixtest_TemporaryKey.pfx
vsixtest/vsixtest.vcxproj.data
vsixtest/vsixtest.vcxproj.filters
tar: Child returned status 1
tar: Error is not recoverable: exiting now
At my wits' end now.
(14) By Stephan Beal (stephan) on 2022-04-12 11:58:41 in reply to 13 [link] [source]
But right now I'm going to absolve fossil of any blame and call it by saying something must be funky with my system...
i'm gonna go out on a limb and suggest that it might be bad RAM. That's what it sounds like to me.
(16) By Andy Bradford (andybradford) on 2022-04-12 14:47:27 in reply to 13 [link] [source]
$ tar axvf ../src.tgz
What if instead of using tar in this fashion you break it up into individual commands? For example:
$ gzip -dc ../src.tgz > /tmp/src.tar
$ file /tmp/src.tar
$ mkdir -p /tmp/src
$ tar -C /tmp/src -xvf /tmp/src.tar
Something like that?
Andy
(5) By Andy Bradford (andybradford) on 2022-04-12 00:27:49 in reply to 1.1 [link] [source]
My version of tar (BSD) doesn't even have an 'a' option:
$ tar avfz new.tar.gz
tar: unknown option a
usage: tar {crtux}[014578befHhjLmNOoPpqsvwXZz] [blocking-factor | archive | replstr] [-C directory] [-I file] [file ...]
       tar {-crtux} [-014578eHhjLmNOoPpqvwXZz] [-b blocking-factor] [-C directory] [-f archive] [-I file] [-s replstr] [file ...]
What does it do?
Andy
(6) By Martin Gagnon (mgagnon) on 2022-04-12 00:57:33 in reply to 5 [link] [source]
I was wondering too; it seems to auto-detect the compression from the file extension.
From manpage:
Compression options
  -a, --auto-compress
      Use archive suffix to determine the compression program.
(18) By Luna Gräfje (LunaLikesSpace) on 2022-04-20 06:08:33 in reply to 1.1 [link] [source]
Does your fossil link against zlib-1.2.12? I think I'm facing the same problem.
In my case, I had the luxury of having fossil on another machine that still produced working tarballs and found out that the working and non-working tarballs are identical except for the gzip crc at the end.
I think this is caused by the fact that the crc is treated as a signed integer in gzip.c while zlib wants an unsigned one. The following patch to fossil fixes the issue for me (and doesn't break the behavior with zlib-1.2.11).
Index: src/gzip.c
==================================================================
diff --git a/var/tmp/gzip~orig.c b/home/luna/museum/fossil-scm/src/gzip.c
index 7f4ca36..730f1bc 100644
--- a/var/tmp/gzip~orig.c
+++ b/home/luna/museum/fossil-scm/src/gzip.c
@@ -31,7 +31,7 @@
 */
 struct gzip_state {
   int eState; /* 0: idle 1: header 2: compressing */
-  int iCRC; /* The checksum */
+  uLong iCRC; /* The checksum */
   z_stream stream; /* The working compressor */
   Blob out; /* Results stored here */
 } gzip;
Index: src/zip.c
==================================================================
diff --git a/var/tmp/zip~orig.c b/home/luna/museum/fossil-scm/src/zip.c
index 9af5859..7612b5d 100644
--- a/var/tmp/zip~orig.c
+++ b/home/luna/museum/fossil-scm/src/zip.c
@@ -257,7 +257,7 @@ static void zip_add_file_to_zip(
   int nameLen;
   int toOut = 0;
   int iStart;
-  int iCRC = 0;
+  uLong iCRC = 0;
   int nByte = 0;
   int nByteCompr = 0;
   int nBlob; /* Size of the blob */
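For anyone who wants to compare a working and a non-working tarball the same way, the CRC sits in the last 8 bytes of a gzip file: per RFC 1952 the trailer is the CRC-32 of the uncompressed data followed by its size modulo 2^32, both little-endian. A minimal sketch for a buffer holding a single-member gzip file, with function names made up for this example:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value from four bytes. */
static uint32_t get_le32(const unsigned char *p){
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: read the gzip trailer from the end of z[0..n-1].
** Assumes a single gzip member with nothing appended after it.
** Returns 0 on success, -1 if the buffer is too short to hold the
** 10-byte header plus the 8-byte trailer. */
static int gzip_trailer(const unsigned char *z, size_t n,
                        uint32_t *pCrc, uint32_t *pSize){
  if( n < 18 ) return -1;
  *pCrc = get_le32(z + n - 8);   /* CRC-32 of the uncompressed data */
  *pSize = get_le32(z + n - 4);  /* uncompressed size modulo 2^32 */
  return 0;
}
```

The size field is the same number that file(1) reports as "original size modulo 2^32" earlier in this thread.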
(19.1) By Stephan Beal (stephan) on 2022-04-20 09:53:38 edited from 19.0 in reply to 18 [link] [source]
I think this is caused by the fact that the crc is treated as a signed integer in gzip.c while zlib wants an unsigned one.
Nice detective work :).
The following patch to fossil fixes the issue for me (and doesn't break the behavior with zlib-1.2.11).
An equivalent patch has been applied to the trunk. Please try it out and report any issues. "Works for me," but it also did before.
One detail about this bugs me, though: sizeof(long) is guaranteed to be at least as large as sizeof(int) but is not required to be larger. Many people assume long is 64-bit, but that's very often not the case in C. The crc in question is written to the archive using only 32 bits, so on systems where long is 64 bits (like the one this message is being typed on) the crc will be truncated. Whether or not that will lead to a bug is unclear. It "works for me" but really should be tested on a wider variety of systems and repositories.
Edit: on the other hand, crc32() explicitly generates a 32-bit checksum, so it's presumably taking care of the case that sizeof(long)>4.
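To illustrate the truncation point: if the checksum is written out byte by byte, only the low 32 bits ever reach the archive, whatever the width of long. A minimal sketch, assuming a little-endian writer like the one the gzip format calls for; `put_le32` is a hypothetical name and not Fossil's actual routine.

```c
/* Hypothetical helper: store the low 32 bits of v in little-endian byte
** order. Any bits above bit 31 are simply ignored. */
static void put_le32(unsigned char *p, unsigned long v){
  p[0] = (unsigned char)(v & 0xff);
  p[1] = (unsigned char)((v >> 8) & 0xff);
  p[2] = (unsigned char)((v >> 16) & 0xff);
  p[3] = (unsigned char)((v >> 24) & 0xff);
}
```

With a writer of this shape, a value whose upper bits are set produces the same four bytes as its low 32 bits alone, so truncation at write time is harmless as long as the low 32 bits are the right checksum.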
(20) By Luna Gräfje (LunaLikesSpace) on 2022-04-20 10:30:57 in reply to 19.1 [link] [source]
This fixes my problem. Thank you for the quick response.
I think the reason why zlib has a long there is that int is only guaranteed to be 16 bits long and if you want a data type that will be able to hold a 32 bit crc on every single platform that has a standards compliant C compiler, you have to use long (uint32_t might not be defined if the platform doesn't have a data type that is exactly 32 bits large (though like CHAR_BIT > 8, I've never seen that in practice)).
The problems come in if long is actually more than 32 bits large and a "negative" signed 32 bit number gets converted to it (0x80000000 becomes 0xffffffff80000000). What I do not understand is why it seemed to work with the previous zlib version (and why it still mostly worked with the new one if the buffer being crc'ed was not extremely small) but I've given up on trying to understand crc implementations a while ago :P
(21) By ET. on 2022-05-03 08:11:14 in reply to 18 [link] [source]
This indeed does fix my problem.
After going through hell stress testing my ram as suggested and buying a new ssd hard drive thinking this is some hardware issue, I'm relieved that it's not.
I can also confirm that the unreleased upstream zlib commit "Correct incorrect inputs provided to the CRC functions" fixes this issue without this patch.
The previous releases of zlib were not sensitive to incorrect CRC
inputs with bits set above the low 32. This commit restores that
behavior, so that applications with such bugs will continue to
operate as before.
However the commit message seems to imply that other apps are at fault, so is fossil the buggy one here?
(22) By Luna Gräfje (LunaLikesSpace) on 2022-05-03 09:06:24 in reply to 21 [link] [source]
so is fossil the buggy one here?
I would say so. crc32() wants you to pass the result of a previous crc32() call as the first parameter but the unsigned long -> int -> unsigned long conversion that previously happened changed the result to a value that crc32() would never return.
Consider this program
#include <stdio.h>
int main(int argc, char** argv) {
  (void) argc, (void) argv;
  printf("%016lx\n", (unsigned long) ((int) 0x7FFFFFFFUL));
  printf("%016lx\n", (unsigned long) ((int) 0xFFFFFFFFUL));
  return 0;
}
which prints the following on my machine
000000007fffffff
ffffffffffffffff
i.e. converting something with 32 bits from unsigned long to int and back to unsigned long works fine if it doesn't have bit 31 set but in cases where the bit is set, the higher 32 bits get filled with ones (but I think this is actually implementation defined).
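To see why such a seed can matter, here is a toy model. It is emphatically not zlib's implementation: it is a plain bitwise CRC-32 over a 64-bit state, standing in for a platform where unsigned long is 64 bits wide. The names `crc32_model` and `sign_extend32` are invented for this sketch, and the `maskSeed` flag models a routine that either tolerates or trusts the bits above the low 32.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy bitwise CRC-32 (reflected polynomial 0xEDB88320) with a 64-bit state.
** crc is the value returned by a previous call, or 0 to start.
** If maskSeed is nonzero, the incoming crc is truncated to 32 bits first. */
static uint64_t crc32_model(uint64_t crc, const unsigned char *z, size_t n,
                            int maskSeed){
  uint64_t c = maskSeed ? (crc & 0xffffffffu) : crc;
  size_t i;
  int k;
  c ^= 0xffffffffu;
  for(i = 0; i < n; i++){
    c ^= z[i];
    for(k = 0; k < 8; k++){
      /* Stray bits above bit 31 get shifted down into the checksum here. */
      c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
    }
  }
  return c ^ 0xffffffffu;
}

/* Models what storing a CRC in a 32-bit int and widening it again does on
** a typical two's-complement platform: bit 31 is copied into bits 32..63. */
static uint64_t sign_extend32(uint64_t crc){
  uint32_t lo = (uint32_t)crc;
  return (lo & 0x80000000u) ? (lo | 0xffffffff00000000ULL) : lo;
}
```

In this model, feeding a sign-extended intermediate value back in as the seed gives the right answer only when `maskSeed` is set; without masking, the stray upper bits are shifted down into the low 32 bits and the final checksum comes out wrong. Whether a real library masks the seed is up to that library, which fits the observation that the same caller bug was invisible with one zlib version and visible with another.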
(23) By Stephan Beal (stephan) on 2022-05-03 09:36:40 in reply to 21 [link] [source]
so is fossil the buggy one here?
As Luna demonstrates, yes, but fossil's misbehavior was apparently hidden by what may possibly be described as a long-standing bug in zlib. When zlib was corrected, the bug in misbehaving clients was revealed. An unfortunate mixup for all involved parties.