Fossil Forum

Git trying to move from SHA-1

(1) By Joel Dueck (joeld) on 2020-02-04 14:58:26

Likely of interest to Fossil authors/users:

Article on LWN:
https://lwn.net/SubscriberLink/811068/cfeb6a67b8dfbe47/

Discussion on HN:
https://news.ycombinator.com/item?id=22233295

(Some mention of Fossil on the HN post)

Nice to know Fossil has had this problem licked for a couple of years now.

(2) By Stephan Beal (stephan) on 2020-02-04 15:21:15 in reply to 1

From the article:

A new version of Git can be made with a different hash algorithm, along with a tool that will convert a repository from the old hash to the new.

That's appalling. Fossil's implementation doesn't require a conversion. A single repo can work with both formats, without requiring any sort of conversions or mappings between the two hashes, as the article later goes on to mention:

For blobs, this tracking will happen through the maintenance of a set of translation tables; given a hash generated with one algorithm, Git will be able to look up the corresponding hash from the other.

Also:

With a simple command like:

git convert-repo --to-hash=sha-256 --frobnicate-blobs --climb-subtrees \
   	--liability-waiver=none --use-shovels --carbon-offsets

a user can leave SHA‑1 behind (note that the specific command-line options may differ)

i really hope that's intended to be sarcastic.

In mid-January [2020], carlson posted the first part of this transition code, which clearly only solves part of the problem:

First, it contains the pieces necessary to set up repositories and write but not read extensions.objectFormat. In other words, you can create a SHA‑256 repository, but will be unable to read it.

The article says he's been working on that problem since 2014. The implication there is that git is braindeadedly complicated and/or nigh unmaintainable. Richard single-handedly had fossil whipped into SHA-256 shape in a matter of days or, at most, weeks.

(3) By anonymous on 2020-02-04 16:21:32 in reply to 2

The article says he's been working on that problem since 2014. ... Richard single-handedly had fossil whipped into SHA-256 shape in a matter of days or, at most, weeks.

Actually, Fossil skipped over "SHA-256" and went to "SHA-3".

SHA-256 is a member of the SHA-2 family, which was first published in 2001 in the draft FIPS PUB 180-2.

SHA-3 was selected in a competition that ended in 2012.
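
To make the family distinction concrete, here is a minimal sketch using Python's standard hashlib module. The sample input is arbitrary and purely illustrative; it is not taken from any repository. It shows that SHA-256 (SHA-2 family) and SHA3-256 (SHA-3 family) produce digests of the same length but with different values, while SHA-1 digests are shorter.

```python
import hashlib

data = b"hello fossil\n"  # arbitrary sample input (an assumption for illustration)

sha1 = hashlib.sha1(data).hexdigest()      # SHA-1: 160-bit digest
sha2 = hashlib.sha256(data).hexdigest()    # SHA-256: a member of the SHA-2 family
sha3 = hashlib.sha3_256(data).hexdigest()  # SHA3-256: a member of the SHA-3 family

# Digest lengths in hex characters for SHA-1, SHA-256, and SHA3-256
print(len(sha1), len(sha2), len(sha3))

# Same output size, but SHA-256 and SHA3-256 are unrelated algorithms
print(sha2 == sha3)
```

So "256-bit hash" alone does not say which family is in use: a 64-character hex name could come from either SHA-256 or SHA3-256.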

If I recall correctly, Richard added SHA-3 to Fossil in 2014, putting Fossil 6 years ahead of Git in moving away from SHA-1 (and years more getting to SHA-3). (So far as I can see, the process to start choosing SHA-4 has not yet started.)

(4) By Richard Hipp (drh) on 2020-02-04 16:50:53 in reply to 3

If I recall correctly, Richard added SHA-3 to Fossil in 2014...

Thank you for that vote of confidence! :-) But, no, I didn't add SHA3 until the SHAttered attack appeared in early 2017. The complete check-in history can be seen here:

(5) By anonymous on 2020-02-04 16:52:15 in reply to 2

That's appalling. Fossil's implementation doesn't require a conversion.

That was brought up in the linked discussions. Apparently, the Git people decided that not converting would create a "weird security model."

In the discussions leading up to Fossil adopting SHA-3, the thinking, at the time, was that creating a modified copy of a file that matched the original's SHA-1 hash was still impractical enough. Also, as I recall, forward propagation of post-commit changes, via delta-encoding, would be unlikely as the hashes are computed before delta-encoding, so the result, after delta-decoding, would likely not match its expected hash.
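
The delta-encoding argument can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the toy "shared prefix plus tail" delta is not Fossil's actual delta format, and the helper names (name_of, make_delta, apply_delta, verify) are hypothetical. The only point is that an artifact's name is computed from its full content before delta-encoding, so a tampered base yields decoded content that no longer matches the expected name.

```python
import hashlib

def name_of(content: bytes) -> str:
    """Artifact name: hash of the full, un-delta'd content (SHA3-256 here)."""
    return hashlib.sha3_256(content).hexdigest()

def make_delta(base: bytes, target: bytes):
    """Toy delta (not Fossil's format): shared-prefix length plus the differing tail."""
    n = 0
    while n < min(len(base), len(target)) and base[n] == target[n]:
        n += 1
    return n, target[n:]

def apply_delta(base: bytes, delta) -> bytes:
    """Rebuild the target from a base and a toy delta."""
    n, tail = delta
    return base[:n] + tail

def verify(expected_name: str, content: bytes) -> bool:
    """Check decoded content against the name recorded before delta-encoding."""
    return name_of(content) == expected_name

base = b"line one\nline two\n"
target = b"line one\nline two\nline three\n"
target_name = name_of(target)        # computed before delta-encoding
delta = make_delta(base, target)     # only the delta is stored for the target

# Honest base: decoding reproduces the target, so the name matches
ok = verify(target_name, apply_delta(base, delta))

# Base altered after the fact: decoding yields different bytes, so the check fails
tampered_base = b"LINE one\nline two\n"
bad = verify(target_name, apply_delta(tampered_base, delta))

print(ok, bad)
```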

(6) By Richard Hipp (drh) on 2020-02-04 17:03:18 in reply to 2

That's appalling. Fossil's implementation doesn't require a conversion.

This is a key point that I want to highlight. I'm sorry that it wasn't made clearer in either the LWN posting or the HN discussion.

With Fossil, to begin using the new SHA3 hash algorithm, you just upgrade your "fossil" binary. No further actions, workflow changes, disruptions, or thought are required on the part of the user.

  • Old check-ins with SHA1 hashes continue to use their SHA1 hash names.
  • New check-ins automatically get more secure SHA3 hash names.
  • No repository conversions need to occur.
  • Given a hash prefix, Fossil automatically figures out whether it is dealing with a SHA1 or a SHA3 hash
  • No human brain-cycles are wasted trying to navigate through a hash-algorithm cut-over.
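
The prefix-lookup behavior in the list above can be sketched roughly as follows. This is not Fossil's implementation; the function names (resolve, algorithm_of) and the sample artifacts are hypothetical. The sketch assumes that names from both algorithms live in one namespace and that the algorithm can be inferred from the full name's length (40 hex characters for SHA1, 64 for SHA3-256).

```python
import hashlib

def resolve(prefix: str, names) -> str:
    """Return the unique artifact name starting with prefix, whatever its algorithm."""
    prefix = prefix.lower()
    matches = [n for n in names if n.startswith(prefix)]
    if not matches:
        raise KeyError(f"no artifact matches {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"prefix {prefix!r} is ambiguous")
    return matches[0]

def algorithm_of(name: str) -> str:
    """Infer the hash algorithm from the length of the full hex name (assumption)."""
    return {40: "sha1", 64: "sha3-256"}[len(name)]

# Hypothetical repository holding one old SHA1 name and one new SHA3 name
old_name = hashlib.sha1(b"old check-in").hexdigest()
new_name = hashlib.sha3_256(b"new check-in").hexdigest()
names = {old_name, new_name}

# The user types a short prefix and never says which algorithm it belongs to
found_old = resolve(old_name[:10], names)
found_new = resolve(new_name[:10], names)
print(algorithm_of(found_old), algorithm_of(found_new))
```

Because the lookup never asks the user which algorithm a prefix belongs to, old and new names can coexist without any conversion step.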

Contrast this to Git, where a repository must be either all-SHA1 or all-SHA2. Hence, to cut-over a repository requires rebuilding the repository and in the process renaming all historical artifacts -- essentially rebasing the entire repository. The historical artifact renaming means that external links to historical check-ins (such as in tickets) are broken. And during the transition period, users have to be constantly aware of whether they are using SHA1 or SHA2 hash names. It is a big mess. It is no wonder, then, that few people have been eager to transition their repositories over to the newer SHA2 format.

(7) By Stephan Beal (stephan) on 2020-02-04 17:23:41 in reply to 6

No further actions, workflow changes, disruptions, or thought are required on the part of the user.

IIRC we had one user whose organization-internal processes expected/required SHA1 hashes, so the upgrade broke some scripts, but that sort of "outer rim" fallout was inevitable.

The level of collective disruption when git transitions will be fairly epic. Many verbose and convoluted HOWTOs will be published about it, while fossil's equivalent docs simply say "upgrade your fossil binary."

(9) By anonymous on 2020-02-04 20:43:13 in reply to 6

The historical artifact renaming means that external links to historical check-ins (such as in tickets) are broken.

I recall seeing, in one of the linked discussions, that the conversion also creates a table mapping old, SHA-1 hashes to the new, SHA-2-256 hashes. So, I would expect that old links will continue to work.
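
A rough sketch of what such a mapping could look like follows. The TranslationTable class and its method names are hypothetical, and the real on-disk format Git uses is not described in this thread. The one part that reflects actual Git behavior is that a blob's object ID is the hash of a "blob <size>" header, a NUL byte, and then the content.

```python
import hashlib

def git_blob_id(content: bytes, algo: str) -> str:
    """Hash a blob as Git does: 'blob <size>' header, a NUL byte, then the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.new(algo, header + content).hexdigest()

class TranslationTable:
    """Hypothetical in-memory, two-way map between SHA-1 and SHA-256 object IDs."""

    def __init__(self):
        self.sha1_to_sha256 = {}
        self.sha256_to_sha1 = {}

    def add(self, content: bytes):
        """Record both IDs for one blob and return them as (old, new)."""
        old = git_blob_id(content, "sha1")
        new = git_blob_id(content, "sha256")
        self.sha1_to_sha256[old] = new
        self.sha256_to_sha1[new] = old
        return old, new

    def to_new(self, old_id: str) -> str:
        return self.sha1_to_sha256[old_id]

    def to_old(self, new_id: str) -> str:
        return self.sha256_to_sha1[new_id]

table = TranslationTable()
old_id, new_id = table.add(b"hello\n")

# An old link that still carries the SHA-1 name can be redirected to the new name
print(old_id, "->", table.to_new(old_id))
```

Old links keep working only as long as whatever resolves them consults the table, which is the extra moving part that Fossil's mixed-hash approach avoids.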

(10) By anonymous on 2020-02-05 02:15:59 in reply to 2

git convert-repo --to-hash=sha-256 --frobnicate-blobs --climb-subtrees \
    --liability-waiver=none --use-shovels --carbon-offsets

missing --shatter-porcelain

Let's be fair: Git is facing a gigantic user base, and a change of this scale will be tectonic no matter which approach is taken (explicit convert or transparent dual). I guess/hope they are weighing their options. Then there are the third-party tools that rely, in differing ways, upon the output of Git (or libgit2).

Even with Fossil's easy way, some users did experience hiccups (an old binary working with a new repo).

But Git's huge advantage is that the issues might prioritize themselves due to sheer volume, and an equally large army of eager devs should be able to tackle them sooner rather than later. Good luck!

(11) By Stephan Beal (stephan) on 2020-02-05 07:28:29 in reply to 10

Let's be fair: Git is facing a gigantic user base, and a change of this scale will be tectonic no matter which approach is taken

That's absolutely true and i do not for a moment envy their scale, nor the wide-scale collateral fallout in toolchains based on git.

But Git's huge advantage is that the issues might prioritize themselves due to sheer volume, and an equally large army of eager devs should be able to tackle them sooner rather than later.

And i do wish them all the best in that effort, but suspect that this is a case of "too many chefs would spoil the brew." This is a type of problem in which Strong Opinions are voiced for/against any given algorithm and much time is spent bickering and bikeshedding. Having a single project lead, like Fossil does, who simply dives in and Gets it Done bypasses all of that because, though it's often entertaining to endlessly debate the merits and downsides of algorithms X, Y, and Z (admittedly, bikeshedding is practically my second hobby), few people are willing to argue against real, working code.

(8) By Warren Young (wyoung) on 2020-02-04 19:05:28 in reply to 1

The timing of this move is probably relevant, with the SHAmbles attack being only about a month old now. We've been predicting that SHA-1 would become increasingly unreliable since SHAttered. I'm not enough of a cryptographer to go saying SHA-1 is dead, but there's plenty of tea leaves for the likes of me to read.

For one thing, all of the major X.509 certificate producers and consumers have been recommending that SHA-1 not be used for signing TLS certs for quite some time.

Blockchain applications are quite a different thing, but it's probably only a matter of time before someone shows a way to forge a Git commit cheaply enough to be worthwhile for some projects.