Fossil & Git as "blockchains"

(1) By Warren Young (wyoung) on 2020-10-01 19:45:47 [source]

The recent HN thread on Fossil got derailed in part because of my use of the term "blockchain" to refer to Git. This is the second time this has happened to me, so despite drh's justification for the term I tried to dig into the "why" and came across this discussion on SO.

Distilled, Git is not a blockchain because:

1. There is no distributed consensus component to Git commits. If cryptocoins worked like Fossil or Git, anyone with a commit bit could forge arbitrary amounts of money. While it is true that not all blockchains are cryptocoin schemes, there's a common conflation in the concepts, a fact we have to keep in mind if our purpose is communication of ideas.
2. Because of #1, blockchains allow partial history while maintaining strong trust that a given block isn't forged; you don't have to re-verify the whole chain to be certain your small slice of it is trustworthy.

   Not only can you not prove that a given Git commit "belongs" in the tree by recalculating some element of it at your own expense in a way that proves the whole block belongs in the chain, it may not be possible to understand certain arbitrary commits without reference to other parts of the repo due to delta compression and such.

   Bitcoin's proof-of-work is just one way to do this verification. A better method for a DVCS like Git would be to require that commits be signed, backed by a trusted PKI. This is very much optional with Git. (And Fossil!)
3. Git allows history rewriting. Blockchains are usually understood to be a type of distributed ledger, which requires that a committed block can never be changed.

These problems don't apply equally to Fossil. In some cases, Fossil has a better story (e.g. distributed ledger), while in other areas, Fossil is an even worse match for the term (e.g. partial trees) than Git is.

I think we should abandon the use of this term in any documentation, at the very least where Git is involved, based on the voting in that SO answer. I think it's okay to continue to use it in Fossil-only tutorial material where the document is trying to draw analogies, but it doesn't hold up at a deep level for Fossil, either. The term is ultimately confusing concepts, and it has a whiff of marketing hype around it besides.

Another consideration is that cryptocoins are inherently political, as is anything that has considerable economic power. Do we want to be tying debates about Fossil to concepts that are likely to inspire emotional responses rather than logical ones?

Cryptocoins and the conflated concept of blockchains are also associated with a lot of bad actors. Angie: "Fossil is a blockchain." Bobby: "So, like that stuff ransomware thugs demand?"

The term "Merkle tree" is more accurate and less likely to mislead readers or cause debate derailment.

"Distributed ledger" is less of a stretch for Fossil than for Git, though with the availability of purge and shunning, I think we should limit that to tutorial material as well.

Another way to come at this problem is to ask, "What is the minimum it would take to make Fossil into a 'blockchain'?" Based on the above, I think we'd have to say:

Mandatory PGP clearsigning of commits.
Provide good answers to the PKI and distributed identity problems.
Allow partial clones.

In the limit, there should be a fourth option beside the current zip/tarball/sqlar downloads: clone a given commit ID, resulting in a self-contained repo that can make strong assertions about parentage, commit signing, etc., containing only the minimum info needed to build that single version. This would be a nice complement to reproducible builds: not only do I get the same binary you do, I have the unforgeably signed repo artifacts that produced that build.

(2) By sean (jungleboogie) on 2020-10-02 03:53:54 in reply to 1 [link] [source]

First they come after the "serverless" term drh was using for SQLite and now they're coming after "blockchain".

Ultimately, what is it that needs to be described for Fossil?

I understand the SO replies/answers and tend to agree with them and your conclusions.

In a blockchain implementation, every block is verified independently multiple times before it is added to the blockchain.

Is this where/why you're saying PGP clearsigning would be a requirement? The answer goes on to indicate that signing is possible, but only one person is required before it's committed - which, AFAIK, is similar to Fossil.

You do have a good point about crypto currencies, politics, and ransomware thugs. All those things can be conflated to mean not exactly what we're talking about or what we want to convey with this project.

(3) By Warren Young (wyoung) on 2020-10-02 04:30:04 in reply to 2 [link] [source]

It depends on what you’re willing to accept as verification or consensus.

It occurs to me that we could have a classic BFT consensus algorithm where a majority of voters is needed before any commit hits the blockchain. For 5 voters, the author and two others with voting capability would have to sign the commit before it's accepted.
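
To make the "author and two others out of five" arithmetic concrete, here is a minimal Python sketch of a simple-majority rule. The function names and the set-of-user-names representation are assumptions for illustration only; Fossil has no such feature today, and a real BFT protocol involves far more than counting signatures.

```python
def votes_needed(num_voters: int) -> int:
    """Smallest strict majority of the voter pool."""
    return num_voters // 2 + 1


def commit_accepted(signers: set, voters: set) -> bool:
    """Accept a commit only if a strict majority of eligible voters signed it.

    Signatures from anyone outside the voter pool are ignored.
    """
    valid = signers & voters
    return len(valid) >= votes_needed(len(voters))


voters = {"alice", "bob", "carol", "dave", "erin"}   # hypothetical names
# Author plus two reviewers: 3 of 5, so the commit is accepted.
print(commit_accepted({"alice", "bob", "carol"}, voters))    # True
# Author, one reviewer, and one outsider: only 2 valid signatures.
print(commit_accepted({"alice", "bob", "mallory"}, voters))  # False
```

This uses the plain majority described in the post; the threshold is the one line you would change to experiment with stricter rules.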

This would implicitly get us a code review feature, incidentally.

I’m not actually arguing for making Fossil into an incontestable blockchain system. I’m listing the requirements (as I see them) to show how far off we are, according to those closest to that tech.

(4) By Richard Hipp (drh) on 2020-10-02 11:21:19 in reply to 1 [link] [source]

My thoughts:

I do not object to deprecating the use of the term "blockchain". It is sad to lose such a colorful term, but perhaps helpful to avoid unnecessary confusion.
PGP clearsigning currently depends on an external program (PGP or GPG). If somebody can come up with a way to clearsign using APIs available in OpenSSL, that would be a step toward making signed commits the default. A method for verifying a clearsigned manifest using OpenSSL would also be helpful.
"Merkle tree", while accurate, is not a beautiful term. I would prefer something different, if possible. But if Merkle tree is the best available nomenclature, then that is what we should use.

(5) By Dan Shearer (danshearer) on 2020-10-02 14:32:22 in reply to 4 [link] [source]

Richard Hipp (drh) on 2020-10-02 11:21:19 :

"Merkle tree", while accurate, is not a beautiful term. I would prefer something different, if possible. But if Merkle tree is the best available nomenclature, then that is what we should use.

This is marketing, and just like the required Fossil response to the SHAttered attack, just because it is marketing doesn't mean it is unimportant.

Descriptions such as "merkle tree" or "hash tree" or "DAG" don't distinguish Fossil from many other systems. Instead, think of the term "wiki": I propose that Fossil is as different from other version control systems as the wiki was to all previous editable web systems. Fossil is different because of its design goals and implementation choices rather than novel algorithms (like "blockchain") or mathematics (like "modified Merkle tree").

As to what the category of Fossil-like software systems should be called, hmm... Palaeoartifactual? Hippocratic? Plain "palaeo" could work, because it is simple and clear.

Regardless of this I think significant releases of Fossil should be given dinosaur names. This is about marketing, and fun.

Dan

(8) By Stephan Beal (stephan) on 2020-10-02 15:25:14 in reply to 5 [link] [source]

Regardless of this I think significant releases of Fossil should be given dinosaur names. This is about marketing, and fun.

i really like that idea but wouldn't limit it to just dinosaurs. There are numerous examples of commonly-recognized non-dino fossils, like ferns, horseshoe crabs, and the petrified forest of Arizona.

Fossil 2.13 Triceratops

is the closest semantic match for 13 which immediately comes to mind.

(9) By Dan Shearer (danshearer) on 2020-10-02 16:19:08 in reply to 8 [link] [source]

Stephan Beal (stephan) on 2020-10-02 15:25:14 :

i really like that idea but wouldn't limit it to just dinosaurs. There are numerous examples of commonly-recognized non-dino fossils, like ferns, horseshoe crabs, and the petrified forest of Arizona.

Yes for sure, and trilobites etc.

(My favourite fossil is the opalised Addyman plesiosaur, which was picked out of its stone behind glass at my local museum over decades as I grew up.)

Dan

(11) By Warren Young (wyoung) on 2020-10-02 17:32:12 in reply to 8 [link] [source]

Triceratops is the closest semantic match for 13

What we actually want is a triskadekatops, but that doesn't exist. Nor is there a dinosaur on my chosen list that begins with "Tris".

So, I filtered for names with 13 characters and came up with 128 of them.

My current favorite is graciliraptor, from gracile.

(12) By Warren Young (wyoung) on 2020-10-02 17:36:57 in reply to 11 [link] [source]

We can keep this up for some time:

 $ ggrep -Po '^\w{14} - ' dinos | wc -l
       69
 ...15...49 choices
 ...16...30
 ...17...14
 ...18...6
 ...19...1

And you may now be asking, what name will we be forced — forced, I say! — to use for Fossil 2.19? Carcharodontosaurus. I think it means it chews through const char*, but my Latin's rusty.
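
For anyone without GNU grep handy, the same tally can be sketched in Python. This assumes, as the grep pattern above implies, that each line of the list looks like `name - description`; the sample lines and their descriptions are placeholders, not the real `dinos` file.

```python
import re
from collections import Counter


def name_length_histogram(lines):
    """Count entries by name length, for lines shaped like 'name - description'."""
    pattern = re.compile(r"^(\w+) - ")
    counts = Counter()
    for line in lines:
        match = pattern.match(line)
        if match:
            counts[len(match.group(1))] += 1
    return counts


sample = [   # placeholder stand-in for the 'dinos' file
    "graciliraptor - placeholder description",
    "carcharodontosaurus - placeholder description",
    "triceratops - placeholder description",
]
hist = name_length_histogram(sample)
print(hist[13], hist[19], hist[11])   # 1 1 1
```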

(7) By Roy Keene (rkeene) on 2020-10-02 15:23:39 in reply to 4 [link] [source]

I've posted a patch to Fossil a few times in the past that allows clearsign to use the "openssl smime" command for validation [0]. The OpenSSL API looks to be not too difficult to do for this if you look at the "openssl smime" command itself [1].

A very simple example of the "openssl smime" command: https://rkeene.org/viewer/tmp/smime-example.txt.htm

[0] https://rkeene.org/viewer/tmp/fossil-6ca400a315-add-smime-support-1rsk.diff.htm
[1] https://github.com/openssl/openssl/blob/master/apps/smime.c

(10) By Warren Young (wyoung) on 2020-10-02 17:17:17 in reply to 4 [link] [source]

It is sad to lose such a colorful term, but perhaps helpful to avoid unnecessary confusion.

As we're seeing in this very thread, even we can't agree on what "blockchain" means.

Rather than remove www/blockchain.md, I think we should fold the results of this thread into it and retitle it "Is Fossil a blockchain?" It should then summarize the broad consensus of what the term means and how that applies to Fossil.

The developer summary to the article's new title question is "Naaah, not really; not today, anyway." The middle-management summary is "Kinda yes, kinda no." The executive summary is, "What would you like the answer to be?" :)

clearsign using APIs available in OpenSSL

That's the easy part of the problem. The hard part — constituting 95%+ of the problem — is setting up a PKI.

I expect that those knowledgeable in the topic are rolling on the ground laughing now. "Oh, just set up a PKI? Aaaaahhahahhahha!"

As a non-expert, I can list several obstacles and problems we'd have to overcome:

How do we distribute the public keys? Easy (and wrong) answer: in the user table. But we purposely don't sync the user table to normal users. Do we partially reverse that decision, sending only subsets of user table data? What happens then if a previously anonymous or low-privilege user gets an Admin or Setup bit and does fossil conf sync all? Does the remote Fossil instance somehow prevent the partial user table from overwriting the real user table, and if so, how? And if not, did we just create a foot-gun?
Do we punt the TOFU problem with public key assertion trust to HTTPS or SSH, which have their own TOFU problems? In other words, if Fossil asserts that wyoung's public key is ABCD1234.... then does your Fossil just have to accept that blindly, hoping it wasn't MITMd in sync? Is there no secondary way to prove that wyoung's public key is trustworthy? This leads us to...
How do we avoid collapsing the distributed nature of Fossil to a centralized model in doing all of this? We can't follow HTTPS's CA model here. Even if someone wanted to set up a Fossil CA for distributing certified public keys for users, many Fossil users wouldn't want to use it, being the herd of cats that we are. There's PGP's web of trust, but that's fairly impractical. Do we just delegate trust along cloning lines, and give up on third-party verification?
How do we do key revocation? (The answers to this probably tell you that putting the keys in the user table is a bad idea to start with: one user will not necessarily have one keypair forever and ever.)

That'll do to be getting started with. :)

Merkle tree is not a beautiful term.

Ralph Merkle is worth honoring, but there's also "hash tree," which has the nice property that it's descriptive to one suitably versed in the arts.

(15.1) By Warren Young (wyoung) on 2020-10-06 13:03:42 edited from 15.0 in reply to 4 [link] [source]

I do not object to deprecating the use of the term "blockchain".

All right, this is ~~on trunk~~ published now.

I believe the new version of the doc reflects this thread's results reasonably well, without getting into too much of the speculative parts. As and when Fossil gets the features we've discussed here, we can modify the doc to suit.

(EDIT: Moved the commit to a branch until we can settle this debate.)

(25) By Warren Young (wyoung) on 2020-10-07 00:30:11 in reply to 15.1 [link] [source]

The second attempt at this is now up.

(26) By Warren Young (wyoung) on 2020-10-08 08:56:24 in reply to 25 [link] [source]

I've expanded it some more. You will now note that it makes its arguments on purely technical grounds. I've removed the social arguments and added more and better tech arguments.

This is ready to merge to trunk, as far as I'm concerned, but I thought that before, and criticism came in within half an hour of my doing so, three days after I thought the issue was settled. :)

Therefore, I invite further commentary, and I remain open to continued improvement to this doc before we merge it.

Incidentally, the first section is basically just an improved version of the original doc. Everything between that and the conclusion is there to show why that limited view is either inaccurate or not helpful in our overall goals.

(6) By Roy Keene (rkeene) on 2020-10-02 15:15:20 in reply to 1 [link] [source]

As a genuine blockchain developer, I would say that Fossil is a blockchain but not a Distributed Ledger.

To be a blockchain, in my view, you must be able to validate the entire preceding chain by validating a single node and that's it. It's often used to mean more than that, however.

It's often used with the implication that there's some OTHER mechanism that lets you know which node is the "latest" one (the consensus mechanism). Additionally, it's implied that this other mechanism gives you an answer such that if, in the future, you ask again and get a different "latest" node, the previous answer is among that new node's ancestors in the graph. These are the properties that make for a good distributed ledger/database.

These properties are what some people expect when talking about "blockchains", but it's possible to build useful systems out of blockchains that lack these properties and those systems are nonetheless still based on blockchains.

I spent a lot of time working with and under different consensus mechanism regimes (the mechanism by which it is decided which node is currently the latest one). Some mechanisms are more automatic, using game theory to incentivize people to come to a single answer as a group and discourage changing that answer ever. Some use a single definition that everyone agrees on ahead of time, and then some provable property that is maintained as part of the nodes to encourage converging on a stable answer. Yet others are manual -- people agree that the current definitive version is available from some trusted source, for now.

Fossil's consensus mechanism is most like the latter one, an external social one.

However, note that this mechanism does not uphold the requirements I talked about earlier with regards to the latest node being part of the ancestry. That's because it's possible to fork a Fossil repository at some point in time and have changes to both sides of the fork, then for the consensus to change from one side of the fork to the other -- losing some data. This would be bad for a distributed ledger/database but can be good for software development where the future yields new information about what should have been done in the past.
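
The ancestry requirement, and the way a fork can violate it, can be sketched in a few lines of Python. The `parents` mapping and the node names are hypothetical; this is a toy DAG, not Fossil's data model.

```python
def is_ancestor(parents: dict, older: str, newer: str) -> bool:
    """True if `older` is `newer` itself or reachable by following parent links."""
    stack, seen = [newer], set()
    while stack:
        node = stack.pop()
        if node == older:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return False


# Hypothetical history: A <- B, then a fork into C and D.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
print(is_ancestor(parents, "B", "C"))   # True: moving "latest" from B to C keeps B in the history
print(is_ancestor(parents, "C", "D"))   # False: moving "latest" from C to D leaves C behind
```

A ledger-style consensus mechanism would refuse the second move; a social one, as described above, is free to make it.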

(13) By Warren Young (wyoung) on 2020-10-02 19:07:27 in reply to 6 [link] [source]

To be a blockchain...you must be able to validate the entire preceding chain by validating a single node

Can you restate that in practical terms?

I mean, here's the manifest for Fossil artifact abcd:

   https://fossil-scm.org/fossil/raw/abcd

What can you prove about the Fossil tree it came from given only the HTTPS body content from that call? I don't see that this lets you "validate the entire preceding chain."

You can check that the MD5 manifest checksum is right, but that's almost trivial in terms of what it proves about the manifest.
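
To show how little that check proves, here is a minimal Python sketch. It assumes, as I read the Fossil file format documentation, that the trailing Z card holds the MD5 of every byte before it, up to and including the newline that precedes the `Z`. The manifest below is a toy, not a real artifact, and the sketch ignores clearsigned wrappers.

```python
import hashlib


def z_card_ok(manifest: bytes) -> bool:
    """Check a trailing 'Z <md5>' line against the bytes that precede it."""
    body, sep, tail = manifest.rpartition(b"\nZ ")
    if not sep:
        return False            # no Z card found
    covered = body + b"\n"      # everything up to and including that newline
    claimed = tail.strip().decode("ascii")
    return hashlib.md5(covered).hexdigest() == claimed


# Toy manifest, not a real Fossil artifact.
body = b"C toy-check-in\nD 2020-10-02T19:07:27\nU wyoung\n"
toy = body + b"Z " + hashlib.md5(body).hexdigest().encode() + b"\n"
print(z_card_ok(toy))                                   # True
print(z_card_ok(toy.replace(b"wyoung", b"mallory")))    # False
```

Note that the toy manifest computed its own valid checksum: anyone can do that, so the check only catches accidental corruption.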

You could bend the definition of "single node" to include all of the referenced file artifacts. So, let's go pull the first one:

  https://fossil-scm.org/fossil/raw/29c5476ae4

That artifact happens to be a legacy SHA-1 artifact, and indeed, a SHA-1 over that node's contents comes to 29c5476ae4..., but I still don't see that validates the entire preceding chain, nor does repeating this for the other F cards in the manifest.
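
Checking one artifact against its ID amounts to the following Python sketch. The helper name is made up, and the content is a stand-in rather than the real artifact; it covers only legacy SHA-1 IDs, so a newer artifact would need a different hash function.

```python
import hashlib


def matches_artifact_id(content: bytes, artifact_id: str) -> bool:
    """True if the SHA-1 of content starts with the (possibly abbreviated) ID."""
    return hashlib.sha1(content).hexdigest().startswith(artifact_id.lower())


content = b"toy file contents\n"              # stand-in, not the real artifact
full_id = hashlib.sha1(content).hexdigest()
print(matches_artifact_id(content, full_id[:10]))          # True
print(matches_artifact_id(content + b"x", full_id[:10]))   # False
```

A passing check says this blob is the one the ID names. It says nothing yet about whatever came before it in the tree.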

If we do all of this to the manifest pointed to by the P card, now we're beginning to prove something, but are we not sliding down the repo-cksum slope now?

By your definition, I would think repo-cksum would be pointless if Fossil were a true blockchain implementation. Validate one node, and you know the rest are good, too, right?

which node is the "latest" one

That gets us into the CAP theorem, and I think Fossil will always remain an AP system. In this case, it means "tip" can never be guaranteed to mean the same thing in any arbitrary pair of clones.

If "C" is necessary for something to be a blockchain or distributed ledger, then Fossil will never be one, because we're not willing to do what it takes to get to CA or CP.

Fossil's consensus mechanism is most like the latter one, an external social one.

There are two different consensuses we're talking about here:

1. "Current" in the local clones.

   This is the CAP theorem stuff, and Fossil is "eventually consistent" for all nodes that sync regularly. But because Fossil doesn't require anyone to sync, and indeed allows people to go off and never sync again, we can never achieve full consensus. I expect I can find a clone of fossil-scm.org/fossil on one of my old VMs that'll yield Fossil 1.x on "fossil up release" if I don't let it connect to the Internet, for example.

2. Whether artifact abcd1234 is legitimate.

   This is the aspirational blockchain stuff, which I believe we don't have yet.
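
The first kind of consensus can be illustrated with a toy model in Python. The `Clone` class, the commit IDs, and the integer timestamps are all hypothetical; the only point is that "tip" is a local answer until the clones sync.

```python
class Clone:
    """Toy clone: a set of commit IDs plus a purely local notion of 'tip'."""

    def __init__(self, commits):
        self.commits = dict(commits)      # commit id -> timestamp

    def tip(self):
        # Newest commit this clone happens to know about.
        return max(self.commits, key=self.commits.get)

    def sync(self, other):
        # Both sides end up with the union of what either had.
        merged = {**self.commits, **other.commits}
        self.commits, other.commits = dict(merged), dict(merged)


old_vm = Clone({"c1": 1, "c2": 2})
server = Clone({"c1": 1, "c2": 2, "c3": 3, "c4": 4})
print(old_vm.tip(), server.tip())   # c2 c4: the clones disagree until they sync
old_vm.sync(server)
print(old_vm.tip(), server.tip())   # c4 c4
```

A clone that never calls `sync` keeps answering `c2` forever, which is the old-VM case described above.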

(14) By Roy Keene (rkeene) on 2020-10-05 17:27:42 in reply to 13 [link] [source]

Can you restate that in practical terms?

I mean, here's the manifest for Fossil artifact abcd:

https://fossil-scm.org/fossil/raw/abcd

What can you prove about the Fossil tree it came from given only the HTTPS body content from that call? I don't see that this lets you "validate the entire preceding chain."

If we assert that the manifest (node) "abcd" is valid, then we can also assert that the manifest (node) "d685096f99a977909bcb8931a55d5cba2e02819c" is valid -- and also that the specific entries had specific values ([F www/webui.wiki]=9efb7f22521c203fd35e4dd1533ac203a8330a99, [F www/wikitheory.wiki]=16ff4640b1d9c9950f12a43cc1958687378204ac, etc). Since we know that "d68509...." is valid, we also know that "1e2d76ecb204347c580f1571e384f4bed5be5845" and "d5575d14dee9244c43a80c4b254ed4116012397a" are valid nodes in the graph, and we can chase this down until we complete the entire chain to the very first node.

That is, given a single node's validity we can validate the entire preceding chain of events that led there just by confirming hashes locally.
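
Here is a minimal Python sketch of that walk. It assumes a toy node format in which an optional first line `P <parent-id>` names a single parent; real Fossil manifests also carry F cards and may have several parents, so `verify_chain` and the `store` dictionary are illustrative only.

```python
import hashlib


def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def verify_chain(store: dict, tip_id: str) -> int:
    """Walk parent links from tip_id to the root, re-hashing every node.

    `store` maps claimed ID -> raw bytes. Returns the number of nodes
    verified; raises ValueError on any mismatch or missing node.
    """
    count, current = 0, tip_id
    while current is not None:
        data = store.get(current)
        if data is None or sha1(data) != current:
            raise ValueError(f"node {current} is missing or altered")
        count += 1
        first_line = data.split(b"\n", 1)[0]
        current = first_line[2:].decode() if first_line.startswith(b"P ") else None
    return count


# Build a three-node toy chain: root <- middle <- tip.
store = {}
root = b"C root\n"
store[sha1(root)] = root
middle = b"P " + sha1(root).encode() + b"\nC middle\n"
store[sha1(middle)] = middle
tip = b"P " + sha1(middle).encode() + b"\nC tip\n"
store[sha1(tip)] = tip

print(verify_chain(store, sha1(tip)))    # 3
```

Everything hinges on the starting assumption: the walk shows the ancestors are the ones the tip names, not that the tip itself deserves trust.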

(16) By jamsek on 2020-10-06 04:04:52 in reply to 14 [link] [source]

I think, as Roy explains, Fossil is a blockchain; the confusion appears
to stem from the common misapprehension when conflating Bitcoin with a
blockchain. The seminal work¹ of Haber and Stornetta (1991) explicates
fundamental features that constitute a blockchain, which are inherent in
Fossil. As Roy explains, each node (i.e., manifest) ensures the
authenticity of the values of the previous node. And each instance of a
repository tracks at least part of the chain (i.e., a series of
manifests), and collectively guarantees that history cannot be
manipulated after the fact (unlike other DVCSs), which preserves the
integrity of the entire blockchain.

¹ And the follow-up paper by Bayer, Haber, and Stornetta (1993) that
introduced the use of hash trees to improve efficiency.

(18) By Warren Young (wyoung) on 2020-10-06 14:07:24 in reply to 16 [link] [source]

Roy explains, Fossil is a blockchain

As far as I'm concerned, he only asserts that. He hasn't convinced me yet.

In fact, his own argument undermines it, according to the public consensus understanding of the term on Wikipedia: "A blockchain is a decentralized, distributed, and oftentimes public, digital ledger..." Roy also says Fossil isn't a distributed ledger, so doesn't that mean Fossil isn't a blockchain?

That part of Roy's argument I get, and my new version of blockchain.md explains this view in detail: without the ability to have a consistent view of the tip of the Fossil DAG, we indeed cannot call Fossil a distributed ledger.

If you visit your bank's web site and press reload on the browser twice, your balance is highly unlikely to change. That's what a proper accounting ledger does: it gives you the bottom-line truth.

Whereas if I visit Reddit and look at the balance of my fake Internet points, posts on threads a week dead are likely to flicker ±2 points from one reload to the next, seconds apart. People are clearly not changing their votes second-to-second on a post that old. The system's just getting different answers because my browser's being load-balanced to different sources of truth each time. Reddit uses AP-mode accounting, which isn't what I'd call a "digital ledger."

The thing is, Fossil is also an AP-mode system: you can't say "fossil info tip" and be guaranteed to get the same answer I do if I say that here.

Roy's right: Fossil is not a digital ledger. Doesn't that mean Fossil is also not a blockchain?

The seminal work of Haber and Stornetta (1991)

...speaks of timestamps, another thing Fossil doesn't guarantee.

The abstract of your referenced paper says, "...it is infeasible for a user either to back-date or to forward-date his document, even with the collusion of a time-stamping service," whereas Fossil's docs correctly point out that "Timewarps can also happen [in Fossil] due to misconfigured system clocks...they are very confusing and so best avoided."

The paper goes on to say, "What is needed is a method of time-stamping digital documents with the following two properties. First, we must find a way to time-stamp the data itself, without any reliance on the characteristics of the medium on which the data appears, so that it is impossible to change even one bit of the document without the change being apparent. Second, it should be impossible to stamp a document with a time and data different from the actual one." Fossil provides the first, but not the second.

(Incidentally, I believe the word "data" at the end of the quote is a typo in the original paper. I think they meant "date" here, but it's kind of immaterial, since again, Fossil can't provide that guarantee.)

Doesn't this prove Fossil is not a blockchain in the Haber & Stornetta sense?

As an amuse-bouche, I offer a fun quote from the 29-year-old paper: "There are practical implementations of hash functions, for example, that of Rivest [19], which seem to be reasonably secure."

Their reference 19? MD4. Oh, sweet summer child... 😂

Moving on, the paper says, "The second improvement makes use of digital signatures..."

...Which Fossil doesn't require, a matter we brought up previously in this thread. Until we have something like mandatory clear-signing of commits, Fossil is not conforming to this idea in this paper, either.

The paper's idea on use of signatures is inverted from the optional feature currently in Fossil, though. You can read their proposal at the end of page 102, but I'll recast it in terms of Fossil here. On commit, they say the remote Fossil server should apply its own timestamp to the commit and send back a digitally-signed copy of the commit hash (a.k.a. artifact ID) plus the timestamp to prove that a) it received the hash, and b) that the timestamp it applied is correct. The client then stores this hash + timestamp document in case the server later loses these proofs.

Together, this means a client can later prove that it created the commit, because only the server could have signed the confirmation reply, the timestamp can't be tampered with by the same fact, and the hash matches the content of the commit. Therefore, to the extent that we trust the server's ability to timestamp documents properly and that its signature is valid, the commit is also valid and occurred on the claimed date.
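
The shape of such a receipt can be sketched in Python. One big caveat: to stay within the standard library this uses an HMAC with a shared secret as a stand-in for the server's digital signature. A real design would use a public-key signature so that clients could verify receipts without holding the server's key. `SERVER_KEY`, `issue_receipt`, and `receipt_valid` are made-up names, and none of this exists in Fossil.

```python
import hashlib
import hmac
import json

SERVER_KEY = b"demo-secret"   # hypothetical; stands in for the server's signing key


def issue_receipt(commit_hash: str, server_time: str) -> dict:
    """Server side: bind the commit hash to the server's own timestamp."""
    payload = json.dumps({"hash": commit_hash, "time": server_time}, sort_keys=True)
    tag = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def receipt_valid(receipt: dict, commit_bytes: bytes) -> bool:
    """Check the tag, then check that the receipt names this exact commit."""
    expected = hmac.new(SERVER_KEY, receipt["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, receipt["tag"]):
        return False
    claimed = json.loads(receipt["payload"])["hash"]
    return claimed == hashlib.sha256(commit_bytes).hexdigest()


commit = b"toy commit contents\n"
receipt = issue_receipt(hashlib.sha256(commit).hexdigest(), "2020-10-06T14:07:24Z")
print(receipt_valid(receipt, commit))            # True
forged = dict(receipt, payload=receipt["payload"].replace("2020", "2019"))
print(receipt_valid(forged, commit))             # False: the timestamp was altered
```

The client would file `receipt` away next to the commit, which is the "stores this hash + timestamp document" step.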

It's a good idea, and it'd be useful to have in Fossil, but it isn't there today.

The rest of the paper is concerned with the question of whether we can trust the integrity of the remote Fossil server. Since this reply is already quite long, and I don't think this question has anything to do with what we've discussed so far, I'm going to send this as-is and reply later if further reading sheds any useful illumination on the matter.

(23) By Warren Young (wyoung) on 2020-10-06 17:35:01 in reply to 18 [link] [source]

The second half of the Haber & Stornetta (1991) paper doesn't apply well to Fossil at all.

It isn't a complete loss. The paper indirectly points out that the fact that each new Fossil commit contains the hash of its parent(s) means you can prove — in the mathematical sense — that the descendant was created afterward. In other words, we can always detect backdating in Fossil, as currently implemented.

(But only if it's bad enough to create a timewarp. If the commit rate is 1 per hour, then a backdating of half an hour can't be detected through this property of Fossil's commit hash tree. It can reliably detect a backdating to last week.)
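
The limit described in that parenthetical comes down to a single comparison, sketched here in Python with made-up times. `is_timewarp` is an illustrative name, not a Fossil function.

```python
from datetime import datetime, timedelta


def is_timewarp(parent_time: datetime, child_time: datetime) -> bool:
    """A child claiming to be older than its parent is provably misdated."""
    return child_time < parent_time


parent = datetime(2020, 10, 6, 12, 0)
honest_child = parent + timedelta(hours=1)          # one commit per hour

# Backdating by half an hour still lands after the parent: undetectable.
print(is_timewarp(parent, honest_child - timedelta(minutes=30)))   # False
# Backdating by a week lands before the parent: detected.
print(is_timewarp(parent, honest_child - timedelta(weeks=1)))      # True
```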

The paper outlines two schemes to guarantee time stamp integrity. In the first, described in §5.1:

"The [server] issues signed, sequentially numbered time-stamp certificates." You can only make this first scheme of theirs work with a single centralized server; it can't work in a Fossil-style DVCS scheme.
"...the client number ID_n..." You can squint and say the Fossil manifest's U card is this ID. Either you treat the UTF-8 string of characters as a "number", or you use that to look up a record ID in the user table. Only Setup level clones have the second option in Fossil, but let's hand-wave this aside for now...
"In the scheme just outlined, clients must keep all their certificates." Yes. Commit receipt certificates which Fossil currently doesn't issue.
"…the [server] cannot forward-date a document, because..." and then it goes on to make an argument based on the fact that future commits (to recast this paper's terms into Fossil-speak) will be coming in, and they will have timestamps that will conflict if the Fossil server certifies that its forward-dated timestamp is correct. This argument incorrectly assumes:
- The NTP configs cannot be wrong on both ends by a near-equal amount.
- Commits are arriving at a rate that the forward-dating of the commit will always be caught.
- That the sequential commit ID value exists, which we shot down in point 1 above

So, their first scheme is insupportable in the context of Fossil, because the authors' premises crumble in our world.

Then their §5.2 goes on to describe a second scheme, which is completely irrelevant to Fossil, as far as I can tell. Anyone who believes otherwise is welcome to show how you'd redesign Fossil to accommodate it.

It falls down on this key bit here: "Our client sends her request(y, ID) to each of these clients."

In Fossil-speak, they're saying that in order to cross-check that everyone is using the same notion of "time" — i.e. that all our clocks are properly NTP-synchronized — that just before making a commit, she should take the commit ID, use that as the seed into a PRNG, take that output and transform that somehow into a list of "client IDs", and then send each one of those a demand that it produce a timestamp. Because of the way the required reply is constructed in this paper, the client is allowed to make the commit only if all of those clients pass her challenge.
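
A rough Python sketch of that selection step follows. It is only my recasting, not the paper's actual construction: Python's `random` module is not a cryptographic PRNG, and the function names, the client list, and the `responds` callback are all hypothetical.

```python
import random


def pick_challenged_clients(commit_id: str, client_ids: list, k: int) -> list:
    """Derive a repeatable choice of k clients from the commit ID."""
    rng = random.Random(commit_id)        # same seed always gives the same picks
    return rng.sample(sorted(client_ids), k)


def may_commit(commit_id: str, client_ids: list, k: int, responds) -> bool:
    """Allow the commit only if every selected client answers the challenge."""
    return all(responds(c) for c in pick_challenged_clients(commit_id, client_ids, k))


clients = ["alice", "bob", "carol", "dave", "erin"]     # hypothetical committers
picks = pick_challenged_clients("abcd1234", clients, 3)
print(picks == pick_challenged_clients("abcd1234", clients, 3))      # True
# A single unreachable client among the picks blocks the commit.
print(may_commit("abcd1234", clients, 3, lambda c: c != picks[0]))   # False
```

Note the two inputs the sketch takes for granted: a complete list of committers, and a way to reach each of them on demand.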

So what this means in Fossil terms is that every committer would need a list of all other committers (which they don't have in Fossil, on purpose) and would need to be able to send those committers' Fossil instances, in real time, a challenge that must be answered before the client is allowed to commit, which they also can't do, on purpose!

Even if you take the "UTF-8" argument above (i.e. "drh" is 0x647268), normal committers have no way to contact his Fossil instance to demand that it produce a timestamp certification. And even if we did cast PII and security concerns to the wind, how do we get enough of these Fossil instances online in a way we can contact them in the first place, so that there are enough respondents to provide the paper's desired guarantees?

This paper suffers from a typically 1991 viewpoint: all Internet hosts are public-facing, so everyone can just stand up a server and let the world demand service of them. That hasn't worked since NAT firewalls became common.

This second scheme is the very antithesis of a Fossil-style DVCS.

Not that it matters for the purposes of this thread, because it's the first scheme that has the seeds of "blockchain" in it. The second scheme is presented as a pure alternative, not an extension of it.

The bottom line is that we can't use either scheme this paper presents to guarantee the timestamps on Fossil commits. The commit receipt idea is useful, but that's about it.

...And none of this proves that Fossil is a blockchain.

On to the 1993 paper now...

(24) By Warren Young (wyoung) on 2020-10-06 18:38:35 in reply to 23 [link] [source]

On to the 1993 paper now...

The paper's actually dated March 1992, but it was apparently published in 1993.

I found a way around the Springer paywall for it, so here we go. From here on, I'm using Markdown to quote from the paper, not from prior posts in this thread:

…the challenger of a time-stamp is satisfied by following the linked chain from the document in question to a time-stamp certificate that the challenger considers trustworthy.

That's a valuable insight I didn't get from the prior paper. They're pointing out that the scheme presented in §5.1 of the 1991 paper only works if you chase the links far enough.

In Fossil terms, this means you have to decide where to stop crawling back up the DAG to verify commits, which as I've been saying, can amount to the current repo-cksum feature. Resorting to that feels like a circular argument to me: we trust the tree because we trust the tree. The repo-cksum feature only provides an assurance of data integrity, not an assurance of the accuracy of the timestamps or the identities of those that committed artifacts to the tree.

However, there is a glimmer of an escape hatch here: if I crawl up the DAG until I find a commit that I can be certain is valid, perhaps because I, wyoung, committed it, then I can stop there, because by participating in the tree, I've essentially expressed some amount of trust in the path up to the root of the DAG, at least.

In other words, when I signed the contributor's agreement and mailed it to drh, requesting a commit bit, I brought with me the assumption that all the prior commits were valid. This may be an incorrect assumption, but I had to start with that to get to work. I wasn't going to go back in time and re-validate everything, right?

Therefore, if I chase a commit from one tip of the DAG down to one I committed, and I have some way to assure myself that it hasn't been tampered with somehow — which proof isn't especially easy to come up with in the standard usage of Fossil, but is at least possible to come up with — then I can stop there, since beyond that point, I'm already "in the game" in the thermodynamic sense.
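A minimal sketch of that stopping rule, assuming the DAG is available as a plain mapping from commit ID to parent IDs. The names `reaches_trusted`, `parents`, and `trusted` are hypothetical, not Fossil APIs. It only finds a path to a trust anchor; it does not re-hash any artifacts along the way.

```python
from collections import deque

def reaches_trusted(start, parents, trusted):
    """Walk from `start` toward the root, breadth-first, and stop at the
    first commit in `trusted`. Returns the path taken, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in trusted:
            return path          # stop crawling: this commit is our anchor
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None                  # ran out of ancestors without finding an anchor

# Toy DAG with made-up commit IDs: "c3" is a merge of "c2" and "m1".
parents = {
    "tip": ["c3"],
    "c3": ["c2", "m1"],
    "c2": ["c1"],
    "m1": ["c1"],
    "c1": ["root"],
    "root": [],
}
trusted = {"c1"}                 # e.g. a commit I know I made myself
path = reaches_trusted("tip", parents, trusted)
print(path)
```

With an empty `trusted` set the function returns `None`, which is J. Random User's position: there is nowhere to stop.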

(The three laws of thermodynamics are: you can't win, you can't break even, and you can't get out of the game.)

This only works for people like me who have commits in the tree, and it only works for documents we have ourselves modified.

How does J. Random User come along and do the same verification?

…suppose that N hash values are combined into one via a binary tree, and the resulting single hash value is widely publicized.

This is the paper's primary contribution to the corpus of knowledge: instead of the linear scheme of §5.1 in the prior paper, which relies on a centralized server to provide a monotonically increasing counter, this paper gives a scheme that substitutes Bitcoin-like contests with hashes going up the tree in a way kind of like Fossil, but only if you squint on both counts.
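To make the "N hash values combined into one via a binary tree" idea concrete, here is a small sketch. It assumes SHA3-256 as a stand-in hash (the paper predates it) and duplicates the last node on odd-sized levels, which is one common convention and not necessarily the paper's. The function name `merkle_root` is illustrative.

```python
import hashlib

def merkle_root(leaves):
    """Combine a list of byte strings into one hex digest via a binary tree."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [hashlib.sha3_256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # assumption: pad odd levels by repeating the last node
        level = [
            hashlib.sha3_256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()

docs = [b"doc-a", b"doc-b", b"doc-c", b"doc-d"]
root = merkle_root(docs)
tampered_root = merkle_root([b"doc-a", b"doc-X", b"doc-c", b"doc-d"])
print(root != tampered_root)
```

The single value `root` is what would be "widely publicized"; changing any one leaf changes it.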

I don't quite see how this applies to Fossil. Indeed, it looks a lot more like the Linux kernel's "dictator and lieutenants" model, where the "global winners" from the paper are the patches that make it all the way into Linus Torvalds' Git repo.

If we view a more typical Fossil project as simply a microcosm of this global-scale distributed development effort, then the root of trust I've been seeking is simply "drh didn't pluck these bad commits out of his tree, so I trust any subtree where the hashes lead back to this central tree."

Until fossil-scm.org began redirecting all HTTP requests to HTTPS (late 2014, as best as I can tell), this amounted to "because drh says so." That's not a value-less assertion, but it's nothing at all like what we expect from a chain-of-trust style blockchain.

After the "HTTPS everywhere" movement made its way to fossil-scm.org, we can boil it down to a combination of our collective trust in the TLS certificate system and the security of the *.fossil-scm.org hosts. These are centralized claims, not distributed chain-of-trust assertions.

My point is, none of this gets us any closer to a proof that Fossil is a blockchain in a useful sense.

If the pair (D,σ) was time-stamped at a time before the signature was compromised, then the pair still constitutes a valid signature.

It's good to see this sort of thinking in this paper, in light of the fact that it and its predecessor refer to MD4 as a "reasonably secure" hash function.

In the terms of Fossil today, what it means is that we were right to change the default hash scheme to SHA3-only in Fossil 2.10 but not to bother re-hashing the whole tree under SHA3, because to the extent that we trust the timestamps on the commits, we can still be reasonably certain that the earlier commits are valid, because they come from a time when SHA1 was secure.

Our premises are a bit shaky here, because as I discussed in the prior post, we don't absolutely trust the timestamps on Fossil commits. However, I can be reasonably certain that Fossil commit [abcd] occurred sometime around March 31, 2016, almost certainly within an hour of the claimed timestamp, which is more than good enough for me to trust that nobody could have been making SHA1 collisions of the hashes in that document at that time, most of a year before SHAttered.
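The date arithmetic behind that claim, as a rough sketch. It takes the public SHAttered announcement date of 2017-02-23 as the cutoff and allows a deliberately generous one-day error on the claimed timestamp. The helper `predates_break` and the `slop` parameter are illustrative names only.

```python
from datetime import date, timedelta

SHATTERED = date(2017, 2, 23)    # public announcement of the first SHA1 collision

def predates_break(claimed, break_date=SHATTERED, slop=timedelta(days=1)):
    """True if the claimed date, even with `slop` of error, falls before the break."""
    return claimed + slop < break_date

commit_date = date(2016, 3, 31)
margin = SHATTERED - commit_date
print(predates_break(commit_date), margin.days)
```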

(17) By Warren Young (wyoung) on 2020-10-06 12:53:13 in reply to 14 [link] [source]

If we assert that the manifest (node) "abcd" is valid

...then you're completely inverting my question, following the logic from that, and claiming success.

What I asked was, here's [abcd], now what can you prove from that, given no other proofs?

What makes [abcd] valid? Who said so? How can I prove that to myself?

I'm not trying to be bloody-minded here: I'm getting into the deep question of trust that's at the bottom of the whole blockchain concept. Where's my root of trust here, the axioms at the bottom of my epistemology of Fossil?

The node [abcd] claims to be written by "drh", but I assure you, I can craft a node from "drh", too, and I can push it into the "blockchain" at fossil-scm.org/fossil, so that proves nothing. What makes my forged node inherently untrustworthy, but [abcd] trustworthy? In the cryptocoin analogy, I'm telling you I'm capable of forging a coin! How would you know if I did?

The reason I'm not going to do that is that I'm a trustworthy person, but are you going to believe that just because some person on the Internet wrote that on a public forum? If such claims are the root of our trust in the Fossil blockchain, then how does cryptocurrency work at all, if both Fossil and Bitcoin are blockchains?

As for the rest of your argument, I think that just amounts to repo-cksum, which isn't "validating the entire preceding chain by validating a single node", it's just validating the entire preceding chain, period, full stop. If cryptocurrencies worked like that, you'd only know if you held a valid coin by redoing the entire proof-of-work of the history of the cryptocurrency. That's an O(n²) ecological collapse from throwing the entire energy output of the human civilization into proof-of-work, is what that is.

Clearly that's not happening, so what's the essential difference that allows cryptocurrencies to not destroy the planet, merely so those using the tech can be reasonably certain they hold valid coins?

I don't think I'm drawing a false analogy to cryptocurrencies here. There must be some fundamental difference between the way they work and the way Fossil works, and if so, I think that means we can't call Fossil a "blockchain" without causing needless confusion.

"Technically right" doesn't matter if it results in judgements like "Fossil's stupid because they don't even know what a blockchain is!" It doesn't matter if we have a different concept of "blockchain" than those we're trying to persuade, because if we can't agree on the meaning of key terms, we're failing to communicate our ideas, which means we lose in the marketplace of ideas.

(19) By Roy Keene (rkeene) on 2020-10-06 14:17:07 in reply to 17 [link] [source]

...then you're completely inverting my question, following the logic from that, and claiming success.

What I asked was, here's [abcd], now what can you prove from that, given no other proofs?

What makes [abcd] valid? Who said so? How can I prove that to myself?

I'm not trying to be bloody-minded here: I'm getting into the deep question of trust that's at the bottom of the whole blockchain concept. Where's my root of trust here, the axioms at the bottom of my epistemology of Fossil?

You'll need something other than a blockchain for this, since as I said, the only thing blockchains give you are the ability to implicitly validate the ancestor nodes in the chain once you have validated a given node. I'll repeat what I said here:

To be a blockchain, in my view, you must be able to validate the entire preceding chain by validating a single node and that's it. It's often used to mean more than that, however.

So if you haven't validated [abcd] then the blockchain tells you nothing -- if you have validated that single node, however, it tells you about the entire chain of nodes from that node to the root.

The node [abcd] claims to be written by "drh", but I assure you, I can craft a node from "drh", too, and I can push it into the "blockchain" at fossil-scm.org/fossil, so that proves nothing. What makes my forged node inherently untrustworthy, but [abcd] trustworthy? In the cryptocoin analogy, I'm telling you I'm capable of forging a coin! How would you know if I did?

This is because cryptocurrencies don't just use a blockchain, they combine it with other mechanisms, like a consensus mechanism and a digital signature scheme of some kind.

The reason I'm not going to do that is that I'm a trustworthy person, but are you going to believe that just because some person on the Internet wrote that on a public forum? If such claims are the root of our trust in the Fossil blockchain, then how does cryptocurrency work at all, if both Fossil and Bitcoin are blockchains?

Again, cryptocurrencies are more than blockchains and offer stronger guarantees than a blockchain alone would. Additionally, they offer stronger guarantees than a blockchain + digitally signing every entry in every node.

As for the rest of your argument, I think that just amounts to repo-cksum, which isn't "validating the entire preceding chain by validating a single node", it's just validating the entire preceding chain, period, full stop. If cryptocurrencies worked like that, you'd only know if you held a valid coin by redoing the entire proof-of-work of the history of the cryptocurrency. That's an O(n²) ecological collapse from throwing the entire energy output of the human civilization into proof-of-work, is what that is.

A system outside the blockchain has to validate the "latest" node; this is a property of the blockchain.

A few other things to note here:

1. Validating work from the Proof-of-Work systems is much easier (many trillions of times easier in large systems) than computing it, and nodes do validate this work (note that this is typically done per block/node, not per transaction, in PoW systems);
2. Not all cryptocurrencies use Proof-of-Work with Most-Worked-Chain-Wins semantics to decide which chain is the one to follow;
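The asymmetry in point 1 can be seen in a toy proof-of-work. This is a simplified sketch, not Bitcoin's actual scheme: it uses a single SHA-256 over the data plus a nonce and a "leading zero bits" target, and the function names are made up for illustration.

```python
import hashlib

def pow_hash(data, nonce):
    """Hash the block data together with an 8-byte nonce."""
    return hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()

def meets_target(digest, bits):
    """True if the digest starts with at least `bits` zero bits."""
    return int.from_bytes(digest, "big") >> (256 - bits) == 0

def mine(data, bits):
    """Brute-force search: try nonces until one meets the target."""
    nonce = 0
    while not meets_target(pow_hash(data, nonce), bits):
        nonce += 1
    return nonce, nonce + 1      # winning nonce and number of hashes tried

def verify(data, nonce, bits):
    """Checking someone else's work costs exactly one hash."""
    return meets_target(pow_hash(data, nonce), bits)

nonce, attempts = mine(b"block payload", 12)
print(attempts, verify(b"block payload", nonce, 12))
```

Mining at 12 bits takes a few thousand hashes on average; verifying takes one. Each extra bit doubles the expected mining cost and leaves verification unchanged.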

Clearly that's not happening, so what's the essential difference that allows cryptocurrencies to not destroy the planet, merely so those using the tech can be reasonably certain they hold valid coins?

I don't think I'm drawing a false analogy to cryptocurrencies here. There must be some fundamental difference between the way they work and the way Fossil works, and if so, I think that means we can't call Fossil a "blockchain" without causing needless confusion.

"Technically right" doesn't matter if it results in judgements like "Fossil's stupid because they don't even know what a blockchain is!" It doesn't matter if we have a different concept of "blockchain" than those we're trying to persuade, because if we can't agree on the meaning of key terms, we're failing to communicate our ideas, which means we lose in the marketplace of ideas.

Cryptocurrencies use blockchains, consensus mechanisms, and digital signatures (among others) to accomplish their goals (create atomic transactions among mutually distrusting parties) -- that does not mean that any of the parts (blockchains, consensus mechanisms, and digital signatures, among others) have all the same properties of cryptocurrencies.

I don't really care if the Fossil developers choose to refer to part of its implementation as a blockchain or not; my only note was that it still is. I never made any claims that other people don't know what a blockchain is, or whether or not choosing to advertise Fossil in this way was a good idea, but this seems to be the strawman you are arguing against for some reason.

I will say this, the properties of a blockchain are useful outside of cryptocurrencies and having a short way to explain that a system has all of those properties is a useful thing.

(21) By Warren Young (wyoung) on 2020-10-06 14:35:31 in reply to 19 [link] [source]

All of that makes sense, Roy, and I'll update the document to fold some of it in.

I suspect we'll still be failing to communicate ideas clearly if we call Fossil a "blockchain", though, because I highly doubt my misapprehension of the terms is unique.

I think what we need the document to do is to tease these things apart and then show how they apply (or not) to Fossil.

Let us assume I do an impeccable job in my edits, that my next version of the document will be clear, unambiguous, and incontestably useful. Do we now expect that we are communicating our ideas clearly by using that word? Do we expect the world to read our blockchain.md doc before making their judgement when we use that word?

One more thing:

So if you haven't validated [abcd] then the blockchain tells you nothing

How do you do that validation in Fossil, today?

(22) By Roy Keene (rkeene) on 2020-10-06 14:44:58 in reply to 21 [link] [source]

All of that makes sense, Roy, and I'll update the document to fold some of it in.

I suspect we'll still be failing to communicate ideas clearly if we call Fossil a "blockchain", though, because I highly doubt my misapprehension of the terms is unique.

I think what we need the document to do is to tease these things apart and then show how they apply (or not) to Fossil.

Let us assume I do an impeccable job in my edits, that my next version of the document will be clear, unambiguous, and incontestably useful. Do we now expect that we are communicating our ideas clearly by using that word? Do we expect the world to read our blockchain.md doc before making their judgement when we use that word?

I'm certainly no expert on communication to the masses, but if your audience is technical then I think that, if you said Fossil made use of blockchains to provide many of its properties, it would be helpful to explain the mechanism by which those properties are provided; but I would only do that when trying to explain the mechanism, not as part of some dedicated page which looks more like marketing and is thus likely to be aimed at non-technical people.

One more thing:

So if you haven't validated [abcd] then the blockchain tells you nothing

How do you do that validation in Fossil, today?

Either manually or by a combination of manual and digital signatures (since me and 10 million of my closest friends have smartcards with X.509v3 certificates; hence my patch to fossil to use S/MIME for this since it's far and away more widely used than PGP/GPG)

(20) By jamsek on 2020-10-06 14:23:05 in reply to 17 [link] [source]

I think this might be misapplying the proof-of-work requirement—a key
feature of cryptocurrency—to a blockchain. The fundamental feature of a
blockchain is that each entry references, or validates, the previous
thus ensuring the integrity of the entire chain. Sure, you have a commit
bit, so you can commit anything you like to the repository, but you
can’t go back and change the hash of a previous transaction to present
a falsified historical record.
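A toy illustration of that property, assuming each entry's hash covers its parent's hash plus its own payload. This is a generic hash chain using SHA3-256, not Fossil's manifest format, and all names are made up for the example.

```python
import hashlib

def entry_hash(parent_hash, payload):
    """Hash of an entry: covers the parent's hash and this entry's payload."""
    return hashlib.sha3_256(parent_hash.encode() + b"\n" + payload).hexdigest()

def build_chain(payloads):
    chain, parent = [], ""       # the first entry has an empty parent hash
    for payload in payloads:
        h = entry_hash(parent, payload)
        chain.append({"parent": parent, "payload": payload, "hash": h})
        parent = h
    return chain

def verify_chain(chain):
    """Recompute every hash and check each entry points at its predecessor."""
    parent = ""
    for entry in chain:
        if entry["parent"] != parent:
            return False
        if entry_hash(entry["parent"], entry["payload"]) != entry["hash"]:
            return False
        parent = entry["hash"]
    return True

chain = build_chain([b"first", b"second", b"third"])
ok_before = verify_chain(chain)
chain[0]["payload"] = b"forged"  # rewrite history without updating later hashes
ok_after = verify_chain(chain)
print(ok_before, ok_after)
```

Note what this does not show: anyone able to append can still add a new, perfectly well-formed entry at the tip. The chain protects existing history, not the trustworthiness of new entries.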

(27) By anonymous on 2020-10-10 01:32:37 in reply to 1 [link] [source]

For partial clones, I do believe it would be useful in some cases. The protocol seems to mostly support this, although if you do not already know the artifact IDs, it will be difficult to find them. To help with this, you might have some new commands (possibly as pragmas):

To request the most recent manifest (check-in) artifact.
To request all artifacts of a specific type (e.g. wiki, ticket, forum, etc), perhaps only the newest versions in some cases if that is all that is wanted.

About shunning, the ability to disable syncing of shuns (as an option on the client; the syncing of other configuration could also have an option to disable it) might help.

In the limit, there should be a fourth option beside the current zip/tarball/sqlar downloads: clone a given commit ID, resulting in a self-contained repo that can make strong assertions about parentage, commit signing, etc., containing only the minimum info needed to build that single version.

I believe this is already possible with the existing protocol, without using the new commands mentioned above (assuming that whoever committed them has signed them). The protocol already allows you to request a specific artifact; just request the manifest artifact for the given commit ID, and then request all files that belong to that commit.

However, the fossil clone and fossil sync commands have no option to request only specific artifacts, as far as I can see. This can be fixed perhaps by adding a --artifact option to request only a specific artifact, and a --checkin option to request that artifact and then to parse it as a manifest and also request any files it refers to.
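As a rough sketch of the "parse it as a manifest" step such a hypothetical --checkin option would need: the code below pulls the file name and artifact hash out of each F card. It assumes the documented card layout of `F name hash ?permissions? ?old-name?`, skips F cards with no hash, and ignores Fossil's escaping of special characters in file names. The sample manifest and its hashes are made-up placeholders, and `files_in_manifest` is not a Fossil function.

```python
# Hypothetical manifest text; the hashes are short placeholders, not real artifact IDs.
SAMPLE_MANIFEST = """\
C example
D 2020-10-10T01:32:37.000
F README.md 1111aaaa
F src/main.c 2222bbbb x
P 0000ffff
U anonymous
Z 9999eeee
"""

def files_in_manifest(text):
    """Return (filename, artifact_hash) pairs from the F cards of a manifest."""
    wanted = []
    for line in text.splitlines():
        fields = line.split(" ")
        # An F card needs at least a name and a hash to be fetchable.
        if fields[0] == "F" and len(fields) >= 3:
            wanted.append((fields[1], fields[2]))
    return wanted

files = files_in_manifest(SAMPLE_MANIFEST)
for name, artifact in files:
    print(artifact, name)
```

The resulting list is what the client would then request from the server, one artifact at a time.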

(28) By Warren Young (wyoung) on 2020-10-10 03:16:06 in reply to 27 [link] [source]

the ability to disable syncing of shuns…might help.

The shun list already doesn't sync. (Scroll to the bottom.) You have to go out of your way to pull the shun list, e.g. via "fossil conf pull shun".

The reason a clone from a repo containing shunned artifacts doesn't get those artifacts isn't because the shun list comes down on initial clone, it's because the remote repo refuses to serve them, if they haven't been rebuilt out of existence yet.

I believe [pulling a single commit] is already possible with the existing protocol

We need more than protocol support. For example, when you say "fossil ui" after opening such a repo, what then happens? If you say "fossil pull", what then? Can you bisect such a repo? If you clone such a repo and then point it at the main repo, what breaks, if anything? Etc.

(29) By anonymous on 2020-10-10 04:26:56 in reply to 28 [link] [source]

The shun list already doesn't sync.

Sorry, I made a mistake; I was confused. It says it does pull the shunning list by default (but not sync), although that can already be controlled anyways.

We need more than protocol support.

I described one thing that would be needed in the next paragraph, but you made a good point; I don't know what will break, because I have not tried it. I suppose ideally it would display "not available" if you try to access the history. Obviously many features won't work if you do not have all of the artifacts, but that ought not to prevent the features that can work from working.

Message f11daadc90 does mention the following:

... clone a given commit ID, resulting in a self-contained repo ... containing only the minimum info needed to build that single version. ...

I was just mentioning how to fix the clone, pull, and sync commands to allow it (without changing the protocol), but you made a good point that it is also necessary to ensure that it doesn't break due to some artifacts not available.
