Lessons from 13+ Years of Fossil

(1) By Gavin D. Howard (gavin) on 2020-10-02 15:23:52 [link] [source]

There is an episode of the Changelog podcast with Dr. Hipp talking about SQLite. Near the end, he mentions that he has some ideas to make a "git killer" VCS. When I emailed him asking what those ideas were, he said that most were already in Fossil.

He suggested that I read all of Fossil's documentation (which I am close to finishing) and to post in the forum to ask what lessons have been learned from 13+ years of Fossil.

So, what have you all learned from Fossil, whether as a user or developer?

(2) By Stephan Beal (stephan) on 2020-10-02 16:25:47 in reply to 1 [link] [source]

So, what have you all learned from Fossil, whether as a user or developer?

Challenge accepted!

Very brief background: i migrated to fossil from SVN (formerly CVS) before git became The Next Big Thing, in early 2008 after having discovered it (randomly tripping over a reference to it in an sqlite3 ticket or wiki page) over Christmas break of 2007. My practical experience with git is (extremely) limited to at-work projects, where i hated every minute of using it, and i am functionally illiterate when it comes to git.

The lessons/learnings which come to mind include, in no particular order:

CGI connectivity was, and still is, one of fossil's secret weapons (to the best of my knowledge, still unique among FOSS SCMs). This opens it up as self-hostable for even the cheapest of the cheap shared hosters (i.e. what i use for hosting).
Though DVCS inherently promises us the ability to work independently, in practice we simply don't do so. We still collectively like to have a central authority, an "official upstream." This is most readily visible in fossil's "autosync" setting defaulting to on.
A VCS without a timeline-style view is now unthinkable. It would be a real drag to go back to working that way.
Fossil makes both the branching and the merging absolutely painless. Branching was always a last resort for me in SVN (and almost unheard of in CVS), and projects i was on had changes simply disappear when merging more than once. Also, fossil's branching model (using tags) is downright elegant.
The ability to diff any two versions via a simple point-click in the timeline or diff -tk --from ... is a real godsend, as is the ability to copy that diff's URL and send it to anyone else. Sure, other systems can diff like that, but it always felt like more work in other systems.
Immutable history is the One True Way. The ability to modify (as opposed to amend) SCM history is Downright Wrong. Linux (i.e. git) truly needs that feature to keep its thousands-of-commits-a-week from costing terabytes of space, but it's still Downright Wrong and facepalmy, and has no place in smaller-scale projects (which is the overwhelmingly vast majority of projects on the planet).
SCM data model and storage model can and should be separated. People often assume that in fossil it's possible to change the SCM history with a handful of SQL, but that's absolutely false. Fossil's data model is 100% independent of the storage model, and fossil "could" be implemented 100% without a database. sqlite3, however, does a great deal of the heavy lifting in fossil, and a non-db-powered implementation would require probably 10-20x as much code. sqlite3 is what makes many of fossil's features feasible to implement.
C for fun and profit! Before finding fossil i had given up C in the mid-90s in favor of higher-level languages like Java, C++, and scripting languages. Fossil showed me that C can indeed be used in maintainable ways, and was single-handedly responsible for my return to C (which is again, since about 10 years now, my primary language).
Monoculture helps nobody. Fossil is the largest FOSS project i've contributed to, and we have no shortage of conflicting personalities and opinions here (who, me?!?). That's a good thing, as a single-opinion monoculture leads to poorer overall results. The learning here is to be open to others' ideas, regardless of how wildly they may differ from one's own.
Markdown isn't all that bad. To my shame, it took me far too long to warm up to markdown, solely because of its common association with my arch-nemesis git/github. We have to accept that sometimes the Bad Guys do it right, and be willing to learn from them.

There's more, but we'll stop there. If i've used/contributed to this thing for nearly 13 years without learning more than that then something has gone horribly wrong.

(5) By Gavin D. Howard (gavin) on 2020-10-02 17:42:41 in reply to 2 [link] [source]

Now I have learned a lot.

I would not have considered CGI as Fossil's secret weapon! But it makes sense because I had to do a lot of hard work to find a git hoster (Gitea) that I could run in a cheap cloud instance.

I agree that we definitely don't work independently in practice, but that begs the question: why don't we?

Why does branching and merging work better in Fossil than in SVN? I have heard a lot of people say that git and Fossil are better in this respect, but why?

For immutable history, I admit that reading Rebase Considered Harmful was a turning point in my opinion.

Thank you!

(11) By Stephan Beal (stephan) on 2020-10-02 18:36:43 in reply to 5 [link] [source]

I would not have considered CGI as Fossil's secret weapon! But it makes sense because I had to do a lot of hard work to find a git hoster (Gitea) that I could run in a cheap cloud instance.

When fossil came about, "cloud instances" weren't yet a real thing. i couldn't run SVN on my shared hoster because doing so required a standalone server running. (Though i'm a long-time Unix system administrator, i really hate having to maintain internet-facing systems, and prefer to leave root servers, and similar, to those who enjoy/have the patience for that type of thing. i'll take a shared hoster with SSH access and CGI support instead.)

For me the two absolutely killer features which brought me into fossil were CGI access and the wiki. Though the wiki is far less important to me now than it was then, CGI access is still a 100% absolute killer feature for me. CGI is ancient and perhaps quaint, but it's available everywhere and works like a champ.

I agree that we definitely don't work independently in practice, but that begs the question: why don't we?

Probably, and i'm speculating here, it's because practice has shown us collectively that the longer we work apart, the larger the drift in our code becomes, making it potentially more difficult to consolidate later on. That's a simplification, obviously, as there are many factors which play into whether or not a code conflict, or similar, is likely, but the principle is, in my mind at least, simply a fact of life in collaborative software development.

Even when working on a long-lived branch on the main fossil repo, it's common practice to merge the trunk into that branch now and then so as to minimize the chance/effects of collisions when the branch is finally merged into trunk. So, even when we're working apart (in other branches), that central reference point (trunk) helps keep us all from straying too far from the main line of development.

That said: the overall benefits of DVCS over central-server solutions (SVN, CVS) are (to me) undeniable. In practice, though, we effectively use a hybrid, in that we generally tend to use a central server as our official basis of comparison, and we check against/sync with that central authority often (insofar as possible, noting that purely offline work is certainly a real thing, it's just not all that common in collaborative projects).

Why does branching and merging work better in Fossil than in SVN? I have heard a lot of people say that git and Fossil are better in this respect, but why?

The innards and theory of branching and merging are quite above my pay grade, so i can only speak from the practical on-the-job benefits: in the last project i used SVN on (2012-13) we used several branches at a time to develop new features. We would sometimes find, after merging branches X and Y into trunk, that some portion of X which neither Y nor trunk concurrently touched would simply be missing from the resulting merge. We didn't always notice that immediately, though - we most often stumbled over it in production. When using a single branch in SVN i've never seen such weirdness - only when multiple branches are going on at once. (i'm actually quite fond of both SVN and CVS - respect their limits and they work well.)

Fossil merging has always been painless for me. Even when there are merge conflicts, they're simply a matter of sorting out which parts of a conflict are current and which are not (necessarily a human task - the software can't help, beyond marking the boundaries of the conflicts). In the context of git i've only ever heard the phrase "merge conflict" mentioned in hushed whispers, as it's supposedly quite painful to deal with them there (but i've got no experience with that, so this is all hearsay).

(15) By Gavin D. Howard (gavin) on 2020-10-02 19:07:48 in reply to 2 [link] [source]

Forgot to say that I discovered C fairly recently as well. I agree with you, though I wish that there was less undefined behavior.

(3) By Richard Hipp (drh) on 2020-10-02 17:03:02 in reply to 1 [link] [source]

I don't remember what was on my mind for the Changelog podcast. But here are some random thoughts:

Git uses a key/value database in a bespoke format (packfiles). Fossil uses a relational database (SQLite). This gives Fossil the ability to do lots of interesting things that Git cannot do, such as easily finding the descendants (children) of a check-in. But a relational database could be retrofitted into libgit2. The idea here is that the useful relational data is extracted from the Git logfile. When you want to use relational data, the library first checks to see if the logfile has been appended to since the last relational data update, and if so it reads and parses the tail of the logfile and adds it to the relational tables. Couple this with a virtual table that can read/write packfiles, and you have infrastructure for a really powerful VCS that is fully Git compatible and thus able to interoperate with recent tooling that assumes Git.
The Git/Fossil model of giving a clone of the complete project history to every developer works fine for smaller projects like SQLite or the Linux kernel. But it does not scale to larger projects. It seems like a hybrid model would work better. Cloning is possible for people who have the storage space and bandwidth to do the initial clone, but a client/server interaction similar to CVS or SVN could be used for cases where the repository was exceptionally large and cloning becomes a problem. We've talked about adding support for this to Fossil, but no work has gone forward on that yet.
Git and Fossil work well for source code files. They do not work as well for large binary artifacts such as images, sound files, and audio files. In particular, there are no good ways (at present) to do concurrent editing or merging of these kinds of binary artifacts. As far as I know, this is still an open problem. Nobody has come up with a good solution. If you solve this problem, you will likely become famous. I'm guessing that the solution here will require thinking outside the box.
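
To illustrate the first point above (finding the descendants of a check-in once the history lives in relational tables), here is a minimal sketch using Python's sqlite3 module. The `link` table, its columns, and the check-in names are hypothetical and chosen only for this example; this is not Fossil's actual schema.

```python
import sqlite3

# Hypothetical, simplified parent/child link table (NOT Fossil's real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (parent TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO link VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e"), ("x", "y")],
)


def descendants(conn, checkin):
    """Return every check-in reachable from `checkin` via child links."""
    rows = conn.execute(
        """
        WITH RECURSIVE d(id) AS (
            SELECT child FROM link WHERE parent = ?
            UNION
            SELECT link.child FROM link JOIN d ON link.parent = d.id
        )
        SELECT id FROM d ORDER BY id
        """,
        (checkin,),
    ).fetchall()
    return [row[0] for row in rows]


print(descendants(conn, "b"))
```

With a key/value store, the same question generally means walking the history from the tips backwards, because objects point at their parents rather than their children.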

(6) By Gavin D. Howard (gavin) on 2020-10-02 17:50:15 in reply to 3 [link] [source]

I agree that a database is nice for a VCS, but have you ever run into the problem of outlandishly huge Fossil files for certain repos? Is Fossil not intended to scale as far as git will? Also, what are the disadvantages of keeping the diff for each commit in packfiles but keeping all other data inside of a database? Would that scale better, or would it add more problems?

As far as versioning binary artifacts, I think I may have a way of doing that. (See the Sections "Portable Executables", "Universal File Format", and "Version Control for Binary Files".)

But even if those ideas will work, I want the VCS to be high-quality in every respect, which is why I am asking these questions. Thank you for your patience.

(8) By Stephan Beal (stephan) on 2020-10-02 18:15:41 in reply to 6 [link] [source]

I agree that a database is nice for a VCS, but have you ever run into the problem of outlandishly huge Fossil files for certain repos?

Shrug. Those people are, IMHO, using The Wrong Tool for the Job.

Seriously: the NetBSD(?) package source repo has no business using fossil, in its current form, as its SCM. But as Blade so elegantly put it: there's always someone trying to ice skate uphill.

Shrug.

Is Fossil not intended to scale as far as git will?

Fossil's reason for existence, plain and simple, is managing the source code for sqlite3. That's the scope of project it was initially envisioned for, and that size of project seems to be quite average for the FOSS world, falling clearly somewhere between "small" and "large." It does not scale well to projects with hundreds of thousands of files in any given version and/or tens of GB of data. As that's not how any of the more active developers use it, there's been little incentive for them/us to put any effort into "solving" (changing) that.

Also, what are the disadvantages of keeping the diff for each commit in packfiles but keeping all other data inside of a database? Would that scale better, or would it add more problems?

i know nothing at all about git's packfiles, so can't comment on that, other than to say that the single-file approach is not the central scalability problem: sqlite is (independently of fossil) used in some stupidly huge databases (someone on the sqlite forum a couple of days ago mentioned a 750GB db). Fossil's manifest record format, which is the central-most core of its data model, is simply not geared towards repos with tremendous numbers of files in any given version/checkin. It's perfectly fine for small/mid-sized projects (which is the majority of projects), but can slow down considerably, eventually to the point of unusability, for projects with tens/hundreds of thousands of files.

But even if those ideas will work, I want the VCS to be high-quality in every respect

For a given value of "every", fossil is high quality in "every" respect. It doesn't scale well to Linux-kernel-scope projects, but it was not designed to, so any objection to its quality for such cases is kind of like down-rating a 200ml cup because it isn't suitable for one's 300ml use case.

Aside from cases of failed storage, we're still unaware of any data loss in any fossil repositories. git, on the other hand, has lost more data for me than literally any other piece of software aside from /bin/rm. Fossil makes it exceptionally difficult to lose data (and sqlite as the storage layer is, for the most part, to thank for that), whereas git makes it inevitable.

Before fossil commits (in the SQL sense) any changes, it reads back (from within an SQL transaction) all of the data it will commit (in the SQL sense) and ensures that what it's written is 100% what it will read back. If it cannot do that, it will fail loudly. It's essentially impossible to get corrupted data into a fossil db because of the multiple self-checks it performs on every single injection of new SCM-side data (as opposed to, say, user table changes, which are not SCM-relevant and not versioned).
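
The read-back idea can be sketched roughly as follows. This is only an illustration of the pattern (write, read back inside the same transaction, commit only if the hashes agree), not fossil's actual code; the `artifact` table, the `store_verified` function, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import sqlite3


def store_verified(conn, content):
    """Insert content, read it back in the same transaction, and commit
    only if the bytes read back hash to the expected value."""
    expected = hashlib.sha256(content).hexdigest()
    try:
        conn.execute(
            "INSERT INTO artifact (hash, content) VALUES (?, ?)",
            (expected, content),
        )
        (stored,) = conn.execute(
            "SELECT content FROM artifact WHERE hash = ?", (expected,)
        ).fetchone()
        if hashlib.sha256(stored).hexdigest() != expected:
            raise ValueError("read-back mismatch, refusing to commit")
        conn.commit()
    except Exception:
        # Fail loudly and leave the database exactly as it was.
        conn.rollback()
        raise
    return expected


# Hypothetical single-table store, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifact (hash TEXT PRIMARY KEY, content BLOB)")
conn.commit()
h = store_verified(conn, b"hello fossil\n")
```

Because the check runs before the SQL commit, a failed verification costs nothing more than a rollback.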

(10) By Gavin D. Howard (gavin) on 2020-10-02 18:25:37 in reply to 8 [link] [source]

I apologize; I did not mean to make it sound like Fossil isn't high quality because I think it is. It is not proper to say something isn't high quality for a use case that it was explicitly not designed for.

(12) By Stephan Beal (stephan) on 2020-10-02 18:42:08 in reply to 10 [link] [source]

I apologize;

Not needed - i didn't intend to imply that you were complaining about fossil's 200ml capacity (but my response, upon a second reading, certainly comes across that way). i was just placing a limit on the scope of "high quality in every way."

(13) By Andreas Kupries (aku) on 2020-10-02 18:49:19 in reply to 8 [link] [source]

Is Fossil not intended to scale as far as git will?

[...] It does not scale well to projects with hundreds of thousands of files in any given version and/or tens of GB of data.

Yes, because a regular manifest of fossil is a list of all the files in the commit, i.e. paths and hashes.

That said, Fossil does support something called delta manifests. You simply have to turn them on (see below), when you need them. These are manifests which:

Refer to a regular (baseline) manifest, and then
list only the changed paths and their hashes.

A delta-compression specifically for manifests, beyond the deltas done generically at blob-level.
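
As a rough sketch of that idea: treat a baseline manifest as a mapping from path to hash, and a delta as a mapping holding only the changed paths. Using `None` to mark a deleted path is a representation chosen for this example, and `resolve_delta` is a hypothetical helper, not a fossil function.

```python
def resolve_delta(baseline, delta):
    """Rebuild the full path -> hash listing from a baseline plus a delta."""
    files = dict(baseline)  # copy so the baseline itself is left untouched
    for path, file_hash in delta.items():
        if file_hash is None:
            files.pop(path, None)  # path was removed in this check-in
        else:
            files[path] = file_hash  # path was added or changed
    return files


baseline = {"src/main.c": "aaa1", "src/util.c": "bbb2", "README": "ccc3"}
delta = {"src/util.c": "ddd4", "README": None, "src/new.c": "eee5"}
print(resolve_delta(baseline, delta))
```

The delta stays small no matter how many files the baseline lists, but anything that needs the full listing still has to load the baseline first.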

The support was added several years ago (*), at the asking of a very large project, with a heap load of files. I think it was one of the BSD's, although I do not recall if it was Free or Net.

See https://fossil-scm.org/fossil/doc/trunk/www/fileformat.wiki#manifest and the B card description just after the set of allowed cards.

In normal operation fossil creates delta manifests only if the repository already contains at least one delta manifest. Which means that delta manifests are off by default. The feature can be forced on by using the --delta option for the commit command. From that point on fossil will then generate delta manifests whenever they are worth it.

(Ad *) I found a 2013 reference, so at least seven years by now.

(16) By Stephan Beal (stephan) on 2020-10-02 20:17:31 in reply to 13 [link] [source]

That said, Fossil does support something called delta manifests

They're not a magic bullet, though. Creating a delta manifest still requires navigating through the parent version's list of files to find the differences between the parent and delta.

If the repo-cksum option (the R-card generation) is on (which it is by default) then delta manifests save only raw manifest size, not manifest creation runtime, as calculating the R-card still has to read the listed version of every single file from the repo db and hash it for the R-card. For a repo the size of the BSD one the cost of that calculation is downright prohibitive.
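
To see why a whole-tree checksum is expensive, consider this sketch of an R-card-style calculation. The byte layout used here (file name, space, size, newline, content, over all files in sorted order, hashed with MD5) is an assumption that should be checked against the file format document; `tree_checksum` is a hypothetical name.

```python
import hashlib


def tree_checksum(files):
    """Checksum over every file of a version; `files` maps name -> bytes."""
    md5 = hashlib.md5()
    for name in sorted(files):
        content = files[name]
        header = name.encode("utf-8") + b" " + str(len(content)).encode("ascii") + b"\n"
        md5.update(header)
        md5.update(content)  # every byte of every file passes through the hash
    return md5.hexdigest()


tree = {"b.txt": b"second\n", "a.txt": b"first\n"}
print(tree_checksum(tree))
```

Whatever the exact layout, the loop touches the content of every file in the version, so the cost grows with the whole tree rather than with the size of the change.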

Disabling that option removes one of fossil's 3(?) layers of repo/manifest integrity checks, but the R-card is now of arguable utility (it certainly had more value in fossil's earliest days of development). To the best of my recollection, we've never had a report of someone who was "saved" by the R-card check after something wicked managed to pass through the other layers. i.e. it's arguably completely superfluous.

In any case: the BSD pkgsrc repo's top-most full manifest (ee90d4da...) is currently 10MB, with 103100 F-cards (distinct file names). When creating a delta manifest from that (like the current trunk in that repo is), fossil still has to read in, parse, and hold in memory that 10MB baseline. If the R-card were on (it's not for that repo, as we can see in the manifests), that would require extracting 103100 different files from the repo and hashing them as part of each checkin.

The current pkgsrc trunk is a delta from that, and it's only 688 lines and 64kb. The sqlite3 manifest, by comparison, is "only" about 144kb (but deltas are not used in that repo, so that size applies to every manifest).

Sidebar: the manifest parsing is actually as memory-efficient as it gets: after reading the raw manifest blob (a single memory block), parsing it essentially goes through that memory and puts NUL characters at the ends of tokens (where spaces or newlines are in the raw copy). "Fossilized" filenames (those with escaped characters) get "defossilized," but that form is smaller than the original, so it's done inline in the original memory. Thus, though the manifest data structure is ostensibly complicated and memory-hungry, all of its dynamic memory is owned by that one memory block. Even so, a 10MB manifest is not cheap.

The feature can be forced on by using the --delta option for the commit command. From that point on fossil will then generate delta manifests whenever they are worth it.

They can also be forced off with a relatively new (this past summer) setting whose name i've forgotten. e.g. Richard specifically doesn't want them appearing in the sqlite3 repo because he provides the manifest to users as a way of verifying the contents of sqlite3 download distributions. Delta manifests make such validation next to impossible for clients. (That said, i use them in all of my repos because such validation is not an issue there.)

(17) By Andreas Kupries (aku) on 2020-10-02 20:41:37 in reply to 16 [link] [source]

Looking through the fossil documentation I have not really found anything talking about delta manifests and their pros and cons in the depth as you did above.

There are essentially only two short references in the File format docs, namely Manifest format, and in the addenda.

Would it be worth its own page, as an advanced topic ?

(19) By Stephan Beal (stephan) on 2020-10-02 21:10:41 in reply to 17 [link] [source]

Looking through the fossil documentation I have not really found anything talking about delta manifests and their pros and cons in the depth as you did above.

Back when the libfossil effort was still a thing (before RSI cut me down a few notches), implementing delta manifests was one of the porting tasks and they kind of fascinated me. (There are gaps in that memory, though, such as whether the generation of a delta actually saves time or if it's just saving manifest storage space.)

Would it be worth its own page, as an advanced topic ?

i am certainly willing to write something up "real soon now." This weekend i'm adopting a 2nd puppy, which is likely to keep me busy full-time with house-training for at least 6 weeks or so (judging by how long it took with her older sister). As time/energy allows, though, i will put something together unless someone else beats me to it.

(35) By anonymous on 2020-11-12 14:56:40 in reply to 17 [link] [source]

I agree this feature needs better documentation, as it can be rather useful for even a small repository. It slows down the rate at which the repository gobbles up database pages.

One thing I would want to know is how Fossil decides whether to use a delta or baseline manifest in a commit. It would also be helpful to adjust the wording on the commit command page to say that the --delta option actually turns the feature on and that no further action is needed from the user.

I'd also suggest adding a --delta option to fossil init so as to have the feature available from the very beginning of a new repository.

(36.1) By Stephan Beal (stephan) on 2020-11-12 15:50:51 edited from 36.0 in reply to 35 [link] [source]

One thing I would want to know is how Fossil decides whether to use a delta or baseline manifest in a commit.

When the repository "sees" (processes) a delta manifest it records a boolean config setting named seen-delta-manifest (which is an internal detail and not documented). If this setting is not set then fossil will never generate a delta manifest unless it is forced to via the --delta flag to commit (but that flag is trumped by the forbid-delta-manifests setting (see below)). The --delta flag, in turn, will cause the seen-delta-manifest setting to get turned on when the fossil internals process the manifest generated by the checkin.

When generating a checkin manifest, if the repo has the seen-delta-manifest setting then it will generally always attempt to create a delta manifest. After creating it, fossil checks how much space that actually saved, and if it's below a certain threshold (i forget exactly what, or how it figures it out) then the delta is discarded and a full (a.k.a. "baseline") manifest is used instead (computing a delta is notably more complicated, so it's not preferred if the savings is minimal).
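
The decision described above can be summarized in a small sketch. The function name, the parameter names, and the `min_saving` value are all hypothetical, since the real threshold is not stated here; the sketch also assumes that --delta skips the size check, which the description above does not confirm.

```python
def choose_manifest(baseline_size, delta_size, *, seen_delta, forbid_delta,
                    force_delta=False, min_saving=0.25):
    """Return "delta" or "baseline" for a new check-in (illustrative only)."""
    if forbid_delta:
        return "baseline"  # forbid-delta-manifests trumps --delta
    if force_delta:
        return "delta"  # assumption: --delta bypasses the savings check
    if not seen_delta:
        return "baseline"  # no delta has been seen in this repo yet
    if baseline_size <= 0:
        return "baseline"
    saving = 1 - delta_size / baseline_size
    # min_saving is a made-up threshold standing in for the real one.
    return "delta" if saving >= min_saving else "baseline"


print(choose_manifest(10_000_000, 64_000, seen_delta=True, forbid_delta=False))
```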

I'd also suggest adding a --delta option to fossil init so as to have the feature available from the very beginning of a new repository.

That sounds like a reasonable thing to do, but it has a very notable caveat described below. As a workaround until that happens, you can add the seen-delta-manifest setting to the repo's config table, but it cannot be set with the config command because it's an undocumented internal setting. Something like this should do the trick:

$ fossil sqlite -R theRepo
...
sqlite> replace into config (name,value,mtime) values ('seen-delta-manifest',1,cast(strftime('%s') as integer));
sqlite> ^D

The obligatory caveat:

As an undocumented/internal setting, that setting will not sync, and will only allow deltas to be generated on that copy of the repo. Thus if you start working with that repo locally immediately after creating it, it may/might (starting with the second checkin) generate deltas. However, if you first upload that new/empty repo to a remote, then clone it, your local clone will not have that setting and will not generate deltas until/unless it has "seen" another delta during its normal manifest processing (which includes using the --delta commit option). As of that point, it remembers having seen one and it is then free to create deltas for future commits. That doesn't mean it will create deltas, only that it may - it won't create a delta if it decides that the size savings are not worth the effort.

Sidebar: the forbid-delta-manifests setting was added because delta manifests are specifically undesired in certain repos. sqlite, for example, provides instructions explaining to clients how to confirm that the sqlite zip/tar archive they've downloaded really contains genuine/pristine/unmodified copies of those files. That confirmation requires that the downloads include the checkin's manifest, and that confirmation is only possible if the manifest is a full/baseline. Confirming the authenticity of files with a delta manifest is complicated, requiring two manifests (its baseline and the delta), and a good understanding of exactly how fossil delta manifests are constructed (an internal detail clients shouldn't have to concern themselves with). forbid-delta-manifests will not reject a delta which arrives via a remote, but it will stop a given repository from generating them on its own. Unlike seen-delta-manifest, forbid-delta-manifests is a documented, syncable option.

(37) By anonymous on 2020-11-12 15:56:04 in reply to 36.1 [link] [source]

Thanks for that. Luckily I don't need it to sync as I only use it for personal LaTeX work.

(18) By Gavin D. Howard (gavin) on 2020-10-02 20:57:41 in reply to 16 [link] [source]

From what I can tell, you are saying that the size of the Fossil file should never be a problem for scalability; instead, it is the structure of the data inside the database that stops Fossil from scaling to repos with large numbers of files and versions.

Did I read that correctly?

(20) By Stephan Beal (stephan) on 2020-10-02 21:45:12 in reply to 18 [link] [source]

From what I can tell, you are saying that the size of the Fossil file should never be a problem for scalability;

i wouldn't quite say "never," but based on the sizes of dbs people often mention on the sqlite3 forum, the db size itself does not seem to be a scalability limitation for any "remotely reasonable" repo size (maybe tens of GB, just to pick a number out of thin air). Certain operations are certainly slowed down, perhaps considerably, by large db size, e.g. a vacuum (which is part of the rebuild process, IIRC), but day-to-day ops on the database don't depend so much on the amount of physical drive space allocated to the repo db.

instead, it is the structure of the data inside the database that stops Fossil from scaling to repos with large numbers of files and versions.

The raw record format used to record checkins is where the current scaling pains seem to be centered. The database schema around that is all just normalized/cached reformulations of that information, and the entire schema, except for the blob table, can essentially be changed at will to optimize things as needed. What can't simply be changed, while retaining backwards compatibility, is the raw manifest format. The manifests are stored as-is in the blob table, noting that the data model does not specify any specific storage type and is not itself dependent on a database - sqlite3 is an implementation detail for fossil's implementation of that storage-agnostic data model. (It's an important one, though: many of fossil's features would not be nearly as feasible without a relational db.)

To give an indication of how important that backwards compatibility is to the project, here's the opening line from the relevant documentation:

https://fossil-scm.org/fossil/doc/trunk/www/fileformat.wiki

The global state of a fossil repository is kept simple so that it can endure in useful form for decades or centuries. A fossil repository is intended to be readable, searchable, and extensible by people not yet born.

And that is very much the case, but it has inherent scalability limits. Specifically, certain computational costs grow effectively linearly on the number of files in a given version and/or the aggregate size of those files, and that design never foresaw (or ignored - not sure which) pathological cases like 100k files in a single checkin and 20GB+ of file content.

Keep in mind that fossil's initial goal was literally to manage the sqlite3 source tree, and the design's limitations are well within any growth that particular project will ever see. It wasn't until 3rd parties picked up on fossil and started trying to (mis?)apply it to massive trees that any scaling issues came to light. By that point, the core architecture was well-established and could not be readily modified to address it. Delta manifests were, IIRC, initially added for the sake of such behemoth projects as the BSD pkgsrc repo (the largest single repo we know of), but they really only address one aspect of the cost of large repos: the file size of the checkin manifest. Without delta manifests, checking in a 1-line change on the pkgsrc repo would result in a 10MB manifest as part of that change. Deltas cut that particular cost down considerably but don't, unless i'm sorely misremembering the details, save much (if any) related computational costs.

If the performance limitations of the checkin manifest were somehow solved (presumably via a new architecture/structure), we might (might) discover pains involving the db size, but that's purely conservative speculation with no concrete grounds for believing it would be the case. Lots of people get by just fine with truly massive sqlite3 db files in non-fossil contexts.

(21) By Gavin D. Howard (gavin) on 2020-10-02 23:05:08 in reply to 20 [link] [source]

Thank you for the details!

(22) By John Rouillard (rouilj) on 2020-10-03 01:37:12 in reply to 20 [link] [source]

If a delta manifest is going to be generated why are checksums for all files in the manifest recalculated? It seems that only the checksums for the new/changed files would need recalculation.

Even if you are not using delta manifests, it seems you should be able to generate a new manifest by extracting the manifest for the parent version and change the entries for the new/updated/deleted files in the current checkin. This allows you to skip calculating checksums etc. for all files that are not changing by reusing the previous manifest's values for the files remaining the same.

(23) By Stephan Beal (stephan) on 2020-10-03 04:42:35 in reply to 22 [link] [source]

If a delta manifest is going to be generated why are checksums for all files in the manifest recalculated? It seems that only the checksums for the new/changed files would need recalculation.

That's simply not how the R-card checksum is defined, and delta manifests didn't exist for the first 5-ish years, so they didn't play a role in the definition. The checksum, like the manifest itself, does not know which files changed. The manifest simply records the list of files and (only in the special case of deltas) deletions. The only way to determine which files changed between versions is to compare their list of files (their manifests).

The file format definition, which foresees clients beyond the fossil binary, does not require that any given client actually know what changed between versions. It requires, instead, that they provide a complete list of their version's contents.
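To make the distinction concrete, here is a hedged Python sketch contrasting a checksum defined over the whole tree with per-file hashes that can be carried over from a parent. The byte layout fed to MD5, the function names, and the sample files are assumptions for illustration only, not the actual R-card specification.

```python
import hashlib
from typing import Dict


def whole_tree_checksum(files: Dict[str, bytes]) -> str:
    """One checksum over every file's name, size and content, in sorted
    order. The exact byte layout is an assumption for illustration."""
    md5 = hashlib.md5()
    for name in sorted(files):
        content = files[name]
        md5.update(f"{name} {len(content)}\n".encode("utf-8"))
        md5.update(content)              # every file's bytes are read
    return md5.hexdigest()


def reuse_file_hashes(parent_hashes: Dict[str, str],
                      changed: Dict[str, bytes]) -> Dict[str, str]:
    """Per-file hashes can be reused: only changed files are rehashed."""
    hashes = dict(parent_hashes)
    for name, content in changed.items():
        hashes[name] = hashlib.sha1(content).hexdigest()
    return hashes


# Hypothetical two-file tree where only b.c changes.
parent_files = {"a.c": b"int a;\n", "b.c": b"int b;\n"}
child_files = dict(parent_files)
child_files["b.c"] = b"int b = 1;\n"

parent_hashes = {n: hashlib.sha1(c).hexdigest() for n, c in parent_files.items()}
child_hashes = reuse_file_hashes(parent_hashes, {"b.c": child_files["b.c"]})

parent_sum = whole_tree_checksum(parent_files)
child_sum = whole_tree_checksum(child_files)
```

The per-file table for the child is built by touching only `b.c`, while the whole-tree checksum has no shortcut in this model: its input is the concatenation of everything, so a change anywhere means hashing everything again.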

(4) By Marcelo Huerta (richieadler) on 2020-10-02 17:17:04 in reply to 1 [link] [source]

I'm a Fossil user for my own projects only, but these are my $.02 of experience:

Small is beautiful

Having all tools with a single executable file is wonderful. You run it as a server over a "museum" directory of repo files: instant visual accessibility of all my projects in one place.
... especially if it's cross platform

Compiles in any unixoid and in Windows. Easily runnable as a service or daemon in most of them. Fantastic.
Love the timeline

Having the graph of the whole situation at any given time is invaluable for getting a clear picture. It gives me a sense of the whole project, and I don't need to install additional software: my normal browser suffices. As Stephan mentioned, it is painful not to have this in other systems.
Embrace the full history

I enjoy having the full history available, without having squashed any data. If I want to hide some intermediate steps I can always hide specific commits with a flag, but it's available if I need it back.
I find merges in Fossil visually and conceptually cleaner

(7) By Gavin D. Howard (gavin) on 2020-10-02 17:52:10 in reply to 4 [link] [source]

Thank you!

Can you explain why merges in Fossil are cleaner for you? As background, git is (besides some playing around with Fossil and wishing I had the time to convert personal repos to it) the only VCS I am able to use, so my vision is colored by git.

(9) By Marcelo Huerta (richieadler) on 2020-10-02 18:18:07 in reply to 7 [link] [source]

Can you explain why merges in Fossil are cleaner for you?

From what I saw in Git, after merging a branch B into another branch A, all commits from B now appear to be shown as part of A also (or at least that's what the tooling shows me). In Fossil, AFAIK, the merge with A happens with the latest node of branch B; branch A is not "fast-forwarded" by incorporating all commits from B into A.

(Can any knowledgeable person confirm if my understanding of Fossil [and Git, for that matter] is correct?)

(24) By Thomas Hess (luziferius) on 2020-10-03 12:13:12 in reply to 9 [link] [source]

Git performs an automatic choice, unless told otherwise.

About your question: TL/DR: “It depends.”

If you do a 'feature branch' style of development as the sole developer, you frequently run into this.
By git default: If there are commits on both sides of the branch, git will perform a merge commit.
If there are commits on only one side (you didn’t commit to trunk while working on the feature branch),
git will take the 'less work involved' path and 'fast-forward' trunk: it moves the trunk pointer to the tip of the feature branch and does not create a merge commit.
If you disregard branch names for a bit, this is sensible, as there is no real branching in the DAG in this case, if you were to draw it.

You can disable this automatic decision by using git merge --no-ff (or disable it via a per-repository setting or via a user-global setting)

If you disabled fast-forward merges, git will create an empty merge commit. It will still throw away the branch names, if you clean up the open branches in the repository.
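The automatic choice described above can be sketched in a few lines of Python. This is a toy model of the decision only, assuming a commit graph stored as a dictionary of parent links; the function names and the `allow_ff` parameter (standing in for the effect of --no-ff) are made up for the example and are not git internals.

```python
from typing import Dict, List

# Toy commit graph (assumption): commit id -> list of parent ids.
Graph = Dict[str, List[str]]


def is_ancestor(graph: Graph, ancestor: str, commit: str) -> bool:
    """True if `ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, []))
    return False


def merge_strategy(graph: Graph, target_tip: str, source_tip: str,
                   allow_ff: bool = True) -> str:
    """Decide how merging source_tip into target_tip would proceed."""
    if is_ancestor(graph, source_tip, target_tip):
        return "already up to date"      # nothing new on the source side
    if allow_ff and is_ancestor(graph, target_tip, source_tip):
        return "fast-forward"            # commits on one side only
    return "merge commit"                # both sides diverged, or ff disabled


# A is the fork point; B and C are on the feature branch, D is on trunk.
graph = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"]}

only_feature = merge_strategy(graph, "A", "C")
both_sides = merge_strategy(graph, "D", "C")
no_ff = merge_strategy(graph, "A", "C", allow_ff=False)
```

With trunk still at A the sketch reports a fast-forward; once trunk has its own commit D, or when fast-forwarding is disallowed, it reports a merge commit.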

(14.1) By sean (jungleboogie) on 2020-10-02 18:57:24 edited from 14.0 in reply to 4 [link] [source]

I agree with Marcelo's 1 and 2 statements - a single binary, cross platform file. I have git installed on my Windows machine and I'm not entirely sure what it did, but I know MinGW is required to use git. With Fossil, I know exactly where the file is and can replace it anytime I want.

It's news to me, but there are more than a couple dozen Windows GUIs for git, and I'm almost certain they're more complicated than the Fossil UI that you run in the web browser.

I think the greatest selling point for Fossil is the ability to easily set it up as a stand alone server for the ability to share your code/project with someone on the same LAN or somewhere over the internet without having to rely on a third party.

(25.1) By MBL (RoboManni) on 2020-10-05 06:22:20 edited from 25.0 in reply to 1 [link] [source]

I made my oldest, initially empty checkin on 2014-08-22, when I started collecting version snapshots from even earlier times so that I could compare them with each other.

Even with such an old repository I was still able to use it, though I did not test to what extent this still holds... I am still using the same repository today, rebuilt for the latest fossil executable publicly available for Windows.

I like its reliability and backwards compatibility. Its features cover at least the minimum needed, and in some respects more.

Especially the timeline and the possibilities to walk and focus and compare from there are worth enough to use fossil as a viewer tool.

There is so much to tell, but I am short on time and many others have already said most of it.

After a period of trying it with small projects, I started my first bigger projects with fossil around 2015, at the time of SQLite3 version 3.8.10, and with

E:\backup>fossil version
This is fossil version 1.33 [9c65b5432e] 2015-05-23 11:11:31 UTC

I realized that the latest 2.13 Windows 64-bit version is not able to show the timeline of those old repositories (ui), but the Windows 32-bit version is. Some unexpected difference!

(27) By Gavin D. Howard (gavin) on 2020-10-04 22:02:21 in reply to 25.0 [link] [source]

I never considered backwards compatibility. Thank you.

(26) By kak (kkugler) on 2020-10-04 21:31:51 in reply to 1 [link] [source]

Unfortunately I'm being forced into git at work. Compared to git:

Fossil pros:

Small, performant self-contained executable. git-for-windows has to include a MinGW instance(?), takes up several hundred megabytes on a system and is horribly slow. Git on Linux is performant but can have some dependency issues.
Same executable for client/server.
Timeline view. I cannot imagine navigating through any scm without it. I have to use additional tools on top of git to get something that is comparable to what is built in to Fossil.
Autosync! I absolutely hate trying to keep local git repos in sync.
Branches as propagating tags.
Closed tags. This allows you to keep a record of all previous branches without having to sift through all of them to see what is currently being worked on. The way you handle this in git is by deleting the branch pointer, which makes it much harder to quickly navigate the repo history to look for when a feature was added or modified.

Fossil cons:

The only one that really stands out to me is integration with other tools. Many tools are designed with integration for git. This is directly reflective of market share, with git being by far the most popular scm (This is also the reason I'm forced into using it at work)

(28) By Gavin D. Howard (gavin) on 2020-10-04 22:03:29 in reply to 26 [link] [source]

I forgot about integration; I do all of my git/fossil work on the command-line, so it didn't occur to me. Thank you.

(29) By BitsVital (bitsvital) on 2020-10-04 22:10:35 in reply to 26 [link] [source]

That was one of the reasons why I hesitated so long to move to Fossil. But I'm mainly a Python developer. I use PyCharm 97% of the time. For anyone else who is a Python developer and uses the PyCharm IDE, there is a free plugin that integrates with Fossil. It's some years old but it works flawlessly for me. https://plugins.jetbrains.com/plugin/7479-fossil-integration

(30) By kak (kkugler) on 2020-10-04 23:18:49 in reply to 29 [link] [source]

That's good to know about the pycharm plugin. I use multiple languages and use vscode almost exclusively so I wrote a fossil plugin for vscode.

I find that the integration problem is only a problem when it comes to developing with different departments in my organization. Since git has something like >90% share, there is >90% chance that git is what the other department is using, and as of yet I have never had a seamless bidirectional syncing between fossil <-> git (mostly human error). It ends up being easier for the few developers in my office to switch to git than for all the other developers to switch to Fossil, which is a bummer.

I have often wondered why Fossil never made significant market penetration. I suppose it is naive to think that if SQLite is the most popular db engine, then Fossil would become the most popular scm tool.

(31) By BitsVital (bitsvital) on 2020-10-04 23:30:40 in reply to 30 [link] [source]

First, that's awesome, and thank you for the VSCode plugin you made.

Second, I'm just as confused on why Fossil hasn't penetrated the market either. But, I'm working on pushing on several tangents.

One, I'm creating how-to videos on YouTube that are extremely easy to understand and get right to the point (I hate having to watch 20-minute videos when there are just 2 minutes of what I need).

Two, I've been working with the developer of the app Working Copy on the Apple App Store https://apps.apple.com/us/app/working-copy-git-client/id896694807 It's a git client and it's pretty awesome. I love it. I can grab my iPad and, with the apps Pythonista and Working Copy, I can program while traveling. I've been talking to the developer, trying to convince him to add Fossil. Fingers crossed, I think I'm almost there.

I don't think a lot of people are aware of Fossil. I wasn't. I found it because I wanted to host my own repositories on my own server. I did a search for open source alternatives to GitHub and that's how I found it.

After finding it and trying it I was pretty excited and asked some of my colleagues, and they were like me. They had never heard of it.

(32) By Thomas Hess (luziferius) on 2020-10-07 14:37:28 in reply to 29 [link] [source]

The plugin is marked as compatible with versions 2.0.4 — 2019.3.4.
That seems to be the reason, why the latest versions show no result when
searching for 'fossil' in the plugin repository.

You’ll have to manually download it from the website and then install it
from disk.

(33) By Dan Shearer (danshearer) on 2020-10-14 12:18:26 in reply to 1 [source]

Gavin D. Howard (gavin) on 2020-10-02 15:23:52 :

More like a few weeks for me, rather than 13 years :-)

So, what have you all learned from Fossil, whether as a user or developer?

I have learned a lot:

Fossil and SQLite are symbiotic projects (Stephan Beal's good words). I'm delving into SQLite internals and that comes close to implying knowing about Fossil if not using it. Not least because Fossil and SQLite are (unsurprisingly) significant use cases for each other. I didn't expect this to be the case.
Fossil has a well-designed commandline UI. Git famously doesn't, which also means some unlearning of git habits is required when using Fossil. I think the plausible nonsense git man page linked to by the SQLite explanatory page on Fossil is a fair statement of how bad the interface is. I've made some silly mistakes with Fossil and I'm doing my first large merge right now, but the learning curve has been much flatter. My silly mistakes are more likely to be in public because of autosync and the attitude of "give them a commit bit", but less likely to be catastrophic.
I've learned that Fossil shifts the conversation from git to github. A discussion about DVCSs shouldn't be about Fossil vs Git, because they aren't directly comparable. It is true that git includes a GUI in the source tree that some people use, but the user interface for git is on average github (or, a long distance behind, gitlab.) That's a different discussion from what I've learned about Fossil and I will try to summarise that in another thread that has been started about what it would take for Fossil to challenge github.
Fossil has fewer developers than git but a much quicker response time for fixes and features. That is partly community attitude but also the nature of the source tree. Relative to git, Fossil is small, concise, and uses SQL to reduce the amount of code required while git is large and confusing. So I've learned that if my project needs something changed about Fossil and I want to get involved, there's a fair chance that change will happen. I've seen that happen with a bug report I found already. And when I noticed some errors in the help system, it took me minutes with the Fossil source to discover how to fix the problem.
Fossil encourages a development model by design and by ethos that is a good match for the way groups of people like to work, with much-reduced forking and merging. Github has about 78 million repositories, and my impression from the "advanced search" queries I've attempted on the topic and just general experience is that very few of these repos are maintained in a manner that uses the 'D' in DVCS.
Fossil is accessible, because the Fossil design is compatible with accessibility. I'll include Github accessibility issues that are major blockers for developers in the other thread, but in brief: the current design of Fossil, including the goal that all GUI features should have a commandline equivalent, makes Fossil much much more accessible. Accessibility here does not only include the obvious nature of web pages, but also the ability to connect to non-standard input and output devices. Accessibility goes hand in hand with automation, because a processing pipeline can't use a webpage either. There will be some pressure on Fossil accessibility as Javascript use is extended and improved, but clearly the Fossil design is accessible by default, despite having at least one visual feature as good as any in its class: the web timeline.
Fossil is likely to become a tool of choice for me for project documentation. I haven't quite figured out how the intersection of Fossil's markdown and Pandoc toolchains should work. This will hinge on how metadata in Fossil markdown develops, and that's another discussion. It isn't clear to me how much of the functionality of the bespoke SQLite documentation tree will be replaced by features in Fossil, but since the projects are symbiotic I expect there will be some relationship.
I have found that Fossil is close to being a handy personal productivity tool. I can quickly raise a ticket for myself on any topic or dash off a wiki note. It's still a bit rough for that purpose but I can see the potential.

(34) By jshoyer on 2020-10-29 16:13:25 in reply to 33 [link] [source]

The (endo)symbiosis of SQLite and Fossil is indeed interesting, as illustrated by the recent recursive common table expression [5631123d66 | enhancements]. Fossil has enough documentation features now that it is just as apt to describe it as a self-contained multitool as a source code manager.

That episode of the Changelog (201) was recorded and released 4.5 years ago. They release transcripts with all their shows. Fossil is discussed at two places in the episode. The first covered dogfooding, command-line tools and such, while the second covered the comparison to Git.

(38) By MBL (RoboManni) on 2020-11-28 10:45:56 in reply to 33 [link] [source]

the current design of Fossil including the goal that all GUI features should have a commandline equivalent makes Fossil much much more accessible.

The web-based help list shows Ticket search and Wiki search, but there is nothing about Content search or, more specifically, nothing about finding a file checkin by searching its content.

To me it looks like web-enabled content searching is not well supported and does not (yet) offer an equivalent of the command-line fossil grep command.

OR: Where can I find the Web-enabled equivalent for the fossil grep command? And where are the possible options described?

(39) By Kees Nuyt (knu) on 2020-11-28 12:55:57 in reply to 38 [link] [source]

Enable Search and what can be searched in menu: Admin / Search, that is /srchsetup

That page also describes the effects of the settings.

(40) By MBL (RoboManni) on 2020-11-28 16:44:22 in reply to 39 [link] [source]

Thanks for the hints. In principle it works this way; however, I can find something only when using the command line, like

fossil grep 50312008 * --no-messages

but I cannot find it by using the Search in the web interface, even after having all options checked and the selector set to "All" or "Docs". The web interface is just not as capable as the command line regarding search strength and options (e.g. the possibility to focus the search on file patterns).