An Introduction to the Fossil Data Model
Upon hearing that Fossil is based on sqlite, it's natural for people unfamiliar with its internals to assume that Fossil stores its SCM-relevant data in a database-friendly way and that the SCM history can be modified via SQL. The truth, however, is far stranger than that.
This document introduces, at a relatively high level:
1. The underlying enduring and immutable data format, which is independent of any specific storage engine.
2. The blob table: Fossil's single point of SCM-relevant data storage.
3. The transformation of (1) from its immutable raw form to a transient database-friendly form.
4. Some of the consequences of this model.
Part 1: Artifacts
[Figure: artifacts and client files are stored in the blob table; the crosslink process derives the auxiliary tables from it. The "Artifacts" element is highlighted.]
The centerpiece of Fossil's architecture is a data format which describes what we call "artifacts." Each artifact represents the state of one atomic unit of SCM-relevant data, such as a single checkin, a single wiki page edit, a single modification to a ticket, creation or cancellation of tags, and similar SCM constructs. In the cases of checkins and ticket updates, an artifact may record changes to multiple files or ticket fields, respectively, but the change as a whole is atomic. Though we often refer to both fossil-specific SCM data and client-side content as artifacts, this document uses the term artifact solely for the former purpose.
From the data format's main documentation:
The global state of a fossil repository is kept simple so that it can endure in useful form for decades or centuries. A fossil repository is intended to be readable, searchable, and extensible by people not yet born.
This format has the following major properties:
It is syntactically simple, easily and efficiently parsable in any programming language. It is also entirely human-readable.
It is immutable. An artifact is identified by its unique hash value. Any modification to an artifact changes that hash, thereby changing its identity.
It is not generic. It is custom-made for its purpose and makes no attempt at providing a generic format. It contains only what it needs to function, with zero bloat.
It holds all SCM-relevant data except for client-level file content, the latter instead being referenced by their unique hash values. Storage of the client-side content is an implementation detail delegated to higher-level applications.
Auditability. By following the hash references in artifacts it is possible to unambiguously trace the origin of any modification to the SCM state. Combined with higher-level tools (specifically, Fossil's database), this audit trail can easily be traced both backwards and forwards in time, using any given version in the SCM history as a starting point.
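The immutability property can be illustrated with a minimal Python sketch. It assumes SHA3-256 as the hash function (this document later mentions that Fossil moved from SHA1 to SHA3-256), and the helper name artifact_id is hypothetical, not part of Fossil.

```python
import hashlib

def artifact_id(raw: bytes) -> str:
    """Return the hex hash that serves as the content's identity (assumed SHA3-256)."""
    return hashlib.sha3_256(raw).hexdigest()

original = b"D 2007-07-30T13:01:08\nU drh\n"
modified = b"D 2007-07-30T13:01:09\nU drh\n"  # a single character differs

id_original = artifact_id(original)
id_modified = artifact_id(modified)

# The same bytes always map to the same identity...
same_again = artifact_id(original) == id_original
# ...and any modification produces a different identity.
changed = id_original != id_modified
```

Because the identity is derived from the bytes themselves, "editing" an artifact is indistinguishable from creating a different one.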
Notably, the artifact file format does not...
Specify any specific storage mechanism for the SCM's raw bytes, which includes both artifacts themselves and client-side file content. The file format refers to all such content solely by its unique hash value.
Specify any optimizations such as storing file-level changes as deltas between two versions of that content.
Such aspects are all considered to be implementation details of higher-level applications (be they in the main fossil binary or a hypothetical 3rd-party application), and have no effect on the underlying artifact data model. That said, in Fossil:
All raw byte content (artifacts and client files) is stored in the blob database table.
Fossil uses delta and zlib compression to keep the storage size of changes from one version of a piece of content to the next to a minimum.
Sidebar: SCM-relevant vs Non-SCM-relevant State
Certain data in Fossil are "SCM-relevant" and certain data are not. In short, SCM-relevant data are managed in a way consistent with controlled versioning of that data. Conversely, non-SCM-relevant data are essentially any state neither specified by nor unambiguously referenced by the artifact file format and are therefore not versioned.
SCM-relevant state includes:
Any and all data stored in the bodies of artifacts. This includes, but is not limited to: wiki/ticket/forum content, tags, file names and Fossil-side permissions, the name of each user who introduces any given artifact into the data store, the timestamp of each such change, the inheritance tree of checkins, and many other pieces of metadata.
Raw file content of versioned files. These data are external to artifacts, which refer to them by their hashes. How they are stored is not the concern of the data model, but (spoiler alert!) Fossil stores them in an sqlite database, one record per distinct hash, in its blob table (which we will cover more very soon).
Non-SCM-relevant state includes:
Fossil's list of users and their metadata (permissions, email address, etc.). Artifacts themselves reference users only by their user names. Artifacts neither care whether, nor guarantee that, user "drh" in one artifact is in fact the same "drh" referenced in another artifact.
All Fossil UI configuration, e.g. the site's skin, config settings, and project name.
In short, any tables in a Fossil repository file except for the blob table. Most, but not all, of these tables are transient caches for the data specified by the artifact files (which are stored in the blob table), and can safely be destroyed and rebuilt from the collection of artifacts with no loss of state to the repository. All of them, except for blob and delta, can be destroyed with no loss of SCM-relevant data.
Terminology Hair-splitting: Manifest vs. Artifact
We sometimes refer to artifacts as "manifests," which is technically a term for artifacts which record checkins. The various other artifact types are arguably not "manifests," but are sometimes referred to as such because the internal APIs use that term.
A Very Basic Example
The following artifact, truncated for brevity, represents a typical checkin artifact (a.k.a. a manifest):
C Bug\sfix\sin\sthe\slocal\sdatabase\sfinder.
D 2007-07-30T13:01:08
F src/VERSION 24bbb3aad63325ff33c56d777007d7cd63dc19ea
F src/add.c 1a5dfcdbfd24c65fa04da865b2e21486d075e154
F src/blob.c 8ec1e279a6cd0cfd5f1e3f8a39f2e9a1682e0113
<SNIP>
F www/selfcheck.html 849df9860df602dc2c55163d658c6b138213122f
P 01e7596a984e2cd2bc12abc0a741415b902cbeea
R 74a0432d81b956bfc3ff5a1a2bb46eb5
U drh
Z c9dcc06ecead312b1c310711cb360bc3
Each line is a single data record called a "card." The first letter of
each line tells us the type of data stored on that line and the
following space-separated tokens contain the data for that
line. Tokens which themselves contain spaces (notably the checkin
comment) have those escaped as \s. The raw text of wiki
pages/comments, forum posts, and ticket bodies/comments is stored
directly in the corresponding artifact, but is stored in a way which
makes such escaping unnecessary.
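The card structure described above can be sketched in a few lines of Python. This is a rough illustration, not Fossil's parser: the function name parse_cards is hypothetical, only the \s escape mentioned above is decoded, and no validation is performed.

```python
def parse_cards(text: str):
    """Split artifact text into (card_type, tokens) pairs, one per line."""
    cards = []
    for line in text.splitlines():
        if not line:
            continue
        # The first letter is the card type; the rest is space-separated data.
        card_type, _, rest = line.partition(" ")
        # Inside a token, "\s" stands for a literal space.
        tokens = [tok.replace("\\s", " ") for tok in rest.split(" ")]
        cards.append((card_type, tokens))
    return cards

sample = (
    "C Bug\\sfix\\sin\\sthe\\slocal\\sdatabase\\sfinder.\n"
    "D 2007-07-30T13:01:08\n"
    "F src/VERSION 24bbb3aad63325ff33c56d777007d7cd63dc19ea\n"
    "U drh\n"
)
cards = parse_cards(sample)
comment = cards[0][1][0]
```

Here comment holds the checkin comment with its spaces restored, and each F card yields a file name followed by the hash of that file's content.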
The hashes seen above are a critical component of the architecture:
The F (file) records refer to the content of those files by the hash of that content. Where that content is stored is not specified by the data model.
The P (parent) line is the hash code of the parent version (itself an artifact).
The Z line is a hash of all of the content of this artifact which precedes the Z line. Thus any change to the content of an artifact changes both the artifact's identity (its hash) and its Z value, making it impossible to inject modified artifacts into an existing artifact tree.
The R line is yet another consistency-checking hash which we won't go into here except to say that it's an internal consistency check/line of defense against modification of file content referenced by the artifact.
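The role of the Z line can be sketched as follows. The sketch assumes the Z value is an MD5 digest of the preceding bytes, which is consistent with the 32 hex digits in the example above but is not stated in this document. The helper names add_z_card and z_card_ok are hypothetical.

```python
import hashlib

def add_z_card(body: str) -> str:
    """Append a Z card holding the (assumed MD5) hash of everything before it."""
    digest = hashlib.md5(body.encode("utf-8")).hexdigest()
    return f"{body}Z {digest}\n"

def z_card_ok(artifact: str) -> bool:
    """Check that the trailing Z card matches the content which precedes it."""
    lines = artifact.splitlines(keepends=True)
    if not lines or not lines[-1].startswith("Z "):
        return False
    body = "".join(lines[:-1])
    claimed = lines[-1][2:].strip()
    return hashlib.md5(body.encode("utf-8")).hexdigest() == claimed

artifact = add_z_card("D 2007-07-30T13:01:08\nU drh\n")
tampered = artifact.replace("U drh", "U eve")  # change the body, keep the old Z

intact_ok = z_card_ok(artifact)
tampered_ok = z_card_ok(tampered)
```

The tampered copy fails the check because its Z value still describes the original body. Recomputing Z would make the check pass, but the artifact's own hash, and therefore its identity, would then differ from the one its descendants refer to.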
Part 2: The blob Table
[Figure: artifacts and client files are stored in the blob table; the crosslink process derives the auxiliary tables from it. The "Client files" and "blob table" elements are highlighted.]
The blob table is the core-most storage of a Fossil repository
database, storing all SCM-relevant data (and only SCM-relevant
data). Each row of this table holds a single artifact or the content
for a single version of a single client-side file. Slightly truncated
for clarity, its schema contains the following fields:
uuid: the hash code of the blob's contents.
rid: a unique integer key for this record. This is how the blob table is mapped to other (transient) tables, but the RIDs are specific to one given copy of a repository and must not be used for cross-repository referencing. The RID is a private/internal value of no use to a user unless they're building SQL queries for use with the Fossil db schema.
size: the size, in bytes, of the blob's contents, or -1 for "phantom" blobs (those which Fossil knows should exist because it's seen them referenced somewhere, but for which it has not been given any content).
content: the blob's raw content bytes, with the caveat that Fossil is free to store it in an "alternate representation." Specifically, the content field often holds a zlib-compressed delta from a previous version of the blob's content (a separate entry in the blob table), and an auxiliary table named delta maps such blobs to their previous versions, such that Fossil can reconstruct the real content from them by applying the delta to its previous version (and such deltas may be chained). Thus extraction of the content from this field cannot be performed via vanilla SQL, and requires a Fossil-specific function which knows how to convert any internal representations of the content to its original form.
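To make the field list concrete, here is a minimal Python/sqlite3 sketch of a blob-like table. It is a simplified stand-in, not Fossil's actual schema: the column constraints, the use of SHA3-256, and the helper names store, store_phantom and load are assumptions, plain zlib stands in for Fossil's internal representation, and deltas are left out entirely.

```python
import hashlib
import sqlite3
import zlib

db = sqlite3.connect(":memory:")
# Simplified stand-in for the blob table (assumed constraints).
db.execute(
    "CREATE TABLE blob("
    " rid INTEGER PRIMARY KEY,"
    " uuid TEXT UNIQUE NOT NULL,"
    " size INTEGER,"
    " content BLOB)"
)

def store(raw: bytes) -> int:
    """Store content under its hash and return the repository-local RID."""
    uuid = hashlib.sha3_256(raw).hexdigest()
    cur = db.execute(
        "INSERT INTO blob(uuid, size, content) VALUES (?, ?, ?)",
        (uuid, len(raw), zlib.compress(raw)),
    )
    return cur.lastrowid

def store_phantom(uuid: str) -> int:
    """Record a hash which was referenced somewhere but has no content yet."""
    cur = db.execute(
        "INSERT INTO blob(uuid, size, content) VALUES (?, -1, NULL)", (uuid,)
    )
    return cur.lastrowid

def load(rid: int):
    """Return the original bytes, or None for a phantom."""
    size, content = db.execute(
        "SELECT size, content FROM blob WHERE rid = ?", (rid,)
    ).fetchall()[0]
    if size < 0:
        return None
    return zlib.decompress(content)

rid_file = store(b"hello, fossil\n")
rid_phantom = store_phantom("0" * 64)
stored_bytes = db.execute(
    "SELECT content FROM blob WHERE rid = ?", (rid_file,)
).fetchall()[0][0]
```

Even in this toy version, stored_bytes is not the original content, so a plain SELECT of the content column is not enough to recover it. The decoding step in load plays the role of the Fossil-specific function mentioned above.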
Sidebar: How does blob Distinguish Between Artifacts and Client Content?
Notice that the blob table has no flag saying "this record is an
artifact" or "this record is client data." Similarly, there is no
place in the database dedicated to keeping track of which blob
records are artifacts and which are file content.
That said, (A) the type of a blob can be implied via certain table
relationships and (B) the event table (the /timeline's main data
source) incidentally has a list of artifacts and their sub-types
(checkin, wiki, tag, etc.). However, given that all of those
relationships, including the timeline, are transient, how can Fossil
distinguish between the two types of data?
Fossil's artifact format is extremely rigid and is strictly enforced internally, with zero room provided for leniency. Every artifact which is internally created is re-parsed for validity before it is committed to the database, making it impossible for Fossil to inject an invalid artifact into the repository. Because of the strictness of the artifact parser, the chances that any given piece of arbitrary client data could be successfully parsed as an artifact, even if it is syntactically 99% similar to an artifact, are effectively zero.
Thus Fossil's rule of interpreting the contents of the blob table is: if it can be parsed as an artifact, it is an artifact, else it is opaque client-side data.
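That rule can be sketched with a toy classifier. The checks below are hypothetical and far more lenient than Fossil's real parser: every line must look like a card, and the final line must be a Z card whose value (assumed here to be MD5) matches the preceding content.

```python
import hashlib
import re

# A card: one uppercase letter, one space, then at least one non-space character.
CARD = re.compile(r"[A-Z] \S.*")

def is_artifact(raw: bytes) -> bool:
    """Toy check: every line is a card and the final Z card matches the body."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    if not text.endswith("\n"):
        return False
    lines = text.splitlines(keepends=True)
    if len(lines) < 2:
        return False
    if not all(CARD.fullmatch(line.rstrip("\n")) for line in lines):
        return False
    body, z_line = "".join(lines[:-1]), lines[-1]
    expected = "Z " + hashlib.md5(body.encode("utf-8")).hexdigest() + "\n"
    return z_line == expected

body = b"D 2007-07-30T13:01:08\nU drh\n"
artifact = body + b"Z " + hashlib.md5(body).hexdigest().encode("ascii") + b"\n"
near_miss = body + b"Z " + b"0" * 32 + b"\n"  # looks right, wrong checksum
source_file = b"#include <stdio.h>\n"

kinds = [
    "artifact" if is_artifact(blob) else "opaque"
    for blob in (artifact, near_miss, source_file)
]
```

The near_miss blob is syntactically almost identical to a real artifact yet is still treated as opaque content, which is the behavior the strictness argument above relies on.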
That rule is most often relevant in operations like rebuild and
reconstruct, both of which necessarily have to sort out artifacts
and non-artifact blobs from arbitrary collections of blobs.
It is, in fact, possible to store an artifact unrelated to the current repository in that repository, and it will be parsed and processed as an artifact (see below), but it likely refers to other artifacts or blobs which are not part of the current repository, thereby possibly introducing "strange" data into the UI. If this happens, it's potentially slightly confusing but is functionally harmless.
Part 3: Crosslinking
[Figure: artifacts and client files are stored in the blob table; the crosslink process derives the auxiliary tables from it. The "Crosslink process" and "Auxiliary tables" elements are highlighted.]
Once an artifact is stored in the blob table, how does one perform
SQL queries against its plain-text format? In short: One Does Not
Simply Query the Artifacts.
Crosslinking, as it's colloquially known, is a one-way processing step
which transforms an immutable artifact's state into something
database-friendly. Crosslinking happens automatically every time
Fossil generates, or is given, a new artifact. Crosslinking of any
given artifact may update many different auxiliary tables, all of
which are transient in the sense that they may be destroyed and then
recreated by crosslinking all artifacts from the blob table (which
is exactly what the rebuild command does). The overwhelming majority
of individual database records in any Fossil repository are found in
these transient auxiliary tables, though the blob table tends to
account for the overwhelming majority of a repository's disk space.
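The idea of crosslinking and rebuilding can be sketched with Python's sqlite3 module. Everything here is a simplification under stated assumptions: the blob table stores plain text instead of Fossil's internal representation, the uuids are short placeholders instead of real hashes, only P cards are processed, and the column names pid and cid for the parent/child link table are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE, content TEXT)"
)

def crosslink(rid: int, text: str) -> None:
    """Derive query-friendly parent/child rows from one artifact's P cards."""
    for line in text.splitlines():
        if not line.startswith("P "):
            continue
        for parent_uuid in line[2:].split(" "):
            rows = db.execute(
                "SELECT rid FROM blob WHERE uuid = ?", (parent_uuid,)
            ).fetchall()
            if rows:  # skip parents whose content this repository lacks
                db.execute(
                    "INSERT INTO plink(pid, cid) VALUES (?, ?)", (rows[0][0], rid)
                )

def rebuild() -> None:
    """Destroy the transient table and recreate it from the blob table alone."""
    db.execute("DROP TABLE IF EXISTS plink")
    db.execute("CREATE TABLE plink(pid INTEGER, cid INTEGER)")
    blobs = db.execute("SELECT rid, content FROM blob ORDER BY rid").fetchall()
    for rid, text in blobs:
        crosslink(rid, text)

db.executemany(
    "INSERT INTO blob(uuid, content) VALUES (?, ?)",
    [
        ("aaa", "D 2007-07-30T13:01:08\nU drh\n"),
        ("bbb", "D 2007-07-31T09:00:00\nP aaa\nU drh\n"),
    ],
)

rebuild()
first = db.execute("SELECT pid, cid FROM plink").fetchall()
rebuild()
second = db.execute("SELECT pid, cid FROM plink").fetchall()
```

Running rebuild twice produces the same rows both times, since the derived table depends only on the contents of the blob table.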
This approach to mapping data from artifacts to the db gives Fossil
the freedom to change its database model, effectively at will, with
minimal client-side disruption (at most, a call to rebuild). This
allows, for example, Fossil to take advantage of new improvements in
sqlite without affecting compatibility with older repositories.
Auxiliary tables hold data mappings such as:
- Child/parent relationships of checkins. (The plink table.)
- Records of file names and changes to files. (The mlink and filename tables.)
- Timeline entries. (The event table.)
And numerous other bits and pieces.
The many auxiliary tables maintained by the app-level code reference
the blob table via its RID field, as that's far more efficient than
using hashes (blob.uuid) as foreign keys. The contexts of those
auxiliary data unambiguously tell us whether the referenced blobs are
artifacts or file content, so there is no efficiency penalty there for
hosting both opaque blobs and artifacts in the blob table.
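As a rough illustration of RID-based references, the following Python/sqlite3 sketch joins a parent/child link table back to the blob table to translate local RIDs into hashes. The tables are cut-down stand-ins, the column names pid and cid are assumptions, and the uuids are short placeholders, not real hashes.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE NOT NULL);
CREATE TABLE plink(pid INTEGER, cid INTEGER);
INSERT INTO blob(rid, uuid) VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
INSERT INTO plink(pid, cid) VALUES (1, 2), (2, 3);
""")

def parents_of(child_uuid: str) -> list:
    """Resolve a checkin's parents: hash -> RID -> link table -> RID -> hash."""
    rows = db.execute(
        "SELECT parent.uuid"
        " FROM blob AS child"
        " JOIN plink ON plink.cid = child.rid"
        " JOIN blob AS parent ON parent.rid = plink.pid"
        " WHERE child.uuid = ?",
        (child_uuid,),
    ).fetchall()
    return [uuid for (uuid,) in rows]

parents_of_ccc = parents_of("ccc")
parents_of_aaa = parents_of("aaa")
```

The integer RIDs stay inside the query, and only hashes cross its boundary. This matches the earlier point that RIDs are private to one copy of a repository.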
The complete SQL schemas for the core-most auxiliary tables can be found at:
Note, however, that all database tables are effectively internal APIs, with no API stability guarantees and subject to change at any time. Thus their structures generally should not be relied upon in client-side scripts.
Part 4: Implications and Consequences of the Model
Some of the implications and consequences of Fossil's data model combined with the higher-level access via SQL include:
Provable immutability of history. Fossil offers only one option for modifying history: "shunning" is the forceful removal of an artifact from the blob table and the creation of a db record stating that the shunned hash may no longer be synced into this repository. Shunning effectively leaves a hole in the SCM history, and is only intended to be used for removal of illegal, dangerous, or private information which should never have been added to the repository.
Complete separation of SCM-relevant data and app-level data structures. This allows the application to update its structures at will without significant backwards-compatibility concerns. In Fossil's case, "data structures" primarily refers to the SQL schema. Bringing a given repository schema up to date vis-a-vis a given fossil binary version simply means rebuilding the repository with that fossil binary. There are exceptionally rare cases, namely the switch from SHA1 to SHA3-256 ushered in with Fossil 2.0, which can lead to true incompatibility. For example, a Fossil 1.x client cannot use a repository database which contains SHA3 hashes, regardless of a rebuild.
Two-way compatibility with other hypothetical clients which also implement the same underlying data model. So far there are none, but it's conceivably possible.
Provides a solid basis for reporting. Fossil's real-time metrics and reporting options are arguably the most powerful and flexible yet seen in an SCM.
Very probably several more things.
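The shunning behavior described in the first item of this list can be sketched as follows. This is a minimal model, not Fossil's implementation: the table layouts, the use of SHA3-256, and the function names receive and shun are assumptions made for illustration.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE NOT NULL, content BLOB);
CREATE TABLE shun(uuid TEXT PRIMARY KEY);
""")

def receive(raw: bytes) -> bool:
    """Store incoming content unless its hash has been shunned."""
    uuid = hashlib.sha3_256(raw).hexdigest()
    if db.execute("SELECT 1 FROM shun WHERE uuid = ?", (uuid,)).fetchall():
        return False
    db.execute(
        "INSERT OR IGNORE INTO blob(uuid, content) VALUES (?, ?)", (uuid, raw)
    )
    return True

def shun(uuid: str) -> None:
    """Remove the content and record that its hash must not come back."""
    db.execute("DELETE FROM blob WHERE uuid = ?", (uuid,))
    db.execute("INSERT OR IGNORE INTO shun(uuid) VALUES (?)", (uuid,))

secret = b"password=hunter2\n"
secret_uuid = hashlib.sha3_256(secret).hexdigest()

accepted_before = receive(secret)   # first arrival is stored normally
shun(secret_uuid)                   # forcefully remove it
accepted_after = receive(secret)    # a later sync attempt is refused
remaining = db.execute("SELECT COUNT(*) FROM blob").fetchall()[0][0]
```

Nothing else in the history is rewritten: the shunned hash simply becomes a hole that any artifact referring to it can no longer resolve.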