/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* vim: set ts=2 et sw=2 tw=80: */
#if !defined(ORG_FOSSIL_SCM_PAGES_H_INCLUDED)
#define ORG_FOSSIL_SCM_PAGES_H_INCLUDED
/*
Copyright 2013-2021 The Libfossil Authors, see LICENSES/BSD-2-Clause.txt
SPDX-License-Identifier: BSD-2-Clause-FreeBSD
SPDX-FileCopyrightText: 2021 The Libfossil Authors
SPDX-ArtifactOfProjectName: Libfossil
SPDX-FileType: Code
Heavily indebted to the Fossil SCM project (https://fossil-scm.org).
*****************************************************************************
This file contains only Doxygen-format documentation, split up into
Doxygen "pages", each covering some topic at a high level. This is
not the place for general code examples - those belong with their
APIs.
*/
/** @mainpage libfossil
Forewarning: this API assumes one is familiar with the Fossil SCM,
ideally in detail. The Fossil SCM can be found at:
https://fossil-scm.org
libfossil is an experimental/prototype library API for the Fossil
SCM. This API concerns itself only with the components of fossil
which do not need user interaction or the display of UI components
(including HTML and CLI output). It is intended only to model the
core internals of fossil, off of which user-level applications
could be built.
The project's repository and additional information can be found at:
https://fossil.wanderinghorse.net/r/libfossil/
This code is 100% hypothetical/potential, and does not represent
any Official effort of the Fossil project. It is up for any amount
of change at any time and does not yet have a stable API.
All Fossil users are encouraged to participate in its development,
but if you are reading this then you probably already knew that
:).
This effort does not represent "Fossil Version 2", but provides an
alternate method of accessing and manipulating fossil(1)
repositories. Whereas fossil(1) is a monolithic binary, this API
provides library-level access to (some level of) the fossil(1)
feature set (that level of support grows approximately linearly
with each new commit).
Current status: alpha. Some bits are basically finished but there
is a lot of work left to do. The scope is pretty much all
Fossil-related functionality which does not require a user
interface or direct user interaction, plus some range of utilities
to support those which require a UI/user.
*/
/** @page page_terminology Fossil Terminology
See also: https://fossil-scm.org/home/doc/trunk/www/concepts.wiki
The libfossil API docs normally assume one is familiar with
Fossil-internal terminology, which is of course a silly assumption
to make. Indeed, one of libfossil's goals is to make Fossil more
accessible, partly by demystifying it. To that end, here is a
collection of terms one may come across in the API, along with
their meanings in the context of Fossil...
- REPOSITORY (a.k.a. "repo") is an sqlite database file which
contains all content for a given "source tree." (We will use the
term "source tree" to mean any tree of "source" (documents,
whatever) a client has put under Fossil's supervision.)
- CHECKOUT (a.k.a. "local source tree" or "working copy") refers
to (A) the action of pulling a specific version of a repository's
state from that repo into the local filesystem, and (B) a local
copy "checked out" of a repo. e.g. "he checked out the repo," and
"the changes are in his [local] checkout."
- ARTIFACT is the generic term for anything stored in a repo. More
specifically, ARTIFACT refers to "control structures" Fossil uses
to internally track changes. These artifacts are stored as blobs
in the database, just like any other content. For complete details
and examples, see:
https://fossil-scm.org/home/doc/tip/www/fileformat.wiki
- A MANIFEST is a specific type of ARTIFACT - the type which
records all metadata for a COMMIT operation (which files, which
user, the timestamp, checkin comment, lineage, etc.). For
historical reasons, MANIFEST is sometimes used as a generic term
for ARTIFACT because what the fossil(1)-internal APIs originally
called a Manifest eventually grew into other types of artifacts
but kept the Manifest naming convention. In Fossil developer
discussion, "manifest" most often means what this page calls
ARTIFACT (probably because that is how the C code is modelled). The
libfossil API uses the term "deck" instead of "manifest" to
avoid ambiguity/confusion (or to move the confusion somewhere
else, at least).
- CHECKIN is the term libfossil prefers to use for COMMIT
MANIFESTS. It is also the action of "checking in"
(a.k.a. "committing") file changes to a repository. A CHECKIN
ARTIFACT can be one of two types: a BASELINE MANIFEST (or BASELINE
CHECKIN) contains a list of all files in that version of the
repository, including their file permissions and the UUIDs of
their content. A DELTA MANIFEST is a checkin record which derives
from a BASELINE MANIFEST and it lists only the file-level changes
which happened between the baseline and the delta, recording any
changes in content, permissions, or name, and recording
deletions. Note that this inheritance of deltas from baselines is
an internal optimization which has nothing to do with checkin
version inheritance - the baseline of any given delta is normally
_not_ its direct checkin version parent.
- BRANCH, FORK, and TAG are all closely related in Fossil and are
explained in detail (with pictures!) at:
https://fossil-scm.org/home/doc/trunk/www/concepts.wiki
In short: BRANCHes and FORKs are two names for the same thing, and
both are just a special-case usage of TAGs.
- MERGE or MERGING: the process of integrating one version of
source code into another version of that source code, using a
common parent version as the basis for comparison. This is
normally fully automated, but occasionally human (and sometimes
Divine) intervention is required to resolve so-called "merge
conflicts," where two versions of a file change the same parts of
a common parent version.
- RID (Record ID) is a reference to the blob.rid field in a
repository DB. RIDs are used extensively throughout the API for
referencing content records, but they are transient values local
to a given copy of a given repository at a given point in
time. They _can_ change, even for the same content, (e.g. a
rebuild can hypothetically change them, though it might not, and
re-cloning a repo may very well change some RIDs). Clients must
never rely on them for long-term reference to SCM'd data - always use
the full UUID of such data. Even though they normally appear to be
static, they are most explicitly NOT guaranteed to be. Nor are
their values guaranteed to imply any meaning, e.g. "higher is
newer" is not necessarily true because synchronization can import
new remote content in an arbitrary order and a rebuild might
import it in random order. The API uses RIDs basically as handles
to arbitrary blob content and, like most C-side handles, must be
considered transient in nature. That said, within the db, records
are linked to each other exclusively using RIDs, so they do have
some persistence guarantees for a given db instance.
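To illustrate why RIDs must not be used as long-term references,
here is a minimal sketch. It does not use the libfossil API: the
demo_blob type, the demo_uuid_to_rid() function, and the two tables
are all hypothetical stand-ins, and the "UUIDs" are shortened
placeholders rather than real hashes. It only models the idea that
two copies of the same repository may assign different RIDs to the
same content, while the UUID stays the same.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for one row of a repository's blob table.
typedef struct demo_blob demo_blob;
struct demo_blob {
  int rid;            // local, transient record ID
  char const * uuid;  // content hash: stable across clones
};

// Returns the RID which the given table currently assigns to uuid,
// or 0 if the table has no such entry (0 is never a valid RID in
// this sketch).
int demo_uuid_to_rid( demo_blob const * table, size_t n,
                      char const * uuid ){
  assert( table && uuid );
  for( size_t i = 0; i < n; ++i ){
    if( 0==strcmp(table[i].uuid, uuid) ) return table[i].rid;
  }
  return 0;
}

// Two copies of the "same" repository which happened to receive
// their content in a different order, so their RIDs differ.
demo_blob const demo_cloneA[] = {
  {1, "aaaa1111"}, {2, "bbbb2222"}, {3, "cccc3333"}
};
demo_blob const demo_cloneB[] = {
  {1, "cccc3333"}, {2, "aaaa1111"}, {3, "bbbb2222"}
};
```

Looking up "cccc3333" yields RID 3 in the first table and RID 1 in
the second, which is why anything stored outside of a given db
instance should refer to the UUID and resolve it to a RID on demand.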
*/
/** @page page_APIs High-level API Overview
The primary end goals of this project are to eventually cover the
following feature areas:
- Provide embeddable SCM to local apps using sqlite storage.
- Provide a network layer on top of that for synchronization.
- Provide apps on top of those to allow administration of repos.
To those ends, the fossil APIs cover the following categories of
features:
Filesystem:
- Conversions of strings from OS-native encodings to UTF.
fsl_utf8_to_unicode(), fsl_filename_to_utf8(), etc. These are
primarily used internally but may also be useful for applications
working with files (as most clients will). Actually... most of
these bits are only needed for portability across Windows
platforms.
- Locating a user's home directory: fsl_find_home_dir()
- Normalizing filenames/paths. fsl_file_canonical_name() and friends.
- Checking for existence, size, and type (file vs directory) with
fsl_is_file() and fsl_dir_check(), or the more general-purpose
fsl_stat().
Databases (sqlite):
- Opening/closing sqlite databases and running queries on them,
independent of version control features. See fsl_db_open() and
friends. The actual sqlite-level DB handle type is abstracted out
of the public API, largely to simplify an eventual port from
sqlite3 to sqlite4 or (hypothetically) to other storage back-ends
(not gonna happen - too much work).
- There are lots of utility functions for oft-used operations,
e.g. fsl_config_get_int32() and friends to fetch settings from one
of several different configuration areas (global, repository,
checkout, and "versionable" settings).
- Pseudo-recursive transactions: fsl_db_transaction_begin() and
fsl_db_transaction_end(). sqlite does not support truly nested
transactions, but they can be simulated quite effectively so long
as certain conventions are adhered to.
- Cached statements (an optimization for oft-used queries):
fsl_db_prepare_cached() and friends.
The DB API is (as Brad Harder put so well) "very present" in the
public API. While the core API provides access to the underlying
repository data, it cannot begin to cover even a small portion of
potential use cases. To that end, it exposes the DB API so that
clients who want to construct their own data can do so. It does
require research into the underlying schemas, but gives
applications the ability to do _anything_ with their repositories
which the core API does not account for. Historically, the ability
to create ad-hoc data structures as needed, in the form of SQL
queries, has accounted for much of Fossil's feature flexibility.
Deltas:
- Creation and application of raw deltas, using Fossil's delta
format, independent of version control features. See
fsl_delta_create() and friends. These are normally used only at
the deepest internal levels of fossil, but the APIs are exposed so
that clients can, if they wish, use them to deltify their own
content independently of fossil's internally-applied
deltification. Doing so is remarkably easy, but completely
unnecessary for content which will be stored in a repo, as Fossil
creates deltas as needed.
SCM:
- A "context" type (fsl_cx) which manages a repository db and,
optionally, a checkout db. Read-only operations on the DB are
working and write functionality (adding repo content) is
ongoing. See fsl_cx, fsl_cx_init(), and friends.
- The fsl_deck class assists in parsing, creating, and outputting
"artifacts" (manifests, control (tags), events, etc.). It gets its
name from being a container for "a collection of cards" (which is
what a Fossil artifact is).
- fsl_content_get() expands a (possibly) deltified blob into its
full form, and fsl_content_blob() can be used to fetch a raw blob
(possibly a raw delta).
- A number of routines exist for converting symbol names to RIDs
(fsl_sym_to_rid()), UUIDs to RIDs (fsl_uuid_to_rid()),
and similar commonly-needed lookups.
Input/Output:
- The API defines several abstractions for i/o interfaces, e.g.
fsl_input_f() and fsl_output_f(), which allow us to accept/emit
data from/to arbitrary streamable (as opposed to random-access)
sources/destinations. A fsl_cx instance is configured with an
output channel, the intention being that all clients of that
context should generate any output through that channel, so that
all compatible apps can cooperate more easily in terms of i/o. For
example, the s2 script binding for libfossil routes fsl_output()
through the script engine's i/o channels, so that any output
generated by libfossil-using code it links to can take advantage
of the script-side output features (such as output buffering,
which is needed for any non-trivial CGI output). That said: the
library-level code does not actually generate output to that
channel, but higher-level code like fcli does, and clients are
encouraged to in order to enable their app's output to be
redirected to an arbitrary UI element, be it a console or UI
widget.
Utilities:
- fsl_buffer, a generic buffer class, is used heavily by the
library. See fsl_buffer and friends.
- fsl_appendf() provides printf()-like functionality, but sends
its output to a callback function (optionally stateful), making it
the one-stop-shop for string formatting within the library.
- The fsl_error class is used to propagate error information
between the library's various levels and the client.
- The fsl_list class acts as a generic container-of-pointers, and
the API provides several convenience routines for managing them,
traversing them, and cleaning them up.
- Hashing: there are a number of routines for calculating SHA1,
SHA3, and MD5 hashes. See fsl_sha1_cx, fsl_sha3_cx, fsl_md5_cx,
and friends.
- zlib compression is used for storing artifacts. See
fsl_data_is_compressed(), fsl_buffer_compress(), and friends.
These are never needed at the client level, but are exposed "just
in case" a given client should want them.
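The combination of a stateful output callback and printf()-style
formatting, as described above for fsl_output_f() and fsl_appendf(),
can be sketched roughly as follows. This is not the library's
implementation: every demo_ name is hypothetical, the callback
signature is an assumption made for this example, and the
fixed-size buffers are a simplification which a real implementation
would not be limited by.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Hypothetical output callback: receives n bytes from src and
// returns 0 on success, non-0 on error.
typedef int (*demo_output_f)( void * state, void const * src, size_t n );

// State for demo_output_f_mem(): a fixed-size, NUL-terminated buffer.
typedef struct demo_membuf demo_membuf;
struct demo_membuf {
  char mem[128];
  size_t used;
};

// A demo_output_f implementation which appends to a demo_membuf.
int demo_output_f_mem( void * state, void const * src, size_t n ){
  demo_membuf * const b = state;
  if( n >= sizeof(b->mem) - b->used ) return 1; // would overflow
  memcpy( b->mem + b->used, src, n );
  b->used += n;
  b->mem[b->used] = 0;
  return 0;
}

// va_list form: formats into a scratch buffer, then forwards the
// result to the callback.
int demo_appendfv( demo_output_f out, void * state,
                   char const * fmt, va_list args ){
  char scratch[256];
  int n;
  assert( out && fmt );
  n = vsnprintf( scratch, sizeof(scratch), fmt, args );
  if( n < 0 || (size_t)n >= sizeof(scratch) ) return 1;
  return out( state, scratch, (size_t)n );
}

// Elliptical form: a thin wrapper around demo_appendfv().
int demo_appendf( demo_output_f out, void * state,
                  char const * fmt, ... ){
  int rc;
  va_list args;
  va_start( args, fmt );
  rc = demo_appendfv( out, state, fmt, args );
  va_end( args );
  return rc;
}
```

Because the destination is just a callback plus an opaque state
pointer, the same formatting call can target a memory buffer (as
here), a FILE handle, or a UI widget without the caller changing.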
*/
/** @page page_is_isnot Fossil is/is not...
Through porting the main fossil application into library form,
the following things have become very clear (or been reinforced)...
Fossil is...
- _Exceedingly_ robust. Not only is sqlite literally the single
most robust application-agnostic container file format on the
planet, but Fossil goes way out of its way to ensure that what
gets put in is what gets pulled out. It cuts zero corners on data
integrity, even adding in checks which seem superfluous but
provide another layer of data integrity (i'm primarily talking
about the R-card here, but there are other validation checks). It
does this at the cost of memory and performance (that said, it's
still easily fast enough for its intended uses). "Robust" doesn't
mean that it never crashes nor fails, but that it does so with
(insofar as is technically possible) essentially zero chance of
data loss/corruption.
- Long-lived: the underlying data format is independent of its
storage format. It is, in principle, usable by systems as yet
unconceived by the next generation of programmers. This
implementation is based on sqlite, but the model can work with
arbitrary underlying storage.
- Amazingly space-efficient. The size of a repository database
necessarily grows as content is modified. However, Fossil's use of
zlib-compressed deltas, using a very space-efficient delta format,
leads to tremendous compression ratios. As of this writing (March,
2021), the main Fossil repo contains approximately 5.36GB of
content, were we to check out every single version in its
history. Its repository database is only 64MB, however, equating
to an 83:1 compression ratio. Ratios in the range of 20:1 to 40:1
are common, and more active repositories tend to have higher
ratios. The TCL core repository, with just over 15 years of code
history (imported, of course, as Fossil was introduced in 2007),
is (as of September 2013) only 187MB, with 6.2GB of content and a
33:1 compression ratio.
Fossil is not...
- Memory-light. Even very small uses can easily suck up 1MB of RAM
and many operations (verification of the R card, for example) can
quickly allocate and free up hundreds of MB because they have to
compose various versions of content on their way to a specific
version. To be clear, that is total RAM usage, not _peak_ RAM
usage. Peak usage is normally a function of the content it works
with at a given time, often in direct relation to (but
significantly more than) the largest single file processed in a
given session. For any given delta application operation, Fossil
needs the original content, the new content, and the delta all in
memory at once, and may go through several such iterations while
resolving deltified content. Verification of its 'R-card' alone
can require a thousand or more underlying DB operations and
hundreds of delta applications. The internals use caching where it
would save us a significant amount of db work relative to the
operation in question, but relatively high memory costs are
unavoidable. That's not to say we can't optimize a bit, but first
make it work, then optimize it. The library takes care to re-use
memory buffers where it is feasible (and not too intrusive) to do
so, but there is yet more RAM to be optimized away in this regard.
*/
/** @page page_threading Threads and Fossil
It is strictly illegal to use a given fsl_cx instance from more
than one thread. Period.
It is legal for multiple contexts to be running in multiple
threads, but only if those contexts use different
repository/checkout databases. Though access to the storage is,
through sqlite, protected via a mutex/lock, this library does not
have a higher-level mutex to protect multiple contexts from
colliding during operations. So... don't do that. One context, one
repo/checkout.
Multiple application instances may each use one fsl_cx instance to
share repo/checkout db files, but must be prepared to handle
locking-related errors in such cases. e.g. db operations which
normally "always work" may suddenly pause for a few seconds before
giving up while waiting on a lock when multiple applications use
the same database files. sqlite's locking behaviours are
documented in great detail at https://sqlite.org.
*/
/** @page page_artifacts Creating Artifacts
A brief overview of artifact creation using this API. This is targeted
at those who are familiar with how artifacts are modelled and generated
in fossil(1).
Primary artifact reference:
https://fossil-scm.org/home/doc/trunk/www/fileformat.wiki
In fossil(1), artifacts are generated via the careful crafting of
a memory buffer (large string) in the format described in the
document above. While it's relatively straightforward to do, there
are lots of potential gotchas, and a bug can potentially inject
"bad data" into the repo (though the verify-before-commit process
will likely catch any problems before the commit is allowed to go
through). The libfossil API uses a higher-level (OO) approach,
where the user describes a "deck" of cards and then tells the
library to save it in the repo (fsl_deck_save()) or output it to
some other channel (fsl_deck_output()). The API ensures that the
deck's cards get output in the proper order and that any cards
which require special treatment get that treatment (e.g. the
"fossilize" encoding of certain text fields). The "deck" concept
is equivalent to Artifact in fossil(1), but we use the word deck
because (A) Artifact is highly ambiguous in this context and (B)
deck is arguably the most obvious choice for the name of a type
which acts as a "container of cards."
Ideally, client-level code will never have to create an artifact
via the fsl_deck API (because doing so requires a fairly good
understanding of what the deck is for in the first place,
including the individual Cards). The public API strives to hide
those levels of details, where feasible, or at least provide
simpler/safer alternatives for basic operations. Some operations
may require some level of direct work with a fsl_deck
instance. Likewise, much read-only functionality directly exposes
fsl_deck to clients, so some familiarity with the type and its
APIs will be necessary for most clients.
The process of creating an artifact looks a lot like the following
code example. We have elided error checking for readability
purposes, but in fact this code has undefined behaviour if error
codes are not checked and appropriately reacted to.
```
fsl_deck deck = fsl_deck_empty;
fsl_deck * d = &deck ; // for typing convenience.
// Doxygen bug ^^^^^^^ requires space before semicolon!
fsl_deck_init( fslCtx, d, FSL_SATYPE_CONTROL ); // must come first
fsl_deck_D_set( d, fsl_julian_now() );
fsl_deck_U_set( d, "your-fossil-name", -1 );
fsl_deck_T_add( d, FSL_TAGTYPE_ADD, "...uuid being tagged...",
"tag-name", "optional tag value");
...
// Now output it to stdout:
fsl_deck_output( fslCtx, d, fsl_output_f_FILE, stdout );
// See also: fsl_deck_save(), which stores it in the db and
// "crosslinks" it.
fsl_deck_finalize(d);
```
The order the cards are added to the deck is irrelevant - they
will be output in the order specified by the Fossil specs
regardless of their insertion order. Each setter/adder function
knows, based on the deck's type (set via fsl_deck_init()), whether
the given card type is legal, and will return an error (probably
FSL_RC_TYPE) if an attempt is made to add a card which is illegal
for that deck type. Likewise, fsl_deck_output() and
fsl_deck_save() confirm that the decks they are given contain (A)
only allowed cards and (B) have all required
cards. fsl_deck_output() will "unshuffle" the cards, making sure
they're in the correct order.
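The "unshuffling" idea can be sketched in a few lines. This is only
an illustration, not how fsl_deck is implemented: demo_card,
demo_card_cmp(), and demo_deck_unshuffle() are hypothetical names,
and the sketch assumes that cards are emitted in ascending order of
their type letter. It also ignores that a real deck may hold several
cards of the same type, which would need a secondary ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical, heavily simplified card: a type letter plus payload.
typedef struct demo_card demo_card;
struct demo_card {
  char letter;          // e.g. 'D', 'T', 'U'
  char const * value;
};

// qsort() comparator ordering cards by their type letter.
int demo_card_cmp( void const * lhs, void const * rhs ){
  demo_card const * const a = lhs;
  demo_card const * const b = rhs;
  return (a->letter > b->letter) - (a->letter < b->letter);
}

// Sorts the cards in place so they can be emitted in letter order,
// regardless of the order in which the client added them.
void demo_deck_unshuffle( demo_card * cards, size_t n ){
  assert( cards || 0==n );
  if( n > 1 ) qsort( cards, n, sizeof(demo_card), demo_card_cmp );
}
```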
Sidebar: normally outputting a structure can use a const form of
that structure, but the traversal of F-cards in a deck requires
(for the sake of delta manifests) using a non-const cursor. Thus
outputting a deck requires a non-const instance. If it weren't for
delta manifests, we could be "const-correct" here.
*/
/** @page page_transactions DB Transactions
The fsl_db_transaction_begin() and fsl_db_transaction_end()
functions implement a basic form of recursive transaction,
allowing the library to start and end transactions at any level
without having to know whether a transaction is already in
progress (sqlite3 does not natively support nested
transactions). A rollback triggered in a lower-level transaction
will propagate the error back through the transaction stack and
roll back the whole transaction, providing us with excellent error
recovery capabilities (meaning we can always leave the db in a
well-defined state).
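The general technique can be sketched with a nesting counter and a
sticky rollback flag. This is a simplified model, not the library's
implementation: demo_db, demo_db_begin(), and demo_db_end() are
hypothetical names, and instead of running real BEGIN, COMMIT, or
ROLLBACK statements the sketch merely counts how often the outermost
level would have done so.

```c
#include <assert.h>

// Hypothetical transaction state. A real implementation would also
// hold the database handle and run BEGIN/COMMIT/ROLLBACK on it.
typedef struct demo_db demo_db;
struct demo_db {
  int level;       // current nesting depth
  int doRollback;  // non-0 once any level has requested a rollback
  int nCommit;     // number of real COMMITs "executed"
  int nRollback;   // number of real ROLLBACKs "executed"
};

// Starts a (possibly nested) transaction. Only the outermost call
// would issue a real BEGIN.
void demo_db_begin( demo_db * db ){
  assert( db );
  ++db->level;
}

// Ends one transaction level. A rollback request at any level is
// remembered, and the real COMMIT or ROLLBACK happens only when the
// outermost level ends. Returns 0 on success, non-0 on misuse.
int demo_db_end( demo_db * db, int rollback ){
  assert( db );
  if( db->level <= 0 ) return 1;
  if( rollback ) db->doRollback = 1;
  if( --db->level > 0 ) return 0;
  if( db->doRollback ) ++db->nRollback;
  else ++db->nCommit;
  db->doRollback = 0;
  return 0;
}
```

In this model a rollback requested two levels deep has no immediate
effect on the db, but it guarantees that the outermost level rolls
back instead of committing.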
It is STRICTLY ILLEGAL to EVER begin a transaction using "BEGIN"
or end a transaction by executing "COMMIT" or "ROLLBACK" directly
on a fsl_db instance. Doing so bypasses internal state which needs
to be kept abreast of things and will cause Grief and Suffering
(on the client's part, not mine).
Tip: implementing a "dry-run" mode for most fossil operations is
trivial by starting a transaction before performing the
operations. Many operations run in a transaction, but if the
client starts one of their own they can "dry-run" any op by simply
rolling back the transaction they started. Abstractly, that looks
like this pseudocode:
```
db.begin();
fsl.something();
fsl.somethingElse();
if( dryRun ) db.rollback();
else db.commit();
```
*/
/** @page page_code_conventions Code Conventions
Project and Code Conventions...
Foreword: all of this more or less evolved organically or was
inherited from fossil(1) (where it evolved organically, or was
inherited from sqlite (where it evol...)), and is written up here
more or less as a formality. Historically i've not been a fan of
coding conventions, but as someone else put it to me, "the code
should look like it comes from a single source," and the purpose
of this section is to help orient those looking to hack in the
sources. Note that most of what is said below becomes obvious
within a few minutes of looking at the sources - there's nothing
earth-shatteringly new nor terribly controversial here.
The Rules/Suggestions/Guidelines/etc. are as follows...
- C99 is the basis. It was C89 until 2021-02-12.
- The canonical build environment uses the most restrictive set of
warning/error levels possible. It is highly recommended that
non-canonical build environments do the same. Adding -Wall -Werror
-pedantic does _not_ guarantee that all C compliance/portability
problems can be caught by the compiler, but it goes a long way in
helping us to write clean code. The clang compiler is particularly
good at catching subtle foo-foo's such as uninitialized variables.
- API docs (as you may have already noticed) do not (any
longer) follow Fossil's comment style, but instead use
Doxygen-friendly formatting. Each comment block MUST start with
two or more asterisks, or '*!', or doxygen apparently doesn't
understand it
(https://www.stack.nl/~dimitri/doxygen/manual/docblocks.html). When
adding code snippets and whatnot to docs, please use doxygen
conventions if it is not too much of an inconvenience. All public
APIs must be documented with a useful amount of detail. If you
hate documenting, let me know and i'll document it (it's what i do
for fun).
- Public API members have a fsl_ or FSL_ prefix (fossil_ seems too
long). For private/static members, anything goes. Optional or
"add-on" APIs (e.g. ::fcli) may use other prefixes, but are
encouraged to use an "f-word" (as it were), simply out of deference
to long-standing software naming conventions.
- Internal APIs, especially non-static ones, start with `fsl__` or
`FSL__`, with two underscores. Such APIs must never be used in
client-side code.
- Public-API structs and functions use lower_underscore_style().
Static/internal APIs may use different styles. It's not uncommon
to see UpperCamelCase for file-scope structs.
- Function parameters and function-scope vars have no set
conventions - implementors are free to name those however they
like.
- Overall style, especially scope blocks and indentation, should
follow Fossil's. We are _not at all_ picky about whether or not
there is a space after/before parens in if( foo ), and similar
small details, just the overall code pattern and two-space
indentation. Hard tabs are verboten.
- Structs and enums all get the optional typedef so that they do
not need to be qualified with 'struct' resp. 'enum' when
used. Because of how doxygen tracks those, the typedef should be
separate from the struct declaration, rather than combining
those into a single declaration.
- Function typedefs are named fsl_XXX_f. Implementations of such
typedefs/interfaces are typically named fsl_XXX_f_SUFFIX(), where
SUFFIX describes the implementation's
specialization. e.g. fsl_output_f() is a callback
typedef/interface and fsl_output_f_FILE() is a concrete
implementation for FILE handles.
- Enums tend to be named fsl_XXX_e.
- Functions follow the naming pattern prefix_NOUN_VERB(), rather
than the more C-conventional prefix_VERB_NOUN(),
e.g. fsl_foo_get() and fsl_foo_set() rather than fsl_get_foo() and
fsl_set_foo(). The primary reasons are (A) sortability for
document processors and (B) they more naturally match with OO API
conventions, e.g. `noun.verb()`. A few cases knowingly violate
this convention for the sake of readability or sorting of several
related functions (e.g. fsl_db_get_TYPE() instead of
fsl_db_TYPE_get()).
- Structs intended to be creatable on the stack are accompanied by
a const instance named fsl_STRUCT_NAME_empty, and possibly by a
macro named fsl_STRUCT_NAME_empty_m, both of which are
"default-initialized" instances of that struct. This is superior
to using `memset()` for struct initialization because we can
define (and document) arbitrary default values and all clients who
copy-construct them are unaffected by many types of changes to the
struct's signature (though they may need a recompile). The
intention of the fsl_STRUCT_NAME_empty_m macro is to provide a
struct-embeddable form for use in other structs or
copy-initialization of const structs, and the `_m` macro is always
used to initialize its const struct counterpart. e.g. the library
guarantees that fsl_cx_empty_m (a macro representing an empty
fsl_cx instance) holds the same default values as fsl_cx_empty (a
const fsl_cx value).
- Returning int vs fsl_int_t vs fsl_size_t: int is used as a
conventional result code. fsl_int_t is often used as a signed
length-style result code (e.g. printf() semantics). Unsigned
ranges use fsl_size_t. Ints are (also) used as a "triplean" (3
potential values, e.g. <0, 0, >0). fsl_int_t also guarantees that
it will be 64-bit if available, so can be used for places where
large values are needed but a negative value is legal (or handy),
e.g. the final arguments for fsl_strndup() and
fsl_buffer_append(). The use of the fsl_xxx_t typedefs, rather
than (unsigned) int, is primarily for readability/documentation,
e.g. so that readers can know immediately that the function uses a
given argument or return value following certain API-wide
semantics. It also allows us to better define platform-portable
printf/scanf-style format modifiers for them (analogous to C99's
PRIi32 and friends), which often come in handy.
- Signed vs. unsigned types for size/length arguments: use the
fsl_int_t (signed) argument type when the client may legally pass
in a negative value as a hint that the API should use fsl_strlen()
(or similar) to determine a byte array's length. Use fsl_size_t
when no automatic length determination is possible (or desired),
to "force" the client to pass the proper length. Internally
fsl_int_t is used in some places where fsl_size_t "should" be used
because some ported-in logic relies on loop control vars being
able to go negative. Additionally, fossil internally uses negative
blob lengths to mark phantom blobs, and care must be taken when
using fsl_size_t with those.
- Functions taking ellipses (...) are accompanied by a va_list
counterpart named the same as the (...) form plus a trailing
'v'. e.g. fsl_appendf() and fsl_appendfv(). We do not use the
printf()/vprintf() convention because that hoses sorting of the
functions in generated/filtered API documentation.
- Error handling/reporting: please keep in mind that the core code
is a library, not an application. The main implication is that
all lib-level code needs to check for errors wherever they can
happen (e.g. on every single memory allocation, of which there are
many) and propagate errors to the caller, to be handled at their
discretion. The app-level code (::fcli) is not particularly strict
in this regard, and installs its own allocator which abort()s on
allocation error, which simplifies app-side code somewhat
vis-a-vis lib-level code. When reporting an error can be improved
by the inclusion of an error string, functions like
fsl_cx_err_set() can be used to report the error. Several of the
high-level types in the API have a fsl_error object member which
contains such error state. The APIs which use that state take care
to re-use the error string memory whenever possible, so setting
an error string is often a non-allocating operation.
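Two of the conventions above, the "empty" instances and the
signed-length arguments, are easier to see in code. The following
sketch deliberately uses a hypothetical demo_ prefix and made-up
types so that it cannot be mistaken for library API. It assumes
ptrdiff_t as a stand-in for fsl_int_t and size_t as a stand-in for
fsl_size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct demo_range demo_range;
struct demo_range {
  int first;
  int last;
  char const * label;
};

// Struct-embeddable default state, with non-zero defaults which
// memset() could not express.
#define demo_range_empty_m { 1, -1, "unnamed" }

// Const instance for copy-initialization, always built from the macro.
demo_range const demo_range_empty = demo_range_empty_m;

// The _m form is what makes embedding in other structs possible.
typedef struct demo_pair demo_pair;
struct demo_pair {
  demo_range a;
  demo_range b;
};
#define demo_pair_empty_m { demo_range_empty_m, demo_range_empty_m }
demo_pair const demo_pair_empty = demo_pair_empty_m;

// Signed-length convention: a negative n means "use strlen(s)".
size_t demo_effective_len( char const * s, ptrdiff_t n ){
  assert( s );
  return n < 0 ? strlen(s) : (size_t)n;
}
```

A client would then write `demo_range r = demo_range_empty` to get a
well-defined starting state, and would keep getting the documented
defaults even if members are later added to the struct.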
*/
/** @page page_fossil_arch Fossil Architecture Overview
An introduction to the Fossil architecture. These docs
are basically just a reformulation of other, more detailed,
docs which can be found via the main Fossil site, e.g.:
- https://fossil-scm.org/home/doc/trunk/www/concepts.wiki
- https://fossil-scm.org/home/doc/trunk/www/fileformat.wiki
Fossil's internals are fundamentally broken down into two basic
parts. The first is a "collection of blobs." The simplest way to
think of this (and it's not far from the full truth) is a
directory containing lots of files, each one named after a hash of
its contents. This pool contains ALL content required for a
repository - all other data can be generated from data contained
here. Included in the blob pool are so-called Artifacts. Artifacts
are simple text files with a very strict format, which hold
information regarding the identities of, relationships involving,
and other metadata for each type of blob in the pool. The most
fundamental Artifact type is called a Manifest, and a Manifest
tells us, amongst other things, which of the hash-based file names
has which "real" file name, which version the parent (or parents!)
is (or are), and other data required for a "commit" operation.
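The "named after a hash of its contents" idea can be sketched as
follows. Note the assumptions: Fossil uses SHA1 or SHA3 hashes,
whereas this sketch substitutes the much simpler (and
non-cryptographic) 64-bit FNV-1a hash purely to stay self-contained,
and the demo_hash() and demo_blob_name() functions are hypothetical
names which are not part of any Fossil API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// FNV-1a (64-bit): a stand-in for a real cryptographic hash.
uint64_t demo_hash( void const * src, size_t n ){
  unsigned char const * p = src;
  uint64_t h = UINT64_C(14695981039346656037);
  assert( src || 0==n );
  for( size_t i = 0; i < n; ++i ){
    h ^= p[i];
    h *= UINT64_C(1099511628211);
  }
  return h;
}

// Writes the blob's "name" (its hash as 16 hex digits plus a NUL)
// to dest, which must have room for at least 17 bytes.
void demo_blob_name( void const * src, size_t n, char * dest ){
  snprintf( dest, 17, "%016llx", (unsigned long long)demo_hash(src, n) );
}
```

Identical content always produces the same name, so storing the same
file twice costs nothing extra, and any change to the content
produces a different name, which is what lets a Manifest refer to
exact file versions by hash.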
The blob pool and the Manifests are all a Fossil repository really
needs in order to function. On top of that basis, other forms of
Artifacts provide features such as tagging (which is the basis of
branching and merging), wiki pages, and tickets. From those
Artifacts, Fossil can create/calculate all sorts of
information. For example, as new Artifacts are inserted it
transforms the Artifact's metadata into a relational model which
sqlite can work with. That leads us to what is conceptually the
next-higher-up level, but is in practice a core-most component...
Storage. Fossil's core model is agnostic about how its blobs are
stored, but libfossil and fossil(1) both make heavy use of sqlite
to implement many of their features. These include:
- Transaction-capable storage. It's almost impossible to corrupt a
Fossil db in normal use. sqlite3 offers literally the most robust
general-purpose file format on the planet.
- The storage of the raw blobs.
- Artifact metadata is transformed into various DB structures
which allow libfossil to traverse historical data much more
efficiently than would be possible without a db-like
infrastructure (and everything that implies). These structures are
kept up to date as new Artifacts are stored in a repository,
either via local edits or synching in remote content. These data
are incrementally updated as changes are made to a repo.
- A tremendous amount of the "leg-work" in processing the
repository state is handled by SQL queries, without which the
library would easily require 5-10x more code in the form of
equivalent hard-coded data structures and corresponding
functionality. The db approach allows us to create ad-hoc structures as
we need them, providing us a great deal of flexibility.
All content in a Fossil repository is in fact stored in a single
database file. Fossil additionally uses another database (a
"checkout" db) to keep track of local changes, but the repo
contains all "fossilized" content. Each copy of a repo is a
full-fledged repo, each capable of acting as a central copy for
any number of clones or checkouts.
That's really all there is to understand about Fossil. How it does
its magic, keeping everything aligned properly, merging in
content, how it stores content, etc., are all internal details
which most clients will not need to know anything about in order
to make use of fossil(1). Using libfossil effectively, though,
does require learning _some_ amount of how Fossil works. That will
require taking some time with _other_ docs, however: see the
links at the top of this section for some starting points.
Sidebar:
- The only file-level permission Fossil tracks is the "executable"
(a.k.a. "+x") bit. It internally marks symlinks as a permission
attribute, but that is applied much differently than the
executable bit and only does anything useful on platforms which
support symlinks.
*/
#endif
/* ORG_FOSSIL_SCM_PAGES_H_INCLUDED */