/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* vim: set ts=2 et sw=2 tw=80: */
#if !defined(NET_FOSSIL_SCM_PAGES_H_INCLUDED)
#define NET_FOSSIL_SCM_PAGES_H_INCLUDED
/*
Copyright (c) 2013 D. Richard Hipp
This program is free software; you can redistribute it and/or
modify it under the terms of the Simplified BSD License (also
known as the "2-Clause License" or "FreeBSD License".)
This program is distributed in the hope that it will be useful,
but without any warranty; without even the implied warranty of
merchantability or fitness for a particular purpose.
Author contact information:
drh@hwaci.com
http://www.hwaci.com/drh/
*****************************************************************************
This file contains only Doxygen-format documentation, split up into
Doxygen "pages", each covering some topic at a high level. This is
not the place for general code examples - those belong with their
APIs.
*/
/** @mainpage libfossil
Forewarning: this API assumes one is familiar with the Fossil SCM,
ideally in detail. The Fossil SCM can be found at:
http://fossil-scm.org
libfossil is an experimental/prototype library API for the Fossil
SCM. This API concerns itself only with the components of fossil
which do not need user interaction or the display of UI components
(including HTML and CLI output). It is intended only to model the
core internals of fossil, off of which user-level applications
could be built.
The project's repository and additional information can be found at:
http://fossil.wanderinghorse.net/repos/libfossil/
This code is 100% hypothetical/potential, and does not represent
any Official effort of the Fossil project. It is up for any amount
of change at any time and does not yet have a stable API.
All Fossil users are encouraged to participate in its development,
but if you are reading this then you probably already knew that
:).
This effort does not represent "Fossil Version 2", but provides an
alternate method of accessing and manipulating fossil(1)
repositories. Whereas fossil(1) is a monolithic binary, this API
provides library-level access to (some level of) the fossil(1)
feature set (that level of support grows approximately linearly
with each new commit).
Current status: alpha. Some bits are basically finished but there
is a lot of work left to do. The scope is pretty much all
Fossil-related functionality which does not require a user
interface or direct user interaction, plus some range of utilities
to support those which require a UI/user.
*/
/** @page page_terminology Fossil Terminology
See also: http://fossil-scm.org/index.html/doc/trunk/www/concepts.wiki
The libfossil API docs normally assume one is familiar with
Fossil-internal terminology, which is of course a silly assumption
to make. Indeed, one of libfossil's goals is to make Fossil more
accessible, partly by demystifying it. To that end, here is a
collection of terms one may come across in the API, along with
their meanings in the context of Fossil...
- REPOSITORY (a.k.a. "repo") is an sqlite database file which
contains all content for a given "source tree." (We will use the
term "source tree" to mean any tree of "source" (documents,
whatever) a client has put under Fossil's supervision.)
- CHECKOUT (a.k.a. "local source tree" or "working copy") refers
to (A) the action of pulling a specific version of a repository's
state from that repo into the local filesystem, and (B) a local
copy "checked out" of a repo. e.g. "he checked out the repo," and
"the changes are in his [local] checkout."
- ARTIFACT is the generic term for anything stored in a repo. More
specifically, ARTIFACT refers to "control structures" Fossil uses
to internally track changes. These artifacts are stored as blobs
in the database, just like any other content. For complete details
and examples, see:
http://fossil-scm.org/index.html/doc/tip/www/fileformat.wiki
- A MANIFEST is a specific type of ARTIFACT - the type which
records all metadata for a COMMIT operation (which files, which
user, the timestamp, checkin comment, lineage, etc.). For
historical reasons, MANIFEST is sometimes used as a generic term
for ARTIFACT because what the fossil(1)-internal APIs originally
called a Manifest eventually grew into other types of artifacts
but kept the Manifest naming convention. In Fossil developer
discussion, "manifest" most often means what this page calls
ARTIFACT (probably because that is how the C code is modelled). The
libfossil API uses the term "deck" instead of "manifest" to
avoid ambiguity/confusion (or to move the confusion somewhere
else, at least).
- CHECKIN is the term libfossil prefers to use for COMMIT
MANIFESTS. It is also the action of "checking in"
(a.k.a. "committing") file changes to a repository. A CHECKIN
ARTIFACT can be one of two types: a BASELINE MANIFEST (or BASELINE
CHECKIN) contains a list of all files in that version of the
repository, including their file permissions and the UUIDs of
their content. A DELTA MANIFEST is a checkin record which derives
from a BASELINE MANIFEST and it lists only the file-level changes
which happened between the baseline and the delta, recording any
changes in content, permissions, or name, and recording
deletions. Note that this inheritance of deltas from baselines is
an internal optimization which has nothing to do with checkin
version inheritance - the baseline of any given delta is normally
_not_ its direct checkin version parent.
- BRANCH, FORK, and TAG are all closely related in Fossil and are
explained in detail (with pictures!) at:
http://fossil-scm.org/index.html/doc/trunk/www/concepts.wiki
In short: BRANCHes and FORKs are two names for the same thing, and
both are just a special-case usage of TAGs.
- MERGE or MERGING: the process of integrating one version of
source code into another version of that source code, using a
common parent version as the basis for comparison. This is
normally fully automated, but occasionally human (and sometimes
Divine) intervention is required to resolve so-called "merge
conflicts," where two versions of a file change the same parts of
a common parent version.
- RID (Record ID) is a reference to the blob.rid field in a
repository DB. RIDs are used extensively throughout the API for
referencing content records, but they are transient values local
to a given copy of a given repository at a given point in
time. They _can_ change, even for the same content, (e.g. a
rebuild can hypothetically change them, though it might not, and
re-cloning a repo may very well change some RIDs). Clients must
never rely on them for long-term reference to SCM'd data - always use
the full UUID of such data. Even though they normally appear to be
static, they are most explicitly NOT guaranteed to be. Nor are
their values guaranteed to imply any meaning, e.g. "higher is
newer" is not necessarily true because synchronization can import
new remote content in an arbitrary order and a rebuild might
import it in random order. The API uses RIDs basically as handles
to arbitrary blob content and, like most C-side handles, must be
considered transient in nature. That said, within the db, records
are linked to each other exclusively using RIDs, so they do have
some persistence guarantees for a given db instance.
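The baseline/delta relationship described under CHECKIN can be
illustrated with a small, self-contained sketch. This is not
libfossil code: demo_fcard and demo_file_uuid() are hypothetical
names, and the use of a NULL UUID to mark a deletion is an
assumption made for this example. It only shows the lookup order: a
delta's entries override those of its baseline.

```c
#include <assert.h>
#include <string.h>

// One file entry of a checkin (hypothetical type). In a delta, a
// NULL uuid marks a deletion (an assumption of this sketch).
typedef struct {
  char const *name; // file name within the source tree
  char const *uuid; // hash of the file's content
} demo_fcard;

// Resolves the content UUID of the named file for a delta checkin:
// the delta's own entries win, otherwise the baseline is consulted.
// Returns NULL if the file does not exist in that version.
char const *demo_file_uuid(demo_fcard const *baseline, int nBase,
                           demo_fcard const *delta, int nDelta,
                           char const *name) {
  int i;
  assert(name);
  for (i = 0; i < nDelta; ++i) {
    if (0 == strcmp(delta[i].name, name)) return delta[i].uuid;
  }
  for (i = 0; i < nBase; ++i) {
    if (0 == strcmp(baseline[i].name, name)) return baseline[i].uuid;
  }
  return NULL;
}
```

A file which the delta does not mention resolves to the baseline's
entry, a changed file resolves to the delta's entry, and a deleted
file resolves to NULL.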
More to come...
*/
/** @page page_APIs High-level API Overview
The primary end goals of this project are to eventually cover the
following feature areas:
- Provide embeddable SCM to local apps using sqlite storage.
- Provide a network layer on top of that for synchronization.
- Provide apps on top of those to allow administration of repos.
To those ends, the fossil APIs cover the following categories of
features:
Filesystem:
- Conversions of strings from OS-native encodings to UTF.
fsl_utf8_to_unicode(), fsl_filename_to_utf8(), etc. These are
primarily used internally but may also be useful for applications
working with files (as most clients will). Actually... most of
these bits are only needed for portability across Windows
platforms.
- Locating a user's home directory: fsl_find_home_dir()
- Normalizing filenames/paths. fsl_file_canonical_name() and friends.
- Checking for existence, size, and type (file vs directory) with
fsl_is_file() and fsl_dir_check(), or the more general-purpose
fsl_stat().
Databases (sqlite):
- Opening/closing sqlite databases and running queries on them,
independent of version control features. See fsl_db_open() and
friends. The actual sqlite-level DB handle type is abstracted out
of the public API, largely to simplify an eventual port from
sqlite3 to sqlite4 or (hypothetically) to other storage back-ends
(not gonna happen - too much work).
- There are lots of utility functions for oft-used operations,
e.g. fsl_config_get_int32() and friends to fetch settings from one
of the three different configuration areas (global, repository,
and checkout).
- Pseudo-recursive transactions: fsl_db_transaction_begin() and
fsl_db_transaction_end().
- Cached statements (an optimization for oft-used queries):
fsl_db_prepare_cached() and friends.
The DB API is (as Brad put so well) "very present" in the public
API. While the core API provides access to the underlying
repository data, it cannot begin to cover even a small portion of
potential use cases. To that end, it exposes the DB API so that
clients who want to construct their own data can do so. It does
require research into the underlying schemas, but gives
applications the ability to do _anything_ with their repositories
which the core API does not account for. Historically, the ability
to create ad-hoc data structures as needed, in the form of SQL
queries, has accounted for much of Fossil's feature flexibility.
Deltas:
- Creation and application of raw deltas, using Fossil's delta
format, independent of version control features. See
fsl_delta_create() and friends. These are normally used only at
the deepest internal levels of fossil, but the APIs are exposed so
that clients can, if they wish, use them to deltify their own
content independently of fossil's internally-applied
deltification. Doing so is remarkably easy, but completely
unnecessary for content which will be stored in a repo, as Fossil
creates deltas as needed.
SCM:
- A "context" type (fsl_cx) which manages a repository db and,
optionally, a checkout db. Read-only operations on the DB are
working and write functionality (adding repo content) is
ongoing. See fsl_cx, fsl_cx_init(), and friends.
- The fsl_deck class assists in parsing, creating, and outputting
"artifacts" (manifests, control (tags), events, etc.). It gets its
name from its being a container for "a collection of cards" (which is
what a Fossil artifact is).
- fsl_content_get() expands a (possibly) deltified blob into its
full form, and fsl_content_blob() can be used to fetch a raw blob
(possibly a raw delta).
- A number of routines exist for converting symbol names to RIDs
(fsl_sym_to_rid()), UUIDs to RIDs (fsl_uuid_to_rid()),
and similar commonly-needed lookups.
Input/Output:
- The API defines several abstractions for i/o interfaces, e.g.
fsl_input_f() and fsl_output_f(), which allow us to accept/emit
data from/to arbitrary sources/destinations. A fsl_cx instance is
configured with an output channel, the intention being that all
clients of that context should generate any output through that
channel, so that all compatible apps can cooperate more easily in
terms of i/o. For example, the th1ish script binding for libfossil
routes fsl_output() through the script's i/o channels, so that any
output generated by libfossil-using code it links to can take
advantage of the script-side output features (such as output
buffering, which is needed for any non-trivial CGI output).
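The idea of routing all output through a client-supplied channel
can be sketched in a few lines. The following is a self-contained
illustration, not the library's implementation: the demo_ names
are hypothetical and the callback signature (state, source bytes,
byte count, 0 on success) is an assumption chosen for this
example. See fsl_output_f() for the real interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical output-callback interface: consumes n bytes from
// src and returns 0 on success. The meaning of state depends on
// the implementation.
typedef int (*demo_output_f)(void *state, void const *src, size_t n);

// Implementation for FILE handles: state must be a (FILE*).
int demo_output_f_FILE(void *state, void const *src, size_t n) {
  return (n == fwrite(src, 1, n, (FILE *)state)) ? 0 : -1;
}

// A small fixed-size in-memory destination.
typedef struct {
  char mem[32];
  size_t used;
} demo_membuf;

// Implementation for demo_membuf: state must be a (demo_membuf*).
// Fails with -1, changing nothing, if the bytes do not fit.
int demo_output_f_membuf(void *state, void const *src, size_t n) {
  demo_membuf *b = (demo_membuf *)state;
  assert(b);
  if (n >= sizeof(b->mem) - b->used) return -1; // keep room for NUL
  memcpy(b->mem + b->used, src, n);
  b->used += n;
  b->mem[b->used] = 0;
  return 0;
}

// "Library-side" code: it neither knows nor cares where its
// output ends up.
int demo_say_hello(demo_output_f out, void *state) {
  char const *msg = "hello, world\n";
  return out(state, msg, strlen(msg));
}
```

The same demo_say_hello() call can target stdout (via
demo_output_f_FILE) or a memory buffer (via demo_output_f_membuf)
without modification, which is the property a script binding
needs in order to capture or buffer output.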
Utilities:
- fsl_buffer, a generic buffer class, is used heavily by the
library. See fsl_buffer and friends.
- fsl_appendf() provides printf()-like functionality, but sends
its output to a callback function (optionally stateful), making it
the one-stop-shop for string formatting within the library.
- The fsl_error class is used to propagate error information
between the library's various levels and the client.
- The fsl_list class acts as a generic container-of-pointers, and
the API provides several convenience routines for managing them,
traversing them, and cleaning them up.
- Hashing: there are a number of routines for calculating SHA1 and
MD5 hashes. See fsl_sha1_cx, fsl_md5_cx, and friends. We haven't yet
had need of an actual hash table class.
- zlib compression is used for storing artifacts. See
fsl_data_is_compressed(), fsl_buffer_compress(), and friends.
*/
/** @page page_porting_checklist Porting Checklist
An overview of what library-level features are implemented and
what's left to do...
- Db abstraction layer: complete and more or less stable.
- Infrastructure for opening/closing checkouts/repos
works. Infrastructure for a config db is in place.
- Fetching blob content (raw or delta-applied) and low-level
content saving is working.
- Artifact (e.g. manifest) parsing, generating, and delta manifest
baseline traversal works. Most artifacts can be exported from a
canonical Fossil repo then parsed and exported by this API with
100% fidelity, with the minor exception that _some_ timestamps
(D-cards) differ by a millisecond (round-trip precision change),
which changes their hash. So far i have only seen the imprecision
affect "artificially generated" artifacts, not "real" ones. Artifacts
are never "round-tripped" like that in real use, anyway - it's only
for testing the parser and generator.
- Adding new control artifacts (tag changes) is basically working.
- Low-level delta generation and application is working, as well
as the (incidentally unrelated) diff-generation code (context- and
side-by-side).
- Manifest crosslinking. This is a large part of what goes on
during any changes to a repository. Most of the work is finished
here but there are still some cases to handle (namely tickets) and
obscene amounts of testing to be done. And a testing
infrastructure needs to be architected and put into place.
- Schema initialization/creation is complete. The rebuild process
(closely related but far more intricate) is far down the list of
TODOs.
- Wiki features are basically working: loading/saving, but
it needs APIs for working with wiki history.
Actively in progress (today==March 14, 2014):
- Event bits
- Application-level bits (::fcli).
- "vfile" (checkout-related) infrastructure is mostly ported
in. This includes checkin support.
- Tickets APIs have been started but have a low priority. The v1
impl requires a good deal of application-level infrastructure
(namely TH1), and there are no plans to port TH1 in at the library
level.
- All of the bits needed for performing a checkout are in place
with the exception of UNDO support and the actual creation of
the checkout db (but we have all the pieces needed for that).
Areas which have not yet been started or where no notable
progress has yet been made, in no particular order:
- Handling of symlinks in a repo.
- The 'rebuild' operation, i think, will essentially be the
ultimate test of the core library components. If it can do that,
it can "probably" do anything else.
- UI. The library has no UI, of course, but as it is fleshed out
one may eventually be needed, even if it's only a CLI shell.
- Synchronization. There are lots of underlying bits to finish
before this can be implemented.
- Networking. Far down the list of TODOs. The core library needs to know
nothing about networking.
- "Received from" (rcvid field) info on artifacts. In v1 this is
tied closely to the network layer.
- Versionable config settings.
- Application/honoring of certain config
settings. e.g. ignore-glob and friends are currently not honored,
and case-insensitivity support is completely untested.
*/
/** @page page_is_isnot Fossil is/is not...
Through porting the main fossil application into library form,
the following things have become very clear (or been reinforced)...
Fossil is...
- _Exceedingly_ robust. Not only is sqlite literally the single
most robust application-agnostic container file format on the
planet, but Fossil goes way out of its way to ensure that what
gets put in is what gets pulled out. It cuts zero corners on data
integrity, even adding in checks which seem superfluous but
provide another layer of data integrity (i'm primarily talking
about the R-card here, but there are other validation checks). It
does this at the cost of memory and performance (that said, it's
still easily fast enough for its intended uses). "Robust" doesn't
mean that it never crashes nor fails, but that it does so with
(insofar as is technically possible) essentially zero chance of
data loss/corruption.
- Long-lived: the underlying data format is independent of its
storage format. It is, in principle, usable by systems as yet
unconceived by the next generation of programmers. This
implementation is based on sqlite, but the model can work with
arbitrary underlying storage.
- Amazingly space-efficient. The size of a repository database
necessarily grows as content is modified. However, Fossil's use of
zlib-compressed deltas, using a very space-efficient delta format,
leads to tremendous compression ratios. As of this writing
(September, 2013), the main Fossil repo contains approximately
1.3GB of content, were we to check out every single version in its
history. Its repository database is only 42MB, however, equating
to a 32:1 compression ratio. Ratios in the range of 20:1 to 40:1
are common, and more active repositories tend to have higher
ratios. The TCL core repository, with just over 15 years of code
history (imported, of course, as Fossil was introduced in 2007),
is only 187MB, with 6.2GB of content and a 33:1 compression ratio.
Fossil is not...
- Memory-light. Even very small uses can easily suck up 1MB of RAM
and many operations (verification of the R card, for example) can
quickly allocate and free up hundreds of MB because they have to
compose various versions of content on their way to a specific
version. To be clear, that is total RAM usage, not _peak_ RAM
usage. Peak usage is normally a function of the content it works
with at a given time. For any given delta application operation,
Fossil needs the original content, the new content, and the delta
all in memory at once, and may go through several such iterations
while resolving deltified content. Verification of its 'R-card'
alone can require a thousand or more underlying DB operations and
hundreds of delta applications. The internals use caching where it
would save us a significant amount of db work relative to the
operation in question, but relatively high memory costs are
unavoidable. That's not to say we can't optimize a bit, but first
make it work, then optimize it. The library takes care to re-use
memory buffers where it is feasible (and not too intrusive) to do
so, but there is yet more RAM to be optimized away in this regard.
*/
/** @page page_threading Threads and Fossil
It is strictly illegal to use a given fsl_cx instance from more
than one thread. Period.
It is legal for multiple contexts to be running in multiple
threads, but only if those contexts use different
repository/checkout databases. Though access to the storage is,
through sqlite, protected via a mutex/lock, this library does not
have a higher-level mutex to protect multiple contexts from
colliding during operations. So... don't do that. One context, one
repo/checkout.
Multiple application instances may each use one fsl_cx instance to
share repo/checkout db files, but must be prepared to handle
locking-related errors in such cases. e.g. db operations which
normally "always work" may suddenly pause for a few seconds before
giving up while waiting on a lock when multiple applications use
the same database files. sqlite's locking behaviours are
documented in great detail at http://sqlite.org.
*/
/** @page page_artifacts Creating Artifacts
A brief overview of artifact creation using this API. This is targeted
at those who are familiar with how artifacts are modelled and generated
in fossil(1).
Primary artifact reference:
http://fossil-scm.org/index.html/doc/trunk/www/fileformat.wiki
In fossil(1), artifacts are generated via the careful crafting of
a memory buffer (large string) in the format described in the
document above. While it's relatively straightforward to do, there
are lots of potential gotchas, and a bug can potentially inject
"bad data" into the repo (though the verify-before-commit process
will likely catch any problems before the commit is allowed to go
through). The libfossil API uses a higher-level (OO) approach,
where the user describes a "deck" of cards and then tells the
library to save it in the repo (fsl_deck_save()) or output it to
some other channel (fsl_deck_output()). The API ensures that the
deck's cards get output in the proper order and that any cards
which require special treatment get that treatment (e.g. the
"fossilize" encoding of certain text fields). The "deck" concept
is equivalent to Artifact in fossil(1), but we use the word deck
because (A) Artifact is highly ambiguous in this context and (B)
deck is arguably the most obvious choice for the name of a type
which acts as a "container of cards."
Ideally, client-level code will never have to create an artifact
via the fsl_deck API (because doing so requires a fairly good
understanding of what the deck is for in the first place,
including the individual Cards). The public API strives to hide
those levels of details, where feasible, or at least provide
simpler/safer alternatives for basic operations. Some operations
may require some level of direct work with a fsl_deck
instance. Likewise, much read-only functionality directly exposes
fsl_deck to clients, so some familiarity with the type and its
APIs will be necessary for most clients.
The process of creating an artifact looks a lot like the following
code example. We have elided error checking for readability
purposes, but in fact this code has undefined behaviour if error
codes are not checked and appropriately reacted to.
@code
fsl_deck deck = fsl_deck_empty;
fsl_deck * d = &deck; // for typing convenience
fsl_deck_init( fslCtx, d, FSL_CATYPE_CONTROL ); // must come first
fsl_deck_D_set( d, fsl_julian_now() );
fsl_deck_U_set( d, "your-fossil-name", -1 );
fsl_deck_T_add( d, FSL_TAGTYPE_ADD, "...uuid being tagged...",
"tag-name", "optional tag value");
...
// unshuffle is necessary when using multi-cards which may
// need sorting (tags, filenames, etc.):
fsl_deck_unshuffle(d, 0);
// Unshuffling is done by the client because the deck is const
// when we output it:
fsl_deck_output( fslCtx, d, fsl_output_f_FILE, stdout );
// note that fsl_deck_save() does the unshuffle itself.
fsl_deck_finalize(d);
@endcode
The order the cards are added to the deck is irrelevant - they
will be output in the order specified by the Fossil specs
regardless of their insertion order. Each setter/adder function
knows, based on the deck's type (set via fsl_deck_init()), whether
the given card type is legal, and will return an error (probably
FSL_RC_TYPE) if an attempt is made to add a card which is illegal
for that deck type. Likewise, fsl_deck_output() and
fsl_deck_save() confirm that the decks they are given contain (A)
only allowed cards and (B) have all required
cards. fsl_deck_save() also sorts any "multi-cards" which need it
(e.g. T- and F-cards).
*/
/** @page page_transactions DB Transactions
The fsl_db_transaction_begin() and fsl_db_transaction_end()
functions implement a basic form of recursive transaction,
allowing the library to start and end transactions at any level
without having to know whether a transaction is already in
progress (sqlite3 does not natively support nested
transactions). A rollback triggered in a lower-level transaction
will propagate the error back through the transaction stack and
roll back the whole transaction, providing us with excellent error
recovery capabilities (meaning we can always leave the db in a
well-defined state).
It is STRICTLY ILLEGAL to EVER begin a transaction using "BEGIN"
or end a transaction by executing "COMMIT" or "ROLLBACK" directly
on a db handle which is associated with a fsl_cx instance. Doing so
bypasses internal state which needs to be kept abreast of things
and will cause Grief and Suffering (on the client's part, not
mine).
Tip: implementing a "dry-run" mode for most fossil operations is
trivial by starting a transaction before performing the
operations. Many operations run in a transaction, but if the
client starts one of his own he can "dry-run" any op by simply
rolling back the transaction he started. Abstractly, that
looks like this pseudocode:
@code
db.begin();
fsl.something();
fsl.somethingElse();
if( dryRun ) db.rollback();
else db.commit();
@endcode
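How a nesting counter can emulate recursive transactions is
sketched below. This is a self-contained model, not the library's
implementation: the demo_ names are hypothetical, and the counters
merely record where a real implementation would execute BEGIN,
COMMIT, or ROLLBACK on the underlying db. Only the outermost level
talks to the database, and a rollback request from any level is
sticky.

```c
#include <assert.h>

// Hypothetical stand-in for a db handle.
typedef struct {
  int level;      // current transaction nesting depth
  int doRollback; // sticky: set once any level asks for a rollback
  int nBegin;     // how often a real BEGIN would have been run
  int nCommit;    // how often a real COMMIT would have been run
  int nRollback;  // how often a real ROLLBACK would have been run
} demo_db;

const demo_db demo_db_empty = {0, 0, 0, 0, 0};

// Starts a (possibly nested) transaction. Always returns 0 here.
int demo_transaction_begin(demo_db *db) {
  assert(db);
  if (0 == db->level) {
    db->doRollback = 0;
    ++db->nBegin; // only the outermost level issues BEGIN
  }
  ++db->level;
  return 0;
}

// Ends one transaction level. Returns 0 on success or -1 if there
// is no matching begin.
int demo_transaction_end(demo_db *db, int doRollback) {
  assert(db);
  if (db->level <= 0) return -1;
  if (doRollback) db->doRollback = 1;
  if (--db->level > 0) return 0; // inner level: defer the decision
  if (db->doRollback) ++db->nRollback;
  else ++db->nCommit;
  return 0;
}

// Two nested levels. The inner one requests a rollback if
// innerFails is non-0, the outer one always asks to commit.
int demo_nested(demo_db *db, int innerFails) {
  demo_transaction_begin(db);
  demo_transaction_begin(db);
  demo_transaction_end(db, innerFails);
  return demo_transaction_end(db, 0);
}
```

With innerFails set, demo_nested() ends in exactly one BEGIN and
one ROLLBACK, even though the outer level asked to commit. That is
the propagation behaviour described above, and it is also why a
client-started outer transaction suffices for a "dry-run" mode.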
*/
/** @page page_code_conventions Code Conventions
Project and Code Conventions...
Foreword: all of this more or less evolved organically or was
inherited from fossil(1) (where it evolved organically, or was
inherited from sqlite (where it evol...)), and is written up here
more or less as a formality. Historically i've not been a fan of
coding conventions, but as someone else put it to me, "the code
should look like it comes from a single source," and the purpose
of this section is to help orient those looking to hack in the
sources. Note that most of what is said below becomes obvious
within a few minutes of looking at the sources - there's nothing
earth-shatteringly new nor terribly controversial here.
The Rules/Suggestions/Guidelines/etc. are as follows...
- C89 wherever possible, with the exception that we optionally
use the C99-specified fixed integer types and their standard
formatting strings when possible (if the platform has them
resp. if the configuration header is configured for them). We also
use/tolerate 'long long' (via sqlite3), which is not strictly C89
but is supported on all modern compilers even when compiling in
C89 mode. For gcc and workalike compilers, the -Wno-long-long flag
can be used to suppress warnings regarding non-standardization of
that type. (Whether or not those warnings appear depends on other
warning levels.) Apropos warning levels...
- The canonical build environment uses the most restrictive set of
warning/error levels possible, with the exception of tolerating
'long long', as mentioned above. It is highly recommended that
non-canonical build environments do the same. Adding -Wall -Werror
-pedantic does _not_ guarantee that all C compliance/portability
problems can be caught by the compiler, but it goes a long way in
helping us to write clean code. The clang compiler is particularly
good at catching minor foo-foo's such as uninitialized variables.
- API docs (as you have probably already noticed) do not (any
longer) follow Fossil's comment style, but instead uses
Doxygen-friendly formatting. Each comment block MUST start with
two or more asterisks, or '*!', or doxygen apparently doesn't
understand it
(http://www.stack.nl/~dimitri/doxygen/manual/docblocks.html). When
adding code snippets and whatnot to docs, please use doxygen
conventions if it is not too much of an inconvenience. All public
APIs must be documented with a useful amount of detail. If you
hate documenting, let me know and i'll document it (it's what i do
for fun).
- Public API members have a fsl_ or FSL_ prefix (fossil_ seems too
long?). For private/static members, anything goes. Optional or
"add-on" APIs (e.g. ::fcli) may use other prefixes, but are
encouraged to use an "f-word" (as it were), simply out of deference
to long-standing software naming conventions.
- Structs and functions use lower_underscore_style()
- Overall style, especially scope blocks and indentation, should
follow Fossil v1.x. We are not at all picky about whether or not
there is a space after/before parens in if( foo ), and similar
small details, just the overall code pattern.
- Structs and enums all get the optional typedef so that they do
not need to be qualified with 'struct' resp. 'enum' when used.
- Function typedefs are named fsl_XXX_f. Implementations of such
typedefs/interfaces are typically named fsl_XXX_f_SUFFIX(), where
SUFFIX describes the implementation's
specialization. e.g. fsl_output_f() is a callback
typedef/interface and fsl_output_f_FILE() is a concrete
implementation for FILE handles.
- Typedefs for non-struct types (numerics and enums) tend to be
named fsl_XXX_t.
- Functions follow the naming pattern prefix_NOUN_VERB(), rather
than the more C-conventional prefix_VERB_NOUN(),
e.g. fsl_foo_get() and fsl_foo_set() rather than fsl_get_foo() and
fsl_set_foo(). The primary reasons are (A) sortability for
document processors and (B) they more naturally match with OO API
conventions, e.g. noun.verb(). A few cases knowingly violate this
convention for the sake of readability or sorting of several related
functions (e.g. fsl_db_get_XXX() instead of fsl_db_XXX_get()).
- Structs intended to be creatable on the stack are accompanied by
a const instance named fsl_STRUCT_NAME_empty, and possibly by a
macro named fsl_STRUCT_NAME_empty_m, both of which are
"default-initialized" instances of that struct. This is superior
to using memset() for struct initialization because we can define
(and document) arbitrary default values and all clients who
copy-construct them are unaffected by many types of changes to the
struct's signature (though they may need a recompile). The
intention of the fsl_STRUCT_NAME_empty_m macro is to provide a
struct-embeddable form for use in other structs or
copy-initialization of const structs, and the _m macro is always
used to initialize its const struct counterpart. e.g. the library
guarantees that fsl_cx_empty_m (a macro representing an empty
fsl_cx instance) holds the same default values as fsl_cx_empty (a
const fsl_cx value).
- Returning int vs fsl_int_t vs fsl_size_t: int is used as a
conventional result code. fsl_int_t is used as a signed
length-style result code (e.g. printf() semantics). Unsigned
ranges use fsl_size_t. char is used to indicate a boolean. ints
are (also) used as a "triplean" (3 potential values, e.g. <0, 0,
>0). fsl_int_t also guarantees that it will be 64-bit if
available, so can be used for places where large values are needed
but a negative value is legal (or handy), e.g. fsl_strndup()'s
second argument. The use of the fsl_xxx_t typedefs, rather than
(unsigned) int, is primarily for readability/documentation,
e.g. so that readers can know immediately that the function does
not use integer argument or result-code return semantics. It also
allows us to better define platform-portable printf/scanf-style
format modifiers for them (analogous to C99's PRIi32 and friends),
which often come in handy.
- Signed vs. unsigned types for size/length arguments: use the
fsl_int_t (signed) argument type when the client may legally pass
in a negative value as a hint that the API should use fsl_strlen()
(or similar) to determine a byte array's length. Use fsl_size_t
when no automatic length determination is possible (or desired),
to "force" the client to pass the proper length. Internally
fsl_int_t is used in some places where fsl_size_t "should" be used
because some ported-in logic relies on loop control vars being
able to go negative. Additionally, fossil internally uses negative
blob lengths to mark phantom blobs, and care must be taken when
using fsl_size_t with those.
- Functions taking ellipses (...) are accompanied by a va_list
counterpart named the same as the (...) form plus a trailing
'v'. e.g. fsl_appendf() and fsl_appendfv(). We do not use the
printf()/vprintf() convention because that hoses sorting of the
functions in generated/filtered API documentation.
- Error handling/reporting: please keep in mind that the core code
is a library, not an application. The main implication is that
all lib-level code needs to check for errors wherever they can
happen (e.g. on every single memory allocation, of which there are
many) and propagate errors to the caller, to be handled at his
discretion. The app-level code (::fcli) is not particularly strict
in this regard, and installs its own allocator which abort()s on
allocation error, which simplifies app-side code somewhat
vis-a-vis lib-level code.
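A minimal sketch of three of the conventions above: a negative length
acting as a "use strlen()" hint, a (...) function paired with a
trailing-'v' va_list counterpart, and errors being propagated to the
caller instead of aborting. All demo_xxx names, the demo_int_t typedef,
and the result codes are hypothetical stand-ins for illustration and are
not part of the libfossil API.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for fsl_int_t; the real typedef may differ.
typedef long long demo_int_t;

// Hypothetical result codes: 0 means success, non-0 means error.
enum { DEMO_RC_OK = 0, DEMO_RC_OOM = 1, DEMO_RC_MISUSE = 2 };

// Copies at most len bytes of src into a new NUL-terminated string.
// A negative len is a hint to measure src with strlen(). Returns
// NULL if src is NULL or on allocation error. The caller owns the
// result and must free() it.
char * demo_strndup(char const * src, demo_int_t len){
  size_t n = 0;
  char * rc;
  if(!src) return NULL;
  if(len < 0){
    n = strlen(src);
  }else{
    // Never read past the terminating NUL, even if len is too large.
    while(n < (size_t)len && src[n]) ++n;
  }
  rc = (char *)malloc(n + 1);
  if(!rc) return NULL; // report the error, do not abort() in lib code
  memcpy(rc, src, n);
  rc[n] = 0;
  return rc;
}

// va_list form: formats into a newly allocated string stored in *out.
int demo_mprintfv(char ** out, char const * fmt, va_list args){
  va_list copy;
  int n;
  char * buf;
  if(!out || !fmt) return DEMO_RC_MISUSE;
  va_copy(copy, args);
  n = vsnprintf(NULL, 0, fmt, copy); // first pass measures the output
  va_end(copy);
  if(n < 0) return DEMO_RC_MISUSE;
  buf = (char *)malloc((size_t)n + 1);
  if(!buf) return DEMO_RC_OOM;
  vsnprintf(buf, (size_t)n + 1, fmt, args);
  *out = buf;
  return DEMO_RC_OK;
}

// (...) form: same name minus the trailing 'v'. It only forwards.
int demo_mprintf(char ** out, char const * fmt, ...){
  va_list args;
  int rc;
  va_start(args, fmt);
  rc = demo_mprintfv(out, fmt, args);
  va_end(args);
  return rc;
}
```

Keeping all of the logic in the 'v' form means the (...) wrapper stays
trivial, and clients with their own variadic wrappers can call the 'v'
form directly.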
*/
/** @page page_fossil_arch Fossil Architecture Overview
An introduction to the Fossil architecture. These docs
are basically just a reformulation of other, more detailed,
docs which can be found via the main Fossil site, e.g.:
- http://fossil-scm.org/index.html/doc/trunk/www/concepts.wiki
- http://fossil-scm.org/index.html/doc/trunk/www/fileformat.wiki
Fossil's internals are fundamentally broken down into two basic
parts. The first is a "collection of blobs." The simplest way to
think of this (and it's not far from the full truth) is a
directory containing lots of files, each one named after the SHA1
hash of its contents. This pool contains ALL content required for
a repository - all other data can be generated from data contained
here. Included in the blob pool are so-called Artifacts. Artifacts
are simple text files with a very strict format, which hold
information regarding the identities of, relationships involving,
and other metadata for each type of blob in the pool. The most
basic Artifact type is called a Manifest, and a Manifest tells us,
amongst other things, which of the SHA1-based file names has which
"real" file name, which version is the parent (or which versions
are the parents!), and other data required for a "commit" operation.
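The "collection of blobs" idea can be sketched as a tiny in-memory
content-addressed pool, where each blob's name is derived purely from
its bytes. This is only an illustration: all demo_xxx names are
hypothetical, the pool is a fixed-size array rather than a database,
and 64-bit FNV-1a is swapped in for SHA1 purely to keep the example
short (it is NOT collision-resistant and is unsuitable for real use).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { DEMO_POOL_MAX = 16, DEMO_NAME_LEN = 16 };

typedef struct {
  char name[DEMO_NAME_LEN + 1]; // hex name derived from the content
  unsigned char * data;         // private copy of the content
  size_t len;
} demo_blob;

typedef struct {
  demo_blob blobs[DEMO_POOL_MAX];
  size_t count;
} demo_pool;

// 64-bit FNV-1a: a toy stand-in for SHA1 in this sketch.
static unsigned long long demo_hash(void const * mem, size_t len){
  unsigned char const * p = (unsigned char const *)mem;
  unsigned long long h = 14695981039346656037ULL;
  size_t i;
  for(i = 0; i < len; ++i){
    h ^= p[i];
    h *= 1099511628211ULL;
  }
  return h;
}

// Looks up a blob by its name. Returns NULL if there is no match.
demo_blob const * demo_pool_get(demo_pool const * pool, char const * name){
  size_t i;
  for(i = 0; i < pool->count; ++i){
    if(0 == strcmp(pool->blobs[i].name, name)) return &pool->blobs[i];
  }
  return NULL;
}

// Stores a copy of mem under its content-derived name and writes that
// name to nameOut, which must have room for DEMO_NAME_LEN+1 bytes.
// Storing identical content twice is a no-op. Returns 0 on success,
// non-0 if the pool is full or on allocation error.
int demo_pool_put(demo_pool * pool, void const * mem, size_t len,
                  char * nameOut){
  demo_blob * b;
  snprintf(nameOut, DEMO_NAME_LEN + 1, "%016llx", demo_hash(mem, len));
  if(demo_pool_get(pool, nameOut)) return 0; // already in the pool
  if(pool->count >= DEMO_POOL_MAX) return 1;
  b = &pool->blobs[pool->count];
  b->data = (unsigned char *)malloc(len ? len : 1);
  if(!b->data) return 2;
  memcpy(b->data, mem, len);
  b->len = len;
  memcpy(b->name, nameOut, DEMO_NAME_LEN + 1);
  ++pool->count;
  return 0;
}
```

Because the name depends only on the content, identical content is
stored once no matter how many "real" file names refer to it.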
The blob pool and the Manifests are all a Fossil repository really
needs in order to function. On top of that basis, other forms of
Artifacts provide features such as tagging (which is the basis of
branching and merging), wiki pages, and tickets. From those
Artifacts, Fossil can create/calculate all sorts of
information. For example, as new Artifacts are inserted it
transforms the Artifact's metadata into a relational model which
sqlite can work with. That leads us to what is conceptually the
next-higher-up level, but is in practice a core-most component...
Storage. Fossil's core model is agnostic about how its blobs are
stored, but libfossil and fossil(1) both make heavy use of sqlite
to implement many of their features. These include:
- Transaction-capable storage. It's almost impossible to corrupt a
Fossil db in normal use. sqlite3 offers literally the most robust
general-purpose file format on the planet.
- The storage of the raw blobs.
- Artifact metadata is transformed into various DB structures
which allow libfossil to traverse historical data much more
efficiently than would be possible without a db-like
infrastructure (and everything that implies). These structures are
incrementally updated as new Artifacts are stored in a repository,
either via local edits or syncing in remote content.
- A tremendous amount of the "leg-work" in processing the
repository state is handled by SQL queries, without which the
library would easily require 5-10x more code in the form of
equivalent hard-coded data structures and corresponding
functionality. The db approach allows us to create ad-hoc structures
as we need them, giving us a great deal of flexibility.
All content in a Fossil repository is in fact stored in a single
database file. Fossil additionally uses another database (a
"checkout" db) to keep track of local changes, but the repo
contains all "fossilized" content. Each copy of a repo is a
full-fledged repo, each capable of acting as a central copy for
any number of clones or checkouts.
That's really all there is to understand about Fossil. How it does
its magic, keeping everything aligned properly, merging in
content, how it stores content, etc., are all internal details
which most clients will not need to know anything about in order
to make use of fossil(1). Using libfossil effectively, though,
does require learning _some_ amount of how Fossil works. That will
require taking some time with _other_ docs, however: see the
links at the top of this section for some starting points.
Sidebar:
- The only file-level permission Fossil tracks is the "executable"
(a.k.a. "+x") bit. It internally records symlink status as a
permission attribute, but that is applied quite differently from
the executable bit and is only useful on platforms which support
symlinks.
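As a rough illustration of how those two attributes surface in a
Manifest, the sketch below parses a single file ("F") card. It assumes
the card layout described in the fileformat docs linked at the top of
this section, namely "F name hash" followed by an optional permission
token, where "x" marks an executable and "l" a symlink. The demo_xxx
names are hypothetical, and the sketch ignores details such as escaped
spaces in file names and the optional old-name field of renamed files.

```c
#include <stdio.h>
#include <string.h>

typedef enum {
  DEMO_PERM_REGULAR = 0,
  DEMO_PERM_EXE,
  DEMO_PERM_LINK
} demo_perm;

typedef struct {
  char name[256]; // the "real" file name
  char hash[65];  // name of the content blob in the pool
  demo_perm perm;
} demo_fcard;

// Parses one F-card line into out. Returns 0 on success and non-0 if
// the line is not an F-card with at least a name and a hash.
int demo_fcard_parse(char const * line, demo_fcard * out){
  char perm[8] = {0};
  int n;
  if(!line || !out) return 1;
  if('F' != line[0] || ' ' != line[1]) return 2;
  n = sscanf(line + 2, "%255s %64s %7s", out->name, out->hash, perm);
  if(n < 2) return 2;
  if(0 == strcmp(perm, "x")) out->perm = DEMO_PERM_EXE;
  else if(0 == strcmp(perm, "l")) out->perm = DEMO_PERM_LINK;
  else out->perm = DEMO_PERM_REGULAR;
  return 0;
}
```

Any other permission token, or none at all, is treated as a regular
file in this sketch.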
*/
#endif
/* NET_FOSSIL_SCM_PAGES_H_INCLUDED */