/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* vim: set ts=2 et sw=2 tw=80: */
#if !defined(NET_FOSSIL_SCM_PAGES_H_INCLUDED)
#define NET_FOSSIL_SCM_PAGES_H_INCLUDED
/*
  Copyright (c) 2013 D. Richard Hipp

  This program is free software; you can redistribute it and/or
  modify it under the terms of the Simplified BSD License (also
  known as the "2-Clause License" or "FreeBSD License".)

  This program is distributed in the hope that it will be useful,
  but without any warranty; without even the implied warranty of
  merchantability or fitness for a particular purpose.

  Author contact information:
    drh@hwaci.com
    http://www.hwaci.com/drh/

  *****************************************************************************
  This file contains only Doxygen-format documentation, split up into
  Doxygen "pages", each covering some topic at a high level. This is
  not the place for general code examples - those belong with their
  APIs.
*/

/** @mainpage libfossil

Forewarning: this API assumes one is familiar with the Fossil SCM,
ideally in detail. The Fossil SCM can be found at:

http://fossil-scm.org

libfossil is an experimental/prototype library API for the Fossil
SCM. This API concerns itself only with the components of fossil
which do not need user interaction or the display of UI components
(including HTML and CLI output). It is intended only to model the
core internals of fossil, off of which user-level applications could
be built.

The project's repository and additional information can be found at:

http://fossil.wanderinghorse.net/repos/libfossil/

This code is 100% hypothetical/potential, and does not represent any
Official effort of the Fossil project. It is up for any amount of
change at any time and does not yet have a stable API.

All Fossil users are encouraged to participate in its development,
but if you are reading this then you probably already knew that :).

This effort does not represent "Fossil Version 2", but provides an
alternate method of accessing and manipulating fossil(1)
repositories. Whereas fossil(1) is a monolithic binary, this API
provides library-level access to (some level of) the fossil(1)
feature set (that level of support grows approximately linearly with
each new commit).

Current status: alpha. Some bits are basically finished but there is
a lot of work left to do. The scope is pretty much all Fossil-related
functionality which does not require a user interface or direct user
interaction, plus some range of utilities to support those which
require a UI/user.
*/

/** @page page_terminology Fossil Terminology

See also: http://fossil-scm.org/index.html/doc/trunk/www/concepts.wiki

The libfossil API docs normally assume one is familiar with
Fossil-internal terminology, which is of course a silly assumption to
make. Indeed, one of libfossil's goals is to make Fossil more
accessible, partly by demystifying it. To that end, here is a
collection of terms one may come across in the API, along with their
meanings in the context of Fossil...

- REPOSITORY (a.k.a. "repo") is an sqlite database file which
  contains all content for a given "source tree." (We will use the
  term "source tree" to mean any tree of "source" (documents,
  whatever) a client has put under Fossil's supervision.)

- CHECKOUT (a.k.a. "local source tree" or "working copy") refers to
  (A) the action of pulling a specific version of a repository's
  state from that repo into the local filesystem, and (B) a local
  copy "checked out" of a repo. e.g. "he checked out the repo," and
  "the changes are in his [local] checkout."

- ARTIFACT is the generic term for anything stored in a repo. More
  specifically, ARTIFACT refers to "control structures" Fossil uses
  to internally track changes. These artifacts are stored as blobs in
  the database, just like any other content.
  For complete details and examples, see:
  http://fossil-scm.org/index.html/doc/tip/www/fileformat.wiki

- A MANIFEST is a specific type of ARTIFACT - the type which records
  all metadata for a COMMIT operation (which files, which user, the
  timestamp, checkin comment, lineage, etc.). For historical reasons,
  MANIFEST is sometimes used as a generic term for ARTIFACT because
  what the fossil(1)-internal APIs originally called a Manifest
  eventually grew into other types of artifacts but kept the Manifest
  naming convention. In Fossil developer discussion, "manifest" most
  often means what this page calls ARTIFACT (probably because that is
  how the C code is modelled). The libfossil API uses the term "deck"
  instead of "manifest" to avoid ambiguity/confusion (or to move the
  confusion somewhere else, at least).

- CHECKIN is the term libfossil prefers to use for COMMIT MANIFESTS.
  It is also the action of "checking in" (a.k.a. "committing") file
  changes to a repository. A CHECKIN ARTIFACT can be one of two
  types: a BASELINE MANIFEST (or BASELINE CHECKIN) contains a list of
  all files in that version of the repository, including their file
  permissions and the UUIDs of their content. A DELTA MANIFEST is a
  checkin record which derives from a BASELINE MANIFEST and lists
  only the file-level changes which happened between the baseline and
  the delta, recording any changes in content, permissions, or name,
  and recording deletions. Note that this inheritance of deltas from
  baselines is an internal optimization which has nothing to do with
  checkin version inheritance - the baseline of any given delta is
  normally _not_ its direct checkin version parent.

- BRANCH, FORK, and TAG are all closely related in Fossil and are
  explained in detail (with pictures!) at:
  http://fossil-scm.org/index.html/doc/trunk/www/concepts.wiki
  In short: BRANCHes and FORKs are two names for the same thing, and
  both are just a special-case usage of TAGs.

- MERGE or MERGING: the process of integrating one version of source
  code into another version of that source code, using a common
  parent version as the basis for comparison. This is normally fully
  automated, but occasionally human (and sometimes Divine)
  intervention is required to resolve so-called "merge conflicts,"
  where two versions of a file change the same parts of a common
  parent version.

- RID (Record ID) is a reference to the blob.rid field in a
  repository DB. RIDs are used extensively throughout the API for
  referencing content records, but they are transient values local to
  a given copy of a given repository at a given point in time. They
  _can_ change, even for the same content (e.g. a rebuild can
  hypothetically change them, though it might not, and re-cloning a
  repo may very well change some RIDs). Clients must never rely on
  them for long-term reference to SCM'd data - always use the full
  UUID of such data. Even though they normally appear to be static,
  they are most explicitly NOT guaranteed to be. Nor are their values
  guaranteed to imply any meaning, e.g. "higher is newer" is not
  necessarily true because synchronization can import new remote
  content in an arbitrary order and a rebuild might import it in
  random order. The API uses RIDs basically as handles to arbitrary
  blob content and, like most C-side handles, they must be considered
  transient in nature. That said, within the db, records are linked
  to each other exclusively using RIDs, so they do have some
  persistence guarantees for a given db instance.

More to come...
*/

/** @page page_APIs High-level API Overview

The primary end goals of this project are to eventually cover the
following feature areas:

- Provide embeddable SCM to local apps using sqlite storage.
- Provide a network layer on top of that for synchronization.
- Provide apps on top of those to allow administration of repos.

To those ends, the fossil APIs cover the following categories of
features:

Filesystem:

- Conversions of strings from OS-native encodings to UTF.
  fsl_utf8_to_unicode(), fsl_filename_to_utf8(), etc. These are
  primarily used internally but may also be useful for applications
  working with files (as most clients will). Actually... most of
  these bits are only needed for portability across Windows
  platforms.

- Locating a user's home directory: fsl_find_home_dir()

- Normalizing filenames/paths. fsl_file_canonical_name() and friends.

- Checking for existence, size, and type (file vs directory) with
  fsl_is_file() and fsl_dir_check(), or the more general-purpose
  fsl_stat().

Databases (sqlite):

- Opening/closing sqlite databases and running queries on them,
  independent of version control features. See fsl_db_open() and
  friends. The actual sqlite-level DB handle type is abstracted out
  of the public API, largely to simplify an eventual port from
  sqlite3 to sqlite4 or (hypothetically) to other storage back-ends
  (not gonna happen - too much work).

- There are lots of utility functions for oft-used operations,
  e.g. fsl_config_get_int32() and friends to fetch settings from one
  of the three different configuration areas (global, repository, and
  checkout).

- Pseudo-recursive transactions: fsl_db_transaction_begin() and
  fsl_db_transaction_end().

- Cached statements (an optimization for oft-used queries):
  fsl_db_prepare_cached() and friends.

The DB API is (as Brad put so well) "very present" in the public
API. While the core API provides access to the underlying repository
data, it cannot begin to cover even a small portion of potential use
cases. To that end, it exposes the DB API so that clients who want to
construct their own data can do so. It does require research into the
underlying schemas, but gives applications the ability to do
_anything_ with their repositories which the core API does not
account for.
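
The pseudo-recursive transaction idea from the list above can be
sketched, independently of libfossil, as a nesting counter plus a
sticky rollback flag. Everything in this sketch (demo_db,
demo_begin(), demo_end()) is hypothetical and exists only for
illustration: it is not the fsl_db API, and where a real
implementation would send BEGIN/COMMIT/ROLLBACK to sqlite, the sketch
merely counts what would have been executed.

```c
#include <assert.h>

// Hypothetical stand-in for a db handle; NOT the real fsl_db type.
typedef struct demo_db {
  int level;      // current transaction nesting depth
  int doRollback; // sticky: set once any level asks for a rollback
  int commits;    // number of real COMMITs which would have run
  int rollbacks;  // number of real ROLLBACKs which would have run
} demo_db;

// Default-initialized instance, following the "_empty" convention.
static const demo_db demo_db_empty = {0, 0, 0, 0};

// Starts a (possibly nested) transaction. Only the outermost call
// would issue a real BEGIN.
static void demo_begin(demo_db *db) {
  assert(db && db->level >= 0);
  ++db->level;
}

// Ends one nesting level. A non-0 rollback argument at ANY level
// dooms the whole transaction, but the real COMMIT or ROLLBACK is
// deferred until the outermost level ends. Returns 0 on success and
// non-0 if no transaction is active.
static int demo_end(demo_db *db, int rollback) {
  assert(db);
  if (db->level <= 0) return -1;
  if (rollback) db->doRollback = 1;
  if (--db->level > 0) return 0; // inner level: defer the decision
  if (db->doRollback) ++db->rollbacks;
  else ++db->commits;
  db->doRollback = 0;
  return 0;
}
```

In this model an inner demo_end(&db, 1) followed by an outer
demo_end(&db, 0) results in exactly one rollback and no commit, which
is the propagation behaviour one wants from nested transactions.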
Historically, the ability to create ad-hoc data structures as needed,
in the form of SQL queries, has accounted for much of Fossil's
feature flexibility.

Deltas:

- Creation and application of raw deltas, using Fossil's delta
  format, independent of version control features. See
  fsl_delta_create() and friends. These are normally used only at the
  deepest internal levels of fossil, but the APIs are exposed so that
  clients can, if they wish, use them to deltify their own content
  independently of fossil's internally-applied deltification. Doing
  so is remarkably easy, but completely unnecessary for content which
  will be stored in a repo, as Fossil creates deltas as needed.

SCM:

- A "context" type (fsl_cx) which manages a repository db and,
  optionally, a checkout db. Read-only operations on the DB are
  working and write functionality (adding repo content) is
  ongoing. See fsl_cx, fsl_cx_init(), and friends.

- The fsl_deck class assists in parsing, creating, and outputting
  "artifacts" (manifests, control (tags), events, etc.). It gets its
  name from it being a container for "a collection of cards" (which
  is what a Fossil artifact is).

- fsl_content_get() expands a (possibly) deltified blob into its full
  form, and fsl_content_blob() can be used to fetch a raw blob
  (possibly a raw delta).

- A number of routines exist for converting symbol names to RIDs
  (fsl_sym_to_rid()), UUIDs to RIDs (fsl_uuid_to_rid()), and similar
  commonly-needed lookups.

Input/Output:

- The API defines several abstractions for i/o interfaces,
  e.g. fsl_input_f() and fsl_output_f(), which allow us to
  accept/emit data from/to arbitrary sources/destinations. A fsl_cx
  instance is configured with an output channel, the intention being
  that all clients of that context should generate any output through
  that channel, so that all compatible apps can cooperate more easily
  in terms of i/o.
  For example, the th1ish script binding for libfossil routes
  fsl_output() through the script's i/o channels, so that any output
  generated by libfossil-using code it links to can take advantage of
  the script-side output features (such as output buffering, which is
  needed for any non-trivial CGI output).

Utilities:

- fsl_buffer, a generic buffer class, is used heavily by the
  library. See fsl_buffer and friends.

- fsl_appendf() provides printf()-like functionality, but sends its
  output to a callback function (optionally stateful), making it the
  one-stop-shop for string formatting within the library.

- The fsl_error class is used to propagate error information between
  the library's various levels and the client.

- The fsl_list class acts as a generic container-of-pointers, and the
  API provides several convenience routines for managing them,
  traversing them, and cleaning them up.

- Hashing: there are a number of routines for calculating SHA1 and
  MD5 hashes. See fsl_sha1_cx, fsl_md5_cx, and friends. We haven't
  yet had need of an actual hash table class.

- zlib compression is used for storing artifacts. See
  fsl_data_is_compressed(), fsl_buffer_compress(), and friends.
*/

/** @page page_porting_checklist Porting Checklist

An overview of what library-level features are implemented and what's
left to do...

- Db abstraction layer: complete and more or less stable.

- Infrastructure for opening/closing checkouts/repos
  works. Infrastructure for a config db is in place.

- Fetching blob content (raw or delta-applied) and low-level content
  saving is working.

- Artifact (e.g. manifest) parsing, generating, and delta manifest
  baseline traversal works. Most artifacts can be exported from a
  canonical Fossil repo then parsed and exported by this API with
  100% fidelity, with the minor exception that _some_ timestamps
  (D-cards) differ by a millisecond (round-trip precision change),
  which changes their hash.
  So far i have only seen the imprecision affect "artificially
  generated" artifacts, not "real" ones. Artifacts are never
  "round-tripped" like that in real use, anyway - it's only for
  testing the parser and generator.

- Adding new control artifacts (tag changes) is basically working.

- Low-level delta generation and application is working, as well as
  the (incidentally unrelated) diff-generation code (context- and
  side-by-side).

- Manifest crosslinking. This is a large part of what goes on during
  any changes to a repository. Most of the work is finished here but
  there are still some cases to handle (namely tickets) and obscene
  amounts of testing to be done. And a testing infrastructure needs
  to be architected and put into place.

- Schema initialization/creation is complete. The rebuild process
  (closely related but far more intricate) is far down the list of
  TODOs.

- Wiki features (loading/saving) are basically working, but APIs for
  working with wiki history are still needed.

Actively in progress (today==March 14, 2014):

- Event bits

- Application-level bits (::fcli).

- "vfile" (checkout-related) infrastructure is mostly ported in. This
  includes checkin support.

- Tickets APIs have been started but have a low priority. The v1 impl
  requires a good deal of application-level infrastructure (namely
  TH1), and there are no plans to port TH1 in at the library level.

- All of the bits needed for performing a checkout are in place with
  the exception of UNDO support and the actual creation of the
  checkout db (but we have all the pieces needed for that).

Areas which have not yet been started or where no notable progress
has yet been made, in no particular order:

- Handling of symlinks in a repo.

- The 'rebuild' operation, i think, will essentially be the ultimate
  test of the core library components. If it can do that, it can
  "probably" do anything else.

- UI. The library has no UI, of course, but as it is fleshed out one
  may eventually be needed, even if it's only a CLI shell.

- Synchronization. There are lots of underlying bits to finish before
  this can be implemented.

- Networking. Far down the list of TODOs. The core library needs know
  nothing about networking.

- "Received from" (rcvid field) info on artifacts. In v1 this is tied
  closely to the network layer.

- Versionable config settings.

- Application/honoring of certain config settings. e.g. ignore-glob
  and friends are currently not honored, and case-insensitivity
  support is completely untested.
*/

/** @page page_is_isnot Fossil is/is not...

Through porting the main fossil application into library form, the
following things have become very clear (or been reinforced)...

Fossil is...

- _Exceedingly_ robust. Not only is sqlite literally the single most
  robust application-agnostic container file format on the planet,
  but Fossil goes way out of its way to ensure that what gets put in
  is what gets pulled out. It cuts zero corners on data integrity,
  even adding in checks which seem superfluous but provide another
  layer of data integrity (i'm primarily talking about the R-card
  here, but there are other validation checks). It does this at the
  cost of memory and performance (that said, it's still easily fast
  enough for its intended uses). "Robust" doesn't mean that it never
  crashes nor fails, but that it does so with (insofar as is
  technically possible) essentially zero chance of data
  loss/corruption.

- Long-lived: the underlying data format is independent of its
  storage format. It is, in principle, usable by systems as yet
  unconceived by the next generation of programmers. This
  implementation is based on sqlite, but the model can work with
  arbitrary underlying storage.

- Amazingly space-efficient. The size of a repository database
  necessarily grows as content is modified. However, Fossil's use of
  zlib-compressed deltas, using a very space-efficient delta format,
  leads to tremendous compression ratios.
  As of this writing (September, 2013), the main Fossil repo contains
  approximately 1.3GB of content, were we to check out every single
  version in its history. Its repository database is only 42MB,
  however, equating to a 32:1 compression ratio. Ratios in the range
  of 20:1 to 40:1 are common, and more active repositories tend to
  have higher ratios. The TCL core repository, with just over 15
  years of code history (imported, of course, as Fossil was
  introduced in 2007), is only 187MB, with 6.2GB of content and a
  33:1 compression ratio.

Fossil is not...

- Memory-light. Even very small uses can easily suck up 1MB of RAM
  and many operations (verification of the R card, for example) can
  quickly allocate and free up hundreds of MB because they have to
  compose various versions of content on their way to a specific
  version. To be clear, that is total RAM usage, not _peak_ RAM
  usage. Peak usage is normally a function of the content it works
  with at a given time. For any given delta application operation,
  Fossil needs the original content, the new content, and the delta
  all in memory at once, and may go through several such iterations
  while resolving deltified content. Verification of its 'R-card'
  alone can require a thousand or more underlying DB operations and
  hundreds of delta applications. The internals use caching where it
  would save us a significant amount of db work relative to the
  operation in question, but relatively high memory costs are
  unavoidable. That's not to say we can't optimize a bit, but first
  make it work, then optimize it. The library takes care to re-use
  memory buffers where it is feasible (and not too intrusive) to do
  so, but there is yet more RAM to be optimized away in this regard.
*/

/** @page page_threading Threads and Fossil

It is strictly illegal to use a given fsl_cx instance from more than
one thread. Period.

It is legal for multiple contexts to be running in multiple threads,
but only if those contexts use different repository/checkout
databases. Though access to the storage is, through sqlite, protected
via a mutex/lock, this library does not have a higher-level mutex to
protect multiple contexts from colliding during operations. So...
don't do that. One context, one repo/checkout.

Multiple application instances may each use one fsl_cx instance to
share repo/checkout db files, but must be prepared to handle
locking-related errors in such cases. e.g. db operations which
normally "always work" may suddenly pause for a few seconds before
giving up while waiting on a lock when multiple applications use the
same database files. sqlite's locking behaviours are documented in
great detail at http://sqlite.org.
*/

/** @page page_artifacts Creating Artifacts

A brief overview of artifact creation using this API. This is
targeted at those who are familiar with how artifacts are modelled
and generated in fossil(1).

Primary artifact reference:

http://fossil-scm.org/index.html/doc/trunk/www/fileformat.wiki

In fossil(1), artifacts are generated via the careful crafting of a
memory buffer (large string) in the format described in the document
above. While it's relatively straightforward to do, there are lots of
potential gotchas, and a bug can potentially inject "bad data" into
the repo (though the verify-before-commit process will likely catch
any problems before the commit is allowed to go through). The
libfossil API uses a higher-level (OO) approach, where the user
describes a "deck" of cards and then tells the library to save it in
the repo (fsl_deck_save()) or output it to some other channel
(fsl_deck_output()). The API ensures that the deck's cards get output
in the proper order and that any cards which require special
treatment get that treatment (e.g. the "fossilize" encoding of
certain text fields).
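
To give a feel for the "fossilize" encoding mentioned above, here is
a minimal sketch. It assumes only the three escapes described in the
file format document linked above (space becomes "\s", newline
becomes "\n", and backslash is doubled). demo_fossilize() is a
hypothetical helper written for this page, not a libfossil routine,
and the library's real encoder may well escape further characters.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper, NOT part of libfossil. Writes the escaped
// form of the NUL-terminated string src into dest, which has room
// for destLen bytes including the terminating NUL. Returns the
// length of the escaped string, or -1 if dest is too small.
static long demo_fossilize(const char *src, char *dest, size_t destLen) {
  size_t n = 0;
  assert(src && dest);
  for (; *src; ++src) {
    char esc = 0;
    switch (*src) {
      case ' ':  esc = 's';  break; // space     => \s
      case '\n': esc = 'n';  break; // newline   => \n
      case '\\': esc = '\\'; break; // backslash => two backslashes
      default: break;
    }
    // Always keep one byte in reserve for the trailing NUL.
    if (n + (esc ? 2 : 1) >= destLen) return -1;
    if (esc) {
      dest[n++] = '\\';
      dest[n++] = esc;
    } else {
      dest[n++] = *src;
    }
  }
  if (n >= destLen) return -1;
  dest[n] = 0;
  return (long)n;
}
```

The point of the higher-level deck API is that clients should not
need to call anything like this themselves: the library applies the
encoding to the fields which require it.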

The "deck" concept is equivalent to Artifact in fossil(1), but we use
the word deck because (A) Artifact is highly ambiguous in this
context and (B) deck is arguably the most obvious choice for the name
of a type which acts as a "container of cards."

Ideally, client-level code will never have to create an artifact via
the fsl_deck API (because doing so requires a fairly good
understanding of what the deck is for in the first place, including
the individual Cards). The public API strives to hide those levels of
details, where feasible, or at least provide simpler/safer
alternatives for basic operations. Some operations may require some
level of direct work with a fsl_deck instance. Likewise, much
read-only functionality directly exposes fsl_deck to clients, so some
familiarity with the type and its APIs will be necessary for most
clients.

The process of creating an artifact looks a lot like the following
code example. We have elided error checking for readability purposes,
but in fact this code has undefined behaviour if error codes are not
checked and appropriately reacted to.

@code
fsl_deck deck = fsl_deck_empty;
fsl_deck * d = &deck; // for typing convenience
fsl_deck_init( fslCtx, d, FSL_CATYPE_CONTROL ); // must come first
fsl_deck_D_set( d, fsl_julian_now() );
fsl_deck_U_set( d, "your-fossil-name", -1 );
fsl_deck_T_add( d, FSL_TAGTYPE_ADD, "...uuid being tagged...",
                "tag-name", "optional tag value");
...
// unshuffle is necessary when using multi-cards which may
// need sorting (tags, filenames, etc.):
fsl_deck_unshuffle(d, 0);
// Unshuffling is done by the client because the deck is const
// when we output it:
fsl_deck_output( f, d, fsl_output_f_FILE, stdout );
// note that fsl_deck_save() does the unshuffle itself.
fsl_deck_finalize(d);
@endcode

The order the cards are added to the deck is irrelevant - they will
be output in the order specified by the Fossil specs regardless of
their insertion order.
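
The insertion-order independence described above can be illustrated
with a toy model which treats each card as one line of text and sorts
the lines before output (the file format document requires cards to
appear in sorted lexicographical order). demo_card_cmp() and
demo_unshuffle() are hypothetical names used only for this sketch; a
real fsl_deck stores typed members rather than an array of strings,
and fsl_deck_unshuffle() may work quite differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// qsort() comparator for an array of (const char *) card lines.
static int demo_card_cmp(const void *lhs, const void *rhs) {
  const char *const *a = (const char *const *)lhs;
  const char *const *b = (const char *const *)rhs;
  return strcmp(*a, *b);
}

// Toy "unshuffle": sorts n card lines in place so that they can be
// emitted in lexicographical order, no matter in which order the
// client added them. Hypothetical, NOT the libfossil implementation.
static void demo_unshuffle(const char **cards, size_t n) {
  assert(cards || 0 == n);
  if (n > 1) qsort(cards, n, sizeof(cards[0]), demo_card_cmp);
}
```

Given the cards "U drh", "T +sym-x *", and "D 2014-03-14T00:00:00.000"
in that insertion order, demo_unshuffle() leaves them in D, T, U
order, because the leading card letter dominates the comparison.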

Each setter/adder function knows, based on the deck's type (set via
fsl_deck_init()), whether the given card type is legal, and will
return an error (probably FSL_RC_TYPE) if an attempt is made to add a
card which is illegal for that deck type. Likewise, fsl_deck_output()
and fsl_deck_save() confirm that the decks they are given (A) contain
only allowed cards and (B) have all required cards. fsl_deck_save()
also sorts any "multi-cards" which need it (e.g. T- and F-cards).
*/

/** @page page_transactions DB Transactions

The fsl_db_transaction_begin() and fsl_db_transaction_end() functions
implement a basic form of recursive transaction, allowing the library
to start and end transactions at any level without having to know
whether a transaction is already in progress (sqlite3 does not
natively support nested transactions). A rollback triggered in a
lower-level transaction will propagate the error back through the
transaction stack and roll back the whole transaction, providing us
with excellent error recovery capabilities (meaning we can always
leave the db in a well-defined state).

It is STRICTLY ILLEGAL to EVER begin a transaction using "BEGIN" or
end a transaction by executing "COMMIT" or "ROLLBACK" directly on a
db handle which is associated with a fsl_cx instance. Doing so
bypasses internal state which needs to be kept abreast of things and
will cause Grief and Suffering (on the client's part, not mine).

Tip: implementing a "dry-run" mode for most fossil operations is
trivial by starting a transaction before performing the
operations. Many operations run in a transaction, but if the client
starts one of his own he can "dry-run" any op by simply rolling back
the transaction he started. Abstractly, that looks like this
pseudocode:

@code
db.begin();
fsl.something();
fsl.somethingElse();
if( dryRun ) db.rollback();
else db.commit();
@endcode
*/

/** @page page_code_conventions Code Conventions

Project and Code Conventions...

Foreword: all of this more or less evolved organically or was
inherited from fossil(1) (where it evolved organically, or was
inherited from sqlite (where it evol...)), and is written up here
more or less as a formality. Historically i've not been a fan of
coding conventions, but as someone else put it to me, "the code
should look like it comes from a single source," and the purpose of
this section is to help orient those looking to hack in the
sources. Note that most of what is said below becomes obvious within
a few minutes of looking at the sources - there's nothing
earth-shatteringly new nor terribly controversial here.

The Rules/Suggestions/Guidelines/etc. are as follows...

- C89 wherever possible, with the exception that we optionally use
  the C99-specified fixed integer types and their standard formatting
  strings when possible (if the platform has them resp. if the
  configuration header is configured for them). We also use/tolerate
  'long long' (via sqlite3), which is not strictly C89 but is
  supported on all modern compilers even when compiling in C89
  mode. For gcc and workalike compilers, the -Wno-long-long flag can
  be used to suppress warnings regarding non-standardization of that
  type. (Whether or not those warnings appear depends on other
  warning levels.) Apropos warning levels...

- The canonical build environment uses the most restrictive set of
  warning/error levels possible, with the exception of tolerating
  'long long', as mentioned above. It is highly recommended that
  non-canonical build environments do the same. Adding -Wall -Werror
  -pedantic does _not_ guarantee that all C compliance/portability
  problems can be caught by the compiler, but it goes a long way in
  helping us to write clean code. The clang compiler is particularly
  good at catching minor foo-foo's such as uninitialized variables.

- API docs (as you have probably already noticed) do not (any longer)
  follow Fossil's comment style, but instead use Doxygen-friendly
  formatting.
  Each comment block MUST start with two or more asterisks, or '*!',
  or doxygen apparently doesn't understand it
  (http://www.stack.nl/~dimitri/doxygen/manual/docblocks.html). When
  adding code snippets and whatnot to docs, please use doxygen
  conventions if it is not too much of an inconvenience. All public
  APIs must be documented with a useful amount of detail. If you hate
  documenting, let me know and i'll document it (it's what i do for
  fun).

- Public API members have a fsl_ or FSL_ prefix (fossil_ seems too
  long?). For private/static members, anything goes. Optional or
  "add-on" APIs (e.g. ::fcli) may use other prefixes, but are
  encouraged to use an "f-word" (as it were), simply out of deference
  to long-standing software naming conventions.

- Structs and functions use lower_underscore_style()

- Overall style, especially scope blocks and indentation, should
  follow Fossil v1.x. We are not at all picky about whether or not
  there is a space after/before parens in if( foo ), and similar
  small details, just the overall code pattern.

- Structs and enums all get the optional typedef so that they do not
  need to be qualified with 'struct' resp. 'enum' when used.

- Function typedefs are named fsl_XXX_f. Implementations of such
  typedefs/interfaces are typically named fsl_XXX_f_SUFFIX(), where
  SUFFIX describes the implementation's
  specialization. e.g. fsl_output_f() is a callback typedef/interface
  and fsl_output_f_FILE() is a concrete implementation for FILE
  handles.

- Typedefs for non-struct types (numerics and enums) tend to be named
  fsl_XXX_t.

- Functions follow the naming pattern prefix_NOUN_VERB(), rather than
  the more C-conventional prefix_VERB_NOUN(), e.g. fsl_foo_get() and
  fsl_foo_set() rather than fsl_get_foo() and fsl_set_foo(). The
  primary reasons are (A) sortability for document processors and (B)
  they more naturally match with OO API conventions,
  e.g. noun.verb().
  A few cases knowingly violate this convention for the sake of
  readability or sorting of several related functions
  (e.g. fsl_db_get_XXX() instead of fsl_db_XXX_get()).

- Structs intended to be creatable on the stack are accompanied by a
  const instance named fsl_STRUCT_NAME_empty, and possibly by a macro
  named fsl_STRUCT_NAME_empty_m, both of which are
  "default-initialized" instances of that struct. This is superior to
  using memset() for struct initialization because we can define (and
  document) arbitrary default values and all clients who
  copy-construct them are unaffected by many types of changes to the
  struct's signature (though they may need a recompile). The
  intention of the fsl_STRUCT_NAME_empty_m macro is to provide a
  struct-embeddable form for use in other structs or
  copy-initialization of const structs, and the _m macro is always
  used to initialize its const struct counterpart. e.g. the library
  guarantees that fsl_cx_empty_m (a macro representing an empty
  fsl_cx instance) holds the same default values as fsl_cx_empty (a
  const fsl_cx value).

- Returning int vs fsl_int_t vs fsl_size_t: int is used as a
  conventional result code. fsl_int_t is used as a signed
  length-style result code (e.g. printf() semantics). Unsigned ranges
  use fsl_size_t. char is used to indicate a boolean. ints are (also)
  used as a "triplean" (3 potential values, e.g. <0, 0,
  >0). fsl_int_t also guarantees that it will be 64-bit if available,
  so can be used for places where large values are needed but a
  negative value is legal (or handy), e.g. fsl_strndup()'s second
  argument. The use of the fsl_xxx_t typedefs, rather than (unsigned)
  int, is primarily for readability/documentation, e.g. so that
  readers can know immediately that the function does not use integer
  argument or result-code return semantics. It also allows us to
  better define platform-portable printf/scanf-style format modifiers
  for them (analogous to C99's PRIi32 and friends), which often come
  in handy.

- Signed vs.
  unsigned types for size/length arguments: use the fsl_int_t
  (signed) argument type when the client may legally pass in a
  negative value as a hint that the API should use fsl_strlen() (or
  similar) to determine a byte array's length. Use fsl_size_t when no
  automatic length determination is possible (or desired), to "force"
  the client to pass the proper length. Internally fsl_int_t is used
  in some places where fsl_size_t "should" be used because some
  ported-in logic relies on loop control vars being able to go
  negative. Additionally, fossil internally uses negative blob
  lengths to mark phantom blobs, and care must be taken when using
  fsl_size_t with those.

- Functions taking ellipses (...) are accompanied by a va_list
  counterpart named the same as the (...) form plus a trailing
  'v'. e.g. fsl_appendf() and fsl_appendfv(). We do not use the
  printf()/vprintf() convention because that hoses sorting of the
  functions in generated/filtered API documentation.

- Error handling/reporting: please keep in mind that the core code is
  a library, not an application. The main implication is that all
  lib-level code needs to check for errors wherever they can happen
  (e.g. on every single memory allocation, of which there are many)
  and propagate errors to the caller, to be handled at his
  discretion. The app-level code (::fcli) is not particularly strict
  in this regard, and installs its own allocator which abort()s on
  allocation error, which simplifies app-side code somewhat vis-a-vis
  lib-level code.
*/

/** @page page_fossil_arch Fossil Architecture Overview

An introduction to the Fossil architecture. These docs are basically
just a reformulation of other, more detailed, docs which can be found
via the main Fossil site, e.g.:

- http://fossil-scm.org/index.html/doc/trunk/www/concepts.wiki
- http://fossil-scm.org/index.html/doc/trunk/www/fileformat.wiki

Fossil's internals are fundamentally broken down into two basic
parts. The first is a "collection of blobs."
The simplest way to think of this (and it's not far from the full
truth) is a directory containing lots of files, each one named after
the SHA1 hash of its contents. This pool contains ALL content required
for a repository - all other data can be generated from data contained
here. Included in the blob pool are so-called Artifacts. Artifacts are
simple text files with a very strict format, which hold information
regarding the identities of, relationships involving, and other
metadata for each type of blob in the pool. The most basic Artifact
type is called a Manifest, and a Manifest tells us, amongst other
things, which of the SHA1-based file names has which "real" file name,
which version is its parent (or which versions are its parents!), and
other data required for a "commit" operation.

The blob pool and the Manifests are all a Fossil repository really
needs in order to function. On top of that basis, other forms of
Artifacts provide features such as tagging (which is the basis of
branching and merging), wiki pages, and tickets. From those Artifacts,
Fossil can create/calculate all sorts of information. For example, as
new Artifacts are inserted it transforms the Artifact's metadata into
a relational model which sqlite can work with. That leads us to what
is conceptually the next-higher-up level, but is in practice a
core-most component...

Storage. Fossil's core model is agnostic about how its blobs are
stored, but libfossil and fossil(1) both make heavy use of sqlite to
implement many of their features. These include:

- Transaction-capable storage. It's almost impossible to corrupt a
  Fossil db in normal use. sqlite3 offers one of the most robust
  general-purpose file formats available.

- The storage of the raw blobs.

- Artifact metadata is transformed into various DB structures which
  allow libfossil to traverse historical data much more efficiently
  than would be possible without a db-like infrastructure (and
  everything that implies).
  These structures are incrementally kept up to date as new Artifacts
  are stored in a repository, either via local edits or by syncing in
  remote content.

- A tremendous amount of the "leg-work" in processing the repository
  state is handled by SQL queries, without which the library would
  easily require 5-10x more code in the form of equivalent hard-coded
  data structures and corresponding functionality. The db approach
  allows us to create ad-hoc structures as we need them, providing us
  a great deal of flexibility.

All content in a Fossil repository is in fact stored in a single
database file. Fossil additionally uses another database (a "checkout"
db) to keep track of local changes, but the repo contains all
"fossilized" content. Each copy of a repo is a full-fledged repo, each
capable of acting as a central copy for any number of clones or
checkouts.

That's really all there is to understand about Fossil. How it does its
magic, keeping everything aligned properly, merging in content, how it
stores content, etc., are all internal details which most clients will
not need to know anything about in order to make use of fossil(1).
Using libfossil effectively, though, does require learning _some_
amount of how Fossil works. That will require taking some time with
_other_ docs, however: see the links at the top of this section for
some starting points.

Sidebar:

- The only file-level permission Fossil tracks is the "executable"
  (a.k.a. "+x") bit. It internally marks symlinks as a permission
  attribute, but that is applied much differently than the executable
  bit and only does anything useful on platforms which support
  symlinks.

*/

#endif
/* NET_FOSSIL_SCM_PAGES_H_INCLUDED */