Fossil C API library

(1) By anonymous on 2020-12-20 00:49:34

I have the idea of adding a C API for Fossil, which could be useful if you want to write another program that works with a repository, without having to start Fossil several times (once for each artifact), and allowing more control and other things to be calculated.

Below are some of my ideas, though they would probably need to be changed a lot: some things are missing, some things might not be needed at all, some things should be different than specified below, some things are wrong in other ways, etc. The specification below mentions FILE*, and the GNU function fopencookie may be used to make custom stream objects; maybe this is not wanted if it is intended to work on non-GNU systems, though.

int fossil_begin_card(FILE*,char);
// This is similar to fputc(), with arguments in the reversed order, and a different return value.

int fossil_begin_transaction(fossil*);

void fossil_cgi(/* not yet defined */);
// Implements a CGI server. If the path is /xfer then it will perform the
// Fossil protocol; otherwise, it uses your own callbacks. It includes the
// functions to handle input if you need to do that in your CGI program.

int fossil_cgi_xfer_callback(/* not yet defined */);
// The intention is that you can use this to control details of how the
// Fossil protocol is handled, which artifacts are accepted/rejected,
// which artifacts will be sent or asked for, handling pragmas, etc. The
// callback can return FOSSIL_DO_DEFAULT to use the default.

void fossil_close(fossil*);
// Close the access to the repository; frees the fossil* object.

int fossil_commit(fossil*);

int fossil_end_card(FILE*);
// Writes a line feed.

int fossil_error(fossil*);

int fossil_find(/* not yet defined */);

int fossil_get(fossil*,const fossil_hash*,unsigned char**,size_t*);
// Read an artifact from the repository into a newly allocated buffer,
// which must then be freed. You can pass a null pointer for the third
// argument; in that case nothing needs to be freed, and the call can
// be used only to test for the artifact's existence.

int fossil_get_config(/* not yet defined */);

const char*fossil_get_cookie(fossil*);
// Get the cookie; this is managed internally and should not be freed. It
// is null if there is no cookie currently set.

int fossil_get_stream(fossil*,const fossil_hash*,FILE**);
// Open a readable stream to read an artifact from the repository. Use
// fclose() to close it once you are finished with it.

int fossil_http_options(/* not yet defined */);

int fossil_login(fossil*,const char*user,const char*pass,int flag);
// Set the username and password for fossil_remote.

int fossil_open(const char*filename,int flag,fossil**);
// Open a Fossil repository.

int fossil_print_argument(FILE*,const char*);
// Print a space and then the text as an argument of a card in a Fossil
// structural artifact, using the appropriate escaping as necessary.

int fossil_print_argument_hash(FILE*,const fossil_hash*);

int fossil_print_hash(FILE*,const fossil_hash*);
// Print the hexadecimal representation of a hash.

int fossil_print_hash_prefix(FILE*,const fossil_hash*,fossil*);

const char*fossil_project_code(fossil*);

int fossil_put(fossil*,unsigned char*,size_t,const fossil_hash*,int,fossil_hash*);
// Add an artifact. If the first argument is null, then it will only
// calculate the hash. If the fourth argument is not null, then it is
// the hash of the previous version, which can cause delta encoding to
// be used; the set of artifacts remains correct whether or not it is
// set, though. The fifth argument is the flags. If the sixth argument
// is not null, the hash of the new artifact is written to the variable
// it points to.

int fossil_put_config(/* not yet defined */);

int fossil_put_stream(fossil*,FILE**,const fossil_hash*,int,fossil_hash*);
// Similar to fossil_put() but opens a writable stream to write the data
// to. Once the stream is closed with fclose() then the hash of the data
// will be written to the last argument, if it is not null. You can then
// call fossil_error() to check the error, if there is an error.

int fossil_read_artifact(fossil*,const fossil_hash*,int(*)(void*,int,char**),void*);
// Read a structural artifact. If it does not have a valid Z card, then
// the callback is never called and it returns an error code.

int fossil_remote(/* not yet defined */);

int fossil_rollback(fossil*);

int fossil_set_cookie(fossil*,const char*);
// Set the cookie; a copy of the string is made. Specify null to delete
// the cookie if you do not want any cookie to be set.

int fossil_set_hash(fossil_hash*,const char*);
// Set a hash, given hexadecimal representation.

int fossil_set_pragma(/* not yet defined */);

/* Flags for fossil_put and fossil_put_stream:
  FOSSIL_CHECKSUM = Add a Z card at the end of the data
  FOSSIL_MD5 = Use MD5 hashes (see below)
  FOSSIL_SHA1 = Use SHA1 hashes
  FOSSIL_SHA3_256 = Use SHA3-256 hashes

  MD5 hashes are only allowed if the first argument of fossil_put or
  fossil_put_stream is null; you cannot write such an artifact into
  the repository.

  If no hash algorithm is specified, the configuration decides.
*/

/* Flags for fossil_login:
  FOSSIL_LOGIN_ADD = Add the specified login to the login set
  FOSSIL_LOGIN_SET = The login set is only the specified user
*/

/* Flags for fossil_remote:
  FOSSIL_ASYNC = Run asynchronously
  FOSSIL_DEBUG = Disable compression
  FOSSIL_NOCOOKIE = Do not save a cookie
*/
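
For illustration, here is a minimal sketch of how a caller might read one artifact with the proposed API. Nothing here exists yet: all names and signatures come from the speculative specification above, it assumes fossil_hash is a complete type and "fossil.h" is the hypothetical header, and it assumes the int return codes are zero on success.

#include <stdio.h>
#include <stdlib.h>
#include "fossil.h" /* hypothetical header for the proposed API */

int main(int argc, char **argv) {
  fossil *f;
  fossil_hash h;
  unsigned char *data;
  size_t size;
  if (argc != 3) {
    fprintf(stderr, "usage: %s REPOSITORY HASH\n", argv[0]);
    return 1;
  }
  if (fossil_open(argv[1], 0, &f)) {      /* open the repository */
    fprintf(stderr, "cannot open %s\n", argv[1]);
    return 1;
  }
  if (fossil_set_hash(&h, argv[2])        /* parse the hex hash */
   || fossil_get(f, &h, &data, &size)) {  /* read the artifact */
    fprintf(stderr, "cannot read artifact\n");
    fossil_close(f);
    return 1;
  }
  fwrite(data, 1, size, stdout);
  free(data);                             /* caller frees the buffer */
  fossil_close(f);
  return 0;
}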

(2.1) By Warren Young (wyoung) on 2020-12-20 05:47:49 edited from 2.0 in reply to 1

Have you rejected reviving libfossil or using Fossil's JSON API, or were you not aware of them? Are you trying to achieve something with this new API that can be done with neither, and also not by any other existing interface, such as CLI output parsing, SQL queries, etc.?

One lesson you should take from libfossil and the JSON API is that it's a lot of work to maintain such a thing, and it likely won't have a bus factor above 1. Unless you're committing to be that lone key developer, I believe it's better to refactor Fossil's internals to expose needed APIs rather than create new external APIs.

(3) By Stephan Beal (stephan) on 2020-12-20 10:06:47 in reply to 2.1

One lesson you should take from libfossil and the JSON API is that it's a lot of work to maintain such a thing, and it likely won't have a bus factor above 1.

A brief bit of background for the OP, in particular as the "bus factor" applies to it...

libfossil was, back at the time, my baby. Right when it reached a point where it was capable of performing a checkout, i was struck down by chronic RSI in my left arm and put on extended medical leave, first for 6 months, then back to work (working mostly 1-handed), where i developed the same problem in my right arm and was subsequently put on what has turned out to be permanent medical leave. Though i can still type, i can't do it for long stretches and often need to take 1-3 months between hacking sessions to allow my hands to recover.

That's the "bus factor" (derived from the phrase, "you could be hit by a bus tomorrow") in that particular project. i was the only active developer, with a few contributions from others, and was hit by a proverbial bus.

Since then, nobody's stepped up to take it over, and in the meantime fossil's hash support has been extended in ways which are incompatible with the current (2014) libfossil code. Even so, i think the library would make a good basis for someone who wants to take it over or to study if they want to start their own effort. The first major order of business would be to get the hash bits up to date so that they support multiple hashes instead of just SHA1, but i cannot estimate how much effort that would be.

If that sounds like something you would like to investigate, please feel free to get in touch with me.

(4) By anonymous on 2020-12-20 19:11:18 in reply to 2.1

I get a 406 error when trying to access libfossil.

I do know of the JSON API, but there are many problems with it compared with what I have suggested:

  • It isn't designed for use with C programming.

  • It doesn't deal with binary data, nor with 64-bit integers.

  • It doesn't deal directly with raw artifacts. My own suggestion is an API which does deal directly with artifacts; the functions fossil_get and fossil_put deal with artifacts. The JSON API operates at the "wrong level".

One lesson you should take from libfossil and the JSON API is that it's a lot of work to maintain such a thing, and it likely won't have a bus factor above 1. Unless you're committing to be that lone key developer, I believe it's better to refactor Fossil's internals to expose needed APIs rather than create new external APIs.

I do know what "bus factor" means, and you may be correct that refactoring Fossil's internals to expose the needed APIs is the better approach.

(5) By Stephan Beal (stephan) on 2020-12-20 19:29:03 in reply to 4

I get a 406 error when trying to access libfossil.

Try this one:

https://fossil.wanderinghorse.net/r/libfossil

My own suggestion is an API which does deal directly with artifacts;

The single largest hurdle to your suggestion is that fossil is very much designed, from the ground up, as a monolithic app. One example: it uses a memory allocator which outright kills the app if an allocation fails. That greatly simplifies writing code in the app but is completely unacceptable for a library interface.

... and you may be correct that refactoring Fossil's internals to expose the needed APIs is the better approach

Any such refactoring would be incomplete without reworking how memory allocation failures are handled, and i can confidently say from experience that many of fossil's routines grow by 25-50% if alloc failure handling is added. That's a lot of code.
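
To make that concrete, here is a contrived sketch of the two styles. The app-style allocator mirrors what fossil does conceptually (its real allocator calls fossil_fatal() on failure); the library-style variant and all names here are hypothetical.

#include <stdlib.h>
#include <string.h>

/* App style: allocation failure kills the process, so callers never
** check for NULL. This is conceptually what fossil's allocator does. */
static void *app_malloc(size_t n) {
  void *p = malloc(n);
  if (p == 0) abort(); /* fossil calls fossil_fatal() here */
  return p;
}

static char *app_dup_twice(const char *s) {
  size_t n = strlen(s);
  char *p = app_malloc(2 * n + 1); /* no error path needed */
  memcpy(p, s, n);
  memcpy(p + n, s, n + 1);
  return p;
}

/* Library style: every allocation can fail, so every call site grows
** a check and an unwind path, and failure must be reported upward
** through the return value instead of aborting. */
static int lib_dup_twice(const char *s, char **pOut) {
  size_t n = strlen(s);
  char *p = malloc(2 * n + 1);
  if (p == 0) return -1; /* propagate, don't abort */
  memcpy(p, s, n);
  memcpy(p + n, s, n + 1);
  *pOut = p;
  return 0;
}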

(6) By anonymous on 2020-12-21 02:16:01 in reply to 5

Try this one:

This one also doesn't work; it is the same 406 error. (I think I have read that sometimes a 406 error occurs due to Mod Security being misconfigured.)

The single largest hurdle to your suggestion is that fossil is very much designed, from the ground up, as a monolithic app. One example: it uses a memory allocator which outright kills the app if an allocation fails. That greatly simplifies writing code in the app but is completely unacceptable for a library interface.

That problem with the memory allocation might not be a problem for some uses of the API (including some of the things that I would want to use such an API for).

(7) By Stephan Beal (stephan) on 2020-12-21 03:25:20 in reply to 6

This one also doesn't work; it is the same 406 error. (I think I have read that sometimes a 406 error occurs due to Mod Security being misconfigured.)

Then there's something wrong on your end, as both of the links provided so far work for me in multiple browsers. If mod_security is complaining, perhaps my hoster has blacklisted your ISP or some such. i don't have any insight into that, though.

You can try cloning it:

fossil clone https://fossil.wanderinghorse.net/r/libfossil libfossil.fossil

(That also works for me - just tried it.)

That problem with the memory allocation might not be a problem for some uses of the API (including some of the things that I would want to use such an API for).

You're welcome to take a crack at refactoring, but "don't say i didn't warn you." It's a deceptively large undertaking.

When i met Richard for the first time in 2011 he asked me, "what does fossil need?" to which i immediately answered, "a library interface." We quickly agreed, however, that it would require, in his words, a "herculean effort," and so i opted for a JSON API instead (which is basically a web-invoked library).

i would love (LOVE) to see fossil get a library API, but i am well aware (probably better than anyone) of how big of an effort that involves, and am physically no longer capable of taking on projects of such a scope. If you can achieve it, you'll be my hero.

(8.1) By Warren Young (wyoung) on 2020-12-21 10:53:46 edited from 8.0 in reply to 7

Middle path: libcurl + json-c or similar.
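
For instance, the fetching half of that middle path could look like this minimal libcurl sketch (the URL is a placeholder; json-c would then parse responses fetched the same way):

#include <stdio.h>
#include <curl/curl.h>

/* Stream the response body to the FILE* given via CURLOPT_WRITEDATA. */
static size_t write_cb(char *data, size_t size, size_t nmemb, void *userp) {
  return fwrite(data, 1, size * nmemb, (FILE *)userp);
}

int main(void) {
  /* Placeholder: a /raw/HASH URL serves an artifact's raw bytes. */
  const char *url = "https://example.org/repo/raw/0123456789abcdef";
  CURL *curl;
  CURLcode rc;

  curl_global_init(CURL_GLOBAL_DEFAULT);
  curl = curl_easy_init();
  if (!curl) return 1;
  curl_easy_setopt(curl, CURLOPT_URL, url);
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, stdout);
  rc = curl_easy_perform(curl);
  if (rc != CURLE_OK)
    fprintf(stderr, "fetch failed: %s\n", curl_easy_strerror(rc));
  curl_easy_cleanup(curl);
  curl_global_cleanup();
  return rc == CURLE_OK ? 0 : 1;
}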

(10) By Warren Young (wyoung) on 2020-12-21 12:34:24 in reply to 4

It doesn't deal with binary data

That's an encoding issue. It's perfectly possible to send Base64 or similar through JSON. More below.
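
As a sketch of what that looks like in practice, here is a minimal base64 encoder wrapping binary data in a JSON object. All names here are illustrative; the "content" field is made up, not part of Fossil's JSON API.

#include <stdio.h>

/* Encode n bytes as base64 and write them to out. */
static void b64_encode(const unsigned char *in, size_t n, FILE *out) {
  static const char tbl[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  size_t i;
  for (i = 0; i + 2 < n; i += 3) {       /* whole 3-byte groups */
    fputc(tbl[in[i] >> 2], out);
    fputc(tbl[((in[i] & 3) << 4) | (in[i+1] >> 4)], out);
    fputc(tbl[((in[i+1] & 15) << 2) | (in[i+2] >> 6)], out);
    fputc(tbl[in[i+2] & 63], out);
  }
  if (i + 1 == n) {                      /* one trailing byte */
    fputc(tbl[in[i] >> 2], out);
    fputc(tbl[(in[i] & 3) << 4], out);
    fputs("==", out);
  } else if (i + 2 == n) {               /* two trailing bytes */
    fputc(tbl[in[i] >> 2], out);
    fputc(tbl[((in[i] & 3) << 4) | (in[i+1] >> 4)], out);
    fputc(tbl[(in[i+1] & 15) << 2], out);
    fputc('=', out);
  }
}

int main(void) {
  const unsigned char data[] = {0x00, 0xff, 0x10, 0x20}; /* raw bytes */
  printf("{\"content\":\"");
  b64_encode(data, sizeof data, stdout); /* prints AP8QIA== */
  printf("\"}\n");
  return 0;
}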

nor with 64-bit integers.

That intrigued me enough to go digging into the source code. It isn't correct, as far as I can tell.

That last one is a bit dodgy on some 32-bit platforms and could use refinement, but it seems like an odd thing to be chasing in late 2020. How many such platforms remain, which also deal with artifacts over 4 GiB?

It doesn't deal directly with raw artifacts

For output from the repository, there are /raw URLs. Get the artifact or file hash from the JSON API, then pull the raw data.

For input, the closest we have is the /fileedit feature, more or less on purpose, since submitting commits via an API is…fraught.

(11) By Stephan Beal (stephan) on 2020-12-21 12:44:58 in reply to 10

nor with 64-bit integers.

That intrigued me enough to go digging into the source code. It isn't correct, as far as I can tell.

The issue is that JSON itself does not (== cannot portably) specify integer sizes. That's necessarily left as an implementation detail so that JSON can be implemented in the widest possible range of environments. Javascript, for example, historically only supported a single numeric type with 53 bits of integer precision. (It now supports "BigInt", but i've never personally seen that in use.)
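
The 53-bit limit is easy to demonstrate in C with a double, which is the same IEEE-754 binary64 format that JavaScript numbers use:

#include <stdio.h>
#include <stdint.h>

int main(void) {
  int64_t big = (int64_t)1 << 53; /* 2^53: the last exactly-safe integer */
  double a = (double)big;
  double b = (double)(big + 1);   /* 2^53 + 1 is not representable */
  printf("%d\n", a == b);         /* prints 1: the two values collapse */
  return 0;
}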

That said: fossil does not internally make use of any 64-bit integers, AFAIK. All of its RID-using interfaces use int and blobs use 32-bit integers, so 64-bit integers are largely a moot point in fossil.

How many such platforms remain, which also deal with artifacts over 4 GiB?

Because of the blob struct's definition, fossil literally cannot handle individual blobs that big.

(12) By anonymous on 2020-12-21 19:40:12 in reply to 11

That said: fossil does not internally make use of any 64-bit integers, AFAIK.

Not sure what you mean by that.

(13) By Stephan Beal (stephan) on 2020-12-21 20:12:23 in reply to 12

Not sure what you mean by that.

Yes, we have 64-bit integer types in the source tree, and use them in some places (e.g. calculating the db stats), but fossil's SCM-specific bits do not, insofar as i recall, actually make use of 64-bit integers anywhere. Its core-most internal data types use only int and unsigned int for all byte ranges and db record IDs. If you were to try to give it a blob of 5GB, fossil would not accept it.

(15) By anonymous on 2020-12-21 20:53:10 in reply to 13

If you were to try to give it a blob of 5GB, fossil would not accept it.

blob_read_from_file() returns sqlite3_int64, so it should be able to accommodate the whole 64-bit range.

It's just that the 64-bit handling in the Fossil code base is somewhat broken, mainly due to 64-bit type mismatches. This has been pointed out long ago.

blob_read_from_file() is a prime example of this sort of mismatch: blob_resize(Blob *pBlob, unsigned int newSize) explicitly takes unsigned int instead of any of the 64-bit integer types available in Fossil's scope, yet file_size() returns an i64 type.

Another mismatch case is in the handling of size values: signed integer types vs size_t.

The SQLite side of Fossil is very much capable of storing 64-bit blobs; it's just that the type mismatches in Fossil artificially preclude this use.

Simply put, 64-bit support is only "half-way" there in Fossil: it's more than 32-bit, yet not properly 64-bit...

(16) By Stephan Beal (stephan) on 2020-12-21 21:11:28 in reply to 15

blob_read_from_file() returns sqlite3_int64, so it should be able to accommodate the whole 64-bit range.

The blob data type definition makes it impossible for blobs to exceed 32 bits:

https://fossil-scm.org/fossil/file?ci=trunk&name=src%2Fblob.c&ln=39-46
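
For readers who don't follow the link, the definition there looks approximately like this (paraphrased from fossil's src/blob.c; the exact field set may differ between versions). Every size and offset is an unsigned int, which caps a blob at 4 GiB:

struct Blob {
  unsigned int nUsed;      /* Number of bytes used in aData[] */
  unsigned int nAlloc;     /* Number of bytes allocated for aData[] */
  unsigned int iCursor;    /* Next character of input to parse */
  unsigned int blobFlags;  /* One or more BLOBFLAG_* bits */
  char *aData;             /* Where the information is stored */
  void (*xRealloc)(struct Blob*, unsigned int); /* Buffer reallocator */
};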

blob_read_from_file() is partially a proxy for other routines, all of which return int.

Aside from proxying those, it essentially acts as a proxy for a single call to fread(), which returns size_t, meaning...

Another mismatch case is in the handling of size values: signed integer types vs size_t.

Noting that size_t has an unspecified size. It can legally be (and sometimes is) 32 bits (the assumption that it's 64 bits has bitten me in the past).

The SQLite side of Fossil is very much capable of storing 64-bit blobs; it's just that the type mismatches in Fossil artificially preclude this use.

Indeed, but it's still a limitation fossil currently has and, IMHO, there seems to be very little reason to change it, because blobs of 4GB or larger are far outside of fossil's main purpose as a source code control system. Storing such blobs in fossil, even if they fit, would be exceedingly memory-inefficient and would make the repository unusable on all but relatively high-end machines: fossil would require 8GB+ of RAM in order to apply deltas to them, because with fossil's delta API the original file, the new version, and the delta all have to be held in RAM concurrently.

There's no really compelling reason to "upgrade" fossil's blob type to be 64-bit friendly, considering we'd have zero functional gain and an as-yet-unknown amount of risk via the potential introduction of new bugs.

Simply put, 64-bit support is only "half-way" there in Fossil: it's more than 32-bit, yet not properly 64-bit...

Only the numbers which really need to be able to exceed 4GB are 64-bit, and those are all (IIRC) in statistics-like routines, as opposed to SCM-level functionality.

(17) By Warren Young (wyoung) on 2020-12-21 21:30:56 in reply to 16

The blob data type definition makes it impossible for blobs to exceed 32 bits:

Surely it's blob.size in the repo DB schema that actually matters, Blob.nUsed being only an internal runtime API that can be changed at will?

SQLite will reliably store a 64-bit value in an INTEGER column if we give it one. Whether Fossil then retrieves that value correctly remains to be seen, but the return of blob_read_from_file() seems promising.

size_t has an unspecified size

It should be 64 bits on a fully 64-bit host, as most modern systems are. The remaining pure 32-bit and hybrid 32/64-bit systems are less likely to have such large files in the first place.

Storing such blobs in fossil, even if they fit, would be exceedingly memory-inefficient

Agreed, but I don't see any good reason to reject contributions to allow it.

My blinkin' Chromebook has 8 GiB of RAM...

(18.1) By Stephan Beal (stephan) on 2020-12-21 22:04:38 edited from 18.0 in reply to 17

Surely it's blob.size in the repo DB schema that actually matters, Blob.nUsed being only an internal runtime API that can be changed at will?

Hypothetically, yes, but i wouldn't want to be the one chasing down an obscure bug triggered by changing it ;).

The remaining pure 32-bit and hybrid 32/64-bit systems are less likely to have such large files in the first place.

Like the current RaspberryPi OS (which has been running my main workstation since August or September):

[pi@pi4b8:~/tmp]$ cat foo.c
#include <stddef.h>
#include <stdio.h>
int main(void){
    printf("sizeof(size_t)=%u\n", (unsigned)sizeof(size_t));
    return 0;
}
[pi@pi4b8:~/tmp]$ gcc -o foo foo.c
[pi@pi4b8:~/tmp]$ ./foo 
sizeof(size_t)=4

Their fully 64-bit OS is still in beta.

Edit: this machine also has 8GB:

[pi@pi4b8:~/tmp]$ free -m
              total        used        free ...
Mem:           7925        2280        3120 ...
Swap:          8499           0        8499

Agreed, but I don't see any good reason to reject contributions to allow it.

Absolutely agreed.

My blinkin' Chromebook has 8 GiB of RAM...

But does it have 8GB free to allocate to deltifying a single 4GB blob? It probably does, including swap space, but a Raspberry Pi with 2GB RAM running from an SD card very likely doesn't. My home network's "pi hole" server (a network-wide DNS sink/ad blocker) runs on an ODroid U3 (a pi-like SBC) with only 2GB RAM, running from a MicroSD card with only 512MB swap space. It acts as a backup host for all of my fossil repos, pulling from them early each morning. Any repo which applied an update to a 3GB blob would trigger an OOM on such a device.

That's not at all to say that 4GB blobs should be forever verboten, just to point out that they can and would cause problems with systems still in use today. Because of that, and because (IMHO) such blobs really have no place in a source code repo, i'm personally in no hurry to see the blob class transformed to fully 64-bit capable. Once the capability is there, someone will (ab)use it and then wonder why they can't do a checkout on their "low-end" system with only 4GB RAM.

(19) By Chris (crustyoz) on 2020-12-21 22:33:38 in reply to 18.1

Back in the days when 640 kB of RAM was more than enough for anyone (from IBM), the practice was to swap to disk when things got too big. And without the benefit of virtual memory.

My new Raspberry Pi 400 with 4GB of RAM matches the capabilities of my main machine for the past 5 years. I thought I was going cheap with a $500 machine then. Now it costs $100 and takes zero incremental space on/under the desk since it's all in the keyboard.

My small repos are managing just fine on the 32 bit RaspberryPi OS with Fossil installed directly from the fossil-scm.org download page.

(14.1) By Warren Young (wyoung) on 2020-12-21 20:47:56 edited from 14.0 in reply to 12

The fact that Fossil's internal JSON APIs are 64-bit clean when built on a modern host¹ doesn't tell us anything about whether Fossil actually sends any 64-bit integers out its JSON APIs or parses any integers as 64-bit on JSON inputs.

Therefore, I went on a bit of a walk through the code and found these possible cases:

  1. Timestamps will require 64 bits in 2038 and beyond due to the 32-bit time_t rollover (see the sketch after this list).

  2. The repository file size in /json/stat if the repo is over 4 GiB. Then once the repo has grown some orders of magnitude larger than that, other counts could then exceed 2^32 (BLOB count, delta count, RID values, etc.).
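
As a quick sketch of the first case: the largest value a signed 32-bit time_t can hold runs out in January 2038 (ctime() shows the moment in local time).

#include <stdio.h>
#include <time.h>

int main(void) {
  time_t t = (time_t)2147483647; /* INT32_MAX, where 32-bit time_t stops */
  printf("32-bit time_t ends at: %s", ctime(&t));
  /* That is 2038-01-19 03:14:07 UTC, printed in the local timezone. */
  return 0;
}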

I can't see anything like a "file commit" API under /json, so I can't see any way that a client could send a value over 32 bits that the JSON API would need to handle correctly, even if someone works out a way around Fossil's current inability to commit such large files.²

That leaves only the timestamp in /json/artifact needing such large integers, around the time today's newborns are applying for university spots.


  1. That caveat excludes building on 32-bit systems where 64-bit int types aren't available, forcing a fallback to long, and then only if it is in fact 32-bit. This only affects JSON input parsing via strtol(), per my JSON input comment above. Even on such platforms, JSON output is likely to be able to handle 64-bit integers.

  2. I tested fossil ci against a test repo with a ~8 GiB file I had lying around and got two different failure modes, depending on build options. Therefore, APIs like /json/artifact won't be returning contentSize values needing integers over 32 bits until that's solved.

(9) By anonymous on 2020-12-21 04:07:30 in reply to 1

Some of what I wanted to have is possible with fossil artifact and fossil test-content-put, although test-content-put is only a test command: as far as I know, it doesn't have an option to specify the repository to use, it doesn't have an option to specify the hash of the previous version, and it won't output the hash of the new artifact (instead the output is the row ID, I think). I also don't know about the possibility of automatically parsing it as a structural artifact if it is one.