Random trivia about the fossil sync protocol
By Stephan Beal (stephan) on 2025-06-25 11:31:48
A couple of weeks ago, work started on libfossil's support for fossil's sync protocol. Though there's much to do there, it's now at the point where it can fire off an anonymous clone request over http(s) and read the response, and the internal "transport" API for doing that is agnostic about the details of the actual data transport. The proof of concept is being built upon shelling out to curl
and piping the data through it, but the library doesn't know that - it just deals with the I/O API behind which curl is hidden. (We'll be able to do SSH the same way. Once that's working, libfossil will be taught to speak to sockets, rather than having to shell out to curl.) The library does not yet process the sync'd data - first we had to teach it to read the data, and that part now seems (since yesterday) to be working well.
This represents my first foray into the fossil sync protocol (after 17+ years of working with/on fossil) and much time has been spent exploring how cloning works. This post is just about sharing some of the trivia discovered during that process, for these reasons:
- To help cement it in my head by "talking through it."
- To demonstrate that libfossil is "this close" to having native fossil sync support. (It currently shells out to fossil to do that, but that's cheating. But it's easy to implement, so cheating is okay.)
- For the 1.3-ish of you who may be interested in the nerd details of the sync protocol.
Fossil's sync protocol has 3 levels of cloning:
Version 1: fetches a long list of "igot" cards in a single response, which is the server essentially telling you "these are the IDs of all artifacts on this server." The intent is that the client respond with a list of which of those it wants.
Version 2: starts sending all available artifacts, up to some size limit, and then sends you a sequence number which tells you "there's more - ask again and give me this number to pick up where you left off." The idea is that the cloner performs multiple round-trips, each one containing the next sequence number, to collect the whole set (a sketch of that loop follows these version descriptions). This version optionally compresses the whole response payload. Compression saves a good deal of space (yesterday it was 90mb vs 260mb on fossil's own repo) but also takes a whole lot longer for the server to prepare than an uncompressed response. Some timing info is given below.
Version 3: works a lot like version 2 but the full response is not compressed. Instead, each file's content is compressed individually and wrapped in the sync protocol's line-based framing, which tells us how big those chunks are. This is generally faster than a compressed v2 response, but it also has a slightly larger over-the-wire footprint (103mb vs 93mb on fossil's repo) and the client has to spend literally 90%+ of its time decompressing the file content (but it does not have to buffer the whole response in order to uncompress the pieces, as v2 compression requires, so v3 needs far less peak memory). This version requires about half as many round-trips to the server as v2 does: 20 vs 41 for fossil's own repository (yesterday it was 19 vs 40).
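To make that round-trip loop concrete, here is a minimal sketch of it in C. The helper names (send_clone_request(), process_response_cards()) are hypothetical placeholders, not libfossil or fossil APIs; only the loop shape and the "repeat the sequence number back to the server" behavior come from the protocol description above.

    /*
    ** Hypothetical sketch of the v2/v3 clone loop. send_clone_request()
    ** and process_response_cards() are placeholders, not real
    ** libfossil/fossil APIs.
    */
    #include <stdio.h>

    /* Sends the request body ("clone <version> <seqno>" plus the
    ** client-version line) and returns the next sequence number reported
    ** by the server, or 0 once the server indicates the clone is done. */
    extern long send_clone_request(int cloneVersion, long seqno);
    /* Consumes the igot/file/cfile/etc. cards of one response. */
    extern void process_response_cards(void);

    void clone_all(int cloneVersion){
      long seqno = 1;  /* 1-based: zero is not accepted (see the footnote) */
      int roundTrips = 0;
      while( seqno > 0 ){
        seqno = send_clone_request(cloneVersion, seqno);
        process_response_cards();
        ++roundTrips;
      }
      printf("Done after %d round-trip(s)\n", roundTrips);
    }

Exactly how the server signals completion is glossed over here; the point is simply that each request carries the last sequence number the server handed back.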
Metrics...
Part of the experimentation process is the collection of metrics, and what follows is taken from this test app.
The metrics below give timing info for "read()" and "submit()" operations. The I/O model looks something like this:
Create a "sync channel" object. This has a core interface and a handful of methods which must be populated by "concrete implementations" (subclasses). Currently the library supports shelling out to an arbitrary app to handle the communication, but needs one distinct function which is specific to each binary so that the command-line invocation can be suited to that binary.
Configure a "state" object for that channel. That configuration is necessarily channel-dependent but most of it has sensible defaults. e.g. we can get the last-synched URL from the repository db.
"init()" the channel to tell it to get ready to receive an incrementally-built sync request body.
Incrementally write out a sync request body to the channel. For an anonymous clone, that's literally two short lines to tell the server which version of the client we are (which we have to fake, of course, because we're not fossil) and to tell the server what we'd like to do, e.g.
"clone 3 1"
1."submit()" the request. In the case of a shelled-out binary, this is when it's invoked and given the request body.
"read()" the response using a set of 3 primitive read operations required by all subclasses: read one line, read a fixed number of bytes, or read all input (required when the library has to buffer the full input, as it does for version 2 compressed responses, otherwise each subclass would need to handle that part on their own). This is where the real work happens but most of it is in the library, as opposed to the subclasses.
If the response indicates there's more input to fetch, we go back to step 3, carrying over the state which told us to continue so that the server knows where we left off. Recall that all fossil communication is over HTTP, which is stateless, so any necessary state has to be repeated back to the server on subsequent requests.
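For illustration, the shape of such a channel might look like the following struct of function pointers. This is only a sketch to visualize the steps above; the actual libfossil interface has its own names, signatures, and error-handling conventions.

    /*
    ** Illustrative only: not libfossil's real interface. A "sync channel"
    ** bundles init/write/submit plus the three read primitives, and each
    ** concrete backend (e.g. the shell-out-to-curl one) populates it.
    */
    #include <stddef.h>

    typedef struct SyncChannel SyncChannel;
    struct SyncChannel {
      /* Prepare to receive an incrementally-built request body. */
      int (*init)(SyncChannel *self);
      /* Append a chunk of the request body. */
      int (*write)(SyncChannel *self, void const *data, size_t n);
      /* Send the request. For a shelled-out binary this is where that
      ** binary is invoked and fed the request body. */
      int (*submit)(SyncChannel *self);
      /* The three read primitives required of every backend: */
      int (*readLine)(SyncChannel *self, char *buf, size_t bufSize);
      int (*readN)(SyncChannel *self, void *dest, size_t n);
      int (*readAll)(SyncChannel *self, void **dest, size_t *n);
      /* Backend-specific state, e.g. the curl invocation and temp files. */
      void *implState;
    };

A curl-based backend would presumably stash its command line and temp-file paths in its backend-specific state, and a future socket-based backend would keep a connection handle there instead.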
Version 1 Clone
    $ ./f-test-sync -u http://localhost:8081 -c 1
    url: http://localhost:8081
    Connecting using: /usr/bin/curl <snip> http://localhost:8081
    Sync transfer metrics:
      Bytes written: 49
      Cards read: 64,384
        igot 64,382
        pragma 1
        push 1
      Bytes read uncompressed: 3,625,680
      Wait on submit(): 0.290 ms: 0.145 user, 0.145 system
      Wait on read(): 22.914 ms: 8.782 user, 14.132 system
    Total sync-specific run time: 48.346 ms of CPU time (18.317 user, 30.029 system)
That just fetches the "igot" cards, not the file content or any other info. The client would be responsible for constructing round-trips to tell the server which of those cards it wants. Since that's the legacy fossil sync protocol, and is significantly less efficient than its successors, there's little reason for the library to support this mode (but supporting cloning at all requires being able to read such responses, so it does that part).
If compression is enabled (add the -z flag) then the over-the-wire size is cut by right at 50%.
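As a rough illustration of what consuming such a response involves, the sketch below simply counts the hashes announced by "igot" lines. It assumes each card sits on its own text line and it only looks at the first token after "igot"; the full card grammar lives in fossil's sync-protocol docs, not here.

    /*
    ** Sketch only: tallies the artifact hashes announced by "igot" cards
    ** in a v1 clone response. Optional trailing card fields, if any, are
    ** ignored here.
    */
    #include <stdio.h>
    #include <string.h>

    static long count_igot_cards(FILE *in){
      char line[1024];
      long count = 0;
      while( fgets(line, sizeof(line), in) ){
        if( 0==strncmp(line, "igot ", 5) ){
          char hash[65] = {0};
          if( 1==sscanf(line+5, "%64s", hash) ){
            /* A real client would record this hash, compare it against
            ** its local blob table, and later ask the server for the
            ** artifacts it is missing. */
            ++count;
          }
        }
      }
      return count;
    }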
Version 2 Clone without Compression
    $ ./f-test-sync -u http://localhost:8081 -c 2
    url: http://localhost:8081
    Connecting using: /usr/bin/curl <snip> http://localhost:8081
    Starting round-trip #2...
    <snip>
    Done processing 41 clone round-trip(s)
    Sync transfer metrics:
      Bytes written: 2,331
      Cards read: 64,505
        clone_seqno 41
        file 64,382
        pragma 41
        push 41
      Bytes read uncompressed: 273,081,223
      Largest card payload: 9,288,273
      Wait on submit(): 19.146 ms: 0.599 user, 18.547 system
      Wait on read(): 167.993 ms: 41.941 user, 126.052 system
    Total sync-specific run time: 306.179 ms of CPU time (54.417 user, 251.762 system)
Version 2 Clone with Compression
    $ ./f-test-sync -u http://localhost:8081 -c 2 -z
    url: http://localhost:8081
    Connecting using: /usr/bin/curl <snip> http://localhost:8081
    Starting round-trip #2...
    <snip>
    Done processing 41 clone round-trip(s)
    Sync transfer metrics:
      Bytes written: 2,331
      Cards read: 64,505
        clone_seqno 41
        file 64,382
        pragma 41
        push 41
      Bytes read uncompressed: 273,081,223
      Bytes read compressed: 93,199,847
      Largest decompr. buffer: 12,951,819
      Largest card payload: 9,288,273
      Wait on submit(): 31.702 ms: 2.808 user, 28.894 system
      Wait on read(): 167.207 ms: 57.491 user, 109.716 system
      Wait on uncompress(): 1415.645 ms: 1320.731 user, 94.914 system
    Total sync-specific run time: 1698.485 ms of CPU time (1407.003 user, 291.482 system)
"Wait on uncompress" is spent waiting on decompression of the full response.
This approach is tremendously memory-hungry on both the client and the server.
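The reason for that hunger shows up in the numbers above: with v2 plus -z, the whole compressed body and the whole uncompressed body have to live in memory at the same time before any card can be parsed. Here is a minimal sketch of that step, assuming the payload is a plain zlib stream and that the uncompressed size is already known; the real framing of fossil's compressed payloads may differ.

    /*
    ** Sketch, not libfossil code: decompress an entire v2-style response
    ** in one go. Assumes a plain zlib stream and a known uncompressed
    ** size. Note that both buffers (~93MB compressed plus ~273MB
    ** uncompressed in the fossil-repo numbers above) are live at once.
    */
    #include <stdlib.h>
    #include <zlib.h>

    static unsigned char * decompress_whole_response(
      const unsigned char *zData, size_t zSize, size_t uSize
    ){
      uLongf dstLen = (uLongf)uSize;
      unsigned char *dst = (unsigned char *)malloc(uSize ? uSize : 1);
      if( !dst ) return NULL;
      if( Z_OK!=uncompress(dst, &dstLen, zData, (uLong)zSize) ){
        free(dst);
        return NULL;
      }
      return dst; /* caller frees */
    }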
Version 3 Clone
    $ ./f-test-sync -u http://localhost:8081 -c 3
    url: http://localhost:8081
    Connecting using: /usr/bin/curl --silent -X POST -H 'Content-Type: application/x-fossil-uncompressed' --data-binary '@/tmp/stephan/libfossil-popen-request~qhHvFOx6qa_mo5j9rdcX77Cp' --output /tmp/stephan/libfossil-popen-response~NSabzT8XaUMiQjjGTDJafvOe http://localhost:8081
    Starting round-trip #2...
    <snip>
    Done processing 20 clone round-trip(s)
    Sync transfer metrics:
      Bytes written: 1,135
      Cards read: 64,442
        cfile 64,382
        clone_seqno 20
        pragma 20
        push 20
      Bytes read uncompressed: 103,268,140
      Largest card payload: 2,396,592
      Wait on submit(): 10.433 ms: 1.785 user, 8.648 system
      Wait on read(): 120.935 ms: 91.337 user, 29.598 system
      Wait on uncompress(): 1554.033 ms: 1399.749 user, 154.284 system
    Total sync-specific run time: 1814.386 ms of CPU time (1578.312 user, 236.074 system)
"Bytes read uncompressed" is the total of the response payloads before the "cfile" cards were decompressed. "Wait on uncompress" is time spent waiting on decompression of each "cfile" card.
Again, This Time with In-memory Response Buffering
The above tests read the response via a temp file written by the shelled-out binary. We currently buffer the response to a temp file because reading directly from that binary's stdout has proven to be somewhat flaky (very possibly a bug on my end).
All of the above tests can be run with full in-memory buffering of responses by adding the -b 2 flag. In that mode, the library extracts the whole response in a single go into a buffer, decompresses it if needed, and future read() operations on that response are served from that buffer instead of from the I/O channel.
Curiously, buffering the whole response to memory is not any faster than using a temp file - the latter is more performant, sometimes overwhelmingly, in every comparison run so far. A temp file is also far, far lighter in terms of peak RAM usage. The full-buffer approach is internally simpler, though.
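For completeness, "served from that buffer" amounts to a trivial memory cursor, roughly like the sketch below. Names and layout are illustrative only, not libfossil's.

    /*
    ** Illustrative only: once the whole (already-decompressed) response
    ** lives in one buffer, the read primitives become simple cursor
    ** operations over that buffer instead of I/O-channel calls.
    */
    #include <stddef.h>
    #include <string.h>

    typedef struct {
      const unsigned char *buf;  /* the buffered response */
      size_t size;               /* total size of buf */
      size_t cursor;             /* current read position */
    } BufferedResponse;

    /* Copies up to n bytes from the buffer, advancing the cursor.
    ** Returns the number of bytes actually copied. */
    static size_t buffered_readN(BufferedResponse *br, void *dest, size_t n){
      size_t const avail = br->size - br->cursor;
      if( n > avail ) n = avail;
      memcpy(dest, br->buf + br->cursor, n);
      br->cursor += n;
      return n;
    }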
- ^ The docs say to start the sequence number at zero, but zero is not legal. The docs will be updated shortly. For a while there i was concerned that the library might have an off-by-one error which caused clones to be missing one artifact.