Random trivia about the fossil sync protocol
By Stephan Beal (stephan) on 2025-06-25 11:31:48
A couple of weeks ago, work started on libfossil's support for fossil's sync protocol. Though there's much to do there, it's now at the point where it can fire off an anonymous clone request over http(s) and read the response, and the internal "transport" API for doing that is agnostic about the details of the actual data transport. The proof of concept is being built upon shelling out to curl
and piping the data through it, but the library doesn't know that - it just deals with the I/O API behind which curl is hidden. (We'll be able to do SSH the same way. Once that's working, libfossil will be taught to speak to sockets, rather than having to shell out to curl.) The library does not yet process the sync'd data - first we had to teach it to read the data, and that part now seems (since yesterday) to be working well.
This represents my first foray into the fossil sync protocol (after 17+ years of working with/on fossil) and much time has been spent exploring how cloning works. This post is just about sharing some of the trivia discovered during that process, for these reasons:
- To help cement it in my head by "talking through it."
- To demonstrate that libfossil is "this close" to having native fossil sync support. (It currently shells out to fossil to do that, but that's cheating. But it's easy to implement, so cheating is okay.)
- For the 1.3-ish of you who may be interested in the nerd details of the sync protocol.
Fossil's sync protocol has 3 levels of cloning:
Version 1: fetches a long list of "igot" cards in a single response, which is the server essentially telling you "these are the IDs of all artifacts on this server." The intent is that the client respond with a list of which of those it wants.
Version 2: starts sending all available artifacts, up to some size limit, and then sends you a sequence number which tells you "there's more - ask again and give me this number to pick up where you left off." The idea is that the cloner performs multiple round-trips, each one containing the next sequence number, to collect the whole set (a sketch of that loop follows these version descriptions). This version optionally compresses the whole response payload. Compression saves a good deal of space (yesterday it was 90mb vs 260mb on fossil's own repo) but also takes a whole lot longer for the server to prepare than an uncompressed response. Some timing info is given below.
Version 3: works a lot like version 2 but the full response is not compressed. Instead, each file's content is compressed individually and wrapped in the sync protocol's line-based framing, which tells us how big those chunks are. This is generally faster than a compressed v2 response, but it also has a slightly larger over-the-wire footprint (103mb vs 93mb on fossil's repo) and the client has to spend literally 90%+ of its time decompressing the file content (but it does not have to buffer the whole response in order to uncompress the pieces, as v2 compression requires, so v3 needs far less peak memory). This version requires about half as many round-trips to the server as v2 does: 20 vs 41 for fossil's own repository (yesterday it was 19 vs 40).
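To make that round-trip loop concrete, here is a minimal sketch of it in C. The helper names (send_clone_request(), process_response_cards()) are hypothetical placeholders, not libfossil or fossil APIs; only the loop shape and the "repeat the sequence number back to the server" behavior come from the protocol description above.

    /*
    ** Hypothetical sketch of the v2/v3 clone loop. send_clone_request()
    ** and process_response_cards() are placeholders, not real
    ** libfossil/fossil APIs.
    */
    #include <stdio.h>

    /* Sends the request body ("clone <version> <seqno>" plus the
    ** client-version line) and returns the next sequence number reported
    ** by the server, or 0 once the server indicates the clone is done. */
    extern long send_clone_request(int cloneVersion, long seqno);
    /* Consumes the igot/file/cfile/etc. cards of one response. */
    extern void process_response_cards(void);

    void clone_all(int cloneVersion){
      long seqno = 1;  /* 1-based: zero is not accepted (see the footnote) */
      int roundTrips = 0;
      while( seqno > 0 ){
        seqno = send_clone_request(cloneVersion, seqno);
        process_response_cards();
        ++roundTrips;
      }
      printf("Done after %d round-trip(s)\n", roundTrips);
    }

Exactly how the server signals completion is glossed over here; the point is simply that each request carries the last sequence number the server handed back.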
Metrics...
Part of the experimentation process is the collection of metrics, and what follows is taken from this test app.
The metrics below give timing info for "read()" and "submit()" operations. The I/O model looks something like this:
Create a "sync channel" object. This has a core interface and a handful of methods which must be populated by "concrete implementations" (subclasses). Currently the library supports shelling out to an arbitrary app to handle the communication, but needs one distinct function which is specific to each binary so that the command-line invocation can be suited to that binary.
Configure a "state" object for that channel. That configuration is necessarily channel-dependent but most of it has sensible defaults. e.g. we can get the last-synched URL from the repository db.
"init()" the channel to tell it to get ready to receive an incrementally-built sync request body.
Incrementally write out a sync request body to the channel. For an anonymous clone, that's literally two short lines to tell the server which version of the client we are (which we have to fake, of course, because we're not fossil) and to tell the server what we'd like to do, e.g.
"clone 3 1"
1."submit()" the request. In the case of a shelled-out binary, this is when it's invoked and given the request body.
"read()" the response using a set of 3 primitive read operations required by all subclasses: read one line, read a fixed number of bytes, or read all input (required when the library has to buffer the full input, as it does for version 2 compressed responses, otherwise each subclass would need to handle that part on their own). This is where the real work happens but most of it is in the library, as opposed to the subclasses.
If the response indicates there's more input to fetch, we go back to step 3, carrying over the state which told us to continue so that the server knows where we left off. Recall that all fossil communication is over HTTP, which is stateless, so any necessary state has to be repeated back to the server on subsequent requests.
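For illustration, the shape of such a channel might look like the following struct of function pointers. This is only a sketch to visualize the steps above; the actual libfossil interface has its own names, signatures, and error-handling conventions.

    /*
    ** Illustrative only: not libfossil's real interface. A "sync channel"
    ** bundles init/write/submit plus the three read primitives, and each
    ** concrete backend (e.g. the shell-out-to-curl one) populates it.
    */
    #include <stddef.h>

    typedef struct SyncChannel SyncChannel;
    struct SyncChannel {
      /* Prepare to receive an incrementally-built request body. */
      int (*init)(SyncChannel *self);
      /* Append a chunk of the request body. */
      int (*write)(SyncChannel *self, void const *data, size_t n);
      /* Send the request. For a shelled-out binary this is where that
      ** binary is invoked and fed the request body. */
      int (*submit)(SyncChannel *self);
      /* The three read primitives required of every backend: */
      int (*readLine)(SyncChannel *self, char *buf, size_t bufSize);
      int (*readN)(SyncChannel *self, void *dest, size_t n);
      int (*readAll)(SyncChannel *self, void **dest, size_t *n);
      /* Backend-specific state, e.g. the curl invocation and temp files. */
      void *implState;
    };

A curl-based backend would presumably stash its command line and temp-file paths in its backend-specific state, and a future socket-based backend would keep a connection handle there instead.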
Version 1 Clone
    $ ./f-test-sync -u http://localhost:8081 -c 1
    url: http://localhost:8081
    Connecting using: /usr/bin/curl <snip> http://localhost:8081
    Sync transfer metrics:
      Bytes written: 49
      Cards read: 64,384
        igot 64,382
        pragma 1
        push 1
      Bytes read uncompressed: 3,625,680
      Wait on submit(): 0.290 ms: 0.145 user, 0.145 system
      Wait on read(): 22.914 ms: 8.782 user, 14.132 system
    Total sync-specific run time: 48.346 ms of CPU time (18.317 user, 30.029 system)
That just fetches the "igot" cards, not the file content or any other info. The client would be responsible for constructing round-trips to tell the server which of those cards it wants. Since that's the legacy fossil sync protocol, and is significantly less efficient than its successors, there's little reason for the library to support this mode (but supporting cloning at all requires being able to read such responses, so it does that part).
If compression is enabled (add the -z flag) then the over-the-wire size is cut by right at 50%.
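As a rough illustration of what consuming such a response involves, the sketch below simply counts the hashes announced by "igot" lines. It assumes each card sits on its own text line and it only looks at the first token after "igot"; the full card grammar lives in fossil's sync-protocol docs, not here.

    /*
    ** Sketch only: tallies the artifact hashes announced by "igot" cards
    ** in a v1 clone response. Optional trailing card fields, if any, are
    ** ignored here.
    */
    #include <stdio.h>
    #include <string.h>

    static long count_igot_cards(FILE *in){
      char line[1024];
      long count = 0;
      while( fgets(line, sizeof(line), in) ){
        if( 0==strncmp(line, "igot ", 5) ){
          char hash[65] = {0};
          if( 1==sscanf(line+5, "%64s", hash) ){
            /* A real client would record this hash, compare it against
            ** its local blob table, and later ask the server for the
            ** artifacts it is missing. */
            ++count;
          }
        }
      }
      return count;
    }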
Version 2 Clone without Compression
    $ ./f-test-sync -u http://localhost:8081 -c 2
    url: http://localhost:8081
    Connecting using: /usr/bin/curl <snip> http://localhost:8081
    Starting round-trip #2...
    <snip>
    Done processing 41 clone round-trip(s)
    Sync transfer metrics:
      Bytes written: 2,331
      Cards read: 64,505
        clone_seqno 41
        file 64,382
        pragma 41
        push 41
      Bytes read uncompressed: 273,081,223
      Largest card payload: 9,288,273
      Wait on submit(): 19.146 ms: 0.599 user, 18.547 system
      Wait on read(): 167.993 ms: 41.941 user, 126.052 system
    Total sync-specific run time: 306.179 ms of CPU time (54.417 user, 251.762 system)
Version 2 Clone with Compression
    $ ./f-test-sync -u http://localhost:8081 -c 2 -z
    url: http://localhost:8081
    Connecting using: /usr/bin/curl <snip> http://localhost:8081
    Starting round-trip #2...
    <snip>
    Done processing 41 clone round-trip(s)
    Sync transfer metrics:
      Bytes written: 2,331
      Cards read: 64,505
        clone_seqno 41
        file 64,382
        pragma 41
        push 41
      Bytes read uncompressed: 273,081,223
      Bytes read compressed: 93,199,847
      Largest decompr. buffer: 12,951,819
      Largest card payload: 9,288,273
      Wait on submit(): 31.702 ms: 2.808 user, 28.894 system
      Wait on read(): 167.207 ms: 57.491 user, 109.716 system
      Wait on uncompress(): 1415.645 ms: 1320.731 user, 94.914 system
    Total sync-specific run time: 1698.485 ms of CPU time (1407.003 user, 291.482 system)
"Wait on uncompress" is spent waiting on decompression of the full response.
This approach is tremendously memory-hungry on both the client and the server.
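The reason for that hunger shows up in the numbers above: with v2 plus -z, the whole compressed body and the whole uncompressed body have to live in memory at the same time before any card can be parsed. Here is a minimal sketch of that step, assuming the payload is a plain zlib stream and that the uncompressed size is already known; the real framing of fossil's compressed payloads may differ.

    /*
    ** Sketch, not libfossil code: decompress an entire v2-style response
    ** in one go. Assumes a plain zlib stream and a known uncompressed
    ** size. Note that both buffers (~93MB compressed plus ~273MB
    ** uncompressed in the fossil-repo numbers above) are live at once.
    */
    #include <stdlib.h>
    #include <zlib.h>

    static unsigned char * decompress_whole_response(
      const unsigned char *zData, size_t zSize, size_t uSize
    ){
      uLongf dstLen = (uLongf)uSize;
      unsigned char *dst = (unsigned char *)malloc(uSize ? uSize : 1);
      if( !dst ) return NULL;
      if( Z_OK!=uncompress(dst, &dstLen, zData, (uLong)zSize) ){
        free(dst);
        return NULL;
      }
      return dst; /* caller frees */
    }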
Version 3 Clone
    $ ./f-test-sync -u http://localhost:8081 -c 3
    url: http://localhost:8081
    Connecting using: /usr/bin/curl --silent -X POST -H 'Content-Type: application/x-fossil-uncompressed' --data-binary '@/tmp/stephan/libfossil-popen-request~qhHvFOx6qa_mo5j9rdcX77Cp' --output /tmp/stephan/libfossil-popen-response~NSabzT8XaUMiQjjGTDJafvOe http://localhost:8081
    Starting round-trip #2...
    <snip>
    Done processing 20 clone round-trip(s)
    Sync transfer metrics:
      Bytes written: 1,135
      Cards read: 64,442
        cfile 64,382
        clone_seqno 20
        pragma 20
        push 20
      Bytes read uncompressed: 103,268,140
      Largest card payload: 2,396,592
      Wait on submit(): 10.433 ms: 1.785 user, 8.648 system
      Wait on read(): 120.935 ms: 91.337 user, 29.598 system
      Wait on uncompress(): 1554.033 ms: 1399.749 user, 154.284 system
    Total sync-specific run time: 1814.386 ms of CPU time (1578.312 user, 236.074 system)
"Bytes read uncompressed" is the total of the response payloads before the "cfile" cards were decompressed. "Wait on uncompress" is time spent waiting on decompression of each "cfile" card.
Again, This Time with In-memory Response Buffering
The above tests read the response via a temp file written by the shelled-out binary. We currently buffer the response to a temp file because reading directly from that binary's stdout has proven to be somewhat flaky (very possibly a bug on my end).
All of the above tests can be run with full in-memory buffering of responses by adding the -b 2 flag. In that mode, the library extracts the whole response in a single go into a buffer, decompresses it if needed, and future read() operations on that response are served from that buffer instead of from the I/O channel.
Curiously, buffering the whole response to memory is not any faster than using a temp file - the latter is more performant, sometimes overwhelmingly, in every comparison run so far. A temp file is also far, far lighter in terms of peak RAM usage. The full-buffer approach is internally simpler, though.
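For completeness, "served from that buffer" amounts to a trivial memory cursor, roughly like the sketch below. Names and layout are illustrative only, not libfossil's.

    /*
    ** Illustrative only: once the whole (already-decompressed) response
    ** lives in one buffer, the read primitives become simple cursor
    ** operations over that buffer instead of I/O-channel calls.
    */
    #include <stddef.h>
    #include <string.h>

    typedef struct {
      const unsigned char *buf;  /* the buffered response */
      size_t size;               /* total size of buf */
      size_t cursor;             /* current read position */
    } BufferedResponse;

    /* Copies up to n bytes from the buffer, advancing the cursor.
    ** Returns the number of bytes actually copied. */
    static size_t buffered_readN(BufferedResponse *br, void *dest, size_t n){
      size_t const avail = br->size - br->cursor;
      if( n > avail ) n = avail;
      memcpy(dest, br->buf + br->cursor, n);
      br->cursor += n;
      return n;
    }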
- ^ The docs say to start the sequence number at zero, but zero is not legal. The docs will be updated shortly. For a while there i was concerned that the library might have an off-by-one error which caused clones to be missing one artifact.