- **Undo** support.
- **Bring some of the configuration-related state up to date.** The infrastructure is there, but the list of fossil-supported properties it knows about lags several years behind. On the other hand, the majority of such config state is really application-specific and probably has no business being handled by this library. Some settings, e.g. `forbid-delta-manifests`, we honor internally to avoid grief in downstream repositories. Others, such as the various glob settings, have API-level support and can be honored by the appropriate library APIs via toggles (e.g. the file-add API optionally honors the `ignore-glob` setting).
- **BOMs**. Fossil's diff APIs internally convert their inputs to UTF-8 and strip the BOM (if any) from them. libf does not do that. On the one hand, i'm hesitant to do so because these blobs can be anything at all (not necessarily SCM-controlled). On the other hand, for annotate's sake it might make sense to do so automatically, because the user passes in artifact IDs instead of file content. On the other other hand, the fossil routine for doing that (`blob_to_utf8_no_bom()`) is far, far more involved than simply stripping a BOM. i'm torn on whether that's the library's job or not, and really dislike having to either mutate the original inputs or reallocate them to make that conversion. Then again, fossil does so.
- **Symlinks**. i have always strongly disagreed with the addition of symlink support to fossil: platform-specific constructs simply have no place in the core of any SCM (with the "effectively necessary," as well as unobtrusive, exception of the executable bit). On platforms which don't support symlinks, fossil stores/manages them as plain text files containing a single line: the name of the referenced file. This is *very likely* the only route the library will take to supporting symlinks, especially given the hassles symlink handling caused fossil in late 2020 (long story). Probably the only way the library will support proper symlinks is if someone who uses that feature adds and maintains it.
- **Backlinks**. Crosslinking "should" update the internal list of backlinks from certain text fields, but doing so requires parsing wiki/markdown-format text. See [`backlink.c` in the fossil tree](https://fossil-scm.org/home/file?name=src/backlink.c&ci=trunk) for the details. On the other hand, backlink support only requires parsing links, not the full grammar, so it might not be as painful as it initially sounds... though markdown is somewhat more work, since its linking model requires a multi-pass scan. (We'd also need to recognize verbatim blocks so as not to parse links inside them.)
- **Ticket support**. Ticket handling is surprisingly complicated, due largely to the customizability of the ticket database schema. If fossil-compatible ticket support gets added to libfossil, it will very likely be because someone other than myself adds it! The core artifact data structure supports tickets, so the bits required for adding it are in place.
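For the glob settings mentioned above, "honored via a toggle" could look roughly like the following C sketch. It is not libfossil's API: `glob_list_matches()` and `should_add_file()` are hypothetical names, the setting value is assumed to be a comma- and/or whitespace-separated list, and POSIX `fnmatch(3)` stands in for fossil's own glob matcher, whose exact semantics may differ.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `name` matches any pattern in `globList`,
   assumed here to be a comma- and/or whitespace-separated list such as
   the value of an ignore-glob setting. fnmatch(3) stands in for fossil's
   own glob matcher. */
static bool glob_list_matches(const char *globList, const char *name) {
  static const char seps[] = ", \t\r\n";
  char pattern[256];
  const char *p = globList;
  assert(globList != NULL && name != NULL);
  while (*p) {
    size_t n;
    p += strspn(p, seps);            /* skip separators */
    if (!*p) break;
    n = strcspn(p, seps);            /* length of this pattern */
    if (n < sizeof pattern) {        /* overlong patterns are skipped */
      memcpy(pattern, p, n);
      pattern[n] = '\0';
      if (fnmatch(pattern, name, 0) == 0) return true;
    }
    p += n;
  }
  return false;
}

/* Hypothetical file-add check: the caller's toggle decides whether the
   ignore-glob list is consulted at all. */
static bool should_add_file(const char *name, const char *ignoreGlob,
                            bool honorIgnoreGlob) {
  if (honorIgnoreGlob && ignoreGlob != NULL &&
      glob_list_matches(ignoreGlob, name)) {
    return false;
  }
  return true;
}
```

Keeping the toggle in the caller's hands matches the split described above: the library can offer glob matching without deciding on the application's behalf whether a given setting applies.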
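On the BOM question, if the library did strip BOMs, one way to avoid both mutating and reallocating the inputs is to report how many leading bytes to skip and let the caller diff from that offset. This minimal C sketch only recognizes the UTF-8 BOM (`EF BB BF`); it does none of the additional work `blob_to_utf8_no_bom()` does, and `utf8_bom_length()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of a leading UTF-8 byte-order
   mark (EF BB BF), or 0 if there is none. The input is never modified,
   so the caller can simply skip that many bytes. */
static size_t utf8_bom_length(const unsigned char *data, size_t len) {
  assert(data != NULL || len == 0);
  if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
    return 3;
  }
  return 0;
}
```

A caller would then pass `data + skip` and `len - skip` to the diff routine, leaving the original blob untouched.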
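To illustrate why links-only parsing of fossil-format wiki text might be tractable for backlinks, here is a rough C sketch that walks text for `[target]` and `[target|label]` links while skipping `<verbatim>...</verbatim>` blocks. It is an assumption-laden illustration, not a port of `backlink.c`: it ignores markdown entirely, does not trim whitespace or handle escapes, and does not decide which targets actually refer to artifacts. `next_wiki_link()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical iterator: finds the next [target] or [target|label] link
   at or after `z`, skipping <verbatim>...</verbatim> blocks. On success
   returns a pointer to the target text, sets *targetLen, and sets *next
   to where scanning should resume. Returns NULL when no links remain. */
static const char *next_wiki_link(const char *z, size_t *targetLen,
                                  const char **next) {
  static const char vOpen[] = "<verbatim>";
  static const char vClose[] = "</verbatim>";
  assert(z != NULL && targetLen != NULL && next != NULL);
  while (*z) {
    if (strncmp(z, vOpen, sizeof vOpen - 1) == 0) {
      const char *end = strstr(z, vClose);
      if (!end) break; /* unterminated: treat the rest as verbatim */
      z = end + sizeof vClose - 1;
    } else if (*z == '[') {
      const char *start = z + 1;
      size_t close = strcspn(start, "]\n"); /* links must close on one line */
      if (start[close] == ']') {
        const char *bar = memchr(start, '|', close);
        size_t n = bar ? (size_t)(bar - start) : close;
        z = start + close + 1;
        if (n > 0) {
          *targetLen = n;
          *next = z;
          return start;
        }
      } else {
        ++z;
      }
    } else {
      ++z;
    }
  }
  *next = z;
  return NULL;
}
```

Callers loop until `NULL`, feeding each target to whatever would record the backlink; markdown's multi-pass requirement is exactly what this single forward scan does not cover.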
# Optimizations
- **Artifact parsing,** in particular of checkins, is *much* slower in libfossil than in fossil. Some of this is easily attributable to additional abstraction layers, but certainly not all of it. Some optimization of crosslinking speed is clearly in order. As a point of comparison, try `fossil rebuild` vs `f-parseparty --crosslink`. libf parses non-checkin types "plenty fast," e.g. 1600-odd control artifacts in roughly 6ms on my main computer. Checkins, however: <s>6m20s for 15504</s> 2m59s (debug) or 2m30s (non-debug) for 15504 checkins in the main fossil repo, as of this writing (just parsing, without crosslinking). On the sqlite3 repo it can only parse <s>approx. 3000 checkins in 10 minutes, at which point that test got cancelled</s> 24657 checkins in 10m9s (debug? Non-debug?). Why the speed degrades so badly as the repo size increases is unclear.
  - This is at least partially (roughly 20-33%, based on basic tests) due to libfossil building in debug mode by default.
  - On 2021-03-24 crosslinking was sped up by roughly 50% via the addition of a content cache identical to fossil's, but it still takes roughly 4m45s to parse and crosslink 16287 checkins in fossil's core repo with a debug libfossil build. 2m30s of that time is parsing.
  - 2021-10-03 update: a debug build of `f-parseparty -t c --quiet --crosslink` can load and re-crosslink the 2178 checkins in libfossil's own repo in about 4.5s. The same thing on fossil's 17099 checkins takes about 5m55s on the same machine.
- **Buffer caching.** The library internally has to use many temporary memory buffers. Some of those it reuses as much as it can (e.g. for filename normalization), but some operations (`fsl_content_get()` in particular) have to use several, potentially many, temporary buffers of arbitrary sizes. That can easily lead to hundreds of thousands of allocations totaling 1GB+ in a single session, even if peak concurrent memory stays below 15MB. We can probably improve this situation by installing a buffer cache intended primarily for use with `fsl_content_get()`, in which we store some number of buffers totaling no more than a certain maximum amount of memory. This could be achieved relatively inexpensively either by hard-coding a buffer array size (e.g. 10) or by modifying `fsl_buffer` to be a singly-linked list, the links being used solely for such a cache, and keeping the buffers in allocated-size order. The catch, however, is that the decompression and de-deltification steps effectively make reuse of such buffers next to impossible, because we cannot easily and efficiently perform those operations in-place in existing buffers.
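The size-ordered, singly-linked cache described above might look something like this C sketch. `Buf` is a hypothetical stand-in rather than a modified `fsl_buffer`, and the limits (`maxCount`, `maxBytes`) are assumed tunables. It only covers the bookkeeping: as noted above, decompression and de-deltification would still need a separate destination buffer, which is what limits reuse in practice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer type; `next` exists only for chaining idle
   buffers in the cache. */
typedef struct Buf Buf;
struct Buf {
  unsigned char *mem;
  size_t capacity;
  size_t used;
  Buf *next;
};

typedef struct {
  Buf *head;       /* idle buffers, sorted by ascending capacity */
  size_t count;    /* number of idle buffers */
  size_t bytes;    /* total capacity of idle buffers */
  size_t maxCount; /* assumed limits: beyond these, returned buffers are freed */
  size_t maxBytes;
} BufCache;

static void buf_free(Buf *b) {
  if (b) { free(b->mem); free(b); }
}

/* Returns the smallest cached buffer with capacity >= minCap, or a newly
   allocated one. Returns NULL on allocation failure. */
static Buf *bufcache_take(BufCache *c, size_t minCap) {
  Buf **pp = &c->head;
  Buf *b;
  while (*pp && (*pp)->capacity < minCap) pp = &(*pp)->next;
  if (*pp) {
    b = *pp;
    *pp = b->next; /* unlink from the cache */
    c->count--;
    c->bytes -= b->capacity;
    b->next = NULL;
    b->used = 0;
    return b;
  }
  b = calloc(1, sizeof *b);
  if (!b) return NULL;
  b->capacity = minCap ? minCap : 1;
  b->mem = malloc(b->capacity);
  if (!b->mem) { free(b); return NULL; }
  return b;
}

/* Hands a buffer back: kept in capacity order if the limits allow,
   otherwise freed. */
static void bufcache_give(BufCache *c, Buf *b) {
  Buf **pp = &c->head;
  assert(b != NULL && b->next == NULL);
  if (c->count >= c->maxCount || c->bytes + b->capacity > c->maxBytes) {
    buf_free(b);
    return;
  }
  while (*pp && (*pp)->capacity < b->capacity) pp = &(*pp)->next;
  b->next = *pp;
  *pp = b;
  c->count++;
  c->bytes += b->capacity;
}

static void bufcache_clear(BufCache *c) {
  while (c->head) {
    Buf *b = c->head;
    c->head = b->next;
    buf_free(b);
  }
  c->count = c->bytes = 0;
}
```

Taking the smallest buffer that is large enough keeps large buffers available for large requests; the cost is a linear walk of the list, which stays cheap as long as the cache only holds a handful of buffers.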
# Remote Synchronization
- This will(?) be implemented in terms of abstract streaming APIs, very possibly the ones the library already uses for the majority of its file I/O and for abstracting output (e.g. diff output goes to an abstract output stream rather than being written directly to a buffer).
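The abstract-output-stream idea can be pictured with a small C sketch: producers write through a callback plus an opaque state pointer, so the same producer can target memory, a `FILE*`, or, hypothetically, a network connection for sync. `output_f`, `MemSink`, and the functions below are illustrative names, not libfossil's actual streaming API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output callback: returns 0 on success, non-zero on error.
   Producers see only this and an opaque state pointer. */
typedef int (*output_f)(void *state, const void *src, size_t n);

/* Fixed-size in-memory destination. */
typedef struct {
  char *mem;
  size_t used;
  size_t capacity;
} MemSink;

/* Sink that appends to a MemSink, failing rather than overflowing. */
static int output_to_mem(void *state, const void *src, size_t n) {
  MemSink *m = state;
  assert(m->used <= m->capacity);
  if (n > m->capacity - m->used) return 1;
  memcpy(m->mem + m->used, src, n);
  m->used += n;
  return 0;
}

/* Sink that writes to a stdio stream passed as the state. */
static int output_to_file(void *state, const void *src, size_t n) {
  return fwrite(src, 1, n, (FILE *)state) == n ? 0 : 1;
}

/* Example producer: emits one line without knowing the destination. */
static int emit_line(output_f out, void *state, const char *line) {
  int rc = out(state, line, strlen(line));
  return rc ? rc : out(state, "\n", 1);
}
```

Swapping the destination is then just a matter of passing a different callback/state pair, which is what would let a sync implementation reuse producers written for local output.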