Artifact f01bfb1ee058b7774727eed7ccbebea29587b11a:

Wiki page [TODOs] by stephan 2021-04-14 04:17:41.
D 2021-04-14T04:17:41.445
L TODOs
N text/x-markdown
P 66ef06c3d699a28e99aa347b5e804e23343d891c
U stephan
W 10158
# libfossil Notable TODOs

This page gives a high-level overview of the notable TODOs, or perceived TODOs, as well as non-TODOs (topics/APIs which are either out of scope or are way, way down the line).

# Core SCM (and closely adjacent) Features

In no particular order:

- **vfile.mhash** field: this was added sometime after the lib took a break and needs to be populated/handled by the lib code. (It seems to be only updated when merging?)

- **Port over checkout/repo fingerprint**: this allows detection of when a checkout's repo has been replaced by one with different RIDs. See fossil's `vfile.c:vfile_rid_renumbering_event()`.

- **Rebuild**: this feature exercises almost every major SCM feature of the framework except for merging and synchronization. Similarly, deconstruct/reconstruct might be useful for exercising the library. With the exception of missing ticket crosslinking, we have all of the pieces needed to implement this, and the `f-parseparty` test app does much of this work already.

- **Cross-linking of tickets**: the library can read and write them but they cannot yet be crosslinked. Currently, depending on a context-dependent flag, it may error out if it tries to crosslink one or it may silently skip over it.

- **Double-check crosslinking of other types** and make sure that we are not missing newly-added features.

- **Stash** support. First requires merge support.

- **Unversioned files** should be trivial to do.

- Maybe pending-moderation support (tickets, ticket comments, wiki edits), depending on how centrally-managed such data need to be (i.e. whether it can be delegated to an app layer).

# Security-relevant

But not otherwise SCM-relevant...

- **Port over `db_unprotect()` and `db_protect_pop()`** APIs, which allow a db to effectively be made read-only except for limited windows where specific sections of it need to be writable. Related: `db.c:db_top_authorizer()`.
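
The "read-only except for limited windows" idea can be sketched as a bitmask of protected categories plus a small stack of saved masks. This is only an illustration of the concept, not fossil's implementation: every name here (`prot_state`, `prot_unprotect()`, `PROT_CONFIG`, etc.) is hypothetical, and in a real port the `prot_may_write()` check would presumably live in a db-level hook such as one registered via SQLite's `sqlite3_set_authorizer()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical protection categories; not fossil's or libfossil's flags. */
enum {
  PROT_CONFIG = 0x01, /* configuration data */
  PROT_USER   = 0x02, /* user/credentials data */
  PROT_ALL    = 0x03
};

enum { PROT_STACK_MAX = 8 };

typedef struct {
  unsigned mask;                  /* currently protected categories */
  unsigned saved[PROT_STACK_MAX]; /* masks saved by prot_unprotect() */
  int depth;                      /* number of saved masks */
} prot_state;

/* Starts with everything protected. */
static void prot_init(prot_state *p){
  p->mask = PROT_ALL;
  p->depth = 0;
}

/* Opens a write window for the given categories. Returns false,
   changing nothing, if the stack is full. */
static bool prot_unprotect(prot_state *p, unsigned flags){
  if(p->depth >= PROT_STACK_MAX) return false;
  p->saved[p->depth++] = p->mask;
  p->mask &= ~flags;
  return true;
}

/* Closes the most recently opened window. Returns false on underflow. */
static bool prot_pop(prot_state *p){
  if(p->depth <= 0) return false;
  p->mask = p->saved[--p->depth];
  return true;
}

/* What an authorizer-style hook would ask before permitting a write. */
static bool prot_may_write(prot_state const *p, unsigned category){
  return 0 == (p->mask & category);
}

/* Usage: open a window for one category, then close it again. */
static void prot_demo(void){
  prot_state p;
  prot_init(&p);
  assert(!prot_may_write(&p, PROT_CONFIG));
  if(prot_unprotect(&p, PROT_CONFIG)){
    assert(prot_may_write(&p, PROT_CONFIG));
    assert(!prot_may_write(&p, PROT_USER)); /* still protected */
    prot_pop(&p);
  }
  assert(!prot_may_write(&p, PROT_CONFIG));
}
```

Saving the whole previous mask, rather than just the flags being cleared, lets nested windows restore exactly the state they found.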

# Non-SCM TODOs

In no particular order...

- **Add [SPDX-style](https://spdx.dev/) license attribution** to all source files. This is ongoing.

- **Header file restructuring.** The current separation of the APIs into many `include/fossil-scm/*.h` files is somewhat confusing. The initial intent was to keep my low-end development system of the time from choking on syntax highlighting on one large file, but those days are largely behind me. It may make sense to combine those into 1 public API file, 1 internal API file, and the auto-generated config file(s). (Even then, it's big enough to choke emacs' syntax highlighting on lower-end systems like Raspberry Pi SBCs.)

- **Stop using char as booleans**. This tree historically uses `char` type for booleans. Now that the tree is C99, we can switch to the `bool` type. This is ongoing.

- **f-vdiff**: port in [](fossil:3504672187af59f0) in order to be able to select the diff width based on the terminal size.

- `fsl_appendf()` and friends use callbacks with printf-like return semantics because that's what the code they were derived from used. These "really should" be changed to `fsl_output_f()` semantics because it's next to impossible to catch and report certain errors with printf-style semantics. The arguments for both types of callbacks are compatible, but their return value semantics are different, which means great care has to be taken when changing this to ensure that all cases are handled and we don't leave any calls with the old semantics (which would break stuff).
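
To illustrate why the two callback conventions from the last item cannot be swapped blindly, here is a rough sketch. The typedefs `printf_style_f` and `output_style_f`, the `sink` and `adapter` types, and the `ERR_RANGE` code are all hypothetical stand-ins; the real libfossil typedefs may differ in both names and argument types. The point is only that a printf-like return value (byte count, or negative on error) has no room for a specific error code, whereas a 0-or-error-code return does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback types for illustration only. */

/* printf-like: returns the number of bytes consumed, negative on error. */
typedef long (*printf_style_f)(void *state, void const *src, size_t n);
/* output-like: returns 0 on success, else a non-zero error code. */
typedef int (*output_style_f)(void *state, void const *src, size_t n);

enum { ERR_RANGE = 42 /* hypothetical error code */ };

/* A fixed-capacity sink used as the output-like callback's state. */
typedef struct {
  char mem[16];
  size_t used;
} sink;

/* output-like callback: appends to the sink or reports ERR_RANGE. */
static int sink_output(void *state, void const *src, size_t n){
  sink *s = (sink*)state;
  if(n > sizeof(s->mem) - s->used) return ERR_RANGE;
  memcpy(s->mem + s->used, src, n);
  s->used += n;
  return 0;
}

/* Adapter state: wraps an output-like callback and remembers its
   first error code, which a printf-like return value cannot carry. */
typedef struct {
  output_style_f out;
  void *outState;
  int rc;
} adapter;

/* printf-like callback which forwards to the wrapped output-like one. */
static long adapter_printf_style(void *state, void const *src, size_t n){
  adapter *a = (adapter*)state;
  if(a->rc) return -1; /* already failed: refuse further output */
  a->rc = a->out(a->outState, src, n);
  return a->rc ? -1 : (long)n;
}

/* Usage: the second write overflows the sink. The printf-like caller
   sees only -1; the actual error code survives in the adapter. */
static int adapter_demo(void){
  sink s = {{0}, 0};
  adapter a = {sink_output, &s, 0};
  printf_style_f cb = adapter_printf_style;
  long n1 = cb(&a, "hello", 5);
  long n2 = cb(&a, "0123456789abcdef", 16);
  assert(5 == n1);
  assert(-1 == n2);
  return a.rc;
}
```

Note the hazard this makes visible: code written for the printf-like convention treats a positive return as success, while code written for the output-like convention treats any non-zero return as failure, so a call site left on the old semantics would misread every successful write.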

# Maybe (and Maybe Not) TODO

- **Undo** support.

- **Bring some of the configuration-related state up to date.** The infrastructure is there but the exact list of fossil-supported properties is lagging behind by several years. On the other hand, the majority of such config state is really application-specific and probably has no business being handled by this library. Some settings, e.g. `forbid-delta-manifests`, we internally honor to avoid Grief in downstream repositories. Others, such as the various globs settings, have API-level support and can be honored by the appropriate library APIs via toggles (e.g. the file-add API optionally honors the `ignore-glob` setting).

- **Symlinks**. i have always strongly disagreed with the addition of symlink support into fossil: platform-specific constructs simply have no place in the core of any SCM (with the "effectively necessary," as well as unobtrusive, exception of the executable bit). For platforms which don't support symlinks, fossil stores/manages them as plain text files with a single line holding the name of the referenced file. This is *very likely* the route the library will take, especially given the hassles symlink handling caused fossil in late 2020 (long story). Probably the only way the library will support proper symlinks is if someone who uses that feature adds and maintains it.

- **Backlinks**. Crosslinking "should" update the internal list of backlinks from certain text fields, but doing so requires parsing wiki/markdown-format text. See [`backlink.c` in the fossil tree](https://fossil-scm.org/home/file?name=src/backlink.c&ci=trunk) for the details. On the other hand, backlinks support only requires parsing wiki links, not the full grammar, so it might not be as painful as it initially sounds... though somewhat more for markdown, where we're required to do a multi-pass scan to handle its linking model. (We'd also need to handle verbatim blocks to avoid parsing links inside those blocks.)

- **Ticket support**. Ticket handling is surprisingly complicated, due largely to the customizability of the ticket database schema. If fossil-compatible ticket support gets added to libfossil, it will very likely be because someone other than myself adds it! The core artifact data structure supports tickets, so the bits required for adding it are in place.
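
As a rough idea of the "links only, not the full grammar" approach mentioned under Backlinks, the following sketch scans fossil-wiki-style text for `[target]` and `[target|label]` links while skipping `<verbatim>` blocks. It is a simplification under stated assumptions, not a port of `backlink.c`: the function and type names are hypothetical, links are assumed not to span lines, and markdown's linking model is not handled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Called once per link target found. The target is NOT NUL-terminated. */
typedef void (*link_visitor_f)(void *state, char const *target, size_t len);

/* Scans NUL-terminated wiki text for [target] and [target|label]
   links, skipping <verbatim>...</verbatim> blocks. Returns the
   number of links reported. Real fossil wiki syntax has more rules
   than this handles. */
static int scan_wiki_links(char const *z, link_visitor_f visit, void *state){
  static char const vOpen[] = "<verbatim>";
  static char const vClose[] = "</verbatim>";
  int count = 0;
  while(*z){
    if(0 == strncmp(z, vOpen, sizeof(vOpen) - 1)){
      char const *end = strstr(z, vClose);
      if(!end) break; /* unterminated: treat the rest as verbatim */
      z = end + sizeof(vClose) - 1;
    }else if('[' == *z){
      char const *start = z + 1;
      size_t span = strcspn(start, "]\n");
      if(']' != start[span]){
        ++z; /* no closing bracket on this line: not a link */
      }else{
        char const *bar = (char const *)memchr(start, '|', span);
        size_t tlen = bar ? (size_t)(bar - start) : span;
        if(tlen > 0){
          visit(state, start, tlen);
          ++count;
        }
        z = start + span + 1;
      }
    }else{
      ++z;
    }
  }
  return count;
}

/* Example visitor: collects up to 4 short targets as C strings. */
typedef struct {
  char names[4][32];
  int n;
} link_list;

static void collect_link(void *state, char const *t, size_t len){
  link_list *ll = (link_list*)state;
  if(ll->n < 4 && len < sizeof(ll->names[0])){
    memcpy(ll->names[ll->n], t, len);
    ll->names[ll->n][len] = 0;
    ++ll->n;
  }
}

/* Usage: the bracketed text inside the verbatim block is ignored. */
static int scan_demo(void){
  link_list ll;
  int n;
  ll.n = 0;
  n = scan_wiki_links("See [abc123|this checkin] and [TODOs].\n"
                      "<verbatim>[not-a-link]</verbatim>\n[tail]",
                      collect_link, &ll);
  assert(3 == ll.n);
  assert(0 == strcmp("abc123", ll.names[0]));
  return n;
}
```

Reporting targets through a callback keeps the scanner independent of how (or whether) the caller records backlinks.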


# Optimizations

- **Artifact parsing,** in particular of checkins, is *much* slower in libfossil than fossil. Some of this is easily attributable to more abstraction layers, but certainly not all of it. Some optimization of crosslinking speed is certainly in order. As a point of comparison, try `fossil rebuild` vs `f-parseparty --crosslink`. libf parses non-checkin types "plenty fast," e.g. 1600-odd control artifacts in roughly 6ms on my main computer. Checkins, however: <s>6m20s for 15504</s> 2m59s (debug) or 2m30s (non-debug) for 15504 checkins in the main fossil repo, as of this writing (just parsing, without crosslinking). On the sqlite3 repo it can only parse <s>approx. 3000 checkins in 10 minutes, at which point that test got cancelled</s> 24657 checkins in 10m9s (debug? Non-debug?). The reason for the serious speed degradation as the repo size increases is unclear.
  - This is at least partially (roughly 20-33%, based on basic tests) due to libfossil building in debug mode by default.
  - On 2021-03-24 the crosslinking was sped up by roughly 50% via the addition of a content cache identical to fossil's, but it still takes roughly 4m45s to parse and crosslink 16287 checkins in fossil's core repo with a debug libfossil build. 2m30s of that time is parsing.

- **Buffer caching.** The library internally has to use many temporary memory buffers. Some of those it reuses as much as it can (e.g. for filename normalization), but some operations (`fsl_content_get()` in particular) have to use several, potentially many, temporary buffers of arbitrary sizes. That can easily lead to hundreds of thousands of allocations totaling 1GB+ in a single session, even if peak concurrent memory stays below 15MB. We can probably improve this situation by installing a buffer cache intended primarily for use with `fsl_content_get()`, in which we store some number of buffers up to a fixed cap on total concurrent memory. This could be achieved relatively inexpensively by either hard-coding a buffer array size (e.g. 10) or modifying `fsl_buffer` to be a singly-linked list, the links being used solely for such a cache, and keeping the buffers in alloced-size order. The catch there, however, is that the decompression and de-deltification steps effectively make reuse of such buffers next to impossible because we cannot easily and efficiently do those operations in-place in existing buffers.
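
The hard-coded-array variant of such a buffer cache might look roughly like the following. This is a sketch under assumptions, not a proposed patch: `buf` is a hypothetical stand-in for the library's buffer type, the `cache_*` names are invented for illustration, and the sketch does nothing about the in-place decompression/de-deltification catch described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a library buffer; not libfossil's type. */
typedef struct {
  unsigned char *mem;
  size_t capacity;
} buf;

enum { CACHE_SLOTS = 10 };

typedef struct {
  buf slots[CACHE_SLOTS]; /* idle buffers, sorted by ascending capacity */
  int count;              /* number of slots in use */
  size_t total;           /* sum of the cached capacities */
  size_t maxTotal;        /* cap on total cached memory */
} buf_cache;

static void cache_init(buf_cache *c, size_t maxTotal){
  memset(c, 0, sizeof(*c));
  c->maxTotal = maxTotal;
}

/* Fetches a buffer with capacity >= n, reusing the smallest cached
   one which fits, else allocating. Returns 0 on success, -1 on OOM. */
static int cache_take(buf_cache *c, size_t n, buf *out){
  int i;
  for(i = 0; i < c->count; ++i){
    if(c->slots[i].capacity >= n){
      *out = c->slots[i];
      c->total -= out->capacity;
      memmove(&c->slots[i], &c->slots[i+1],
              sizeof(buf) * (size_t)(c->count - i - 1));
      --c->count;
      return 0;
    }
  }
  out->capacity = n ? n : 1;
  out->mem = (unsigned char*)malloc(out->capacity);
  if(!out->mem){ out->capacity = 0; return -1; }
  return 0;
}

/* Hands a buffer back. It is freed instead of cached if the cache
   is full or would exceed maxTotal. Either way, *b is emptied. */
static void cache_give(buf_cache *c, buf *b){
  if(b->mem){
    if(CACHE_SLOTS == c->count || c->total + b->capacity > c->maxTotal){
      free(b->mem);
    }else{
      int i = c->count;
      while(i > 0 && c->slots[i-1].capacity > b->capacity){
        c->slots[i] = c->slots[i-1]; /* shift larger buffers up */
        --i;
      }
      c->slots[i] = *b;
      ++c->count;
      c->total += b->capacity;
    }
  }
  b->mem = NULL;
  b->capacity = 0;
}

/* Frees all cached buffers. */
static void cache_clear(buf_cache *c){
  while(c->count > 0) free(c->slots[--c->count].mem);
  c->total = 0;
}

/* Usage: a second request is served from the cache, not malloc(). */
static void cache_demo(void){
  buf_cache c;
  buf b;
  unsigned char *first;
  cache_init(&c, 1024);
  if(cache_take(&c, 300, &b)) return;
  first = b.mem;
  cache_give(&c, &b);
  if(cache_take(&c, 200, &b)) return;
  assert(b.mem == first && 300 == b.capacity);
  cache_give(&c, &b);
  cache_clear(&c);
}
```

Keeping the slots sorted by capacity means the first fit found is also the smallest fit, which avoids handing a very large buffer to a small request.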

# Remote Synchronization

- This will(?) be implemented in terms of abstract streaming APIs, very possibly the ones the library already uses for the majority of its file I/O and for abstracting output streaming (e.g. it uses an abstract output stream for diff output, rather than writing directly to a buffer).

- This will almost certainly be one of the last major features.

# Wiki Parsing and Rendering

- This set of features hovers *right on the edge* of out-of-scope for the core libfossil. Rendering is necessarily output-format-specific and the library has no business defining such outputs. Also...

- The fossil implementations of these are not written with port-friendliness in mind, so a complete reimplementation would possibly be necessary.

- In order to support near-arbitrary applications the wiki parsers need to be implemented in such a way that clients can customize (via hooks/callbacks) how links/references are generated.

- The venerable Fossil Wiki format has the lowest priority. In practice markdown has easily taken the lead, and only old (pre-markdown) docs tend to be maintained in that format.


# Non-TODOs

- **Any and all UI-related elements**, including HTML, CSS, and JavaScript. The library will enable such applications but will not provide, e.g., an HTML framework beyond *perhaps* (maybe) wiki rendering.

- **Scripting of tickets**. There are no current plans to 100% mimic fossil's th1 scripting of tickets. Fossil's use of TH1 was one of convenience, not long-term practicality. The library itself will have no "official" scripting language, but is designed specifically to make tying it to scripting engines relatively straightforward. We have a standalone copy of TH1 which "could" be integrated with little work, but doing so would not be an ideal path to go down.
Z 622db567897e311a299880f3df44d32c