Private branches: upstream references to missing artifacts

(1) By Florian Balmer (florian.balmer) on 2019-08-29 15:46:44 [link] [source]

Predecessor manifests of private branches are not recorded in the manifest P card of merge children, so sync'ing of branches that have merged-in private branches does not introduce dependencies on missing parents in the upstream repository. Control artifacts referring to private branches are also marked private, to exclude them from sync'ing.

However, merging a private branch with the --integrate option adds a +closed tag referring to the merged-in (and now closed) leaf of the private branch to the check-in manifest of the merge child, generating a reference to a missing artifact in the upstream repository.
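To make the problem concrete, here is a minimal sketch (not Fossil code) that collects the hashes named by the P and T cards of a manifest and reports those that the receiving repository does not have. The function names, the abbreviated manifest, and the placeholder hashes are made up for illustration; only the card letters and the +closed tag come from the discussion above.

```python
import re

# Full-length artifact hashes: 40 hex digits (SHA1) or 64 hex digits (SHA3-256).
FULL_HASH = re.compile(r"(?:[0-9a-f]{40}|[0-9a-f]{64})")


def referenced_hashes(manifest):
    """Return the hashes named by P cards and by T cards that target another artifact."""
    refs = set()
    for line in manifest.splitlines():
        fields = line.split(" ")
        card, args = fields[0], fields[1:]
        if card == "P":
            # P card: the primary parent followed by any merge parents.
            refs.update(a for a in args if FULL_HASH.fullmatch(a))
        elif card == "T" and len(args) >= 2 and FULL_HASH.fullmatch(args[1]):
            # T card: "T <tag> <target>"; a "*" target (the manifest itself) is skipped.
            refs.add(args[1])
    return refs


def missing_references(manifest, available):
    """Return the referenced hashes that are not in the 'available' collection."""
    return referenced_hashes(manifest) - set(available)


# Hypothetical, abbreviated merge-child manifest (placeholder hashes, no F/R/Z cards).
public_parent = "a" * 40
private_leaf = "b" * 40
manifest = (
    "C integrate-merge\n"
    "D 2019-08-29T15:46:44.000\n"
    "P " + public_parent + "\n"
    "T +closed " + private_leaf + "\n"
    "U username\n"
)

# The upstream repository only ever received the public parent.
upstream = {public_parent}
missing = missing_references(manifest, upstream)
print(sorted(missing))
```

The private leaf is absent from the P card, yet it still shows up as a missing reference because of the +closed tag.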

Is this something worth addressing? Possible solutions:

Have fossil merge --integrate abort if a private branch is merged-in.
Have fossil merge --integrate silently omit the +closed tag from the merge child manifest, and maybe report to the user that the private branch can still be closed explicitly by fossil amend --close?
Mention this behavior in the Private Branches document?

(2) By anonymous on 2019-08-29 16:40:26 in reply to 1 [source]

Since other control artifacts that reference a private branch are marked private, it seems to me that there is a bug when adding the control artifact for the +closed tag.

However, it occurs to me that these control artifacts you mention might be what's marking the branches as private. If that's the case, maybe other control artifacts referring to the same branches are not marked private.

Also, given that private branches can, at some future time, be published, I wonder if maybe using a "weak reference" would be a better idea. If the referenced artifact is present, great. If not, don't ask for it. That way, if/when a private branch is published, the merge relation can be shown.

Also, fossil push --private will push private branches, so that's another situation where weak references would be beneficial.

(3) By Florian Balmer (florian.balmer) on 2019-08-30 15:09:23 in reply to 2 [link] [source]

... seems to me that there is a bug when adding the control artifact for the +closed tag.

The +closed tag is not in a separate control artifact, but in the check-in manifest of the merge child. This shortcut feature of fossil merge --integrate obviates the need for an extra step with another control artifact to close branches right after merging.

However, it occurs to me that these control artifacts you mention might be what's marking the branches as private.

The "private" status of artifacts seems to be tracked in the PRIVATE table, and not determined by tags or properties -- unless by the reconstruct command!

The reconstruct command regenerates the PRIVATE table by adding check-ins with the "private" tag (TAG_PRIVATE==6) and their exclusive dependencies.

However, the TAGXREF table of a repository with private branches seems to have no "private" tag entries (there are just check-ins with branch name "private", unless specified otherwise by --branch). And because "private" tag entries are not exported by the deconstruct command, the reconstruct command cannot know about them anyway.

Maybe the marking of private artifacts was changed from tagging to tracking RIDs in the PRIVATE table, but the reconstruct command was not updated? The rebuild command uses the same function to reconstruct the PRIVATE table, yet this seems to be a no-op, as the PRIVATE table is already populated by the time reconstruct_private_table() is called, and no new entries are added.

Should fossil deconstruct --private be updated to export the private RIDs→UUIDs to, say, a .private file, so the reconstruct command can use the UUIDs→RIDs to repopulate the PRIVATE table?
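A rough sketch of that round trip, using an in-memory SQLite database with heavily simplified stand-ins for the BLOB and PRIVATE tables (only the rid and uuid columns). This is not how deconstruct or reconstruct are implemented. The helper names are hypothetical, and a Python list stands in for the proposed .private file.

```python
import sqlite3


def make_repo(hashes):
    """Create a toy repository; RIDs are assigned in insertion order."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE NOT NULL)")
    db.execute("CREATE TABLE private(rid INTEGER PRIMARY KEY)")
    db.executemany("INSERT INTO blob(uuid) VALUES (?)", [(h,) for h in hashes])
    return db


def export_private_hashes(db):
    """RID -> UUID: list the hashes of all artifacts recorded as private."""
    rows = db.execute(
        "SELECT blob.uuid FROM private JOIN blob ON blob.rid = private.rid"
        " ORDER BY blob.uuid"
    )
    return [row[0] for row in rows]


def import_private_hashes(db, hashes):
    """UUID -> RID: mark the listed hashes private again, using the new RIDs."""
    db.executemany(
        "INSERT OR IGNORE INTO private(rid) SELECT rid FROM blob WHERE uuid = ?",
        [(h,) for h in hashes],
    )


# Placeholder hashes; "b" * 40 is the private artifact.
h_a, h_b, h_c = "a" * 40, "b" * 40, "c" * 40

old_repo = make_repo([h_a, h_b, h_c])
old_repo.execute("INSERT INTO private(rid) SELECT rid FROM blob WHERE uuid = ?", (h_b,))
exported = export_private_hashes(old_repo)

# A reconstructed repository may assign different RIDs to the same artifacts.
new_repo = make_repo([h_c, h_b, h_a])
import_private_hashes(new_repo, exported)
restored = export_private_hashes(new_repo)
print(restored == exported)
```

The point of going through hashes is that RIDs are local to one repository file, so only the hashes survive a deconstruct / reconstruct cycle.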

I wonder if maybe using a "weak reference" would be a better idea.

Any UUID appearing in a check-in manifest or control artifact is added to the BLOB table, and remains orphaned as long as no further content is added.

For example, store the following forged control artifact to close a non-existing leaf check-in in the file sample-artifact.txt (with LF line endings):

D 2019-08-30T00:00:00.000
T +closed f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0
U username
Z dbfdbf946279e17d9df7e17a124f2517

Now add it to an open repository, rebuild it, and check for missing artifacts:

fossil test-content-put sample-artifact.txt
fossil rebuild
fossil test-missing

The output is:

MISSING: f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0
...
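For anyone who wants to forge a similar artifact with different cards: the Z card is the MD5 checksum of everything that precedes it in the artifact, including the final newline before the Z. A minimal sketch, with hypothetical helper names, that appends and verifies such a card:

```python
import hashlib


def with_z_card(cards):
    """Join the cards with LF line endings and append the matching Z card."""
    body = "".join(card + "\n" for card in cards)
    checksum = hashlib.md5(body.encode("utf-8")).hexdigest()
    return body + "Z " + checksum + "\n"


def z_card_is_valid(artifact):
    """Check that the trailing Z card matches the MD5 of the preceding text."""
    idx = artifact.rfind("\nZ ")
    if idx < 0 or not artifact.endswith("\n"):
        return False
    body = artifact[: idx + 1]        # everything up to and including the LF before "Z"
    claimed = artifact[idx + 3 : -1]  # the hex digits after "Z ", without the final LF
    return hashlib.md5(body.encode("utf-8")).hexdigest() == claimed


# The same cards as in the sample artifact above.
sample = with_z_card([
    "D 2019-08-30T00:00:00.000",
    "T +closed " + "f0" * 32,
    "U username",
])
print(sample, end="")
print(z_card_is_valid(sample))
```

Changing any card, or switching to CRLF line endings, changes the checksum, which is why the LF line endings matter.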

Some complex meta-artifact layer would have to be introduced to indicate that f0f0f0f0f0... is a private artifact, for example, that should be treated as a "weak reference".

(4) By Florian Balmer (florian.balmer) on 2019-08-30 16:28:18 in reply to 1 [link] [source]

Some history and comments about the changes from tagging (T +private, TAG_PRIVATE==6 in TAGXREF) to the RID→PRIVATE table as the method to mark private content:

2012-10-13 Check-in: Omit the "private" tag from private check-ins. This opens up the possibility of publishing check-ins that were originally private. Fix the "deconstruct" command so that it omits private artifacts unless the --private option is used.
- Should deconstruct --private be enhanced to make the RID→PRIVATE table available to reconstruct?
2012-02-21 Check-in: Enhance the "fossil rebuild" command so that it looks at "private" --raw tags and rebuilds the PRIVATE table to contain (at least) the content that is tagged "private".
2011-03-26 Check-in: Add the --private option to the "fossil branch new" command. ...
- Note: Adds the T +private tag to check-in manifests. This code is still in use today, and should probably be removed?
2011-03-21 Ticket: privateness not preserved
- Note: Using (T +private, TAG_PRIVATE==6 in TAGXREF) to mark private content.
2010-11-18 Ticket: Privacy attribution loss due to de/reconstruct ...
- From the ticket comments: deconstruct / reconstruct only preserve persistent data, e.g. the manifests and linked entries. The "private" flag is (intentionally) not persistent as it would prevent publishing the changes later.
- Note: As of today, with the RID→PRIVATE table, it's possible to publish private content, so this argument is superseded?

(5) By anonymous on 2019-08-30 22:33:21 in reply to 3 [link] [source]

Some complex meta-artifact layer would have to be introduced to indicate that f0f0f0f0f0... is a private artifact, for example, that should be treated as a "weak reference".

I was thinking that the scheme for supporting both SHA-3 and SHA-1 hashes as artifact IDs could be enhanced to also support "weak IDs" or "private IDs".

As I understand, it adds a prefix to SHA-3 hashes. Outside the hashing and hash validation, this prefix is treated as part of the artifact ID.

If there were a prefix to indicate "weak" or "private", any mechanism that would otherwise either complain that the artifact is missing or ask the sender to send it could, instead, ignore it.

Actually, I'd prefer that this be done by doubling the number of different prefixes instead of having a "pre prefix": The existing prefix for SHA-3 and a second one to indicate both SHA-3 and that the ID is "weak" or "private". (Might also need to support "weak" SHA-1 IDs because there are long-existing repos that still have SHA-1 artifact IDs alongside SHA-3 IDs.)

The reason is that the prefix for indicating SHA-3 already required my whole department (at work) to make large changes.[1]

Anyway, my thought is to somehow preserve the information while allowing private branches to work without causing extraneous warnings.

[1] We make control modules for machinery. We also need to comply with standards used by the portable devices used to diagnose problems on that machinery. Before Fossil supported SHA-3, we embedded the first 6 characters of the commit ID in our software. After, we had to increase that to 10 characters. Because of the constraints imposed by the diagnostic messaging standards, changing the schema of a message is a big deal. A further length increase would be another big deal for us.

(6) By Florian Balmer (florian.balmer) on 2019-08-31 07:44:24 in reply to 5 [link] [source]

As I understand, it adds a prefix to SHA-3 hashes.

A "hash prefix" in Fossil refers to an abbreviated form of a "full-length hash", a convenience for easier visual comparison and easier typing, which is possible because collisions are rare.

The hashing algorithm for a full-length hash can be derived from the length (SHA1: 40 characters, SHA3-256: 64 characters).

To map a hash prefix to a full-length hash, Fossil checks whether the hash prefix matches the beginning of any of the full-length hashes it has recorded, be it SHA1, or SHA3-256. There's no other way to determine whether a hash prefix maps to an SHA1 or to an SHA3-256 full-length hash (at least, if the length of the hash prefix does not exceed 40 characters).
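The lookup described above can be sketched in a few lines. This only illustrates the idea; it is not Fossil's implementation, and the function names and placeholder hashes are made up:

```python
def hash_algorithm(full_hash):
    """Derive the algorithm from the length of a full-length hash."""
    return {40: "SHA1", 64: "SHA3-256"}.get(len(full_hash))


def resolve_prefix(prefix, known_hashes):
    """Return all recorded full-length hashes that start with the given prefix."""
    prefix = prefix.lower()
    return sorted(h for h in known_hashes if h.startswith(prefix))


# Placeholder hashes: one SHA1-length and two SHA3-256-length entries.
known = {
    "abc123" + "0" * 34,  # 40 characters
    "abc124" + "1" * 58,  # 64 characters
    "ffee00" + "2" * 58,  # 64 characters
}

unique = resolve_prefix("abc123", known)    # exactly one match
ambiguous = resolve_prefix("abc12", known)  # two matches, of different algorithms
unknown = resolve_prefix("123456", known)   # no match

print([hash_algorithm(h) for h in ambiguous])
```

Note that the prefix "abc12" alone says nothing about the algorithm: it matches both a 40-character and a 64-character hash here.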

(7) By Florian Balmer (florian.balmer) on 2019-08-31 10:32:40 in reply to 5 [link] [source]

... Before Fossil supported SHA-3, we embedded the first 6 characters of the commit ID in our software. After, we had to increase that to 10 characters. ...

The hash-digits setting defines the length of hash prefixes displayed to the user, and the minimum value of 6 is independent of the hashing algorithm. The limit is somewhat longer for hash prefixes used in URLs generated by Fossil.

However, lower values for hash-digits increase the risk of hash prefix collisions.
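To get a feeling for the numbers, the usual birthday approximation can be used. This is a back-of-the-envelope estimate that assumes uniformly distributed hashes; it is not something Fossil computes, and the function name and the artifact count are made up:

```python
import math


def prefix_collision_probability(n_artifacts, digits):
    """Approximate chance that at least two of n artifacts share the same
    hex prefix of the given length: 1 - exp(-n * (n - 1) / (2 * 16**digits))."""
    space = 16 ** digits
    return -math.expm1(-n_artifacts * (n_artifacts - 1) / (2 * space))


# Hypothetical repository with 10,000 artifacts.
p6 = prefix_collision_probability(10_000, 6)
p10 = prefix_collision_probability(10_000, 10)
print(f"6 digits: {p6:.3f}, 10 digits: {p10:.6f}")
```

With these assumptions, a collision somewhere among 6-digit prefixes is very likely in a repository of that size, while 10-digit prefixes make it unlikely. A collision only matters in practice if it affects a prefix someone actually uses.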

(8) By Florian Balmer (florian.balmer) on 2019-09-04 11:40:35 in reply to 1 [link] [source]

I've just committed some work to address the issues mentioned in this thread. Additionally, fossil commit now ignores the --private option if already on a private branch, so the default branch name and color are not re-applied in the middle of private branches. This simplifies management of private branches with distinct names.

https://www.fossil-scm.org/fossil/timeline?r=private-branches

I've done quite some testing over the last few days, so for my part this is ready to be merged, but please let me know if there are any flaws.