Serious bug: technote edits result in invalid control artifacts

(1) By Stephan Beal (stephan) on 2021-02-10 21:20:59 [source]

Management summary: when editing technotes, versions after the initial post are not stored properly. They are effectively corrupt. This has been the case since at least Dec. 31, 2015, possibly earlier.

Details...

While running tests on libfossil's control artifact parser i stumbled across a malformed artifact. Naturally, i blamed libfossil for it and shunned the artifact. Then i found another and, upon closer examination, found that it had been added not by libfossil but by fossil's technote editor.

Those of you well-versed in fossil's file format will spot it here:

https://fossil-scm.org/home/file?ci=trunk&name=src%2Fevent.c&ln=276-283

Which results in a control artifact like:

[stephan@nuc:~/fossil/libfossil/f-apps]$ ./f-acat rid:7491
C Major\smilestone:\sSHA3\sartifact\shash\ssupport
D 2021-02-09T03:25:46.050
E 2021-02-09T01:10:45 24bf6dd1edd3eb825ba08969e04ad221cae00c7b
P 5307cee0a1c345c24fc0c59da3e4e7cf6b7320fc
N text/x-markdown
T +bgcolor * #ff8000
T +sym-milestone *
U stephan
W 986
...

Looks very closely.

Spoiler alert:

[stephan@nuc:~/fossil/libfossil/f-apps]$ ./f-parseparty 
offending artifact: 7491
./f-parseparty: ERROR #128 (FSL_RC_CA_SYNTAX): Cards are not in strict lexical order

A fix will be committed momentarily, but the result of this bug is that technote edits after the initial post cannot be parsed as control artifacts and therefore will not be processed as such. The initial technote post does not have this problem because it cannot have a P-card.

(2) By Richard Hipp (drh) on 2021-02-10 21:26:47 in reply to 1 [link] [source]

I seem to recall that there is a test-parse of the control artifact prior to committing the transaction, and that if the parse fails (because cards in the wrong order or for any other reason) then the transaction is rolled back instead. Is that not happening for TechNotes?

Perhaps this is an indication of how infrequently TechNotes are actually used?

(3) By Stephan Beal (stephan) on 2021-02-10 21:55:53 in reply to 2 [link] [source]

... and that if the parse fails (because cards in the wrong order or for any other reason) then the transaction is rolled back instead. Is that not happening for TechNotes?

Apparently not but i cannot yet explain why. Both the technote editor page and wiki editor CLI commands route through event_commit_common(), but i just happen to know that the two technotes i saw the problem in were edited via the UI, not the CLI.

event_commit_common() rolls back if manifest_crosslink() fails, and i'm currently at a loss as to how the crosslink, which calls manifest_parse(), could succeed. One of my offending artifacts is only a day old, so we cannot attribute it to a bug which has since long been fixed.

i'm still trying to walk through all of the steps to find the loophole, but nothing is sticking out to me at the moment.

(4) By Stephan Beal (stephan) on 2021-02-10 21:58:06 in reply to 3 [link] [source]

event_commit_common() rolls back if manifest_crosslink() fails, and i'm currently at a loss as to how the crosslink, which calls manifest_parse(), could succeed.

Here we go:

$ cat 2c24c62c708e2ce863a9f0317ef8304b8ec89a10 
C Minor\smilestone:\sRaspberry\sPi
D 2021-02-09T03:24:22.820
E 2014-05-29T16:48:38 20d84794a1eeb32d2448ddee261b7dab8341f365
P a661ca1b9f91c6b8db09eb37ddd457aa87a84376
N text/x-fossil-wiki
T +bgcolor * #ff8000
U stephan
W 567
...

$ f test-parse-manifest 2c24c62c708e2ce863a9f0317ef8304b8ec89a10 
manifest_is_well_formed() reports the input is ok
manifest_parse() worked

Doh. That should definitely not have passed.

(5.1) By Stephan Beal (stephan) on 2021-02-10 22:18:28 edited from 5.0 in reply to 3 [link] [source]

... but nothing is sticking out to me at the moment

Summary: the var which is there to check "is the current card lexically equal to or greater than the previous card" was never being re-assigned in the loop, so that check always passed. Here's a proposed fix:

https://fossil-scm.org/home/timeline?r=manifest-sort-check

But it needs more testing and could use a better error message than the one it ends up generating:

$ f test-parse-manifest 2c24c62c708e2ce863a9f0317ef8304b8ec89a10 
manifest_is_well_formed() reports the input contains errors
ERROR: line 4: extra characters at end of card

That is, per the code path, the expected message, but it's not really indicative of what went wrong.

(6) By David Vines (djvines) on 2021-02-10 22:21:52 in reply to 3 [link] [source]

At least in theory the CLI for technotes is covered by the tests in wiki.test - at least I tried to cover all the cases that seemed relevant when I added the code to support attachments in tech notes.

That said my own use case simply creates a tech note and adds attachments via the CLI and doesn't edit them.

(7) By Stephan Beal (stephan) on 2021-02-10 22:29:16 in reply to 6 [link] [source]

and adds attachments via the CLI and doesn't edit them.

This only triggers on edits. If i understand correctly, fossil is (before my fix, above) actually parsing those, but it's doing so very much against its specification, which ishow the problem survived in the wild for so long. Once this fix is rolled out, affected manifests will no longer parse.

It was a mass test of the libfossil manifest parser (on all artifacts in that tree) which uncovered it rather by accident.

(8) By Stephan Beal (stephan) on 2021-02-11 00:08:53 in reply to 1 [link] [source]

Management summary: when editing technotes, versions after the initial post are not stored properly. They are effectively corrupt. This has been the case since at least Dec. 31, 2015, possibly earlier.

Follow-up: two separate, but closely related, problems have been found and corrected. Insofar as we know, only edited technotes were ever affected, and they were allowed to be affected by a lower-level bug. The main fix, for those who are interested, is in:

https://fossil-scm.org/home/info/023fddeec4029306

To avoid causing that any edited technotes in client repos silently disappear after this fix is applied, a special exception to the artifact syntax's strict rules is now accepted for that one tiny case. For all others, the ordering of the artifact's lines is now strictly enforced.

(9) By sean (jungleboogie) on 2021-02-11 01:45:45 in reply to 8 [link] [source]

Thank you for the report and investigation.

Reviving libfossil has already paid off!

(10) By anticrisis on 2021-02-11 07:40:16 in reply to 8 [link] [source]

What a great find. I just read the updated file format document here:

https://www.fossil-scm.org/home/doc/trunk/www/fileformat.wiki#outofordercards

I suggest you call this the "P vs NP exception." I don't know why, but it has a nice ring to it.

(11) By Andreas Kupries (aku) on 2021-02-11 08:10:20 in reply to 8 [link] [source]

I had wondered about the migration path for this one before I saw the above message.

I am still wondering if it would make sense to

detect the bad artifacts,
replace them with a fixed artifact, and then
shun the bad original

While this would not truly work for commits, where we have an entire DAG of artifacts referencing each other, for wiki pages and tech notes the higher level structure is IIRC not as entangled, and thus might be workable.

... Or not, if there can be dependent control artifacts which can reference the bad one.

(12.1) By Stephan Beal (stephan) on 2021-02-11 11:59:36 edited from 12.0 in reply to 11 [link] [source]

While this would not truly work for commits, where we have an entire DAG of artifacts referencing each other, for wiki pages and tech notes the higher level structure is IIRC not as entangled, and thus might be workable

i wondered the same, but the P card of the next edit would also have to be replaced, all the way up the edit chain (noting that there are no forks, so it would be linear, and very likely a short chain).

It "could work," i think, but whether it's worth the effort is another question entirely. Maybe as an interesting exercise.

The current workaround, though unfortunate, was trivial and only affects historical data - new ones are enforced. The inadvertent lack of sort-checking which allowed the edits to get through could have bitten us far more spectacularly than it did. We got off easy!

(13) By Stephan Beal (stephan) on 2021-02-11 11:23:50 in reply to 11 [link] [source]

shun the bad original

On second thought, i don't think shunning would be strictly necessary: all technotes of a given lineage have the same ID, and when a technote is requested for rendering, the most recent one with that ID is selected.

If, during the hypothetical editing process, we simply bumped the D-card (timestamp) by 1ms for the updated copy, it could live side by side in the db with the older one.

Maybe. We would still have to change the P-card, but we'd be effectively creating a new chain which, thanks to technotes being selected by last-one-wins, would effectively "hide" the old one. The old one would still live on in the db, but (A) would not show up in the UI unless referenced by hash and (B) if the special-case exception were removed then they would no longer parse as artifacts, so would effectively become unused blobs in the db. Dead weight, but not a significant amount of it.

Maybe.

(14) By Andreas Kupries (aku) on 2021-02-11 11:56:54 in reply to 12.0 [link] [source]

Oh yes, fully agreed on having gotten of easy.