Call for testers/feedbackers: new /pikchrshow

(1) By Stephan Beal (stephan) on 2022-06-06 18:24:19 [source]

Good evening, folks,

The past few weeks we've been tinkering with WebAssembly (WASM), first to create a "fiddle"-style app for sqlite3 and now to reimplement fossil's /pikchrshow app. An initial implementation is currently running at:

https://fossil.wanderinghorse.net/home/pikchrshow

and it's in the pikchrshow-wasm branch.

Any testing and/or usability feedback would be welcomed (just reply in this thread). The most significant difference from the current pikchrshow is that this one runs pikchr in the client's browser. That is, it does not have to send the user's input back to the server in order to render it. Removal of that round-trip allows it to support a "render as you type" option so that it can draw pikchrs as quickly as you can type them in.

Notes and caveats:

It shamelessly requires a relatively recent browser with WASM support.
This is an initial version. There are a few layout quirks to resolve (like the input area resizing in annoying ways as you type when auto-rendering is on) and a couple features to port over from the other app (namely stash/unstash, but also maybe the various preview modes).
This does not require any new tools to build the WASM file: we include a pre-compiled WASM build of pikchr.c in the source tree.

We don't have any immediate plans to outright remove the "legacy" mode, but this mode is far superior so will almost certainly become the default view for that app.

(2) By george on 2022-06-06 20:04:46 in reply to 1 [link] [source]

Stephan, thank you very much!
Is it possible to use underline WASM module from within other pages? For example to draw diagrams within custom ticket views (e.g. GANTT charts and the like)?

(4) By Stephan Beal (stephan) on 2022-06-06 20:19:30 in reply to 2 [link] [source]

Is it possible to use underline WASM module from within other pages?

Hypothetically yes, but i'm not sure it would be worth the effort. For tickets you can use /pikchrshow to prototype the pikchr and paste it in when you're ready.

(6) By george on 2022-06-06 20:30:06 in reply to 4 [link] [source]

Well... What I meant, was that some tricky TH1 code within /tktview page generates a big pikchr description from the database. But than does not bother the server to transform it into SVG. Instead it delegates this (presumably CPU-intense) task to a client's browser.

(7) By Stephan Beal (stephan) on 2022-06-06 20:33:42 in reply to 2 [link] [source]

Is it possible to use underline WASM module from within other pages? For example to draw diagrams within custom ticket views...

Some arguments against doing so:

i was recently told by a browser developer that browsers don't cache wasm files, or they prefer not to.
Wasm files aren't small: extsrc/pikchr.wasm, compiled for the smallest possible size, is right at 100kb. Then there's another 20-ish kb of JS glue code. The JS code can be cached, but the wasm file probably won't be and would be re-loaded for every ticket.

Wasm parts are far more suitable for single-page/long-running applications, as the initial high loading time isn't such an issue for those.

(3) By smd (scottdoctor) on 2022-06-06 20:11:27 in reply to 1 [link] [source]

When I click the "toggle render" button, the button becomes unreadable thereafter with dark text on a dark background. browser is Firefox.

(5) By Stephan Beal (stephan) on 2022-06-06 20:26:12 in reply to 3 [link] [source]

When I click the "toggle render" button, the button becomes unreadable thereafter with dark text on a dark background. browser is Firefox.

Insofar as i can tell that's an unfortunate combination of focus and skin-specific behavior. Why the darkmode skin does it but the other three i tried do not is currently a mystery :/. Clicking anywhere else on the page will take the focus back and restore the button state, so it's not a serious usability problem, just an annoyance. /wikiedit's buttons (mis)behave the same way.

Some of the other layout quirks were just resolved, though, so please reload if you still have it opened.

(8) By Stephan Beal (stephan) on 2022-06-06 22:23:58 in reply to 3 [link] [source]

When I click the "toggle render" button, the button becomes unreadable thereafter with dark text on a dark background.

That quirk of the darkmode skin has been replaced with a less obnoxious one: buttons with focus now have a raised border instead of being colored illegibly. The border will still stay as long as they have focus, but it's not as glaringly unsightly as the prior configuration. That change is currently on the pikchrshow-wasm branch but will eventually make its way to trunk.

(9) By Florian Balmer (florian.balmer) on 2022-06-07 05:14:59 in reply to 1 [link] [source]

Regarding the related / support branch:

[15e7b49ef3]: Remove the '; charset=utf-8' suffix from response Content-Type headers. ...

I agree that adding '; charset=utf-8' to every HTTP response seems incorrect, and I even have a local patch to remove it¹ for compressed archives (generated through the Downloads links) and images (the only content types relevant to my use of Fossil, so far).

However, it seems that adding '; charset=utf-8' has advantages regarding how fast and efficient web browsers are able to process web pages, however, and therefore I'd like to vote for the encoding tag to be left in place², at least for HTML, JS, CSS, and plain text.

^{^} My approach to this is quite simple, with this check in the ( iReplyStatus!=304 ) block:

if( strncmp(zContentType,"image/",6)==0 ||
    fossil_strcmp(zContentType,"application/x-compressed")==0 ||
  ...
  /* Omit "charset=utf-8". */
}else{
  /* Add "charset=utf-8". */
  ...

^{^} Maybe a simple check for "text/"-prefixed content types similar to that above for images is sufficient for the moment, so no deeper tweaking of the mimetype list handling is necessary? Or the other way around, so that the tag is omitted just for "wasm/" prefixes?

(10) By Stephan Beal (stephan) on 2022-06-07 06:50:18 in reply to 9 [link] [source]

I agree that adding '; charset=utf-8' to every HTTP response seems incorrect, and I even have a local patch to remove it... Maybe a simple check for "text/"-prefixed content types similar to that above for images is sufficient for the moment, so no deeper tweaking of the mimetype list handling is necessary? Or the other way around, so that the tag is omitted just for "wasm/" prefixes?

That approach would be the fallback solution if it turns out that we need the charset tag. The current trunk adds the charset to all responses, which isn't technically correct but it's also worked just fine until we tried to serve wasm content: wasm loaders are very strict about the content-type header and refuse to load if there's a charset (wasm files are binary).

Another option would be to change the mimetype mapping table so that it appends ";charset=..." to the mimetype of all types which actually need it (most don't!). The one disadvantage to that is that is that certain places in the code do a string comparison on the mimetype and those would need to be trained to stop comparing at the semicolon.

Regarding applying the charset to "text/..." types: that will work for the majority of cases but there are exceptions to that pattern: image/svg+xml, application/json, and application/javascript come to mind.

However, it seems that adding '; charset=utf-8' has advantages regarding how fast and efficient web browsers are able to process web pages,

That is very useful information, thank you. i've been running that branch on my system since yesterday, with no charset tags, and it seems to work okay, but i will definitely dig through that article after this morning's dog walk.

In althttpd (from which much of fossil's HTTP code derives) we recently, because of this very same use case, extended the mimetype definitions to include a flag which specifies that a type should not get a charset flag. After back-porting that into fossil yesterday i discovered that it can't work without significant surgery in fossil because the mimetype determination and propagation is much different there. In althttpd it's centralized into one location, but in fossil it's scattered around approximately 50 places and the current interface isn't suitable for replacing "under the hood" without touching all of those places.

(11) By Stephan Beal (stephan) on 2022-06-07 07:55:00 in reply to 10 [link] [source]

That is very useful information, thank you. i've been running that branch on my system since yesterday, with no charset tags, and it seems to work okay, but i will definitely dig through that article after this morning's dog walk.

After reading that...

According to this post by the same author:

JS and CSS do not require explicitly encoding because they inherit the encoding from the file they're loaded from. The first line of our pages (the DOCTYPE header) declares the file as HTML5 and therefore UTF-8.
JSON is always UTF-8, no exceptions.
XML defaults to UTF-8. Note that pikchr SVG output is embedded directly in HTML, so inherits the document's encoding.
XHR requests assume UTF-8 unless explicitly told otherwise.

Any binary content we serve, primarily zip, tar, and non-SVG images, don't benefit from a charset declaration and adding one is technically incorrect (but having one has never caused us any grief for any mimetype except application/wasm). Fossil's support for client-defined mimetypes did not foresee this issue and does not support specifying a mimetype-specific charset.

Summary: the article's request that everybody label their character sets is an "in a perfect world we'd all do this" type of scenario. He also admits that many content creators don't declare explicit encodings and that world is somehow still functioning (possibly because the browsers do such a good job of detecting encodings). My recommendation at this point is to leave off the charset attribute altogether. In the very unlikely case that it breaks something, we can fall back to an approach of adding it only to certain mimetypes (or the opposite, eliding it only from certain types).

Yes, that admittedly bugs me at a pedantically-technically-correct level, but adding an incomplete solution, inferior to the browser-side ones, would bug me even more. Completely eliding the charset seems to be the lesser of the two evils by a significant margin.

(13) By Florian Balmer (florian.balmer) on 2022-06-08 04:44:44 in reply to 11 [link] [source]

Thanks for looking into this, and for your follow-up comments!

I don't have any "hard" arguments to keep the "charset" tag, just some weak considerations:

The linked articles apply to Firefox, and (for the most part) to Chrome, but for other browsers and/or command-line tools, the "charset" tag may have more impact.
Fossil has plain-text web pages, for example vpatch, and the HTTP header is the only place to put and encoding tag, for such cases.
If everybody ignored¹ the "charset" tag (when inappropriate) for years, and now suddenly WASM doesn't, it looks to me like WASM is the oddball to be treated as a special case, and not everybody else.

In any case, I'm curious if there will be new aspects or problems, in the next time.

^{^} That said, I've come to this when enabling HTTP reply compression for "image/vnd.microsoft.icon" in the process of making a local patch to re-assemble the original Fossil program icon from the individual parts scattered across the executable resource section and use it as the favicon. I'm not quite sure if I removed the "charset" tag for icons (and other binary formats relevant to my use of Fossil) due to a problem, or just for "correctness", but I can't find any notes in the comments and history of my local patches, right now.

(12) By Stephan Beal (stephan) on 2022-06-07 23:23:28 in reply to 1 [link] [source]

https://fossil.wanderinghorse.net/home/pikchrshow

Much has happened in that code since the initial announcement. It's now looking much snazzier and is a definite improvement over the legacy version. If no glaring issues are found or objections voiced, that will be merged into trunk in the next day or two.

The "legacy mode" pikchrshow will be retained for the foreseeable future (reachable via a link in the new one) but will eventually be killed off if the new one works out okay. Client-side features implemented using WASM is new territory for us and is still experimental (but promising) at this point.

@George: your "dreams" pikchr is now included in the selection list of examples.

(14) By Florian Balmer (florian.balmer) on 2022-06-08 05:04:32 in reply to 12 [link] [source]

Just to make sure I get it right: This means embedding a new built-in file worth 95 KB in the Fossil executable, containing the "same" pikchr code (probably minus the non-library main() function) already compiled into Fossil, but potentially (dangerously?) outdated (who's going to maintain/update "extsrc/pikchr.wasm", and on what schedule?), and then even serving the WASM package (which seems to be compressible by more than 50%) without HTTP compression at all ("Content-Encoding: gzip")?

I'm only moderately happy ;-)

(15) By Florian Balmer (florian.balmer) on 2022-06-08 05:32:32 in reply to 14 [link] [source]

Ok, I was reading the Building WebAssembly Components section in the updated Compiling and Installing Fossil document.

This all looks like quite a maintenance burden to me, and I can only hope this will continue to work painlessly in the future. Also, are the many introduced dependencies (some only for build-time, but not all?) compatible with Richard's statement that "... Fossil is not going to pick up a jQuery dependency"?

(16) By Stephan Beal (stephan) on 2022-06-08 06:29:01 in reply to 15 [link] [source]

This all looks like quite a maintenance burden to me, and I can only hope this will continue to work painlessly in the future.

If it eases your mind, the wasm bit wasn't my idea: Richard requested it because it allows for render-as-you-type editing and this particular app is a no-risk way to try it out. In terms of code, it's functionally almost identical to sqlite's new "fiddle" app, so it wasn't much effort to create via liberal amounts of copy/paste. In any case, wasm as a whole is new and experimental for all of us and it's possible that we'll find it problematic long-term and move away from it, but that currently seems unlikely.

There are not many fossil features which can feasibly be implemented this way because of fossil's monolithic nature and the requirement of having access to the repo db for most features (like markdown rendering, which would otherwise be nice to do client-side). There are currently no plans to reimplement any other fossil features in wasm.

Also, are the many introduced dependencies (some only for build-time, but not all?) compatible with Richard's statement that ["... Fossil is not going to pick up a jQuery dependency"]

There are the build-time dependencies, but every compiler/tool is such a dependency. There are no new runtime dependencies. The generated JS/wasm glue code is standalone and human-editable/maintainable. The build currently relies on the emscripten toolchain but i am actively exploring alternative build options to reduce the risk of getting locked in (they exist but require more "invasive" setup of the toolchains, whereas emscripten is trivial to get set up).

Unrelated: i will experiment with plain text output before committing (haha) to the current charset approach and will try to recruit Martin and Mark to verify that Safari isn't an issue. MSIE has been abandoned by MS, so supporting it is no longer a factor. You are correct about wasm being the oddball, but our current approach of charsetting all output is "just wrong." The most likely outcome is currently that we'll charset all text/... mimetypes and elide it on all others, as that would be a low-impact/lower-risk change.

(17) By Florian Balmer (florian.balmer) on 2022-06-08 06:38:00 in reply to 16 [link] [source]

Thanks for the feedback!

Also, when you're in the mimetype/charset code, maybe it's worth checking whether the WASM package can be delivered with HTTP compression?

(18) By Stephan Beal (stephan) on 2022-06-08 06:42:44 in reply to 17 [link] [source]

maybe it's worth checking whether the WASM package can be delivered with HTTP compression?

It can and will be. The only reason it's not is because the logic which guesses whether or not to compress is limited to a handful of mimetypes. Adding wasm to that check is on my todo list.

(19) By Florian Balmer (florian.balmer) on 2022-06-08 06:52:00 in reply to 18 [link] [source]

... the logic which guesses whether or not to compress is limited to a handful of mimetypes. Adding wasm to that check is on my todo list.

I know this because I've added "image/vnd.microsoft.icon" to the list.

My concern is more that the WASM loader may be as picky as with the "charset" tag and not accept the "Content-Encoding: gzip" header. ;-)

(20) By Stephan Beal (stephan) on 2022-06-08 07:48:55 in reply to 19 [link] [source]

The most likely outcome is currently that we'll charset all text/... mimetypes and elide it on all others, as that would be a low-impact/lower-risk change.

...

Adding wasm to that check is on my todo list.

Please check whether fossil:86db2d94c60e2384 addresses the open concerns. (Disregard the strncmp() vs fossil_strncmp() inconsistency: those will be patched in a separate checkin.)

I know this because I've added "image/vnd.microsoft.icon" to the list.

That one is now also in the compressible list and will be part of the next checkin. The "is it compressibe" routine was refactored so that we can extend that list without as much of performance hit as before. It's now O(1) for most mimetypes.

My concern is more that the WASM loader may be as picky as with the "charset" tag and not accept the "Content-Encoding: gzip" header. ;-)

LOL! i've been serving the in-development "fiddle" apps using zip compression for the past couple of weeks (using althttpd's new gzip support), so just happened to know that that wouldn't be an issue.

(21) By Florian Balmer (florian.balmer) on 2022-06-08 10:56:59 in reply to 20 [link] [source]

Please check whether fossil:86db2d94c60e2384 addresses the open concerns.

Thanks! It "looks" good conceptually and in code, I think. I'll do detailed tests as soon as possible, at least when I get around to update my local patches (reminded by the expected merge conflict), and come back if there's any problems.

That one ("image/vnd.microsoft.icon") is now also in the compressible list and will be part of the next checkin.

I need to add: ICO files contain multiple non-compressed BMP-like images (with an extra transparency mask) for various resolutions and color-depths, but since Windows Vista (I think) they may also contain (at least one, possibly more) high-resolution high-color PNG images, which may reduce overall ICO compressibility. I'm not using such ICO files, nor does Fossil, but some people may. (Fossil uses a PNG for the favicon by default, anyway.)

I had plans to use different favicons for local(host) and remote web UI views, but it was already easy enough to tell them apart by looking at the URL. However, the custom favicon turned out to be really useful for developing and testing Fossil patches on my local machine, so it's clear from a glance at the browser tab whether the web UI is from the original or my modified version of Fossil.

Disregard the strncmp() vs fossil_strncmp() inconsistency: those will be patched in a separate checkin.

It's nice to pay attention to this!

A recent OpenSSL bug report went like this:

$ LANG=tr_TR.utf8 curl -L https://google.com/
curl: (35) error:03000072:digital envelope routines::decode error

It turned out this was due to the CRT case-insensitive string comparison routines being locale-aware by default and failing on a comparison of ASCII-only protocol-related strings, because in Turkish, the ASCII letters I and i don't case-map to each other (at least, not in a round-trip-safe manner).

The fix was to perform a locale-invariant string comparison with the "C" locale. Or they could have called fossil_strnicmp().

This doesn't matter for the case-sensitive strncmp(), but it's interesting to be aware of such details and the possible obscure consequences.

The "is it compressibe" routine was refactored so that we can extend that list without as much of performance hit as before. It's now O(1) for most mimetypes.

Interesting optimization ... or are you just trying to confuse the branch predictor and keep it from running speculative execution, so that Fossil is less vulnerable to Spectre and Meltdown bugs? ;-)

(22) By Stephan Beal (stephan) on 2022-06-08 11:14:32 in reply to 21 [link] [source]

The fix was to perform a locale-invariant string comparison with the "C" locale. Or they could have called fossil_strnicmp().

That actually wasn't the reason for me replacing strcmp with fossil_strcmp, but that is certainly a good one. My reason was simply for cosmetic consistency, nothing technical :/.

Interesting optimization ... or are you just trying to confuse the branch predictor and keep it from running speculative execution, so that Fossil is less vulnerable to Spectre and Meltdown bugs?

i have no clue whatsoever how a CPU will chose a execution path so never try to optimize based on that. The point of that restructure is simply to reduce the number of strcmp calls (CPU instruction reduction, since branches are(???) always(???) cheaper(???) than function calls). If we have to add a strcmp for every single mimetype we want to support, i start to lose sleep over all of the wasted comparisons. In althttpd we solved this differently by adding a flags field to the mimetype mappings and the flags tell us whether we need a charset or not, but fossil's internal content-type handling doesn't support that approach without much more invasive changes (and it won't work for user-custom mimetypes). That might be interesting to refactor someday, but today is not that day ;).

(23) By Florian Balmer (florian.balmer) on 2022-06-08 11:26:59 in reply to 22 [link] [source]

I was just kidding!

But since I've come across an article how crazily strcmp() calls are optimized (they want to return as fast as possible, not like strcoll() calls), I don't care about using them "too much" any more ... If I can find the link to the article, I'll post it, but sorry for now :-(

(24) By Florian Balmer (florian.balmer) on 2022-06-09 07:05:00 in reply to 23 [link] [source]

Updating my patches was smooth, and it looks like the favicon is compressed, and the "utf-8" tags for other documents also look reasonable, so far. Thanks!

I'm sorry I'm not able to re-google exactly what I've read, maybe it was some source code comments.

The starting point of my searching and reading was that for the mentioned bug report the OpenSSL people discussed whether they should stick to the CRT stricmp() called with the invariant "C" locale or roll their own ASCII-only variant of the function. They decided to go with the former, to take advantage of platform-specific optimizations available with the CRT functions. When I noticed the __strcasecmp_avx() and __strcasecmp_l_avx() calls in the pasted stack traces, I started googling from there.

(25) By Florian Balmer (florian.balmer) on 2022-06-24 06:47:21 in reply to 24 [link] [source]

... whether they should stick to the CRT stricmp() called with the invariant "C" locale or roll their own ASCII-only variant of the function. They decided to go with the former, to take advantage of platform-specific optimizations available with the CRT functions. ...

Well, at least for one minor release:

[OpenSSL] Changes between 3.0.3 and 3.0.4 [21 June 2022]