Fossil Forum

Reducing the size of the delivered JavaScript

(1) By Warren Young (wyoung) on 2020-08-20 21:00:28 [source]

I'm forking the conversation from another thread.

The Problem

According to cloc, fossil*.js is 3296 SLOC. (Source Lines of Code = functional lines, not comments, not all-whitespace...) A different tool (sloc) agrees to 3 significant digits.

Now we have to decide what to compare against, if we want to talk about "too big." Let's pick the current version (3.5.1) of jQuery, since it is the big scary thing that was absolutely precluded then repeatedly argued against when re-proposed. (I'm not trying to re-fight old battles, just setting the stage, both historically and going forward by proposing a common comparison point.) jQuery is 6687 SLOC by the same measure.

This finding surprised me.

If this is "version 1," and we're already at half the SLOC size of jQuery, it's a bit of a push to call this "simpler." I mean, yes, half the SLOC, but I would've expected more savings than that given all the push-back on the preexisting code options I gave.

But let us set aside SLOC. It isn't nearly as useful a measure as delivered bytes of code on the wire. SLOC doesn't account for whitespace and comment stripping, bundling, gzipping, etc.

The minified version of jQuery 3.5.1 is 30.6 kB.

fossil*.js is 21.33 kB, as currently delivered. (Method) This is not quite the savings (30%) that I was hoping to see after everyone jumped on me for suggesting jQuery, claiming it's too big.

To be fair, some of this code isn't in jQuery (or Umbrella, or...) at all, so a fairer comparison would be to jQuery plus the purely Fossil-specific code. Since I have no way to refactor the code like that short of doing the redesign, we'll have to go with the data we have at hand.

You might correctly guess that some of the on-the-wire size difference is due to more efficient minification for jQuery. I've checked with the most efficient minifier I have on hand (Google Closure Compiler) and it drops the size of fossil*.js to 17.6 kB.

That means the on-the-wire size is not as good as we'd guess from the SLOC results above: a 42.5% best-case byte-size savings for our custom code versus jQuery, instead of the 50.7% we'd have guessed based on SLOC. Apparently the average amount of code per line is greater in fossil*.js than in jQuery.
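For anyone who wants to check the percentages, the arithmetic can be sketched in a few lines of JavaScript. The helper name `savingsPct` is made up for this illustration; the inputs are the SLOC and kB figures quoted above.

```javascript
// Percentage saved by `ours` relative to `baseline`, rounded to one decimal place.
// `savingsPct` is a hypothetical helper, not part of Fossil.
function savingsPct(ours, baseline) {
  return Math.round((1 - ours / baseline) * 1000) / 10;
}

const sloc = savingsPct(3296, 6687);         // SLOC: fossil*.js vs jQuery 3.5.1
const asDelivered = savingsPct(21.33, 30.6); // kB on the wire, as currently delivered
const closure = savingsPct(17.6, 30.6);      // kB on the wire, after Closure Compiler

console.log({ sloc, asDelivered, closure });
```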

But that's not the biggest problem: --jsmode bundled doesn't amortize this cost across all pages. It amortizes it only within a subset of pages that all use the same set of JS files. If you search the code for calls to builtin_request_js() and style_emit_fossil_js_apis() (which is built atop the former), you find that seemingly every page has a different number and sequence of these JS files. If there are pages that share the same subset, they must be in the minority, according to my brief survey.

Because of this, /wikiedit ships 15 kB of JS code (compressed, on the wire) and /file ships a different 6 kB of code that has to be pulled separately even though the two share several JS modules. The same code is being shipped multiple times because you get two different /builtin?m URLs for these two pages, so the browser has to cache them separately. In aggregate, these variegated bundles likely exceed the cost of jQuery plus whatever custom code we'd need atop it.

You can switch to --jsmode separate to avoid that cost, but you pay a different cost: multiple HTTP checks to verify each file's cache status. I hesitate to guess which cost is greater, and I'm not sufficiently interested to measure it.

The point is, shipping a minified copy of all JS used on the site in a single file that all pages that need it include is cheaper: once it's cached for one page, it's cached for all of them, and the browser needs either zero or one HTTP hit to verify the cached status of the bundle. (0 if it's aggressive about caching, 1 if it checks even though it could get away with not.)
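To make the amortization argument concrete, here is a toy model in JavaScript. The module sizes and the page-to-module lists are invented for illustration, not measured from Fossil. The only assumption carried over from the text is that each distinct module list yields a distinct bundle URL and therefore a separate cache entry.

```javascript
// Hypothetical compressed sizes in kB; NOT measured values.
const sizes = { fetch: 2, dom: 3, tabs: 1, storage: 1, popupwidget: 2 };

// Hypothetical pages and the framework modules each one requests.
const pages = {
  pageA: ["fetch", "dom", "tabs", "storage"],
  pageB: ["fetch", "dom", "popupwidget"],
  pageC: ["dom"],
};

const sum = (mods) => mods.reduce((n, m) => n + sizes[m], 0);

// Per-page bundles: every distinct module list is downloaded and cached separately.
function perPageBundles(pageMap) {
  const seen = new Map();
  for (const mods of Object.values(pageMap)) seen.set(mods.join(","), sum(mods));
  return [...seen.values()].reduce((a, b) => a + b, 0);
}

// Single common bundle: every module is downloaded once, whichever page comes first.
function singleBundle() {
  return sum(Object.keys(sizes));
}

console.log("per-page bundles:", perPageBundles(pages), "kB");
console.log("single bundle:", singleBundle(), "kB");
```

In this model the shared modules are paid for once per distinct bundle, so the per-page total grows with every new combination of modules, while the single bundle's total does not.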


Test Methods

I got the 21.33 kB size value for the current on-the-wire cost of fossil*.js by applying this patch:

Index: src/wiki.c
==================================================================
--- src/wiki.c
+++ src/wiki.c
@@ -1267,11 +1267,14 @@
     CX("</div>"/*#wikiedit-tab-save*/);
   }
 
   builtin_request_js("sbsdiff.js");
   style_emit_fossil_js_apis(0, "fetch", "dom", "tabs", "confirmer",
-                            "storage", "page.wikiedit", 0);
+                            "storage", "page.wikiedit",
+                            "copybutton", "numbered-lines", "page.fileedit",
+                            "page.forumpost", "popupwidget",
+                            0);
   builtin_fulfill_js_requests();
   /* Dynamically populate the editor... */
   style_emit_script_tag(0,0);
   {
     /* Render the current page list to save us an XHR request

I then built Fossil with the change, ran it as ./fossil ui --jsmode bundled, loaded /wikiedit in my browser with the developer tools' Network tab showing, and found the /builtin call's size.

The 17.6 kB number comes from this script:

#!/bin/bash
rm -f all all.gz min min.gz
for b in bootstrap fetch dom tabs confirmer storage page.wikiedit \
    copybutton numbered-lines page.fileedit page.forumpost popupwidget
do
    cat "src/fossil.$b.js" >> all
done

google-closure-compiler --compilation_level ADVANCED 2> /dev/null < all > min
gzip -9 all min
ls -l all.gz min.gz

You can't just run GCC on src/fossil*.js because in "ADVANCED" mode, the order of files matters: it has to obey the dependency order, else you get complaints about variables being undefined and such.
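The ordering requirement can be met mechanically with a topological sort. The sketch below uses a made-up dependency map, since the real dependencies among the fossil.*.js files are not listed here, and `loadOrder` is a hypothetical helper rather than anything in the Fossil tree.

```javascript
// Hypothetical dependency map: module -> modules that must be loaded before it.
const deps = {
  bootstrap: [],
  dom: ["bootstrap"],
  fetch: ["bootstrap"],
  tabs: ["dom"],
  storage: ["bootstrap"],
};

// Depth-first topological sort; throws if the map contains a dependency cycle.
function loadOrder(depMap) {
  const order = [];
  const state = new Map(); // name -> "visiting" | "done"
  const visit = (name) => {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") throw new Error(`cycle at ${name}`);
    state.set(name, "visiting");
    for (const d of depMap[name] || []) visit(d);
    state.set(name, "done");
    order.push(name); // pushed only after all of its dependencies
  };
  Object.keys(depMap).forEach(visit);
  return order;
}

// Print the file list in an order that is safe to concatenate.
console.log(loadOrder(deps).map((n) => `src/fossil.${n}.js`).join(" "));
```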


Possible Solutions

All right, so that's the problem. What do we do about it?

First, the non-solutions: Short of rewriting the current code atop jQuery, Umbrella, or whatever, it's useless to speculate on whether we'd have seen a net savings by going another way. Aside from the duplication of work, that ship's sailed.

What should concern us instead is what we do about it going forward.

Single Bundle

For --jsmode bundled, all of the functions that currently inject individual files into a per-page bundle should instead contribute to a single common bundle that all pages include, so its cost is amortized among all of them.

Optimization

It may be that some of the new code could be made smaller. That's pure speculation, but given that we're within a 2x factor of jQuery's on-the-wire byte size, I think that's worth looking into.

Minification

I've repeatedly suggested that we use some sort of aggressive minification on the JS we ship, but each time, I've gotten either crickets or outright rejection.

I realize this makes some people nervous, since it means the resulting JS is "obscure" now. It's a strange sentiment for a project that's primarily written in C, which when compiled is far more obscured than JS is after going through a minifier.

Modern browsers will expand the code again for you, albeit with the shortened variable names and such. To avoid that, it's not difficult to craft a "debug" mode that skips the minification step.

The best argument against minification is that it's difficult to do this step reliably: done wrong, it can break correct code.
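As one contrived illustration of how this can go wrong (not a claim about how any particular minifier behaves), consider a deliberately naive comment stripper that treats every `//` as the start of a comment. The function names here are hypothetical.

```javascript
// Deliberately naive: cuts each line at the first "//", even inside a string literal.
function naiveStrip(src) {
  return src
    .split("\n")
    .map((line) => line.replace(/\/\/.*$/, "").trim())
    .filter((line) => line.length > 0)
    .join("\n");
}

// Returns true if `code` is syntactically valid as a function body.
function parses(code) {
  try {
    new Function(code); // compiles the code without running it
    return true;
  } catch (e) {
    return false;
  }
}

const source = [
  "// where the docs live",
  'const url = "https://fossil-scm.org/";',
  "console.log(url);",
].join("\n");

const stripped = naiveStrip(source);

// The URL's "//" was mistaken for a comment, leaving an unterminated string.
console.log(parses(source), parses(stripped));
```

A real minifier has to tokenize strings, regular expression literals, and template literals correctly to avoid this class of breakage.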

Crockford's jsmin is under a BSDish licence, and it is a single C file, so it would be easy to integrate into Fossil for server-side on-the-fly minification. It would be no worse a dependency than we currently have on zlib.

I've had jsmin break working code on me. This was a fatal flaw for me in my private projects, because I use third-party JS, so while you can avoid these problems with jsmin by adjusting the code to accommodate the tool's weaknesses, it would have meant either forking my dependencies or getting upstream to apply fixes to make jsmin happy.

However, we don't have that problem in Fossil, since all code is written by us, so we can adjust the code to placate the tool. I think jsmin is still a potential option for us, particularly if we want it to work on-the-fly with a statically-compiled fossil binary.

I went through a few different JS minifiers after my problems with jsmin, finally ending up with Google Closure Compiler. This is a complex beast, slow to execute, and it has annoying dependencies. You must either:

  • Install Oracle's Java 8 to run the JAR version. It won't run against OpenJDK or the pre-give-us-money-or-else versions of Java.

  • Go through NPM to get the native-compiled version. This avoids the dependency on Java 8, but it's only available for Windows, macOS, and Linux.

  • Use the transpiled JavaScript version (also from NPM), but then you're dependent on Node.js. Also, not all features are available in this version.

On the plus side, GCC (yes, confusing acronym) has never broken working code on me, and it gives the smallest outputs of the tools I've tried. By contrast, jsmin only manages to compress the "all" bundle of fossil*.js to 19.5 kB:

  $ gzcat all.gz | jsmin | gzip -9 | wc -c 
      19919

If we took Google Closure Compiler on as a dependency, it would have to be optional and build-time-only. I think the best way to handle it would be to gate it on a configure test: if GCC is available and the option to use it is given, build the minified version of the JS into the Fossil binary for use by --jsmode bundled. Otherwise, fall back to the current method.

(2) By Richard Hipp (drh) on 2020-08-20 21:49:43 in reply to 1 [link] [source]

Are you aware that the "builtin" virtual table in the "fossil sql" command can quickly give you the text and sizes of all of the built-in JS files?

I am not opposed to more aggressive minification, as long as unminified is also available. Would it be too much to store both versions, and deliver the unminified version if requested by a query parameter?

Currently, you get unminified JS if you use the --fossil-debug option to the ./configure script. That works ok for development, where you can easily recompile. I wonder if there should be a way to deliver unminified content on-demand, however. I don't want to take the CPU hit to run minification for each request, so if both versions can be delivered on-the-fly, then both should be built-in.

(3) By Warren Young (wyoung) on 2020-08-20 22:00:11 in reply to 2 [link] [source]

Are you aware that the "builtin" virtual table in the "fossil sql" command can quickly give you the text and sizes of all of the built-in JS files?

I don't see how it gets me the results I want:

sqlite> select sum(size) from builtin where name like 'fossil.%.js';
75942

What I want is the on-the-wire size. That requires going through the current simple minification (leading whitespace & comment removal only) and compression steps.

Would it be too much to store both versions

17.6 kiB? I don't see why that'd be a major problem in a 3-5 MiB binary.

That would also sweep aside most of the objections to GCC, since only those building binaries would need to care, and of those, only those that wanted fully minified JS. The official binaries might do this, and mine would, to be certain, but many others may not care about this, so they wouldn't have to bother.

I wonder if there should be a way to deliver unminified content on-demand

Perhaps --jsmode bundled stays as it is, and we add --jsmode minified, which is only compiled in if at build time we found a minifier and embedded the fully bundled and minified form of the JS?

(5) By Stephan Beal (stephan) on 2020-08-20 22:22:37 in reply to 3 [link] [source]

I don't see how it gets me the results I want:

sqlite> select sum(size) from builtin where name like 'fossil.%.js';
75942

A more precise result would be to add AND name NOT LIKE 'fossil.page.%', and then explicitly add the fossil.page.xxx.js you're looking for.

The official binaries might do this, and mine would, to be certain, but many others may not care about this, so they wouldn't have to bother.

i'm in that latter group ;).

(6.1) By Warren Young (wyoung) on 2020-08-20 22:47:26 edited from 6.0 in reply to 5 [link] [source]

Deleted

(4) By Stephan Beal (stephan) on 2020-08-20 22:20:18 in reply to 1 [link] [source]

The minified version of jQuery 3.5.1 is 30.6 kB.

We don't truly minify, though. We just strip comments and leading whitespace, but leave newlines intact. True minification requires out-of-tree build infrastructure we don't have and cannot reasonably expect all potential clients to have. It also effectively obfuscates the code, even if it's not technically obfuscated.

... I was hoping to see after everyone jumped on me for suggesting jQuery, claiming it's too big.

My #1 personal objection to jQuery is not the size but maintenance. That's simply not a library we can sensibly fork and maintain. Sure, we could drop in a given version and be done with it, but... where's the fun in that? My #2 is that, since ES2015, jQuery is no longer the must-have it was before.

Note, also, that the scope of fossil.*.js is different. There are, IMO, no APIs in there which have 1-to-1 jQuery feature parity except for maybe fossil.fetch(), and some of the APIs are higher-level than jQuery works at (most recently, tooltip-like popups), so we'd need to have something equivalent even if we had jQuery. Likewise, the page-specific JS (fossil.page.*.js) cannot be sensibly counted against this size because it's app-level code, not framework-level. Even fossil.dom doesn't have a jQuery counterpart: jQuery DOM element construction uses innerHTML, which is a security hole (script injection) and to be avoided at all costs. We use innerHTML only when receiving an HTML-format preview from the server, as we have no alternative for displaying one; otherwise we build everything using the DOM API. e.g.:

// jQuery:
const d = $("<div></div>"); // d is a jQuery object wrapping a DIV element
// fossil.dom:
const d = fossil.dom.div(); // d is a DIV HTMLElement
// but the higher-level code all aliases F to fossil and
// D to fossil.dom, leaving us with:
const d = D.div();

But that's not the biggest problem: --jsmode bundled doesn't amortize this cost across all pages.

That's absolutely true, but we're currently talking about only 3 pages (soon 4).

The point is, shipping a minified copy of all JS used on the site in a single file that all pages that need it include is cheaper: once it's cached for one page, it's cached for all of them, and the browser needs either zero or one HTTP hit to verify the cached status of the bundle. (0 if it's aggressive about caching, 1 if it checks even though it could get away with not.)

That would be a trivial change to make, and you detail that approach below so i won't say more about it here. That should probably be limited to bundled mode, though. For non-bundled mode it would add undue weight by loading unused modules.

i'll create a branch which does that, for comparison's sake, but probably not until the weekend. Note, though...

I got the 21.33 kB size value for the current on-the-wire cost of fossil*.js by applying this patch

The 17.6 kB number comes from this script:

That concatenation is semantically broken, though: the fossil.page.*.js files are the app-level code for specific pages. If two fossil.page.X.js files are loaded together, they will break. Each expects and requires that it is only loaded on the corresponding UI page, and it "takes over" that page for its own purposes.

If you remove all of the .page scripts, it's semantically correct, as the rest are library-/framework-level bits.

That means, though, for bundling purposes, we need to bundle the framework-level code together (all of it, for all pages), and then emit the page-specific code as a separate request. We can do that, not a problem, just pointing out that it will require 2 requests.

Optimization

It may be that some of the new code could be made smaller. That's pure speculation, but given that we're within a 2x factor of jQuery's on-the-wire byte size, I think that's worth looking into.

If that 2x accounts for jQuery being truly minified and us not, that's not a fair comparison. That's in your next point, though:

Minification

I've repeatedly suggested that we use some sort of aggressive minification on the JS we ship, but each time, I've gotten either crickets or outright rejection.

Personally i'm against it, solely on the grounds that it requires additional out-of-tree tools which not only we would need but everyone who builds it, on all diverse platforms, would need. However, you address this below, so we'll get back to it.

However, we don't have that problem in Fossil, since all code is written by us, so we can adjust the code to placate the tool. I think jsmin is still a potential option for us, particularly if we want it to work on-the-fly with a statically-compiled fossil binary.

FWIW, i'm not against that. i've used Crockford's extensively on my own code for years. It's a single-file solution which compiles everywhere. It essentially does what mkbuiltin already does, except that it strips out all extraneous space rather than just start-of-line space. We could possibly, rather than add another binary to the build, integrate it into mkbuiltin to replace its current JS-shrinking bits.

However, there is actually a genuine problem with Crockford's license:

The Software shall be used for Good, not Evil.

(From my local jsmin, dated 2011-09-30.)

That clause will get a package rejected by Debian, as they consider it to be unduly restrictive. (Which is kinda funny, but at least they're being consistent.) Even if it doesn't affect fossil binaries, it would affect the source distribution - Debian won't include a distribution with that clause. That is to say, they've been known to reject packages which included it. Some relevant tickets:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=692614
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=673727#46
https://github.com/jshint/jshint/issues/1234

A closely-related topic is CSS minification, and i'll take this opportunity to plug a lightweight, license-compatible CSS minification API which "could" also be integrated directly into mkbuiltin:

https://fossil.wanderinghorse.net/r/cssminc

(Just to toss that out there.)

If we took Google Closure Compiler on as a dependency, it would have to be optional and build-time-only.

Meh. i wouldn't disagree with that, but i think the concern is overblown, especially if we bundle All Of It and serve in bundled mode. If we're in caching bundled mode with all files, a few kb difference, even if it's 20kb, makes, in the aggregate, no effective difference on clients. It may make more sense for approaches other than bundled.

(7) By Warren Young (wyoung) on 2020-08-20 23:04:37 in reply to 4 [link] [source]

...personal objection to jQuery...etc.

I was using jQuery as a point of comparison, something common we can all point to and agree on as a benchmark. "Too big" is a pointless complaint without a comparison point. I don't want to re-fight the jQuery arguments.

Do you have a better point of comparison? What is fossil.*.js most like, out in the world?

we're currently talking about only 3 pages (soon 4).

/fileedit pulls 23.5 kB in bundled mode.

/wikiedit pulls a different 15 kB, which is partially redundant w.r.t. /fileedit

/file pulls yet a different pair of bundles, a 2.7 kB one without ln and a 6.6 kB one with ln enabled.

All together, that's 48 kB, 56% bigger than jQuery.

If two fossil.page.X.js files are loaded together, they will break...it will require 2 requests.

That would be fine, though it might be nicer if the code was modularized in a way that put the init code for each "page" JS behind a function that doesn't run until it's called from a page's onload handler, so they wouldn't conflict.
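A minimal sketch of that idea, assuming a hypothetical registry (none of these names exist in Fossil): each page script registers its init routine instead of running it at load time, and the page's onload handler asks for exactly one of them.

```javascript
// Hypothetical registry of page init routines, keyed by page name.
const pageInits = new Map();

// Called by each page script at load time; stores the routine without running it.
function registerPage(name, init) {
  if (pageInits.has(name)) throw new Error(`page "${name}" registered twice`);
  pageInits.set(name, init);
}

// Called once by the page that was actually loaded, e.g. from its onload handler.
// Returns false if no routine was registered under that name.
function initPage(name) {
  const init = pageInits.get(name);
  if (!init) return false;
  init();
  return true;
}

// Stand-ins for two page scripts bundled into the same file.
const ran = [];
registerPage("wikiedit", () => ran.push("wikiedit"));
registerPage("fileedit", () => ran.push("fileedit"));

initPage("fileedit"); // only the fileedit routine runs
console.log(ran);
```

The cost of this design is that every registered page's code is still parsed on every page, even though only one init routine executes.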

Each round trip costs 10-100 ms on typical Internet connections.

there is actually a genuine problem with Crockford's license:

Then it can only be a build-time-only dependency, an alternative to google-closure-compiler for producing the single-file minified bundle. That would be a useful fallback for those who find GoogCC's dependencies too onerous.

(8) By Stephan Beal (stephan) on 2020-08-21 00:25:54 in reply to 7 [link] [source]

I was using jQuery as a point of comparison, something common we can all point to and agree on as a benchmark.

We can agree on jQuery as a basis for comparison. It's ubiquitous.

Do you have a better point of comparison? What is fossil.*.js most like, out in the world?

Nothing, really. fossil.dom is essentially a simplifier for using the DOM API, but it doesn't do any magic on the level of jQuery:

const d = D.div();
D.addClass(d, 'a', 'b', 'c'); // or ['a', 'b', 'c']

fossil.fetch provides a feature similar to the not-quite-widely-available window.fetch(). jQuery provides something very similar.

fossil.confirmer is for confirmation buttons, but it will almost certainly be removed as soon as we get an HTML/CSS modal dialog.

fossil.storage is a tiny wrapper around window.localStorage and friends, so that apps do not need to know/care which storage they're using. And, it turns out, it was critical in recently sandboxing repos from each other when they're hosted under the same origin - that would have been much more painful without this abstraction API between the apps and the storage.
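To illustrate the general shape of such an abstraction, here is a rough sketch. It is not the real fossil.storage API: `makeStorage`, `memoryBackend`, and the key-prefix scheme are assumptions made for this example, showing one way two repos on the same origin could be kept from seeing each other's keys.

```javascript
// Wraps any backend exposing getItem/setItem/removeItem (as window.localStorage does)
// and namespaces every key with a per-repo prefix. Hypothetical, not Fossil's code.
function makeStorage(backend, prefix) {
  const k = (key) => `${prefix}:${key}`;
  return {
    set(key, value) { backend.setItem(k(key), JSON.stringify(value)); },
    get(key, dflt) {
      const raw = backend.getItem(k(key));
      return raw === null ? dflt : JSON.parse(raw);
    },
    remove(key) { backend.removeItem(k(key)); },
  };
}

// In-memory stand-in offering the same three methods, for use outside a browser.
function memoryBackend() {
  const m = new Map();
  return {
    setItem: (key, v) => { m.set(key, String(v)); },
    getItem: (key) => (m.has(key) ? m.get(key) : null),
    removeItem: (key) => { m.delete(key); },
  };
}

const shared = memoryBackend(); // in a browser this could be window.localStorage
const repoA = makeStorage(shared, "/repoA");
const repoB = makeStorage(shared, "/repoB");

repoA.set("draft", { text: "hello" });
console.log(repoA.get("draft"), repoB.get("draft", null));
```

Because callers only ever see `get`, `set`, and `remove`, the backend or the prefixing rule can change without touching the apps that use it.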

All of the others are UI features written specifically for the fossil pages.

All together, that's 48 kB, 56% bigger than jQuery.

jQuery doesn't magically give anyone app-level features, so app-level code (all of fossil.page.xxx.js and most of the non-page files) "doesn't count" in a comparison against an app-agnostic framework. jQuery provides only framework-level code. Even if we have jQuery, we still have to write the equivalent of fossil.page.* and most of the others, so that cost is there one way or the other.

The only ones of the current JS files which we could, right now, drop and replace with equivalent jQuery features, are fossil.(dom|fetch).js. None of the other code is covered by jQuery features.

That is to say: a size comparison solely against an app-agnostic/generic framework isn't a fair one, as we necessarily have app-level code to add on top of that.

$ cat $(ls -1 fossil.*.js | grep -v .page.) | jsmin > foo.js
$ l foo.js
-rw-r--r-- 1 pi pi 28000 Aug 21 02:12 foo.js
$ gzip -1 foo.js  # absolute minimum compression level
$ l foo.js.gz 
-rw-r--r-- 1 pi pi 9951 Aug 21 02:12 foo.js.gz

If we bundle the non-app-specific code into a single jsmin'd request, that 10k would be the approximate over-the-wire overhead. That skyrockets to 30k if it's not minified, 28k if we just strip leading spaces, and 12k if we strip comments and leading spaces. jsmin-style minification gains us very little compared to:

$ cat $(ls -1 fossil.*.js | grep -v .page.) |
    sed -E 's,^ +,,' |      # strip leading spaces
    stripcomments |         # strip comments. duh.
    gzip -1c > foo.js.gz && # minimum compression (-3 isn't significantly better)
    ls -la foo.js.gz
-rw-r--r-- 1 pi pi 12074 Aug 21 02:17 foo.js.gz

The combination of stripping leading spaces and comments is what mkbuiltin already does, so approx. 12k over-the-wire is what i'd expect for all of the non-page files. (Sidebar: their order would need to be correct, though - some depend on others, and they get initialized in the order they are loaded.)

That would be fine, though it might be nicer if the code was modularized in a way that put the init code for each "page" JS behind a function that doesn't run until it's called from a page's onload handler, so they wouldn't conflict.

That's essentially what happens: the onload handler is what initializes the app, but it expects to have/requires only a single app, not multiple apps.

Your proposal is that we add another layer which instead dispatches to one of the init routines depending on which page is loaded.

Yeah, we "could" do that, but sheesh. It really wouldn't sit well with me to know that the wikiedit and forum code were compiled and in memory, but completely unused, in the fileedit page. As more pages/features are added, that multi-page bloat would swell, all for the sake of potentially saving 1 more fully cacheable request.

No, if fossil were a single-page app, that would be another story, but we're probably at least a decade away from anyone wanting to overhaul fossil to support that.

Another alternative, though nowhere near as slim as a 2nd cacheable request to get the page-specific code, is to send the app-agnostic bits bundled and the page JS embedded in a script tag in the page.

Then it can only be a build-time-only dependency, an alternative to google-closure-compiler for producing the single-file minified bundle. That would be a useful fallback for those who find GoogCC's dependencies too onerous.

If package inclusion into Debian and its derivatives is a concern (and it probably should be), then we can't include jsmin.c as-is in-tree.

That doesn't stop an enterprising hacker from making their own, though ;) (not me!). It also doesn't stop our configure script from trying to download it if curl or wget are available on the system, or to use it if it's found in a certain spot in the tree (copied by the user or downloaded by the configure script). However, jsmin saves us surprisingly little over what mkbuiltin already does (exactly 1 EOL per file line).

(10.1) By Stephan Beal (stephan) on 2020-08-24 23:08:42 edited from 10.0 in reply to 4 [link] [source]

The point is, shipping a minified copy of all JS used on the site in a single file that all pages that need it include is cheaper: once it's cached for one page, it's cached for all of them, and the browser needs either zero or one HTTP hit to verify the cached status of the bundle. (0 if it's aggressive about caching, 1 if it checks even though it could get away with not.)

i'm working on an experimental branch which, among other things, is testing that out:

https://fossil-scm.org/fossil/timeline?r=misc-js-experiments

The short and the long of it is that when we emit all of the non-page-specific JS for any page which uses any part of the fossil.XYZ JS APIs [misref], and the client cache is warm, visiting wikiedit and fileedit each becomes, best case, a single HTTP request with 10k and 13k over the wire, respectively, for the core page content (which can't reasonably be cached because it includes non-constant state in its body).

The editor pages require 2 JS requests each: one for the shared bundle and one for their page-specific bits (wikiedit=8.5k, fileedit=7.4k), but those are also cacheable long-term.

That result is similar for both "separate" and "bundled" JS distribution modes, but bundled requires only a single request to fetch the app-agnostic parts, whereas separate mode currently requires 8 requests. The JS bundle itself is currently just shy of 10k over-the-wire (32k uncompressed), including new "?" help button code added this evening in that branch to replace title-attribute hoverhelp (see wikiedit and fileedit for what that looks like). Inline JS mode is, of course, horrid for purposes of caching but best for purposes of eliminating HTTP requests.

Bundled mode is the clear overall winner, and sending the whole bundle for any page which uses any single part of the fossil.XYZ APIs is the clear winner once a user visits at least 2 such pages. For bundled mode, that's the approach i'm now using, but sending all of it is a plain waste for non-bundled mode, so those modes cherry-pick the APIs they need, e.g.:

https://fossil-scm.org/fossil/info?name=5888dd53fdc4989d895c5629d2f6f07a5e845886ca6943540d50ba8c9a6f4cac&ln=1991-1994

misref = currently /wikiedit, /fileedit, and line-number-mode /file or /info.



(11) By Warren Young (wyoung) on 2020-08-25 05:04:35 in reply to 10.1 [link] [source]

Thank you!

I've updated javascript.md to track this and other recent changes.

(9) By anonymous on 2020-08-21 09:54:01 in reply to 1 [link] [source]

Modern browsers will expand the code again for you, albeit with the shortened variable names and such. To avoid that, it's not difficult to craft a "debug" mode that skips the minification step.

I think that browsers can unminify the code even further if supplied with a source map, which modern minifiers should be able to produce.