Fossil User Forum

Adopting a JS helper library

(1) By Warren Young (wyoung) on 2020-04-24 18:07:48 [link] [source]

This is a post-2.11 proposal.

Fossil has a long-standing policy of not including third-party JS libraries. That was set in September 2018 in terms of jQuery, which is small-ish (30 kB delivered), MIT licensed, and used on something like 70 million web sites. That last point means not only that there's a tremendous amount of help available for jQuery, but also that there are plug-ins and ready CDN sources for pulling it down efficiently.

When I rewrote the code in question to not use jQuery, its size roughly tripled, and it did so in a way that meant the new code wasn't cached by the browser because it's injected into the page. Given that jQuery would be served as a separate, cacheable virtual file URL from Fossil, that means it executes slower now, too, because network I/O costs more than local file I/O. You pay that hit on each and every pull of that code, which is on every Fossil UI page with the default skin.

This bloating is a common effect. The linked page is trying to show that jQuery can always be replaced by straight DOM manipulation code, but notice that the number of lines is usually greater, and the lines are usually longer. This is where the tripling of the size of my code came from. The browser's native Ajax APIs are particularly terrible sources of code bloat.

I didn't fight that decision at the time, but recently I came across umbrella.js, which is about a tenth the size of jQuery, still MIT licensed, and more or less in the style of jQuery, though not strictly a subset of it.

I'm not specifically advocating for Umbrella. There are several other good alternatives. The main point is that we could shrink the net size of JS code that Fossil ships and make it easier to understand by rebasing our existing code atop such a library.

That's the real difficulty, of course: to take full advantage of such a library, a large portion of the existing JS code has to be rewritten on top of it. I can do this, but I have reason to believe we have others here who'd be willing to help.

The main blocker is whether we're allowed to take on such a dependency.

(2) By Stephan Beal (stephan) on 2020-04-24 18:59:21 in reply to 1 [link] [source]

FWIW...

i have mixed feelings about it, but wouldn't fight it hard either way. While i used jquery heavily from 2007 until about 2014, since HTML5 and its related JS APIs (specifically, CSS selectors), jquery has slipped from "must have" to "nice to have," and i no longer use it in my own sites. That does, however, mean writing more lines of code to do most of the same things, but the lack of deps is worth it to me. Since fossil embeds most JS directly in pages with a nonce, and therefore cannot cache it, including a framework is more compelling. jQuery has an excellent track record of browser compatibility and backwards compatibility, and i credit jquery almost single-handedly for making JS useful after JS's first decade of fails. Before jquery, browser JS incompatibilities were the norm, not the exception. jquery hid most of that from developers and revolutionized how JS apps were written.

i am not familiar with umbrella but will take a look at it - it sounds like something i can make use of in my own HTML UIs.

(3) By Warren Young (wyoung) on 2020-04-24 20:01:32 in reply to 2 [link] [source]

One of the reasons the likes of Umbrella are smaller is that they only support newer browsers, thus need to bring along less code.

A huge chunk of jQuery is the Sizzle selector engine, for example, which we no longer need in the modern world if you can accept a restriction like "IE10 or better."

However, I don't see the size of such wrappers diminishing to zero because a lot of the JS DOM API is simply badly designed. XHR is a stunningly good (or is that bad?) example of this.
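To illustrate the kind of wrapper being discussed, here is a minimal sketch of a Promise-based GET helper over XHR. The name `xhrGet` and its second parameter are hypothetical, invented for this example; this is neither Umbrella's API nor anything in Fossil. The constructor is passed in only so that the sketch can be tried outside a browser with a stand-in class.

```javascript
// Hypothetical helper: wraps the callback-based XMLHttpRequest API
// in a Promise. XHR defaults to the browser's constructor.
function xhrGet(url, XHR = globalThis.XMLHttpRequest) {
  return new Promise((resolve, reject) => {
    const req = new XHR();
    req.open('GET', url, true); // true = asynchronous
    req.onload = () => {
      // onload fires for any completed response, including HTTP errors,
      // so the status has to be checked by hand.
      if (req.status >= 200 && req.status < 300) {
        resolve(req.responseText);
      } else {
        reject(new Error('HTTP ' + req.status));
      }
    };
    // onerror only fires for network-level failures.
    req.onerror = () => reject(new Error('network error'));
    req.send();
  });
}
```

In a browser this would be used as `xhrGet('/some/path').then(text => ...)`. Even this small wrapper has to paper over two separate failure paths, which is roughly the complaint above.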

(4) By anonymous on 2020-04-25 02:54:21 in reply to 1 [link] [source]

I just think the client-side JavaScript code should be reduced in general, and should be made optional in the case the client has JavaScript disabled (or if it is unimplemented, or if the wanted feature is unimplemented even though JavaScript is mainly implemented).

I added some JavaScript code to my own Fossil repositories, but the only reason for that is to add accesskey attributes to some elements (and I use document.evaluate for that; no jQuery), which are parts of the HTML code that cannot be altered by the settings. I would rather have the server add these attributes instead, but except for the top menu and the ticket form (where I do set the access keys on the server, so they work even if scripts are disabled on the client), it does not seem possible to do.

There are some places in Fossil where the use of client scripting is suitable, although even then it should not be added too much, I think.

(6) By Warren Young (wyoung) on 2020-04-25 09:05:56 in reply to 4 [link] [source]

I just think the client-side JavaScript code should be reduced in general

Short of refactoring (this thread's topic) or removing features, I don't see how we'd do that, but if you have concrete ideas for removing any of the current JS, we're happy to entertain them.

should be made optional

It already is. There are a few items on that list that could be improved. Patches thoughtfully considered.

I would rather have the server add these attributes instead

That topic already came up. (Perhaps you are the same "anonymous"?) All we need here is someone to take on the project and make a contribution.

(5) By sean (jungleboogie) on 2020-04-25 05:41:27 in reply to 1 [link] [source]

What kinds of things would you want to implement if a JS library were permitted in the repo? I didn't re-read the Sept 2018 thread, but you intelligent folks did make a nice sitemap.

(7) By Warren Young (wyoung) on 2020-04-25 09:15:56 in reply to 5 [link] [source]

At the moment, I mainly want to:

  1. Serve the js.txt portion of the skin the same way as style.css so that browsers will do long-term caching, so that in the main use case, it's only ever fetched once until changed.

  2. Prepend Umbrella.js to js.txt programmatically so that only one JS hit occurs.

  3. Call on the code added by #2 to reduce the net amount of JS served by Fossil, further reducing the page load time. It's already fast, but we can make it faster.

Substantial decreases in load time are within reach with this combination of techniques.
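The caching idea in steps 1 and 2 can be sketched as follows. Fossil's server side is C, so this Node.js fragment only illustrates the HTTP mechanics; every name in it (`combinedScript`, `scriptUrl`, `respond`, the `/skin.js?id=` URL shape, the header values) is an assumption made for the example, not Fossil's actual behavior.

```javascript
const crypto = require('crypto');

// Short content hash: changes whenever the script text changes.
function contentId(body) {
  return crypto.createHash('sha256').update(body).digest('hex').slice(0, 12);
}

// Step 2: library first, then the skin's own code, served as one resource.
// The lone ';' guards against a library file that lacks a final semicolon.
function combinedScript(librarySource, skinSource) {
  return librarySource + '\n;\n' + skinSource;
}

// Step 1: a URL that embeds the hash can be cached for a long time,
// because edited content is published under a different URL.
function scriptUrl(body) {
  return '/skin.js?id=' + contentId(body);
}

// Answer a request, honoring a conditional If-None-Match header.
function respond(body, ifNoneMatch) {
  const etag = '"' + contentId(body) + '"';
  const headers = {
    'Content-Type': 'application/javascript',
    'Cache-Control': 'public, max-age=31536000',
    'ETag': etag,
  };
  if (ifNoneMatch === etag) {
    return { status: 304, headers, body: '' }; // browser copy is current
  }
  return { status: 200, headers, body };
}
```

The design choice here is to let the URL carry the version, so that the browser never has to ask again until the page starts referencing a new URL.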

Eventually, it would also be nice to add some JS minification. Comments and whitespace compress well, but they don't compress to zero.

(8) By Stephan Beal (stephan) on 2020-04-25 11:01:11 in reply to 7 [link] [source]

Umbrella: it hasn't had a code update in 2 years, only build updates. Maybe that means it's unmaintained, maybe it means it's perfect. They unfortunately disabled the ticket tracker, so it's difficult to get an idea of how actively used it is. It looks nice, though, and super slim.

JS minification: i've been using Douglas Crockford's jsmin.c for many years: https://github.com/douglascrockford/JSMin/blob/master/jsmin.c

(Funnily enough, though, the license's "do no evil" clause is considered to be too restrictive for Debian licensing rules, and i have seen packages which contain other software of his with that clause rejected from Debian on those grounds, so we possibly don't want to include it in fossil.)

Pardon brevity - tablet-typing.

(9) By Florian Balmer (florian.balmer) on 2020-04-25 12:19:36 in reply to 1 [link] [source]

My experience:

  • Code is harder to read, i.e. all meaningful names replaced by $ symbols.

  • Code is harder to debug, i.e. the F12 console needs helper infrastructure to decode and navigate the JQ code.

  • I've seen two breaking changes during a short JQ use period, one related to event handlers of dynamic elements, one to properties vs. attributes. This will be a risk, while vanilla JS is unlikely to ever break.

  • Rewriting all JS code also seems risky, and may introduce new, or even old bugs.

  • Size impact compared to gzipped Fossil JS may be negligible, and the additional network hit may also slow down things.

So my hope is that we can go without this ...

Regarding JS minification: some Fossil JS omits terminating semicolons, so line breaks can't be compressed.
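The semicolon point can be checked with a small sketch. The sample snippets and the helper names `parses` and `joinLines` are made up for the illustration; `joinLines` is a deliberately naive stand-in for a minifier, not how any real minifier works.

```javascript
// Returns true when `source` is syntactically valid as a function body.
function parses(source) {
  try {
    new Function(source); // compiles only, never runs the code
    return true;
  } catch (e) {
    return false;
  }
}

// Naive "minifier": replace every line break with a single space.
const joinLines = (src) => src.split('\n').join(' ');

const withSemicolons = 'var a = 1;\nvar b = 2;\nreturn a + b;';
const withoutSemicolons = 'var a = 1\nvar b = 2\nreturn a + b';

console.log(parses(withoutSemicolons));            // true: line breaks end the statements
console.log(parses(joinLines(withSemicolons)));    // true
console.log(parses(joinLines(withoutSemicolons))); // false: nothing separates the statements
```

Code that leans on automatic semicolon insertion needs its line breaks to stay where they are, so those bytes cannot be squeezed out.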

(10) By anonymous on 2020-04-26 06:27:34 in reply to 9 [link] [source]

I agree.

Serving js.txt once, like style.css is, would be helpful.

A helper JS library just adds more headaches and encourages more dependence on JS.

(11) By OgunFossil on 2020-04-30 00:26:54 in reply to 10 [link] [source]

^ Aye / +1.

The JS in its own file would be great but a library seems like it could become an issue. Would also agree that pure JS is much easier to parse. If a library is added, am hoping that it wouldn't be in such a way that users adding their own scripts would have to do anything special to work around it (which is something that can happen with jquery).

One of the reasons the likes of Umbrella are smaller is that they only support newer browsers, thus need to bring along less code.

I thought the JS in fossil (e.g. XMLHttpRequest instead of fetch()) was intended to support older browsers?

If there is a requirement to support old browsers, maybe they could have an alternative skin and the default could go all ES6 with something polyfilled for the alternative, or the alternative could just do without the JS altogether?

(12) By Florian Balmer (florian.balmer) on 2020-05-05 12:24:33 in reply to 9 [link] [source]

Given the recent discussions about AJAX and maintaining <noscript> compatibility, please allow me to add a few remarks.

I'm not opposed to Javascript in Fossil -- but I'm advocating for Vanilla Javascript, because:

  • The mentioned breaking changes in jQuery may happen again with any dependencies, and make updating cumbersome.

  • The same goes for Umbrella JS, for which the second line in README.md (displayed on the front page, right below the file list) has a link labeled "Migrate from 2.0 to 3.0", and the linked document explains a breaking change in the ajax() function. Keeping Fossil up-to-date with this library doesn't sound funny, either.

  • Vanilla Javascript just works. I have a few simple AJAX pages, and they can be used with anything from IE6, to exotic mobile devices, to bleeding-edge Edge, Chromium and Firefox.

Regarding bandwidth, the following samples compare the size of the final delivered web page, once with the default Javascript for the hamburger menu, and once with the Javascript removed.

(Sizes are in bytes, for pages delivered with Content-Encoding: gzip).

  • For a new repository, the size of the /timeline page showing just the initial empty check-in goes down from 10203 to 7170.
  • The size of the Fossil homepage www/index.wiki goes down from 5968 to 2854.

So it looks like a few KB per request could be saved if the Javascript for the hamburger menu is served from a separate URL. After ~10 pageloads, the savings approach the download size of jQuery (minus the per-page Javascript rewritten in jQuery).
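The arithmetic behind that estimate, as a short sketch. The byte counts are the ones quoted above, and the 30 kB jQuery figure is the one from post 1; the variable names are only for illustration.

```javascript
// Gzipped page sizes in bytes, with and without the hamburger-menu JS.
const samples = [
  { page: '/timeline', withJs: 10203, withoutJs: 7170 },
  { page: 'www/index.wiki', withJs: 5968, withoutJs: 2854 },
];
const JQUERY_BYTES = 30 * 1000; // "30 kB delivered", per post 1

// Bytes of inline JS re-sent with every page.
const savings = samples.map((s) => s.withJs - s.withoutJs);
const average = savings.reduce((a, b) => a + b, 0) / savings.length;
const pageloadsToMatch = Math.ceil(JQUERY_BYTES / average);

console.log(savings);          // [ 3033, 3114 ]
console.log(average);          // 3073.5
console.log(pageloadsToMatch); // 10
```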

On old browsers, which do not yet support canvas, Fossil is still very useful! Despite the timeline lacking the graph, it still shows chronological history with clickable links to the diffs. (I'm inclined to call this "GitHub mode", when comparing the two ...)

(13.1) By Warren Young (wyetr) on 2020-05-05 14:03:52 edited from 13.0 in reply to 12 [link] [source]

The mentioned breaking changes in jQuery may happen again

Yes, but I'm proposing an in-tree dependency, which means breakages occur on our schedule. We could even fork the library, if we wanted.

Also, the counter condition carries an implicit assumption that "Vanilla Javascript" never changes. It's done so a bunch of times.

And if your reaction to that point is that these changes usually don't demand much in the way of change to client code, then I'll tell you from personal experience that the same is true of jQuery. For a minimalist wrapper like Umbrella, I expect it's even more true.

Keeping Fossil up-to-date with this library doesn't sound funny, either.

So Stephan is arguing that adopting Umbrella is risky because it hasn't been updated in a couple of years, and you're arguing that it's risky because it might change. Which is it?

Vanilla Javascript just works.

Only after you check and re-check against every browser commonly in use, which is why wrappers like this and caniuse.com exist in the first place.

This happened here in Fossil land just days ago with the hand-wringing over whether we can adopt Ajax fetch() over Microsoft's abomination, XHR.

For a new repository, the size of the /timeline page showing just the initial empty check-in goes down from 10203 to 7170.

...which means a combined Umbrella.js + js.txt pays for itself on the first hit.

EDIT: Removed comment about canvas. I was remembering the pre-December 2017 method,

(14) By Florian Balmer (florian.balmer) on 2020-05-05 15:06:39 in reply to 13.1 [link] [source]

Regarding updates: Maybe stephan would like to see periodic maintenance commits, and I think they should keep client interfaces stable, so both?

(15) By anonymous on 2020-05-05 15:57:54 in reply to 13.1 [link] [source]

My wish for Fossil is to have 'fossil cgi' just output clean HTML, semantically clear, without specific JS, even without CSS, only loads, but no actual js/css content. It would sure look basic, dated, primitive, whatever... At the same time ship as built-ins the implementations which complement this HTML; these built-in resources could be fetched to user on demand as part of configuration and then integrated into repo db, like what's currently done with skins.

This separates the concerns and lets users style it, enhance it, choose a preferred JS framework, build a whole FossilHub around it :)

(16) By Warren Young (wyetr) on 2020-05-05 17:34:58 in reply to 15 [link] [source]

You should be able to do that with a "null" skin. Give it a try and tell us what breaks.

(18) By Stephan Beal (stephan) on 2020-05-05 17:47:05 in reply to 15 [link] [source]

My wish for Fossil is to have 'fossil cgi' just output clean HTML, semantically clear, without specific JS, even without CSS, only loads, but no actual js/css content. It would sure look basic, dated, primitive, whatever...

Out of curiosity, what would be the point of such a 1990s-style interface? It's certainly not something which would have wide appeal, and many features would become far less useful without coloring. e.g. diffs and timeline branch coloring... not that the timeline would work without CSS and JS... nor the diffs, for that matter, since CSS is used to set the fixed-width font and white-space styles and whatnot. Without CSS the diffs would possibly be an absolute jumbled mess. The site would be quite useless that way.

(19) By anonymous on 2020-05-05 18:35:59 in reply to 18 [link] [source]

Out of curiosity, what would be the point of such a 1990s-style interface?

The point is not to ask users to use "1990s-style", but not to lock into any timeframe or framework choices for the matter, and make customization go beyond just skinning. Meanwhile, to ship a default implementation, just as done with skinning, to make it "usable".

In a general sense, Fossil webview is acting like a templating system for repository's content. So ideally, the underlying templates could be made available to the user for customization or extension. This includes hooking up whatever JS editor or code highlighter.

Sure there's no simple fix for this now, but it's a point to consider. This also brings a question of what level of presentation, reference detail, and control can Fossil expose to the outside?

(20) By sean (jungleboogie) on 2020-05-05 23:39:21 in reply to 19 [link] [source]

Meanwhile, to ship a default implementation, just as done with skinning, to make it "usable".

Who decides what's usable? On the timeline page, there's been many enhancements over the years to make things easier to spot and follow the flow of commits. You can turn all that off in your browser and not see any of that. Does that mean everyone should find this view usable?

I don't think anyone here is in favor of feature creep, if that's your concern.

(21) By anonymous on 2020-05-06 00:28:25 in reply to 20 [link] [source]

I'm not sure I fully understood your point. This is not to reignite js/no-js debate. It's about having more vectors for customization of Fossil's webview. So far there's only one vector: skin. Skin supports a handful of vars and a number of CSS classes, that's as much of what's exposed from the Fossil internals.

In a broader view, Fossil could define a number of explicit templates (currently these are tightly coupled to the code) with a support for the content-variables, the timeline could be one of these. These templates could be also customized by user and imported into Fossil, like what's done for skins.

(22) By Stephan Beal (stephan) on 2020-05-06 01:23:03 in reply to 21 [link] [source]

(currently these are tightly coupled to the code)

That seems highly unlikely to change unless fossil is ever massively refactored. The current architecture simply doesn't support scriptable/configurable templates for the built-in pages. The customization options are limited to what CSS, JS, and a few TH1 vars can do.

(17) By Stephan Beal (stephan) on 2020-05-05 17:43:04 in reply to 13.1 [link] [source]

So Stephan is arguing that adopting Umbrella is risky because it hasn't been updated in a couple of years, and you're arguing that it's risky because it might change. Which is it?

i didn't say "risky." i said it's either perfect or unmaintained. At only a few kb, though, it's well within an amount of code we could self-maintain. jQuery, OTOH, is a much larger beast, and not something i'd look forward to forking or maintaining with patches.

Given the choice between jQuery and umbrella i'd go with umbrella simply because it's tiny enough that maintaining it is not a serious issue - i read through the whole thing in a single brief sitting a couple weeks ago, and there's not a line of over-complicated code in the whole thing.

That said, i'm almost ambivalent on whether we import a framework or not, depending on the time of day and how much sleep i've had (currently 2 hours in the past 30-ish). jQuery, despite my overwhelmingly positive experiences with it, seems almost like overkill for most DOM stuff since JS introduced querySelector() and friends (granted, jQ's interfaces are easier to use).

Y'all can sort it out. i wouldn't be averse to maintaining something like umbrella. For that matter, we could rename it to "fossbrella" and "make it our own" with whatever enhancements we need.

(23) By Stephan Beal (stephan) on 2020-05-06 03:28:56 in reply to 1 [link] [source]

jQuery or umbrella here or there...

In the context of /filepage ajaxification i've been porting over slimmed-down variants of my own personal pure-JS toolbox APIs, as needed, and adding them as "builtin files," which allows them to be easily embedded directly in any page or loaded via the /builtin/FILENAME route, as the page developer prefers, as demonstrated here:

  style_emit_script_fossil_bootstrap(0);
  style_emit_script_fetch(0); // AJAX API
  style_emit_script_tabs(0);  // tabbed UI API
  style_emit_script_builtin("fossil.page.fileedit.js",0);
  style_emit_script_confirmer(0); // two-step confirmation buttons[^1]

[^1] = https://www.fossil-scm.org/home/finfo?name=src/fossil.confirmer.js

They start off by bootstrapping the global JS environment with a fossil object, which is populated with certain C-level info like the top-most URL path for the CGI/server. That object becomes the namespace for all of the other functionality.

Here's a screenshot of their sizes, as reported by Firefox's dev tools:

https://fossil.wanderinghorse.net/screenshots/fossil-js-apis-request-sizes.png

The far right column is their real size (noting that the builtin-files process strips their comments and all leading spaces), roughly 21kb, and the next column to the left is their gzipped (over-the-wire) size, not quite 10kb. The original files, with all of their formatting/comments/docs intact, are roughly 43kb.

It's not as small as umbrella, but it's not focused on the same things (and they're not mutually exclusive).

With the right toolbox, i'm not entirely convinced that we need a framework, but am also not against one, provided it's not a potential maintenance nightmare.


(24) By Warren Young (wyoung) on 2020-05-06 08:33:23 in reply to 23 [link] [source]

i've been porting over slimmed-down variants of my own personal pure-JS toolbox APIs

I don't mind if we "reinvent" the likes of Umbrella, rather than adopt one of them, so long as we have someone doing the work to create and maintain it.

To make this our "official" JS wrapper for the project, we should then start porting code over to make use of it wherever it shortens the code. That project should probably wait until all of this fileedit stuff lands on trunk, though.

(25) By Stephan Beal (stephan) on 2020-05-06 13:12:38 in reply to 24 [link] [source]

I don't mind if we "reinvent" the likes of Umbrella, rather than adopt one of them, so long as we have someone doing the work to create and maintain it.

These particular ones i've been assembling in the contexts of my own websites/apps, some of it going back to 2007 (but most of it written since HTML5 became a thing), so it would seem to qualify as "maintained" (even if only in my little neck of the woods) ;).

That said, none of those bits actually do DOM stuff the way umbrella and friends do. In a recent hobby project i developed an allergy to innerHTML, as it's much-maligned and potentially dangerous, and started writing any and all DOM-editing bits using a raw-HTML-free convenience wrapper around the DOM API (which is now known as fossil.dom.js).
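As a rough illustration of what a raw-HTML-free wrapper can look like, here is a minimal sketch. It is not the actual fossil.dom.js API; the function name `el` and its signature are invented for this example, and the document object is passed in explicitly only so the sketch can be exercised outside a browser.

```javascript
// Hypothetical element builder: creates nodes through DOM methods only,
// so markup characters in strings are never parsed as HTML.
function el(doc, tag, attrs, ...children) {
  const e = doc.createElement(tag);
  for (const [name, value] of Object.entries(attrs || {})) {
    e.setAttribute(name, String(value));
  }
  for (const child of children) {
    // Strings become text nodes; anything else is assumed to be a node.
    e.appendChild(typeof child === 'string' ? doc.createTextNode(child) : child);
  }
  return e;
}

// Browser usage (assumes a global `document`):
//   document.body.appendChild(
//     el(document, 'a', { href: '/timeline' }, 'Timeline <b>not bold</b>'));
```

Because the label goes through createTextNode(), the `<b>` in the usage comment would show up as literal text instead of being interpreted, which is the property that makes this style safer than assigning to innerHTML.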

Long story short: umbrella/jQuery's DOM search/manipulation conveniences are definitely a separate niche, as yet uncovered by in-tree code.

(26) By Florian Balmer (florian.balmer) on 2020-05-07 10:57:46 in reply to 24 [link] [source]

... we should then start porting code over to make use of it wherever it shortens the code.

So, everywhere? Because X('id') will always be shorter than document.querySelector('id') -- but also much less readable.

I still think it's not a good idea to rewrite all the existing Javascript code. This harbors a huge risk to introduce bugs, and it may also cut down on the currently very good browser compatibility of Fossil, as using a framework doesn't mean cross-browser testing will be redundant.

Also, if size/bandwidth is a concern, Javascript should be served as a separate, gzipped and cacheable resource.† I don't think that size reduction from "Vanilla code" to "any-framework code" is relevant regarding the size of the final gzipped resource. (My samples with the hamburger menu were "all or nothing", not "Vanilla vs. any-framework".)

The same goes for minification: I don't think this is a big win regarding the size of the final gzipped resource, but makes debugging the Javascript code more complicated.

† As far as I remember, things used to be this way, but it was a design decision by drh to deliver the Javascript embedded in web pages, probably to generate more self-contained modules.
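Whether stripping pays off after gzip is easy to measure. The following Node.js sketch compares raw and gzipped sizes before and after a crude strip; the function name `sizes` is hypothetical, and the stripping here (leading whitespace, blank lines, whole-line // comments) is a simplification made up for the example, not the algorithm any Fossil tool uses.

```javascript
const zlib = require('zlib');

// Reports byte counts for a JS source, raw and gzipped,
// before and after a crude comment/indentation strip.
function sizes(source) {
  const stripped = source
    .split('\n')
    .map((line) => line.replace(/^\s+/, ''))                        // drop indentation
    .filter((line) => line !== '' && !line.startsWith('//'))         // drop blank and comment-only lines
    .join('\n');
  const gz = (s) => zlib.gzipSync(Buffer.from(s, 'utf8')).length;
  return {
    raw: Buffer.byteLength(source, 'utf8'),
    rawGz: gz(source),
    stripped: Buffer.byteLength(stripped, 'utf8'),
    strippedGz: gz(stripped),
  };
}
```

Running it over a real file, e.g. `sizes(require('fs').readFileSync('some.js', 'utf8'))`, gives the four numbers needed to judge whether the stripped-and-gzipped size differs enough from the plain gzipped size to matter.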

(27.1) By Stephan Beal (stephan) on 2020-05-07 11:12:58 edited from 27.0 in reply to 26 [link] [source]

Because X('id') will always be shorter than...

Apropos: we "really shouldn't" pollute the global namespace - that makes it really hard to differentiate fossil-injected functions from builtin ones. My recommendations in that regard are:

One) If possible, don't use any globals. Wrap the code in an anonymous function and use function-local symbols:

(function(){
  const X = (s)=>document.querySelector(s);
  ...
})();

Two) If that's not an option then create a single global symbol and put your app-local methods in it:

window.fossil = {
  X: (s)=>document.querySelector(s)
};
fossil.Y = (s)=>document.querySelectorAll(s);

The fileedit branch adds a C-level function:

https://fossil-scm.org/home/artifact?udc=1&ln=1427-1478&name=508f9ca15de35230

which injects a global fossil object, populated with a small amount of app-agnostic/generic utility code. The tiny core of that code (which embeds C-runtime-level config information like the preferred number of hash digits), is emitted as an inline SCRIPT tag, but the larger part of it can optionally be emitted as either inlined code or a SCRIPT src='builtin/fossil.bootstrap.js' tag, as the caller prefers.

hashtag 0.02€

(28) By Stephan Beal (stephan) on 2020-05-07 11:20:40 in reply to 26 [link] [source]

The same goes for minification: I don't think this is a big win regarding the size of the final gzipped resource, but makes debugging the Javascript code more complicated.

i'm going to agree wholeheartedly with that. When Firefox says "error in file X, line Y," line Y refers to the minified version and does not line up with the sources i'm using. That complicates debugging notably. At least it's not completely minified, though, with the EOLs stripped, as that's much more difficult to wade through.

That said: i tend to... ah... abundantly document code, often to the order of 3-4x more docs than code, and having that embedded in the binary (via the builtin files mechanism) probably doesn't really help anyone:

$ wc fossil.bootstrap.js 
 273    1064  9152 fossil.bootstrap.js
#lines  words bytes, vs w/o comments:
$ stripcomments  < fossil.bootstrap.js | wc
 128    279   3218
# vs completely minified:
$ jsmin  < fossil.bootstrap.js | wc
 10     60    2448

(29) By Florian Balmer (florian.balmer) on 2020-05-07 13:22:46 in reply to 28 [link] [source]

Something like running the comment removal step performed for built-in Javascript resources -- but prior to the build?

I too love stripping any unused bytes from my executable files. On the other hand, as long as the default Fossil executable carries around almost 100 KB worth of CSS for the Bootstrap skin, less than 1 KB for your comments probably don't matter ...

For one of my Win32 projects, resources are packed into the executable file as compressed CAB archives. The CAB format supports efficient compression algorithms, and performs compression in blocks across file boundaries, similar to TAR+GZ, and there's built-in APIs to uncompress the resources.

(30) By Stephan Beal (stephan) on 2020-05-07 14:08:22 in reply to 29 [link] [source]

Something like running the comment removal step performed for built-in Javascript resources -- but prior to the build?

The removal of leading spaces and JS comments happens via mkbuiltin.c. It makes sense to do, in that it nearly halves the size of JS code, but it isn't helpful when debugging. It might be worth adding a flag to mkbuiltin.c which tells it "we're currently developing, so leave the spaces and comments intact" - i'll take a look at that the next time it bugs me.

less than 1 KB for your comments probably don't matter

Ha!

$ ls -1 fossil.*js
fossil.bootstrap.js
fossil.confirmer.js
fossil.dom.js
fossil.fetch.js
fossil.page.fileedit.js
fossil.tabs.js

# With comments:
$ cat fossil.*js | wc -lc
   1670   53250

# Without:
$ cat fossil.*js | stripcomments | wc -lc
   1117   30163

== ~550 lines/~23kb of comments which are only potentially useful for someone actually working on or debugging that code.

PS: stripcomments = https://fossil.wanderinghorse.net/r/www-wh/finfo?name=site-tools/stripcomments.c

(31) By anonymous on 2020-05-07 17:12:59 in reply to 30 [source]

I took a quick look at stripcomments.c, in case it could help with another task I've contemplated. It appears to not handle escaped newlines correctly.

Content of test.c:

int a; // end of line hidden \
int b;

Session to demo problem, where sc is the C-compiled form of stripcomments.c:

$ ./sc < test.c 
int a; 
int b;
$  gcc -E test.c
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "test.c"
int a;
$ 
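The two behaviors shown in that session can be reproduced with a small sketch. This is not stripcomments.c; `stripLineComments` and its `honorContinuation` flag are invented for the illustration, and the function ignores string literals and /* */ comments for brevity.

```javascript
// Strips //-style comments from `src`.
// honorContinuation=true treats backslash-newline inside a comment as a
// line continuation (C/C++ rule); false ends the comment at any newline.
function stripLineComments(src, honorContinuation) {
  let out = '';
  let inComment = false;
  let prev = '';
  for (const ch of src) {
    if (inComment) {
      if (ch === '\n' && !(honorContinuation && prev === '\\')) {
        inComment = false;
        out += ch; // keep the newline that ended the comment
      }
    } else if (ch === '/' && prev === '/') {
      out = out.slice(0, -1); // drop the first slash, already emitted
      inComment = true;
    } else {
      out += ch;
    }
    prev = ch;
  }
  return out;
}

const test = 'int a; // end of line hidden \\\nint b;\n';
console.log(JSON.stringify(stripLineComments(test, false))); // "int a; \nint b;\n"
console.log(JSON.stringify(stripLineComments(test, true)));  // "int a; \n"
```

With the flag off, `int b;` survives, as in the `./sc` output; with it on, the declaration is swallowed by the comment, as in the `gcc -E` output.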

(32) By Stephan Beal (stephan) on 2020-05-07 19:10:32 in reply to 31 [link] [source]

// end of line hidden

i'm going to call that one a garbage in, garbage out corner case.

How C and C++ precisely define that case (namely, that the comment continues to the next line) is actually irrelevant to stripcomments, which is used on sources in many languages. Unless it's defined 100% the same way in every language which uses that comment style, stripcomments' behaviour is not "wrong" - it's just different from your use case.

No developer can expect to abuse a backslash that way in C or C++ without getting scolded by their colleagues and/or tools.

$ cat n.c
int main(){
  int i = 1; // end of line \
  int x = 0;
  return x;
}

$ gcc n.c -o n
n.c: In function ‘main’:
n.c:5:10: error: ‘x’ undeclared (first use in this function)
   return x;
          ^

When that construct appears outside of a C++-style comment, it is handled properly, in that it does not change the semantics of the input:

$ cat n.c
int main(){
  int i = 1; // end of line \
  char const * a = "a\
// comment in a string \
b";
#define foo \
  bar \
  baz
  int x = 0;
  return x;
}
[stephan@lapdog:~/tmp]$ stripcomments < n.c
int main(){
  int i = 1; 
  char const * a = "a\
// comment in a string \
b";
#define foo \
  bar \
  baz
  int x = 0;
  return x;
}

Whether or not that counts as a point to change, i'm still undecided, especially since in 30-ish years of coding i've never come across that construct in anything but (counting today) two pieces of contrived example code. Every minute spent addressing non-problems is a minute not spent on improving fossil ;).

See also: https://stackoverflow.com/a/12305397

(33) By anonymous on 2020-05-07 19:33:38 in reply to 32 [link] [source]

I do not argue with how you elect to spend your time.

I was under the impression that stripcomments.c was intended "to strip C- and C++-style comments from stdin," and because elimination of escaped newlines has long been a well defined phase of translation, for both C and C++, I do not see this as a "garbage in" situation. As an example, it is occasionally useful, for a multi-line #define, to temporarily alter it with a // comment. Granted, it is not something that should normally be left in code to perplex those who read it later, particularly since the effect is not so well understood as most others.

A useful tool that I once wrote, (but lost track of), scanned C source at an elementary level (phases 1-4), normalizing whitespace (including comments which count as such), with # constructs normalized per their line-oriented meaning, and computed a hash of the result. It was good for being sure that various manipulations had not altered how a (correct) compiler's parser would see the code. It never occurred to me to declare that some inputs acceptable to a compiler would be transformed into something with another meaning.

(34.1) By Stephan Beal (stephan) on 2020-05-07 20:22:54 edited from 34.0 in reply to 33 [link] [source]

"to strip C- and C++-style comments from stdin,"

Those are generic terms used in the documentation/vocabularies of most programming languages i've worked with (few of which don't support at least one of those styles). If it were intended specifically and only for C/C++, (A) the "-style" part wouldn't be in the docs and (B) the tool wouldn't exist because the preprocessor can already perform that function, with all of the well-defined quirks of both C and C++. gcc has flags to strip only comments, leaving macros intact.

In a generic sense, since comments are specifically intended to not have any programmatic meaning, processing escape sequences in them seems downright wrong to me.

But now i'm curious how JS and PHP and my own scripting engine deal with this. It's never come up, and honestly seems unlikely to in real-world code.

Edit: https://fossil.wanderinghorse.net/r/cwal/info/d40e79250d1df996

(35) By anonymous on 2020-05-07 23:44:24 in reply to 34.1 [link] [source]

Somewhat fewer characters than your added warning are needed to just fix it. At line 118, before:

          case 2: /* C++ comment */
              if('\n' == ch){

and after:

          case 2: /* C++ comment */
              if('\n' == ch && '\\' != prev){
                  /* Unescaped newline ends a C++ to-end-of-line comment. */