TH1: Confusion about HTMLized output

(1) By Florian Balmer (florian.balmer) on 2018-10-12 15:20:16

Consider this TH1 snippet in the Header, Footer, CSS or JavaScript part of a skin template:
<verbatim>
<th1>
puts "puts: &<>\"'"
html "\nhtml: &<>\"'"
puts [ htmlize "\nputs\[htmlize\]: &<>\"'" ]
set test "&<>\"'"
</th1>
Outside TH1 block: test = $test
</verbatim>

Output:

<verbatim>
puts: &amp;&lt;&gt;&quot;&#39;
html: &<>"'
puts[htmlize]: &amp;amp;&amp;lt;&amp;gt;&amp;quot;&amp;#39;
Outside TH1 block: test = &<>"'
</verbatim>

It seems that <code>puts</code> already does what I would expect <code>html</code> to do, while the latter seems to work more like a bare-metal print function. Consequently, the results of <code><nowiki>puts[htmlize]</nowiki></code> are double-HTMLized.

The document [https://fossil-scm.org/index.html/doc/trunk/www/th1.md |The TH1 Scripting Language] states:

  *  <code>puts STRING</code>: Outputs the STRING unchanged.
  *  <code>html STRING</code>: Outputs the STRING escaped for HTML.
  *  <code>htmlize STRING</code>: Escape all characters of STRING which have special meaning in HTML. Returns the escaped string.

I must admit I don't understand this behavior: it is the opposite of what the documentation describes. Is it possible that the code for <code>puts</code> and <code>html</code> in the underlying scripting engine was exchanged by mistake?

However, as several skin templates seem to rely on <code>html</code>, changing this may require a lot of careful testing.

TH1 variables in skin templates seem to be used mostly to construct hyperlinks, and the standards seem to allow both non-HTMLized and HTMLized forms:

<verbatim>
<a href="url&param">
<a href="url&amp;param">
</verbatim>

<verbatim>
<script> console.log("url&param"); /* → url&param */ </script>
<script> console.log("url&amp;param"); /* → url&amp;param */ </script>
</verbatim>

Problems might occur with string variables containing quotation marks, but this doesn't seem to be a common case.
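
To illustrate the quotation-mark case, here is a minimal Python sketch (not TH1; the <code>link</code> helper and the sample URL are made up for illustration) of what happens when a value containing a double quote is placed in an <code>href</code> attribute with and without escaping:

```python
from html import escape


def link(url, escape_value):
    """Build an anchor tag; 'link' is a hypothetical helper, not a Fossil API."""
    # escape(..., quote=True) also converts double and single quotes.
    value = escape(url, quote=True) if escape_value else url
    return f'<a href="{value}">link</a>'


url = 'page?name="x"&param'  # illustrative value containing quotation marks

print(link(url, False))  # <a href="page?name="x"&param">link</a>
print(link(url, True))   # <a href="page?name=&quot;x&quot;&amp;param">link</a>
```

Without escaping, the embedded quote ends the attribute value early; with escaping, the whole value stays inside the attribute.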

(2) By Richard Hipp (drh) on 2018-10-12 18:08:05 in reply to 1

The "puts" command escapes its output so that it is safe to include it in
the middle of HTML.  The "html" command is like "puts" except it does raw
output, with no escaping.

The "htmlize" command escapes its argument so that it is safe to output as
part of a webpage.

Perhaps the names were not well chosen.  But they are what they are so we
need to live with them for historical compatibility.
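
The behavior described above can be modeled with a short Python sketch. This is only an illustration under stated assumptions, not Fossil's implementation: the <code>Page</code> class and the <code>_ESCAPES</code> table are hypothetical, and the escape sequences are taken from the output shown in the first post.

```python
# Hypothetical Python model of the TH1 output commands described above.
# This is NOT Fossil's implementation; it only mirrors the observed behavior.

_ESCAPES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"}


def htmlize(text):
    """Return text with HTML-special characters escaped (models TH1 htmlize)."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


class Page:
    """Collects output the way a skin template might (assumed structure)."""

    def __init__(self):
        self.parts = []

    def puts(self, text):
        # Models TH1 puts: escape the text, then output it.
        self.parts.append(htmlize(text))

    def html(self, text):
        # Models TH1 html: output the text raw, with no escaping.
        self.parts.append(text)

    def render(self):
        return "".join(self.parts)


page = Page()
test = "&<>\"'"
page.puts(test)                  # escaped once
page.html("\n" + test)           # raw
page.puts("\n" + htmlize(test))  # escaped twice: htmlize, then puts
print(page.render())
```

In this model, passing the result of <code>htmlize</code> to <code>puts</code> escapes the ampersands a second time, which matches the double-HTMLized line in the reported output.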

(3) By Florian Balmer (florian.balmer) on 2018-10-12 20:46:51 in reply to 2

A better example would have been:

<verbatim>
<th1>
set test "&<>\"'"
puts "puts: $test"
html "\nhtml: $test"
puts [ htmlize "\nputs\[htmlize\]: $test" ]
</th1>
Outside TH1 block: test = $test
</verbatim>

Resulting in:

<verbatim>
puts: &amp;&lt;&gt;&quot;&#39;
html: &<>"'
puts[htmlize]: &amp;amp;&amp;lt;&amp;gt;&amp;quot;&amp;#39;
Outside TH1 block: test = &<>"'
</verbatim>

My initial assumption was that the <code>html</code> command would escape only the substituted variables, so that they fit with the rest of the literal (already escaped) HTML string.

But from the perspective of an HTML/CGI-oriented scripting language (at least in the context of Fossil skinning), the <code>html</code> command could also be read as "outputs HTML, already escaped", and the <code>puts</code> command as "outputs text, needs escaping".

I may revert the related changes to the hamburger menu customization template, as it looks somewhat simpler without the explicit <code><nowiki><TH1></nowiki></code> blocks.