Forum does not filter HTML
(1.2) Originally by Midar3 with edits by Stephan Beal (stephan) on 2020-06-01 03:07:15 from 1.1 [source]
Hi!
It seems that the forum does not filter HTML at all if the markup style is set to Markdown. Neither CSS nor JavaScript is filtered. That seems very dangerous. There really should be a whitelist of allowed tags.
I would have reported this privately, as this might be worthy of a CVE, but I could not find any way to do so, hence posting here. It also seems the bug tracker is disabled.
As a demonstration, this has a green background.
<style>body { background-color: lime; }</style>
// Edit: To make matters worse, I can also just leave a tag open forever. Everything is now clickable. <a href='https://google.com'>
Edited by Stephan to escape the "problematic" HTML: i get that you're trying to demonstrate a problem, but a description would have sufficed. Please don't intentionally abuse the limitations of the forum.
(2.1) By sean (jungleboogie) on 2020-06-01 03:02:54 edited from 2.0 in reply to 1.1 [link] [source]
Warren noticed this the other day
(3) By Richard Hipp (drh) on 2020-06-01 11:54:20 in reply to 1.2 [link] [source]
I checked in a change so that js's original test case no longer causes problems. Now the "Recent" link on the menu bar omits the lime background color. But my change from last night is insufficient from the general case. There are still many ways to sneak anti-social CSS into a Markdown-formatted wiki page or forum post.
I looked at the Commonmark spec for embedded HTML today. Wow - what a mess. I thought the whole point of Markdown was that it was suppose to be simpler than things like raw HTML. Not so. Take a look at the rules for HTML in Commonmark. Sometimes inline markup within an HTML block is rendered, and other times not. There are 7 distinct cases that you have to remember and understand.
In any event, Natacha Porté's Markdown implementation used by Fossil appears to be based on John Gruber’s original Markdown.pl implementation and hence does not implement all of the byzantine rules of Commonmark. And I'm not sure that is worth fixing.
Contrast the commonmark HTML rules with the rules for including HTML in Fossil-wiki. Fossil-wiki has a white-list of supported ("safe") HTML markup. The HTML can occur anywhere. The formatter automatically takes care of adding omitted close elements.
What to do about Markdown
I think there are basically two options for fixing this:
Major brain surgery on the existing Markdown formatter, so that it can enforce limitations on embedded HTML, and add missing close-elements.
Leave the Markdown formatter unchanged (perhaps even undoing my change from last night) and add a second pass that goes over the generated HTML to remove problematic elements and insert missing close tags.
I'm not sure which approach is better here.
(4) By Stephan Beal (stephan) on 2020-06-01 12:32:33 in reply to 3 [link] [source]
I'm not sure which approach is better here.
i had an idea about this last night which might resolve our forum-side markdown tag pain without too much grief/effort:
When saving a post, count the number of openers and closers for each tag type. Don't check that they're properly nested or make semantic sense, just ensure that for each <FOO ...>
we have a </FOO>
(ignoring case, of course, and not even caring whether FOO is a genuine HTML tag name or not). If those counts are not equal, refuse to save, pointing out that there's a tag count mismatch. There would necessarily be some exceptions for self-closing tags like BR, HR, IMG, INPUT (not that INPUT should be allowed at all in markdown, though).
That doesn't solve the <STYLE>
tag case (which we could arguably elide altogether in forum output, or convert <style>...</style>
to <!--style>...</style-->
), but it would eliminate the troubling missing </TEXTAREA>
tag and similar cases.
There's one case i can see this breaking: weird constructs like 1<4>3
, where this hypothetical checker would confirm that there's a </4>
tag. That seems like a negligible risk, though, with an easy workaround for users (add a space before the '4').
(5) By Richard Hipp (drh) on 2020-06-01 12:49:11 in reply to 4 [link] [source]
For robustness, I think it needs to do a second pass over the output, with full HTML parsing. No compromises or short-cuts. Attackers will exploit compromises and short-cuts. The second pass should:
Omit any markup that is not on the white-list. Perhaps the white-list should be expanded to include some of the newer HTML5 elements that have been added since the original formatter was written.
Omit any HTML element attributes not specifically allowed by the white-list. Without this step, an attacker would be able to add id= tags that could interfere with the Javascript at the bottom of the page.
Omit any extra end-tags. For example, we probably do not want allow
</body>
in a forum post. Or, for that matter, an extra</div>
that would close the division that contains the post early.Insert all missing end-tags so that the tags are strictly nested.
(6) By Stephan Beal (stephan) on 2020-06-01 13:47:52 in reply to 5 [link] [source]
For robustness, I think it needs to do a second pass over the output
Could that not be done in the first pass, before any generated markup is added? If done in the second pass, the amount of effort for the parser (as opposed to development effort) jumps considerably as we've already emitted lots of new markup. So: POST ==> tidy-up filter ==> parser (ummodified)
(7) By Richard Hipp (drh) on 2020-06-01 14:00:09 in reply to 6 [link] [source]
The approach I'm experimenting with now is to replace many of the calls to blob_append() in markdown_html.c with calls to a new function blob_append_safe_html(). The new blob_append_safe_html() makes the necessary transformations as the text is being appended to the output buffer. We'll see how it goes.
(18) By Midar3 on 2020-06-01 17:35:15 in reply to 5 [link] [source]
This sounds like a good idea. But should 4 really be done, or is it better to refuse the input? I think in 99% of the cases, someone is just forgetting to close a tag and not someone malicious and they would appreciate the hint.
How should markdown that is coming from the command line tool be handled?
(19) By Stephan Beal (stephan) on 2020-06-01 19:23:36 in reply to 18 [link] [source]
How should markdown that is coming from the command line tool be handled?
Markdown via the wiki and embedded docs (which are all handled via the same underlying routines) should, IMO, not be affected by this. We've had markdown since 2012 or 13 or so and it's only proven to be remotely problematic since the introduction of the forum, where random people can drop random content into it. (Even then, somewhat surprisingly, it actually took about a year and a half for it to ever even get recognized as a genuine problem, through a few posts which were innocently missing closing tags of one sort or another.)
(23) By anonymous on 2020-06-02 18:12:49 in reply to 5 [link] [source]
I think those ideas are good one, both in forum and in wiki (unless the administrator has enabled use of arbitrary HTML in wiki). However, additionally I think that:
Inline CSS should be disabled.
If a CSS class is specified, it must be one of the whitelisted classes (allowing the administrator to set the list of whitelisted CSS classes) or else it is disallowed. However, any CSS class whose name begins with "
language-
" and only consists of whitelisted characters should be permitted, even if the CSS class name is not whitelisted.There are some commands which may be useful in wiki but possibly should not be allowed in forums, such as
<NAV>
(a forum post isn't a set of navigation links) and<ARTICLE>
(the forum post is already a article; it shouldn't be necessary to make a nested article).
(25) By Stephan Beal (stephan) on 2020-06-02 18:36:50 in reply to 23 [link] [source]
If a CSS class is specified, it must be one of the whitelisted classes (allowing the administrator to set the list of whitelisted CSS classes) or else it is disallowed.
FWIW, that's a level of micro-management which i would really hate to see fossil try to impose. It's completely onerous to those who develop docs in fossil: if they're an admin, they have to go whitelist every class they try to use and if they're not an admit they have to go ask an admin to do so.
The policy could not be versionable because that would allow any user to bypass it locally and whitelist what they want, then push those changes back to the server (keeping in mind that they can edit docs, which means they have commit access). Thus only an admin on the central-most copy of the repo would be able to whitelist anything. That would be an utter tragedy in terms of usability/flexibility, IMO.
(26) By anonymous on 2020-06-03 18:43:38 in reply to 25 [link] [source]
FWIW, that's a level of micro-management which i would really hate to see fossil try to impose. It's completely onerous to those who develop docs in fossil: if they're an admin, they have to go whitelist every class they try to use and if they're not an admit they have to go ask an admin to do so.
You could have an option to disable that restriction if wanted, I suppose. However, I would find it useful, since I want to avoid having too much CSS in general (actually, if it was up to me, I would have no CSS at all); the user can specify their own CSS if wanted and ignore that on the server. The end user's CSS code should be able to distinguish what context something is, too (although this could also be done by writing more complicated selectors, I suppose, too).
(8) By Richard Hipp (drh) on 2020-06-01 15:02:06 in reply to 1.2 [link] [source]
Code to fix this (and hopefully also the unclosed tag issue previously identified) should now be on trunk, and on the main Fossil website. Please try to break it. Report any success or failure here. Thanks.
(9) By anonymous on 2020-06-01 15:52:14 in reply to 8 [link] [source]
Looks like this will "eat" the blacklisted tags.
Maybe it should escape them and show as part of text instead (either preformatted or html-escape the angle brackets, after all it's user input?
FWIW GitHub seems to just print such tags as text.
(10.1) By Richard Hipp (drh) on 2020-06-10 20:21:11 edited from 10.0 in reply to 9 [link] [source]
OK. It is a simple modification to show the elided tags a plain text. But I took this one step further: I enclose the elided tags within an error <span>, so that they are more readily apparent. I don't know if this is a good idea or not, but it seems like something worth trying out.
Example #1: js's original test case. <style>body{background-color:lime;}</style>
Example #2: Surplus end-tags </div></body>
Another possibility is to only use the error <span> in the Preview
and just show the elided tags as plain text if they do get committed.
That will involve an extra "flags" parameter that will need to get passed
down through the formatter routines, to let the HTML sanitizer know whether
or not it is suppose to generate the error <span>
s. So it is more work,
but quite doable.
(11) By Richard Hipp (drh) on 2020-06-01 16:22:28 in reply to 10.0 [link] [source]
Yet another possibility: Use the error <span>
on Preview but also
refuse to commit a forum post or wiki edit that has any HTML errors.
Force the user to fix the error in order for their post to go in.
(15) By Stephan Beal (stephan) on 2020-06-01 16:32:20 in reply to 11 [link] [source]
... but also refuse to commit a forum post or wiki edit that has any HTML errors
Harsh but fair.
We could maybe make use of these warnings in the wiki/fileedit markdown preview, too, but without a cannot-commit restriction imposed on it. Once you're done i'll look at adding that (presumably a flag needs to be added to the relevant calls to pass on that preference).
(12) By Stephan Beal (stephan) on 2020-06-01 16:26:05 in reply to 10.0 [link] [source]
But I took this one step further:...
That is a thing of beauty. Whether or not to show those only in preview mode is a tough call. It would arguably be better for arbitrary forum-goers if some broken markup outside of their control is not trying to steal their attention with bright colors, though, which argues for only showing the error tags in preview mode.
(14) By Richard Hipp (drh) on 2020-06-01 16:29:06 in reply to 12 [link] [source]
If Preview renders differently from the default display, is it still really "Preview"?
(16) By Stephan Beal (stephan) on 2020-06-01 16:33:34 in reply to 14 [link] [source]
If Preview renders differently from the default display, is it still really "Preview"?
LOL! i'm not foolish enough to enter into a battle of philosophy against you ;). That's a fair question, though.
(13) By Stephan Beal (stephan) on 2020-06-01 16:28:04 in reply to 10.0 [link] [source]
<script>body{background-color:lime;}</script>
PS: that should be "style" instead of script.
(17) By anonymous on 2020-06-01 16:58:41 in reply to 10.0 [link] [source]
I'd prefer this to be shown verbatim, no coloring.
It's Markdown, not HTML validator. Anything allowed is rendered in HTML, the rest is just text, or ?code.
This way the Preview is just as what would be commited.
(20) By Richard Hipp (drh) on 2020-06-01 20:55:33 in reply to 8 [link] [source]
The new formatter is not working quite right. See the four error tokens in markdown.md.
(21) By Stephan Beal (stephan) on 2020-06-01 21:21:53 in reply to 20 [link] [source]
The new formatter is not working quite right. See the four error tokens in markdown.md.
But it really should only be enabled for the forum, shouldn't it? For embedded docs and wiki pages this whole thing is a non-problem.
(22) By Warren Young (wyoung) on 2020-06-01 22:19:56 in reply to 21 [link] [source]
Some Fossil repos allow public wiki additions, in the spirit of Wikipedia.
(24) By Stephan Beal (stephan) on 2020-06-02 18:30:15 in reply to 22 [link] [source]
Some Fossil repos allow public wiki additions, in the spirit of Wikipedia.
Mis-markup in a wiki or embedded doc page doesn't really break anything except that page (and anyone allowing public edits deserves what they get - i've had fossil repos gets spammed by bots that way when i quite inadvertently allowed anonymous users to edit their wikis). The forum content is another matter because each unit is no longer atomic, but can have side effects on a whole thread.
IMO trying to police the contents of wiki pages and embedded docs for well-formedness is a dark, slippery path.