Fossil Forum

Markdown formatter bug.

(1.1) By Richard Hipp (drh) on 2021-08-06 12:29:22 edited from 1.0 [link] [source]

This bug was discovered over on the Pikchr forum. If you include an explicit <table> in the Markdown and put Pikchr diagrams in cells of the table, and if you include additional text after the </table>, then the Pikchr is not rendered. If you omit the trailing text, the Pikchr is rendered.

This post contains the <table> with two pikchr diagrams in cells without the trailing text:

1st Pikchr diagram
arrow
box "1st" "Pikchr" "diagram"
arrow

2nd Pikchr diagram
arrow
oval "2nd" "Pikchr" "diagram"
arrow

(2) By Richard Hipp (drh) on 2021-08-06 12:27:36 in reply to 1.0 [source]

Here is the same <table> with text after the </table>:

1st Pikchr diagram
arrow
box "1st" "Pikchr" "diagram"
arrow

2nd Pikchr diagram
arrow
oval "2nd" "Pikchr" "diagram"
arrow

Here is the additional text.

(3) By Stephan Beal (stephan) on 2021-08-06 16:30:04 in reply to 2 [link] [source]

> Here is the additional text.

Though i've been unable to fix it, here's a curious detail: adding any non-whitespace anywhere on the same line after the closing table tag stops the bug. Whether that non-whitespace is surrounded by whitespace is irrelevant. Whether that's a red herring or not is currently anyone's guess, but it's curious. Based solely on that, it would seem that markdown.c:htmlblock_end(), at least in some contexts, is related or adjacent to this bug.

(4) By Richard Hipp (drh) on 2021-08-06 16:45:52 in reply to 3 [link] [source]

To prove this point, here is a copy of the penultimate post with the added non-whitespace after </table>.

1st Pikchr diagram
arrow
box "1st" "Pikchr" "diagram"
arrow

2nd Pikchr diagram
arrow
oval "2nd" "Pikchr" "diagram"
arrow

(Text after <table> on the same line)

Here is the additional text.

(5) By Richard Hipp (drh) on 2021-08-06 23:29:59 in reply to 1.1 [link] [source]

Problem Analysis

The way the markdown interpreter works is that if an HTML block tag span begins after a blank line and is followed immediately by a blank line, then that entire HTML block tag span is copied into the output verbatim. None of the content is subject to Markdown formatting. The list of HTML block tags that followed this rule used to be large, including:

  • <p>
  • <table>
  • <ins>
  • <del>
  • ... and many others

The fact that <table> was on this list was the source of the problem. If the </table> was followed by a blank line, then the interior text was not markdown-interpreted. If extra text appeared after the </table>, then the content was markdown-interpreted.
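
The rule can be modelled in a few lines. What follows is a simplified Python sketch for illustration only, not the code in markdown.c. The function name verbatim_span_end and the abbreviated OLD_VERBATIM_TAGS set are made up for this example, nested tags are ignored, and treating end of input as "not a blank line" is an assumption chosen to match the behaviour reported in the first post.

```python
import re

# Abbreviated, illustrative subset of the old tag list (the real list was longer).
OLD_VERBATIM_TAGS = {"p", "table", "ins", "del", "pre", "script"}


def verbatim_span_end(lines, start, tags=OLD_VERBATIM_TAGS):
    """Return the index of the last line of a verbatim HTML span that opens at
    lines[start], or None if the span is subject to Markdown formatting."""
    opener = re.match(r"<([A-Za-z][A-Za-z0-9]*)\b", lines[start])
    if opener is None or opener.group(1).lower() not in tags:
        return None
    # The span must begin after a blank line (start of input also counts here).
    if start > 0 and lines[start - 1].strip():
        return None
    closer = "</" + opener.group(1).lower() + ">"
    for i in range(start, len(lines)):
        _, found, after = lines[i].lower().partition(closer)
        if not found:
            continue
        # Only whitespace may follow the closing tag on its own line,
        # and the next line must exist and be blank (assumption: EOF is not blank).
        next_is_blank = i + 1 < len(lines) and not lines[i + 1].strip()
        return i if (not after.strip() and next_is_blank) else None
    return None


blank_after = ["", "<table>", "<tr><td>", "(pikchr block)", "</td></tr>",
               "</table>", "", "Here is the additional text."]
same_line = blank_after[:5] + ["</table> &nbsp;"] + blank_after[6:]

print(verbatim_span_end(blank_after, 1))  # 5
print(verbatim_span_end(same_line, 1))    # None
```

In this model, a blank line after </table> makes the whole span verbatim, so any Pikchr blocks inside would be copied through untouched. Any non-whitespace after </table> on the same line makes the function return None, meaning the content would go through normal Markdown processing.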

The fix

The change (check-in cdbf0bf179989a2d) was to dramatically reduce the number of HTML tags that behave this way. The list is now just these two:

  • <pre>
  • <script>

(6) By Stephan Beal (stephan) on 2021-08-08 12:14:55 in reply to 5 [link] [source]

> The way the markdown interpreter works is that if an HTML block tag span begins after a blank line and is followed immediately by a blank line, then that entire HTML block tag span is copied into the output verbatim.

i can't shake the nagging feeling that that might, on occasion, be useful, so long as the user knows about it and how to work around it when it's not useful. If extra non-whitespace is necessary after a closing tag in order to force markdown parsing, a &nbsp; could be used to make it invisible.

(7) By Richard Hipp (drh) on 2021-08-08 17:09:57 in reply to 6 [link] [source]

What if we add <html> to the set of tags that work this way, so that if you have a chunk of text that you want to be pure HTML without any Markdown interpretation, you simply enclose it inside <html>...</html> preceded and followed by a blank line?
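
As a rough illustration of the proposal as worded in this post, the following Python sketch splits a document into Markdown and pure-HTML segments. It is a hypothetical model, not Fossil's implementation. The name split_pure_html is invented here, and the sketch assumes that <html> and </html> each sit on their own line, that the wrapper tags are dropped from the output, and that start of input counts as a blank line while end of input does not.

```python
def split_pure_html(text):
    """Split text into ("markdown", chunk) and ("html", chunk) segments."""
    lines = text.split("\n")
    segments, buf, i = [], [], 0

    def blank(idx):
        # Start of input counts as blank; end of input does not (sketch assumption).
        return idx < 0 or (idx < len(lines) and not lines[idx].strip())

    while i < len(lines):
        if lines[i].strip().lower() == "<html>" and blank(i - 1):
            # Find a closing </html> line that is followed by a blank line.
            end = next((j for j in range(i + 1, len(lines))
                        if lines[j].strip().lower() == "</html>" and blank(j + 1)),
                       None)
            if end is not None:
                if buf:
                    segments.append(("markdown", "\n".join(buf)))
                    buf = []
                # Keep the enclosed text as-is; the wrapper tags are omitted.
                segments.append(("html", "\n".join(lines[i + 1:end])))
                i = end + 1
                continue
        buf.append(lines[i])
        i += 1
    if buf:
        segments.append(("markdown", "\n".join(buf)))
    return segments


sample = "Intro *text*\n\n<html>\n<b>*not emphasis*</b>\n</html>\n\nMore *text*"
for kind, chunk in split_pure_html(sample):
    print(kind, repr(chunk))
```

Only the "markdown" segments would then be handed to the Markdown formatter. The "html" segment would be emitted unchanged, so the asterisks inside it stay literal.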

(8) By Stephan Beal (stephan) on 2021-08-08 18:14:36 in reply to 7 [link] [source]

> What if we add...

That would certainly be much more intuitive.

> ... preceded and followed by a blank line?

Would the blank lines be necessary? We'd be recycling a tag which otherwise "cannot," outside of verbatim blocks, appear in markdown docs, so accepting it inlined at any point in the text seems(?) okay.

(9) By Brian Tiffin (btiffin) on 2021-08-08 19:59:41 in reply to 7 [link] [source]

Personal opinion.

When I first read this, I went to the Fossil wiki format page to look into the verbatim and nowiki tags. I'd prefer a different tag than html.

Isn't there a rule somewhere about starting CGI output with html avoiding (or including?) header and footer processing? Asking, because I don't remember, but spider sense is tingling.

I'd prefer something like noparse, or asis, or an even more magic word that covers the various semantics of nowiki, verbatim and don't touch any of this, even the angle brackets, until you find a close tag of the magic word.

Have good.

(10) By Stephan Beal (stephan) on 2021-08-08 20:08:18 in reply to 9 [link] [source]

> Isn't there a rule somewhere about starting CGI output with html avoiding (or including?) header and footer processing? Asking, because I don't remember, but spider sense is tingling.

That doesn't apply to markdown files, though. Yes, we'd have a bit of mixed semantics with a tag called "html" but i've been unable to come up with a case where that would bite us. But... that's not to say it's impossible. A completely custom tag would probably cause less confusion.

(11) By Richard Hipp (drh) on 2021-08-08 22:38:28 in reply to 10 [link] [source]

Ideas:

  • <purehtml>
  • <nomarkdown>
  • <htmlonly>

Or, just omit the <html> and its corresponding </html> but keep all the (non-markdown formatted) text in between.

(12) By Larry Brasfield (larrybr) on 2021-08-08 23:47:56 in reply to 11 [link] [source]

(Piping in reminds me of the alleged one-to-one mapping between GI tracts and opinions.)

As a signal to the markdown processor, <html> seems perfect. Better even than <html_for_sure>, <nothing_but_ordinary_html>, or other elaborations. If the input to the markdown processor was purported to be ready for interpretation as a response to a web page request, maybe some adherence to standards for that would be smart. But it is not such input. Simple is easier to remember.

(13) By Stephan Beal (stephan) on 2021-08-09 00:01:18 in reply to 12 [link] [source]

> Simple is easier to remember

FWIW, Larry has me convinced there. Assuming we don't have any semantic collisions (i don't believe we do), "html" sounds perfect.

(14) By Richard Hipp (drh) on 2021-08-09 01:16:17 in reply to 13 [link] [source]

Changes are on trunk. Try it out. Report any problems.

(15) By Brian Tiffin (btiffin) on 2021-08-09 01:56:52 in reply to 13 [link] [source]

Agree. Thanks for the hint about separating the special-ness of the <html> tag in Markdown from that in CGI, Stephan; it allowed Larry's wisdom to sink in faster.

Allowing raw to the browser with <html> blocks is a sane, easy thing to remember.

Allowing exception control of header/footer with <html> triggers in CGI is a sane, easy thing to remember, even though I'd still have to go look it up for reminder-ing on specifics.

And the internet gets to cheer as someone makes a good point, shifts other opinions and the design to the better. For the win. An internet win.

And new fossil powers. Thanks, Richard.