Fossil Forum

TODOs and RFC for /chat features
Login

TODOs and RFC for /chat features

TODOs and RFC for /chat features

(1.1) By Stephan Beal (stephan) on 2021-09-18 03:43:07 edited from 1.0 [link] [source]

This post is mostly a "reminder to self", but at the same time an RFC, regarding potential /chat improvements (stressing the word potential):

In no particular order...

  1. Change the configuration popup box to a full-page DOM element which temporarily replaces the whole viewport with a new element full of options. This would be inherently more flexible than our current popup, not subject to specific screen size limitations and related scrolling problems. This is really easy to do, just needs to be done. (Of the potential TODOs listed here, this is currently the only WILL-DO.) Update: implemented.

  2. "Rooms" (a.k.a. named threads) can hypothetically be added to chat via the addition of a free-form text field to the chat table to name the target room. When posting, the default room (NULL) means to go to the main chat area and * means to ignore room filtering and show everyone the message. If a room value is set, only users who've enabled that room via the list of current rooms (could be added as a widget in point #1 or a separate widget in that same style), will see that message. The room-based filtering would be done on the client, not the server, so clients could freely swap between rooms without additional server round trips. The list of rooms is based solely on the the room values in the JSON objects of messages received by the client so far: there is no central server-side list of rooms. As the chat history expands or is reloaded, the list of rooms the client knows about is adjusted accordingly.

  3. The client supports the client-side selection of the beep tone, but there is no option to select the tone (partly because of point #1) - it's currently set server side and can't be changed unless one knows how to do so via the JS dev console. Resolving point #1 would give us "infinite" space to add toggles for features like this one. Update: implemented/re-enabled.

  4. It's become conventional to prefix chat messages with "#NNN" when referring to a previous message. This might become less necessary if point 2 is implemented. Regardless, we should arguably parse (server side) the leading text of messages for a pattern line \b#NNN[ :.]\b and replace those with something like <span class='msgref' data-msgid='NNN'>#NNN</span>. The chat client could then, after injecting the message, use querySelectorAll() to fish out such references and tie an event handler to them which would jump to the given message if it's in the client's loaded history. (If it's not already loaded, the event would be a no-op.) It's important, however, to recognize that the chat messages come from the server fully processed and ready for injection as HTML. The client cannot sensibly go parsing through that content to do its own replacement of such pseudo-markup.

(2) By Warren Young (wyoung) on 2021-09-18 04:13:09 in reply to 1.1 [link] [source]

Markdown?

(3) By Stephan Beal (stephan) on 2021-09-18 04:24:25 in reply to 2 [link] [source]

Markdown?

Indeed. i don't know for certain that anyone else is against that (as i initially was, for whatever reason). If not, it would seem to be a simple matter of swapping out the text-processing routine in the server part which fetches the messages.

(4) By Warren Young (wyoung) on 2021-09-18 07:52:07 in reply to 3 [link] [source]

Squee!

(5) By Stephan Beal (stephan) on 2021-09-18 21:33:54 in reply to 2 [link] [source]

Markdown?

It occurs to me that switching to md will make point #4 (addressing chat messages via #NNN) impossible unless we post-process the markdown, but That Way Lies Madness. No big deal. Markdown would arguably be a the bigger win.

Whether or not we can just switch over is really up to Richard. If he's okay with it, i'll make the code switch. It will, however, affect the posting of links in chat: bare URLs won't be auto-linked anymore. Not a problem, just something to be aware of.

(6.2) By Warren Young (wyoung) on 2021-09-19 02:40:18 edited from 6.1 in reply to 5 [link] [source]

This problem results from Fossil Markdown being out-of-spec: a leading # on a line is only supposed to be interpreted as creating a header if it has 0-3 leading spaces and is followed by at least one space or tab.

According to the same spec, the need for a space after the last opening # mark is a break with pre-standardized Markdown, the restriction added to fix parsing problems. Another example we've all seen here on the forums is someone pasting C code in without putting it in a fenced code block: each #include becomes a huge header, making a mess of the post. Following the spec here is worthwhile even all the rest of this aside.

GFM complies with CommonMark, allowing it to handle things like #1 as an issue number. (Demo) Fossil's closest analog to "issues" are tickets, which are only and always referenced with hashes, which means we can take that same idea and go in a different direction, using it to refer to post numbers. It could work not just in chat, but in the forum as well. I could reference your first edit to the initial post in this thread with "#1.1" in this way.

Realize also that comment references in chat don't have to be at the start of a line. "Bob said in #1234 that..." You still want the reference to be made, even though there's no way it should be interpreted as a header, even under Fossil Markdown's current rules.

(7.1) By Warren Young (wyoung) on 2021-09-19 03:17:32 edited from 7.0 in reply to 6.2 [source]

I believe this patch fixes the problem, allowing us to use "#1" and such freely:

Index: src/markdown.c
==================================================================
--- src/markdown.c
+++ src/markdown.c
@@ -1665,10 +1665,11 @@
 
   if( !size || data[0]!='#' ) return 0;
 
   while( level<size && level<6 && data[level]=='#' ){ level++; }
   for(i=level; i<size && (data[i]==' ' || data[i]=='\t'); i++);
+  if ( i == level ) return parse_paragraph(ob, rndr, data, size);
   span_beg = i;
 
   for(end=i; end<size && data[end]!='\n'; end++);
   skip = end;
   if( end<=i ) return parse_paragraph(ob, rndr, data, size);

I'm not committing it because it needs testing and evaluation by someone who groks the Markdown processor.

(8) By Warren Young (wyoung) on 2021-09-19 03:41:09 in reply to 7.1 [link] [source]

I came up with this test to compare my patch against the latest trunk:

for f in $(find www -name \*.md) ; do
    /usr/local/bin/fossil test-markdown-render $f > a
                 ./fossil test-markdown-render $f > b
    diff a b
done

That flagged only one file in Fossil's doc tree that was taking advantage of the freedom to have zero spaces between the last # and the start of the header text, but I fixed that.

All of the hashes this test emits appear to be Pikchr nonce comments. I don't know why those exist, but they're harmless with respect to this test.

(9) By Stephan Beal (stephan) on 2021-09-19 04:43:01 in reply to 7.1 [link] [source]

I'm not committing it because it needs testing and evaluation by someone who groks the Markdown processor.

AFAIK that's only one of us (and it ain't me) ;).

Chat is now using markdown over in the chat-markdown branch. It "seems" to work just fine, but has the caveat that chat messages are rendered when they're sent, not when they're saved, so the new rendering applies to all historical messages. That won't(? probably!) break anything more significant than automated hyperlinking, which now requires md syntax, but that's a tiny price to pay.

My recommendation is that we use that branch over in /chat for a few days to make sure we haven't made made a horrible decision, but so far, so good. This one is already running on my server and Scott Robison and i have played with it a bit.

One relevant snippet from testing on that server:

The lack of a preview mode makes it tricky to really make full correct use of md, but we knew that going in. We might (might) end up implementing a preview mode which sends the message to the server and sends it rendered right back to you, instead of storing it and broadcasting it. The preview could appear as just another message with a bit of style to make it obvious that it's a preview. Strictly hypothetically. The lack of a preview will first have to bite me a few times before i'm itched enough to implement it.

(10) By Warren Young (wyoung) on 2021-09-19 04:55:54 in reply to 9 [link] [source]

that's only one of us

I taste the Markdown processor and know its flavor. I do not grok it in fullness.

chat messages are rendered when they're sent, not when they're saved

That seems a good thing to me. Every other common source of rendered Markdown does this, including Fossil itself. We should definitely not be storing rendered HTML.

the new rendering applies to all historical messages.

That's, what, 7 days' worth at most? Chat's ephemeral. We'll push the misrenderings out soon enough.

tricky to really make full correct use of md

One of the primary reasons to do this in the first place is that so many other text input boxes in our world — not just Fossil's — use Markdown, so people Fossil normally targets can be expected to compose it reliably.

Chat permits deleting egregious misrenderings, in part so the user can try again.

And for the rest, there's the sandbox.

But yes, a preview mode would be nice, too.

(11) By Stephan Beal (stephan) on 2021-09-19 07:04:17 in reply to 7.1 [link] [source]

I believe this patch fixes the problem, allowing us to use "#1" and such freely:

i think you just solved, perhaps unintentionally, an interesting range of problems for us...

Consider what happens if we teach the markdown parser to wrap all non-header references of #XYZ with something like:

<span data-ref='XYZ'>#XYZ</span>

First, it would have no visual effect on existing docs, as that span is essentially a no-op.

However, with such a bit in place, it becomes possible for page-specific JS to document.querySelectorAll() those and do page-specific stuff:

  • Wiki or tickets or technotes forum: tap to run a search for term XYZ or #XYZ. i.e. we gain a hashtag search/filter with almost no effort.
  • Chat: tap to filter all chat messages on that message ID (if it's an integer) or perform a client-side search for messages which contain that same data-ref value. i.e. all messages marked with that hashtag.
  • ???: ???

🤯

(12) By Stephan Beal (stephan) on 2021-09-19 07:46:49 in reply to 8 [link] [source]

I came up with this test to compare my patch against the latest trunk:

For completeness's sake, the same from my largest md collection:

$ for f in $(find . -name '*.md'); do
  ~/fossil/fossil/fossil.trunk test-markdown-render $f > _a
  f test-markdown-render $f > _b; diff _a _b
done

(no output)

$ find . -name '*.md' | xargs wc | tail -1
 18942 115842 778743 total

i.e. looks good to me. That patch is currently in the chat-markdown branch.

(13) By Stephan Beal (stephan) on 2021-09-25 08:19:48 in reply to 11 [link] [source]

Consider what happens if we teach the markdown parser to wrap all non-header references of #XYZ with something like:

<span data-ref='XYZ'>#XYZ</span>

Mostly @Warren, but for whoever wants to chime in...

Regarding hashtag parsing in markdown (which i think is working reasonably well)... we have two fundamentally different ways of emitting these, both of which make some sort of sense and neither of which currently feels strongly better than the other to me:

#foo ==>

  1. <span data-hashtag='foo'>#foo</span>
  2. <span data-hashtag>#foo</span>

In form (2), we define the convention that the data-hashtag is there simply to mark the element as a tag and the innerText of the element is the tag text itself (including leading hash).

The purist in me calls for (1) but the space-saver in me calls for (2) (while also noting that hashtags make up only a tiny percentage of any non-twitter content).

Are there any significant down-sides of one form compared to the other?

The output currently looks like:

$ ./fossil test-markdown-render '___#tag___ xyz #_not_ #tag_ok #tag__not #tagNOT#NotATag *#ValidTag*' 
<div class="markdown">

<p><strong><em><span data-hashtag="tag">#tag</span></em></strong> xyz #<em>not</em> <span data-hashtag="tag_ok">#tag_ok</span> #tag__not #tagNOT#NotATag <em><span data-hashtag="ValidTag">#ValidTag</span></em></p>

</div>

Noting that i've made some rather arbitrary decisions there, such as (A) requiring at least 2 characters in the hashtag and making two+ adjacent underscores illegal (that's all up for debate, of course).

Not yet implemented is #X.Y for forum post references, but that can easily be added later on once i've convinced myself (or been convinced) that the results make sense in the face of edits and post deletion.

(14) By Scott Robison (sdr) on 2021-09-25 09:48:31 in reply to 13 [link] [source]

My web fu is not strong, but the first form with an attribute value seems superior. Yes, it takes a little more memory somewhere, but the second form looks like a boolean attribute. I know it's not, and the data can be fetched a different way, but it seems better to me to make the attribute explicitly have a value rather than making it implicit. Otherwise it might seem to some people that its implicit value is the name of the attribute itself like checked="checked" or disabled="disabled".

But I don't plan to maintain it or write any skins or do other types of processing that will care, so take my opinion with a grain or two of salt.

(15) By Stephan Beal (stephan) on 2021-09-25 11:23:32 in reply to 14 [link] [source]

My web fu is not strong, but the first form with an attribute value seems superior.

FWIW, that's the impl i'm currently going with. It sits better with me.

... but the second form looks like a boolean attribute

Just FYI, in a way it kind of is. In CSS selectors:

span[data-foo] {...}

will match any SPAN element with data-foo="..." regardless of its value. It's also possible to select with various operators, e.g. "value begins with X".

But I don't plan to maintain it or write any skins or do other types of processing that will care, so take my opinion with a grain or two of salt.

Salt taken, but i think the purist in me is going to win this self-internal debate, and it agrees with you entirely.

i've got this branch currently running on libf's chat if you care to stop by and try it out.

That said: it's still very possible that the hashtags support will not hit trunk depending, at least in part, on whether or not it breaks any/many existing markdown docs. So far, so good, though. My end goal is to enable hashtag search support in the forum, so users can apply their own hashtags to posts and search by simply tapping hashtags in posts. That would require teaching the search index to treat hashtags as words, but that's all doable, i think.

(16) By John Rouillard (rouilj) on 2021-09-25 14:04:23 in reply to 13 [link] [source]

Late to the party, but I would go with option 1 since I can use:

[data-hashtag^=foo]

to style that element to call attention to it while ignoring data-hashtags that start with bar. I don't have a similar method for option 2 without using javascript.

(17) By Warren Young (wyoung) on 2021-09-26 10:42:08 in reply to 13 [link] [source]

Regarding whether to put parsed data redundantly into the data- tags, I vote yes, because it is work done by the Markdown processor, extracting a definitive version of whatever text it noted.

Contrast the text between the span tags, which may be prettied-up in various ways, and not just server-side. This is presentation, not data. Today, we're prepending # or @, but later it might be avatar image tags and who knows what all.

It would be wrong for code to make an innerHTML() call on the span and then try to back those stylings out to recover the text the Markdown processor identified.