Fossil Forum

Misformatted search output

(1) By Richard Hipp (drh) on 2020-04-06 11:52:57

Bug report:

The search output embeds <mark> tags to highlight the search terms within each snippet. But sometimes these <mark> tags are themselves HTML-escaped, so they show up literally in the page. Examples:

The escaping of <mark> seems to occur only when the search term contains an embedded underscore.

(2) By george on 2021-06-28 17:57:05 in reply to 1

The highlighting code uses fossil_isalnum() to validate the text inside each marker. That works correctly only in trivial cases.

IMHO, for this purpose fossil_isalnum() is an unfortunate oversimplification: it rejects far too many valid search terms. The lack of support for underscores and Unicode letters (i.e. letters of non-English alphabets) is my primary pain point here.

What is the rationale for such strict criteria? What about changing them?
Is there a suitable Unicode-aware isalnum() within Fossil's or SQLite's code base?
Or should we simply accept everything except '<' and '>'? Would that be secure enough?

Somewhat related: Can fossil grep handle non-ASCII characters?

(3) By george on 2021-06-30 19:03:28 in reply to 2

There are some functions in src/sqlite3.c that distinguish alphanumeric Unicode code points:

/*
** Return true if, for the purposes of tokenization, codepoint iCode is
** considered a token character (not a separator).
*/
static int unicodeIsAlnum(unicode_tokenizer *p, int iCode){
  assert( (sqlite3FtsUnicodeIsalnum(iCode) & 0xFFFFFFFE)==0 );
  return sqlite3FtsUnicodeIsalnum(iCode) ^ unicodeIsException(p, iCode);
}

This function is not exported, and it is designed to be called while tokenizing a string (note the unicode_tokenizer argument).

I suggest changing line 982 of src/search.c from

  while( fossil_isalnum(z[n]) ) n++;

to

  while( z[n]!=0 && z[n]!='<' ) n++;

It seems to solve the issue. Does anybody see how the proposed change could lead to a vulnerability? I don't.

Richard, what do you think?

(4) By Richard Hipp (drh) on 2021-06-30 19:17:14 in reply to 3

I'm not comfortable with that change until I have done a detailed analysis, which will take time and so will not happen right away. In the meantime, you are welcome to experiment with it in your own private clones. Just don't push the change.

(5) By george on 2021-07-02 00:37:46 in reply to 4

A live demo of the patched version can be seen at
I just reused a repository from another topic.

(6) By Alan Bram (flyboy) on 2021-12-07 23:34:33 in reply to 4

I just stumbled onto this bug in my own repo. Is there any chance of progress on the proposed fix? (May I "vote up" my interest in this?)

BTW, I ran into this while looking for a way to decorate tickets with "tags." Since I didn't see an explicit feature for that, I thought I could fake it by adding a few words like "tag_something" to the ticket text. The search itself works, of course; only the snippet highlighting is affected.