Fossil Forum

fossil markdown - pipe character interpreted in "backticked" string

(1) By anonymous on 2019-11-15 10:04:05 [link] [source]

Hello,

It seems that the pipe character `|` in a "backticked" string is processed in
the fossil markdown implementation as a table separator. You can see it in the
second and third copy of this paragraph. (The first one is indented with
four spaces so it is a literal block).

It seems that the pipe character | in a "backticked" string is processed in the fossil markdown implementation as a table separator. You can see it in the second and third copy of this paragraph. (The first one is indented with four spaces so it is a literal block).

It seems that the pipe character | in a "backticked" string is processed in the fossil markdown implementation as a table separator. You can see it in the second and third copy of this paragraph. (The first one is indented with four spaces so it is a literal block).

Other "special characters" can be backticked just fine.

[ this is [ OK

( this is ( OK

* this is * OK

| this is | it is bold as a table header

I ran into this while trying to backtick a command-line example with a pipe in it. Is this intentional? If not, could it be fixed?

Thanks.

(2.1) By aitap on 2020-04-15 15:37:56 edited from 2.0 in reply to 1 [link] [source]

Ran into this while trying to post a reply on the SQLite forum; had to replace pipe characters with + in the code blocks to get code that is hopefully equivalent but semantically incorrect. Alternatively, I could escape the pipe characters with a backslash, but then the backslash would not be consumed and would remain visible in the code block:

The standard C library allocator is a complicated thing; it requests big memory chunks by running `mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)`

The standard C library allocator is a complicated thing; it requests big memory chunks by running mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)

The standard C library allocator is a complicated thing; it requests big memory chunks by running `mmap(NULL, length, PROT_READ\|PROT_WRITE, MAP_PRIVATE\|MAP_ANONYMOUS, -1, 0)`

The standard C library allocator is a complicated thing; it requests big memory chunks by running mmap(NULL, length, PROT_READ\|PROT_WRITE, MAP_PRIVATE\|MAP_ANONYMOUS, -1, 0)

I think that is_tableline should be modified to identify codespans (including multi-backtick codespans) and completely overlook whatever happens inside them, like find_emph_char does. The alternative path, allowing backslash-escapes inside codespans, seems against the spirit of Markdown.

I would ask what the spec says, but with Markdown there are probably multiple specs with different opinions on the topic, none of them fully implemented.

(3.1) By aitap on 2020-04-15 20:00:25 edited from 3.0 in reply to 2.1 [link] [source]

Here is a patch:

Index: src/markdown.c
==================================================================
--- src/markdown.c
+++ src/markdown.c
@@ -466,10 +466,34 @@
       end = i;
     }
   }
 }
 
+
+/*
+ * skip_codespan:
+ * looks for the end of the next code span if it is valid,
+ * or the end of the backtick group, if not
+ */
+static size_t skip_codespan(char *data, size_t size) {
+  size_t i = 0, span_nb = 0, bt;
+  /* counting the number of opening backticks */
+  while( i<size && data[i]=='`' ){
+    i++;
+    span_nb++;
+  }
+  if( i>=size ) return span_nb;
+
+  /* finding the matching closing sequence */
+  bt = 0;
+  while( i<size && bt<span_nb ){
+    if( data[i]=='`' ) bt += 1; else bt = 0;
+    i++;
+  }
+  return bt == span_nb ? i : span_nb;
+}
+
 
 /* find_emph_char -- looks for the next emph char, skipping other constructs */
 static size_t find_emph_char(char *data, size_t size, char c){
   size_t i = 1;
 
@@ -483,32 +507,13 @@
       continue;
     }
 
     if( data[i]==c ) return i;
 
-    /* skipping a code span */
+    /* skipping a code span or a backtick cluster */
     if( data[i]=='`' ){
-      size_t span_nb = 0, bt;
-      size_t tmp_i = 0;
-
-      /* counting the number of opening backticks */
-      while( i<size && data[i]=='`' ){
-        i++;
-        span_nb++;
-      }
-      if( i>=size ) return 0;
-
-      /* finding the matching closing sequence */
-      bt = 0;
-      while( i<size && bt<span_nb ){
-        if( !tmp_i && data[i]==c ) tmp_i = i;
-        if( data[i]=='`' ) bt += 1; else bt = 0;
-        i++;
-      }
-      if( i>=size ) return tmp_i;
-      i++;
-
+      i += skip_codespan(data + i, size - i);
     /* skipping a link */
     }else if( data[i]=='[' ){
       size_t tmp_i = 0;
       char cc;
       i++;
@@ -1172,12 +1177,16 @@
 
   /* check for initial '|' */
   if( i<size && data[i]=='|') outer_sep++;
 
   /* count the number of pipes in the line */
-  for(n_sep=0; i<size && data[i]!='\n'; i++){
+  n_sep = 0;
+  while( i<size && data[i]!='\n' ){
     if( is_table_sep(data, i) ) n_sep++;
+    /* skip code spans while doing so */
+    if( data[i] == '`' ) i += skip_codespan(data + i, size - i);
+    else i++;
   }
 
   /* march back to check for optional last '|' before blanks and EOL */
   while( i && (data[i-1]==' ' || data[i-1]=='\t' || data[i-1]=='\n') ){ i--; }
   if( i && is_table_sep(data, i-1) ) outer_sep += 1;
@@ -1834,11 +1843,14 @@
     /* skip blanks */
     while( i<size && (data[i]==' ' || data[i]=='\t') ){ i++; }
     beg = i;
 
     /* forward to the next separator or EOL */
-    while( i<size && !is_table_sep(data, i) && data[i]!='\n' ){ i++; }
+    while( i<size && !is_table_sep(data, i) && data[i]!='\n' ){
+        if (data[i] == '`') i += skip_codespan(data + i, size - i);
+        else i++;
+    }
     end = i;
     if( i<size ){
       i++;
       if( data[i-1]=='\n' ) total = i;
     }

After applying the patch, I can see that feeding the sources of the forum posts mentioned in this thread to ./fossil test-markdown-render produces the desired result (pipe characters are ignored in code spans). Renders of https://fossil-scm.org/forum/md_rules look identical before and after the patch, too. I am having unrelated problems with make test -- it seems to produce same-looking failures before and after the patch.

Feedback would be welcome; there probably are corner cases I have overlooked.
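One way to probe for such corner cases is a small standalone harness. The sketch below copies skip_codespan from the patch (the only change is a const qualifier so that string literals can be passed) and wraps it in count_pipes_outside_code, a hypothetical helper that is not part of Fossil. It stands in for the separator-counting loop and simply treats every bare '|' as a separator, so it is only an approximation of what is_tableline does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copied from the patch above; "const" added for this sketch only. */
static size_t skip_codespan(const char *data, size_t size) {
  size_t i = 0, span_nb = 0, bt;
  /* counting the number of opening backticks */
  while( i<size && data[i]=='`' ){
    i++;
    span_nb++;
  }
  if( i>=size ) return span_nb;

  /* finding the matching closing sequence */
  bt = 0;
  while( i<size && bt<span_nb ){
    if( data[i]=='`' ) bt += 1; else bt = 0;
    i++;
  }
  return bt == span_nb ? i : span_nb;
}

/* Hypothetical helper (not in Fossil): counts '|' characters on the
 * first line of `line`, ignoring any that sit inside a code span. */
static size_t count_pipes_outside_code(const char *line) {
  size_t size = strlen(line), i = 0, n = 0;
  while( i<size && line[i]!='\n' ){
    if( line[i]=='|' ) n++;
    if( line[i]=='`' ) i += skip_codespan(line + i, size - i);
    else i++;
  }
  return n;
}

/* A few cases worth checking by hand. */
static void check_examples(void) {
  /* a closed span is skipped as a whole */
  assert( skip_codespan("`a|b`", 5)==5 );
  /* multi-backtick span: a single backtick inside does not close it */
  assert( skip_codespan("``a`b``", 7)==7 );
  /* unterminated span: only the opening backtick group is skipped */
  assert( skip_codespan("`a|b", 4)==1 );
  /* only the pipe outside the code span counts */
  assert( count_pipes_outside_code("a `x|y` b | c")==1 );
  /* with no closing backtick, the pipe is still seen */
  assert( count_pipes_outside_code("`x|y")==1 );
}
```

Calling check_examples() from a test main() should pass silently. Note that an unterminated backtick leaves the following pipes visible to the table parser, which seems to be the intended fallback ("the end of the backtick group, if not").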

(4) By PF (peter) on 2020-04-15 20:46:28 in reply to 3.1 [link] [source]

Hello aitap, OP here.

The patched version of fossil works nicely for me with both my original example above and your SQLite forum case. Thank you for the patch.

(5.1) By Stephan Beal (stephan) on 2020-05-26 16:46:15 edited from 5.0 in reply to 3.1 [source]

@aitap: i would be really grateful if you would send Richard a contributor's agreement so that we could get you set up with an account and get this checked in. i keep stumbling across this backtick bug. Most recently, this line is triggering it, but only in some contexts, not all:

After experimenting with add|rm|addremove --reset, i prefer that approach because:

(but it's not triggered when it's the last line of a post)

(Edit: Richard fixed this - see /forumpost/5f62eda0ec)

(6) By aitap on 2020-04-24 16:32:59 in reply to 5.0 [link] [source]

i would be really grateful if you would send Richard a contributor's agreement so that we could get you set up with an account and get this checked in

I have printed and filled out the form and intend to mail it on Monday during the post office's business hours. Should I e-mail my patch right away (possibly with a scan of the form), on Monday (along with the tracking number for the letter), or some time later (after the letter is delivered)?

(7) By Stephan Beal (stephan) on 2020-04-24 18:43:28 in reply to 6 [link] [source]

I have printed and filled the form and intend to mail it on Monday

Fantastic, i'm glad to hear it! There's no hurry, i just don't want this fix to get lost, as it's a really annoying, and seemingly random, bug. When your form arrives Richard will set you up or ask me to, then you can check it in.