Fossil

Check-in [b86a2fc7]

Overview
Comment: Fix two bugs (introduced with this branch) that become manifest with invalid UTF-8 sequences.
SHA1: b86a2fc7eb209681531ce6df583cdab254a7140b
User & Date: florian 2018-11-24 07:16:00.000
Context
2018-11-24 07:49
Minor optimizations: drop a few redundant comparisons and calculations, and take advantage of the logical AND short-circuit by testing the least expensive and most unlikely condition first. Also fold away the iterative comments into cross references. ... (check-in: 490d38ff user: florian tags: comment-formatter-utf8)
2018-11-24 07:16
Fix two bugs (introduced with this branch) that become manifest with invalid UTF-8 sequences. ... (check-in: b86a2fc7 user: florian tags: comment-formatter-utf8)
2018-11-16 19:39
Fix a bug (already present on trunk) with the (non-legacy) comment printing algorithm, detected while running the regression tests from test/comment.test with UTF-8 text: the function to print the indent (modified to a calculate-only function on this branch) was handed a pointer to the current line index and the current line index, thus performing checks at (current line index * 2), causing random increments of the current line index. ... (check-in: 70dd8f74 user: florian tags: comment-formatter-utf8)
Changes
Changes to src/comformat.c.
```diff
@@ -169,13 +169,14 @@
       int maxUTF8=1; /* Expected sequence length. */
       if( (c&0xe0)==0xc0 )maxUTF8=2;          /* UTF-8 lead byte 110vvvvv */
       else if( (c&0xf0)==0xe0 )maxUTF8=3;     /* UTF-8 lead byte 1110vvvv */
       else if( (c&0xf8)==0xf0 )maxUTF8=4;     /* UTF-8 lead byte 11110vvv */
       while( i<lengthBytes-1 &&
               cchUTF8<maxUTF8 &&
               (zString[i+1]&0xc0)==0x80 ){    /* UTF-8 trail byte 10vvvvvv */
+        cchUTF8++;
         i++;
       }
     }
   }
   return lengthUTF8;
 }
```
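The first hunk adds the missing `cchUTF8++`. Without it, `cchUTF8` stays at 1, so the `cchUTF8<maxUTF8` test never ends the loop and a lead byte absorbs every trail byte that follows it. Well-formed UTF-8 is unaffected, because a valid sequence has exactly `maxUTF8-1` trail bytes followed by a non-trail byte; only surplus trail bytes (invalid input) were wrongly folded into the preceding character. The sketch below is a hypothetical standalone `utf8_char_count()`, not Fossil's own function, that reproduces the corrected counting. The outer lead-byte test `(c&0xc0)==0xc0` is an assumption, since the diff does not show the enclosing condition.

```c
/*
** Hypothetical helper (not Fossil's function): count the UTF-8
** characters in the first lengthBytes bytes of zString, using the
** bounded trail-byte loop from the fixed hunk. A lead byte absorbs at
** most maxUTF8-1 trail bytes; surplus trail bytes count separately.
*/
static int utf8_char_count(const char *zString, int lengthBytes){
  int i;
  int lengthUTF8 = 0;
  for(i=0; i<lengthBytes; i++){
    unsigned char c = (unsigned char)zString[i];
    lengthUTF8++;
    if( (c&0xc0)==0xc0 ){             /* Assumed: any lead byte 11vvvvvv */
      int cchUTF8 = 1;                /* Code units consumed so far */
      int maxUTF8 = 1;                /* Expected sequence length */
      if( (c&0xe0)==0xc0 ) maxUTF8 = 2;        /* 110vvvvv */
      else if( (c&0xf0)==0xe0 ) maxUTF8 = 3;   /* 1110vvvv */
      else if( (c&0xf8)==0xf0 ) maxUTF8 = 4;   /* 11110vvv */
      while( i<lengthBytes-1 &&
             cchUTF8<maxUTF8 &&
             ((unsigned char)zString[i+1]&0xc0)==0x80 ){  /* 10vvvvvv */
        cchUTF8++;                    /* The fix: count each trail byte */
        i++;
      }
    }
  }
  return lengthUTF8;
}
```

For example, `utf8_char_count("h\xc3\xa9", 3)` returns 2, and `utf8_char_count("\xc3\xa9\xa9", 3)` also returns 2 because the surplus `0xA9` is counted on its own. With the `cchUTF8++` line removed, the second call would return 1.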

```diff
@@ -402,14 +403,15 @@
         int maxUTF8=1; /* Expected sequence length. */
         if( (c&0xe0)==0xc0 )maxUTF8=2;        /* UTF-8 lead byte 110vvvvv */
         else if( (c&0xf0)==0xe0 )maxUTF8=3;   /* UTF-8 lead byte 1110vvvv */
         else if( (c&0xf8)==0xf0 )maxUTF8=4;   /* UTF-8 lead byte 11110vvv */
         zBuf[k++] = c;
         while( cchUTF8<maxUTF8 &&
                 (zText[i+1]&0xc0)==0x80 ){    /* UTF-8 trail byte 10vvvvvv */
+          cchUTF8++;
           zBuf[k++] = zText[++i];
         }
       }
       else if( fossil_isspace(c) ){
         si = i;
         sk = k;
         if( k==0 || zBuf[k-1]!=' ' ){
```
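The second hunk applies the same fix to the loop that copies a multi-byte character into `zBuf`. Before it, every surplus trail byte after a lead byte was also copied into `zBuf` as part of one character, so the bytes written per character were not capped at four. The sketch below isolates that copy step as a hypothetical helper, `copy_utf8_char()`, which is not a Fossil function. Its pointer-based interface, and the assumption that `zText` is NUL-terminated (which keeps the `zText[i+1]` look-ahead in bounds, since `0x00` is never a trail byte), are choices made for this illustration.

```c
/*
** Hypothetical helper (not Fossil's function): copy the character that
** starts at zText[*pi] to zBuf[*pk], using the bounded loop from the
** fixed hunk. Assumes zText is NUL-terminated. On return, *pi indexes
** the last byte consumed (the caller advances past it) and *pk is past
** the bytes written. Returns the number of bytes copied (1 to 4).
*/
static int copy_utf8_char(const char *zText, int *pi, char *zBuf, int *pk){
  int i = *pi;
  int k = *pk;
  unsigned char c = (unsigned char)zText[i];
  int cchUTF8 = 1;                    /* Code units consumed */
  int maxUTF8 = 1;                    /* Expected sequence length */
  if( (c&0xe0)==0xc0 ) maxUTF8 = 2;          /* 110vvvvv */
  else if( (c&0xf0)==0xe0 ) maxUTF8 = 3;     /* 1110vvvv */
  else if( (c&0xf8)==0xf0 ) maxUTF8 = 4;     /* 11110vvv */
  zBuf[k++] = (char)c;
  while( cchUTF8<maxUTF8 &&
         ((unsigned char)zText[i+1]&0xc0)==0x80 ){   /* 10vvvvvv */
    cchUTF8++;                        /* The fix: bound the copy */
    zBuf[k++] = zText[++i];
  }
  *pi = i;
  *pk = k;
  return cchUTF8;
}
```

Starting at `i=0`, `k=0` on the invalid input `"\xc3\xa9\xa9\xa9"`, this sketch copies only `0xC3 0xA9` and leaves `i` at 1. After the caller advances `i`, each remaining `0xA9` is handled as a separate one-byte character, so no single call writes more than four bytes.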