Overview
Comment: | Add output buffering to the (non-legacy) comment printing algorithm to reduce calls to fossil_print(). The resulting performance improvement can be up to a factor of 10, with a perceptible difference even for short comments (measured and tested on Windows with MSVC builds and on Ubuntu with GCC builds). For comparison: for the legacy comment printing algorithm, the extra UTF-8 checks added by this branch reduce performance by 0.12-1.8%, depending on whether the input contains predominantly multi-byte or ASCII-only sequences. |
---|---|
SHA1: | 16fde3ff666cf0733102f7a061756c71 |
User & Date: | florian 2018-11-15 12:43:00.000 |
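The buffering idea described in the comment can be sketched in isolation. This is a minimal illustration, not the code from src/comformat.c: `demo_print()` is a hypothetical stand-in for fossil_print() that counts its calls, and `OutBuf`, `outbuf_flush()`, `outbuf_putc()` and `print_buffered()` are names invented for this example. The 64-byte buffer size is likewise an assumption chosen to keep the example small.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for fossil_print(): counts calls and appends the
** text to a capture string so the effect of buffering can be observed. */
static int nPrintCalls = 0;
static char zCaptured[1024];

static void demo_print(const char *z){
  nPrintCalls++;
  strncat(zCaptured, z, sizeof(zCaptured)-strlen(zCaptured)-1);
}

/* Small output buffer (illustrative names and size). */
typedef struct OutBuf {
  char z[64];   /* Pending output, NUL-terminated only when flushed. */
  int n;        /* Number of bytes currently pending. */
} OutBuf;

/* Emit all pending bytes with a single print call. */
static void outbuf_flush(OutBuf *p){
  if( p->n>0 ){
    p->z[p->n] = 0;
    p->n = 0;
    demo_print(p->z);
  }
}

/* Append one byte, flushing first when fewer than 5 bytes of room remain. */
static void outbuf_putc(OutBuf *p, char c){
  if( p->n > (int)sizeof(p->z)-5 ) outbuf_flush(p);
  assert( p->n < (int)sizeof(p->z)-1 );
  p->z[p->n++] = c;
}

/* Send every byte of z through the buffer and return the number of
** demo_print() calls that were needed. */
static int print_buffered(const char *z){
  OutBuf buf;
  int before = nPrintCalls;
  buf.n = 0;
  while( *z ) outbuf_putc(&buf, *z++);
  outbuf_flush(&buf);
  return nPrintCalls - before;
}
```

With this sketch, a 100-byte string goes out in two print calls instead of one call per byte, which is the kind of reduction the check-in comment attributes the speedup to.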
Context
2018-11-15
15:16 | Fix a problem with the initial indent introduced by the previous check-in, so that all regression tests from test/comment.test now succeed. Also eliminate three more calls to fossil_print(). Regarding performance, with these changes the legacy comment printing algorithm is outperformed by a factor of 2-3. ... (check-in: b029ed22 user: florian tags: comment-formatter-utf8) | |
12:43 | Add output buffering to the (non-legacy) comment printing algorithm to reduce calls to fossil_print(). The resulting performance improvement can be up to a factor of 10, with a perceptible difference even for short comments (measured and tested on Windows with MSVC builds and on Ubuntu with GCC builds). For comparison: for the legacy comment printing algorithm, the extra UTF-8 checks added by this branch reduce performance by 0.12-1.8%, depending on whether the input contains predominantly multi-byte or ASCII-only sequences. ... (check-in: 16fde3ff user: florian tags: comment-formatter-utf8) | |
2018-10-17
14:16 | Modify the comment formatter to avoid output of incomplete UTF-8 sequences, and to avoid line breaks inside UTF-8 sequences. See https://fossil-scm.org/forum/forumpost/1247e4a3c4 for detailed information and tests. ... (check-in: 1bbca2c3 user: florian tags: comment-formatter-utf8) | |
Changes
Changes to src/comformat.c.
︙ | ︙ | |||
      int trimSpace,         /* [in] Non-zero to trim leading/trailing spaces. */
      int wordBreak,         /* [in] Non-zero to try breaking on word boundaries. */
      int origBreak,         /* [in] Non-zero to break before original comment. */
      int *pLineCnt,         /* [in/out] Pointer to the total line count. */
      const char **pzLine    /* [out] Pointer to the end of the logical line. */
    ){
      int index = 0, charCnt = 0, lineCnt = 0, maxChars;
      char zBuf[400]; int iBuf=0; /* Output buffer and counter. */
      if( !zLine ) return;
      if( lineChars<=0 ) return;
      comment_print_indent(zLine, indent, trimCrLf, trimSpace, &index);
      maxChars = lineChars;
      for(;;){
        int useChars = 1;
        char c = zLine[index];
        /* Flush the output buffer if there's no space left for at least one more
        ** (potentially 4-byte) UTF-8 sequence and a terminating NULL. */
        if ( iBuf>sizeof(zBuf)-5 ){
          zBuf[iBuf]=0;
          iBuf=0;
          fossil_print("%s", zBuf);
        }
        if( c==0 ){
          break;
        }else{
          if( origBreak && index>0 ){
            const char *zCurrent = &zLine[index];
            if( comment_check_orig(zOrigText, zCurrent, &charCnt, &lineCnt) ){
              /* Flush the output buffer before printing the indentation. */
              if ( iBuf>0 ){
                zBuf[iBuf]=0;
                iBuf=0;
                fossil_print("%s", zBuf);
              }
              comment_print_indent(zCurrent, origIndent, trimCrLf, trimSpace,
                                   &index);
              maxChars = lineChars;
            }
          }
          index++;
        }
        if( c=='\n' ){
          lineCnt++;
          charCnt = 0;
          useChars = 0;
        }else if( c=='\t' ){
          int nextIndex = comment_next_space(zLine, index);
          if( nextIndex<=0 || (nextIndex-index)>maxChars ){
            break;
          }
          charCnt++;
          useChars = COMMENT_TAB_WIDTH;
          if( maxChars<useChars ){
            zBuf[iBuf++] = ' ';
            break;
          }
        }else if( wordBreak && fossil_isspace(c) ){
          int nextIndex = comment_next_space(zLine, index);
          if( nextIndex<=0 || (nextIndex-index)>maxChars ){
            break;
          }
︙ | ︙ | |||
        ** inside UTF-8 sequences. Incomplete, ill-formed and overlong sequences are
        ** kept together. The invalid lead bytes 0xC0 to 0xC1 and 0xF5 to 0xF7 are
        ** allowed to initiate (ill-formed) 2- and 4-byte sequences, respectively,
        ** the other invalid lead bytes 0xF8 to 0xFF are treated as invalid 1-byte
        ** sequences (as lone trail bytes).
        */
        if( (c&0xc0)==0xc0 && zLine[index]!=0 ){ /* Any UTF-8 lead byte 11xxxxxx */
          int cchUTF8=1; /* Code units consumed. */
          int maxUTF8=1; /* Expected sequence length. */
          zBuf[iBuf++]=c;
          if( (c&0xe0)==0xc0 )maxUTF8=2;      /* UTF-8 lead byte 110vvvvv */
          else if( (c&0xf0)==0xe0 )maxUTF8=3; /* UTF-8 lead byte 1110vvvv */
          else if( (c&0xf8)==0xf0 )maxUTF8=4; /* UTF-8 lead byte 11110vvv */
          while( cchUTF8<maxUTF8 &&
                  (zLine[index]&0xc0)==0x80 ){ /* UTF-8 trail byte 10vvvvvv */
            cchUTF8++;
            zBuf[iBuf++] = zLine[index++];
          }
        }
        else
          zBuf[iBuf++] = c;
        if( (c&0x80)==0 || (zLine[index+1]&0xc0)!=0xc0 )
          maxChars -= useChars;
        if( maxChars<=0 ) break;
        if( c=='\n' ) break;
      }
      if( charCnt>0 ){
        zBuf[iBuf++] = '\n';
        lineCnt++;
      }
      /* Flush the remaining output buffer. */
      if ( iBuf>0 ) {
        zBuf[iBuf]=0;
        iBuf=0;
        fossil_print("%s", zBuf);
      }
      if( pLineCnt ){
        *pLineCnt += lineCnt;
      }
      if( pzLine ){
        *pzLine = zLine + index;
      }
︙ | ︙ |
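The lead-byte tests in the hunk above can be tried out separately. The following is a small sketch under stated assumptions: `utf8_expected_len()` and `utf8_unit_len()` are hypothetical helper names that do not exist in src/comformat.c. They reuse the same bit masks to show how many bytes the formatter would keep together, including for incomplete sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Expected sequence length for a lead byte, using the same bit tests as
** the diff. ASCII bytes, lone trail bytes and 0xF8..0xFF all yield 1. */
static int utf8_expected_len(unsigned char c){
  if( (c&0xe0)==0xc0 ) return 2;   /* 110vvvvv */
  if( (c&0xf0)==0xe0 ) return 3;   /* 1110vvvv */
  if( (c&0xf8)==0xf0 ) return 4;   /* 11110vvv */
  return 1;
}

/* Number of bytes starting at z[i] that belong together: the byte at z[i]
** plus any trail bytes (10vvvvvv) up to the expected length. Counting
** stops early at a non-trail byte or at the terminating NUL, so an
** incomplete sequence is still returned as one unit. */
static size_t utf8_unit_len(const char *z, size_t i){
  int max = utf8_expected_len((unsigned char)z[i]);
  size_t n = 1;
  assert( z[i]!=0 );
  while( (int)n<max && (((unsigned char)z[i+n])&0xc0)==0x80 ) n++;
  return n;
}
```

For example, the two bytes 0xC3 0xA9 form one unit of length 2, while a truncated 0xE2 0x82 is also reported as a single unit of length 2, so a line break would not be placed between them.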