Fossil

Check-in [cc9c422d]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Minor stylistic changes to the comment formatter.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | comment-formatter-utf8
Files: files | file ages | folders
SHA3-256: cc9c422d8325bb0af5c3cf533f7a4cc1137094c444441757bf9ed59229e043f9
User & Date: drh 2018-11-28 23:43:36
Context
2018-11-29
11:09
Improvements to the command-line comment formatter so that it works better with non-ASCII characters. check-in: 1c84a0c1 user: drh tags: trunk
2018-11-28
23:43
Minor stylistic changes to the comment formatter. Closed-Leaf check-in: cc9c422d user: drh tags: comment-formatter-utf8
2018-11-24
07:49
Minor optimizations: drop a few redundant comparisons and calculations, and take advantage of the logical AND short-circuit by testing the least expensive and most unlikely condition first. Also fold away the iterative comments into cross references. check-in: 490d38ff user: florian tags: comment-formatter-utf8
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to src/comformat.c.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
...
150
151
152
153
154
155
156
157
158


159
160
161
162
163
164
165
166
167
168
169
170
171
...
225
226
227
228
229
230
231

232
233

234
235
236
237
238
239

240
241

242
243
244
245
246
247
248
/*
** Copyright (c) 2007 D. Richard Hipp
**
** This program is free software; you can redistribute it and/or
** modify it under the terms of the Simplified BSD License (also
** known as the "2-Clause License" or "FreeBSD License".)

** This program is distributed in the hope that it will be useful,
** but without any warranty; without even the implied warranty of
** merchantability or fitness for a particular purpose.
**
** Author contact information:
**   drh@hwaci.com
**   http://www.hwaci.com/drh/
................................................................................
** overlong sequences are counted as one sequence. The invalid lead bytes 0xC0
** to 0xC1 and 0xF5 to 0xF7 are allowed to initiate (ill-formed) 2- and 4-byte
** sequences, respectively, the other invalid lead bytes 0xF8 to 0xFF are
** treated as invalid 1-byte sequences (as lone trail bytes).
** Combining characters and East Asian Wide and Fullwidth characters are counted
** as one, so this function does not calculate the effective "display width".
*/
int strlen_utf8(const char *zString, int lengthBytes)
{


#if 0
  assert( lengthBytes>=0 );
#endif
  int i;          /* Counted bytes. */
  int lengthUTF8; /* Counted UTF-8 sequences. */
  for( i=0, lengthUTF8=0; i<lengthBytes; i++, lengthUTF8++ ){
    char c = zString[i];
    int cchUTF8=1; /* Code units consumed. */
    int maxUTF8=1; /* Expected sequence length. */
    if( (c&0xe0)==0xc0 )maxUTF8=2;          /* UTF-8 lead byte 110vvvvv */
    else if( (c&0xf0)==0xe0 )maxUTF8=3;     /* UTF-8 lead byte 1110vvvv */
    else if( (c&0xf8)==0xf0 )maxUTF8=4;     /* UTF-8 lead byte 11110vvv */
    while( cchUTF8<maxUTF8 &&
................................................................................
  int cchUTF8, maxUTF8;       /* Helper variables to count UTF-8 sequences. */
  if( !zLine ) return;
  if( lineChars<=0 ) return;
#if 0
  assert( indent<sizeof(zBuf)-5 );       /* See following comments to explain */
  assert( origIndent<sizeof(zBuf)-5 );   /* these limits. */
#endif

  if( indent>sizeof(zBuf)-6 )   /* Limit initial indent to fit output buffer. */
    indent = sizeof(zBuf)-6;

  comment_calc_indent(zLine, indent, trimCrLf, trimSpace, &index);
  if( indent>0 ){
    for( i=0; i<indent; i++ ){
      zBuf[iBuf++] = ' ';
    }
  }

  if( origIndent>sizeof(zBuf)-6 )  /* Limit line indent to fit output buffer. */
    origIndent = sizeof(zBuf)-6;

  maxChars = lineChars;
  for(;;){
    int useChars = 1;
    char c = zLine[index];
    /* Flush the output buffer if there's no space left for at least one more
    ** (potentially 4-byte) UTF-8 sequence, one level of indentation spaces,
    ** a new line, and a terminating NULL. */






|







 







|
<
>
>



<
<
|







 







>
|

>


|



>
|

>







1
2
3
4
5
6
7
8
9
10
11
12
13
14
...
150
151
152
153
154
155
156
157

158
159
160
161
162


163
164
165
166
167
168
169
170
...
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
/*
** Copyright (c) 2007 D. Richard Hipp
**
** This program is free software; you can redistribute it and/or
** modify it under the terms of the Simplified BSD License (also
** known as the "2-Clause License" or "FreeBSD License".)
**
** This program is distributed in the hope that it will be useful,
** but without any warranty; without even the implied warranty of
** merchantability or fitness for a particular purpose.
**
** Author contact information:
**   drh@hwaci.com
**   http://www.hwaci.com/drh/
................................................................................
** overlong sequences are counted as one sequence. The invalid lead bytes 0xC0
** to 0xC1 and 0xF5 to 0xF7 are allowed to initiate (ill-formed) 2- and 4-byte
** sequences, respectively, the other invalid lead bytes 0xF8 to 0xFF are
** treated as invalid 1-byte sequences (as lone trail bytes).
** Combining characters and East Asian Wide and Fullwidth characters are counted
** as one, so this function does not calculate the effective "display width".
*/
int strlen_utf8(const char *zString, int lengthBytes){

  int i;          /* Counted bytes. */
  int lengthUTF8; /* Counted UTF-8 sequences. */
#if 0
  assert( lengthBytes>=0 );
#endif


  for(i=0, lengthUTF8=0; i<lengthBytes; i++, lengthUTF8++){
    char c = zString[i];
    int cchUTF8=1; /* Code units consumed. */
    int maxUTF8=1; /* Expected sequence length. */
    if( (c&0xe0)==0xc0 )maxUTF8=2;          /* UTF-8 lead byte 110vvvvv */
    else if( (c&0xf0)==0xe0 )maxUTF8=3;     /* UTF-8 lead byte 1110vvvv */
    else if( (c&0xf8)==0xf0 )maxUTF8=4;     /* UTF-8 lead byte 11110vvv */
    while( cchUTF8<maxUTF8 &&
................................................................................
  int cchUTF8, maxUTF8;       /* Helper variables to count UTF-8 sequences. */
  if( !zLine ) return;
  if( lineChars<=0 ) return;
#if 0
  assert( indent<sizeof(zBuf)-5 );       /* See following comments to explain */
  assert( origIndent<sizeof(zBuf)-5 );   /* these limits. */
#endif
  if( indent>sizeof(zBuf)-6 ){
    /* Limit initial indent to fit output buffer. */
    indent = sizeof(zBuf)-6;
  }
  comment_calc_indent(zLine, indent, trimCrLf, trimSpace, &index);
  if( indent>0 ){
    for(i=0; i<indent; i++){
      zBuf[iBuf++] = ' ';
    }
  }
  if( origIndent>sizeof(zBuf)-6 ){
    /* Limit line indent to fit output buffer. */
    origIndent = sizeof(zBuf)-6;
  }
  maxChars = lineChars;
  for(;;){
    int useChars = 1;
    char c = zLine[index];
    /* Flush the output buffer if there's no space left for at least one more
    ** (potentially 4-byte) UTF-8 sequence, one level of indentation spaces,
    ** a new line, and a terminating NULL. */