
Overview
Comment: Merged from trunk to verify fix in [62352847].
SHA3-256: 4077357a38ddb2f17c466960c473fb2e1e1408d10f774b28ef0104519fa4f580
User & Date: rberteig 2017-03-13 21:53:44
Context
2017-03-13 22:24  Merged from trunk for testing before pushing back to trunk. Closed-Leaf check-in: f955c632 user: rberteig tags: rkb-2.0-tests
2017-03-13 21:53  Merged from trunk to verify fix in [62352847]. check-in: 4077357a user: rberteig tags: rkb-2.0-tests
2017-03-08 20:35  Fix the "fossil wiki export --technote" command so that it allows hash abbreviations again, as it should. check-in: 62352847 user: drh tags: trunk
2017-03-08 20:05  Fixed issues in the JSON and wiki test cases exposed by regressions in fossil-2.0. check-in: 5ee57d84 user: rberteig tags: rkb-2.0-tests
Changes

Changes to VERSION.

- 2.0
+ 2.1

Deleted compat/zlib/doc/algorithm.txt.

1. Compression algorithm (deflate)

The deflation algorithm used by gzip (also zip and zlib) is a variation of
LZ77 (Lempel-Ziv 1977, see reference below). It finds duplicated strings in
the input data.  The second occurrence of a string is replaced by a
pointer to the previous string, in the form of a pair (distance,
length).  Distances are limited to 32K bytes, and lengths are limited
to 258 bytes. When a string does not occur anywhere in the previous
32K bytes, it is emitted as a sequence of literal bytes.  (In this
description, `string' must be taken as an arbitrary sequence of bytes,
and is not restricted to printable characters.)

Literals or match lengths are compressed with one Huffman tree, and
match distances are compressed with another tree. The trees are stored
in a compact form at the start of each block. The blocks can have any
size (except that the compressed data for one block must fit in
available memory). A block is terminated when deflate() determines that
it would be useful to start another block with fresh trees. (This is
somewhat similar to the behavior of LZW-based _compress_.)

Duplicated strings are found using a hash table. All input strings of
length 3 are inserted in the hash table. A hash index is computed for
the next 3 bytes. If the hash chain for this index is not empty, all
strings in the chain are compared with the current input string, and
the longest match is selected.

The hash chains are searched starting with the most recent strings, to
favor small distances and thus take advantage of the Huffman encoding.
The hash chains are singly linked. There are no deletions from the
hash chains; the algorithm simply discards matches that are too old.

To avoid a worst-case situation, very long hash chains are arbitrarily
truncated at a certain length, determined by a runtime option (level
parameter of deflateInit). So deflate() does not always find the longest
possible match but generally finds a match which is long enough.
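
As a concrete illustration of the two preceding paragraphs, here is a
minimal C sketch of hash-chain insertion and a capped longest-match search.
It is not zlib's actual code; the names (hash3, chains_insert, find_match)
and the fixed-size arrays are purely illustrative, and the sketch assumes
the whole input (at most MAX_INPUT bytes) is in memory.

    #include <stddef.h>
    #include <string.h>

    #define HASH_SIZE 4096    /* illustrative hash table size              */
    #define MAX_CHAIN 128     /* chain cap, analogous to the level setting */
    #define MIN_MATCH 3
    #define MAX_MATCH 258
    #define MAX_DIST  32768   /* deflate's 32K window                      */
    #define MAX_INPUT 65536   /* sketch only: whole input kept in memory   */

    static int head[HASH_SIZE];   /* newest position per hash, -1 = empty  */
    static int chain[MAX_INPUT];  /* previous position with the same hash  */

    static void chains_init(void) {
        memset(head, -1, sizeof head);   /* all chains start out empty */
    }

    /* Hash the 3 bytes at pos; any reasonable mixing function will do. */
    static unsigned hash3(const unsigned char *buf, size_t pos) {
        return (buf[pos] * 131u + buf[pos + 1] * 31u + buf[pos + 2]) % HASH_SIZE;
    }

    /* Link position pos (which must have at least 3 bytes left) into its chain. */
    static void chains_insert(const unsigned char *buf, size_t pos) {
        unsigned h = hash3(buf, pos);
        chain[pos] = head[h];
        head[h] = (int)pos;
    }

    /* Walk the chain newest-first and return the longest match found, writing
     * its distance to *dist.  The walk gives up after MAX_CHAIN candidates, so
     * the result is "long enough" rather than guaranteed optimal. */
    static int find_match(const unsigned char *buf, size_t pos, size_t len,
                          int *dist) {
        int best = 0, tries = MAX_CHAIN;
        int cand = head[hash3(buf, pos)];
        while (cand >= 0 && tries-- > 0 && pos - (size_t)cand <= MAX_DIST) {
            int n = 0;
            while (n < MAX_MATCH && pos + n < len && buf[cand + n] == buf[pos + n])
                n++;
            if (n >= MIN_MATCH && n > best) { best = n; *dist = (int)(pos - cand); }
            cand = chain[cand];
        }
        return best;
    }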

deflate() also defers the selection of matches with a lazy evaluation
mechanism. After a match of length N has been found, deflate() searches for
a longer match at the next input byte. If a longer match is found, the
previous match is truncated to a length of one (thus producing a single
literal byte) and the process of lazy evaluation begins again. Otherwise,
the original match is kept, and the next match search is attempted only N
steps later.

The lazy match evaluation is also subject to a runtime parameter. If
the current match is long enough, deflate() reduces the search for a longer
match, thus speeding up the whole process. If compression ratio is more
important than speed, deflate() attempts a complete second search even if
the first match is already long enough.

The lazy match evaluation is not performed for the fastest compression
modes (level parameter 1 to 3). For these fast modes, new strings
are inserted in the hash table only when no match was found, or
when the match is not too long. This degrades the compression ratio
but saves time since there are both fewer insertions and fewer searches.
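
The lazy-evaluation loop itself can be sketched as below.  This reuses the
hypothetical chains_insert()/find_match() helpers from the previous sketch;
emit_literal() and emit_match() stand in for the real output stage, and the
function is an illustration of the idea rather than deflate()'s actual
control flow (in particular, the "match long enough" shortcuts are omitted).

    /* Sketch of lazy matching: after finding a match at pos, peek at pos+1;
     * if a longer match starts there, demote the first match to one literal. */
    void deflate_lazy(const unsigned char *buf, size_t len,
                      void (*emit_literal)(unsigned char),
                      void (*emit_match)(int mlen, int dist)) {
        size_t pos = 0;
        chains_init();
        while (pos + MIN_MATCH <= len) {
            int dist = 0, next_dist = 0;
            int mlen = find_match(buf, pos, len, &dist);
            chains_insert(buf, pos);
            if (mlen >= MIN_MATCH && pos + 1 + MIN_MATCH <= len) {
                /* Lazy step: is there a longer match one byte later? */
                if (find_match(buf, pos + 1, len, &next_dist) > mlen) {
                    emit_literal(buf[pos]);  /* truncate first match to a literal  */
                    pos += 1;                /* and re-evaluate from the next byte */
                    continue;
                }
            }
            if (mlen >= MIN_MATCH) {
                emit_match(mlen, dist);
                /* Insert the skipped positions so later data can match them. */
                for (int i = 1; i < mlen && pos + i + MIN_MATCH <= len; i++)
                    chains_insert(buf, pos + i);
                pos += (size_t)mlen;
            } else {
                emit_literal(buf[pos]);
                pos += 1;
            }
        }
        while (pos < len)                    /* trailing bytes too short to match */
            emit_literal(buf[pos++]);
    }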


2. Decompression algorithm (inflate)

2.1 Introduction

The key question is how to represent a Huffman code (or any prefix code) so
that you can decode fast.  The most important characteristic is that shorter
codes are much more common than longer codes, so pay attention to decoding the
short codes fast, and let the long codes take longer to decode.

inflate() sets up a first level table that covers some number of bits of
input less than the length of longest code.  It gets that many bits from the
stream, and looks it up in the table.  The table will tell if the next
code is that many bits or less and how many, and if it is, it will tell
the value, else it will point to the next level table for which inflate()
grabs more bits and tries to decode a longer code.

How many bits to make the first lookup is a tradeoff between the time it
takes to decode and the time it takes to build the table.  If building the
table took no time (and if you had infinite memory), then there would only
be a first level table to cover all the way to the longest code.  However,
building the table ends up taking a lot longer for more bits since short
codes are replicated many times in such a table.  What inflate() does is
simply to make the number of bits in the first table a variable, and then
to set that variable for the maximum speed.

For inflate, which has 286 possible codes for the literal/length tree, the size
of the first table is nine bits.  Also the distance trees have 30 possible
values, and the size of the first table is six bits.  Note that for each of
those cases, the table ended up one bit longer than the ``average'' code
length, i.e. the code length of an approximately flat code which would be a
little more than eight bits for 286 symbols and a little less than five bits
for 30 symbols.


2.2 More details on the inflate table lookup

Ok, you want to know what this cleverly obfuscated inflate tree actually
looks like.  You are correct that it's not a Huffman tree.  It is simply a
lookup table for the first, let's say, nine bits of a Huffman symbol.  The
symbol could be as short as one bit or as long as 15 bits.  If a particular
symbol is shorter than nine bits, then that symbol's translation is duplicated
in all those entries that start with that symbol's bits.  For example, if the
symbol is four bits, then it's duplicated 32 times in a nine-bit table.  If a
symbol is nine bits long, it appears in the table once.

If the symbol is longer than nine bits, then that entry in the table points
to another similar table for the remaining bits.  Again, there are duplicated
entries as needed.  The idea is that most of the time the symbol will be short
and there will only be one table look up.  (That's the whole idea behind data
compression in the first place.)  For the less frequent long symbols, there
will be two lookups.  If you had a compression method with really long
symbols, you could have as many levels of lookups as is efficient.  For
inflate, two is enough.

So a table entry either points to another table (in which case nine bits in
the above example are gobbled), or it contains the translation for the symbol
and the number of bits to gobble.  Then you start again with the next
ungobbled bit.
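
In C, such an entry and the corresponding decode step might be sketched as
follows.  This is only an illustration of the two-level idea; zlib's real
tables in inftrees.c pack the same information differently, and refilling
the bit accumulator (as well as the bit-reversal of the codes) is left out.

    #define ROOT_BITS 9   /* first-level table covers nine bits, as above */

    struct entry {
        short         symbol;  /* decoded symbol (unused for link entries)   */
        unsigned char bits;    /* bits this entry consumes, or bits indexed
                                  by the second-level table it points to     */
        struct entry *sub;     /* second-level table, NULL for direct hits   */
    };

    /* Decode one symbol.  *accum holds at least 15 input bits (low bit first)
     * and *avail says how many are valid; refilling is omitted here. */
    static int decode_symbol(const struct entry *root,
                             unsigned *accum, int *avail) {
        const struct entry *e = &root[*accum & ((1u << ROOT_BITS) - 1)];
        if (e->sub != NULL) {
            /* Long code: gobble ROOT_BITS, then index the second-level table. */
            *accum >>= ROOT_BITS;
            *avail  -= ROOT_BITS;
            e = &e->sub[*accum & ((1u << e->bits) - 1)];
        }
        *accum >>= e->bits;    /* gobble the bits this symbol actually used */
        *avail  -= e->bits;
        return e->symbol;
    }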

You may wonder: why not just have one lookup table for however many bits the
longest symbol is?  The reason is that if you do that, you end up spending
more time filling in duplicate symbol entries than you do actually decoding.
At least that is true for deflate's output, which generates new trees every
several tens of kbytes.  You can imagine that filling in a 2^15 entry table
for a 15-bit code
would take too long if you're only decoding several thousand symbols.  At the
other extreme, you could make a new table for every bit in the code.  In fact,
that's essentially a Huffman tree.  But then you spend too much time
traversing the tree while decoding, even for short symbols.

So the number of bits for the first lookup table is a trade of the time to
fill out the table vs. the time spent looking at the second level and above of
the table.

Here is an example, scaled down:

The code being decoded, with 10 symbols, from 1 to 6 bits long:

A: 0
B: 10
C: 1100
D: 11010
E: 11011
F: 11100
G: 11101
H: 11110
I: 111110
J: 111111

Let's make the first table three bits long (eight entries):

000: A,1
001: A,1
010: A,1
011: A,1
100: B,2
101: B,2
110: -> table X (gobble 3 bits)
111: -> table Y (gobble 3 bits)

Each entry is what the bits decode as and how many bits that is, i.e. how
many bits to gobble.  Or the entry points to another table, with the number of
bits to gobble implicit in the size of the table.

Table X is two bits long since the longest code starting with 110 is five bits
long:

00: C,1
01: C,1
10: D,2
11: E,2

Table Y is three bits long since the longest code starting with 111 is six
bits long:

000: F,2
001: F,2
010: G,2
011: G,2
100: H,2
101: H,2
110: I,3
111: J,3

So what we have here are three tables with a total of 20 entries that had to
be constructed.  That's compared to 64 entries for a single table.  Or
compared to 16 entries for a Huffman tree (six two entry tables and one four
entry table).  Assuming that the code ideally represents the probability of
the symbols, it takes on the average 1.25 lookups per symbol.  That's compared
to one lookup for the single table, or 1.66 lookups per symbol for the
Huffman tree.

There, I think that gives you a picture of what's going on.  For inflate, the
meaning of a particular symbol is often more than just a letter.  It can be a
byte (a "literal"), or it can be either a length or a distance which
indicates a base value and a number of bits to fetch after the code that is
added to the base value.  Or it might be the special end-of-block code.  The
data structures created in inftrees.c try to encode all that information
compactly in the tables.


Jean-loup Gailly        Mark Adler
jloup@gzip.org          madler@alumni.caltech.edu


References:

[LZ77] Ziv J., Lempel A., ``A Universal Algorithm for Sequential Data
Compression,'' IEEE Transactions on Information Theory, Vol. 23, No. 3,
pp. 337-343.

``DEFLATE Compressed Data Format Specification'' available in
http://tools.ietf.org/html/rfc1951

Deleted compat/zlib/doc/rfc1950.txt.







Network Working Group                                         P. Deutsch
Request for Comments: 1950                           Aladdin Enterprises
Category: Informational                                      J-L. Gailly
                                                                Info-ZIP
                                                                May 1996


         ZLIB Compressed Data Format Specification version 3.3

Status of This Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

IESG Note:

   The IESG takes no position on the validity of any Intellectual
   Property Rights statements contained in this document.

Notices

   Copyright (c) 1996 L. Peter Deutsch and Jean-Loup Gailly

   Permission is granted to copy and distribute this document for any
   purpose and without charge, including translations into other
   languages and incorporation into compilations, provided that the
   copyright notice and this notice are preserved, and that any
   substantive changes or deletions from the original are clearly
   marked.

   A pointer to the latest version of this and related documentation in
   HTML format can be found at the URL
   <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.

Abstract

   This specification defines a lossless compressed data format.  The
   data can be produced or consumed, even for an arbitrarily long
   sequentially presented input data stream, using only an a priori
   bounded amount of intermediate storage.  The format presently uses
   the DEFLATE compression method but can be easily extended to use
   other compression methods.  It can be implemented readily in a manner
   not covered by patents.  This specification also defines the ADLER-32
   checksum (an extension and improvement of the Fletcher checksum),
   used for detection of data corruption, and provides an algorithm for
   computing it.




Deutsch & Gailly             Informational                      [Page 1]
 
RFC 1950       ZLIB Compressed Data Format Specification        May 1996


Table of Contents

   1. Introduction ................................................... 2
      1.1. Purpose ................................................... 2
      1.2. Intended audience ......................................... 3
      1.3. Scope ..................................................... 3
      1.4. Compliance ................................................ 3
      1.5.  Definitions of terms and conventions used ................ 3
      1.6. Changes from previous versions ............................ 3
   2. Detailed specification ......................................... 3
      2.1. Overall conventions ....................................... 3
      2.2. Data format ............................................... 4
      2.3. Compliance ................................................ 7
   3. References ..................................................... 7
   4. Source code .................................................... 8
   5. Security Considerations ........................................ 8
   6. Acknowledgements ............................................... 8
   7. Authors' Addresses ............................................. 8
   8. Appendix: Rationale ............................................ 9
   9. Appendix: Sample code ..........................................10

1. Introduction

   1.1. Purpose

      The purpose of this specification is to define a lossless
      compressed data format that:

          * Is independent of CPU type, operating system, file system,
            and character set, and hence can be used for interchange;

          * Can be produced or consumed, even for an arbitrarily long
            sequentially presented input data stream, using only an a
            priori bounded amount of intermediate storage, and hence can
            be used in data communications or similar structures such as
            Unix filters;

          * Can use a number of different compression methods;

          * Can be implemented readily in a manner not covered by
            patents, and hence can be practiced freely.

      The data format defined by this specification does not attempt to
      allow random access to compressed data.







Deutsch & Gailly             Informational                      [Page 2]
 
RFC 1950       ZLIB Compressed Data Format Specification        May 1996


   1.2. Intended audience

      This specification is intended for use by implementors of software
      to compress data into zlib format and/or decompress data from zlib
      format.

      The text of the specification assumes a basic background in
      programming at the level of bits and other primitive data
      representations.

   1.3. Scope

      The specification specifies a compressed data format that can be
      used for in-memory compression of a sequence of arbitrary bytes.

   1.4. Compliance

      Unless otherwise indicated below, a compliant decompressor must be
      able to accept and decompress any data set that conforms to all
      the specifications presented here; a compliant compressor must
      produce data sets that conform to all the specifications presented
      here.

   1.5.  Definitions of terms and conventions used

      byte: 8 bits stored or transmitted as a unit (same as an octet).
      (For this specification, a byte is exactly 8 bits, even on
      machines which store a character on a number of bits different
      from 8.) See below, for the numbering of bits within a byte.

   1.6. Changes from previous versions

      Version 3.1 was the first public release of this specification.
      In version 3.2, some terminology was changed and the Adler-32
      sample code was rewritten for clarity.  In version 3.3, the
      support for a preset dictionary was introduced, and the
      specification was converted to RFC style.

2. Detailed specification

   2.1. Overall conventions

      In the diagrams below, a box like this:

         +---+
         |   | <-- the vertical bars might be missing
         +---+




Deutsch & Gailly             Informational                      [Page 3]
 
RFC 1950       ZLIB Compressed Data Format Specification        May 1996


      represents one byte; a box like this:

         +==============+
         |              |
         +==============+

      represents a variable number of bytes.

      Bytes stored within a computer do not have a "bit order", since
      they are always treated as a unit.  However, a byte considered as
      an integer between 0 and 255 does have a most- and least-
      significant bit, and since we write numbers with the most-
      significant digit on the left, we also write bytes with the most-
      significant bit on the left.  In the diagrams below, we number the
      bits of a byte so that bit 0 is the least-significant bit, i.e.,
      the bits are numbered:

         +--------+
         |76543210|
         +--------+

      Within a computer, a number may occupy multiple bytes.  All
      multi-byte numbers in the format described here are stored with
      the MOST-significant byte first (at the lower memory address).
      For example, the decimal number 520 is stored as:

             0     1
         +--------+--------+
         |00000010|00001000|
         +--------+--------+
          ^        ^
          |        |
          |        + less significant byte = 8
          + more significant byte = 2 x 256

   2.2. Data format

      A zlib stream has the following structure:

           0   1
         +---+---+
         |CMF|FLG|   (more-->)
         +---+---+








Deutsch & Gailly             Informational                      [Page 4]
 
RFC 1950       ZLIB Compressed Data Format Specification        May 1996


      (if FLG.FDICT set)

           0   1   2   3
         +---+---+---+---+
         |     DICTID    |   (more-->)
         +---+---+---+---+

         +=====================+---+---+---+---+
         |...compressed data...|    ADLER32    |
         +=====================+---+---+---+---+

      Any data which may appear after ADLER32 are not part of the zlib
      stream.

      CMF (Compression Method and flags)
         This byte is divided into a 4-bit compression method and a 4-
         bit information field depending on the compression method.

            bits 0 to 3  CM     Compression method
            bits 4 to 7  CINFO  Compression info

      CM (Compression method)
         This identifies the compression method used in the file. CM = 8
         denotes the "deflate" compression method with a window size up
         to 32K.  This is the method used by gzip and PNG (see
         references [1] and [2] in Chapter 3, below, for the reference
         documents).  CM = 15 is reserved.  It might be used in a future
         version of this specification to indicate the presence of an
         extra field before the compressed data.

      CINFO (Compression info)
         For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
         size, minus eight (CINFO=7 indicates a 32K window size). Values
         of CINFO above 7 are not allowed in this version of the
         specification.  CINFO is not defined in this specification for
         CM not equal to 8.

      FLG (FLaGs)
         This flag byte is divided as follows:

            bits 0 to 4  FCHECK  (check bits for CMF and FLG)
            bit  5       FDICT   (preset dictionary)
            bits 6 to 7  FLEVEL  (compression level)

         The FCHECK value must be such that CMF and FLG, when viewed as
         a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG),
         is a multiple of 31.
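
      (The following C sketch is not part of this specification; it shows
      one way a decompressor might apply the CM, CINFO and FCHECK rules
      above.  The function name and return codes are illustrative only.)

      /* Validate the first two bytes of a zlib stream: CM must be 8,
       * CINFO at most 7, and CMF*256 + FLG divisible by 31. */
      int zlib_check_header(unsigned char cmf, unsigned char flg)
      {
        unsigned cm    = cmf & 0x0f;         /* bits 0 to 3 */
        unsigned cinfo = (cmf >> 4) & 0x0f;  /* bits 4 to 7 */

        if (cm != 8)                 return -1;  /* unknown compression method */
        if (cinfo > 7)               return -2;  /* window larger than 32K     */
        if ((cmf * 256u + flg) % 31) return -3;  /* FCHECK failed              */
        return 0;                    /* FDICT and FLEVEL remain in flg */
      }

      For example, the frequently seen header bytes 0x78 0x9C pass this
      check, since 0x789C = 30876 = 31 * 996.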




Deutsch & Gailly             Informational                      [Page 5]
 
RFC 1950       ZLIB Compressed Data Format Specification        May 1996


      FDICT (Preset dictionary)
         If FDICT is set, a DICT dictionary identifier is present
         immediately after the FLG byte. The dictionary is a sequence of
         bytes which are initially fed to the compressor without
         producing any compressed output. DICT is the Adler-32 checksum
         of this sequence of bytes (see the definition of ADLER32
         below).  The decompressor can use this identifier to determine
         which dictionary has been used by the compressor.

      FLEVEL (Compression level)
         These flags are available for use by specific compression
         methods.  The "deflate" method (CM = 8) sets these flags as
         follows:

            0 - compressor used fastest algorithm
            1 - compressor used fast algorithm
            2 - compressor used default algorithm
            3 - compressor used maximum compression, slowest algorithm

         The information in FLEVEL is not needed for decompression; it
         is there to indicate if recompression might be worthwhile.

      compressed data
         For compression method 8, the compressed data is stored in the
         deflate compressed data format as described in the document
         "DEFLATE Compressed Data Format Specification" by L. Peter
         Deutsch. (See reference [3] in Chapter 3, below)

         Other compressed data formats are not specified in this version
         of the zlib specification.

      ADLER32 (Adler-32 checksum)
         This contains a checksum value of the uncompressed data
         (excluding any dictionary data) computed according to Adler-32
         algorithm. This algorithm is a 32-bit extension and improvement
         of the Fletcher algorithm, used in the ITU-T X.224 / ISO 8073
         standard. (See references [4] and [5] in Chapter 3, below.)

         Adler-32 is composed of two sums accumulated per byte: s1 is
         the sum of all bytes, s2 is the sum of all s1 values. Both sums
         are done modulo 65521. s1 is initialized to 1, s2 to zero.  The
         Adler-32 checksum is stored as s2*65536 + s1 in most-
         significant-byte first (network) order.








Deutsch & Gailly             Informational                      [Page 6]
 
RFC 1950       ZLIB Compressed Data Format Specification        May 1996


   2.3. Compliance

      A compliant compressor must produce streams with correct CMF, FLG
      and ADLER32, but need not support preset dictionaries.  When the
      zlib data format is used as part of another standard data format,
      the compressor may use only preset dictionaries that are specified
      by this other data format.  If this other format does not use the
      preset dictionary feature, the compressor must not set the FDICT
      flag.

      A compliant decompressor must check CMF, FLG, and ADLER32, and
      provide an error indication if any of these have incorrect values.
      A compliant decompressor must give an error indication if CM is
      not one of the values defined in this specification (only the
      value 8 is permitted in this version), since another value could
      indicate the presence of new features that would cause subsequent
      data to be interpreted incorrectly.  A compliant decompressor must
      give an error indication if FDICT is set and DICTID is not the
      identifier of a known preset dictionary.  A decompressor may
      ignore FLEVEL and still be compliant.  When the zlib data format
      is being used as a part of another standard format, a compliant
      decompressor must support all the preset dictionaries specified by
      the other format. When the other format does not use the preset
      dictionary feature, a compliant decompressor must reject any
      stream in which the FDICT flag is set.

3. References

   [1] Deutsch, L.P.,"GZIP Compressed Data Format Specification",
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/

   [2] Thomas Boutell, "PNG (Portable Network Graphics) specification",
       available in ftp://ftp.uu.net/graphics/png/documents/

   [3] Deutsch, L.P.,"DEFLATE Compressed Data Format Specification",
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/

   [4] Fletcher, J. G., "An Arithmetic Checksum for Serial
       Transmissions," IEEE Transactions on Communications, Vol. COM-30,
       No. 1, January 1982, pp. 247-252.

   [5] ITU-T Recommendation X.224, Annex D, "Checksum Algorithms,"
       November, 1993, pp. 144, 145. (Available from
       gopher://info.itu.ch). ITU-T X.224 is also the same as ISO 8073.







Deutsch & Gailly             Informational                      [Page 7]
 
RFC 1950       ZLIB Compressed Data Format Specification        May 1996


4. Source code

   Source code for a C language implementation of a "zlib" compliant
   library is available at ftp://ftp.uu.net/pub/archiving/zip/zlib/.

5. Security Considerations

   A decoder that fails to check the ADLER32 checksum value may be
   subject to undetected data corruption.

6. Acknowledgements

   Trademarks cited in this document are the property of their
   respective owners.

   Jean-Loup Gailly and Mark Adler designed the zlib format and wrote
   the related software described in this specification.  Glenn
   Randers-Pehrson converted this document to RFC and HTML format.

7. Authors' Addresses

   L. Peter Deutsch
   Aladdin Enterprises
   203 Santa Margarita Ave.
   Menlo Park, CA 94025

   Phone: (415) 322-0103 (AM only)
   FAX:   (415) 322-1734
   EMail: <ghost@aladdin.com>


   Jean-Loup Gailly

   EMail: <gzip@prep.ai.mit.edu>

   Questions about the technical content of this specification can be
   sent by email to

   Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
   Mark Adler <madler@alumni.caltech.edu>

   Editorial comments on this specification can be sent by email to

   L. Peter Deutsch <ghost@aladdin.com> and
   Glenn Randers-Pehrson <randeg@alumni.rpi.edu>






Deutsch & Gailly             Informational                      [Page 8]
 
RFC 1950       ZLIB Compressed Data Format Specification        May 1996


8. Appendix: Rationale

   8.1. Preset dictionaries

      A preset dictionary is especially useful to compress short input
      sequences. The compressor can take advantage of the dictionary
      context to encode the input in a more compact manner. The
      decompressor can be initialized with the appropriate context by
      virtually decompressing a compressed version of the dictionary
      without producing any output. However, for certain compression
      algorithms such as the deflate algorithm this operation can be
      achieved without actually performing any decompression.

      The compressor and the decompressor must use exactly the same
      dictionary. The dictionary may be fixed or may be chosen among a
      certain number of predefined dictionaries, according to the kind
      of input data. The decompressor can determine which dictionary has
      been chosen by the compressor by checking the dictionary
      identifier. This document does not specify the contents of
      predefined dictionaries, since the optimal dictionaries are
      application specific. Standard data formats using this feature of
      the zlib specification must precisely define the allowed
      dictionaries.

   8.2. The Adler-32 algorithm

      The Adler-32 algorithm is much faster than the CRC32 algorithm yet
      still provides an extremely low probability of undetected errors.

      The modulo on unsigned long accumulators can be delayed for 5552
      bytes, so the modulo operation time is negligible.  If the bytes
      are a, b, c, the second sum is 3a + 2b + c + 3, and so is position
      and order sensitive, unlike the first sum, which is just a
      checksum.  That 65521 is prime is important to avoid a possible
      large class of two-byte errors that leave the check unchanged.
      (The Fletcher checksum uses 255, which is not prime and which also
      makes the Fletcher check insensitive to single byte changes 0 <->
      255.)

      The sum s1 is initialized to 1 instead of zero to make the length
      of the sequence part of s2, so that the length does not have to be
      checked separately. (Any sequence of zeroes has a Fletcher
      checksum of zero.)
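
      (The following sketch is not part of this specification; it computes
      the same value as the update_adler32() sample in chapter 9, below,
      but takes the modulo only once per group of at most 5552 bytes, as
      the paragraph above permits.)

      #define BASE 65521UL  /* largest prime smaller than 65536 */
      #define NMAX 5552     /* deferral bound quoted above */

      unsigned long update_adler32_deferred(unsigned long adler,
                                            const unsigned char *buf, long len)
      {
        unsigned long s1 = adler & 0xffff;
        unsigned long s2 = (adler >> 16) & 0xffff;

        while (len > 0) {
          long n = len < NMAX ? len : NMAX;
          len -= n;
          while (n-- > 0) {
            s1 += *buf++;   /* neither sum can exceed 32 bits ... */
            s2 += s1;       /* ... within NMAX iterations         */
          }
          s1 %= BASE;       /* one expensive modulo per group */
          s2 %= BASE;
        }
        return (s2 << 16) + s1;
      }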








Deutsch & Gailly             Informational                      [Page 9]
 
RFC 1950       ZLIB Compressed Data Format Specification        May 1996


9. Appendix: Sample code

   The following C code computes the Adler-32 checksum of a data buffer.
   It is written for clarity, not for speed.  The sample code is in the
   ANSI C programming language. Non C users may find it easier to read
   with these hints:

      &      Bitwise AND operator.
      >>     Bitwise right shift operator. When applied to an
             unsigned quantity, as here, right shift inserts zero bit(s)
             at the left.
      <<     Bitwise left shift operator. Left shift inserts zero
             bit(s) at the right.
      ++     "n++" increments the variable n.
      %      modulo operator: a % b is the remainder of a divided by b.

      #define BASE 65521 /* largest prime smaller than 65536 */

      /*
         Update a running Adler-32 checksum with the bytes buf[0..len-1]
       and return the updated checksum. The Adler-32 checksum should be
       initialized to 1.

       Usage example:

         unsigned long adler = 1L;

         while (read_buffer(buffer, length) != EOF) {
           adler = update_adler32(adler, buffer, length);
         }
         if (adler != original_adler) error();
      */
      unsigned long update_adler32(unsigned long adler,
         unsigned char *buf, int len)
      {
        unsigned long s1 = adler & 0xffff;
        unsigned long s2 = (adler >> 16) & 0xffff;
        int n;

        for (n = 0; n < len; n++) {
          s1 = (s1 + buf[n]) % BASE;
          s2 = (s2 + s1)     % BASE;
        }
        return (s2 << 16) + s1;
      }

      /* Return the adler32 of the bytes buf[0..len-1] */




Deutsch & Gailly             Informational                     [Page 10]
 
RFC 1950       ZLIB Compressed Data Format Specification        May 1996


      unsigned long adler32(unsigned char *buf, int len)
      {
        return update_adler32(1L, buf, len);
      }















































Deutsch & Gailly             Informational                     [Page 11]
 

Deleted compat/zlib/doc/rfc1951.txt.







Network Working Group                                         P. Deutsch
Request for Comments: 1951                           Aladdin Enterprises
Category: Informational                                         May 1996


        DEFLATE Compressed Data Format Specification version 1.3

Status of This Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

IESG Note:

   The IESG takes no position on the validity of any Intellectual
   Property Rights statements contained in this document.

Notices

   Copyright (c) 1996 L. Peter Deutsch

   Permission is granted to copy and distribute this document for any
   purpose and without charge, including translations into other
   languages and incorporation into compilations, provided that the
   copyright notice and this notice are preserved, and that any
   substantive changes or deletions from the original are clearly
   marked.

   A pointer to the latest version of this and related documentation in
   HTML format can be found at the URL
   <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.

Abstract

   This specification defines a lossless compressed data format that
   compresses data using a combination of the LZ77 algorithm and Huffman
   coding, with efficiency comparable to the best currently available
   general-purpose compression methods.  The data can be produced or
   consumed, even for an arbitrarily long sequentially presented input
   data stream, using only an a priori bounded amount of intermediate
   storage.  The format can be implemented readily in a manner not
   covered by patents.








Deutsch                      Informational                      [Page 1]
 
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996


Table of Contents

   1. Introduction ................................................... 2
      1.1. Purpose ................................................... 2
      1.2. Intended audience ......................................... 3
      1.3. Scope ..................................................... 3
      1.4. Compliance ................................................ 3
      1.5.  Definitions of terms and conventions used ................ 3
      1.6. Changes from previous versions ............................ 4
   2. Compressed representation overview ............................. 4
   3. Detailed specification ......................................... 5
      3.1. Overall conventions ....................................... 5
          3.1.1. Packing into bytes .................................. 5
      3.2. Compressed block format ................................... 6
          3.2.1. Synopsis of prefix and Huffman coding ............... 6
          3.2.2. Use of Huffman coding in the "deflate" format ....... 7
          3.2.3. Details of block format ............................. 9
          3.2.4. Non-compressed blocks (BTYPE=00) ................... 11
          3.2.5. Compressed blocks (length and distance codes) ...... 11
          3.2.6. Compression with fixed Huffman codes (BTYPE=01) .... 12
          3.2.7. Compression with dynamic Huffman codes (BTYPE=10) .. 13
      3.3. Compliance ............................................... 14
   4. Compression algorithm details ................................. 14
   5. References .................................................... 16
   6. Security Considerations ....................................... 16
   7. Source code ................................................... 16
   8. Acknowledgements .............................................. 16
   9. Author's Address .............................................. 17

1. Introduction

   1.1. Purpose

      The purpose of this specification is to define a lossless
      compressed data format that:
          * Is independent of CPU type, operating system, file system,
            and character set, and hence can be used for interchange;
          * Can be produced or consumed, even for an arbitrarily long
            sequentially presented input data stream, using only an a
            priori bounded amount of intermediate storage, and hence
            can be used in data communications or similar structures
            such as Unix filters;
          * Compresses data with efficiency comparable to the best
            currently available general-purpose compression methods,
            and in particular considerably better than the "compress"
            program;
          * Can be implemented readily in a manner not covered by
            patents, and hence can be practiced freely;



Deutsch                      Informational                      [Page 2]
 
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996


          * Is compatible with the file format produced by the current
            widely used gzip utility, in that conforming decompressors
            will be able to read data produced by the existing gzip
            compressor.

      The data format defined by this specification does not attempt to:

          * Allow random access to compressed data;
          * Compress specialized data (e.g., raster graphics) as well
            as the best currently available specialized algorithms.

      A simple counting argument shows that no lossless compression
      algorithm can compress every possible input data set.  For the
      format defined here, the worst case expansion is 5 bytes per 32K-
      byte block, i.e., a size increase of 0.015% for large data sets.
      English text usually compresses by a factor of 2.5 to 3;
      executable files usually compress somewhat less; graphical data
      such as raster images may compress much more.

   1.2. Intended audience

      This specification is intended for use by implementors of software
      to compress data into "deflate" format and/or decompress data from
      "deflate" format.

      The text of the specification assumes a basic background in
      programming at the level of bits and other primitive data
      representations.  Familiarity with the technique of Huffman coding
      is helpful but not required.

   1.3. Scope

      The specification specifies a method for representing a sequence
      of bytes as a (usually shorter) sequence of bits, and a method for
      packing the latter bit sequence into bytes.

   1.4. Compliance

      Unless otherwise indicated below, a compliant decompressor must be
      able to accept and decompress any data set that conforms to all
      the specifications presented here; a compliant compressor must
      produce data sets that conform to all the specifications presented
      here.

   1.5.  Definitions of terms and conventions used

      Byte: 8 bits stored or transmitted as a unit (same as an octet).
      For this specification, a byte is exactly 8 bits, even on machines



Deutsch                      Informational                      [Page 3]
 
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996


      which store a character on a number of bits different from eight.
      See below, for the numbering of bits within a byte.

      String: a sequence of arbitrary bytes.

   1.6. Changes from previous versions

      There have been no technical changes to the deflate format since
      version 1.1 of this specification.  In version 1.2, some
      terminology was changed.  Version 1.3 is a conversion of the
      specification to RFC style.

2. Compressed representation overview

   A compressed data set consists of a series of blocks, corresponding
   to successive blocks of input data.  The block sizes are arbitrary,
   except that non-compressible blocks are limited to 65,535 bytes.

   Each block is compressed using a combination of the LZ77 algorithm
   and Huffman coding. The Huffman trees for each block are independent
   of those for previous or subsequent blocks; the LZ77 algorithm may
   use a reference to a duplicated string occurring in a previous block,
   up to 32K input bytes before.

   Each block consists of two parts: a pair of Huffman code trees that
   describe the representation of the compressed data part, and a
   compressed data part.  (The Huffman trees themselves are compressed
   using Huffman encoding.)  The compressed data consists of a series of
   elements of two types: literal bytes (of strings that have not been
   detected as duplicated within the previous 32K input bytes), and
   pointers to duplicated strings, where a pointer is represented as a
   pair <length, backward distance>.  The representation used in the
   "deflate" format limits distances to 32K bytes and lengths to 258
   bytes, but does not limit the size of a block, except for
   uncompressible blocks, which are limited as noted above.

   Each type of value (literals, distances, and lengths) in the
   compressed data is represented using a Huffman code, using one code
   tree for literals and lengths and a separate code tree for distances.
   The code trees for each block appear in a compact form just before
   the compressed data for that block.










Deutsch                      Informational                      [Page 4]
 
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996


3. Detailed specification

   3.1. Overall conventions

      In the diagrams below, a box like this:

         +---+
         |   | <-- the vertical bars might be missing
         +---+

      represents one byte; a box like this:

         +==============+
         |              |
         +==============+

      represents a variable number of bytes.

      Bytes stored within a computer do not have a "bit order", since
      they are always treated as a unit.  However, a byte considered as
      an integer between 0 and 255 does have a most- and least-
      significant bit, and since we write numbers with the most-
      significant digit on the left, we also write bytes with the most-
      significant bit on the left.  In the diagrams below, we number the
      bits of a byte so that bit 0 is the least-significant bit, i.e.,
      the bits are numbered:

         +--------+
         |76543210|
         +--------+

      Within a computer, a number may occupy multiple bytes.  All
      multi-byte numbers in the format described here are stored with
      the least-significant byte first (at the lower memory address).
      For example, the decimal number 520 is stored as:

             0        1
         +--------+--------+
         |00001000|00000010|
         +--------+--------+
          ^        ^
          |        |
          |        + more significant byte = 2 x 256
          + less significant byte = 8

      3.1.1. Packing into bytes

         This document does not address the issue of the order in which
         bits of a byte are transmitted on a bit-sequential medium,
         since the final data format described here is byte- rather than



Deutsch                      Informational                      [Page 5]
 
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996


         bit-oriented.  However, we describe the compressed block format
         below, as a sequence of data elements of various bit
         lengths, not a sequence of bytes.  We must therefore specify
         how to pack these data elements into bytes to form the final
         compressed byte sequence:

             * Data elements are packed into bytes in order of
               increasing bit number within the byte, i.e., starting
               with the least-significant bit of the byte.
             * Data elements other than Huffman codes are packed
               starting with the least-significant bit of the data
               element.
             * Huffman codes are packed starting with the most-
               significant bit of the code.

         In other words, if one were to print out the compressed data as
         a sequence of bytes, starting with the first byte at the
         *right* margin and proceeding to the *left*, with the most-
         significant bit of each byte on the left as usual, one would be
         able to parse the result from right to left, with fixed-width
         elements in the correct MSB-to-LSB order and Huffman codes in
         bit-reversed order (i.e., with the first bit of the code in the
         relative LSB position).
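
      (The following C sketch is not part of this specification; it shows
      one way a decoder might implement the packing rules above.  The
      bitreader structure and names are illustrative only.)

      #include <stddef.h>

      struct bitreader {
          const unsigned char *in;    /* compressed input   */
          size_t               pos;   /* next byte to load  */
          unsigned long        hold;  /* bit accumulator    */
          int                  bits;  /* valid bits in hold */
      };

      /* Return the next n bits (n <= 24) with the first input bit in the
       * least-significant position of the result.  BFINAL, BTYPE, extra
       * bits and the LEN/NLEN fields are all read this way.  Huffman codes
       * are packed starting from their most-significant bit, so reading a
       * code one bit at a time through this routine yields its bits in
       * order, first (most significant) code bit first. */
      static unsigned get_bits(struct bitreader *br, int n)
      {
          unsigned v;
          while (br->bits < n) {            /* pull in whole bytes, LSB first */
              br->hold |= (unsigned long)br->in[br->pos++] << br->bits;
              br->bits += 8;
          }
          v = (unsigned)(br->hold & ((1ul << n) - 1));
          br->hold >>= n;
          br->bits  -= n;
          return v;
      }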

   3.2. Compressed block format

      3.2.1. Synopsis of prefix and Huffman coding

         Prefix coding represents symbols from an a priori known
         alphabet by bit sequences (codes), one code for each symbol, in
         a manner such that different symbols may be represented by bit
         sequences of different lengths, but a parser can always parse
         an encoded string unambiguously symbol-by-symbol.

         We define a prefix code in terms of a binary tree in which the
         two edges descending from each non-leaf node are labeled 0 and
         1 and in which the leaf nodes correspond one-for-one with (are
         labeled with) the symbols of the alphabet; then the code for a
         symbol is the sequence of 0's and 1's on the edges leading from
         the root to the leaf labeled with that symbol.  For example:











Deutsch                      Informational                      [Page 6]
 
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996


                          /\              Symbol    Code
                         0  1             ------    ----
                        /    \                A      00
                       /\     B               B       1
                      0  1                    C     011
                     /    \                   D     010
                    A     /\
                         0  1
                        /    \
                       D      C

         A parser can decode the next symbol from an encoded input
         stream by walking down the tree from the root, at each step
         choosing the edge corresponding to the next input bit.

         Given an alphabet with known symbol frequencies, the Huffman
         algorithm allows the construction of an optimal prefix code
         (one which represents strings with those symbol frequencies
         using the fewest bits of any possible prefix codes for that
         alphabet).  Such a code is called a Huffman code.  (See
         reference [1] in Chapter 5, references for additional
         information on Huffman codes.)

         Note that in the "deflate" format, the Huffman codes for the
         various alphabets must not exceed certain maximum code lengths.
         This constraint complicates the algorithm for computing code
         lengths from symbol frequencies.  Again, see Chapter 5,
         references for details.

      3.2.2. Use of Huffman coding in the "deflate" format

         The Huffman codes used for each alphabet in the "deflate"
         format have two additional rules:

             * All codes of a given bit length have lexicographically
               consecutive values, in the same order as the symbols
               they represent;

             * Shorter codes lexicographically precede longer codes.












Deutsch                      Informational                      [Page 7]
 
RFC 1951      DEFLATE Compressed Data Format Specification      May 1996


         We could recode the example above to follow this rule as
         follows, assuming that the order of the alphabet is ABCD:

            Symbol  Code
            ------  ----
            A       10
            B       0
            C       110
            D       111

         I.e., 0 precedes 10 which precedes 11x, and 110 and 111 are
         lexicographically consecutive.

         Given this rule, we can define the Huffman code for an alphabet
         just by giving the bit lengths of the codes for each symbol of
         the alphabet in order; this is sufficient to determine the
         actual codes.  In our example, the code is completely defined
         by the sequence of bit lengths (2, 1, 3, 3).  The following
         algorithm generates the codes as integers, intended to be read
         from most- to least-significant bit.  The code lengths are
         initially in tree[I].Len; the codes are produced in
         tree[I].Code.

         1)  Count the number of codes for each code length.  Let
             bl_count[N] be the number of codes of length N, N >= 1.

         2)  Find the numerical value of the smallest code for each
             code length:

                code = 0;
                bl_count[0] = 0;
                for (bits = 1; bits <= MAX_BITS; bits++) {
                    code = (code + bl_count[bits-1]) << 1;
                    next_code[bits] = code;
                }

         3)  Assign numerical values to all codes, using consecutive
             values for all codes of the same length with the base
             values determined at step 2. Codes that are never used
             (which have a bit length of zero) must not be assigned a
             value.

                for (n = 0;  n <= max_code; n++) {
                    len = tree[n].Len;
                    if (len != 0) {
                        tree[n].Code = next_code[len];
                        next_code[len]++;
                    }
                }

         Example:

         Consider the alphabet ABCDEFGH, with bit lengths (3, 3, 3, 3,
         3, 2, 4, 4).  After step 1, we have:

            N      bl_count[N]
            -      -----------
            2      1
            3      5
            4      2

         Step 2 computes the following next_code values:

            N      next_code[N]
            -      ------------
            1      0
            2      0
            3      2
            4      14

         Step 3 produces the following code values:

            Symbol Length   Code
            ------ ------   ----
            A       3        010
            B       3        011
            C       3        100
            D       3        101
            E       3        110
            F       2         00
            G       4       1110
            H       4       1111
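
          The following complete C program is an illustrative sketch,
          not part of the specification: it combines steps 1 to 3
          above and reproduces the code values of this example.
          MAX_BITS is 15, the largest code length used by this format;
          the array sizes are assumptions chosen for the example
          alphabet ABCDEFGH.

             #include <stdio.h>

             #define MAX_BITS 15

             int main(void)
             {
                 /* Code lengths for the example alphabet ABCDEFGH. */
                 int len[8] = {3, 3, 3, 3, 3, 2, 4, 4};
                 int max_code = 7;
                 int bl_count[MAX_BITS + 1] = {0};
                 int next_code[MAX_BITS + 1] = {0};
                 int code_of[8];
                 int bits, n, code = 0;

                 /* Step 1: count the codes of each code length. */
                 for (n = 0; n <= max_code; n++)
                     bl_count[len[n]]++;

                 /* Step 2: smallest code value for each length. */
                 bl_count[0] = 0;
                 for (bits = 1; bits <= MAX_BITS; bits++) {
                     code = (code + bl_count[bits - 1]) << 1;
                     next_code[bits] = code;
                 }

                 /* Step 3: consecutive values within each length. */
                 for (n = 0; n <= max_code; n++)
                     if (len[n] != 0)
                         code_of[n] = next_code[len[n]]++;

                 for (n = 0; n <= max_code; n++)
                     printf("%c  length %d  code %d\n",
                            'A' + n, len[n], code_of[n]);
                 return 0;
             }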

      3.2.3. Details of block format

         Each block of compressed data begins with 3 header bits
         containing the following data:

            first bit       BFINAL
            next 2 bits     BTYPE

         Note that the header bits do not necessarily begin on a byte
         boundary, since a block does not necessarily occupy an integral
         number of bytes.

         BFINAL is set if and only if this is the last block of the data
         set.

         BTYPE specifies how the data are compressed, as follows:

            00 - no compression
            01 - compressed with fixed Huffman codes
            10 - compressed with dynamic Huffman codes
            11 - reserved (error)

         The only difference between the two compressed cases is how the
         Huffman codes for the literal/length and distance alphabets are
         defined.

         In all cases, the decoding algorithm for the actual data is as
         follows:

            do
               read block header from input stream.
               if stored with no compression
                  skip any remaining bits in current partially
                     processed byte
                  read LEN and NLEN (see next section)
                  copy LEN bytes of data to output
               otherwise
                  if compressed with dynamic Huffman codes
                     read representation of code trees (see
                        subsection below)
                  loop (until end of block code recognized)
                     decode literal/length value from input stream
                     if value < 256
                        copy value (literal byte) to output stream
                     otherwise
                        if value = end of block (256)
                           break from loop
                        otherwise (value = 257..285)
                           decode distance from input stream

                           move backwards distance bytes in the output
                           stream, and copy length bytes from this
                           position to the output stream.
                  end loop
            while not last block

         Note that a duplicated string reference may refer to a string
         in a previous block; i.e., the backward distance may cross one
          or more block boundaries.  However, a distance cannot refer past
         the beginning of the output stream.  (An application using a
         preset dictionary might discard part of the output stream; a
          distance can refer to that part of the output stream anyway.)
         Note also that the referenced string may overlap the current
         position; for example, if the last 2 bytes decoded have values
         X and Y, a string reference with <length = 5, distance = 2>
         adds X,Y,X,Y,X to the output stream.

         We now specify each compression method in turn.

      3.2.4. Non-compressed blocks (BTYPE=00)

         Any bits of input up to the next byte boundary are ignored.
         The rest of the block consists of the following information:

              0   1   2   3   4...
            +---+---+---+---+================================+
            |  LEN  | NLEN  |... LEN bytes of literal data...|
            +---+---+---+---+================================+

         LEN is the number of data bytes in the block.  NLEN is the
         one's complement of LEN.
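
          As an illustrative sketch (not part of the specification),
          the following C function checks a stored-block header taken
          from a byte buffer; both 16-bit fields are read least-
          significant byte first, as elsewhere in this format.

             /* Returns LEN on success, or -1 if NLEN is not the
                one's complement of LEN. */
             static int parse_stored_header(const unsigned char *p)
             {
                 unsigned len  = p[0] | (p[1] << 8);
                 unsigned nlen = p[2] | (p[3] << 8);
                 return ((len ^ nlen) == 0xFFFFu) ? (int)len : -1;
             }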

      3.2.5. Compressed blocks (length and distance codes)

         As noted above, encoded data blocks in the "deflate" format
         consist of sequences of symbols drawn from three conceptually
         distinct alphabets: either literal bytes, from the alphabet of
         byte values (0..255), or <length, backward distance> pairs,
         where the length is drawn from (3..258) and the distance is
         drawn from (1..32,768).  In fact, the literal and length
         alphabets are merged into a single alphabet (0..285), where
         values 0..255 represent literal bytes, the value 256 indicates
         end-of-block, and values 257..285 represent length codes
         (possibly in conjunction with extra bits following the symbol
         code) as follows:

                 Extra               Extra               Extra
            Code Bits Length(s) Code Bits Length(s) Code Bits Length(s)
            ---- ---- --------- ---- ---- --------- ---- ---- ---------
             257   0     3       267   1   15,16     277   4   67-82
             258   0     4       268   1   17,18     278   4   83-98
             259   0     5       269   2   19-22     279   4   99-114
             260   0     6       270   2   23-26     280   4  115-130
             261   0     7       271   2   27-30     281   5  131-162
             262   0     8       272   2   31-34     282   5  163-194
             263   0     9       273   3   35-42     283   5  195-226
             264   0    10       274   3   43-50     284   5  227-257
             265   1  11,12      275   3   51-58     285   0    258
             266   1  13,14      276   3   59-66

         The extra bits should be interpreted as a machine integer
         stored with the most-significant bit first, e.g., bits 1110
         represent the value 14.

                  Extra           Extra               Extra
             Code Bits Dist  Code Bits   Dist     Code Bits Distance
             ---- ---- ----  ---- ----  ------    ---- ---- --------
               0   0    1     10   4     33-48    20    9   1025-1536
               1   0    2     11   4     49-64    21    9   1537-2048
               2   0    3     12   5     65-96    22   10   2049-3072
               3   0    4     13   5     97-128   23   10   3073-4096
               4   1   5,6    14   6    129-192   24   11   4097-6144
               5   1   7,8    15   6    193-256   25   11   6145-8192
               6   2   9-12   16   7    257-384   26   12  8193-12288
               7   2  13-16   17   7    385-512   27   12 12289-16384
               8   3  17-24   18   8    513-768   28   13 16385-24576
               9   3  25-32   19   8   769-1024   29   13 24577-32768
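
          As an aid to implementors (an illustrative sketch only), the
          length table above can be transcribed into base values and
          extra-bit counts; the actual match length is then the base
          value for the code plus the value of the extra bits.

             /* Base lengths and extra-bit counts for codes 257..285,
                transcribed from the length table above. */
             static const int length_base[29] = {
                 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27,
                 31, 35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195,
                 227, 258 };
             static const int length_extra[29] = {
                 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
                 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0 };

             /* code is 257..285; extra holds the value of the
                length_extra[code - 257] extra bits already read. */
             static int match_length(int code, int extra)
             {
                 return length_base[code - 257] + extra;
             }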

      3.2.6. Compression with fixed Huffman codes (BTYPE=01)

         The Huffman codes for the two alphabets are fixed, and are not
         represented explicitly in the data.  The Huffman code lengths
         for the literal/length alphabet are:

                   Lit Value    Bits        Codes
                   ---------    ----        -----
                     0 - 143     8          00110000 through
                                            10111111
                   144 - 255     9          110010000 through
                                            111111111
                   256 - 279     7          0000000 through
                                            0010111
                   280 - 287     8          11000000 through
                                            11000111

         The code lengths are sufficient to generate the actual codes,
         as described above; we show the codes in the table for added
         clarity.  Literal/length values 286-287 will never actually
         occur in the compressed data, but participate in the code
         construction.

         Distance codes 0-31 are represented by (fixed-length) 5-bit
          codes, with possible additional bits as shown in the table
          in Paragraph 3.2.5, above.  Note that distance codes 30-
         31 will never actually occur in the compressed data.
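
          A minimal sketch (not part of the specification): the fixed
          code lengths tabulated above can be generated as follows and
          then fed to the algorithm of Paragraph 3.2.2; the fixed
          distance code assigns length 5 to all 32 distance codes.

             static void fixed_code_lengths(unsigned char litlen[288],
                                            unsigned char dist[32])
             {
                 int n;
                 for (n = 0;   n <= 143; n++) litlen[n] = 8;
                 for (n = 144; n <= 255; n++) litlen[n] = 9;
                 for (n = 256; n <= 279; n++) litlen[n] = 7;
                 for (n = 280; n <= 287; n++) litlen[n] = 8;
                 for (n = 0;   n <= 31;  n++) dist[n]   = 5;
             }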

      3.2.7. Compression with dynamic Huffman codes (BTYPE=10)

         The Huffman codes for the two alphabets appear in the block
         immediately after the header bits and before the actual
         compressed data, first the literal/length code and then the
         distance code.  Each code is defined by a sequence of code
         lengths, as discussed in Paragraph 3.2.2, above.  For even
         greater compactness, the code length sequences themselves are
         compressed using a Huffman code.  The alphabet for code lengths
         is as follows:

               0 - 15: Represent code lengths of 0 - 15
                   16: Copy the previous code length 3 - 6 times.
                       The next 2 bits indicate repeat length
                             (0 = 3, ... , 3 = 6)
                          Example:  Codes 8, 16 (+2 bits 11),
                                    16 (+2 bits 10) will expand to
                                    12 code lengths of 8 (1 + 6 + 5)
                   17: Repeat a code length of 0 for 3 - 10 times.
                       (3 bits of length)
                   18: Repeat a code length of 0 for 11 - 138 times
                       (7 bits of length)

         A code length of 0 indicates that the corresponding symbol in
         the literal/length or distance alphabet will not occur in the
         block, and should not participate in the Huffman code
         construction algorithm given earlier.  If only one distance
         code is used, it is encoded using one bit, not zero bits; in
         this case there is a single code length of one, with one unused
         code.  One distance code of zero bits means that there are no
         distance codes used at all (the data is all literals).

         We can now define the format of the block:

               5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
               5 Bits: HDIST, # of Distance codes - 1        (1 - 32)
               4 Bits: HCLEN, # of Code Length codes - 4     (4 - 19)

               (HCLEN + 4) x 3 bits: code lengths for the code length
                  alphabet given just above, in the order: 16, 17, 18,
                  0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15

                  These code lengths are interpreted as 3-bit integers
                  (0-7); as above, a code length of 0 means the
                  corresponding symbol (literal/length or distance code
                  length) is not used.

               HLIT + 257 code lengths for the literal/length alphabet,
                  encoded using the code length Huffman code

               HDIST + 1 code lengths for the distance alphabet,
                  encoded using the code length Huffman code

               The actual compressed data of the block,
                  encoded using the literal/length and distance Huffman
                  codes

               The literal/length symbol 256 (end of data),
                  encoded using the literal/length Huffman code

         The code length repeat codes can cross from HLIT + 257 to the
         HDIST + 1 code lengths.  In other words, all code lengths form
         a single sequence of HLIT + HDIST + 258 values.
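
          The following sketch (an illustration only) reads the three
          header fields and the code lengths for the code length
          alphabet in the order given above.  read_bits() is a
          hypothetical routine, assumed to return the next n bits of
          the stream packed least-significant bit first.

             extern int read_bits(int n);  /* hypothetical bit reader */

             static const int clen_order[19] = {
                 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13,
                 2, 14, 1, 15 };

             static void read_dynamic_header(int *hlit, int *hdist,
                                             unsigned char clen_len[19])
             {
                 int i, hclen;
                 *hlit  = read_bits(5) + 257;  /* literal/length codes */
                 *hdist = read_bits(5) + 1;    /* distance codes */
                 hclen  = read_bits(4) + 4;    /* code length codes */
                 for (i = 0; i < 19; i++)
                     clen_len[i] = 0;
                 for (i = 0; i < hclen; i++)
                     clen_len[clen_order[i]] =
                         (unsigned char)read_bits(3);
             }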

   3.3. Compliance

      A compressor may limit further the ranges of values specified in
      the previous section and still be compliant; for example, it may
      limit the range of backward pointers to some value smaller than
      32K.  Similarly, a compressor may limit the size of blocks so that
      a compressible block fits in memory.

      A compliant decompressor must accept the full range of possible
      values defined in the previous section, and must accept blocks of
      arbitrary size.

4. Compression algorithm details

   While it is the intent of this document to define the "deflate"
   compressed data format without reference to any particular
   compression algorithm, the format is related to the compressed
   formats produced by LZ77 (Lempel-Ziv 1977, see reference [2] below);
   since many variations of LZ77 are patented, it is strongly
   recommended that the implementor of a compressor follow the general
   algorithm presented here, which is known not to be patented per se.
   The material in this section is not part of the definition of the
   specification per se, and a compressor need not follow it in order to
   be compliant.

   The compressor terminates a block when it determines that starting a
   new block with fresh trees would be useful, or when the block size
   fills up the compressor's block buffer.

   The compressor uses a chained hash table to find duplicated strings,
   using a hash function that operates on 3-byte sequences.  At any
   given point during compression, let XYZ be the next 3 input bytes to
   be examined (not necessarily all different, of course).  First, the
   compressor examines the hash chain for XYZ.  If the chain is empty,
   the compressor simply writes out X as a literal byte and advances one
   byte in the input.  If the hash chain is not empty, indicating that
   the sequence XYZ (or, if we are unlucky, some other 3 bytes with the
   same hash function value) has occurred recently, the compressor
   compares all strings on the XYZ hash chain with the actual input data
   sequence starting at the current point, and selects the longest
   match.

   The compressor searches the hash chains starting with the most recent
   strings, to favor small distances and thus take advantage of the
   Huffman encoding.  The hash chains are singly linked. There are no
   deletions from the hash chains; the algorithm simply discards matches
   that are too old.  To avoid a worst-case situation, very long hash
   chains are arbitrarily truncated at a certain length, determined by a
   run-time parameter.
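
   As an illustration only (the format does not mandate any particular
   hash; the constants and the mixing function below are assumptions
   chosen for this sketch), a chained hash table over 3-byte sequences
   might look like this:

      #define HASH_BITS   15
      #define HASH_SIZE   (1 << HASH_BITS)
      #define WINDOW_SIZE 32768

      static int head[HASH_SIZE];    /* newest position per hash  */
      static int prev[WINDOW_SIZE];  /* older position, same hash */

      static int hash3(const unsigned char *p)
      {
          return ((p[0] << 10) ^ (p[1] << 5) ^ p[2]) & (HASH_SIZE - 1);
      }

      /* Insert the 3-byte string starting at win[pos] at the head of
         its hash chain. */
      static void insert_string(const unsigned char *win, int pos)
      {
          int h = hash3(win + pos);
          prev[pos & (WINDOW_SIZE - 1)] = head[h];
          head[h] = pos;
      }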

   To improve overall compression, the compressor optionally defers the
   selection of matches ("lazy matching"): after a match of length N has
   been found, the compressor searches for a longer match starting at
   the next input byte.  If it finds a longer match, it truncates the
   previous match to a length of one (thus producing a single literal
   byte) and then emits the longer match.  Otherwise, it emits the
   original match, and, as described above, advances N bytes before
   continuing.
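
   A sketch of the lazy-match decision (illustrative only; the helper
   routines declared extern below are hypothetical and are not defined
   by this document):

      extern int  longest_match(int pos, int *dist);
      extern void emit_literal(unsigned char c);
      extern void emit_match(int len, int dist);
      extern unsigned char window[];

      /* Process one position; returns the next position to examine. */
      static int deflate_step(int pos)
      {
          int dist, next_dist;
          int len  = longest_match(pos, &dist);
          int next = longest_match(pos + 1, &next_dist);

          (void)next_dist;   /* the deferred match is re-found later */
          if (next > len) {
              /* A longer match starts one byte later: truncate the
                 current match to a single literal and look again. */
              emit_literal(window[pos]);
              return pos + 1;
          }
          emit_match(len, dist);
          return pos + len;
      }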

   Run-time parameters also control this "lazy match" procedure.  If
   compression ratio is most important, the compressor attempts a
   complete second search regardless of the length of the first match.
   In the normal case, if the current match is "long enough", the
   compressor reduces the search for a longer match, thus speeding up
   the process.  If speed is most important, the compressor inserts new
   strings in the hash table only when no match was found, or when the
   match is not "too long".  This degrades the compression ratio but
   saves time since there are both fewer insertions and fewer searches.

5. References

   [1] Huffman, D. A., "A Method for the Construction of Minimum
       Redundancy Codes", Proceedings of the Institute of Radio
       Engineers, September 1952, Volume 40, Number 9, pp. 1098-1101.

   [2] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data
       Compression", IEEE Transactions on Information Theory, Vol. 23,
       No. 3, pp. 337-343.

   [3] Gailly, J.-L., and Adler, M., ZLIB documentation and sources,
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/

   [4] Gailly, J.-L., and Adler, M., GZIP documentation and sources,
       available as gzip-*.tar in ftp://prep.ai.mit.edu/pub/gnu/

   [5] Schwartz, E. S., and Kallick, B. "Generating a canonical prefix
       encoding." Comm. ACM, 7,3 (Mar. 1964), pp. 166-169.

   [6] Hirschberg and Lelewer, "Efficient decoding of prefix codes,"
       Comm. ACM, 33,4, April 1990, pp. 449-459.

6. Security Considerations

   Any data compression method involves the reduction of redundancy in
   the data.  Consequently, any corruption of the data is likely to have
   severe effects and be difficult to correct.  Uncompressed text, on
   the other hand, will probably still be readable despite the presence
   of some corrupted bytes.

   It is recommended that systems using this data format provide some
   means of validating the integrity of the compressed data.  See
   reference [3], for example.

7. Source code

   Source code for a C language implementation of a "deflate" compliant
   compressor and decompressor is available within the zlib package at
   ftp://ftp.uu.net/pub/archiving/zip/zlib/.

8. Acknowledgements

   Trademarks cited in this document are the property of their
   respective owners.

   Phil Katz designed the deflate format.  Jean-Loup Gailly and Mark
   Adler wrote the related software described in this specification.
   Glenn Randers-Pehrson converted this document to RFC and HTML format.

9. Author's Address

   L. Peter Deutsch
   Aladdin Enterprises
   203 Santa Margarita Ave.
   Menlo Park, CA 94025

   Phone: (415) 322-0103 (AM only)
   FAX:   (415) 322-1734
   EMail: <ghost@aladdin.com>

   Questions about the technical content of this specification can be
   sent by email to:

   Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
   Mark Adler <madler@alumni.caltech.edu>

   Editorial comments on this specification can be sent by email to:

   L. Peter Deutsch <ghost@aladdin.com> and
   Glenn Randers-Pehrson <randeg@alumni.rpi.edu>

Deleted compat/zlib/doc/rfc1952.txt.

Network Working Group                                         P. Deutsch
Request for Comments: 1952                           Aladdin Enterprises
Category: Informational                                         May 1996


               GZIP file format specification version 4.3

Status of This Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

IESG Note:

   The IESG takes no position on the validity of any Intellectual
   Property Rights statements contained in this document.

Notices

   Copyright (c) 1996 L. Peter Deutsch

   Permission is granted to copy and distribute this document for any
   purpose and without charge, including translations into other
   languages and incorporation into compilations, provided that the
   copyright notice and this notice are preserved, and that any
   substantive changes or deletions from the original are clearly
   marked.

   A pointer to the latest version of this and related documentation in
   HTML format can be found at the URL
   <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.

Abstract

   This specification defines a lossless compressed data format that is
   compatible with the widely used GZIP utility.  The format includes a
   cyclic redundancy check value for detecting data corruption.  The
   format presently uses the DEFLATE method of compression but can be
   easily extended to use other compression methods.  The format can be
   implemented readily in a manner not covered by patents.

Table of Contents

   1. Introduction
      1.1. Purpose
      1.2. Intended audience
      1.3. Scope
      1.4. Compliance
      1.5. Definitions of terms and conventions used
      1.6. Changes from previous versions
   2. Detailed specification
      2.1. Overall conventions
      2.2. File format
      2.3. Member format
          2.3.1. Member header and trailer
              2.3.1.1. Extra field
              2.3.1.2. Compliance
   3. References
   4. Security Considerations
   5. Acknowledgements
   6. Author's Address
   7. Appendix: Jean-Loup Gailly's gzip utility
   8. Appendix: Sample CRC Code

1. Introduction

   1.1. Purpose

      The purpose of this specification is to define a lossless
      compressed data format that:

          * Is independent of CPU type, operating system, file system,
            and character set, and hence can be used for interchange;
          * Can compress or decompress a data stream (as opposed to a
            randomly accessible file) to produce another data stream,
            using only an a priori bounded amount of intermediate
            storage, and hence can be used in data communications or
            similar structures such as Unix filters;
          * Compresses data with efficiency comparable to the best
            currently available general-purpose compression methods,
            and in particular considerably better than the "compress"
            program;
          * Can be implemented readily in a manner not covered by
            patents, and hence can be practiced freely;
          * Is compatible with the file format produced by the current
            widely used gzip utility, in that conforming decompressors
            will be able to read data produced by the existing gzip
            compressor.

      The data format defined by this specification does not attempt to:

          * Provide random access to compressed data;
          * Compress specialized data (e.g., raster graphics) as well as
            the best currently available specialized algorithms.

   1.2. Intended audience

      This specification is intended for use by implementors of software
      to compress data into gzip format and/or decompress data from gzip
      format.

      The text of the specification assumes a basic background in
      programming at the level of bits and other primitive data
      representations.

   1.3. Scope

      The specification specifies a compression method and a file format
      (the latter assuming only that a file can store a sequence of
      arbitrary bytes).  It does not specify any particular interface to
      a file system or anything about character sets or encodings
      (except for file names and comments, which are optional).

   1.4. Compliance

      Unless otherwise indicated below, a compliant decompressor must be
      able to accept and decompress any file that conforms to all the
      specifications presented here; a compliant compressor must produce
      files that conform to all the specifications presented here.  The
      material in the appendices is not part of the specification per se
      and is not relevant to compliance.

   1.5. Definitions of terms and conventions used

      byte: 8 bits stored or transmitted as a unit (same as an octet).
      (For this specification, a byte is exactly 8 bits, even on
      machines which store a character on a number of bits different
      from 8.)  See below for the numbering of bits within a byte.

   1.6. Changes from previous versions

      There have been no technical changes to the gzip format since
      version 4.1 of this specification.  In version 4.2, some
      terminology was changed, and the sample CRC code was rewritten for
      clarity and to eliminate the requirement for the caller to do pre-
      and post-conditioning.  Version 4.3 is a conversion of the
      specification to RFC style.

2. Detailed specification

   2.1. Overall conventions

      In the diagrams below, a box like this:

         +---+
         |   | <-- the vertical bars might be missing
         +---+

      represents one byte; a box like this:

         +==============+
         |              |
         +==============+

      represents a variable number of bytes.

      Bytes stored within a computer do not have a "bit order", since
      they are always treated as a unit.  However, a byte considered as
      an integer between 0 and 255 does have a most- and least-
      significant bit, and since we write numbers with the most-
      significant digit on the left, we also write bytes with the most-
      significant bit on the left.  In the diagrams below, we number the
      bits of a byte so that bit 0 is the least-significant bit, i.e.,
      the bits are numbered:

         +--------+
         |76543210|
         +--------+

      This document does not address the issue of the order in which
      bits of a byte are transmitted on a bit-sequential medium, since
      the data format described here is byte- rather than bit-oriented.

      Within a computer, a number may occupy multiple bytes.  All
      multi-byte numbers in the format described here are stored with
      the least-significant byte first (at the lower memory address).
      For example, the decimal number 520 is stored as:

             0        1
         +--------+--------+
         |00001000|00000010|
         +--------+--------+
          ^        ^
          |        |
          |        + more significant byte = 2 x 256
          + less significant byte = 8
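
      For illustration only (a sketch, not part of the specification),
      a multi-byte field such as MTIME, CRC32, or ISIZE can be
      assembled from a byte buffer as follows:

         static unsigned long le32(const unsigned char *p)
         {
             return  (unsigned long)p[0]
                  | ((unsigned long)p[1] << 8)
                  | ((unsigned long)p[2] << 16)
                  | ((unsigned long)p[3] << 24);
         }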

   2.2. File format

      A gzip file consists of a series of "members" (compressed data
      sets).  The format of each member is specified in the following
      section.  The members simply appear one after another in the file,
      with no additional information before, between, or after them.

   2.3. Member format

      Each member has the following structure:

         +---+---+---+---+---+---+---+---+---+---+
         |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
         +---+---+---+---+---+---+---+---+---+---+

      (if FLG.FEXTRA set)

         +---+---+=================================+
         | XLEN  |...XLEN bytes of "extra field"...| (more-->)
         +---+---+=================================+

      (if FLG.FNAME set)

         +=========================================+
         |...original file name, zero-terminated...| (more-->)
         +=========================================+

      (if FLG.FCOMMENT set)

         +===================================+
         |...file comment, zero-terminated...| (more-->)
         +===================================+

      (if FLG.FHCRC set)

         +---+---+
         | CRC16 |
         +---+---+

         +=======================+
         |...compressed blocks...| (more-->)
         +=======================+

           0   1   2   3   4   5   6   7
         +---+---+---+---+---+---+---+---+
         |     CRC32     |     ISIZE     |
         +---+---+---+---+---+---+---+---+

      2.3.1. Member header and trailer

         ID1 (IDentification 1)
         ID2 (IDentification 2)
            These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
            (0x8b, \213), to identify the file as being in gzip format.

         CM (Compression Method)
            This identifies the compression method used in the file.  CM
            = 0-7 are reserved.  CM = 8 denotes the "deflate"
            compression method, which is the one customarily used by
            gzip and which is documented elsewhere.

         FLG (FLaGs)
            This flag byte is divided into individual bits as follows:

               bit 0   FTEXT
               bit 1   FHCRC
               bit 2   FEXTRA
               bit 3   FNAME
               bit 4   FCOMMENT
               bit 5   reserved
               bit 6   reserved
               bit 7   reserved

            If FTEXT is set, the file is probably ASCII text.  This is
            an optional indication, which the compressor may set by
            checking a small amount of the input data to see whether any
            non-ASCII characters are present.  In case of doubt, FTEXT
            is cleared, indicating binary data. For systems which have
            different file formats for ASCII text and binary data, the
            decompressor can use FTEXT to choose the appropriate format.
            We deliberately do not specify the algorithm used to set
            this bit, since a compressor always has the option of
            leaving it cleared and a decompressor always has the option
            of ignoring it and letting some other program handle issues
            of data conversion.

            If FHCRC is set, a CRC16 for the gzip header is present,
            immediately before the compressed data. The CRC16 consists
            of the two least significant bytes of the CRC32 for all
            bytes of the gzip header up to and not including the CRC16.
            [The FHCRC bit was never set by versions of gzip up to
            1.2.4, even though it was documented with a different
            meaning in gzip 1.2.4.]

            If FEXTRA is set, optional extra fields are present, as
            described in a following section.

            If FNAME is set, an original file name is present,
            terminated by a zero byte.  The name must consist of ISO
            8859-1 (LATIN-1) characters; on operating systems using
            EBCDIC or any other character set for file names, the name
            must be translated to the ISO LATIN-1 character set.  This
            is the original name of the file being compressed, with any
            directory components removed, and, if the file being
            compressed is on a file system with case insensitive names,
            forced to lower case. There is no original file name if the
            data was compressed from a source other than a named file;
            for example, if the source was stdin on a Unix system, there
            is no file name.

            If FCOMMENT is set, a zero-terminated file comment is
            present.  This comment is not interpreted; it is only
            intended for human consumption.  The comment must consist of
            ISO 8859-1 (LATIN-1) characters.  Line breaks should be
            denoted by a single line feed character (10 decimal).

            Reserved FLG bits must be zero.
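
            For reference (an illustrative sketch; the macro names are
            not part of the format), the bit assignments listed above
            correspond to the following masks:

               #define GZ_FTEXT     0x01
               #define GZ_FHCRC     0x02
               #define GZ_FEXTRA    0x04
               #define GZ_FNAME     0x08
               #define GZ_FCOMMENT  0x10
               #define GZ_RESERVED  0xE0   /* bits 5-7, must be zero */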

         MTIME (Modification TIME)
            This gives the most recent modification time of the original
            file being compressed.  The time is in Unix format, i.e.,
            seconds since 00:00:00 GMT, Jan.  1, 1970.  (Note that this
            may cause problems for MS-DOS and other systems that use
            local rather than Universal time.)  If the compressed data
            did not come from a file, MTIME is set to the time at which
            compression started.  MTIME = 0 means no time stamp is
            available.

         XFL (eXtra FLags)
            These flags are available for use by specific compression
            methods.  The "deflate" method (CM = 8) sets these flags as
            follows:

               XFL = 2 - compressor used maximum compression,
                         slowest algorithm
               XFL = 4 - compressor used fastest algorithm

         OS (Operating System)
            This identifies the type of file system on which compression
            took place.  This may be useful in determining end-of-line
            convention for text files.  The currently defined values are
            as follows:

                 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
                 1 - Amiga
                 2 - VMS (or OpenVMS)
                 3 - Unix
                 4 - VM/CMS
                 5 - Atari TOS
                 6 - HPFS filesystem (OS/2, NT)
                 7 - Macintosh
                 8 - Z-System
                 9 - CP/M
                10 - TOPS-20
                11 - NTFS filesystem (NT)
                12 - QDOS
                13 - Acorn RISCOS
               255 - unknown

         XLEN (eXtra LENgth)
            If FLG.FEXTRA is set, this gives the length of the optional
            extra field.  See below for details.

         CRC32 (CRC-32)
            This contains a Cyclic Redundancy Check value of the
            uncompressed data computed according to CRC-32 algorithm
            used in the ISO 3309 standard and in section 8.1.1.6.2 of
            ITU-T recommendation V.42.  (See http://www.iso.ch for
            ordering ISO documents. See gopher://info.itu.ch for an
            online version of ITU-T V.42.)

         ISIZE (Input SIZE)
            This contains the size of the original (uncompressed) input
            data modulo 2^32.

      2.3.1.1. Extra field

         If the FLG.FEXTRA bit is set, an "extra field" is present in
         the header, with total length XLEN bytes.  It consists of a
         series of subfields, each of the form:

            +---+---+---+---+==================================+
            |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
            +---+---+---+---+==================================+

         SI1 and SI2 provide a subfield ID, typically two ASCII letters
         with some mnemonic value.  Jean-Loup Gailly
         <gzip@prep.ai.mit.edu> is maintaining a registry of subfield
         IDs; please send him any subfield ID you wish to use.  Subfield
         IDs with SI2 = 0 are reserved for future use.  The following
         IDs are currently defined:

            SI1         SI2         Data
            ----------  ----------  ----
            0x41 ('A')  0x70 ('P')  Apollo file type information

         LEN gives the length of the subfield data, excluding the 4
         initial bytes.
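
          An illustrative sketch (not part of the specification) of
          walking the subfields of an extra field of total length XLEN
          bytes:

             static void walk_extra_field(const unsigned char *p,
                                          unsigned xlen)
             {
                 unsigned i = 0;
                 while (i + 4 <= xlen) {
                     unsigned char si1 = p[i], si2 = p[i + 1];
                     unsigned len = p[i + 2] | (p[i + 3] << 8);
                     if (i + 4 + len > xlen)
                         break;             /* malformed subfield */
                     /* ... dispatch on (si1, si2) here ... */
                     (void)si1; (void)si2;
                     i += 4 + len;
                 }
             }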

      2.3.1.2. Compliance

         A compliant compressor must produce files with correct ID1,
         ID2, CM, CRC32, and ISIZE, but may set all the other fields in
         the fixed-length part of the header to default values (255 for
         OS, 0 for all others).  The compressor must set all reserved
         bits to zero.

         A compliant decompressor must check ID1, ID2, and CM, and
         provide an error indication if any of these have incorrect
         values.  It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
         at least so it can skip over the optional fields if they are
         present.  It need not examine any other part of the header or
         trailer; in particular, a decompressor may ignore FTEXT and OS
         and always produce binary output, and still be compliant.  A
         compliant decompressor must give an error indication if any
         reserved bit is non-zero, since such a bit could indicate the
         presence of a new field that would cause subsequent data to be
         interpreted incorrectly.
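
          The fixed-length checks described above might be sketched as
          follows (the return codes are invented for this example):

             #include <stddef.h>

             enum { GZ_OK, GZ_BAD_MAGIC, GZ_BAD_METHOD,
                    GZ_BAD_RESERVED };

             static int check_member_header(const unsigned char *h,
                                            size_t n)
             {
                 if (n < 10 || h[0] != 31 || h[1] != 139)
                     return GZ_BAD_MAGIC;      /* ID1, ID2 */
                 if (h[2] != 8)
                     return GZ_BAD_METHOD;     /* CM must be deflate */
                 if (h[3] & 0xE0)
                     return GZ_BAD_RESERVED;   /* FLG bits 5-7 */
                 return GZ_OK;
             }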

3. References

   [1] "Information Processing - 8-bit single-byte coded graphic
       character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
       The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
       ASCII. Files defining this character set are available as
       iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/

   [2] ISO 3309

   [3] ITU-T recommendation V.42

   [4] Deutsch, L.P.,"DEFLATE Compressed Data Format Specification",
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/

   [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
       ftp://prep.ai.mit.edu/pub/gnu/

   [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
       Look-Up", Communications of the ACM, 31(8), pp.1008-1013.

   [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
       pp.118-133.

   [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
       describing the CRC concept.

4. Security Considerations

   Any data compression method involves the reduction of redundancy in
   the data.  Consequently, any corruption of the data is likely to have
   severe effects and be difficult to correct.  Uncompressed text, on
   the other hand, will probably still be readable despite the presence
   of some corrupted bytes.

   It is recommended that systems using this data format provide some
   means of validating the integrity of the compressed data, such as by
   setting and checking the CRC-32 check value.

5. Acknowledgements

   Trademarks cited in this document are the property of their
   respective owners.

   Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
   the related software described in this specification.  Glenn
   Randers-Pehrson converted this document to RFC and HTML format.

6. Author's Address

   L. Peter Deutsch
   Aladdin Enterprises
   203 Santa Margarita Ave.
   Menlo Park, CA 94025

   Phone: (415) 322-0103 (AM only)
   FAX:   (415) 322-1734
   EMail: <ghost@aladdin.com>

   Questions about the technical content of this specification can be
   sent by email to:

   Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
   Mark Adler <madler@alumni.caltech.edu>

   Editorial comments on this specification can be sent by email to:

   L. Peter Deutsch <ghost@aladdin.com> and
   Glenn Randers-Pehrson <randeg@alumni.rpi.edu>

7. Appendix: Jean-Loup Gailly's gzip utility

   The most widely used implementation of gzip compression, and the
   original documentation on which this specification is based, were
   created by Jean-Loup Gailly <gzip@prep.ai.mit.edu>.  Since this
   implementation is a de facto standard, we mention some more of its
   features here.  Again, the material in this section is not part of
   the specification per se, and implementations need not follow it to
   be compliant.

   When compressing or decompressing a file, gzip preserves the
   protection, ownership, and modification time attributes on the local
   file system, since there is no provision for representing protection
   attributes in the gzip file format itself.  Since the file format
   includes a modification time, the gzip decompressor provides a
   command line switch that assigns the modification time from the file,
   rather than the local modification time of the compressed input, to
   the decompressed output.

8. Appendix: Sample CRC Code

   The following sample code represents a practical implementation of
   the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42
   for a formal specification.)

   The sample code is in the ANSI C programming language. Non C users
   may find it easier to read with these hints:

      &      Bitwise AND operator.
      ^      Bitwise exclusive-OR operator.
      >>     Bitwise right shift operator. When applied to an
             unsigned quantity, as here, right shift inserts zero
             bit(s) at the left.
      !      Logical NOT operator.
      ++     "n++" increments the variable n.
      0xNNN  0x introduces a hexadecimal (base 16) constant.
             Suffix L indicates a long value (at least 32 bits).

      /* Table of CRCs of all 8-bit messages. */
      unsigned long crc_table[256];

      /* Flag: has the table been computed? Initially false. */
      int crc_table_computed = 0;

      /* Make the table for a fast CRC. */
      void make_crc_table(void)
      {
        unsigned long c;
        int n, k;
        for (n = 0; n < 256; n++) {
          c = (unsigned long) n;
          for (k = 0; k < 8; k++) {
            if (c & 1) {
              c = 0xedb88320L ^ (c >> 1);
            } else {
              c = c >> 1;
            }
          }
          crc_table[n] = c;
        }
        crc_table_computed = 1;
      }

      /*
         Update a running crc with the bytes buf[0..len-1] and return
       the updated crc. The crc should be initialized to zero. Pre- and
       post-conditioning (one's complement) is performed within this
       function so it shouldn't be done by the caller. Usage example:

         unsigned long crc = 0L;

         while (read_buffer(buffer, length) != EOF) {
           crc = update_crc(crc, buffer, length);
         }
         if (crc != original_crc) error();
      */
      unsigned long update_crc(unsigned long crc,
                      unsigned char *buf, int len)
      {
        unsigned long c = crc ^ 0xffffffffL;
        int n;

        if (!crc_table_computed)
          make_crc_table();
        for (n = 0; n < len; n++) {
          c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
        }
        return c ^ 0xffffffffL;
      }

      /* Return the CRC of the bytes buf[0..len-1]. */
      unsigned long crc(unsigned char *buf, int len)
      {
        return update_crc(0L, buf, len);
      }

Deleted compat/zlib/doc/txtvsbin.txt.

A Fast Method for Identifying Plain Text Files
==============================================


Introduction
------------

Given a file coming from an unknown source, it is sometimes desirable
to find out whether the format of that file is plain text.  Although
this may appear to be a simple task, fully accurate detection of the
file type requires heavy-duty semantic analysis of the file contents.
It is, however, possible to obtain satisfactory results by employing
various heuristics.

Previous versions of PKZip and other zip-compatible compression tools
used a crude detection scheme: if more than 80% (4/5) of the bytes
found in a certain buffer are within the range [7..127], the file is
labeled as plain text, otherwise it is labeled as binary.  A prominent
limitation of this scheme is the restriction to Latin-based alphabets.
Other alphabets, like Greek, Cyrillic or Asian, make extensive use of
the bytes within the range [128..255], and texts using these alphabets
are most often misidentified by this scheme; in other words, the rate
of false negatives is sometimes too high, which means that the recall
is low.  Another weakness of this scheme is a reduced precision, due to
the false positives that may occur when binary files containing large
amounts of textual characters are misidentified as plain text.

In this article we propose a new, simple detection scheme that features
a much increased precision and a near-100% recall.  This scheme is
designed to work on ASCII, Unicode and other ASCII-derived alphabets,
and it handles single-byte encodings (ISO-8859, MacRoman, KOI8, etc.)
and variable-sized encodings (ISO-2022, UTF-8, etc.).  Wider encodings
(UCS-2/UTF-16 and UCS-4/UTF-32) are not handled, however.


The Algorithm
-------------

The algorithm works by dividing the set of bytecodes [0..255] into three
categories:
- The white list of textual bytecodes:
  9 (TAB), 10 (LF), 13 (CR), 32 (SPACE) to 255.
- The gray list of tolerated bytecodes:
  7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB), 27 (ESC).
- The black list of undesired, non-textual bytecodes:
  0 (NUL) to 6, 14 to 31.

If a file contains at least one byte that belongs to the white list and
no byte that belongs to the black list, then the file is categorized as
plain text; otherwise, it is categorized as binary.  (The boundary case,
when the file is empty, automatically falls into the latter category.)


Rationale
---------

The idea behind this algorithm relies on two observations.

The first observation is that, although the full range of 7-bit codes
[0..127] is properly specified by the ASCII standard, most control
characters in the range [0..31] are not used in practice.  The only
widely-used, almost universally-portable control codes are 9 (TAB),
10 (LF) and 13 (CR).  There are a few more control codes that are
recognized on a reduced range of platforms and text viewers/editors:
7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB) and 27 (ESC); but these
codes are rarely (if ever) used alone, without being accompanied by
some printable text.  Even the newer, portable text formats such as
XML avoid using control characters outside the list mentioned here.

The second observation is that most of the binary files tend to contain
control characters, especially 0 (NUL).  Even though the older text
detection schemes observe the presence of non-ASCII codes from the range
[128..255], the precision rarely has to suffer if this upper range is
labeled as textual, because the files that are genuinely binary tend to
contain both control characters and codes from the upper range.  On the
other hand, the upper range needs to be labeled as textual, because it
is used by virtually all ASCII extensions.  In particular, this range is
used for encoding non-Latin scripts.

Since there is no counting involved, other than simply observing the
presence or the absence of some byte values, the algorithm produces
consistent results, regardless of what alphabet encoding is being used.
(If counting were involved, it could be possible to obtain different
results on a text encoded, say, using ISO-8859-16 versus UTF-8.)

There is an extra category of plain text files that are "polluted" with
one or more black-listed codes, either by mistake or by peculiar design
considerations.  In such cases, a scheme that tolerates a small fraction
of black-listed codes would provide an increased recall (i.e. more true
positives).  This, however, incurs a reduced precision overall, since
false positives are more likely to appear in binary files that contain
large chunks of textual data.  Furthermore, "polluted" plain text should
be regarded as binary by general-purpose text detection schemes, because
general-purpose text processing algorithms might not be applicable.
Under this premise, it is safe to say that our detection method provides
a near-100% recall.

Experiments have been run on many files coming from various platforms
and applications.  We tried plain text files, system logs, source code,
formatted office documents, compiled object code, etc.  The results
confirm the optimistic assumptions about the capabilities of this
algorithm.


--
Cosmin Truta
Last updated: 2006-May-28
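
(Editorial illustration, not part of the deleted zlib document or of the
Fossil sources: the white/gray/black-list rule described above fits in a
few lines of C.  The function name and the buffer-based interface are
invented for this sketch.)

#include <stddef.h>

/* Classify a buffer with the rule from the text above: at least one
** white-listed byte and no black-listed byte means plain text; an empty
** buffer counts as binary.  Returns 1 for text, 0 for binary. */
static int looks_like_plain_text(const unsigned char *buf, size_t n){
  size_t i;
  int sawWhite = 0;
  for(i=0; i<n; i++){
    unsigned char c = buf[i];
    if( c==9 || c==10 || c==13 || c>=32 ){
      sawWhite = 1;     /* white list: TAB, LF, CR, 32..255 */
    }else if( c==7 || c==8 || c==11 || c==12 || c==26 || c==27 ){
      /* gray list: tolerated, contributes nothing either way */
    }else{
      return 0;         /* remaining control codes: the black list */
    }
  }
  return sawWhite;      /* an empty buffer falls through as binary */
}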
Changes to src/clone.c.

@@ -173,14 +173,15 @@
     db_begin_transaction();
     db_record_repository_filename(g.argv[3]);
     db_initial_setup(0, 0, zDefaultUser);
     user_select();
     db_set("content-schema", CONTENT_SCHEMA, 0);
     db_set("aux-schema", AUX_SCHEMA_MAX, 0);
     db_set("rebuilt", get_version(), 0);
+    db_unset("hash-policy", 0);
     remember_or_get_http_auth(zHttpAuth, urlFlags & URL_REMEMBER, g.argv[2]);
     url_remember();
     if( g.zSSLIdentity!=0 ){
       /* If the --ssl-identity option was specified, store it as a setting */
       Blob fn;
       blob_zero(&fn);
       file_canonical_name(g.zSSLIdentity, &fn, 0);

Changes to src/configure.c.

@@ -127,14 +127,15 @@
   { "crnl-glob",              CONFIGSET_PROJ },
   { "encoding-glob",          CONFIGSET_PROJ },
   { "empty-dirs",             CONFIGSET_PROJ },
   { "allow-symlinks",         CONFIGSET_PROJ },
   { "dotfiles",               CONFIGSET_PROJ },
   { "parent-project-code",    CONFIGSET_PROJ },
   { "parent-project-name",    CONFIGSET_PROJ },
+  { "hash-policy",            CONFIGSET_PROJ },
 
 #ifdef FOSSIL_ENABLE_LEGACY_MV_RM
   { "mv-rm-files",            CONFIGSET_PROJ },
 #endif
 
   { "ticket-table",           CONFIGSET_TKT  },
   { "ticket-common",          CONFIGSET_TKT  },

Changes to src/content.c.

@@ -526,14 +526,18 @@
       /* No existing artifact with the auxiliary hash name.  Therefore, use
       ** the primary hash name. */
       blob_reset(&hash);
       hname_hash(pBlob, 0, &hash);
     }
   }else{
     blob_init(&hash, zUuid, -1);
+  }
+  if( g.eHashPolicy==HPOLICY_AUTO && blob_size(&hash)>HNAME_LEN_SHA1 ){
+    g.eHashPolicy = HPOLICY_SHA3;
+    db_set_int("hash-policy", HPOLICY_SHA3, 0);
   }
   if( nBlob ){
     size = nBlob;
   }else{
     size = blob_size(pBlob);
     if( srcId ){
       size = delta_output_size(blob_buffer(pBlob), size);

Changes to src/db.c.

@@ -1483,14 +1483,19 @@
   g.zRepositoryName = mprintf("%s", zDbName);
   db_open_or_attach(g.zRepositoryName, "repository");
   g.repositoryOpen = 1;
   /* Cache "allow-symlinks" option, because we'll need it on every stat call */
   g.allowSymlinks = db_get_boolean("allow-symlinks",
                                    db_allow_symlinks_by_default());
   g.zAuxSchema = db_get("aux-schema","");
+  g.eHashPolicy = db_get_int("hash-policy",-1);
+  if( g.eHashPolicy<0 ){
+    g.eHashPolicy = hname_default_policy();
+    db_set_int("hash-policy", g.eHashPolicy, 0);
+  }
 
   /* If the ALIAS table is not present, then some on-the-fly schema
   ** updates might be required.
   */
   rebuild_schema_update_2_0();   /* Do the Fossil-2.0 schema updates */
 }
 
@@ -1826,14 +1831,15 @@
       " SELECT name,value,mtime FROM settingSrc.config"
       "  WHERE (name IN %s OR name IN %s)"
       "    AND name NOT GLOB 'project-*'"
       "    AND name NOT GLOB 'short-project-*';",
       configure_inop_rhs(CONFIGSET_ALL),
       db_setting_inop_rhs()
     );
+    g.eHashPolicy = db_get_int("hash-policy", g.eHashPolicy);
     db_multi_exec(
       "REPLACE INTO reportfmt SELECT * FROM settingSrc.reportfmt;"
     );
 
     /*
     ** Copy the user permissions, contact information, last modified
     ** time, and photo for all the "system" users from the supplied
@@ -1898,17 +1904,18 @@
 ** repository is used, almost all of the settings accessible from the setup
 ** page, either directly or indirectly, will be copied.  Normal users and
 ** their associated permissions will not be copied; however, the system
 ** default users "anonymous", "nobody", "reader", "developer", and their
 ** associated permissions will be copied.
 **
 ** Options:
-**    --template      FILE      copy settings from repository file
-**    --admin-user|-A USERNAME  select given USERNAME as admin user
-**    --date-override DATETIME  use DATETIME as time of the initial check-in
+**    --template      FILE         Copy settings from repository file
+**    --admin-user|-A USERNAME     Select given USERNAME as admin user
+**    --date-override DATETIME     Use DATETIME as time of the initial check-in
+**    --sha1                       Use a initial hash policy of "sha1"
 **
 ** DATETIME may be "now" or "YYYY-MM-DDTHH:MM:SS.SSS". If in
 ** year-month-day form, it may be truncated, the "T" may be replaced by
 ** a space, and it may also name a timezone offset from UTC as "-HH:MM"
 ** (westward) or "+HH:MM" (eastward). Either no timezone suffix or "Z"
 ** means UTC.
 **
@@ -1915,18 +1922,21 @@
 ** See also: clone
 */
 void create_repository_cmd(void){
   char *zPassword;
   const char *zTemplate;      /* Repository from which to copy settings */
   const char *zDate;          /* Date of the initial check-in */
   const char *zDefaultUser;   /* Optional name of the default user */
+  int bUseSha1 = 0;           /* True to set the hash-policy to sha1 */
+
 
   zTemplate = find_option("template",0,1);
   zDate = find_option("date-override",0,1);
   zDefaultUser = find_option("admin-user","A",1);
+  bUseSha1 = find_option("sha1",0,0)!=0;
   /* We should be done with options.. */
   verify_all_options();
 
   if( g.argc!=3 ){
     usage("REPOSITORY-NAME");
   }
 
@@ -1935,14 +1945,18 @@
   }
 
   db_create_repository(g.argv[2]);
   db_open_repository(g.argv[2]);
   db_open_config(0, 0);
   if( zTemplate ) db_attach(zTemplate, "settingSrc");
   db_begin_transaction();
+  if( bUseSha1 ){
+    g.eHashPolicy = HPOLICY_SHA1;
+    db_set_int("hash-policy", HPOLICY_SHA1, 0);
+  }
   if( zDate==0 ) zDate = "now";
   db_initial_setup(zTemplate, zDate, zDefaultUser);
   db_end_transaction(0);
   if( zTemplate ) db_detach("settingSrc");
   fossil_print("project-id: %s\n", db_get("project-code", 0));
   fossil_print("server-id:  %s\n", db_get("server-code", 0));
   zPassword = db_text(0, "SELECT pw FROM user WHERE login=%Q", g.zLogin);

Changes to src/diffcmd.c.

@@ -149,14 +149,17 @@
 }
 
 /*
 ** Show the difference between two files, one in memory and one on disk.
 **
 ** The difference is the set of edits needed to transform pFile1 into
 ** zFile2.  The content of pFile1 is in memory.  zFile2 exists on disk.
+**
+** If fSwapDiff is 1, show the set of edits to transform zFile2 into pFile1
+** instead of the opposite.
 **
 ** Use the internal diff logic if zDiffCmd is NULL.  Otherwise call the
 ** command zDiffCmd to do the diffing.
 **
 ** When using an external diff program, zBinGlob contains the GLOB patterns
 ** for file names to treat as binary.  If fIncludeBinary is zero, these files
 ** will be skipped in addition to files that may contain binary content.
@@ -165,15 +168,16 @@
   Blob *pFile1,             /* In memory content to compare from */
   int isBin1,               /* Does the 'from' content appear to be binary */
   const char *zFile2,       /* On disk content to compare to */
   const char *zName,        /* Display name of the file */
   const char *zDiffCmd,     /* Command for comparison */
   const char *zBinGlob,     /* Treat file names matching this as binary */
   int fIncludeBinary,       /* Include binary files for external diff */
-  u64 diffFlags             /* Flags to control the diff */
+  u64 diffFlags,            /* Flags to control the diff */
+  int fSwapDiff             /* Diff from Zfile2 to Pfile1 */
 ){
   if( zDiffCmd==0 ){
     Blob out;                 /* Diff output text */
     Blob file2;               /* Content of zFile2 */
     const char *zName2;       /* Name of zFile2 for display */
 
     /* Read content of zFile2 into memory */
@@ -192,15 +196,19 @@
     /* Compute and output the differences */
     if( diffFlags & DIFF_BRIEF ){
       if( blob_compare(pFile1, &file2) ){
         fossil_print("CHANGED  %s\n", zName);
       }
     }else{
       blob_zero(&out);
-      text_diff(pFile1, &file2, &out, 0, diffFlags);
+      if( fSwapDiff ){
+        text_diff(&file2, pFile1, &out, 0, diffFlags);
+      }else{
+        text_diff(pFile1, &file2, &out, 0, diffFlags);
+      }
       if( blob_size(&out) ){
         diff_print_filenames(zName, zName2, diffFlags);
         fossil_print("%s\n", blob_str(&out));
       }
       blob_reset(&out);
     }
 
@@ -250,17 +258,23 @@
       blob_appendf(&nameFile1, "%s~%d", zFile2, cnt++);
     }while( file_access(blob_str(&nameFile1),F_OK)==0 );
     blob_write_to_file(pFile1, blob_str(&nameFile1));
 
     /* Construct the external diff command */
     blob_zero(&cmd);
     blob_appendf(&cmd, "%s ", zDiffCmd);
-    shell_escape(&cmd, blob_str(&nameFile1));
-    blob_append(&cmd, " ", 1);
-    shell_escape(&cmd, zFile2);
+    if( fSwapDiff ){
+      shell_escape(&cmd, zFile2);
+      blob_append(&cmd, " ", 1);
+      shell_escape(&cmd, blob_str(&nameFile1));
+    }else{
+      shell_escape(&cmd, blob_str(&nameFile1));
+      blob_append(&cmd, " ", 1);
+      shell_escape(&cmd, zFile2);
+    }
 
     /* Run the external diff command */
     fossil_system(blob_str(&cmd));
 
     /* Delete the temporary file and clean up memory used */
     file_delete(blob_str(&nameFile1));
     blob_reset(&nameFile1);
@@ -480,15 +494,15 @@
         content_get(srcid, &content);
       }else{
         blob_zero(&content);
       }
       isBin = fIncludeBinary ? 0 : looks_like_binary(&content);
       diff_print_index(zPathname, diffFlags);
       diff_file(&content, isBin, zFullName, zPathname, zDiffCmd,
-                zBinGlob, fIncludeBinary, diffFlags);
+                zBinGlob, fIncludeBinary, diffFlags, 0);
       blob_reset(&content);
     }
     blob_reset(&fname);
   }
   db_finalize(&q);
   db_end_transaction(1);  /* ROLLBACK */
 }
@@ -517,15 +531,15 @@
   while( db_step(&q)==SQLITE_ROW ){
     char *zFullName;
     const char *zFile = (const char*)db_column_text(&q, 0);
     if( !file_dir_match(pFileDir, zFile) ) continue;
     zFullName = mprintf("%s%s", g.zLocalRoot, zFile);
     db_column_blob(&q, 1, &content);
     diff_file(&content, 0, zFullName, zFile,
-              zDiffCmd, zBinGlob, fIncludeBinary, diffFlags);
+              zDiffCmd, zBinGlob, fIncludeBinary, diffFlags, 0);
     fossil_free(zFullName);
     blob_reset(&content);
   }
   db_finalize(&q);
 }
 
 /*

Changes to src/doc.c.

@@ -733,15 +733,15 @@
   db_end_transaction(0);
   return;
 
   /* Jump here when unable to locate the document */
 doc_not_found:
   db_end_transaction(0);
   if( isUV && P("name")==0 ){
-    uvstat_page();
+    uvlist_page();
     return;
   }
   cgi_set_status(404, "Not Found");
   style_header("Not Found");
   @ <p>Document %h(zOrigName) not found
   if( fossil_strcmp(zCheckin,"ckout")!=0 ){
     @ in %z(href("%R/tree?ci=%T",zCheckin))%h(zCheckin)</a>

Changes to src/encode.c.

@@ -334,14 +334,104 @@
       }
     }
     z[j++] = c;
   }
   if( z[j] ) z[j] = 0;
 }
 
+
+/*
+** The *pz variable points to a UTF8 string.  Read the next character
+** off of that string and return its codepoint value.  Advance *pz to the
+** next character
+*/
+u32 fossil_utf8_read(
+  const unsigned char **pz    /* Pointer to string from which to read char */
+){
+  unsigned int c;
+
+  /*
+  ** This lookup table is used to help decode the first byte of
+  ** a multi-byte UTF8 character.
+  */
+  static const unsigned char utf8Trans1[] = {
+    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+    0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+    0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
+    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+    0x00, 0x01, 0x02, 0x03, 0x00, 0x01, 0x00, 0x00,
+  };
+
+  c = *((*pz)++);
+  if( c>=0xc0 ){
+    c = utf8Trans1[c-0xc0];
+    while( (*(*pz) & 0xc0)==0x80 ){
+      c = (c<<6) + (0x3f & *((*pz)++));
+    }
+    if( c<0x80
+        || (c&0xFFFFF800)==0xD800
+        || (c&0xFFFFFFFE)==0xFFFE ){  c = 0xFFFD; }
+  }
+  return c;
+}
+
+/*
+** Encode a UTF8 string for JSON.  All special characters are escaped.
+*/
+void blob_append_json_string(Blob *pBlob, const char *zStr){
+  const unsigned char *z;
+  char *zOut;
+  u32 c;
+  int n, i, j;
+  z = (const unsigned char*)zStr;
+  n = 0;
+  while( (c = fossil_utf8_read(&z))!=0 ){
+    if( c=='\\' || c=='"' ){
+      n += 2;
+    }else if( c<' ' || c>=0x7f ){
+      if( c=='\n' || c=='\r' ){
+        n += 2;
+      }else{
+        n += 6;
+      }
+    }else{
+      n++;
+    }
+  }
+  i = blob_size(pBlob);
+  blob_resize(pBlob, i+n);
+  zOut = blob_buffer(pBlob);
+  z = (const unsigned char*)zStr;
+  while( (c = fossil_utf8_read(&z))!=0 ){
+    if( c=='\\' ){
+      zOut[i++] = '\\';
+      zOut[i++] = c;
+    }else if( c<' ' || c>=0x7f ){
+      zOut[i++] = '\\';
+      if( c=='\n' ){
+        zOut[i++] = 'n';
+      }else if( c=='\r' ){
+        zOut[i++] = 'r';
+      }else{
+        zOut[i++] = 'u';
+        for(j=3; j>=0; j--){
+          zOut[i+j] = "0123456789abcdef"[c&0xf];
+          c >>= 4;
+        }
+        i += 4;
+      }
+    }else{
+      zOut[i++] = c;
+    }
+  }
+  zOut[i] = 0;
+}
 
 /*
 ** The characters used for HTTP base64 encoding.
 */
 static unsigned char zBase[] =
   "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
 

Changes to src/hname.c.

@@ -14,15 +14,17 @@
 **   http://www.hwaci.com/drh/
 **
 *******************************************************************************
 **
 ** This file contains generic code for dealing with hashes used for
 ** naming artifacts.  Specific hash algorithms are implemented separately
 ** (for example in sha1.c and sha3.c).  This file contains the generic
-** interface code.
+** interface logic.
+**
+** "hname" is intended to be an abbreviation of "hash name".
 */
 #include "config.h"
 #include "hname.h"
 
 
 #if INTERFACE
 /*
@@ -45,14 +47,23 @@
 #define HNAME_LEN_K256   64
 
 /*
 ** The number of distinct hash algorithms:
 */
 #define HNAME_COUNT 2     /* Just SHA1 and SHA3-256. Let's keep it that way! */
 
+/*
+** Hash naming policies
+*/
+#define HPOLICY_SHA1           0      /* Use SHA1 hashes */
+#define HPOLICY_AUTO           1      /* SHA1 but auto-promote to SHA3 */
+#define HPOLICY_SHA3           2      /* Use SHA3 hashes */
+#define HPOLICY_SHA3_ONLY      3      /* Use SHA3 hashes exclusively */
+#define HPOLICY_SHUN_SHA1      4      /* Shun all SHA1 objects */
+
 #endif /* INTERFACE */
 
 /*
 ** Return a human-readable name for the hash algorithm given a hash with
 ** a length of nHash hexadecimal digits.
 */
 const char *hname_alg(int nHash){
@@ -140,28 +151,134 @@
   return id;
 }
 
 /*
 ** Compute a hash on blob pContent.  Write the hash into blob pHashOut.
 ** This routine assumes that pHashOut is uninitialized.
 **
-** The preferred hash is used for iHType==0, and various alternative hashes
-** are used for iHType>0 && iHType<NHAME_COUNT.
+** The preferred hash is used for iHType==0 and the alternative hash is
+** used if iHType==1.  (The interface is designed to accommodate more than
+** just two hashes, but HNAME_COUNT is currently fixed at 2.)
+**
+** Depending on the hash policy, the alternative hash may be disallowed.
+** If the alterative hash is disallowed, the routine returns 0.  This
+** routine returns 1 if iHType>0 and the alternative hash is allowed,
+** and it always returns 1 when iHType==0.
+**
+** Alternative hash is disallowed for all hash policies except auto,
+** sha1 and sha3.
 */
-void hname_hash(const Blob *pContent, unsigned int iHType, Blob *pHashOut){
-#if RELEASE_VERSION_NUMBER>=20100
-  /* For Fossil 2.1 and later, the preferred hash algorithm is SHA3-256 and
-  ** SHA1 is the secondary hash algorithm. */
-  switch( iHType ){
-    case 0:  sha3sum_blob(pContent, 256, pHashOut); break;
-    case 1:  sha1sum_blob(pContent, pHashOut);      break;
-  }
-#else
-  /* Prior to Fossil 2.1, the preferred hash algorithm is SHA1 (for backwards
-  ** compatibility with Fossil 1.x) and SHA3-256 is the only auxiliary
-  ** algorithm */
-  switch( iHType ){
-    case 0:  sha1sum_blob(pContent, pHashOut);      break;
-    case 1:  sha3sum_blob(pContent, 256, pHashOut); break;
-  }
-#endif
-}
+int hname_hash(const Blob *pContent, unsigned int iHType, Blob *pHashOut){
+  assert( iHType==0 || iHType==1 );
+  if( iHType==1 ){
+    switch( g.eHashPolicy ){
+      case HPOLICY_AUTO:
+      case HPOLICY_SHA1:
+        sha3sum_blob(pContent, 256, pHashOut);
+        return 1;
+      case HPOLICY_SHA3:
+        sha1sum_blob(pContent, pHashOut);
+        return 1;
+    }
+  }
+  if( iHType==0 ){
+    switch( g.eHashPolicy ){
+      case HPOLICY_SHA1:
+      case HPOLICY_AUTO:
+        sha1sum_blob(pContent, pHashOut);
+        return 1;
+      case HPOLICY_SHA3:
+      case HPOLICY_SHA3_ONLY:
+      case HPOLICY_SHUN_SHA1:
+        sha3sum_blob(pContent, 256, pHashOut);
+        return 1;
+    }
+  }
+  blob_init(pHashOut, 0, 0);
+  return 0;
+}
+
+/*
+** Return the default hash policy for repositories that do not currently
+** have an assigned hash policy.
+**
+** Make the default HPOLICY_AUTO if there are SHA1 artficates but no SHA3
+** artifacts in the repository.  Make the default HPOLICY_SHA3 if there
+** are one or more SHA3 artifacts or if the repository is initially empty.
+*/
+int hname_default_policy(void){
+  if( db_exists("SELECT 1 FROM blob WHERE length(uuid)>40")
+   || !db_exists("SELECT 1 FROM blob WHERE length(uuid)==40")
+  ){
+    return HPOLICY_SHA3;
+  }else{
+    return HPOLICY_AUTO;
+  }
+}
+
+/*
+** Names of the hash policies.
+*/
+static const char *azPolicy[] = {
+  "sha1", "auto", "sha3", "sha3-only", "shun-sha1"
+};
+
+/* Return the name of the current hash policy.
+*/
+const char *hpolicy_name(void){
+  return azPolicy[g.eHashPolicy];
+}
+
+
+/*
+** COMMAND: hash-policy*
+**
+** Usage: fossil hash-policy ?NEW-POLICY?
+**
+** Query or set the hash policy for the current repository.  Available hash
+** policies are as follows:
+**
+**   sha1              New artifact names are created using SHA1
+**
+**   auto              New artifact names are created using SHA1, but
+**                     automatically change the policy to "sha3" when
+**                     any SHA3 artifact enters the repository.
+**
+**   sha3              New artifact names are created using SHA3, but
+**                     older artifacts with SHA1 names may be reused.
+**
+**   sha3-only         Use only SHA3 artifact names.  Do not reuse legacy
+**                     SHA1 names.
+**
+**   shun-sha1         Shun any SHA1 artifacts received by sync operations
+**                     other than clones.  Older legacy SHA1 artifacts are
+**                     are allowed during a clone.
+**
+** The default hash policy for existing repositories is "auto", which will
+** immediately promote to "sha3" if the repository contains one or more
+** artifacts with SHA3 names.  The default hash policy for new repositories
+** is "shun-sha1".
+*/
+void hash_policy_command(void){
+  int i;
+  db_find_and_open_repository(0, 0);
+  if( g.argc!=2 && g.argc!=3 ) usage("?NEW-POLICY?");
+  if( g.argc==2 ){
+    fossil_print("%s\n", azPolicy[g.eHashPolicy]);
+    return;
+  }
+  for(i=HPOLICY_SHA1; i<=HPOLICY_SHUN_SHA1; i++){
+    if( fossil_strcmp(g.argv[2],azPolicy[i])==0 ){
+      if( i==HPOLICY_AUTO
+       && db_exists("SELECT 1 FROM blob WHERE length(uuid)>40")
+      ){
+        i = HPOLICY_SHA3;
+      }
+      g.eHashPolicy = i;
+      db_set_int("hash-policy", i, 0);
+      fossil_print("%s\n", azPolicy[i]);
+      return;
+    }
+  }
+  fossil_fatal("unknown hash policy \"%s\" - should be one of: sha1 auto"
+               " sha3 sha3-only shun-sha1", g.argv[2]);
+}
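
(Editorial note, not part of the check-in: hname_hash() now returns an int
instead of void, so callers can detect when the current hash policy forbids
the alternative hash.  A hypothetical caller, with invented variable names,
might look like this.)

  Blob altHash;
  if( hname_hash(pBlob, 1, &altHash) ){
    /* The policy allows an alternative hash; altHash now holds it. */
  }else{
    /* Alternative hashing is disallowed; altHash was initialized empty. */
  }
  blob_reset(&altHash);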

Changes to src/main.c.

@@ -138,14 +138,15 @@
   char *zRepositoryOption; /* Most recent cached repository option value */
   char *zRepositoryName;  /* Name of the repository database file */
   char *zLocalDbName;     /* Name of the local database file */
   char *zOpenRevision;    /* Check-in version to use during database open */
   int localOpen;          /* True if the local database is open */
   char *zLocalRoot;       /* The directory holding the  local database */
   int minPrefix;          /* Number of digits needed for a distinct UUID */
+  int eHashPolicy;        /* Current hash policy.  One of HPOLICY_* */
   int fNoDirSymlinks;     /* True if --no-dir-symlinks flag is present */
   int fSqlTrace;          /* True if --sqltrace flag is present */
   int fSqlStats;          /* True if --sqltrace or --sqlstats are present */
   int fSqlPrint;          /* True if -sqlprint flag is present */
   int fQuiet;             /* True if -quiet flag is present */
   int fJail;              /* True if running with a chroot jail */
   int fHttpTrace;         /* Trace outbound HTTP requests */
@@ -2003,15 +2004,15 @@
 ** Open the repository to be served if it is known.  If g.argv[arg] is
 ** a directory full of repositories, then set g.zRepositoryName to
 ** the name of that directory and the specific repository will be
 ** opened later by process_one_web_page() based on the content of
 ** the PATH_INFO variable.
 **
 ** If the fCreate flag is set, then create the repository if it
-** does not already exist.
+** does not already exist. Always use "auto" hash-policy in this case.
 */
 static void find_server_repository(int arg, int fCreate){
   if( g.argc<=arg ){
     db_must_be_within_tree();
   }else{
     const char *zRepo = g.argv[arg];
     int isDir = file_isdir(zRepo);
@@ -2020,14 +2021,16 @@
       file_simplify_name(g.zRepositoryName, -1, 0);
     }else{
       if( isDir==0 && fCreate ){
         const char *zPassword;
         db_create_repository(zRepo);
         db_open_repository(zRepo);
         db_begin_transaction();
+        g.eHashPolicy = HPOLICY_AUTO;
+        db_set_int("hash-policy", HPOLICY_AUTO, 0);
         db_initial_setup(0, "now", g.zLogin);
         db_end_transaction(0);
         fossil_print("project-id: %s\n", db_get("project-code", 0));
         fossil_print("server-id:  %s\n", db_get("server-code", 0));
         zPassword = db_text(0, "SELECT pw FROM user WHERE login=%Q", g.zLogin);
         fossil_print("admin-user: %s (initial password is \"%s\")\n",
                      g.zLogin, zPassword);

Changes to src/sha3.c.

@@ -376,22 +376,22 @@
     A44 =   B4 ^((~B0)&  B1 );
   }
 }
 
 /*
 ** Initialize a new hash.  iSize determines the size of the hash
 ** in bits and should be one of 224, 256, 384, or 512.  Or iSize
-** can be zero to use the default hash size of 224 bits.
+** can be zero to use the default hash size of 256 bits.
 */
 static void SHA3Init(SHA3Context *p, int iSize){
   memset(p, 0, sizeof(*p));
   if( iSize>=128 && iSize<=512 ){
     p->nRate = (1600 - ((iSize + 31)&~31)*2)/8;
   }else{
-    p->nRate = 144;
+    p->nRate = (1600 - 2*256)/8;
   }
 #if SHA3_BYTEORDER==1234
   /* Known to be little-endian at compile-time. No-op */
 #elif SHA3_BYTEORDER==4321
   p->ixMask = 7;  /* Big-endian */
 #else
   {
@@ -426,15 +426,15 @@
         KeccakF1600Step(p);
         p->nLoaded = 0;
       }
     }
   }
 #endif
   for(; i<nData; i++){
-#if SHA1_BYTEORDER==1234
+#if SHA3_BYTEORDER==1234
     p->u.x[p->nLoaded] ^= aData[i];
 #elif SHA3_BYTEORDER==4321
     p->u.x[p->nLoaded^0x07] ^= aData[i];
 #else
     p->u.x[p->nLoaded^p->ixMask] ^= aData[i];
 #endif
     p->nLoaded++;
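
(Editorial note, not part of the check-in: with the new 256-bit default,
(1600 - 2*256)/8 evaluates to 136 bytes of sponge rate, replacing the
hard-coded 144 bytes that corresponded to the old 224-bit default,
i.e. (1600 - 2*224)/8 = 144.)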

Changes to src/shun.c.

@@ -24,14 +24,15 @@
 /*
 ** Return true if the given artifact ID should be shunned.
 */
 int uuid_is_shunned(const char *zUuid){
   static Stmt q;
   int rc;
   if( zUuid==0 || zUuid[0]==0 ) return 0;
+  if( g.eHashPolicy==HPOLICY_SHUN_SHA1 && zUuid[HNAME_LEN_SHA1]==0 ) return 1;
   db_static_prepare(&q, "SELECT 1 FROM shun WHERE uuid=:uuid");
   db_bind_text(&q, ":uuid", zUuid);
   rc = db_step(&q);
   db_reset(&q);
   return rc==SQLITE_ROW;
 }

Changes to src/sqlcmd.c.

@@ -210,14 +210,17 @@
 **                              all files contained in check-in X.  Example:
 **                                SELECT * FROM files_of_checkin('trunk');
 */
 void cmd_sqlite3(void){
   int noRepository;
   const char *zConfigDb;
   extern int sqlite3_shell(int, char**);
+#ifdef FOSSIL_ENABLE_TH1_HOOKS
+  g.fNoThHook = 1;
+#endif
   noRepository = find_option("no-repository", 0, 0)!=0;
   if( !noRepository ){
     db_find_and_open_repository(OPEN_ANY_SCHEMA, 0);
   }
   db_open_config(1,0);
   zConfigDb = g.zConfigDbName;
   fossil_close(1, noRepository);

Changes to src/stash.c.

330
331
332
333
334
335
336


337
338
339
340
341
342
343
344
345
346
347
348
349
350
351


352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372

373
374






375
376
377
378
379
380
381
382
383
384
385
...
431
432
433
434
435
436
437

438
439


440
441
442
443
444
445
446
...
458
459
460
461
462
463
464
465

466
467
468
469
470
471
472

473
474
475
476
477
478
479
...
652
653
654
655
656
657
658

659

660
661
662
663
664
665
666



667
668
669
670
671
672
673
674
675
676
677
678
679
680
      diff_print_index(zNew, diffFlags);
      isBin1 = 0;
      isBin2 = fIncludeBinary ? 0 : looks_like_binary(&a);
      diff_file_mem(&empty, &a, isBin1, isBin2, zNew, zDiffCmd,
                    zBinGlob, fIncludeBinary, diffFlags);
    }else if( isRemoved ){
      fossil_print("DELETE %s\n", zOrig);


      if( fBaseline==0 ){
        if( file_wd_islink(zOPath) ){
          blob_read_link(&a, zOPath);
        }else{
          blob_read_from_file(&a, zOPath);
        }
      }else{
        content_get(rid, &a);
      }
      diff_print_index(zNew, diffFlags);
      isBin1 = fIncludeBinary ? 0 : looks_like_binary(&a);
      isBin2 = 0;
      diff_file_mem(&a, &empty, isBin1, isBin2, zOrig, zDiffCmd,
                    zBinGlob, fIncludeBinary, diffFlags);
    }else{


      Blob delta, disk;
      int isOrigLink = file_wd_islink(zOPath);
      db_ephemeral_blob(&q, 6, &delta);
      if( fBaseline==0 ){
        if( isOrigLink ){
          blob_read_link(&disk, zOPath);
        }else{
          blob_read_from_file(&disk, zOPath);
        }
      }
      fossil_print("CHANGED %s\n", zNew);
      if( !isOrigLink != !isLink ){
        diff_print_index(zNew, diffFlags);
        diff_print_filenames(zOrig, zNew, diffFlags);
        printf(DIFF_CANNOT_COMPUTE_SYMLINK);
      }else{
        Blob *pBase = fBaseline ? &a : &disk;
        content_get(rid, &a);
        blob_delta_apply(&a, &delta, &b);
        isBin1 = fIncludeBinary ? 0 : looks_like_binary(pBase);
        isBin2 = fIncludeBinary ? 0 : looks_like_binary(&b);

        diff_file_mem(fBaseline? &a : &disk, &b, isBin1, isBin2, zNew,
                      zDiffCmd, zBinGlob, fIncludeBinary, diffFlags);






        blob_reset(&a);
        blob_reset(&b);
      }
      if( !fBaseline ) blob_reset(&disk);
      blob_reset(&delta);
    }
 }
  db_finalize(&q);
}

/*
................................................................................
**
**  fossil stash list|ls ?-v|--verbose? ?-W|--width <num>?
**
**     List all changes sets currently stashed.  Show information about
**     individual files in each changeset if -v or --verbose is used.
**
**  fossil stash show|cat ?STASHID? ?DIFF-OPTIONS?

**
**     Show the contents of a stash.


**
**  fossil stash pop
**  fossil stash apply ?STASHID?
**
**     Apply STASHID or the most recently create stash to the current
**     working checkout.  The "pop" command deletes that changeset from
**     the stash after applying it but the "apply" command retains the
................................................................................
**     -a|--all flag is used.  Individual drops are undoable but -a|--all
**     is not.
**
**  fossil stash diff ?STASHID? ?DIFF-OPTIONS?
**  fossil stash gdiff ?STASHID? ?DIFF-OPTIONS?
**
**     Show diffs of the current working directory and what that
**     directory would be if STASHID were applied.

**
** SUMMARY:
**  fossil stash
**  fossil stash save ?-m|--comment COMMENT? ?FILES...?
**  fossil stash snapshot ?-m|--comment COMMENT? ?FILES...?
**  fossil stash list|ls ?-v|--verbose? ?-W|--width <num>?
**  fossil stash show|cat ?STASHID? ?DIFF-OPTIONS?

**  fossil stash pop
**  fossil stash apply|goto ?STASHID?
**  fossil stash drop|rm ?STASHID? ?-a|--all?
**  fossil stash diff ?STASHID? ?DIFF-OPTIONS?
**  fossil stash gdiff ?STASHID? ?DIFF-OPTIONS?
*/
void stash_cmd(void){
................................................................................
                  "(SELECT origname FROM stashfile WHERE stashid=%d)",
                  stashid);
    undo_finish();
  }else
  if( memcmp(zCmd, "diff", nCmd)==0
   || memcmp(zCmd, "gdiff", nCmd)==0
   || memcmp(zCmd, "show", nCmd)==0

   || memcmp(zCmd, "cat", nCmd)==0

  ){
    const char *zDiffCmd = 0;
    const char *zBinGlob = 0;
    int fIncludeBinary = 0;
    int fBaseline = zCmd[0]=='s' || zCmd[0]=='c';
    u64 diffFlags;




    if( find_option("tk",0,0)!=0 ){
      db_close(0);
      diff_tk(fBaseline ? "stash show" : "stash diff", 3);
      return;
    }
    if( find_option("internal","i",0)==0 ){
      zDiffCmd = diff_command_external(memcmp(zCmd, "gdiff", nCmd)==0);
    }
    diffFlags = diff_options();
    if( find_option("verbose","v",0)!=0 ) diffFlags |= DIFF_VERBOSE;
    if( g.argc>4 ) usage(mprintf("%s ?STASHID? ?DIFF-OPTIONS?", zCmd));
    if( zDiffCmd ){
      zBinGlob = diff_get_binary_glob();
      fIncludeBinary = diff_include_binary_files();







>
>
|
<
<
<
<
<
<

<
<
|
<
|
|
|
>
>
|


<
<
<
<
<
<
<






<


|

>
|
|
>
>
>
>
>
>



<







 







>

|
>
>







 







|
>







>







 







>

>




|


>
>
>






|







330
331
332
333
334
335
336
337
338
339






340


341

342
343
344
345
346
347
348
349







350
351
352
353
354
355

356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371

372
373
374
375
376
377
378
...
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
...
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
...
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
      diff_print_index(zNew, diffFlags);
      isBin1 = 0;
      isBin2 = fIncludeBinary ? 0 : looks_like_binary(&a);
      diff_file_mem(&empty, &a, isBin1, isBin2, zNew, zDiffCmd,
                    zBinGlob, fIncludeBinary, diffFlags);
    }else if( isRemoved ){
      fossil_print("DELETE %s\n", zOrig);
      diff_print_index(zNew, diffFlags);
      isBin2 = 0;
      if( fBaseline ){






        content_get(rid, &a);


        isBin1 = fIncludeBinary ? 0 : looks_like_binary(&a);

        diff_file_mem(&a, &empty, isBin1, isBin2, zOrig, zDiffCmd,
                      zBinGlob, fIncludeBinary, diffFlags);
      }else{
      }
    }else{
      Blob delta;
      int isOrigLink = file_wd_islink(zOPath);
      db_ephemeral_blob(&q, 6, &delta);







      fossil_print("CHANGED %s\n", zNew);
      if( !isOrigLink != !isLink ){
        diff_print_index(zNew, diffFlags);
        diff_print_filenames(zOrig, zNew, diffFlags);
        printf(DIFF_CANNOT_COMPUTE_SYMLINK);
      }else{

        content_get(rid, &a);
        blob_delta_apply(&a, &delta, &b);
        isBin1 = fIncludeBinary ? 0 : looks_like_binary(&a);
        isBin2 = fIncludeBinary ? 0 : looks_like_binary(&b);
        if( fBaseline ){
          diff_file_mem(&a, &b, isBin1, isBin2, zNew,
                        zDiffCmd, zBinGlob, fIncludeBinary, diffFlags);
        }else{
          /*Diff with file on disk using fSwapDiff=1 to show the diff in the
            same direction as if fBaseline=1.*/
          diff_file(&b, isBin2, zOPath, zNew, zDiffCmd,
              zBinGlob, fIncludeBinary, diffFlags, 1);
        }
        blob_reset(&a);
        blob_reset(&b);
      }

      blob_reset(&delta);
    }
 }
  db_finalize(&q);
}

/*
................................................................................
**
**  fossil stash list|ls ?-v|--verbose? ?-W|--width <num>?
**
**     List all changes sets currently stashed.  Show information about
**     individual files in each changeset if -v or --verbose is used.
**
**  fossil stash show|cat ?STASHID? ?DIFF-OPTIONS?
**  fossil stash gshow|gcat ?STASHID? ?DIFF-OPTIONS?
**
**     Show the contents of a stash as a diff against it's baseline.
**     With gshow and gcat, gdiff-command is used instead of internal
**     diff logic.
**
**  fossil stash pop
**  fossil stash apply ?STASHID?
**
**     Apply STASHID or the most recently create stash to the current
**     working checkout.  The "pop" command deletes that changeset from
**     the stash after applying it but the "apply" command retains the
................................................................................
**     -a|--all flag is used.  Individual drops are undoable but -a|--all
**     is not.
**
**  fossil stash diff ?STASHID? ?DIFF-OPTIONS?
**  fossil stash gdiff ?STASHID? ?DIFF-OPTIONS?
**
**     Show diffs of the current working directory and what that
**     directory would be if STASHID were applied. With gdiff,
**     gdiff-command is used instead of internal diff logic.
**
** SUMMARY:
**  fossil stash
**  fossil stash save ?-m|--comment COMMENT? ?FILES...?
**  fossil stash snapshot ?-m|--comment COMMENT? ?FILES...?
**  fossil stash list|ls ?-v|--verbose? ?-W|--width <num>?
**  fossil stash show|cat ?STASHID? ?DIFF-OPTIONS?
**  fossil stash gshow|gcat ?STASHID? ?DIFF-OPTIONS?
**  fossil stash pop
**  fossil stash apply|goto ?STASHID?
**  fossil stash drop|rm ?STASHID? ?-a|--all?
**  fossil stash diff ?STASHID? ?DIFF-OPTIONS?
**  fossil stash gdiff ?STASHID? ?DIFF-OPTIONS?
*/
void stash_cmd(void){
................................................................................
                  "(SELECT origname FROM stashfile WHERE stashid=%d)",
                  stashid);
    undo_finish();
  }else
  if( memcmp(zCmd, "diff", nCmd)==0
   || memcmp(zCmd, "gdiff", nCmd)==0
   || memcmp(zCmd, "show", nCmd)==0
   || memcmp(zCmd, "gshow", nCmd)==0
   || memcmp(zCmd, "cat", nCmd)==0
   || memcmp(zCmd, "gcat", nCmd)==0
  ){
    const char *zDiffCmd = 0;
    const char *zBinGlob = 0;
    int fIncludeBinary = 0;
    int fBaseline = 0;
    u64 diffFlags;

    if( strstr(zCmd,"show")!=0 || strstr(zCmd,"cat")!=0 ){
      fBaseline = 1;
    }
    if( find_option("tk",0,0)!=0 ){
      db_close(0);
      diff_tk(fBaseline ? "stash show" : "stash diff", 3);
      return;
    }
    if( find_option("internal","i",0)==0 ){
      zDiffCmd = diff_command_external(zCmd[0]=='g');
    }
    diffFlags = diff_options();
    if( find_option("verbose","v",0)!=0 ) diffFlags |= DIFF_VERBOSE;
    if( g.argc>4 ) usage(mprintf("%s ?STASHID? ?DIFF-OPTIONS?", zCmd));
    if( zDiffCmd ){
      zBinGlob = diff_get_binary_glob();
      fIncludeBinary = diff_include_binary_files();
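
A couple of hypothetical invocations of the new gshow/gcat subcommands
documented above (the STASHID value 3 is made up for illustration):

   fossil stash gshow 3     (diff stash 3 against its baseline using gdiff-command)
   fossil stash gcat        (same, for the most recently created stash)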

Changes to src/stat.c.

  @ <tr><th>Fossil&nbsp;Version:</th><td>
  @ %h(MANIFEST_DATE) %h(MANIFEST_VERSION)
  @ (%h(RELEASE_VERSION)) <a href='version?verbose=1'>(details)</a>
  @ </td></tr>
  @ <tr><th>SQLite&nbsp;Version:</th><td>%.19s(sqlite3_sourceid())
  @ [%.10s(&sqlite3_sourceid()[20])] (%s(sqlite3_libversion()))
  @ <a href='version?verbose=2'>(details)</a></td></tr>
  @ <tr><th>Schema&nbsp;Version:</th><td>%h(g.zAuxSchema)</td></tr>

  @ <tr><th>Repository Rebuilt:</th><td>
  @ %h(db_get_mtime("rebuilt","%Y-%m-%d %H:%M:%S","Never"))
  @ By Fossil %h(db_get("rebuilt","Unknown"))</td></tr>
  @ <tr><th>Database&nbsp;Stats:</th><td>
  @ %d(db_int(0, "PRAGMA repository.page_count")) pages,
  @ %d(db_int(0, "PRAGMA repository.page_size")) bytes/page,
  @ %d(db_int(0, "PRAGMA repository.freelist_count")) free pages,

  @ <tr><th>Fossil&nbsp;Version:</th><td>
  @ %h(MANIFEST_DATE) %h(MANIFEST_VERSION)
  @ (%h(RELEASE_VERSION)) <a href='version?verbose=1'>(details)</a>
  @ </td></tr>
  @ <tr><th>SQLite&nbsp;Version:</th><td>%.19s(sqlite3_sourceid())
  @ [%.10s(&sqlite3_sourceid()[20])] (%s(sqlite3_libversion()))
  @ <a href='version?verbose=2'>(details)</a></td></tr>
  if( g.eHashPolicy!=HPOLICY_AUTO ){
    @ <tr><th>Schema&nbsp;Version:</th><td>%h(g.zAuxSchema),
    @ %s(hpolicy_name())</td></tr>
  }else{
    @ <tr><th>Schema&nbsp;Version:</th><td>%h(g.zAuxSchema)</td></tr>
  }
  @ <tr><th>Repository Rebuilt:</th><td>
  @ %h(db_get_mtime("rebuilt","%Y-%m-%d %H:%M:%S","Never"))
  @ By Fossil %h(db_get("rebuilt","Unknown"))</td></tr>
  @ <tr><th>Database&nbsp;Stats:</th><td>
  @ %d(db_int(0, "PRAGMA repository.page_count")) pages,
  @ %d(db_int(0, "PRAGMA repository.page_size")) bytes/page,
  @ %d(db_int(0, "PRAGMA repository.freelist_count")) free pages,

Changes to src/unversioned.c.

**
** Display a list of all unversioned files in the repository.
** Query parameters:
**
**    byage=1          Order the initial display by decreasing age
**    showdel=0        Show deleted files
*/
void uvstat_page(void){
  Stmt q;
  sqlite3_int64 iNow;
  sqlite3_int64 iTotalSz = 0;
  int cnt = 0;
  int n = 0;
  const char *zOrderBy = "name";
  int showDel = 0;
................................................................................
     @ </table></div>
     output_table_sorting_javascript("uvtab","tkKttN",1);
   }else{
     @ No unversioned files on this server.
   }
   style_footer();
}

**
** Display a list of all unversioned files in the repository.
** Query parameters:
**
**    byage=1          Order the initial display by decreasing age
**    showdel=0        Show deleted files
*/
void uvlist_page(void){
  Stmt q;
  sqlite3_int64 iNow;
  sqlite3_int64 iTotalSz = 0;
  int cnt = 0;
  int n = 0;
  const char *zOrderBy = "name";
  int showDel = 0;
................................................................................
     @ </table></div>
     output_table_sorting_javascript("uvtab","tkKttN",1);
   }else{
     @ No unversioned files on this server.
   }
   style_footer();
}

/*
** WEBPAGE: juvlist
**
** Return a complete list of unversioned files as JSON.  The JSON
** looks like this:
**
** [{"name":NAME,
**   "mtime":MTIME,
**   "hash":HASH,
**   "size":SIZE,
**   "user":USER}]
*/
void uvlist_json_page(void){
  Stmt q;
  char *zSep = "[";
  Blob json;

  login_check_credentials();
  if( !g.perm.Read ){ login_needed(g.anon.Read); return; }
  cgi_set_content_type("text/json");
  if( !db_table_exists("repository","unversioned") ){
    blob_init(&json, "[]", -1);
    cgi_set_content(&json);
    return;
  }
  blob_init(&json, 0, 0);
  db_prepare(&q,
     "SELECT"
     "   name,"
     "   mtime,"
     "   hash,"
     "   sz,"
     "   (SELECT login FROM rcvfrom, user"
     "     WHERE user.uid=rcvfrom.uid AND rcvfrom.rcvid=unversioned.rcvid)"
     " FROM unversioned WHERE hash IS NOT NULL"
   );
   while( db_step(&q)==SQLITE_ROW ){
     const char *zName = db_column_text(&q, 0);
     sqlite3_int64 mtime = db_column_int(&q, 1);
     const char *zHash = db_column_text(&q, 2);
     int fullSize = db_column_int(&q, 3);
     const char *zLogin = db_column_text(&q, 4);
     if( zLogin==0 ) zLogin = "";
     blob_appendf(&json, "%s{\"name\":\"", zSep);
     zSep = ",\n ";
     blob_append_json_string(&json, zName);
     blob_appendf(&json, "\",\n  \"mtime\":%lld,\n  \"hash\":\"", mtime);
     blob_append_json_string(&json, zHash);
     blob_appendf(&json, "\",\n  \"size\":%d,\n  \"user\":\"", fullSize);
     blob_append_json_string(&json, zLogin);
     blob_appendf(&json, "\"}");
   }
   db_finalize(&q);
   blob_appendf(&json,"]\n");
   cgi_set_content(&json);
}
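
For illustration, a /juvlist response built by the code above follows the
shape given in the comment; with hypothetical name, hash, size, and user
values it would look roughly like this:

   [{"name":"fossil-src-2.1.tar.gz",
     "mtime":1489363200,
     "hash":"0123456789abcdef0123456789abcdef01234567",
     "size":5242880,
     "user":"anonymous"}]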

Changes to src/wiki.c.

** timestamp. Returns 0 if there is no such item and -1 if the details
** are ambiguous and could refer to multiple items.
*/
int wiki_technote_to_rid(const char *zETime) {
  int rid=0;                    /* Artifact ID of the tech note */
  int nETime = strlen(zETime);
  Stmt q;
  if( nETime>=4 && hname_validate(zETime, nETime) ){
    char zUuid[HNAME_MAX+1];
    memcpy(zUuid, zETime, nETime+1);
    canonical16(zUuid, nETime);
    db_prepare(&q,
      "SELECT e.objid"
      "  FROM event e, tag t"
      " WHERE e.type='e' AND e.tagid IS NOT NULL AND t.tagid=e.tagid"

** timestamp. Returns 0 if there is no such item and -1 if the details
** are ambiguous and could refer to multiple items.
*/
int wiki_technote_to_rid(const char *zETime) {
  int rid=0;                    /* Artifact ID of the tech note */
  int nETime = strlen(zETime);
  Stmt q;
  if( nETime>=4 && nETime<=HNAME_MAX && validate16(zETime, nETime) ){
    char zUuid[HNAME_MAX+1];
    memcpy(zUuid, zETime, nETime+1);
    canonical16(zUuid, nETime);
    db_prepare(&q,
      "SELECT e.objid"
      "  FROM event e, tag t"
      " WHERE e.type='e' AND e.tagid IS NOT NULL AND t.tagid=e.tagid"

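The practical effect of the revised test above is that a tech note can now
be identified by an abbreviated hexadecimal hash prefix of four or more
digits instead of only a full-length hash.  A standalone sketch of that
acceptance rule (this helper is illustrative only, not part of the Fossil
sources):

   #include <string.h>
   #include <ctype.h>

   #define HNAME_MAX 64   /* longest hash name: SHA3-256 is 64 hex digits */

   /* Return 1 if zArg could be an abbreviated artifact hash: between 4 and
   ** HNAME_MAX characters, all hexadecimal, mirroring the revised check in
   ** wiki_technote_to_rid(). */
   static int could_be_hash_prefix(const char *zArg){
     size_t i, n = strlen(zArg);
     if( n<4 || n>HNAME_MAX ) return 0;
     for(i=0; i<n; i++){
       if( !isxdigit((unsigned char)zArg[i]) ) return 0;
     }
     return 1;
   }
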
Changes to src/xfer.c.

  transport_stats(0, 0, 1);
  socket_global_init();
  memset(&xfer, 0, sizeof(xfer));
  xfer.pIn = &recv;
  xfer.pOut = &send;
  xfer.mxSend = db_get_int("max-upload", 250000);
  xfer.maxTime = -1;

  if( syncFlags & SYNC_PRIVATE ){
    g.perm.Private = 1;
    xfer.syncPrivate = 1;
  }

  blobarray_zero(xfer.aToken, count(xfer.aToken));
  blob_zero(&send);

  transport_stats(0, 0, 1);
  socket_global_init();
  memset(&xfer, 0, sizeof(xfer));
  xfer.pIn = &recv;
  xfer.pOut = &send;
  xfer.mxSend = db_get_int("max-upload", 250000);
  xfer.maxTime = -1;
  xfer.clientVersion = RELEASE_VERSION_NUMBER;
  if( syncFlags & SYNC_PRIVATE ){
    g.perm.Private = 1;
    xfer.syncPrivate = 1;
  }

  blobarray_zero(xfer.aToken, count(xfer.aToken));
  blob_zero(&send);

Changes to win/Makefile.mingw.mistachkin.

  $(SRCDIR)/finfo.c \
  $(SRCDIR)/foci.c \
  $(SRCDIR)/fshell.c \
  $(SRCDIR)/fusefs.c \
  $(SRCDIR)/glob.c \
  $(SRCDIR)/graph.c \
  $(SRCDIR)/gzip.c \

  $(SRCDIR)/http.c \
  $(SRCDIR)/http_socket.c \
  $(SRCDIR)/http_ssl.c \
  $(SRCDIR)/http_transport.c \
  $(SRCDIR)/import.c \
  $(SRCDIR)/info.c \
  $(SRCDIR)/json.c \
................................................................................
  $(SRCDIR)/regexp.c \
  $(SRCDIR)/report.c \
  $(SRCDIR)/rss.c \
  $(SRCDIR)/schema.c \
  $(SRCDIR)/search.c \
  $(SRCDIR)/setup.c \
  $(SRCDIR)/sha1.c \


  $(SRCDIR)/shun.c \
  $(SRCDIR)/sitemap.c \
  $(SRCDIR)/skins.c \
  $(SRCDIR)/sqlcmd.c \
  $(SRCDIR)/stash.c \
  $(SRCDIR)/stat.c \
  $(SRCDIR)/statrep.c \
................................................................................
  $(OBJDIR)/finfo_.c \
  $(OBJDIR)/foci_.c \
  $(OBJDIR)/fshell_.c \
  $(OBJDIR)/fusefs_.c \
  $(OBJDIR)/glob_.c \
  $(OBJDIR)/graph_.c \
  $(OBJDIR)/gzip_.c \

  $(OBJDIR)/http_.c \
  $(OBJDIR)/http_socket_.c \
  $(OBJDIR)/http_ssl_.c \
  $(OBJDIR)/http_transport_.c \
  $(OBJDIR)/import_.c \
  $(OBJDIR)/info_.c \
  $(OBJDIR)/json_.c \
................................................................................
  $(OBJDIR)/regexp_.c \
  $(OBJDIR)/report_.c \
  $(OBJDIR)/rss_.c \
  $(OBJDIR)/schema_.c \
  $(OBJDIR)/search_.c \
  $(OBJDIR)/setup_.c \
  $(OBJDIR)/sha1_.c \


  $(OBJDIR)/shun_.c \
  $(OBJDIR)/sitemap_.c \
  $(OBJDIR)/skins_.c \
  $(OBJDIR)/sqlcmd_.c \
  $(OBJDIR)/stash_.c \
  $(OBJDIR)/stat_.c \
  $(OBJDIR)/statrep_.c \
................................................................................
 $(OBJDIR)/finfo.o \
 $(OBJDIR)/foci.o \
 $(OBJDIR)/fshell.o \
 $(OBJDIR)/fusefs.o \
 $(OBJDIR)/glob.o \
 $(OBJDIR)/graph.o \
 $(OBJDIR)/gzip.o \

 $(OBJDIR)/http.o \
 $(OBJDIR)/http_socket.o \
 $(OBJDIR)/http_ssl.o \
 $(OBJDIR)/http_transport.o \
 $(OBJDIR)/import.o \
 $(OBJDIR)/info.o \
 $(OBJDIR)/json.o \
................................................................................
 $(OBJDIR)/regexp.o \
 $(OBJDIR)/report.o \
 $(OBJDIR)/rss.o \
 $(OBJDIR)/schema.o \
 $(OBJDIR)/search.o \
 $(OBJDIR)/setup.o \
 $(OBJDIR)/sha1.o \


 $(OBJDIR)/shun.o \
 $(OBJDIR)/sitemap.o \
 $(OBJDIR)/skins.o \
 $(OBJDIR)/sqlcmd.o \
 $(OBJDIR)/stash.o \
 $(OBJDIR)/stat.o \
 $(OBJDIR)/statrep.o \
................................................................................
		$(OBJDIR)/finfo_.c:$(OBJDIR)/finfo.h \
		$(OBJDIR)/foci_.c:$(OBJDIR)/foci.h \
		$(OBJDIR)/fshell_.c:$(OBJDIR)/fshell.h \
		$(OBJDIR)/fusefs_.c:$(OBJDIR)/fusefs.h \
		$(OBJDIR)/glob_.c:$(OBJDIR)/glob.h \
		$(OBJDIR)/graph_.c:$(OBJDIR)/graph.h \
		$(OBJDIR)/gzip_.c:$(OBJDIR)/gzip.h \

		$(OBJDIR)/http_.c:$(OBJDIR)/http.h \
		$(OBJDIR)/http_socket_.c:$(OBJDIR)/http_socket.h \
		$(OBJDIR)/http_ssl_.c:$(OBJDIR)/http_ssl.h \
		$(OBJDIR)/http_transport_.c:$(OBJDIR)/http_transport.h \
		$(OBJDIR)/import_.c:$(OBJDIR)/import.h \
		$(OBJDIR)/info_.c:$(OBJDIR)/info.h \
		$(OBJDIR)/json_.c:$(OBJDIR)/json.h \
................................................................................
		$(OBJDIR)/regexp_.c:$(OBJDIR)/regexp.h \
		$(OBJDIR)/report_.c:$(OBJDIR)/report.h \
		$(OBJDIR)/rss_.c:$(OBJDIR)/rss.h \
		$(OBJDIR)/schema_.c:$(OBJDIR)/schema.h \
		$(OBJDIR)/search_.c:$(OBJDIR)/search.h \
		$(OBJDIR)/setup_.c:$(OBJDIR)/setup.h \
		$(OBJDIR)/sha1_.c:$(OBJDIR)/sha1.h \


		$(OBJDIR)/shun_.c:$(OBJDIR)/shun.h \
		$(OBJDIR)/sitemap_.c:$(OBJDIR)/sitemap.h \
		$(OBJDIR)/skins_.c:$(OBJDIR)/skins.h \
		$(OBJDIR)/sqlcmd_.c:$(OBJDIR)/sqlcmd.h \
		$(OBJDIR)/stash_.c:$(OBJDIR)/stash.h \
		$(OBJDIR)/stat_.c:$(OBJDIR)/stat.h \
		$(OBJDIR)/statrep_.c:$(OBJDIR)/statrep.h \
................................................................................
$(OBJDIR)/gzip_.c:	$(SRCDIR)/gzip.c $(TRANSLATE)
	$(TRANSLATE) $(SRCDIR)/gzip.c >$@

$(OBJDIR)/gzip.o:	$(OBJDIR)/gzip_.c $(OBJDIR)/gzip.h $(SRCDIR)/config.h
	$(XTCC) -o $(OBJDIR)/gzip.o -c $(OBJDIR)/gzip_.c

$(OBJDIR)/gzip.h:	$(OBJDIR)/headers

$(OBJDIR)/http_.c:	$(SRCDIR)/http.c $(TRANSLATE)
	$(TRANSLATE) $(SRCDIR)/http.c >$@

$(OBJDIR)/http.o:	$(OBJDIR)/http_.c $(OBJDIR)/http.h $(SRCDIR)/config.h
	$(XTCC) -o $(OBJDIR)/http.o -c $(OBJDIR)/http_.c

................................................................................
$(OBJDIR)/sha1_.c:	$(SRCDIR)/sha1.c $(TRANSLATE)
	$(TRANSLATE) $(SRCDIR)/sha1.c >$@

$(OBJDIR)/sha1.o:	$(OBJDIR)/sha1_.c $(OBJDIR)/sha1.h $(SRCDIR)/config.h
	$(XTCC) -o $(OBJDIR)/sha1.o -c $(OBJDIR)/sha1_.c

$(OBJDIR)/sha1.h:	$(OBJDIR)/headers

$(OBJDIR)/shun_.c:	$(SRCDIR)/shun.c $(TRANSLATE)
	$(TRANSLATE) $(SRCDIR)/shun.c >$@

$(OBJDIR)/shun.o:	$(OBJDIR)/shun_.c $(OBJDIR)/shun.h $(SRCDIR)/config.h
	$(XTCC) -o $(OBJDIR)/shun.o -c $(OBJDIR)/shun_.c

  $(SRCDIR)/finfo.c \
  $(SRCDIR)/foci.c \
  $(SRCDIR)/fshell.c \
  $(SRCDIR)/fusefs.c \
  $(SRCDIR)/glob.c \
  $(SRCDIR)/graph.c \
  $(SRCDIR)/gzip.c \
  $(SRCDIR)/hname.c \
  $(SRCDIR)/http.c \
  $(SRCDIR)/http_socket.c \
  $(SRCDIR)/http_ssl.c \
  $(SRCDIR)/http_transport.c \
  $(SRCDIR)/import.c \
  $(SRCDIR)/info.c \
  $(SRCDIR)/json.c \
................................................................................
  $(SRCDIR)/regexp.c \
  $(SRCDIR)/report.c \
  $(SRCDIR)/rss.c \
  $(SRCDIR)/schema.c \
  $(SRCDIR)/search.c \
  $(SRCDIR)/setup.c \
  $(SRCDIR)/sha1.c \
  $(SRCDIR)/sha1hard.c \
  $(SRCDIR)/sha3.c \
  $(SRCDIR)/shun.c \
  $(SRCDIR)/sitemap.c \
  $(SRCDIR)/skins.c \
  $(SRCDIR)/sqlcmd.c \
  $(SRCDIR)/stash.c \
  $(SRCDIR)/stat.c \
  $(SRCDIR)/statrep.c \
................................................................................
  $(OBJDIR)/finfo_.c \
  $(OBJDIR)/foci_.c \
  $(OBJDIR)/fshell_.c \
  $(OBJDIR)/fusefs_.c \
  $(OBJDIR)/glob_.c \
  $(OBJDIR)/graph_.c \
  $(OBJDIR)/gzip_.c \
  $(OBJDIR)/hname_.c \
  $(OBJDIR)/http_.c \
  $(OBJDIR)/http_socket_.c \
  $(OBJDIR)/http_ssl_.c \
  $(OBJDIR)/http_transport_.c \
  $(OBJDIR)/import_.c \
  $(OBJDIR)/info_.c \
  $(OBJDIR)/json_.c \
................................................................................
  $(OBJDIR)/regexp_.c \
  $(OBJDIR)/report_.c \
  $(OBJDIR)/rss_.c \
  $(OBJDIR)/schema_.c \
  $(OBJDIR)/search_.c \
  $(OBJDIR)/setup_.c \
  $(OBJDIR)/sha1_.c \
  $(OBJDIR)/sha1hard_.c \
  $(OBJDIR)/sha3_.c \
  $(OBJDIR)/shun_.c \
  $(OBJDIR)/sitemap_.c \
  $(OBJDIR)/skins_.c \
  $(OBJDIR)/sqlcmd_.c \
  $(OBJDIR)/stash_.c \
  $(OBJDIR)/stat_.c \
  $(OBJDIR)/statrep_.c \
................................................................................
 $(OBJDIR)/finfo.o \
 $(OBJDIR)/foci.o \
 $(OBJDIR)/fshell.o \
 $(OBJDIR)/fusefs.o \
 $(OBJDIR)/glob.o \
 $(OBJDIR)/graph.o \
 $(OBJDIR)/gzip.o \
 $(OBJDIR)/hname.o \
 $(OBJDIR)/http.o \
 $(OBJDIR)/http_socket.o \
 $(OBJDIR)/http_ssl.o \
 $(OBJDIR)/http_transport.o \
 $(OBJDIR)/import.o \
 $(OBJDIR)/info.o \
 $(OBJDIR)/json.o \
................................................................................
 $(OBJDIR)/regexp.o \
 $(OBJDIR)/report.o \
 $(OBJDIR)/rss.o \
 $(OBJDIR)/schema.o \
 $(OBJDIR)/search.o \
 $(OBJDIR)/setup.o \
 $(OBJDIR)/sha1.o \
 $(OBJDIR)/sha1hard.o \
 $(OBJDIR)/sha3.o \
 $(OBJDIR)/shun.o \
 $(OBJDIR)/sitemap.o \
 $(OBJDIR)/skins.o \
 $(OBJDIR)/sqlcmd.o \
 $(OBJDIR)/stash.o \
 $(OBJDIR)/stat.o \
 $(OBJDIR)/statrep.o \
................................................................................
		$(OBJDIR)/finfo_.c:$(OBJDIR)/finfo.h \
		$(OBJDIR)/foci_.c:$(OBJDIR)/foci.h \
		$(OBJDIR)/fshell_.c:$(OBJDIR)/fshell.h \
		$(OBJDIR)/fusefs_.c:$(OBJDIR)/fusefs.h \
		$(OBJDIR)/glob_.c:$(OBJDIR)/glob.h \
		$(OBJDIR)/graph_.c:$(OBJDIR)/graph.h \
		$(OBJDIR)/gzip_.c:$(OBJDIR)/gzip.h \
		$(OBJDIR)/hname_.c:$(OBJDIR)/hname.h \
		$(OBJDIR)/http_.c:$(OBJDIR)/http.h \
		$(OBJDIR)/http_socket_.c:$(OBJDIR)/http_socket.h \
		$(OBJDIR)/http_ssl_.c:$(OBJDIR)/http_ssl.h \
		$(OBJDIR)/http_transport_.c:$(OBJDIR)/http_transport.h \
		$(OBJDIR)/import_.c:$(OBJDIR)/import.h \
		$(OBJDIR)/info_.c:$(OBJDIR)/info.h \
		$(OBJDIR)/json_.c:$(OBJDIR)/json.h \
................................................................................
		$(OBJDIR)/regexp_.c:$(OBJDIR)/regexp.h \
		$(OBJDIR)/report_.c:$(OBJDIR)/report.h \
		$(OBJDIR)/rss_.c:$(OBJDIR)/rss.h \
		$(OBJDIR)/schema_.c:$(OBJDIR)/schema.h \
		$(OBJDIR)/search_.c:$(OBJDIR)/search.h \
		$(OBJDIR)/setup_.c:$(OBJDIR)/setup.h \
		$(OBJDIR)/sha1_.c:$(OBJDIR)/sha1.h \
		$(OBJDIR)/sha1hard_.c:$(OBJDIR)/sha1hard.h \
		$(OBJDIR)/sha3_.c:$(OBJDIR)/sha3.h \
		$(OBJDIR)/shun_.c:$(OBJDIR)/shun.h \
		$(OBJDIR)/sitemap_.c:$(OBJDIR)/sitemap.h \
		$(OBJDIR)/skins_.c:$(OBJDIR)/skins.h \
		$(OBJDIR)/sqlcmd_.c:$(OBJDIR)/sqlcmd.h \
		$(OBJDIR)/stash_.c:$(OBJDIR)/stash.h \
		$(OBJDIR)/stat_.c:$(OBJDIR)/stat.h \
		$(OBJDIR)/statrep_.c:$(OBJDIR)/statrep.h \
................................................................................
$(OBJDIR)/gzip_.c:	$(SRCDIR)/gzip.c $(TRANSLATE)
	$(TRANSLATE) $(SRCDIR)/gzip.c >$@

$(OBJDIR)/gzip.o:	$(OBJDIR)/gzip_.c $(OBJDIR)/gzip.h $(SRCDIR)/config.h
	$(XTCC) -o $(OBJDIR)/gzip.o -c $(OBJDIR)/gzip_.c

$(OBJDIR)/gzip.h:	$(OBJDIR)/headers

$(OBJDIR)/hname_.c:	$(SRCDIR)/hname.c $(TRANSLATE)
	$(TRANSLATE) $(SRCDIR)/hname.c >$@

$(OBJDIR)/hname.o:	$(OBJDIR)/hname_.c $(OBJDIR)/hname.h $(SRCDIR)/config.h
	$(XTCC) -o $(OBJDIR)/hname.o -c $(OBJDIR)/hname_.c

$(OBJDIR)/hname.h:	$(OBJDIR)/headers

$(OBJDIR)/http_.c:	$(SRCDIR)/http.c $(TRANSLATE)
	$(TRANSLATE) $(SRCDIR)/http.c >$@

$(OBJDIR)/http.o:	$(OBJDIR)/http_.c $(OBJDIR)/http.h $(SRCDIR)/config.h
	$(XTCC) -o $(OBJDIR)/http.o -c $(OBJDIR)/http_.c

................................................................................
$(OBJDIR)/sha1_.c:	$(SRCDIR)/sha1.c $(TRANSLATE)
	$(TRANSLATE) $(SRCDIR)/sha1.c >$@

$(OBJDIR)/sha1.o:	$(OBJDIR)/sha1_.c $(OBJDIR)/sha1.h $(SRCDIR)/config.h
	$(XTCC) -o $(OBJDIR)/sha1.o -c $(OBJDIR)/sha1_.c

$(OBJDIR)/sha1.h:	$(OBJDIR)/headers

$(OBJDIR)/sha1hard_.c:	$(SRCDIR)/sha1hard.c $(TRANSLATE)
	$(TRANSLATE) $(SRCDIR)/sha1hard.c >$@

$(OBJDIR)/sha1hard.o:	$(OBJDIR)/sha1hard_.c $(OBJDIR)/sha1hard.h $(SRCDIR)/config.h
	$(XTCC) -o $(OBJDIR)/sha1hard.o -c $(OBJDIR)/sha1hard_.c

$(OBJDIR)/sha1hard.h:	$(OBJDIR)/headers

$(OBJDIR)/sha3_.c:	$(SRCDIR)/sha3.c $(TRANSLATE)
	$(TRANSLATE) $(SRCDIR)/sha3.c >$@

$(OBJDIR)/sha3.o:	$(OBJDIR)/sha3_.c $(OBJDIR)/sha3.h $(SRCDIR)/config.h
	$(XTCC) -o $(OBJDIR)/sha3.o -c $(OBJDIR)/sha3_.c

$(OBJDIR)/sha3.h:	$(OBJDIR)/headers

$(OBJDIR)/shun_.c:	$(SRCDIR)/shun.c $(TRANSLATE)
	$(TRANSLATE) $(SRCDIR)/shun.c >$@

$(OBJDIR)/shun.o:	$(OBJDIR)/shun_.c $(OBJDIR)/shun.h $(SRCDIR)/config.h
	$(XTCC) -o $(OBJDIR)/shun.o -c $(OBJDIR)/shun_.c

Changes to www/changes.wiki.

1











2
3
4
5
6
7
8
<title>Change Log</title>

<a name='v2_0'></a>
<h2>Changes for Version 2.0 (2017-03-03)</h2>

  *  Use the
     [https://github.com/cr-marcstevens/sha1collisiondetection|hardened SHA1]
     implementation by Marc Stevens and Dan Shumow.

<title>Change Log</title>

<a name='v2_1'></a>
<h2>Changes for Version 2.1 (2017-03-??)</h2>

  *  Add support for [./hashpolicy.wiki|hash policies] that control which
     of the Hardened-SHA1 or SHA3-256 algorithms is used to name new
     artifacts.
  *  Add the "gshow" and "gcat" subcommands to [/help?cmd=stash|fossil stash].
  *  Add the [/help?cmd=/juvlist|/juvlist] web page and use it to construct
     the [/uv/download.html|Download Page] of the Fossil self-hosting website
     using Ajax.

<a name='v2_0'></a>
<h2>Changes for Version 2.0 (2017-03-03)</h2>

  *  Use the
     [https://github.com/cr-marcstevens/sha1collisiondetection|hardened SHA1]
     implementation by Marc Stevens and Dan Shumow.

Added www/hashpolicy.wiki.

<title>Hash Policy</title>

<h2> Executive Summary, Or How To Avoid Reading This Article </h2>

There is much angst over the [http://www.shattered.io|Shattered attack]
against SHA1.  If you are concerned about this and its implications for
Fossil, simply upgrade to Fossil 2.0 or later and the problem will go away.
Everything will continue to work as before.  All of your legacy repositories
will continue to work and all of your old check-ins will still have the
same name.  Your workflow will be unchanged.

But if you are curious and want a deeper understanding of what is
going on, read on...


<h2> Introduction </h2>

The first snapshot-based distributed version control system
was [http://www.monotone.ca|Monotone].  Many of the ideas behind the design
of Fossil were copied from Monotone, including the use of a SHA1 hash to
assign names to artifacts.  Git and Mercurial did the same thing.

The SHA1 hash algorithm is used only to create names for artifacts in Fossil
(and in Git, Mercurial, and Monotone).  It is not used for security.
Nevertheless, when the [http://www.shattered.io|Shattered attack] found
two different PDF files with the same SHA1 hash, many users learned that
"SHA1 is broken".  They see that Fossil (and Git, Mercurial, and Monotone)
use SHA1 and they therefore conclude that "Fossil is broken".  This is
not true, but it is a public relations problem.  So the decision
was made to migrate Fossil away from SHA1.

This article describes how that migration is occurring.

<h2>Use Of Hardened SHA1</h2>

In Fossil version 2.0 ([/timeline?c=version-2.0|2017-03-03]),
the internal SHA1 implementation was changed from a generic
FIPS PUB 180-4 SHA1 implementation to a "Hardened SHA1"
&#91;[https://github.com/cr-marcstevens/sha1collisiondetection|1]&#93;
&#91;[https://marc-stevens.nl/research/papers/C13-S.pdf|2]&#93;.

The Hardened SHA1 implementation automatically detects when the artifact
being hashed is specifically designed to exploit the known weaknesses
in the SHA1 algorithm, and when it detects such an attack it changes
the hash algorithm (by increasing the number of rounds in the compression
function) to make the algorithm secure again.  If the attack detection
gets a false positive, that means that Hardened SHA1 will get a different
answer than the standard FIPS PUB 180-4 SHA1, but the creators of
Hardened SHA1 (see the second paper
&#91;[https://marc-stevens.nl/research/papers/C13-S.pdf|2]&#93;)
report that the probability of a false positive is vanishingly small -
less than 1 false positive out of 10<sup><font size=1>27</font></sup>
hashes.

Hardened SHA1 is slower (and a lot bigger) but Fossil does not do that
much hashing, so performance is not really an issue.

All versions of Fossil moving forward will use Hardened SHA1.  So if
someone says "SHA1 is broken, and Fossil uses SHA1, therefore Fossil is
broken", you can rebut the argument by pointing out that Fossil uses
<em>Hardened SHA1</em> not generic SHA1 and Hardened SHA1 is <em>not</em>
broken.
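
As a small illustration of what "detects ... and changes the hash algorithm"
means in code, here is a sketch written against the public interface of the
upstream sha1collisiondetection library (the SHA1DC* names below are that
library's; Fossil's own copy lives in src/sha1hard.c and may differ in
detail):

<blockquote><verbatim>
#include <stddef.h>
#include "sha1.h"    /* from the sha1collisiondetection library */

/* Hash nData bytes of zData into aOut[].  A nonzero return means the
** input looked like a SHA1 collision attack, in which case the digest
** was produced by the hardened (extra-round) variant rather than by
** standard SHA1. */
int hardened_sha1(const char *zData, size_t nData, unsigned char aOut[20]){
  SHA1_CTX ctx;
  SHA1DCInit(&ctx);
  SHA1DCUpdate(&ctx, zData, nData);
  return SHA1DCFinal(aOut, &ctx);
}
</verbatim></blockquote>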

<h2>Support For SHA3-256</h2>

Prior to Fossil version 2.0 ([/timeline?c=version-2.0|2017-03-03]),
all artifacts in all Fossil repositories were named
by only a SHA1 hash.
Version 2.0 extended the [./fileformat.wiki|Fossil file format]
to allow artifacts to be named by either SHA1 or SHA3-256 hashes.
(SHA3-256 is the only variant of SHA3 that
Fossil uses for artifact naming, so for the remainder of this article
it will be called simply "SHA3".  Similarly, "Hardened SHA1" will
be shortened to "SHA1" in the sequel.)

Other than permitting the use of SHA3 in addition to SHA1, there
were no file format changes in Fossil version 2.0 relative
to the previous version 1.37.  Both Fossil 2.0 and Fossil 1.37 read
and write all the same repositories and sync with one another, as long
as none of the repositories contain artifacts named using SHA3.  If
a repository does contain artifacts named using SHA3, Fossil 1.37 will
not know how to interpret those artifacts and will generate various warnings
and errors.

<h2>How Fossil Decides Which Hash Algorithm To Use</h2>

If newer versions of Fossil are able to use either SHA1 or SHA3 to
name artifacts, which hash algorithm is actually used?  That question
is answered by the "hash policy".  These are the supported hash policies:

<table cellpadding=10>
<tr>
<td valign='top'>sha1</td>
<td>Name all new artifacts using the (Hardened) SHA1 hash algorithm.</td>
</tr>
<tr>
<td valign='top'>auto</td>
<td>Name new artifacts using the SHA1 hash algorithm.  But if any
artifacts are encountered which are already named using SHA3, then
automatically switch the hash policy to "sha3".</td>
</tr>
<tr>
<td valign='top'>sha3</td>
<td>Name new artifacts using the SHA3 hash algorithm if the artifact
does not already have a SHA1 name.  If the artifact already has a SHA1
name, then continue to use the older SHA1 name.  Use SHA3 for new
artifacts that have never before been encountered.</td>
</tr>
<tr>
<td valign='top'>sha3-only</td>
<td>Name new artifacts using the SHA3 hash algorithm even if the artifact
already has a SHA1 name.  In other words, force the use of SHA3.  This can
cause some artifacts to be added to the repository twice, once under their
SHA1 name and again under their SHA3 name.  But delta compression will
prevent that from causing repository size problems.</td>
</tr>
<tr>
<td valign='top'>shun-sha1</td>
<td>Like "sha3-only" but in addition do not accept a push of SHA1-named
artifacts.  If another Fossil instance tries to push a SHA1-named artifact,
that artifact is discarded and ignored.</td>
</tr>
</table>
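
For example, the policy in effect for a repository can be queried or changed
from within an open checkout using the hash-policy command (the "auto" shown
in the comment below is only an example of possible output):

<blockquote><verbatim>
fossil hash-policy          # print the current policy, e.g. "auto"
fossil hash-policy sha3     # switch this repository to the sha3 policy
</verbatim></blockquote>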

For Fossil 2.0, and obviously also for Fossil 1.37 and before, the
only hash policy supported was "sha1".  All new artifacts were named
using their SHA1 hash.
Even though Fossil 2.0 was capable of understanding SHA3 hashes, it
never actually generated any SHA3 hashes.

Beginning with Fossil 2.1, the default hash policy for legacy repositories
changed to "auto".
That means Fossil 2.1 will continue to generate only SHA1 hashes until it
encounters one artifact with a SHA3 hash.  Once a single SHA3 hash is
seen, Fossil automatically switches to "sha3" mode and thereafter generates
only SHA3 hashes.

When a new repository is created by cloning, the hash policy is copied
from the parent.

For new repositories created using the
[/help?cmd=new|fossil new] command the default hash policy is "sha3".
That means new repositories
will normally hold nothing except SHA3 hashes.  The hash policy for new
repositories can be overridden using the "--sha1" option to the
"fossil new" command.

Even after upgrading to Fossil 2.1, Fossil will continue to use nothing
but SHA1 hashes on legacy repositories, thus preserving complete
compatibility with Fossil 1.37 and before.  If you want Fossil to go
ahead and start using SHA3 hashes, change the hash policy to
"sha3" using a command like this:

<blockquote><verbatim>
fossil hash-policy sha3
</verbatim></blockquote>

The next check-in will use a SHA3 hash.  And when that check-in is pushed
to colleagues, their copies of Fossil will see the new SHA3-named artifact
and automatically convert to SHA3 as well.
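
One quick way to confirm the switch is that SHA1 names are 40 hexadecimal
digits while SHA3 names are 64, so inspecting the newest check-in with a
command such as the following (output not shown) will report a 64-digit
artifact hash once the sha3 policy has taken effect:

<blockquote><verbatim>
fossil info tip
</verbatim></blockquote>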

Of course, if some members of your team stubbornly refuse to upgrade past
Fossil 1.37, you should avoid changing the hash policy and creating
artifacts with SHA3 names, because once you do that your recalcitrant
coworkers will no longer be able to collaborate.

<h2>A Pure SHA3 Future</h2>

At some point in the future, years from now, after everybody has finally
upgraded to Fossil 2.0 or later, the default hash policy will probably
change to "sha3", or maybe even "shun-sha1".  By the time that happens,
you will probably already be using SHA3 on all your projects and so you
are unlikely to notice.

Changes to www/mkdownload.tcl.

  if {[regexp -- {-(\d\.\d+)\.(tar\.gz|zip)$} $fn all version]} {
    set filehash($fn) [lindex $line 1]
    set avers($version) 1
  }
}
close $in


set vdate(1.37) 2017-01-15
set vdate(1.36) 2016-10-24

# Do all versions from newest to oldest
#
foreach vers [lsort -decr -real [array names avers]] {
  #  set hr "../timeline?c=version-$vers;y=ci"
  set v2 v[string map {. _} $vers]
  set hr "../doc/trunk/www/changes.wiki#$v2"
................................................................................
    puts $out " (<a href='$hr2'>$vdate($vers)</a>)"
  }
  puts $out "</b></center>"
  puts $out "</td></tr>"
  puts $out "<tr>"

  foreach {prefix img desc} {
    fossil-linux-x86 linux.gif {Linux 3.x x86}
    fossil-macosx mac.gif {Mac 10.x x86}
    fossil-openbsd-x86 openbsd.gif {OpenBSD 5.x x86}
    fossil-w32 win32.gif {Windows}
    fossil-src src.gif {Source Tarball}
  } {
    set glob download/$prefix*-$vers*
    set filename [array names filesize $glob]

  if {[regexp -- {-(\d\.\d+)\.(tar\.gz|zip)$} $fn all version]} {
    set filehash($fn) [lindex $line 1]
    set avers($version) 1
  }
}
close $in

set vdate(2.0) 2017-03-03
set vdate(1.37) 2017-01-15


# Do all versions from newest to oldest
#
foreach vers [lsort -decr -real [array names avers]] {
  #  set hr "../timeline?c=version-$vers;y=ci"
  set v2 v[string map {. _} $vers]
  set hr "../doc/trunk/www/changes.wiki#$v2"
................................................................................
    puts $out " (<a href='$hr2'>$vdate($vers)</a>)"
  }
  puts $out "</b></center>"
  puts $out "</td></tr>"
  puts $out "<tr>"

  foreach {prefix img desc} {
    fossil-linux linux.gif {Linux 3.x x64}
    fossil-macosx mac.gif {Mac 10.x x86}
    fossil-openbsd-x86 openbsd.gif {OpenBSD 5.x x86}
    fossil-w32 win32.gif {Windows}
    fossil-src src.gif {Source Tarball}
  } {
    set glob download/$prefix*-$vers*
    set filename [array names filesize $glob]

Changes to www/mkindex.tcl.

  faq.wiki {Frequently Asked Questions}
  fileformat.wiki {Fossil File Format}
  fiveminutes.wiki {Update and Running in 5 Minutes as a Single User}
  foss-cklist.wiki {Checklist For Successful Open-Source Projects}
  fossil-from-msvc.wiki {Integrating Fossil in the Microsoft Express 2010 IDE}
  fossil-v-git.wiki {Fossil Versus Git}
  hacker-howto.wiki {Hacker How-To}

  /help {Lists of Commands and Webpages}
  hints.wiki {Fossil Tips And Usage Hints}
  index.wiki {Home Page}
  inout.wiki {Import And Export To And From Git}
  makefile.wiki {The Fossil Build Process}
  /md_rules {Markdown Formatting Rules}
  newrepo.wiki {How To Create A New Fossil Repository}


  faq.wiki {Frequently Asked Questions}
  fileformat.wiki {Fossil File Format}
  fiveminutes.wiki {Update and Running in 5 Minutes as a Single User}
  foss-cklist.wiki {Checklist For Successful Open-Source Projects}
  fossil-from-msvc.wiki {Integrating Fossil in the Microsoft Express 2010 IDE}
  fossil-v-git.wiki {Fossil Versus Git}
  hacker-howto.wiki {Hacker How-To}
  hashpolicy.wiki {Hash Policy: Choosing Between SHA1 and SHA3-256}
  /help {Lists of Commands and Webpages}
  hints.wiki {Fossil Tips And Usage Hints}
  index.wiki {Home Page}
  inout.wiki {Import And Export To And From Git}
  makefile.wiki {The Fossil Build Process}
  /md_rules {Markdown Formatting Rules}
  newrepo.wiki {How To Create A New Fossil Repository}

Changes to www/permutedindex.html.

<li><a href="delta_encoder_algorithm.wiki">Algorithm &mdash; Fossil Delta Encoding</a></li>
<li><a href="blame.wiki">Algorithm Of Fossil &mdash; The Annotate/Blame</a></li>
<li><a href="blame.wiki">Annotate/Blame Algorithm Of Fossil &mdash; The</a></li>
<li><a href="customskin.md">Appearance of Web Pages &mdash; Theming: Customizing The</a></li>
<li><a href="faq.wiki">Asked Questions &mdash; Frequently</a></li>
<li><a href="password.wiki">Authentication &mdash; Password Management And</a></li>
<li><a href="whyusefossil.wiki"><b>Benefits Of Version Control</b></a></li>

<li><a href="antibot.wiki">Bots &mdash; Defense against Spiders and</a></li>
<li><a href="private.wiki">Branches &mdash; Creating, Syncing, and Deleting Private</a></li>
<li><a href="branching.wiki"><b>Branching, Forking, Merging, and Tagging</b></a></li>
<li><a href="bugtheory.wiki"><b>Bug Tracking In Fossil</b></a></li>
<li><a href="makefile.wiki">Build Process &mdash; The Fossil</a></li>
<li><a href="aboutcgi.wiki">CGI Works In Fossil &mdash; How</a></li>
<li><a href="changes.wiki">Changelog &mdash; Fossil</a></li>
................................................................................
<li><a href="checkin_names.wiki"><b>Check-in And Version Names</b></a></li>
<li><a href="checkin.wiki"><b>Check-in Checklist</b></a></li>
<li><a href="checkin.wiki">Checklist &mdash; Check-in</a></li>
<li><a href="../test/release-checklist.wiki">Checklist &mdash; Pre-Release Testing</a></li>
<li><a href="foss-cklist.wiki"><b>Checklist For Successful Open-Source Projects</b></a></li>
<li><a href="selfcheck.wiki">Checks &mdash; Fossil Repository Integrity Self</a></li>
<li><a href="childprojects.wiki"><b>Child Projects</b></a></li>

<li><a href="contribute.wiki">Code or Documentation To The Fossil Project &mdash; Contributing</a></li>
<li><a href="style.wiki">Code Style Guidelines &mdash; Source</a></li>
<li><a href="../../../help">Commands and Webpages &mdash; Lists of</a></li>
<li><a href="build.wiki"><b>Compiling and Installing Fossil</b></a></li>
<li><a href="concepts.wiki">Concepts &mdash; Fossil Core</a></li>
<li><a href="server.wiki">Configure A Fossil Server &mdash; How To</a></li>
<li><a href="shunning.wiki">Content From Fossil &mdash; Shunning: Deleting</a></li>
................................................................................
<li><a href="quotes.wiki">Git, and DVCSes in General &mdash; Quotes: What People Are Saying About Fossil,</a></li>
<li><a href="env-opts.md">Global Options &mdash; Environment Variables and</a></li>
<li><a href="customgraph.md">Graph &mdash; Theming: Customizing the Timeline</a></li>
<li><a href="quickstart.wiki">Guide &mdash; Fossil Quick Start</a></li>
<li><a href="style.wiki">Guidelines &mdash; Source Code Style</a></li>
<li><a href="hacker-howto.wiki"><b>Hacker How-To</b></a></li>
<li><a href="adding_code.wiki"><b>Hacking Fossil</b></a></li>

<li><a href="hints.wiki">Hints &mdash; Fossil Tips And Usage</a></li>
<li><a href="index.wiki"><b>Home Page</b></a></li>
<li><a href="selfhost.wiki">Hosting Repositories &mdash; Fossil Self</a></li>
<li><a href="aboutcgi.wiki"><b>How CGI Works In Fossil</b></a></li>
<li><a href="server.wiki"><b>How To Configure A Fossil Server</b></a></li>
<li><a href="newrepo.wiki"><b>How To Create A New Fossil Repository</b></a></li>
<li><a href="encryptedrepos.wiki"><b>How To Use Encrypted Repositories</b></a></li>
................................................................................
<li><a href="env-opts.md">Options &mdash; Environment Variables and Global</a></li>
<li><a href="tech_overview.wiki">Overview Of The Design And Implementation Of Fossil &mdash; A Technical</a></li>
<li><a href="index.wiki">Page &mdash; Home</a></li>
<li><a href="customskin.md">Pages &mdash; Theming: Customizing The Appearance of Web</a></li>
<li><a href="password.wiki"><b>Password Management And Authentication</b></a></li>
<li><a href="quotes.wiki">People Are Saying About Fossil, Git, and DVCSes in General &mdash; Quotes: What</a></li>
<li><a href="stats.wiki"><b>Performance Statistics</b></a></li>

<li><a href="../test/release-checklist.wiki"><b>Pre-Release Testing Checklist</b></a></li>
<li><a href="pop.wiki"><b>Principles Of Operation</b></a></li>
<li><a href="private.wiki">Private Branches &mdash; Creating, Syncing, and Deleting</a></li>
<li><a href="makefile.wiki">Process &mdash; The Fossil Build</a></li>
<li><a href="contribute.wiki">Project &mdash; Contributing Code or Documentation To The Fossil</a></li>
<li><a href="embeddeddoc.wiki">Project Documentation &mdash; Embedded</a></li>
<li><a href="foss-cklist.wiki">Projects &mdash; Checklist For Successful Open-Source</a></li>
................................................................................
<li><a href="fiveminutes.wiki">Running in 5 Minutes as a Single User &mdash; Update and</a></li>
<li><a href="quotes.wiki">Saying About Fossil, Git, and DVCSes in General &mdash; Quotes: What People Are</a></li>
<li><a href="th1.md">Scripting Language &mdash; The TH1</a></li>
<li><a href="selfcheck.wiki">Self Checks &mdash; Fossil Repository Integrity</a></li>
<li><a href="selfhost.wiki">Self Hosting Repositories &mdash; Fossil</a></li>
<li><a href="server.wiki">Server &mdash; How To Configure A Fossil</a></li>
<li><a href="settings.wiki">Settings &mdash; Fossil</a></li>


<li><a href="shunning.wiki"><b>Shunning: Deleting Content From Fossil</b></a></li>
<li><a href="fiveminutes.wiki">Single User &mdash; Update and Running in 5 Minutes as a</a></li>
<li><a href="../../../sitemap"><b>Site Map</b></a></li>
<li><a href="style.wiki"><b>Source Code Style Guidelines</b></a></li>
<li><a href="antibot.wiki">Spiders and Bots &mdash; Defense against</a></li>
<li><a href="tech_overview.wiki"><b>SQLite Databases Used By Fossil</b></a></li>
<li><a href="ssl.wiki">SSL with Fossil &mdash; Using</a></li>

<li><a href="delta_encoder_algorithm.wiki">Algorithm &mdash; Fossil Delta Encoding</a></li>
<li><a href="blame.wiki">Algorithm Of Fossil &mdash; The Annotate/Blame</a></li>
<li><a href="blame.wiki">Annotate/Blame Algorithm Of Fossil &mdash; The</a></li>
<li><a href="customskin.md">Appearance of Web Pages &mdash; Theming: Customizing The</a></li>
<li><a href="faq.wiki">Asked Questions &mdash; Frequently</a></li>
<li><a href="password.wiki">Authentication &mdash; Password Management And</a></li>
<li><a href="whyusefossil.wiki"><b>Benefits Of Version Control</b></a></li>
<li><a href="hashpolicy.wiki">Between SHA1 and SHA3-256 &mdash; Hash Policy: Choosing</a></li>
<li><a href="antibot.wiki">Bots &mdash; Defense against Spiders and</a></li>
<li><a href="private.wiki">Branches &mdash; Creating, Syncing, and Deleting Private</a></li>
<li><a href="branching.wiki"><b>Branching, Forking, Merging, and Tagging</b></a></li>
<li><a href="bugtheory.wiki"><b>Bug Tracking In Fossil</b></a></li>
<li><a href="makefile.wiki">Build Process &mdash; The Fossil</a></li>
<li><a href="aboutcgi.wiki">CGI Works In Fossil &mdash; How</a></li>
<li><a href="changes.wiki">Changelog &mdash; Fossil</a></li>
................................................................................
<li><a href="checkin_names.wiki"><b>Check-in And Version Names</b></a></li>
<li><a href="checkin.wiki"><b>Check-in Checklist</b></a></li>
<li><a href="checkin.wiki">Checklist &mdash; Check-in</a></li>
<li><a href="../test/release-checklist.wiki">Checklist &mdash; Pre-Release Testing</a></li>
<li><a href="foss-cklist.wiki"><b>Checklist For Successful Open-Source Projects</b></a></li>
<li><a href="selfcheck.wiki">Checks &mdash; Fossil Repository Integrity Self</a></li>
<li><a href="childprojects.wiki"><b>Child Projects</b></a></li>
<li><a href="hashpolicy.wiki">Choosing Between SHA1 and SHA3-256 &mdash; Hash Policy:</a></li>
<li><a href="contribute.wiki">Code or Documentation To The Fossil Project &mdash; Contributing</a></li>
<li><a href="style.wiki">Code Style Guidelines &mdash; Source</a></li>
<li><a href="../../../help">Commands and Webpages &mdash; Lists of</a></li>
<li><a href="build.wiki"><b>Compiling and Installing Fossil</b></a></li>
<li><a href="concepts.wiki">Concepts &mdash; Fossil Core</a></li>
<li><a href="server.wiki">Configure A Fossil Server &mdash; How To</a></li>
<li><a href="shunning.wiki">Content From Fossil &mdash; Shunning: Deleting</a></li>
................................................................................
<li><a href="quotes.wiki">Git, and DVCSes in General &mdash; Quotes: What People Are Saying About Fossil,</a></li>
<li><a href="env-opts.md">Global Options &mdash; Environment Variables and</a></li>
<li><a href="customgraph.md">Graph &mdash; Theming: Customizing the Timeline</a></li>
<li><a href="quickstart.wiki">Guide &mdash; Fossil Quick Start</a></li>
<li><a href="style.wiki">Guidelines &mdash; Source Code Style</a></li>
<li><a href="hacker-howto.wiki"><b>Hacker How-To</b></a></li>
<li><a href="adding_code.wiki"><b>Hacking Fossil</b></a></li>
<li><a href="hashpolicy.wiki"><b>Hash Policy: Choosing Between SHA1 and SHA3-256</b></a></li>
<li><a href="hints.wiki">Hints &mdash; Fossil Tips And Usage</a></li>
<li><a href="index.wiki"><b>Home Page</b></a></li>
<li><a href="selfhost.wiki">Hosting Repositories &mdash; Fossil Self</a></li>
<li><a href="aboutcgi.wiki"><b>How CGI Works In Fossil</b></a></li>
<li><a href="server.wiki"><b>How To Configure A Fossil Server</b></a></li>
<li><a href="newrepo.wiki"><b>How To Create A New Fossil Repository</b></a></li>
<li><a href="encryptedrepos.wiki"><b>How To Use Encrypted Repositories</b></a></li>
................................................................................
<li><a href="env-opts.md">Options &mdash; Environment Variables and Global</a></li>
<li><a href="tech_overview.wiki">Overview Of The Design And Implementation Of Fossil &mdash; A Technical</a></li>
<li><a href="index.wiki">Page &mdash; Home</a></li>
<li><a href="customskin.md">Pages &mdash; Theming: Customizing The Appearance of Web</a></li>
<li><a href="password.wiki"><b>Password Management And Authentication</b></a></li>
<li><a href="quotes.wiki">People Are Saying About Fossil, Git, and DVCSes in General &mdash; Quotes: What</a></li>
<li><a href="stats.wiki"><b>Performance Statistics</b></a></li>
<li><a href="hashpolicy.wiki">Policy: Choosing Between SHA1 and SHA3-256 &mdash; Hash</a></li>
<li><a href="../test/release-checklist.wiki"><b>Pre-Release Testing Checklist</b></a></li>
<li><a href="pop.wiki"><b>Principles Of Operation</b></a></li>
<li><a href="private.wiki">Private Branches &mdash; Creating, Syncing, and Deleting</a></li>
<li><a href="makefile.wiki">Process &mdash; The Fossil Build</a></li>
<li><a href="contribute.wiki">Project &mdash; Contributing Code or Documentation To The Fossil</a></li>
<li><a href="embeddeddoc.wiki">Project Documentation &mdash; Embedded</a></li>
<li><a href="foss-cklist.wiki">Projects &mdash; Checklist For Successful Open-Source</a></li>
................................................................................
<li><a href="fiveminutes.wiki">Running in 5 Minutes as a Single User &mdash; Update and</a></li>
<li><a href="quotes.wiki">Saying About Fossil, Git, and DVCSes in General &mdash; Quotes: What People Are</a></li>
<li><a href="th1.md">Scripting Language &mdash; The TH1</a></li>
<li><a href="selfcheck.wiki">Self Checks &mdash; Fossil Repository Integrity</a></li>
<li><a href="selfhost.wiki">Self Hosting Repositories &mdash; Fossil</a></li>
<li><a href="server.wiki">Server &mdash; How To Configure A Fossil</a></li>
<li><a href="settings.wiki">Settings &mdash; Fossil</a></li>
<li><a href="hashpolicy.wiki">SHA1 and SHA3-256 &mdash; Hash Policy: Choosing Between</a></li>
<li><a href="hashpolicy.wiki">SHA3-256 &mdash; Hash Policy: Choosing Between SHA1 and</a></li>
<li><a href="shunning.wiki"><b>Shunning: Deleting Content From Fossil</b></a></li>
<li><a href="fiveminutes.wiki">Single User &mdash; Update and Running in 5 Minutes as a</a></li>
<li><a href="../../../sitemap"><b>Site Map</b></a></li>
<li><a href="style.wiki"><b>Source Code Style Guidelines</b></a></li>
<li><a href="antibot.wiki">Spiders and Bots &mdash; Defense against</a></li>
<li><a href="tech_overview.wiki"><b>SQLite Databases Used By Fossil</b></a></li>
<li><a href="ssl.wiki">SSL with Fossil &mdash; Using</a></li>