Fossil

Check-in [47aa9282]

Overview
Comment: Further design changes to hierarchical manifests. Still no actual code.
SHA1: 47aa92824027e7740ceca498a442308e8e462351
User & Date: drh 2015-12-22 13:40:22
Context
2016-01-07
20:42
minor typo fix. Leaf check-in: 82bb1901 user: stephan tags: hierarchical-manifests
2015-12-22
13:40
Further design changes to hierarchical manifests. Still no actual code. check-in: 47aa9282 user: drh tags: hierarchical-manifests
07:18
Describe an enhancement to manifest artifacts that allows for an hierarchical description of the structure of a check-in. It is hoped that this new format will work more efficiently for large repositories, and make clone and pull from Git much easier and faster. This check-in is a documentation change only. The new hierarchical manifest type has not yet been implemented in code. check-in: 7576a0f1 user: drh tags: hierarchical-manifests
Changes

Changes to www/fileformat.wiki.

Old version (lines 109-122, 151-212, 279-312, 613-647, 669-685, 868-883):

Allowed cards in the manifest are as follows:

<blockquote>
<b>B</b> <i>baseline-manifest</i><br>
<b>C</b> <i>checkin-comment</i><br>
<b>D</b> <i>time-and-date-stamp</i><br>
<b>F</b> <i>filename</i> ?<i>SHA1-hash</i>? ?<i>permissions</i>? ?<i>old-name</i>?<br>
<b>N</b> <i>mimetype</i><br>
<b>P</b> <i>SHA1-hash</i>+<br>
<b>Q</b> (<b>+</b>|<b>-</b>)<i>SHA1-hash</i> ?<i>SHA1-hash</i>?<br>
<b>R</b> <i>repository-checksum</i><br>
<b>T</b> (<b>+</b>|<b>-</b>|<b>*</b>)<i>tag-name</i> <b>*</b> ?<i>value</i>?<br>
<b>U</b> <i>user-login</i><br>
<b>Z</b> <i>manifest-checksum</i>
................................................................................
The format must be one of:

<blockquote>
<i>YYYY</i><b>-</b><i>MM</i><b>-</b><i>DD</i><b>T</b><i>HH</i><b>:</b><i>MM</i><b>:</b><i>SS</i><br>
<i>YYYY</i><b>-</b><i>MM</i><b>-</b><i>DD</i><b>T</b><i>HH</i><b>:</b><i>MM</i><b>:</b><i>SS</i><b>.</b><i>SSS</i>
</blockquote>

A manifest has zero or more F-cards.  Each F-card identifies a file or
subdirectory
that is part of the check-in.  There are one, two, three, or four
arguments.  The first argument is the pathname of the file or
subdirectory in the
check-in relative to the root of the project file hierarchy.  No ".."
or "." directories are allowed within the filename.  Space characters
are escaped as in C-card comment text.  Backslash characters and
newlines are not allowed within filenames.  The directory separator
character is a forward slash (ASCII 0x2F).  The second argument to the
F-card is the full 40-character lower-case hexadecimal SHA1 hash of
the content artifact, or of the [#directory|directory artifact] if
the "d" permission is present.  The second argument is required for baseline
manifests but is optional for delta manifests.  When the second
argument to the F-card is omitted, it means that the file has been
deleted relative to the baseline (files removed in baseline manifests
versions are <em>not</em> added as F-cards). The optional 3rd argument
defines any special access permissions associated with the file.  This
can be defined as "x" to mean that the file is executable or "l"
(small letter ell) to mean a symlink or "d" to mean the entry describes
a subdirectory rather than a file.  All files and subdirectories 
are always readable and writable.  This can be expressed by "w" 
permission if desired but the "w" permission is optional and is ignored
by Fossil.  The file format might be extended with new permission
letters in the future.  The optional 4th argument is the name of the
same file as it existed in the parent check-in.  If the name of the
file is unchanged from its parent, then the 4th argument is omitted.

Manifests may be either flat or hierarchical.  A flat manifest lists
all files in the check-in, including all files in subdirectories.  A
flat manifest may not include F-cards with the "d" permission.  An
hierarchical manifest only lists the files or subdirectories at the
top-level of the check-in.  An hierarchical manifest may not include
F-card entries that have a directory separator character ("/").
An hierarchical manifest may not be a delta-manifest (it may not have
a B-card) nor may it be used as a baseline-manifest by some other
delta-manifest.  Hierarchical manifests
are only recognized by Fossil versions 1.35 and later.  Repositories
that contain hierarchical manifests will cause problems for earlier
versions of Fossil.

When an F-card refers to a subdirectory (that is to say, when the
F-card is part of an hierarchical manifest and contains the "d"
permission) then the referenced directory artifact must be a 
[#directory|well-formed directory artifact] that contains a
G-card that exactly matches the name of the subdirectory as assigned
by the F-card.  If these conditions are not met, then the artifact is
not a valid manifest.

A manifest has zero or one P-cards.  Most manifests have one P-card.
The P-card has a varying number of arguments that
define other manifests from which the current manifest
is derived.  Each argument is a 40-character lowercase
hexadecimal SHA1 of the predecessor manifest.  All arguments
to the P-card must be unique to that line.
................................................................................
<h3>1.2 Directory Artifacts</h3>

A directory artifact describes the files and subdirectories within a
single directory of an hierarchical manifest.  Directory artifacts
are only recognized by Fossil version 1.35 and later (circa 2015-12-23).

Directory artifacts contain zero or more F-cards and exactly one Z-card,
in the same format as a manifest.  A directory artifact also contains
exactly one G-card with a single argument that is the pathname
of the directory relative to the root of the repository.
The format of the directory name in a G-card is
the same as the format of a filename in an F-card.

The F-cards in a directory artifact may not contain directory separator
characters.  The content of subdirectories must be expressed using
additional directory artifacts referenced by F-cards with the "d"
permission.  All F-cards in a directory artifact must contain at least
two arguments.

When an F-card X of directory artifact Y refers to 
subdirectory Z (that is to say, when F-card X contains
the "d" permission and the second argument on X is the SHA1
hash of directory artifact Z) then the G-card of Z must
be the concatenation of the G-card on artifact Y, the
directory separator character "/" and the first argument to
the F-card X.  Otherwise, the artifact Y is not a valid
directory artifact.

<a name="cluster"></a>
<h3>1.3 Clusters Artifacts</h3>

A cluster is an artifact that declares the existence of other artifacts.
Clusters are used during repository synchronization to help 
reduce network traffic.  As such, clusters are an optimization and
................................................................................
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align=center><b>1</b></td>
<td>&nbsp;</td>
</tr>
<tr>
<td><b>B</b> <i>baseline</i></td>
<td align=center><b>0-1*</b></td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr><td>&nbsp;</td><td colspan='8'>* = Required for delta manifests,
Disallowed for hierarchical manifests.</td></tr>
<tr>
<td><b>C</b> <i>comment-text</i></td>
<td align=center><b>1</b></td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align=center><b>0-1</b></td>
<td align=center><b>0-1</b></td>
</tr>
<tr>
<td><b>D</b> <i>date-time-stamp</i></td>
<td align=center><b>1</b></td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align=center><b>1</b></td>
................................................................................
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td><b>G</b> <i>filename</i></td>
<td>&nbsp;</td>
<td align=center><b>1</b></td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
................................................................................
manifests.  That means that all the files of a check-in had to be
listed in every manifest.  Because manifests are delta-encoded, there
is not a storage space issue.  Fossil was originally designed
specifically to support the SQLite project, and as SQLite has fewer
than 2000 files in any given version, a flat baseline manifest design
worked well there and was simple to implement.

However, some project (ex: NetBSD) contain a huge number of files in
every version, and even though the manifests compress well using
delta-compression, many CPU cycles had to be spent to decompress those
manifests.  To help make Fossil more efficient for large projects like
NetBSD, the concept of a delta-manifest was added.  This helped a lot
but was not a perfect solution.

Later, the concept of an hierarchical manifest was added.  By breaking
up each manifest into many separate subdirectories it is hoped that

New version (lines 109-123, 152-204, 271-303, 604-625, 647-663, 846-861):

Allowed cards in the manifest are as follows:

<blockquote>
<b>B</b> <i>baseline-manifest</i><br>
<b>C</b> <i>checkin-comment</i><br>
<b>D</b> <i>time-and-date-stamp</i><br>
<b>F</b> <i>filename</i> ?<i>SHA1-hash</i>? ?<i>permissions</i>? ?<i>old-name</i>?<br>
<b>G</b> <i>SHA1-hash</i><br>
<b>N</b> <i>mimetype</i><br>
<b>P</b> <i>SHA1-hash</i>+<br>
<b>Q</b> (<b>+</b>|<b>-</b>)<i>SHA1-hash</i> ?<i>SHA1-hash</i>?<br>
<b>R</b> <i>repository-checksum</i><br>
<b>T</b> (<b>+</b>|<b>-</b>|<b>*</b>)<i>tag-name</i> <b>*</b> ?<i>value</i>?<br>
<b>U</b> <i>user-login</i><br>
<b>Z</b> <i>manifest-checksum</i>
................................................................................
The format must be one of:

<blockquote>
<i>YYYY</i><b>-</b><i>MM</i><b>-</b><i>DD</i><b>T</b><i>HH</i><b>:</b><i>MM</i><b>:</b><i>SS</i><br>
<i>YYYY</i><b>-</b><i>MM</i><b>-</b><i>DD</i><b>T</b><i>HH</i><b>:</b><i>MM</i><b>:</b><i>SS</i><b>.</b><i>SSS</i>
</blockquote>
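
The two accepted timestamp layouts can be checked mechanically. The following is a minimal Python sketch, not Fossil's own code: the name <tt>is_valid_stamp</tt> is hypothetical, and rejecting out-of-range fields (such as month 13) is an added assumption, since the text above only constrains the layout.

```python
import re
from datetime import datetime

# YYYY-MM-DDTHH:MM:SS with an optional three-digit fraction (.SSS).
_STAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{3})?")


def is_valid_stamp(text: str) -> bool:
    """Return True if text uses one of the two D-card timestamp layouts."""
    if _STAMP_RE.fullmatch(text) is None:
        return False
    # Assumption: impossible calendar values are also rejected.
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in text else "%Y-%m-%dT%H:%M:%S"
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True
```

For example, <tt>is_valid_stamp("2015-12-22T13:40:22")</tt> is true, while the same stamp with a space in place of the "T" is rejected.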

A manifest has zero or more F-cards.  Each F-card identifies a file
that is part of the check-in.  There are one, two, three, or four
arguments.  The first argument is the pathname of the file or
subdirectory in the
check-in relative to the root of the project file hierarchy.  No ".."
or "." directories are allowed within the filename.  Space characters
are escaped as in C-card comment text.  Backslash characters and
newlines are not allowed within filenames.  The directory separator
character is a forward slash (ASCII 0x2F).  The second argument to the
F-card is the full 40-character lower-case hexadecimal SHA1 hash of
the content artifact.  The second argument is required for baseline
manifests but is optional for delta manifests.  When the second
argument to the F-card is omitted, it means that the file has been
deleted relative to the baseline (files removed in baseline manifests
versions are <em>not</em> added as F-cards). The optional 3rd argument
defines any special access permissions associated with the file.  
In a manifest, the permission string may contain 
an "x" to mean that the file is executable or "l"
(small letter ell) to mean the file is a symlink.
All files are always readable and writable.  This can be expressed by "w"
permission if desired but the "w" permission is optional and is ignored
by Fossil.  The file format might be extended with new permission
letters in the future.  The optional 4th argument is the name of the
same file as it existed in the parent check-in.  If the name of the
file is unchanged from its parent, then the 4th argument is omitted.
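
The F-card rules above can be illustrated with a small parser. This is a Python sketch under stated assumptions, not Fossil's implementation: the names <tt>FCard</tt> and <tt>parse_fcard</tt> are hypothetical, and the code assumes that a space in a filename is written as a backslash followed by "s", which is the escape used for C-card comment text. It does not enforce the rule that the hash is required in baseline manifests, because that depends on the rest of the manifest.

```python
import re
from typing import NamedTuple, Optional

_SHA1_RE = re.compile(r"[0-9a-f]{40}")


class FCard(NamedTuple):
    name: str
    sha1: Optional[str]       # None: file deleted relative to the baseline
    perms: str                # "" when no permission argument is given
    old_name: Optional[str]   # None when the file was not renamed


def _unescape(token: str) -> str:
    # Assumption: spaces are encoded as backslash-s, as in C-card text.
    return token.replace("\\s", " ")


def parse_fcard(line: str) -> FCard:
    """Parse one F-card line (without its trailing newline)."""
    fields = line.split(" ")
    if fields[0] != "F" or not 2 <= len(fields) <= 5:
        raise ValueError("expected F followed by one to four arguments")
    args = fields[1:]
    if "" in args:
        raise ValueError("empty argument")
    name = _unescape(args[0])
    if "\\" in name or "\n" in name:
        raise ValueError("backslash and newline are not allowed in filenames")
    if any(part in (".", "..") for part in name.split("/")):
        raise ValueError("'.' and '..' are not allowed in filenames")
    sha1 = args[1] if len(args) > 1 else None
    if sha1 is not None and _SHA1_RE.fullmatch(sha1) is None:
        raise ValueError("hash must be 40 lower-case hexadecimal digits")
    perms = args[2] if len(args) > 2 else ""
    old_name = _unescape(args[3]) if len(args) > 3 else None
    return FCard(name, sha1, perms, old_name)
```

An F-card with only a filename parses to an <tt>FCard</tt> whose <tt>sha1</tt> is <tt>None</tt>, matching the "deleted relative to the baseline" case of a delta manifest.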

A G-card is an alternative way of specifying the files of a check-in.
A single manifest may have many F-cards or a single G-card, but not
both.  A manifest containing F-cards is called a "flat manifest" and
a manifest that contains a G-card is called an "hierarchical manifest".

A G-card contains a single argument which is the SHA1 hash of a
[#directory|directory artifact] that defines the files and directories
at the top-level of the check-in.  That directory artifact may contain
F-cards with the "d" permission that reference other subdirectories
in the check-in file hierarchy.

A G-card may not be used in a delta manifest.  No delta manifest may
refer to an hierarchical manifest as its baseline.
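
The G-card restrictions above amount to a few checks on which cards a manifest contains. A minimal Python sketch follows; the function names are hypothetical, the input is assumed to be the sequence of card letters of an already parsed manifest, and the "at most one G-card" check follows the "single G-card" wording above.

```python
def manifest_kind(card_letters):
    """Return "flat" or "hierarchical" for a manifest's card letters."""
    letters = list(card_letters)
    g_count = letters.count("G")
    if g_count > 1:
        raise ValueError("a manifest may have at most one G-card")
    if g_count and "F" in letters:
        raise ValueError("a manifest may not mix F-cards and a G-card")
    if g_count and "B" in letters:
        raise ValueError("a G-card may not be used in a delta manifest")
    return "hierarchical" if g_count else "flat"


def check_baseline(baseline_letters):
    """Reject an hierarchical manifest used as the baseline of a delta."""
    if manifest_kind(baseline_letters) == "hierarchical":
        raise ValueError("an hierarchical manifest cannot be a baseline")
```

With this sketch, <tt>manifest_kind("CDGPUZ")</tt> returns "hierarchical", and a manifest carrying both F-cards and a G-card raises an error.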

A manifest has zero or one P-cards.  Most manifests have one P-card.
The P-card has a varying number of arguments that
define other manifests from which the current manifest
is derived.  Each argument is a 40-character lowercase
hexadecimal SHA1 of the predecessor manifest.  All arguments
to the P-card must be unique to that line.
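
As an illustration of the P-card constraints, here is a short Python sketch. The name <tt>parse_pcard</tt> is hypothetical, and requiring at least one argument is read from the "SHA1-hash+" notation in the card list.

```python
import re

_SHA1_RE = re.compile(r"[0-9a-f]{40}")


def parse_pcard(line):
    """Return the list of predecessor hashes from one P-card line."""
    fields = line.split(" ")
    if fields[0] != "P" or len(fields) < 2:
        raise ValueError("expected P followed by at least one hash")
    hashes = fields[1:]
    for h in hashes:
        if _SHA1_RE.fullmatch(h) is None:
            raise ValueError("each argument must be 40 lower-case hex digits")
    # All arguments must be unique to that line.
    if len(set(hashes)) != len(hashes):
        raise ValueError("duplicate hash in P-card")
    return hashes
```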
................................................................................
<h3>1.2 Directory Artifacts</h3>

A directory artifact describes the files and subdirectories within a
single directory of an hierarchical manifest.  Directory artifacts
are only recognized by Fossil version 1.35 and later (circa 2015-12-23).

Directory artifacts contain zero or more F-cards and exactly one Z-card,
in the same format as a manifest.

The F-cards of a directory artifact are slightly different from the F-cards
in a [#manifest|manifest artifact].  

<ul>
<li>F-cards in a directory artifact may not have directory separator
characters in their filename.  The F-cards of a directory specify only
the files and subdirectories in that directory.
<li>F-cards in a directory artifact may use the "d" permission character
to indicate that the entry refers to a subdirectory.  In that case, the
SHA1 hash on the second argument refers to another directory artifact,
not to a content artifact.
<li>F-cards in a directory artifact must have at least two arguments.
</ul>

The Z-card in a directory artifact is required and has the same format as
it does in all other special artifacts.  Directory artifacts may not be
PGP-clearsigned.
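
To show how "d" entries chain directory artifacts together, the following Python sketch expands a tree of directory artifacts into the full pathnames that a flat manifest would list. It is an illustration only: <tt>flatten</tt> is a hypothetical name, and the artifacts are assumed to be already parsed into a dictionary that maps each directory artifact's hash to its F-card entries as (name, hash, permissions) tuples.

```python
from typing import Dict, Iterator, List, Tuple

Entry = Tuple[str, str, str]  # (name, sha1, permissions)


def flatten(dirs: Dict[str, List[Entry]], root: str,
            prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (full pathname, content hash) for every file under root."""
    for name, sha1, perms in dirs[root]:
        if "/" in name:
            raise ValueError("directory separator in a directory F-card")
        path = prefix + name
        if "d" in perms:
            # The hash names another directory artifact, not file content.
            yield from flatten(dirs, sha1, path + "/")
        else:
            yield path, sha1
```

Called with the hash from a manifest's G-card as <tt>root</tt>, the generator walks the subdirectories depth-first in the order their F-cards appear.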


<a name="cluster"></a>
<h3>1.3 Clusters Artifacts</h3>

A cluster is an artifact that declares the existence of other artifacts.
Clusters are used during repository synchronization to help 
reduce network traffic.  As such, clusters are an optimization and
................................................................................
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align=center><b>1</b></td>
<td>&nbsp;</td>
</tr>
<tr>
<td><b>B</b> <i>baseline</i></td>
<td align=center><b>0-1</b></td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td><b>D</b> <i>date-time-stamp</i></td>
<td align=center><b>1</b></td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align=center><b>1</b></td>
................................................................................
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
<tr>
<td><b>G</b> <i>SHA1-hash</i></td>
<td align=center><b>0-1</b></td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
................................................................................
manifests.  That means that all the files of a check-in had to be
listed in every manifest.  Because manifests are delta-encoded, there
is not a storage space issue.  Fossil was originally designed
specifically to support the SQLite project, and as SQLite has fewer
than 2000 files in any given version, a flat baseline manifest design
worked well there and was simple to implement.

However, some other projects (ex: NetBSD) contain a huge number of files in
every check-in, and even though the manifests compress well using
delta-compression, many CPU cycles had to be spent to decompress those
manifests.  To help make Fossil more efficient for large projects like
NetBSD, the concept of a delta-manifest was added.  This helped a lot
but was not a perfect solution.

Later, the concept of an hierarchical manifest was added.  By breaking
up each manifest into many separate subdirectories it is hoped that