Fossil

Check-in [7898593d]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Assorted improvements to www/globs.md, mainly to clarity and grammar.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 7898593d9de25804fc1c737148ab0f96b646cf861d24fcdd8afdf5ed57a48670
User & Date: wyoung 2020-03-21 19:57:43.803
Context
2020-03-22
14:29
Update the built-in SQLite to the latest 3.32.0 alpha that includes fixes for the DBCONFIG_MAINDBNAME problem. ... (check-in: 8d114c2a user: drh tags: trunk)
2020-03-21
19:57
Assorted improvements to www/globs.md, mainly to clarity and grammar. ... (check-in: 7898593d user: wyoung tags: trunk)
2020-03-20
04:02
Rename test function to match the test command name ... (check-in: 77be1777 user: andygoth tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to www/globs.md.
1
2
3
4
5
6
7
8
9
10

11


12
13

14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35

36
37
38
39
40
41
42
43
44
45
46
47


48
49
50
51
52
53
54
55
56
57
58
59





60

61


62
63
64









65


66
67

68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118

119

120
121
122



123
124

125

126
127
128

129
130
131




132
133
134
135
136
137
138
# File Name Glob Patterns


A [glob pattern][glob] is a text expression that matches one or more
file names using wild cards familiar to most users of a command line.
For example, `*` is a glob that matches any name at all and
`Readme.txt` is a glob that matches exactly one file.

Note that although both are notations for describing patterns in text,
glob patterns are not the same thing as a [regular expression or

regexp][regexp].



[glob]: https://en.wikipedia.org/wiki/Glob_(programming) (Wikipedia)

[regexp]: https://en.wikipedia.org/wiki/Regular_expression


A number of fossil setting values hold one or more file glob patterns
that will identify files needing special treatment.  Glob patterns are
also accepted in options to certain commands as well as query
parameters to certain pages.

In many cases more than one glob may be specified in a setting,
option, or query parameter by listing multiple globs separated by a
comma or white space.

Of course, many fossil commands also accept lists of files to act on,
and those also may be specified with globs. Although those glob
patterns are similar to what is described here, they are not defined
by fossil, but rather by the conventions of the operating system in
use.


## Syntax

A list of glob patterns is simply one or more glob patterns separated

by white space or commas. If a glob must contain white spaces or
commas, it can be quoted with either single or double quotation marks.
A list is said to match if any one (or more) globs in the list
matches.

A glob pattern is a collection of characters compared to a target
text, usually a file name. The whole glob is said to match if it
successfully consumes and matches the entire target text. Glob
patterns are made up of ordinary characters and special characters.

Ordinary characters consume a single character of the target and must
match it exactly.



Special characters (and special character sequences) consume zero or
more characters from the target and describe what matches. The special
characters (and sequences) are:

:Pattern |:Effect
---------------------------------------------------------------------
`*`      | Matches any sequence of zero or more characters
`?`      | Matches exactly one character
`[...]`  | Matches one character from the enclosed list of characters
`[^...]` | Matches one character not in the enclosed list






Special character sequences have some additional features:




 *  A range of characters may be specified with `-`, so `[a-d]` matches
    exactly the same characters as `[abcd]`. Ranges reflect Unicode
    code points without any locale-specific collation sequence.









 *  Include `-` in a list by placing it last, just before the `]`.


 *  Include `]` in a list by making the first character after the `[` or
    `[^`. At any other place, `]` ends the list.

 *  Include `^` in a list by placing anywhere except first after the
    `[`.
 *  Beware that ranges in lists may include more than you expect:
    `[A-z]` Matches `A` and `Z`, but also matches `a` and some less
    obvious characters such as `[`, `\`, and `]` with code point
    values between `Z` and `a`.
 *  Beware that a range must be specified from low value to high
    value: `[z-a]` does not match any character at all, preventing the
    entire glob from matching.
 *  Note that unlike typical Unix shell globs, wildcards (`*`, `?`,
    and character lists) are allowed to match `/` directory
    separators as well as the initial `.` in the name of a hidden
    file or directory.

Some examples of character lists:

:Pattern |:Effect
---------------------------------------------------------------------
`[a-d]`  | Matches any one of `a`, `b`, `c`, or `d` but not `ä`
`[^a-d]` | Matches exactly one character other than `a`, `b`, `c`, or `d`
`[0-9a-fA-F]` | Matches exactly one hexadecimal digit
`[a-]`   | Matches either `a` or `-`
`[][]`   | Matches either `]` or `[`
`[^]]`   | Matches exactly one character other than `]`
`[]^]`   | Matches either `]` or `^`
`[^-]`   | Matches exactly one character other than `-`

White space means the specific ASCII characters TAB, LF, VT, FF, CR,
and SPACE.  Note that this does not include any of the many additional
spacing characters available in Unicode, and specifically does not
include U+00A0 NO-BREAK SPACE.

Because both LF and CR are white space and leading and trailing spaces
are stripped from each glob in a list, a list of globs may be broken
into lines between globs when the list is stored in a file (as for a
versioned setting).

Similarly 'single quotes' and "double quotes" are the ASCII straight
quote characters, not any of the other quotation marks provided in
Unicode and specifically not the "curly" quotes preferred by
typesetters and word processors.


## File Names to Match

Before it is compared to a glob pattern, each file name is transformed
to a canonical form. The glob must match the entire canonical file
name to be considered a match.

The canonical name of a file has all directory separators changed to
`/`, redundant slashes are removed, all `.` path components are

removed, and all `..` path components are resolved. (There are

additional details we are ignoring here, but they cover rare edge
cases and also follow the principle of least surprise.)




The goal is to have a name that is the simplest possible for each
particular file, and that will be the same on Windows, Unix, and any

other platform where fossil is run.


Beware, however, that all glob matching is case sensitive. This will
not be a surprise on Unix where all file names are also case

sensitive. However, most Windows file systems are case preserving and
case insensitive. That is, on Windows, the names `ReadMe` and `README`
are names of the same file; on Unix they are different files.





Some example cases:

:Pattern     |:Effect
--------------------------------------------------------------------------------
`README`     | Matches only a file named `README` in the root of the tree. It does not match a file named `src/README` because it does not include any characters that consume (and match) the `src/` part.
`*/README`   | Matches `src/README`. Unlike Unix file globs, it also matches `src/library/README`. However it does not match the file `README` in the root of the tree.








<
|
>
|
>
>

|
>


<
|
|
|
|

<
<
|
|
<
<
|
<
|




|
>


|


|
<
|
<

|
|
>
>

|
<
|






|

>
>
>
>
>
|
>

>
>
|
|

>
>
>
>
>
>
>
>
>
|
>
>
|
|
>
|
|
|
<
<
<



<
<
<
<
















|
|



|
|

|








|
<

|
|
>
|
>
|
|

>
>
>

|
>
|
>

|
|
>
|

|
>
>
>
>







1
2
3
4
5
6
7
8

9
10
11
12
13
14
15
16
17
18

19
20
21
22
23


24
25


26

27
28
29
30
31
32
33
34
35
36
37
38
39

40

41
42
43
44
45
46
47

48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87



88
89
90




91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124

125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
# File Name Glob Patterns


A [glob pattern][glob] is a text expression that matches one or more
file names using wild cards familiar to most users of a command line.
For example, `*` is a glob that matches any name at all and
`Readme.txt` is a glob that matches exactly one file.


A glob should not be confused with a [regular expression][regexp] (RE),
even though they use some of the same special characters for similar
purposes, because [they are not fully compatible][greinc] pattern
matching languages. Fossil uses globs when matching file names with the
settings described in this document, not REs.

[glob]:   https://en.wikipedia.org/wiki/Glob_(programming)
[greinc]: https://unix.stackexchange.com/a/57958/138
[regexp]: https://en.wikipedia.org/wiki/Regular_expression


These settings hold one or more file glob patterns to cause Fossil to
give matching named files special treatment.  Glob patterns are also
accepted in options to certain commands and as query parameters to
certain Fossil UI web pages.



Where Fossil also accepts globs in commands, this handling may interact
with your OS’s command shell or its C runtime system, because they may


have their own glob pattern handling. We will detail such interactions

below.


## Syntax

Where Fossil accepts glob patterns, it will usually accept a *list* of
such patterns, each individual pattern separated from the others
by white space or commas. If a glob must contain white spaces or
commas, it can be quoted with either single or double quotation marks.
A list is said to match if any one glob in the list
matches.

A glob pattern matches a given file name if it successfully consumes and

matches the *entire* name. Partial matches are failed matches.


Most characters in a glob pattern consume a single character of the file
name and must match it exactly. For instance, “a” in a glob simply
matches the letter “a” in the file name unless it is inside a special
character sequence.

Other characters have special meaning, and they may include otherwise

normal characters to give them special meaning:

:Pattern |:Effect
---------------------------------------------------------------------
`*`      | Matches any sequence of zero or more characters
`?`      | Matches exactly one character
`[...]`  | Matches one character from the enclosed list of characters
`[^...]` | Matches one character *not* in the enclosed list

Note that unlike [POSIX globs][pg], these special characters and
sequences are allowed to match `/` directory separators as well as the
initial `.` in the name of a hidden file or directory. This is because
Fossil file names are stored as complete path names. The distinction
between file name and directory name is “below” Fossil in this sense.

[pg]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13

The bracket expresssions above require some additional explanation:

 *  A range of characters may be specified with `-`, so `[a-f]` matches
    exactly the same characters as `[abcdef]`. Ranges reflect Unicode
    code points without any locale-specific collation sequence.
    Therefore, this particular sequence never matches the Unicode
    pre-composed character `é`, for example. (U+00E9)

 *  This dependence on character/code point ordering may have other
    effects to surprise you. For example, the glob `[A-z]` not only
    matches upper and lowercase ASCII letters, it also matches several
    punctuation characters placed between `Z` and `a` in both ASCII and
    Unicode: `[`, `\`, `]`, `^`, `_`, and <tt>\`</tt>.

 *  You may include a literal `-` in a list by placing it last, just
    before the `]`.

 *  You may include a literal `]` in a list by making the first
    character after the `[` or `[^`. At any other place, `]` ends the list.

 *  You may include a literal `^` in a list by placing it anywhere
    except after the opening `[`.




 *  Beware that a range must be specified from low value to high
    value: `[z-a]` does not match any character at all, preventing the
    entire glob from matching.





Some examples of character lists:

:Pattern |:Effect
---------------------------------------------------------------------
`[a-d]`  | Matches any one of `a`, `b`, `c`, or `d` but not `ä`
`[^a-d]` | Matches exactly one character other than `a`, `b`, `c`, or `d`
`[0-9a-fA-F]` | Matches exactly one hexadecimal digit
`[a-]`   | Matches either `a` or `-`
`[][]`   | Matches either `]` or `[`
`[^]]`   | Matches exactly one character other than `]`
`[]^]`   | Matches either `]` or `^`
`[^-]`   | Matches exactly one character other than `-`

White space means the specific ASCII characters TAB, LF, VT, FF, CR,
and SPACE.  Note that this does not include any of the many additional
spacing characters available in Unicode such as
U+00A0, NO-BREAK SPACE.

Because both LF and CR are white space and leading and trailing spaces
are stripped from each glob in a list, a list of globs may be broken
into lines between globs when the list is stored in a file, as for a
versioned setting.

Note that 'single quotes' and "double quotes" are the ASCII straight
quote characters, not any of the other quotation marks provided in
Unicode and specifically not the "curly" quotes preferred by
typesetters and word processors.


## File Names to Match

Before it is compared to a glob pattern, each file name is transformed
to a canonical form:


  *  all directory separators are changed to `/`
  *  redundant slashes are removed
  *  all `.` path components are removed
  *  all `..` path components are resolved

(There are additional details we are ignoring here, but they cover rare
edge cases and follow the principle of least surprise.)

The glob must match the *entire* canonical file name to be considered a
match.

The goal is to have a name that is the simplest possible for each
particular file, and that will be the same regardless of the platform
you run Fossil on. This is important when you have a repository cloned
from multiple platforms and have globs in versioned settings: you want
those settings to be interpreted the same way everywhere.

Beware, however, that all glob matching in Fossil is case sensitive
regardless of host platform and file system. This will not be a surprise
on POSIX platforms where file names are usually treated case
sensitively. However, most Windows file systems are case preserving but
case insensitive. That is, on Windows, the names `ReadMe` and `README`
are usually names of the same file. The same is true in other cases,
such as by default on macOS file systems and in the file system drivers
for Windows file systems running on non-Windows systems. (e.g. exfat on
Linux.) Therefore, write your Fossil glob patterns to match the name of
the file as checked into the repository.

Some example cases:

:Pattern     |:Effect
--------------------------------------------------------------------------------
`README`     | Matches only a file named `README` in the root of the tree. It does not match a file named `src/README` because it does not include any characters that consume (and match) the `src/` part.
`*/README`   | Matches `src/README`. Unlike Unix file globs, it also matches `src/library/README`. However it does not match the file `README` in the root of the tree.
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
[`test-echo`]: /help?cmd=test-echo
[`test-glob`]: /help?cmd=test-glob


## Converting `.gitignore` to `ignore-glob`

Many other version control systems handle the specific case of
ignoring certain files differently from fossil: they have you create
individual "ignore" files in each folder, which specify things ignored
in that folder and below. Usually some form of glob patterns are used
in those files, but the details differ from fossil.

In many simple cases, you can just store a top level "ignore" file in
`.fossil-settings/ignore-glob`. But as usual, there will be lots of
edge cases.

[Git has a rich collection of ignore files][gitignore] which
accumulate rules that affect the current command. There are global
files, per-user files, per workspace unmanaged files, and fully
version controlled files. Some of the files used have no set name, but
are called out in configuration files.

[gitignore]: https://git-scm.com/docs/gitignore

In contrast, fossil has a global setting and a local setting, but the local setting
overrides the global rather than extending it. Similarly, a fossil
command's `--ignore` option replaces the `ignore-glob` setting rather
than extending it.

With that in mind, translating a `.gitignore` file into
`.fossil-settings/ignore-glob` may be possible in many cases. Here are
some of features of `.gitignore` and comments on how they relate to
fossil:

 *  "A blank line matches no files..." is the same in fossil.
 *  "A line starting with # serves as a comment...." not in fossil.
 *  "Trailing spaces are ignored unless they are quoted..." is similar
    in fossil. All whitespace before and after a glob is trimmed in
    fossil unless quoted with single or double quotes. Git uses
    backslash quoting instead, which fossil does not.
 *  "An optional prefix "!" which negates the pattern..." not in
    fossil.
 *  Git's globs are relative to the location of the `.gitignore` file;
    fossil's globs are relative to the root of the workspace.
 *  Git's globs and fossil's globs treat directory separators
    differently. Git includes a notation for zero or more directories
    that is not needed in fossil.

### Example

In a project with source and documentation:

    work
      +-- doc







|


|













|
|






|

|
|

|
|
|
|
|
|
|
|

|







497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
[`test-echo`]: /help?cmd=test-echo
[`test-glob`]: /help?cmd=test-glob


## Converting `.gitignore` to `ignore-glob`

Many other version control systems handle the specific case of
ignoring certain files differently from Fossil: they have you create
individual "ignore" files in each folder, which specify things ignored
in that folder and below. Usually some form of glob patterns are used
in those files, but the details differ from Fossil.

In many simple cases, you can just store a top level "ignore" file in
`.fossil-settings/ignore-glob`. But as usual, there will be lots of
edge cases.

[Git has a rich collection of ignore files][gitignore] which
accumulate rules that affect the current command. There are global
files, per-user files, per workspace unmanaged files, and fully
version controlled files. Some of the files used have no set name, but
are called out in configuration files.

[gitignore]: https://git-scm.com/docs/gitignore

In contrast, Fossil has a global setting and a local setting, but the local setting
overrides the global rather than extending it. Similarly, a Fossil
command's `--ignore` option replaces the `ignore-glob` setting rather
than extending it.

With that in mind, translating a `.gitignore` file into
`.fossil-settings/ignore-glob` may be possible in many cases. Here are
some of features of `.gitignore` and comments on how they relate to
Fossil:

 *  "A blank line matches no files...": same in Fossil.
 *  "A line starting with # serves as a comment....": not in Fossil.
 *  "Trailing spaces are ignored unless they are quoted..." is similar
    in Fossil. All whitespace before and after a glob is trimmed in
    Fossil unless quoted with single or double quotes. Git uses
    backslash quoting instead, which Fossil does not.
 *  "An optional prefix "!" which negates the pattern...": not in
    Fossil.
 *  Git's globs are relative to the location of the `.gitignore` file:
    Fossil's globs are relative to the root of the workspace.
 *  Git's globs and Fossil's globs treat directory separators
    differently. Git includes a notation for zero or more directories
    that is not needed in Fossil.

### Example

In a project with source and documentation:

    work
      +-- doc
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571



572


573

574
575
576
577
578
579





## Implementation and References

Most of the implementation of glob pattern handling in fossil is found
`glob.c`, `file.c`, and each individual command and web page that uses
a glob pattern. Find commands and pages in the fossil sources by
looking for comments like `COMMAND: add` or `WEBPAGE: timeline` in
front of the function that implements the command or page in files
`src/*.c`. (Fossil's build system creates the tables used to dispatch
commands at build time by searching the sources for those comments.) A
few starting points:

:File            |:Description
--------------------------------------------------------------------------------
[`src/glob.c`][] | Implementation of glob pattern list loading, parsing, and matching.
[`src/file.c`][] | Implementation of various kinds of canonical names of a file.

[`src/glob.c`]: https://www.fossil-scm.org/index.html/file/src/glob.c
[`src/file.c`]: https://www.fossil-scm.org/index.html/file/src/file.c




The actual pattern matching is implemented in SQL, so the


documentation for `GLOB` and the other string matching operators in

[SQLite] (https://sqlite.org/lang_expr.html#like) is useful. Of
course, the SQLite [source code]
(https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768)
and [test harnesses]
(https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673)
also make entertaining reading.







|
<
<
<
<
<
<
<



|
|




>
>
>
|
>
>
|
>
|
<
|
<
|
<
569
570
571
572
573
574
575
576







577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594

595

596






## Implementation and References

The implementation of the Fossil-specific glob pattern handling is here:








:File            |:Description
--------------------------------------------------------------------------------
[`src/glob.c`][] | pattern list loading, parsing, and generic matching code
[`src/file.c`][] | application of glob patterns to file names

[`src/glob.c`]: https://www.fossil-scm.org/index.html/file/src/glob.c
[`src/file.c`]: https://www.fossil-scm.org/index.html/file/src/file.c

See the [Adding Features to Fossil][aff] document for broader details
about finding and working with such code.

The actual pattern matching leverages the `GLOB` operator in SQLite, so
you may find [its documentation][gdoc], [source code][gsrc] and [test
harness][gtst] helpful.

[aff]:  ./adding_code.wiki
[gdoc]: https://sqlite.org/lang_expr.html#like

[gsrc]: https://www.sqlite.org/src/artifact?name=9d52522cc8ae7f5c&ln=570-768

[gtst]: https://www.sqlite.org/src/artifact?name=66a2c9ac34f74f03&ln=586-673