ADDREMOVE command simultaneously adds and deletes the same file

(1) By litmit on 2025-07-02 15:07:11 [link] [source]

If a file name contains an unusual combination of ASCII characters, fossil addremove simultaneously adds and deletes that file.

Example:

Tests>fossil addremove
ADDED  "Tests/TestSuite/Strange names/ "
ADDED  "Tests/TestSuite/Strange names/  "
ADDED  "Tests/TestSuite/Strange names/   %    "
ADDED  "Tests/TestSuite/Strange names/   %%    "
ADDED  "Tests/TestSuite/Strange names/   -    "
ADDED  "Tests/TestSuite/Strange names/   --    "
ADDED  "Tests/TestSuite/Strange names/ ."
ADDED  "Tests/TestSuite/Strange names/!."
ADDED  "Tests/TestSuite/Strange names/. "
ADDED  "Tests/TestSuite/Strange names/..."
DELETED  Tests/TestSuite/Strange names/
DELETED  Tests/TestSuite/Strange names/
DELETED  Tests/TestSuite/Strange names/   %
DELETED  Tests/TestSuite/Strange names/   %%
DELETED  Tests/TestSuite/Strange names/   -
DELETED  Tests/TestSuite/Strange names/   --
DELETED  Tests/TestSuite/Strange names/ .
DELETED  Tests/TestSuite/Strange names/!.
DELETED  Tests/TestSuite/Strange names/.
DELETED  Tests/TestSuite/Strange names/...
added 10 files, deleted 10 files

(I added double quotes to the output to clarify the exact file names.)

This doesn't surprise me. These names cannot be processed not only by Fossil but also by many other Windows utilities (though of course not all).

Is it possible to process all names supported by host filesystem? Or check such names before ADD and show a warning like 'can't process file name 'badname'' instead of ADD and DELETE them?

(2) By Stephan Beal (stephan) on 2025-07-02 18:37:33 in reply to 1 [link] [source]

Is it possible to process all names supported by host filesystem?

Fossil doesn't have that information. You might be running on a VFAT-formatted SD card mounted on a Linux system (so it can't guess based on the OS, either). Fossil is intentionally conservative with regard to legal filenames, as it would be a tragedy to have SCM'd content which can only be processed on a subset of platforms.

Or check such names before ADD and show a warning like 'can't process file name 'badname'' instead of ADD and DELETE them?

That sounds feasible, but to ask a question Richard is fond of posing:

What problem does it solve?

Obviously, the names you've used are extreme examples intended to demonstrate a point, and not something people would really use in an SCM. Did you encounter this problem through seemingly legitimate names, or was this an experiment done just out of curiosity? If it's the latter: does it need "fixing"?

(3) By litmit on 2025-07-03 06:49:14 in reply to 2 [link] [source]

Is it possible to process all names supported by host filesystem?

Fossil doesn't have that information.

I'm not sure. Fossil sees these names using standard C functions (I suppose). But after that something goes wrong (possibly incorrect parsing of names).

it would be a tragedy to have SCM'd content which can only be processed on a subset of platforms.

This is not an SCM problem at all, only a problem for multi-platform developers. They should use common naming rules suitable for all the platforms they use. An SCM should just correctly handle the situation where a file cannot be successfully checked out on the target platform.

Obviously, the names you've used are extreme examples intended to demonstrate a point, and not something people would really use in an SCM.

I'm developing a utility that needs to work with thousands of filenames and parse and match them correctly using glob patterns. And I want to make a utility that will not have any problem with strange names. Therefore I need a test set of files to ensure that my utility handles them correctly. But now I have a problem maintaining my source code using Fossil.

BTW, these names are only a small part of my test set (all the others were processed correctly by Fossil).

(4) By Stephan Beal (stephan) on 2025-07-03 12:39:58 in reply to 3 [link] [source]

Fossil sees these names using standard C functions (I suppose).

That's correct, but there are no libc APIs which tell us whether a file's name is valid. open(2) sets errno to EINVAL in that case, but we can't try that without opening or, depending on the context, creating a file with that name. We can't check that legality in advance without using platform-specific APIs (which may or may not exist - i've no idea).
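To illustrate what an advance check might look like without touching the filesystem, here is a minimal sketch of a conservative name test. It is not what Fossil does; the function name `portability_problems` and the rule set are assumptions, based on the commonly documented Win32 naming restrictions (forbidden characters, reserved device names, trailing space or period). It inspects a single path component only.

```python
import re

# Device names reserved by the Win32 layer, compared case-insensitively
# against the part of the name before the first period.
_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)
# Characters the Win32 layer rejects in a file name, plus control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def portability_problems(name: str) -> list[str]:
    """Return the reasons a single path component may not survive on Windows.

    Hypothetical helper: an empty list means no problem was detected,
    not that the name is guaranteed to work everywhere.
    """
    problems = []
    if name in ("", ".", ".."):
        problems.append("empty or relative-directory name")
    if _FORBIDDEN.search(name):
        problems.append("contains a character Win32 rejects")
    if name.endswith((" ", ".")):
        problems.append("ends with a space or period")
    if name.split(".")[0].rstrip(" ").upper() in _RESERVED:
        problems.append("reserved device name")
    return problems
```

Under these assumed rules, every one of the ten names quoted in the first post is flagged, because each ends with a space or a period.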

After looking more closely at your initial post...

fossil handles names like " %% " just fine. Your output demonstrates addremove having an apparent issue with them, but i cannot reproduce it using a random subset of the names you've shown:

[stephan@nuc:~/tmp]$ rm -fr x x.f; f new x.f; mkdir x; cd x; f open ../x.f
project-id: 1402856c6a39af04d934c57d9ed198322a3589c2
server-id:  1698a5ab82a55d086b3921fa7d5a4133eb5cd56a
<snip>
checkout:     36cf3bd5e019908111563b97214a323b26d34f79 2025-07-03 12:12:45 UTC
tags:         trunk
comment:      initial empty check-in (user: stephan)
check-ins:    1

[stephan@nuc:~/tmp/x]$ touch "   %%    " "   --    " "   %%    " " ." ". " "..."
[stephan@nuc:~/tmp/x]$ f addremove
ADDED     %%    
ADDED     --    
ADDED   .
ADDED  configure 
added 4 files, deleted 0 files

Note, however, that it did fail to add ". " and "...". That's because their names begin with dots and addremove ignores those unless you pass the --dotfiles flag (which i only just now learned because i've only once (seen above) used addremove in my 17+ years of using fossil).

"fossil add" has no problem with those names:

[stephan@nuc:~/tmp/x]$ f add '. ' '...'
ADDED  .
ADDED  ...

[stephan@nuc:~/tmp/x]$ f ci -m foo
New_Version: 9a4a69d902f440ed782d56ef639a26ad2ebfcae952663eb3f9c5a6ce78b62b34

[stephan@nuc:~/tmp/x]$ f zip trunk foo.zip
[stephan@nuc:~/tmp/x]$ unzip -l foo.zip 
Archive:  foo.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2025-07-03 14:36   unnamed_2025-07-03_123603_9a4a69d902/
        0  2025-07-03 14:36   unnamed_2025-07-03_123603_9a4a69d902/   %%    
        0  2025-07-03 14:36   unnamed_2025-07-03_123603_9a4a69d902/   --    
        0  2025-07-03 14:36   unnamed_2025-07-03_123603_9a4a69d902/ .
        0  2025-07-03 14:36   unnamed_2025-07-03_123603_9a4a69d902/. 
        0  2025-07-03 14:36   unnamed_2025-07-03_123603_9a4a69d902/...
---------                     -------
        0                     6 files

Whether those are legal on a Windows filesystem i do not know. Ideally, fossil should never have a file whose name cannot be legally represented on any reasonably modern filesystem, but if ZIP can handle a name then it's presumably reasonably portable.

And I want to make a utility that will not have any problem with strange names.

Are these names ones your utility has actually had to deal with or are you specifically reaching for hypothetical problem cases?

None of the examples you've given qualify, IMO, as real-world file names except in cases where people or automated tools specifically intend to obfuscate their purpose or provoke problems in downstream processing.

That is not to say that such names are "wrong", but they are unconventional enough to cause issues for all sorts of tools.

My recommendation, if you need such files for testing your tool, is not to check in those files but to generate them as part of your build/test process.
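A minimal sketch of that approach, assuming a Python test harness (the thread does not say what the utility's test setup looks like; `create_fixtures` and `STRANGE_NAMES` are hypothetical names). It creates the files in a scratch directory and re-lists the directory afterwards, since an OS may reject a name or silently store it under a different one.

```python
import os

# The names quoted in the first post of this thread.
STRANGE_NAMES = [
    " ", "  ", "   %    ", "   %%    ", "   -    ", "   --    ",
    " .", "!.", ". ", "...",
]


def create_fixtures(root, names=STRANGE_NAMES):
    """Create one empty file per name under root.

    Returns (created, failed). A name counts as created only if the
    directory listing shows it unchanged afterwards.
    """
    created, failed = [], []
    for name in names:
        try:
            with open(os.path.join(root, name), "w"):
                pass
        except OSError:
            failed.append(name)
            continue
        # The OS may have altered the name on the way in, so check the listing.
        if name in os.listdir(root):
            created.append(name)
        else:
            failed.append(name)
    return created, failed
```

The test suite can then run only against the names in `created` and report the rest as skipped on that platform, so nothing unusual ever needs to be checked in.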

(5) By Konstantin Khomutov (kostix) on 2025-07-03 14:11:18 in reply to 4 [link] [source]

A random fun fact: on Windows, you cannot create a file with a name ending in a period (ASCII 0x2E): unless you use a full path starting with the \\?\ prefix, the Win32 API will silently remove all the trailing dots from the filename before attempting to operate on it.

(Of course, there exist more boring limitations such as inability to create files named like NUL, CON$ etc.)

That is to say, if one is about allowing all sorts of weird stuff in filenames, one should not just use C's stdlib for this but actually write "drivers" ("support layers") for each supported platform, implementing particular platform's quirks.

And even then, there exists Unicode. Are "Fußball" and "Fussball" the same name or different? Are 'é' (U+00E9) and 'é' (U+0065, U+0301) the same character or different?
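The two questions can be made concrete with Python's standard `unicodedata` module. This is only a sketch of one possible comparison policy (NFC normalization followed by case folding); the helper name `same_name` is hypothetical, and real filesystems each apply their own rules, or none.

```python
import unicodedata


def same_name(a: str, b: str) -> bool:
    """Compare two names after NFC normalization and case folding."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()
    return fold(a) == fold(b)


composed = "\u00e9"      # e-acute as a single code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

print(composed == decomposed)                    # False
print(same_name(composed, decomposed))           # True
print(same_name("Fu\u00dfball", "Fussball"))     # True
```

A byte-wise comparison says the names differ, while this policy says they match, which is exactly the ambiguity a cross-platform tool has to take a position on.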

This is a rabbit hole, I think. 🤷

(6) By Stephan Beal (stephan) on 2025-07-03 14:25:41 in reply to 5 [link] [source]

That is to say, if one is about allowing all sorts of weird stuff in filenames, one should not just use C's stdlib for this but actually write "drivers" ("support layers") for each supported platform, implementing particular platform's quirks.

Patches would be thoughtfully considered :).

This is a rabbit hole, I think. 🤷

Indeed :).

(7) By litmit on 2025-07-03 14:35:47 in reply to 4 [link] [source]

fossil handles names like " %% " just fine. Your output demonstrates addremove having an apparent issue with them, but i cannot reproduce it using a random subset of the names you've shown:

That's because you're trying to reproduce a Windows-specific problem on Linux :)

Note, however, that it did fail to add ". " and "...". That's because their names begin with dots and addremove ignores those unless you pass the --dotfiles flag (which i only just now learned because i've only once (seen above) used addremove in my 17+ years of using fossil).

I tried '--dotfiles'; nothing changed.

AFAIK, the solution to this problem is to use the "modern" Windows API (in quotes, since this API was introduced in NT about thirty years ago). This requires prepending \\?\ to the path when calling the API functions.
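For illustration, the prefixing step might look like the sketch below. It is not Fossil code; `to_extended_path` is a hypothetical helper, and it assumes the caller already has an absolute path. It converts forward slashes because, as noted elsewhere in this thread, only the backslash is accepted as a separator in \\?\ paths. It uses `ntpath` so the string handling can be tried on any OS.

```python
import ntpath

PREFIX = "\\\\?\\"  # the four characters \\?\


def to_extended_path(abs_path: str) -> str:
    """Return abs_path in \\?\ form so Win32 skips its name normalization."""
    if abs_path.startswith(PREFIX):
        return abs_path
    # Only backslashes are accepted as separators after the prefix.
    path = abs_path.replace("/", "\\")
    if not ntpath.isabs(path):
        raise ValueError("an absolute path is required: " + abs_path)
    if path.startswith("\\\\"):
        # UNC path: \\server\share\x becomes \\?\UNC\server\share\x
        return PREFIX + "UNC\\" + path[2:]
    return PREFIX + path
```

For example, `to_extended_path("C:/Tests/Strange names/...")` yields `\\?\C:\Tests\Strange names\...`, which could then be passed to the wide-character file APIs.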

That is not to say that such names are "wrong", but they are unconventional enough to cause issues for all sorts of tools.

Not 'all'. Some tools work well. The FAR file manager can create and modify such files. The WinRAR archiver can compress and decompress them. My utility is written in JavaScript and runs on Node. I haven't had any problems processing such names, and I didn't write any special code for them.

(8) By Florian Balmer (florian.balmer) on 2025-07-03 14:50:40 in reply to 7 [link] [source]

Such special names are invalid in the Win32 layer, and can only be used in the native Windows NT layer, and only as fully-qualified (absolute) paths prefixed with \\?\. This is not something Fossil is ready to deal with, so failure is a reasonable option, here.

What bugs me a bit is that such file names make it into vfile and then later can cause some commands to stop working until after a revert:

> fossil addremove
ADDED ...
DELETED ...
added 1 files, deleted 1 files

> fossil sql "SELECT * FROM vfile WHERE pathname='...';"
1,1,0,1,0,0,0,0,NULL,'...',NULL,NULL

> fossil addremove
added 0 files, deleted 0 files

> fossil sql "SELECT * FROM vfile WHERE pathname='...';"
1,1,0,1,0,0,0,0,NULL,'...',NULL,NULL

> fossil diff
DELETED ...

> fossil sql "SELECT * FROM vfile WHERE pathname='...';"
1,1,1,1,0,0,0,0,1751552658,'...',NULL,NULL

> fossil diff
not an ordinary file: C:/<PATH/TO/REPOSITORY>/...
abort due to prior errors

> rem `fossil diff` will work again after `fossil revert`

At a quick glance, I don't see an easy way to fix this. But since dealing with such file names is really special, maybe it's not worth fixing.

As suggested by Stephan, files with such names should probably be generated as part of the build/test process.

(Fossil has some support to work with NT-style paths prefixed with \\?\, but unlike at the Win32 layer, only the backslash is accepted as a path separator, so the conversion from \ to / performed by Fossil will break this, anyway. That's why I haven't been able to come up with a solution for another problem, so far, see 9826189229.)

(9) By Trevor (MelvaigT) on 2025-07-03 15:25:30 in reply to 4 [source]

FWIW I second Stephan's advice not to create your test files directly in Fossil.

Not just with these 'peculiar' names, but in general. Otherwise you will also run into problems with things like case flattening and more generally other ways of referring to the same file with different names. See also Konstantin's post which appeared while I wrote this one.

And don't forget Fossil doesn't guarantee to reproduce all the attributes of a file - it makes an attempt at things like executable bits for convenience, but you may also be interested in general permissions and datestamps etc.

My starting point is wondering under what conditions Fossil ADDREMOVE can reasonably decide to both ADD and DELETE the same file.

The nearest I can get is thinking that if a user does an ADD themselves but via OS commands either deletes or fails to actually create the file then it is expected that ADDREMOVE will DELETE it.

If Fossil does not distinguish between a user executed ADD and an ADD that it just did itself, then the double operation can occur if:

The files are listed by the 'list directory' API that Fossil uses, so get ADDed.

When Fossil then uses a different API to check for the existence of a specific file it gets the answer 'missing'. This might be because of permissions on the file, or something Fossil is doing wrong, but just as likely it could be an inconsistency in the OS such as checking that the name is "valid" before it actually looks for the file. You can sometimes see this in Explorer - you see a file in the display, but as soon as you try to do something with it (such as delete it) you are told it doesn't exist.

This was very common in the early days of NTFS when they tried to make it compatible with the old 8.3 FAT file names. I am old enough to remember that, and have the burnt fingers to prove it. In what sane world are 'A' 'a.' 'a' and 'A.' all the same file?

As to whether Fossil needs to change - certainly any attempt to validate names is futile, though I might think about doing a check before doing the ADD that I am not about to change my mind, i.e. see if I can actually find the specific file. If only to avoid the same conversation with somebody else in five years' time.
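A rough sketch of that pre-ADD check, in Python rather than Fossil's C and with hypothetical names throughout (`split_addable`, the `exists` parameter). It models the hypothesis above: names come from one API (the directory listing) and are then confirmed through a second one (an existence check) before anything is recorded as added.

```python
import os
from typing import Callable, Iterable


def split_addable(
    root: str,
    names: Iterable[str],
    exists: Callable[[str], bool] = os.path.lexists,
):
    """Split listed names into (addable, unreachable).

    A name is addable only if the existence check can find it again;
    anything else would be added and then immediately seen as missing.
    """
    addable, unreachable = [], []
    for name in names:
        if exists(os.path.join(root, name)):
            addable.append(name)
        else:
            unreachable.append(name)
    return addable, unreachable
```

Calling `split_addable(root, os.listdir(root))` and warning about the `unreachable` names would give the "can't process file name" message asked for in the first post, instead of an ADD followed by a DELETE.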