Fossil Forum

Proposed: Block "fossil add" of new files with reserved names.

(1) By Richard Hipp (drh) on 2021-03-10 18:19:27 [source]

Windows does not allow files or directories whose names match one of:

CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9

This caused problems last year in one of my projects when I created a new subdir called "aux" and checked it in, and only discovered the problem when I tried to open the repository later on Windows.

I propose that the "fossil add" command issue a warning and fail if any new file contains one of these reserved words, and provide a switch to override the failure for people who know for certain that their repo will never be used on Windows.

While we are at it, perhaps "fossil add" should warn for other potential problems, such as illegal characters in names, names of excessive length, or two or more file/directory names that differ only in case. Are there other potential problems that Fossil should warn about?

(2) By Stephan Beal (stephan) on 2021-03-10 18:30:44 in reply to 1 [link] [source]

I propose that the "fossil add" command issue a warning and fail if any new file contains one of these reserved words, and provide a switch to override the failure for people who know for certain that their repo will never be used on Windows.

Didn't we add that late last year? i'm on the tablet, so can't easily check the code, but i ported such a function to libfossil recently:

https://fossil.wanderinghorse.net/r/libfossil/info/2d90220116bb9064

My recollection is that we did that in the scope of rejecting checkout db files from being added.

(3) By Daniel Dumitriu (danield) on 2021-03-10 20:40:59 in reply to 2 [link] [source]

This caused problems last year in one of my projects when I created a new subdir called "aux".

This bit me (though not seriously) in 2016, when I converted my SVN PhD repository (developed under Linux) to Fossil and tried to open it on Windows just for fun. There was a file aux.h, and Windows promptly bailed out. I wrote to Richard, and this gave birth to a mailing list thread.

The next day Richard committed a function testing for the reserved words.

Didn't we add that late last year?

As far as I can tell though, it is currently only used when writing to a file, see here and here.

I think it is a good idea to have a setting (defaulting to On) to reject adding files with reserved names.

As for the other checks, they are reasonable, too. The question is, how many settings will govern them?

Off-topic: is it on purpose that the code sometimes uses #if _WIN32 instead of #if defined(_WIN32)?

(4) By Stephan Beal (stephan) on 2021-03-12 13:20:10 in reply to 3 [link] [source]

I think it is a good idea to have a setting (defaulting to On) to reject adding files with reserved names.

A setting seems like overkill because add is something one does very rarely, and adding of a reserved name is even rarer. i'd propose an --allow-reserved-names flag to add instead.

In any case, we ought to disable their addition by default. That won't affect any existing repos which have such files because they're not adding them again.

i'll go ahead and get that on a branch then we can debate the merits of it, if desired.

Off-topic: is it on purpose that the code sometimes uses #if _WIN32 instead of #if defined(_WIN32)?

No clue.

(5) By Daniel Dumitriu (danield) on 2021-03-12 14:18:59 in reply to 4 [link] [source]

I'll go ahead and get that on a branch then we can debate the merits of it, if desired.

That would of course be nice :-)

Off-topic: is it on purpose that the code sometimes uses #if _WIN32 instead of #if defined(_WIN32)?

Then, if nobody objects, I'll go and replace them later. In this case it does not seem to be dangerous: according to the Microsoft documentation, _WIN32 is defined as the integer 1 whenever it is defined at all. Still, more often than not one does want to use defined().

(6) By Stephan Beal (stephan) on 2021-03-12 14:53:38 in reply to 4 [link] [source]

i'll go ahead and get that on a branch then we can debate the merits of it, if desired.

We have 3 different classes of reserved names:

  • Windows-only.
  • A static list which includes _FOSSIL_, .fslckout, and several variations of those (*-wal, *-journal, etc.).
  • A dynamic list of names we don't know until runtime: the repository name (including its journal files) and, only if the manifest setting says to enable them, manifest, manifest.uuid, and manifest.tags.

add already silently excludes both the 2nd and 3rd categories of those names, and we have no need to warn for those because they are outright verboten in a repository. Though historical versions, prior to late 2020, allowed them to be added, the manifest parser now skips over such names which means they can no longer be checked out and won't be crosslinked in a rebuild, so they won't appear in the mlink table (but could be fetched using the artifact command, if really desired).

This trivial patch:

fossil:/timeline?r=add-allow-reserved-flag

Adds a check for the first variety of names, aborting add if such a name is provided unless --allow-reserved is used. (--allow-reserved-names or --allow-windows-names might be better choices - y'all can decide that.)

[stephan@nuc:~/fossil/fossil]$ f add aux
Filename is reserved: aux
Use --allow-reserved to permit reserved filenames.
[stephan@nuc:~/fossil/fossil]$ f add aux --allow-reserved
ADDED  aux
[stephan@nuc:~/fossil/fossil]$ l aux
-rw-rw-r-- 1 stephan stephan 0 Mar 12 15:38 aux
[stephan@nuc:~/fossil/fossil]$ f rm aux
DELETED aux

(8) By Thomas Hess (luziferius) on 2021-03-17 15:31:28 in reply to 6 [link] [source]

How about other names invalid on Windows?
According to https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file, several symbols are invalid on Windows.

I created a fossil repository on my Linux machine and was able to create check-ins with these files, all of which are unsupported on Windows.

Short shell script to create some not-well-behaving files:

$ tee "file ending with space " "ending with dot." "less<than" "greater>than" "co:lon" '"double quoted"' "back\slash" "pipe|to|somewhere" "question?mark" "aster*sk" 'filename
with
newlines' <<< "content"
$ fossil add *
(I used a fossil binary compiled from check-in e11efff8e4de0aef)

(9) By Stephan Beal (stephan) on 2021-03-17 15:44:04 in reply to 8 [link] [source]

I created a fossil repository on my Linux machine and was able to create check-ins with these files, all of which are unsupported on Windows.

A warning certainly can't hurt, so long as we don't outright make them illegal. A patch which implements such checks would be happily considered :).

'filename with newlines'

Per the article you linked, any bytes in the range 0-31 are disallowed, so such a check would inherently prohibit newlines and tabs and whatnot. NUL isn't an issue because it would be recognized as the end of the filename by our code, effectively truncating the name (garbage in, garbage out).
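
A minimal sketch of what such a check might look like, written in Python for brevity rather than as a patch against Fossil. The function name windows_name_problems is hypothetical, and the character list is taken from the Microsoft article linked above; it checks a single path component, so the path separators count as illegal too.

```python
# Characters the linked Microsoft article lists as reserved in file names.
# "/" and "\" are included because this checks one path component at a time.
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*')


def windows_name_problems(component):
    """Return a list of reasons why one path component is unsafe on Windows."""
    problems = []
    for ch in component:
        if ord(ch) < 32:
            # Bytes 0-31 cover newlines, tabs and the other control characters.
            problems.append(f"control character 0x{ord(ch):02x}")
        elif ch in WINDOWS_RESERVED_CHARS:
            problems.append(f"reserved character {ch!r}")
    if component.endswith(" "):
        problems.append("ends with a space")
    elif component.endswith("."):
        problems.append("ends with a dot")
    return problems


if __name__ == "__main__":
    for name in ["README.md", "co:lon", "ending with dot.", "filename\nwith\nnewlines"]:
        print(repr(name), windows_name_problems(name))
```

Returning the reasons instead of a plain yes/no fits the "warn, don't forbid" approach: the caller can print them and carry on.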

(7) By Daniel Dumitriu (danield) on 2021-03-15 14:13:39 in reply to 1 [link] [source]

This caused problems last year in one of my projects when I created a new subdir called "aux" and checked it in.

I've just noticed by chance how Microsoft itself currently works around this :-)

C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Auxiliary\Build\vcvars32.bat
                                                                    ^^^^^^^^^

(10) By Marcos Cruz (programandala.net) on 2021-03-26 13:21:17 in reply to 1 [link] [source]

I've just run into the first case, but the filename is not checked when its directory is added. Is that a bug?

fossil add src/aux.bank.bas
Filename is reserved: src/aux.bank.bas
Use --allow-reserved to permit reserved filenames.
fossil add src/
ADDED  src/aux.bank.bas

I use Fossil 2.15-rc2 [e378f9300e] 2021-03-19 in Debian.

(11) By Stephan Beal (stephan) on 2021-03-26 16:25:32 in reply to 10 [link] [source]

I've just run into the first case, but the filename is not checked when its directory is added. Is that a bug?

Definitely, but which part is the bug: is a filename which starts with "aux" really illegal? i don't have Windows to try out such a name.

If someone will confirm whether such a name should be permitted or not, i'll fix either the name check or the in-a-dir check, as appropriate.

(12) By Larry Brasfield (larrybr) on 2021-03-26 17:17:03 in reply to 11 [link] [source]

The OS is not quite that stupid. However, the basename 'aux' or case-variations thereof, whether or not an extension is present, simply cannot be created as an ordinary file.

I don't know why this sort of thing persists in the modern age. (Or, maybe I do but hesitate to elaborate.)

(14) By Marcos Cruz (programandala.net) on 2021-03-26 18:52:24 in reply to 11 [link] [source]

Definitely, but which part is the bug: is a filename which starts with "aux" really illegal? i don't have Windows to try out such a name.

I had the same doubt. I supposed the restriction was on filenames like "aux". I tried it on somebody's computer (Windows 7, I think): there's no way to create files or directories starting with "aux.", "con.", etc., even if they are long file names with more than one dot, i.e. the illegal word need not be the whole name before the final extension. I don't see the point of that restriction.
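
The rule as described in the replies above (case-insensitive, applies to directories as well as files, and only the part before the first dot matters) can be sketched as follows. This is an illustration in Python, not Fossil's actual C implementation; is_windows_reserved and WINDOWS_DEVICE_NAMES are hypothetical names, and the list itself is the one from the first post.

```python
# Reserved device names from the first post in this thread.
WINDOWS_DEVICE_NAMES = frozenset(
    ["CON", "PRN", "AUX", "NUL"]
    + [f"COM{n}" for n in range(1, 10)]
    + [f"LPT{n}" for n in range(1, 10)]
)


def is_windows_reserved(path):
    """True if any component of a '/'-separated path is a reserved device name.

    Only the part of each component before its first dot is compared, and the
    comparison ignores case, so "aux", "AUX.h" and "src/aux.bank.bas" all match.
    """
    for component in path.split("/"):
        stem = component.split(".", 1)[0]
        if stem.upper() in WINDOWS_DEVICE_NAMES:
            return True
    return False


if __name__ == "__main__":
    for candidate in ["aux", "src/aux.bank.bas", "VC/Auxiliary/Build/vcvars32.bat"]:
        print(candidate, is_windows_reserved(candidate))
```

Note that a name which merely begins with the letters of a device name, such as "Auxiliary", is not flagged, because the whole stem must match.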

(15) By Stephan Beal (stephan) on 2021-03-26 18:58:47 in reply to 10 [link] [source]

fossil add src/

ADDED src/aux.bank.bas

We have a much larger inconsistency here: scanning a directory that way works fundamentally differently and happens at a lower level. Locally i have a fix for it but, because that step happens at a lower level, we cannot output a really informative warning for it:

# With a directory name as input:
$ f add foo
Skipping Windows-reserved filename: foo/prn.bar
Skipping Windows-reserved filename: foo/aux.foo
ADDED  foo/bar

# Now warns but allows them:
$ f add foo --allow-reserved
Including Windows-reserved filename: foo/prn.bar
Including Windows-reserved filename: foo/aux.foo
ADDED  foo/aux.foo
ADDED  foo/prn.bar

# With a filename as input:
$ f add foo/aux.foo 
Windows-reserved filename: foo/aux.foo
Use --allow-reserved to permit reserved filenames.

Note that the directory-scanning code does not know about the --allow-reserved flag, so cannot sensibly refer the user to it. Nor does the dir-scanning code have a way of telling the caller that it skipped 1 or more reserved names.

This isn't yet checked in - i'm still hoping to find a nice solution for the inconsistency which doesn't involve having to change the signature of vfile_scan().

(16) By Stephan Beal (stephan) on 2021-03-26 19:35:03 in reply to 15 [link] [source]

still hoping to find a nice solution for the inconsistency which doesn't involve having to change the signature of vfile_scan().

That's now fixed in trunk. It simply required delaying the is-reserved check until after all filenames had been collected.

It currently warns (non-fatally) about reserved names if --allow-reserved is used, but that's arguable. We might want to squelch that warning when that flag is used.
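
The idea of delaying the check until every filename has been collected might look roughly like this. It is a Python sketch and not the trunk code: plan_add and toy_is_reserved are hypothetical names, the message texts are only modeled on the transcripts above, and aborting when the flag is absent is an assumption carried over from the single-file behavior shown earlier.

```python
def toy_is_reserved(name):
    """Stand-in predicate for the demo: flags only names whose last component
    starts with "aux." or "prn." (the real check covers the full list)."""
    last = name.split("/")[-1].lower()
    return last.startswith(("aux.", "prn."))


def plan_add(collected, is_reserved, allow_reserved=False):
    """Decide what to add once all candidate filenames are known.

    `collected` may come from explicit arguments, a directory scan, or both.
    Because the check runs at the level that knows about the flag, the error
    can point the user at --allow-reserved in every case.
    Returns (names_to_add, warnings).
    """
    reserved = [name for name in collected if is_reserved(name)]
    if reserved and not allow_reserved:
        raise ValueError(
            "Windows-reserved filename(s): " + ", ".join(reserved)
            + "\nUse --allow-reserved to permit reserved filenames."
        )
    # With the flag, reserved names are added but still reported (non-fatally).
    warnings = [f"Windows-reserved filename: {name}" for name in reserved]
    return list(collected), warnings


if __name__ == "__main__":
    names = ["foo/bar", "foo/aux.foo", "foo/prn.bar"]
    to_add, warnings = plan_add(names, toy_is_reserved, allow_reserved=True)
    print("\n".join(warnings))
    print("ADD:", to_add)
```

Passing the predicate in as a parameter keeps the collection step ignorant of the flag, which is the property that made a signature change to the scanning code unnecessary.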

(13) By Dan Shearer (danshearer) on 2021-03-26 17:29:07 in reply to 1 [link] [source]

Richard Hipp (drh) said on 2021-03-10 18:19:27:

While we are at it, perhaps "fossil add" should warn for other potential problems, such as

I do think that is a good idea, and it won't need much maintenance. It also needs a policy as I will show below.

illegal characters in names

And on Windows at least, an expert eye is needed. As a classic party piece... if you think ":" is an illegal filename in Windows, then do this:

echo "This is a standard NTFS stream" > README.md:funkystream
more < README.md:funkystream

I don't know much about it beyond having used streams before, but I do know that it is not true to say ":" is invalid on Windows in the general case. Think "fork" in terms of HFS filenames. We might well say "streams are unsupported in Fossil".

On the other hand, Windows utilities and libraries often don't check for filenames that are weird and confusing but in fact valid (or at least, accepted and possible to create) under Unix. For example filenames such as "../README.md" where the ".." is the name not the path, similarly with "*" embedded in filenames.

So the policy could be "we check for more kinds of reasonableness than operating systems do. If you truly want to live on the dangerous side, then turn off all checks. But Fossil generally knows best."

names of excessive length

This is another example where policy needs to be decided. Is Fossil going to warn that this might not be portable (ie stating its superior knowledge, regardless of the intentions of the user), or instead will Fossil only warn in this specific case when one end or the other might be compromised by the filename that is about to be created?

two or more file/directory names that differ only in case.

This could be one of three policies:

  • a human factors problem, in the same way as a filename with many occurrences of visually similar characters, such as "11l11I1l1.txt" (assuming common fonts and character sets). People could easily be confused, so "please don't create that filename!", or
  • Fossil knows for sure that in this case one end has a case-sensitive filename and one does not, therefore there will certainly be a collision probably provoking an error, or
  • Fossil has superior knowledge about technical aspects of filenames and so just believe us. We don't attempt to guess what might confuse a human, only computers.
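
Whichever policy is chosen, the detection step itself is small. A sketch in Python, assuming '/'-separated repository paths; case_collisions is a hypothetical name, and str.casefold() is only an approximation of the folding rules that real case-insensitive filesystems apply.

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of file or directory names that differ only in case.

    Every directory prefix is checked as well as the full path, so "Src/a.c"
    and "src/b.c" are reported as a clash between "Src" and "src".
    """
    spellings = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            spellings[prefix.casefold()].add(prefix)
    return sorted(sorted(group) for group in spellings.values() if len(group) > 1)


if __name__ == "__main__":
    print(case_collisions(["Src/a.c", "src/b.c", "README", "readme", "LICENSE"]))
```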

Are there other potential problems that Fossil should warn about?

Lots. But only with a policy decision can we decide what is a problem and what is not.

Dan Shearer

(17) By Scott Robison (sdr) on 2021-03-27 12:27:30 in reply to 1 [link] [source]

This isn't a "lesson" for DRH, but just some observations for those who seem critical of this convention. Some of the details might not be 100% accurate, but the spirit is correct.

CP/M had no directories. It had magic device names.

86-DOS was implemented as an 8086 version to work like CP/M. It also didn't support directories, so it made use of magic device names.

IBM wanted to support more things, so the solution was more magic files. aux is (usually) an alias for a serial port. prn an alias for a printer. CP/M only supported one of each (probably). DOS allows more, so aux is an alias for a COM# port, and prn is an alias for an LPT# port.

Both operating systems used file control blocks and ignored extensions for the magic names and had very limited length file names, the infamous 8.3 format.

DOS 2.0 added directories, but backward compatibility required the magic device names to remain.

Windows was at first a GUI shell over the base operating system, not unlike how the X Window System provides GUI infrastructure over a base operating system. Once the OS and GUI were "merged" in the NT line, it still tried to keep things working the same way.

I know many people like to be critical of Microsoft, and I do at times as well, but I admire their desire to maintain backward compatibility (even if they don't always succeed) vs the posix world, particularly Linux (but BSD as well). People in the posix camps like to just assume that everyone is happy to rebuild the world when something is done to improve or change the core OS. Microsoft has tried to make it so that binaries continue to work so that you don't have to have source code and build knowledge to update all the applications you might use in your operating system.

So we continue walking this path 40+ years later in CP/M derived systems (whether that be direct or indirect, actual derivation or inspirational derivation). While many DOS 2.0 and later APIs tried to behave more like posix (using integer handles instead of FCBs and adding nesting directory support), they weren't willing to throw away support for existing software with a new OS, making it less appealing to technical people who would like more "purity" in their OS, but making it very appealing to actual users who just wanted to run software.

We see this in our own community with SQLite which maintains backward compatibility to avoid breaking the investment people have in their SQLite based software.

(18) By Jan Nijtmans (jan.nijtmans) on 2021-03-30 07:25:49 in reply to 1 [link] [source]

While we are at it, perhaps "fossil add" should warn for other potential problems, such as illegal characters in names

On Windows, fossil already handles illegal characters in filenames (like "*:<>?|\"). When using them in the Win32 API, those characters are converted to Unicode characters in the range 0xf000 to 0xf0ff. This is done fully transparently in order to improve interoperability with Cygwin: it allows the Windows version of Fossil to be used to check out a repository, which can then be used under Cygwin.

Details: https://cygwin.com/cygwin-ug-net/using-specialnames.html

This also has the advantage that there's no worry using such characters any more in fossil on other platforms. It simply works. The files may look strange in explorer, but that's all.

Special names like "AUX" cannot be handled this way.

(you might have guessed: I did the implementation of this in Fossil. Or ... actually ... I copied it from Tcl, which does the same on Windows)
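
For readers unfamiliar with the Cygwin convention, the mapping can be illustrated like this. It is a Python sketch, not the Fossil or Tcl code: the function names are hypothetical, and both the exact character set and the "add 0xF000 to the character code" rule are assumptions based on the Cygwin documentation linked above (the backslash is left out because Windows treats it as a path separator).

```python
# Characters assumed to be remapped; taken from the example in the post above.
WINDOWS_ILLEGAL = set('"*:<>?|')
PRIVATE_USE_BASE = 0xF000  # start of the 0xf000-0xf0ff range mentioned above


def to_windows_name(name):
    """Replace characters Windows rejects with private-use code points."""
    return "".join(
        chr(PRIVATE_USE_BASE + ord(ch)) if ch in WINDOWS_ILLEGAL else ch
        for ch in name
    )


def from_windows_name(name):
    """Reverse the mapping, leaving unrelated private-use characters alone."""
    out = []
    for ch in name:
        code = ord(ch)
        original = chr(code - PRIVATE_USE_BASE) if code >= PRIVATE_USE_BASE else ch
        out.append(original if original in WINDOWS_ILLEGAL else ch)
    return "".join(out)


if __name__ == "__main__":
    mapped = to_windows_name("co:lon")
    print([hex(ord(ch)) for ch in mapped])
    print(from_windows_name(mapped))
```

Because the mapping works character by character, it cannot help with whole-name restrictions such as "AUX", which matches the limitation noted above.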

(19) By Dan Shearer (danshearer) on 2021-03-30 15:58:46 in reply to 18 [link] [source]

Jan Nijtmans (jan.nijtmans) said on 2021-03-30 07:25:49:

I suggested having a policy above, and this illustrates why:

On Windows, fossil already handles illegal characters in filenames (like "*:<>?|\")

Fossil is not doing a complete job of illegal characters, even just for Windows. Some issues I know about are:

  • Windows platforms are not identical, because FAT also will not accept "^" while other Windows filesystems do.
  • Windows doesn't allow terminating a filename with " " (space) although it can appear anywhere else
  • Windows doesn't allow terminating a filename with "." even though it can appear anywhere else
  • On Windows the maximum path length (260 chars) is only very slightly longer than the maximum filename length (256 on NT, 250 on DOS)

And that is just Windows. In MacOS, while ":" is the only bad character, if you are doing things at the prompt (which a Fossil user probably is) then the list of characters is more restricted and pretty much the Unix list. And I am quite sure I have missed many other special cases.

All of which is stupidly complicated, which is why I recommend that Fossil has a policy rather than trying to keep all operating systems happy. I propose a policy where filenaming is one of two cases:

  1. Illegal characters/names/regex's in filenames for Fossil as a whole, which would logically be the names that Windows can't have and that Unix should not have, ie some opinion. That solution might be to use Unicode where possible, or something else. But definitely not limited to the illegal filename list for any one single OS.
  2. No illegal characters detected or blocked, ie at the user's risk of creating platform-specific timebombs, specified with --allow-invalid-filenames.

Dan Shearer