Fossil Forum

2.18: cannot clone "http://www.fossil-scm.org/fossil" to "fossil"
Login

2.18: cannot clone "http://www.fossil-scm.org/fossil" to "fossil"

2.18: cannot clone "http://www.fossil-scm.org/fossil" to "fossil"

(1) By gabriele balducci (balducci) on 2022-02-27 16:22:53 [link] [source]

hello

2.18 throws the following error when trying to clone http://www.fossil-scm.org/fossil to a file whose name is "fossil":

##> fossil version
This is fossil version 2.18 [84f25d7eb1] 2022-02-23 13:22:20 UTC
##> fossil clone --accept-any-cert http://www.fossil-scm.org/fossil fossil
not a valid repository: fossil

Changing the name to, eg, "fossil1" or any other name removes the error:

##> fossil clone --accept-any-cert http://www.fossil-scm.org/fossil fossil1
redirect with status 302 to https://fossil-scm.org/home
Round-trips: 10   Artifacts sent: 0  received: 55403
Clone done, wire bytes sent: 2913  received: 39932906  ip: 45.33.6.223
Rebuilding repository meta-data...
  100.1% complete...
Extra delta compression... 
Vacuuming the database... 
project-id: CE59BB9F186226D80E49D1FA2DB29F935CCA0333
server-id:  733b9ceaf50b367c9c36a4db23e4961e2aa97ca5
admin-user: balducci (password is "p6o9UuSkvC")

Also, cloning any other repo to a file called "fossil" works fine:

##> fossil clone --accept-any-cert http://core.tcl.tk/expect fossil
redirect with status 302 to https://core.tcl-lang.org/expect
Round-trips: 2   Artifacts sent: 0  received: 3113
Clone done, wire bytes sent: 804  received: 2963948  ip: 104.18.184.65
Rebuilding repository meta-data...
  100.0% complete...
Extra delta compression... 
Vacuuming the database... 
project-id: 2bfec29695662cfbe09350bd2e7d294a5169aa0e
server-id:  a972427746f145a5971911cbdae5f73387c0da92
admin-user: balducci (password is "zv3PZkpmCq")

This happens with 2.18 only: 2.17 (or any other previous version) has no problem whatsoever:

##> fossil version
This is fossil version 2.17 [f48180f2ff] 2021-10-09 14:43:10 UTC
##> fossil clone --accept-any-cert http://www.fossil-scm.org/fossil fossil
redirect with status 302 to https://fossil-scm.org/home
Round-trips: 9   Artifacts sent: 0  received: 55403
Clone done, wire bytes sent: 2659  received: 39932563  ip: 45.33.6.223
Rebuilding repository meta-data...
  100.1% complete...
Extra delta compression... 
Vacuuming the database... 
project-id: CE59BB9F186226D80E49D1FA2DB29F935CCA0333
server-id:  d5cdb671fba4f836cd97ac6e2e49c3999a15c864
admin-user: balducci (password is "Ca6vmRJQts")

Can you reproduce this? If so, is it intentional, or is there a problem?

Building from source on linux with default configure options.

##> uname -srvmo 
Linux 5.16.10 #1 SMP Thu Feb 17 10:02:50 CET 2022 x86_64 GNU/Linux
##> gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/stow.d/versions/gcc-11.2.0/usr/lib64/gcc/x86_64-pc-linux-gnu/11.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/balducci/tmp/install-us-d/gcc-11.2.0.d/gcc-11.2.0/configure --prefix=/opt/stow.d/versions/gcc-11.2.0/usr --libdir=/opt/stow.d/versions/gcc-11.2.0/usr/lib64 --libexecdir=/opt/stow.d/versions/gcc-11.2.0/usr/lib64 --enable-shared --disable-bootstrap --enable-languages=c,c++,objc,fortran --enable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.0 (GCC) 

Thanks for your valuable work

ciao -gabriele

(2) By Warren Young (wyoung) on 2022-02-27 16:25:13 in reply to 1 [link] [source]

What does ls -l fossil yield?

(3) By Stephan Beal (stephan) on 2022-02-27 16:36:34 in reply to 1 [link] [source]

fossil clone --accept-any-cert http://www.fossil-scm.org/fossil fossil

not a valid repository: fossil

That error comes from only one place in the code where the target file is not a valid repository. My guess is that you have a file or directory named fossil in the directory you are trying to clone from (maybe even the fossil binary itself).

i am able to clone from that URL just fine, but note that /fossil is no longer the preferred path: /home is the standard since sometime last year.

FWIW, giving the clone the same name as the fossil binary sounds like a recipe for confusion. i strongly recommend giving the cloned file a file extension.

(4) By sean (jungleboogie) on 2022-02-27 19:28:25 in reply to 1 [link] [source]

Try

Fossil clone https://fossil-scm.org —workdir fossil

This will make a fossil-scm.fossil file and open the repo in the fossil directory.

There might be an issue if a) the directory exists b) the directory isn’t empty.

(5) By gabriele balducci (balducci) on 2022-02-28 08:12:46 in reply to 1 [link] [source]

thank you very much to all that took the time to reply

I'm embarrassed to report that now everything works fine

Yet I'm pretty sure that, for some unexplained reason, the problem I reported was there; I'm absolutely sure that there wasn't any file/directory named

fossil
in the directory where I was making the tests.

I can add that the same error was also reported (thus quite independently) by a cron script which I run nightly and clones http://www.fossil-scm.org/fossil for monitoring development: this script is running untouched since years

But, yes: the ultimate reason must be some weird thing on my side

So: big apologies for the noise and thank you very much again

ciao

-gabriele

(6) By sean (jungleboogie) on 2022-02-28 14:50:56 in reply to 5 [link] [source]

I can add that the same error was also reported (thus quite independently) by a cron script which I run nightly and clones http://www.fossil-scm.org/fossil for monitoring development: this script is running untouched since years

Am i understanding it correctly that you have a script that clones the repo daily? Did you know that's not necessary? You can use fossil update trunk while inside the already cloned and opened fossil repo.

I have a simple shell function that enters the Fossil clone, updates, and begins making and installing fossil. Re-cloning the repo isn't necessary in my case.

(7) By Stephan Beal (stephan) on 2022-02-28 15:06:42 in reply to 6 [link] [source]

Did you know that's not necessary? You can use fossil update trunk while inside the already cloned and opened fossil repo.

Similarly, without a checkout:

fossil pull -R /local/repo/file.fossil

will pull in all remote changes. That's how i back up dozens of remote repos via a daily cron job running on a Raspberry pi.

Doing a full clone every day is a tremendous waste of both one's own computing resources and the remote host's.

(8) By gabriele balducci (balducci) on 2022-02-28 15:44:47 in reply to 7 [link] [source]

I understand your point

But fossil is one of (presently) 1441 packages that I monitor and maintain from source

Keeping locally all of those packages isn't something I have ever considered

For many of the packages I just monitor a web page, "grepping" for new releases; but my (~15 years) experience is that the most reliable method is to monitor the {git,fossil,subversion,mercurial...} repositories, as the layout of the web pages often changes and the risk is to lose contact without noticing

As to the band and resource waste: yes, it actually is

But if I compare my internet use with the zillions of people browsing the web for hours/day, well...

My monitoring script starts at 02:00am and usually ends by ~04:15am

thanks

ciao

-gabriele

(9) By sean (jungleboogie) on 2022-02-28 15:55:16 in reply to 8 [link] [source]

I don't understand.

Stephan and myself gave you two options that do exactly what you want without the need to download the fossil repo from scratch each day.

Are you saying neither one of these options are sufficient to/for you? If not, why not?

(10) By Stephan Beal (stephan) on 2022-02-28 16:34:06 in reply to 9 [link] [source]

Are you saying neither one of these options are sufficient to/for you? If not, why not?

My understanding is that Gabriele is, for 1400+ packages, downloading them, building them, then discarding them. The alternative would be keeping copies of all of those packages locally, which would have significant storage cost.

FWIW, Gabriele, storing them locally would not only save the download time, but would give you a way to avoid rebuilding the projects if they haven't changed. Your current process, as i understand it, cannot figure out if a project has changed and will simply rebuild it every time you run your process. Most projects don't update daily, so being able to pull from a remote and perform a build on a current checkout will runs many, many times faster than performing a full clone and rebuild each time. A rough calculation suggests that if every one of those 1440 repositories averages to 50MB each (most are very likely smaller), you'd need roughly 75GB of storage for all of them.

That's not me saying "your process is wrong, don't do that," just suggesting an alternative which may save you time long-term (and will certainly save 1440 other computers time). That said: your current process obviously works for you and working code always trumps hypothetical code.

(11) By Dan Shearer (danshearer) on 2022-02-28 17:10:41 in reply to 10 [link] [source]

Stephan Beal (stephan) on 2022-02-28 16:34:06:

storing them locally would not only save the download time, but would give you a way to avoid rebuilding the projects if they haven't changed.

This does become more complicated when there are many upstreams, multiple different SCMs, many different human-compatible versioning/naming systems, etc.

One of the functions of Not-forking is to address exactly this. Not-forking keeps a clever cache locally and deals with a lot of the upstream churn and noise. It knows the difference between a name/numbered release and building from a particular commit ID.

Not-forking was initially intended to solve the problem of managing the problem of combining trees, but it turned out that the problem discussed in this thread had to be solved as well. And to use Stephan's words, its very much working code, running thousands of times a day on a fast cluster as well as at human scale on laptops.

Dan Shearer

(12) By gabriele balducci (balducci) on 2022-02-28 17:32:55 in reply to 10 [link] [source]

I was replying to jungleboogie, but you were faster ...

Yes: I want to avoid keeping clone repositories on my local machine

That's why I clone the packages "from scratch" every night...

And no: I don't rebuild 1400+ packages daily!

For many of them, as I said, I just monitor a web page and know when a new version has been released.

For others, it's better to clone locally and analyze the log to know about new tags etc

There are even projects which are actively developed, but new versions are released with a very low frequency: in those cases I prefer to build/install after a given number of commits, to stay in touch.

About the suggestion to maintain local clones: the maintainance scheme I use for my system has grown as it presently is over many years and changing it would be a lot of work, which I haven't the time to do...

BTW: I discovered why my initially reported problem didn't show any longer: in the mess of doing many things at the same time I was using fossil-2.17: reverting to 2.18 brings back the problem (OUCH!)

As I said, I'm pretty sure that there must be something wrong on my side, since apparently I'm the only one stumbling on this

But, in any case, here is a step by step shell script which shows the problem, just in case anybody wants to try...

## cd /tmp
## mkdir test_dir
## cd test_dir
## pwd
/tmp/test_dir
## ls
./  ../
## fossil version
This is fossil version 2.18 [84f25d7eb1] 2022-02-23 13:22:20 UTC
## fossil clone --accept-any-cert https://fossil-scm.org/home fossil
not a valid repository: fossil
## ls
./  ../  fossil
## fossil timeline -n 3 -R ./fossil
not a valid repository: ./fossil
## 

Changing the name of the local file to anything different from "fossil" makes the error go away:

## cd /tmp
## mkdir test_dir1
## cd test_dir1
## pwd
/tmp/test_dir1
## ls
./  ../
## fossil version
This is fossil version 2.18 [84f25d7eb1] 2022-02-23 13:22:20 UTC
## fossil clone --accept-any-cert https://fossil-scm.org/home foo
Round-trips: 10   Artifacts sent: 0  received: 55405
Clone done, wire bytes sent: 2645  received: 39933233  ip: 45.33.6.223
Rebuilding repository meta-data...
  100.1% complete...
Extra delta compression... 
Vacuuming the database... 
project-id: CE59BB9F186226D80E49D1FA2DB29F935CCA0333
server-id:  06c1726c904bd9158163ee783d0eef6848307f1c
admin-user: balducci (password is "7wmJMdSAoz")
## ls
./  ../  foo
## fossil timeline -n 3 -R foo
=== 2022-02-28 ===
14:05:04 [369d7d1a20] /md_rules: replaced 'complex' with 'more', per /chat
         discussion. (user: stephan tags: trunk)
=== 2022-02-26 ===
14:37:38 [8af827342f] Removed ENABLE_JSON1 flag from tools/sqlcompattest.c
         because that flag is no longer in sqlite3 as of 3.38, which is the
         current minimum required version. Reported in [forum:549da79dd9 |
         forum post 549da79dd9]. (user: stephan tags: trunk)
01:36:21 [d862cb71d6] globs.md: clarified that globs apply to the whole
         dir/filename combination without any awareness/special treatment of
         the directory part, as suggested in [forum:6637b92a6a17a6bc | forum
         post 6637b92a6a17a6bc]. (user: stephan tags: trunk)
--- entry limit (3) reached ---
## 

thank you very much again

ciao

-gabriele

(13) By sean (jungleboogie) on 2022-02-28 17:58:53 in reply to 12 [link] [source]

Thanks for the reply.

Of the Fossil repos that you clone, how many of them do you not have an extension on?

I tested your above steps with the Fossil repo, SQLite repo and althttpd repo.

The only one I had issues (like you) was with the Fossil repo.

Typically people have a .fossil suffix for their repos. Interesting that you don't and has never been a problem.

If you build Fossil from source upon changes, when did this first occur?

(14) By Warren Young (wyoung) on 2022-02-28 18:28:00 in reply to 13 [link] [source]

I'm seeing this with tip-of-trunk. Something is seriously jacked up here.

Even pared down to basics, the OP's key command fails here on macOS 12:

  $ cd ~/tmp ; fossil clone https://fossil-scm.org/home fossil
  not a valid repository: fossil

That's with no fossil file in ~/tmp. After the command fails, there is a broken ~/tmp/fossil repo DB that opens in sqlite3 and attaches in fossil sql --no-repository, but you can't "open" it.

You can do without the cd above thus:

  $ fossil clone --chdir ~/tmp https://fossil-scm.org/home fossil

…and it will still fail.

BUT! Do the same with a just-built binary in Fossil checkout directory, and it succeeds!

  $ cd ~/src/fossil/trunk ; ./fossil clone --chdir ~/tmp https://fossil-scm.org/home fossil

(You need the --chdir here because ./fossil is that just-built binary.)

More, it also succeeds if you do the inverse:

  $ cd ~/tmp ; ~/src/fossil/trunk/fossil clone https://fossil-scm.org/home fossil

Thus, the value of argv[0] seems to have something to do with this.

But why?

Very strange.

I bring up much of this because the requirement to run the command from the PATH to see the symptom makes bisecting difficult. You have to install it each time to test for the failure.

(16) By Stephan Beal (stephan) on 2022-02-28 18:45:12 in reply to 14 [link] [source]

Thus, the value of argv[0] seems to have something to do with this.

Indeed. i can reproduce it on a BSD system (not sure which flavor):

[stephan@itac:~/tmp]$ ls -la fossil
ls: fossil: No such file or directory
[stephan@itac:~/tmp]$ fossil clone https://fossil-scm.org/home fossil
not a valid repository: fossil
[stephan@itac:~/tmp]$ ls -la fossil
-rw-r--r--  1 stephan  stephan  225305 Feb 28 18:40 fossil
[stephan@itac:~/tmp]$ rm fossil
[stephan@itac:~/tmp]$ ~/bin/fossil clone https://fossil-scm.org/home fossil
Round-trips: 4   Artifacts sent: 0  received: 35580
...

And i can now reproduce it on my linux systems thanks to Warren's hint about argv[0]: my ~/bin/fossil is not the fossil binary, but is a script which searches for a fossil binary in one of several places then exec's it with its full path. With that script, everything works as expected. If i replace that script with a plain fossil binary and repeat the above steps, it fails on Linux the same way.

(15) By Stephan Beal (stephan) on 2022-02-28 18:36:56 in reply to 12 [link] [source]

As I said, I'm pretty sure that there must be something wrong on my side, since apparently I'm the only one stumbling on this...

Actually, Sean and Martin G have been able to reproduce it on their BSD systems, so it may be a BSD-specific issue. Presumably you're on some flavor of BSD? i'm currently trying to reproduce it on a remote BSD machine as well but it takes ages to build fossil on that machine. i've been unable to reproduce it on Linux on x86 and ARM platforms.

(17.1) By Stephan Beal (stephan) on 2022-02-28 19:54:07 edited from 17.0 in reply to 1 [link] [source]

2.18 throws the following error when trying to clone http://www.fossil-scm.org/fossil to a file whose name is "fossil":

The mystery is solved:

This bit of code is an ancient esoteric fossil feature which allows fossil to use its own binary as a repository db, so that a repo and binary can be combined into a single file for distribution purposesmisref.

That check...

  1. Misinteracts with a new bit of code which asserted that the db file's size was a multiple of 512 bytes. That size check is valid in all cases exception this one.
  2. Is not strictly correct because the string comparison it is using is only correct when fossil is called with a directory component prefix (e.g. ~/bin/fossil or ./fossil). When the binary's name is resolved via the PATH, but the pre-resolved name is equal to the db name, that block triggers but it's semantically incorrect in that case.

How best to resolve that (no pun intended) discrepancy is not yet clear.

[1]: (edit) i may be confusing two things: the appendvfs support added ages ago with this more recent change (2018) just uncovered by Martin G.


  1. ^ Misreference

(18) By Dan Shearer (danshearer) on 2022-02-28 21:25:14 in reply to 17.1 [link] [source]

Stephan Beal (stephan) on 2022-02-28 19:54:07:

That was an impressive bit of collaborative bug squashing. Well done all.

[...] an ancient esoteric fossil feature which allows fossil to use its own binary as a repository db [...]

This feature (which I had never heard of either) has new relevance since Fossil 2.18 includes its own https server. Together they mean Fossil can now be a full-featured SCM with only rudimentary libc support. This is good for tiny embedded systems and other restricted environments.

Dan

(19) By Larry Brasfield (larrybr) on 2022-02-28 21:31:14 in reply to 18 [link] [source]

Just beware that the repo DB is limited to about 1GB when it is accessed using appendvfs.

(20) By Stephan Beal (stephan) on 2022-02-28 21:35:50 in reply to 18 [link] [source]

That was an impressive bit of collaborative bug squashing.

It was thoroughly collaborative - we had half a dozen folks on this one, i think :).

We have a preliminary fix at fossil:/timeline?r=nameofexe-appendvfs-check, pending testing by a Windows user.

For completeness's sake, here's how you can use the appendvfs feature (i only learned this in the past hour):

[stephan@nuc:~/fossil/fossil]$ cp -p fossil x; cp ../cwal.fsl r; sqlite3 -append x "attach r as r" "vacuum r into 'x'"
[stephan@nuc:~/fossil/fossil]$ ./x

That will launch the UI with the embedded repo.

The fix in the branch linked to above ensures that if that binary name is pass as its own db name, e.g.:

$ ./x ui x

then it correctly sees that both refer to the same file (even if their user-entered paths are different) and Does The Right Thing.

(21) By gabriele balducci (balducci) on 2022-03-01 09:20:37 in reply to 20 [link] [source]

We have a preliminary fix at fossil:/timeline?r=nameofexe-appendvfs-check, pending testing by a Windows user.

this is just to confirm that the above check-in does indeed fix things for me

thank you very much

ciao

-gabriele

(22) By Florian Balmer (florian.balmer) on 2022-03-03 06:20:01 in reply to 20 [link] [source]

Is there any specific reason for this change?

On Windows, the best way to retrieve the executable file name is to call GetModuleFileNameW(). The Windows NT image loader stores the fully qualified (absolute) path name of the executable as a (counted) UNICODE_STRING in the PEB (Process Environment Block), so GetModuleFileNameW() boils down to a memcpy() call.

This seems less complicated than looking for the executable file in the current directory, or in the PATH environment variable, and assuming it has a .exe extension (it could be renamed to fossil.com, for example). Also, the file_fullexename() approach relies on the fact that it be called before processing both the global --chdir option and the CGI setenv: control line, so is more fragile regarding to future edits.

The only problem with the old code is that in theory, the executable file name could be longer than MAX_PATH -- but I guess both the Windows NT loader and Fossil itself will have other problems in this case.

The following solution restores the GetModuleFileNameW() call, and tries to handle paths beyond MAX_PATH:

Index: src/main.c
==================================================================
--- src/main.c
+++ src/main.c
@@ -420,20 +420,39 @@
   unsigned int nArg;        /* Number of new arguments */
   char *z;                  /* General use string pointer */
   char **newArgv;           /* New expanded g.argv under construction */
   const char *zFileName;    /* input file name */
   FILE *inFile;             /* input FILE */
+#ifdef _WIN32
+  WCHAR buf[MAX_PATH];
+  DWORD dwRet;
+#endif

   g.argc = argc;
   g.argv = argv;
   sqlite3_initialize();
 #if defined(_WIN32) && defined(BROKEN_MINGW_CMDLINE)
   for(i=0; i<g.argc; i++) g.argv[i] = fossil_mbcs_to_utf8(g.argv[i]);
 #else
   for(i=0; i<g.argc; i++) g.argv[i] = fossil_path_to_utf8(g.argv[i]);
 #endif
-  g.nameOfExe = file_fullexename(g.argv[0]);
+#ifdef _WIN32
+  dwRet = GetModuleFileNameW(NULL, buf, ARRAYSIZE(buf));
+  assert( dwRet!=0 );
+  if( dwRet<ARRAYSIZE(buf) ){   /* Success: static buffer was large enough. */
+    g.nameOfExe = fossil_path_to_utf8(buf);
+  }else{
+    SIZE_T nbuf = USHRT_MAX/2 * sizeof(WCHAR) + 1; /* UNICODE_STRING limit. */
+    LPWSTR mbuf = fossil_malloc(nbuf);
+    dwRet = GetModuleFileNameW(NULL, mbuf, nbuf);
+    assert( dwRet!=0 && dwRet<nbuf );
+    g.nameOfExe = fossil_path_to_utf8(mbuf);
+    fossil_free(mbuf);
+  }
+#else
+  g.nameOfExe = g.argv[0];
+#endif
   for(i=1; i<g.argc-1; i++){
     z = g.argv[i];
     if( z[0]!='-' ) continue;
     z++;
     if( z[0]=='-' ) z++;

Note that unlike most Win32 APIs, for GetModuleFileNameW() there's no way to query the required buffer size in advance, i.e. by calling the function with a NULL buffer and a zero buffer size argument, so the maximum size is allocated for the unlikely case that MAX_PATH was too short.

The limit is due to the UNICODE_STRING structure: the unsignet 16-bit MaximumLength field allows up to USHRT_MAX (65535) bytes, corresponding to USHRT_MAX/2 (32767) WCHARs, without a terminating NULL. The practical limit for extended length path names is slightly lower, however.

Is there any problem I'm overlooking here, or is it fine to commit?

(23) By Stephan Beal (stephan) on 2022-03-03 07:03:58 in reply to 22 [link] [source]

Is there any specific reason for this change?

Because if we don't then we can't be guaranteed that the canonicalization of the db filename and the binary filename work identically.

It also simplifies that code by removing a Windows-specific dependency.

it could be renamed to fossil.com, for example

No, it can't. COM files are limited to 64kb: http://fileformats.archiveteam.org/wiki/DOS_executable_(.com)

  • g.nameOfExe = file_fullexename(g.argv[0]);

We have to use file_fullexename(), or equivalent, to resolve the bug discovered in this thread. We cannot simply assign from argv[0] - that was the cause of the bug. We already have/had that function (it's not new) and it's known to work, so there would seem to be no reason not to use it.

On Windows, the best way to retrieve the executable file name is to call GetModuleFileNameW().

The current change works as expected, according to those who have tested it on Windows. Why replace it with a huge block of Windows-specific code:

+#ifdef _WIN32
+  dwRet = GetModuleFileNameW(NULL, buf, ARRAYSIZE(buf));
+  assert( dwRet!=0 );
+  if( dwRet<ARRAYSIZE(buf) ){   /* Success: static buffer was large enough. */
+    g.nameOfExe = fossil_path_to_utf8(buf);
+  }else{
+    SIZE_T nbuf = USHRT_MAX/2 * sizeof(WCHAR) + 1; /* UNICODE_STRING limit. */
+    LPWSTR mbuf = fossil_malloc(nbuf);
+    dwRet = GetModuleFileNameW(NULL, mbuf, nbuf);
+    assert( dwRet!=0 && dwRet<nbuf );
+    g.nameOfExe = fossil_path_to_utf8(mbuf);
+    fossil_free(mbuf);
+  }
+#else

:-?

(24) By Florian Balmer (florian.balmer) on 2022-03-03 07:21:18 in reply to 23 [link] [source]

The current change works as expected, according to those who have tested it on Windows. Why replace it with a huge block of Windows-specific code ...

I don't think that block is "huge" ... and, on Windows, any PE executable file of any size can be renamed to .com -- I tested that before posting, of course.

Because if we don't then we can't be guaranteed that the canonicalization of the db filename and the binary filename work identically.

I'm not sure if I get what you mean: they always will and have to work identically, because there's only one canonical form -- the canonical form -- for file names on Windows. The only relevant consideration is that file names may be case-insensitive.

(25) By Stephan Beal (stephan) on 2022-03-03 07:34:02 in reply to 24 [link] [source]

I don't think that block is "huge"

It's 13x larger than the current code and doesn't solve a known problem. The current code is not demonstrably broken.

If you can find a technical fault/bug with the current solution, then by all means let's revert it. If it's just a matter of principle, however, then let's leave it like it is. Adding 14 lines of platform-specific code which doesn't solve a known problem - just an imaginary/hypothetical one (e.g. MAX_PATH limits) - seems unnecessary.

I'm not sure if I get what you mean: they always will and have to work identically, because there's only one canonical form -- the canonical form -- for file names on Windows.

Indeed, but the how they get canonicalized needs to be 100% identical. Even if it's buggy, it needs to be 100% consistent. We can't guaranty that if we use one vendor's API for one canonicalization and our own code for another canonicalization.

During the /chat discussion of this bug, Warren brought up the topic of "one version of the truth." We had two different algorithms for checking whether a db is really a repository db and they didn't agree with each other 100%. We have the same issue here: since this change, we have a "single source of truth" for the executable's canonicalized form instead of having two. That's a feature, not a bug.

(26) By Florian Balmer (florian.balmer) on 2022-03-03 07:58:01 in reply to 25 [link] [source]

I agree, bug-for-bug compatibility is important, here. Thanks for clarification.

Also, my "it doesn't work with fossil.com" argument is not relevant, of course, but I suspect there may be more problems with similar patterns. (The CreateProcess() API allows loading executable files with any -- or without any -- extension. The PATHEXT variable with its list of .EXE, .COM, .BAT, .CMD, ... is just a hint/convention to the shell.)

Anyway, I have disabled the code to check whether a repository DB is appended to the executable file in my patched version of Fossil right from the day it was added, for the very fear of hidden and late-manifesting bugs that may be introduced with such "seemingly small and simple" features.

And, I don't really care about the MAX_PATH limit, this was just a fly-by fix while touching the code. Because 14 lines of C code is not the same as 14 lines of JS code with respect to executable file size, if that's what you're worried about ;-)

Thanks! Take care all of you!

(27) By Florian Balmer (florian.balmer) on 2022-03-03 08:15:42 in reply to 26 [link] [source]

Anyway, I have disabled the code to check whether a repository DB is appended to the executable file in my patched version of Fossil right from the day it was added, for the very fear of hidden and late-manifesting bugs that may be introduced with such "seemingly small and simple" features.

That sentence should be:

Anyway, I have disabled the code to check whether a repository DB is appended to the executable file in my patched version of Fossil right from the day it was added, for the very fear of hidden and late-manifesting bugs that may be introduced with such "seemingly small and simple" features that nobody would ever use.

The sudden appearing of "random" bugs somehow proves the bold part? Removing that feature sounds like an alternative to me -- but of course not for others!

That's it. Bye for now, and really take care all of you!

(28) By Martin Gagnon (mgagnon) on 2022-03-03 19:38:08 in reply to 22 [link] [source]

Why not moving the original windows code (Or your improved version) inside the #ifdef _WIN32 block of the file_fullexename() function (slightly modified to return pointer allocated with fossil_malloc() to match Unix implementation) ?

We would also get only one code path in windows to get the full executable path. Because this part of code that you don’t like inside file_fullexename() is already called in other places anyway.

What do you think ?

(29) By Florian Balmer (florian.balmer) on 2022-03-04 06:08:42 in reply to 28 [link] [source]

Yes, I have also been thinking of various alternatives, in the meantime.

My coding attempt was to "restore the deleted code in-place", without any effort to "modularize" it. Sorry for my laziness.

Other alternatives may be to call GetFullPathNameW(), the Windows built-in API function to canonicalize file names, just once on the repository DB file name right before comparing it to the path returned by GetModuleFileNameW(). Modifying file_canonical_name() to always use GetFullPathNameW() seemed too risky to me, so far, even for my own patches, as Fossil may rely on its own path canonicalization format here and there.

Stephan is right that nothing is broken, at the moment, but I suspect one day there may be trouble we don't see, yet.

On Unix, it seems like a convention that env -i fossil only works with absolute paths. On Windows, however, the CreateProcessAsUserW() and CreateProcessWithLogonW() APIs, for example exposed through the runas utility, perform PATH searches in the calling environment, and not in the environment of the new process -- so (strange) bugs may or may not happen, one day.

But I don't want to insist here, and push this "as a matter of principle", as Stephan said. However, the principle could also be to write "correct" code for each target platform.

BTW, the old mailing list thread about the feature which revealed the current problem is interesting, and there was quite some reservations. My impression why the feature really made it to trunk is to have a proof-of-concept, and to use Fossil as a testbed for SQLite.

But why I really think nobody ever uses the feature to append repository DB files to executable images:

  • It's forgotten/unknown (obviously, nobody seemed to be aware of it, these days?).
  • It's not portable (i.e. how to access the repository appended to the platform-specific executable image on another platform?).
  • It's not secure (i.e. how to update a buggy Fossil version?).
  • It's not as "self-contained" as it may seem (i.e. a "single-file" embedded HTTPS server would at least require additional certificate files?).

So why not just disable/remove this feature, which seems to somewhat contradict the Fossil philosophy, and simply restore the old GetModuleFileNameW() code?

(34) By Florian Balmer (florian.balmer) on 2022-03-06 11:42:19 in reply to 29 [link] [source]

... CreateProcessAsUserW() and CreateProcessWithLogonW() ...

It's much easier to prove the failure:

Copy Fossil to "C:\test1\fossil.exe".

Save the following program as "launch.c", build with cl launch.c, and move the output to "C:\test1\test2\launch.exe".

#include <windows.h>
main()
{
  PROCESS_INFORMATION pi;
  STARTUPINFOW si = {0};
  si.cb = sizeof(STARTUPINFOW);
  if(CreateProcessW(
        NULL,
        L"..\\fossil.exe test-echo",
        NULL,
        NULL,
        FALSE,
        CREATE_DEFAULT_ERROR_MODE,
        NULL,
        L"C:\\Windows\\System32", /* ← Change this line to see the effect. */
        &si,
        &pi))
    WaitForSingleObject(pi.hProcess,INFINITE);
}

Wrong output with current trunk [ab7ad2348c], where Fossil thinks its located in "C:\Windows".

C:\test1\test2>launch.exe
g.nameOfExe = [..\fossil.exe]
argv[0] = [..\fossil.exe]
argv[1] = [test-echo]

Correct output if the startup directory "C:\Windows\System32" in the test program is replaced with NULL, or if the test is performed with a version of Fossil before [ab7ad2348c]:

C:\test1\test2>launch.exe
g.nameOfExe = [C:\test1\fossil.exe]
argv[0] = [..\fossil.exe]
argv[1] = [test-echo]

BTW: The file_fullexename() function is used in the test-which command, and probably intended as a general-purpose function, so it's probably not safe to patch it to use GetModuleFileNameW().

I just wanted to share my findings.

Thanks!

(30.1) By Florian Balmer (florian.balmer) on 2022-03-04 06:25:23 edited from 30.0 in reply to 28 [link] [source]

Why not moving the original windows code (Or your improved version) inside the

#ifdef _WIN32 block of the file_fullexename() function (slightly modified to return pointer allocated with fossil_malloc() to match Unix implementation)?

Are we sure file_fullexename() is not used for other purposes than finding the own executable path?

Also, I'm utterly puzzled by this patch! How can changes inside #ifdef _WIN32 trigger a bug on Macs? Also note that the patch is incorrect, i.e. it returns a (non-static/function-local) wchar_t* variable, while the function declaration says it returns a char*, and the comment says from fossil_malloc()?

Edit: fixed hyperlink.

(31) By Stephan Beal (stephan) on 2022-03-04 06:54:59 in reply to 30.1 [link] [source]

Also, I'm utterly puzzled by [this patch]! How can changes inside #ifdef _WIN32 trigger a bug on Macs?

That was an in-development/incomplete patch. He only posted it in /chat because i asked him to so we could see if i could reproduce the weird diff rendering bug he and jshoyer are seeing.

(32) By Martin Gagnon (mgagnon) on 2022-03-04 11:45:46 in reply to 31 [source]

Thanks Stephan, however I have mention exactly this relative to the memory that needs to be allocated by fossil_malloc()on the next sentence inside the (….) just after the part of my reply you quote.

This patch doesn’t cause the bug, making the diff using mac binary version 2.18 from fossil website download page trigger same bug. I just tough this diff was special, but realize any diff slightly bigger than a few lines of changes will do the same. And this only when generating HTML diff output.

(33) By Florian Balmer (florian.balmer) on 2022-03-04 12:17:22 in reply to 32 [link] [source]

Thanks Stephan, however I have mention exactly this relative to the memory that needs to be allocated by fossil_malloc()on the next sentence inside the (….) just after the part of my reply you quote.

Yeah, I got that ... I should have replied on the other thread, sorry for the confusion.

This patch doesn’t cause the bug, making the diff using mac binary version 2.18 from fossil website download page trigger same bug. I just tough this diff was special, but realize any diff slightly bigger than a few lines of changes will do the same. And this only when generating HTML diff output.

But I missed that one! This is a patch to apply so that Fossil has a diff to show that triggers the bug -- it's not a patch to Fossil that triggers the bug after rebuilding Fossil. Oh, my!