fossil CGI with directory: warning missing

(1) By Heiko (scheit) on 2022-06-22 08:56:11 [source]

Hi there,

I am using the fossil CGI feature with 'directory: <dir>' including subdirectories.

If you now have the following setup:

foo.fossil
foo/bar.fossil

Both show up fine in the 'repolist', but when one tries to access

   foo/bar.fossil

it (obviously) does not work, as the URL

   .../foo/bar/home

looks identical to a URL accessing a page 'bar' in the repository 'foo(.fossil)'.
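
To make the ambiguity concrete, here is a simplified Python model of how a 'directory:' CGI script might map a URL path onto a repository file. It is only a sketch: the function name `resolve_repo` and the rule "try successively longer path prefixes and stop at the first one that names a .fossil file" are assumptions made for illustration, not fossil's actual code.

```python
def resolve_repo(repos, url_path):
    """Return (repo_file, remaining_path) for the first path prefix that
    names a known '<prefix>.fossil' file, or None if nothing matches.

    `repos` is a set of repository paths relative to the served directory.
    First-match-wins is an assumption of this sketch.
    """
    parts = [p for p in url_path.split("/") if p]
    for i in range(1, len(parts) + 1):
        candidate = "/".join(parts[:i]) + ".fossil"
        if candidate in repos:
            # Everything after the matched prefix is treated as the page.
            return candidate, "/".join(parts[i:])
    return None


repos = {"foo.fossil", "foo/bar.fossil"}
print(resolve_repo(repos, "/foo/bar/home"))  # ('foo.fossil', 'bar/home')
```

Under that assumption, 'foo/bar.fossil' is never reached: the request is handed to 'foo.fossil' with 'bar/home' as the page to display.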

What to do?

1) The manual should state clearly that 'repos' should not have an identical name to a subdirectory.  (Maybe it does already, but I missed it.)

2) The 'repolist' should show 'foo/bar.fossil' as inaccessible, so

 - don't link to a URL and

 - put the text 'foo/bar.fossil_no-access_as_shadowed_by_repo_bar.fossil'

Or just leave it out of the list and add on top (in red): 
"One or more repos in directories are not accessible, since the directory is shadowed by a repo of the same name.  Please rename the repo or the directory."  Then give the list of inaccessible repos and the shadowing repo.

This is also an issue if the 'repolist' is not used.   Not sure how to handle it then.  Maybe for any URL containing '/foo/' an error message should be emitted... making all involved repos inaccessible, until the setup is fixed.

(2.1) By Stephan Beal (stephan) on 2022-06-22 09:12:26 edited from 2.0 in reply to 1 [link] [source]

> Not sure how to handle it then. Maybe for any URL containing '/foo/' an error message should be emitted... making all involved repos inaccessible, until the setup is fixed.

Whether a subdirectory should take precedence over a repository with the same base name, or whether the repository should take precedence, is debatable. Before fixing this i'd be interested in hearing other folks' opinions on whether the top-most repository or the subdirectory should take precedence. We can't, as Heiko points out, serve both because that would result in ambiguous URIs.

To the best of my recollection this corner case has never come up before.

> Or just leave it out of the list and add on top (in red):

We can't add that at the top because the page is rendered as the repositories are read. By the time we know there's a conflict, the top of the page has already been rendered. What we could do is add the "foo" entry in the list with no link and, in place of its repo description, add text warning that it's being ignored due to shadowing.

Edit:

> What to do?

"Don't do that!" For the time being, renaming your directories so that they don't match a repository is the only approach which will permit access to both repositories.

(3) By Stephan Beal (stephan) on 2022-06-22 09:35:34 in reply to 2.1 [link] [source]

> Whether a subdirectory should take precedence over a repository with the same base name, or whether the repository should take precedence, is debatable.

Fixing this such that both /foo and /foo/bar work intuitively would require patching both the directory listing part and the URI dispatcher to check for the ambiguity. If we disallow such subdirectories and emit a warning in the directory list, we don't need to touch the URI dispatching code.

My personal preference would be to simply document this inability as a limitation and, at most, replace the colliding foo/bar entries in the dir listing with a warning about this limitation.

(4) By Stephan Beal (stephan) on 2022-06-22 11:14:23 in reply to 1 [link] [source]

> Not sure how to handle it then. Maybe for any URL containing '/foo/' an error message should be emitted... making all involved repos inaccessible, until the setup is fixed.

Based on a /chat discussion, that's approximately the approach we took. This is now in the trunk. When both X.fossil and X/Y.fossil exist, the repolist entry for X/Y.fossil is listed but is ~~stricken out~~ and not linked, and includes a very brief description of why it's not linked.

(5) By Heiko (scheit) on 2022-06-22 12:02:07 in reply to 4 [link] [source]

The 'repolist' is not the only way to handle things.  It is not even the default. (Right?).

Say:

  X.fossil 

exists, then someone creates 

  X/Y.fossil

and tries to access the URL

   .../X/Y/home

He/She will only see the message:

  Not Found
  Page not found: Y

(As there is no page 'Y' in X.fossil.)

This is very difficult to solve, as the normal access to a single repo does not involve a scan of all others...

How to do?  Check for directory 'X/' every time?

Maybe include a scan like this in the 'back office'.  Once per day, check if there is (suddenly) directory 'X'.  Then send off an email to admin...

Or best maybe: 'the page not found code' should check for 'X/' and emit an error message...

Sorry for thinking while typing...

The next problem is, if this page actually exists in 'X'.  Say, someone creates 

  X/home.fossil

and then access to URL

   .../X/home

-> back office ?  ...

(6) By Stephan Beal (stephan) on 2022-06-22 12:33:19 in reply to 5 [link] [source]

> ... exists, then someone creates

That depends how the repos are set up. Most(???) people use one CGI script per repository, and this problem doesn't come up in that case. It only comes up when directory mode/repolist is used.

> Maybe include a scan like this in the 'back office'. Once per day, check if there is (suddenly) directory 'X'. Then send off an email to admin...

Backoffice wouldn't work: backoffice is associated with a specific repository, but repolist does not have an associated repository.

> How to do? Check for directory 'X/' every time?

That's exactly what i wanted to avoid, as that could impact every request to every repo and would likely affect several places in fossil's code. The current solution only applies when creating the repolist view and is confined to a single "if" block. Since creating that view already requires opening every single database found in the directory (recursively), adding one more check for existence of a file isn't going to impact the performance measurably. This check only applies when a repository in a subdirectory is visited, so it doesn't affect users who put all of their .fossil files in a single top-level directory (like most(?) people who use repolist probably(?) do?). When we iterate over X/*.fossil, we simply check if X.fossil is also in the repo list. If it is, that's an "error" because X.fossil will shadow any repositories under directory X.
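
The repolist check described above can be sketched roughly as follows in Python. This is an illustration only, not the code that went into trunk: the function name `find_shadowed` is made up, and the sketch checks every ancestor directory of a repository (not just its immediate parent), which is an assumption that goes slightly beyond what the post states.

```python
def find_shadowed(repos):
    """Map each shadowed repository to the repository file that shadows it.

    `repos` is an iterable of paths relative to the repolist directory,
    e.g. "X.fossil" or "X/Y.fossil". "X/Y.fossil" counts as shadowed
    when "X.fossil" is also in the list.
    """
    repo_set = set(repos)
    shadowed = {}
    for repo in sorted(repo_set):
        dirs = repo.split("/")[:-1]          # directory components only
        for depth in range(1, len(dirs) + 1):
            blocker = "/".join(dirs[:depth]) + ".fossil"
            if blocker in repo_set:
                shadowed[repo] = blocker     # list it, but do not link it
                break
    return shadowed


print(find_shadowed(["foo.fossil", "foo/bar.fossil", "baz/qux.fossil"]))
# {'foo/bar.fossil': 'foo.fossil'}
```

Repositories at the top level have no directory components, so the inner loop never runs for them, which matches the point that flat layouts are unaffected by the check.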

With the exception of repolist mode, fossil will never generate links to "nested" repositories, so this check only has to be performed for that one case. If someone constructs their own invalid links, that's their own bug. It would be relatively expensive (in terms of code) for fossil to try to recognize such cases and provide an explanation to the user.

> Sorry for thinking while typing...

No worries - i tend to do the same.

> Say, someone creates X/home.fossil and then access to URL .../X/home

Such a link would only work when running in repolist mode with X, or one of its parents, as the repolist directory. In all other contexts it wouldn't work because fossil cannot serve a repository file directly: it requires either a CGI wrapper script, an HTTP server process (the "server" or "ui" command), or repolist to first intercept the request and then determine that home.fossil is the repository it needs to open. The "server" command and CGI mode are normally limited to a single repository which is provided when fossil starts. The repolist mode is the only exception to that rule (unless i'm sorely mistaken, which is possible).

(7) By Martin Gagnon (mgagnon) on 2022-06-22 14:11:05 in reply to 6 [link] [source]

> That depends how the repos are set up. Most(???) people use one CGI script per repository

Even if it's not "most" people who use one CGI script per repo, I think that logically it makes sense that the most specific match should have priority (in this case, the repo file is more specific than a directory).

Maybe it would be different in a case where you have some "rules" defined in a certain order in a file; in that case a later rule could override an earlier rule.

But here, there's no "ordering" of rules, you just have different matching CGI scripts on the filesystem. So I think it makes sense to prioritize the most specific one over the general ones.

(8) By Heiko (scheit) on 2022-06-23 08:39:21 in reply to 6 [link] [source]

Here is the documentation:

   https://www.fossil-scm.org/home/doc/trunk/www/aboutcgi.wiki

Then scroll down to 

  Serving Multiple Fossil Repositories From One CGI Script

See point 3) under "Here is what happens"

I think the only solution is:

After finding a repository (in any subdir) fossil must check if there is a directory present (in the same subdir) with the same name.

If there is one: emit an error message.
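
As a rough illustration of the check proposed here, the following Python sketch looks for a directory next to a repository file with the same base name. The helper name `shadowing_dir` is hypothetical and this is not how fossil is implemented; it only shows that the test itself is a single filesystem lookup per opened repository.

```python
import os
import tempfile


def shadowing_dir(repo_file):
    """Return the sibling directory sharing repo_file's base name, or None."""
    base, _ext = os.path.splitext(repo_file)   # ".../X.fossil" -> ".../X"
    return base if os.path.isdir(base) else None


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    repo = os.path.join(root, "X.fossil")
    open(repo, "w").close()
    print(shadowing_dir(repo))               # None: no directory X/ yet
    os.mkdir(os.path.join(root, "X"))
    print(shadowing_dir(repo) is not None)   # True once X/ exists
```

A caller would emit the error message whenever the function returns a directory instead of None.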

HTML should be clear.  Not sure how to handle this, if a fossil client connects to do a sync...

(9) By Heiko (scheit) on 2022-06-23 08:41:25 in reply to 7 [link] [source]

If there is any ambiguity there should be an error message.

(10) By Stephan Beal (stephan) on 2022-06-23 09:34:42 in reply to 9 [link] [source]

> If there is any ambiguity there should be an error message.

The problem with that is that adding extra checks to spot such an ambiguity is extremely invasive, affecting several places in the code and being checked for every single request, just to check for a low-impact condition which has come up exactly one time in all of fossil's 15 years. Such checks would be extreme overkill.

In any case, as of yesterday there is no more ambiguity: the display of the repolist no longer creates ambiguous links, and that was the only place fossil would generate such links. A note about this limitation will be added to the docs later on.

As to the question of syncing: since it's impossible to clone (via repolist mode) X/Y.fossil when X.fossil exists, syncing is not an issue. In any mode other than repolist, this cannot come up and is a non-issue.