Fossil Forum

Forcing PATH changes in environment causes unexpected SSH failures

(1) By Andy Bradford (andybradford) on 2024-02-05 03:14:19 [link] [source]

Hello,

I would  like to discuss the  possibility of improving (or  backing out)
the change introduced in:

[805c931479569576] https://www.fossil-scm.org/home/info/805c931479569576

As it stands, this interferes  with SSH servers that impose restrictions
on the client command that is sent.  For example, I have a Fossil server
that does something similar to:

------------------------------------------------------------------------
#!/bin/sh
REPO="${1:-}"
set -- $SSH_ORIGINAL_COMMAND
while [ $# -gt 1 ]; do shift; done
if [ x"$REPO" != "x" ]
then
  WHICH="$REPO"
else
  WHICH="$1"
fi
case $SSH_ORIGINAL_COMMAND in
  fossil*)
    exec /path/to/fossil http "$WHICH"
  ;;
esac
sleep 10
exec /usr/bin/false
------------------------------------------------------------------------


When I  try to  pull or  clone, the operation  fails because  the server
doesn't respond (failing to recognize the command):

$ fossil clone ssh://anonfsl@fossil.bradfords.org//fossil fossil.fossil
server did not reply
Clone done, wire bytes sent: 293  received: 0  remote: fossil.bradfords.org
server returned an error - clone aborted

This  is  because  the  case  statement no  longer  matches  on  "fossil
test-http ..."  as it used  to because the  Fossil client now  sends the
equivalent of:

PATH=$HOME/bin:$PATH fossil test-http ...

While  I appreciate  the  problem that  this is  trying  to solve,  e.g.
the  PATH  for  non-interactive  shells   is  not  set  correctly,  this
seems  more  like  a  local  configuration issue  than  a  problem  with
Fossil  proper. Either  the  server administrator  should configure  the
environment correctly, or the user  should be using Fossil's support for
?fossil=bin/fossil in  the query  parameter of the  clone URL  which has
been in Fossil since the dawn of time.

Furthermore, it  just seems  odd for  Fossil to be  trying to  alter the
remote environment  to accomplish  this---I certainly don't  want Fossil
trying to alter the remote environment  of my SSH shells. Using ?fossil=
does not alter  the environment as it only controls  what remote command
the client sends, which is really what the problem is here.

Having  Fossil send  the PATH  now  means that  everyone inspecting  the
SSH_ORIGINAL_COMMAND will  have to account  for that. Arguably I  may be
the only one doing this, but it  seems less clean to be sending the PATH
in the remote command than the alternatives.
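
For anyone who does inspect SSH_ORIGINAL_COMMAND, here is a minimal sketch of one way a forced-command script could tolerate the prefix. The helper name strip_env_prefix is hypothetical and is not part of Fossil or OpenSSH. It assumes the command arrives as a single space-separated string, as in the PATH= example above, and it only discards leading NAME=value words; it never evaluates them.

```shell
# Hypothetical helper (not part of Fossil or OpenSSH): print the command
# string with any leading NAME=value words removed. The assignments are
# discarded, never evaluated.
strip_env_prefix() (
  cmd=$1
  while :; do
    word=${cmd%% *}                  # first space-delimited word
    case "$word" in
      *=*) ;;                        # might be an assignment, keep checking
      *) break ;;                    # ordinary word: the real command starts here
    esac
    name=${word%%=*}
    case "$name" in
      ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) break ;;   # not a valid variable name
    esac
    if [ "$word" = "$cmd" ]; then    # nothing follows the assignment
      cmd=
      break
    fi
    cmd=${cmd#"$word"}               # drop the assignment itself
    cmd=${cmd#"${cmd%%[! ]*}"}       # drop the spaces that followed it
  done
  printf '%s\n' "$cmd"
)

# Possible use in a wrapper like the one above (illustrative only):
#   case $(strip_env_prefix "$SSH_ORIGINAL_COMMAND") in
#     fossil*) exec /path/to/fossil http "$WHICH" ;;
#   esac
```

Stripping rather than honoring the prefix keeps the server in charge of its own environment, which is the point of restricting the command in the first place.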

There  are  many  ways  to  go  about  addressing  this  with  SSH.  SSH
has   the   AcceptEnv[1]   and  SetEnv[3][5]/SendEnv[4]   options,   the
PermitUserEnvironment[2] option,  and of  course Fossil has  support for
?fossil= as I already suggested.

The documentation already suggests using it here:

https://www.fossil-scm.org/home/help?cmd=clone

SSH protocol:

    ssh://[userid@]host[:port]/path/to/repo.fossil[?fossil=path/fossil.exe]


Any chance we can reconsider  this new PATH implementation? Perhaps make
it a  configurable thing that  the user is  able to enable  when desired
rather than sending it always? What if the user doesn't have a $HOME/bin
but has $HOME/altbin, or some other name?

Thoughts?

Andy

[1] http://man.openbsd.org/sshd_config#AcceptEnv
[2] http://man.openbsd.org/sshd_config#PermitUserEnvironment
[3] http://man.openbsd.org/sshd_config#SetEnv
[4] http://man.openbsd.org/ssh_config#SendEnv
[5] http://man.openbsd.org/ssh_config#SetEnv

(2) By Warren Young (wyoung) on 2024-02-05 03:22:07 in reply to 1 [link] [source]

+1. This change is a sketchy solution to a real problem.

My solution was to install fossil where sshd expects to find it.

(3) By Andy Bradford (andybradford) on 2024-02-05 05:04:43 in reply to 2 [link] [source]

Installing fossil where expected is certainly ideal, but if the user has
no control over that, then I think  giving the user some kind of knob is
alright. I  just don't  think a statically  configured PATH  override is
suitable.

That's what the ?fossil= query parameter provides. However, that may
"feel" somewhat clunky in some cases, so it may be preferable to provide
some kind of "server configuration" option that allows one to specify
the path to the remote binary (e.g. ssh-remote-command in the DB could
be any of /home/user/bin/fossil, bin/fossil, /opt/fossil/bin/fossil, or
any other path that is appropriate). It may not be obvious to some that
one can change the execution path on the remote host using ?fossil= even
though it's been this way for a long time.

This  line stuffs  in the  remote fossil  command, the  default is  just
"fossil" but it can be overridden by ?fossil= in the query parameters:

https://www.fossil-scm.org/home/file?udc=1&ln=141&ci=trunk&name=src%2Fhttp_transport.c

It could just as easily, and alternatively, be derived from a DB setting.
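
To illustrate the idea (this is not the code in http_transport.c), here is a rough shell sketch of how the remote command could be derived from an ssh: URL. The function name remote_fossil_cmd is made up for this example, and it assumes ?fossil= is the only query parameter. The "test-http" form of the remote command is taken from the --sshtrace output shown elsewhere in this thread.

```shell
# Hypothetical sketch, not Fossil's actual implementation: print the remote
# command implied by an ssh:// URL, honoring an optional ?fossil= parameter.
# Assumes ?fossil= is the only query parameter in the URL.
remote_fossil_cmd() (
  url=$1
  marker='?fossil='
  bin=fossil                         # default remote command name
  case "$url" in
    *"$marker"*) bin=${url##*"$marker"} ;;   # override from the URL
  esac
  rest=${url#ssh://}                 # drop the scheme
  rest=${rest%%\?*}                  # drop the query string, if any
  path=${rest#*/}                    # drop [userid@]host[:port]/
  printf '%s test-http %s\n' "$bin" "$path"
)
```

For example, remote_fossil_cmd 'ssh://remotehost/fossils/nmh.fossil?fossil=altbin/fossil' prints "altbin/fossil test-http fossils/nmh.fossil". A DB setting would simply supply a different default for bin.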

Andy

(4.1) By Richard Hipp (drh) on 2024-02-05 10:16:35 edited from 4.0 in reply to 3 [link] [source]

None of y'all use the ssh: remote method to Mac's, I'll bet....

The purpose of the PATH= prefix on the ssh fossil command is so that the "ssh:" method "just works" on Macs. Without it, you always have to add the "?fossil=bin/fossil" argument, which is insanely annoying. And adding a requirement that you must adjust your configuration in order to reasonably "ssh:" to a Mac seems even more annoying, and contrary to the "it just works" principle.

The default PATH on Macs for SSH is: /usr/bin:/bin:/usr/sbin:/sbin. All of those directories are locked down, so you cannot install "fossil" there.

(5.2) By Warren Young (wyoung) on 2024-02-05 10:21:36 edited from 5.1 in reply to 4.0 [link] [source]

Yes, I do use ssh: URLs as remotes for a macOS machine, frequently.

The question then is, why does the output of this command:

  $ ssh macos 'echo $PATH'

…include /usr/local/bin here, where Fossil's make install puts it by default, when it evidently does not on your Mac?

I used to ./configure --prefix=$HOME with Fossil, but because of this very issue, I began dropping the prefix override.

I don't remember what I did to get my macOS server's sshd to behave, but this method (https://dev.to/spaceghost/fixing-macos-ssh-path-missing-usr-local-bin-ap) should work. The only adjustment I would suggest is putting your local edits into /etc/ssh/sshd_config.d/001-local.conf instead of the main sshd_config file since every OS upgrade overwrites that one.

Another good thing to put in your 001-local.conf file is:

  PasswordAuthentication no
  ChallengeResponseAuthentication no
  Ciphers -chacha20-poly1305@openssh.com

The first two lines force use of public key authentication, and the last one is a workaround for the Terrapin Attack, needed while we wait for Apple to get around to shipping OpenSSH 9.6 or later.

(6) By Richard Hipp (drh) on 2024-02-05 10:21:25 in reply to 5.0 [link] [source]

I'm on travel. I'll try to work on this when I get back to the office.

(7) By Warren Young (wyoung) on 2024-02-05 10:24:19 in reply to 6 [link] [source]

Okay. :)

Check your other servers for Terrapin, too. It's not a super-high risk, but several Linuxes remain vulnerable, too. RHEL and all its clones for one, for "stability".

(10) By Andy Bradford (andybradford) on 2024-02-05 19:25:09 in reply to 6 [link] [source]

> I'm on travel.

If you're on vacation hopefully you  ignore all this until you get back.
:-) By  no means is  this a fire  that needs to  be put out  urgently if
you're on vacation!

I just  wanted to start  a discussion since  I don't think  putting PATH
into every single invocation of Fossil is the best approach.

Thanks,

Andy

(16) By Richard Hipp (drh) on 2024-02-05 21:35:11 in reply to 10 [link] [source]

I was at a family event out-of-state. I'm back in the office now.

(12.1) By Andy Bradford (andybradford) on 2024-02-05 19:50:44 edited from 12.0 in reply to 5.2 [link] [source]

> include /usr/local/bin  here, where Fossil's  make install puts  it by
> default, when it evidently does not on your Mac?

Does your PATH have /usr/local/bin when you run:

ssh machost env

If so, I'm curious how you made that work.

[edit] I  see you already answered  the question... there's a  link that
you included which describes how.

Andy

(14) By Martin Gagnon (mgagnon) on 2024-02-05 20:05:29 in reply to 12.1 [link] [source]

For what it's worth...

I use a Mac with a version prior to 805c931479569576 and it works with ssh:// protocol.

I added this a while ago at the top of my ~/.bashrc (for another reason):

# set PATH so it includes /usr/local/bin for 3rd party software installed manually.
if [ -d "/usr/local/bin" ] ; then
        PATH="/usr/local/bin:$PATH"
fi

# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
fi

I also have this on my ~/.bash_profile.
(I'm not sure if I added it myself or if it was there by default)

[[ -r $HOME/.bashrc ]] && . $HOME/.bashrc

I never had an issue with fossil, which has always been installed in /usr/local/bin.

(13) By Andy Bradford (andybradford) on 2024-02-05 19:57:41 in reply to 5.2 [link] [source]

>  but this method[1] should work. 

Is   this  the   method  that   *you*  used?   It  recommends   altering
PermitUserEnvironment (which is one of  the options I actually suggested
in my first post on this thread).

Andy

[1] https://dev.to/spaceghost/fixing-macos-ssh-path-missing-usr-local-bin-ap

(19) By Warren Young (wyoung) on 2024-02-05 22:29:41 in reply to 13 [link] [source]

> Is this the method that you used?

No, which is why I used the word “should,” not “will,” and also why I said I didn’t remember how I actually did it. My results are much like Preben’s, below.

The question then becomes, why doesn’t drh's Mac do that?

(23) By Andy Bradford (andybradford) on 2024-02-06 05:08:22 in reply to 19 [link] [source]

> The question then becomes, why doesn’t drh's Mac do that?

I suspect it has something to do with the difference in shells?

Some  here have  reported using  bash  or zsh.  Perhaps they  do read  a
profile for  non-interactive shells whereas  the shell he is  using does
not?

I know that on OpenBSD, the  default shell ksh(1) does not read .profile
for  non-interactive shells.  I'm  able  to make  it  work by  adjusting
login.conf(5) though.

Andy

(9) By Andy Bradford (andybradford) on 2024-02-05 19:18:52 in reply to 4.1 [link] [source]

> None of y'all use the ssh: remote method to Mac's, I'll bet....

You're absolutely right that I don't use Macs very often; however, I
completely understand the challenge and problem. I've actually run into
it myself, and in these cases I usually just fall back on using
?fossil=. As an example, here's one of my projects that uses a different
bin directory to serve up fossil. On one of my remote hosts, I have the
Fossil binary in a directory called "altbin", so I end up with things
like:

ssh://remotehost/fossils/nmh.fossil?fossil=altbin/fossil

I also recognized that the ?fossil query parameter may be somewhat
clunky, which is why I suggested that perhaps Fossil could have some
other mechanism that allows the user to override the Fossil defaults.
One mechanism could just be a setting stored in the DB that is the value
of the remote PATH. Another mechanism, which doesn't actually do
anything special, would be to just store what the remote path to the
Fossil binary should be. Maybe these are all just as clunky as using the
?fossil query parameter?

Is it not possible to modify the  SSH configuration on Mac OS to provide
a sensible PATH to SSH?

Personally, I'm fine with the ?fossil query parameter; it has served
its purpose, but that doesn't mean that there doesn't exist a better
mechanism.

As for  the "it just works"  principle... well, in this  case it doesn't
always "just work". It works if you  happen to have the Fossil binary in
$HOME/bin, but that's  not where I have my Fossil  binaries. But it also
interferes  with other  uses of  the  SSH protocol,  thus limiting,  not
enhancing the experience,  so it "works" at the expense  of other things
that used to "just work".

Thanks,

Andy

(15) By Richard Hipp (drh) on 2024-02-05 20:49:16 in reply to 9 [link] [source]

My current approach is to omit the PATH= argument initially. But if that fails, it will retry using:

PATH=bin:/usr/local/bin:$PATH

With that change, you can drop a fossil executable into ~/bin or /usr/local/bin of a Mac and it just works, without the need to adjust the Mac's configuration or add a ?fossil=... query parameter. (Should I add /opt/homebrew/bin to the PATH= list?) The PATH= is only added if there is no ?fossil=... query parameter.

I want to also add a cache so that on future SSH sync attempts with the same user and hostname, it adds PATH= on the initial attempt if that worked the previous time then falls back to removing PATH= if the first attempt fails.

(17.1) By Preben Guldberg (preben) on 2024-02-05 22:00:38 edited from 17.0 in reply to 4.1 [link] [source]

> None of y'all use the ssh: remote method to Mac's, I'll bet....

FWIW, I do, and PATH has not been an issue for me for a number of years (using zsh).

As a test, I added two users to my system and gave them PATH="$HOME/bin:/usr/local/bin:$PATH".

  1. zshuser, using the default zsh shell. I then amended PATH in $HOME/.zshenv.
  2. bashuser, using a bash shell, where I amended PATH in $HOME/.bashrc

This gave me:

% for u in zshuser bashuser; do ssh $u@localhost 'echo "$USER: $PATH"'; done
zshuser: /Users/zshuser/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
bashuser: /Users/bashuser/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

This is tested on macOS Ventura 13.6.3.

[FWIW, OpenBSD 7.4 does not include $HOME/bin by default, nor does my Debian 12.2 test VM. My Fedora 39 does, though.]

(25) By Richard Hipp (drh) on 2024-02-06 11:38:49 in reply to 2 [link] [source]

Another place where this problem used to come up for me is attempting to ssh-sync to a machine where I did not have admin access and could not install Fossil in a system directory. I used to have that problem on a shared-hosting account I had with Hurricane Electric. I could log in as an unprivileged user and get read/write access to my home directory, and read access to a few system directories, but could not see anything else. Binaries had to go in my home directory (for which I created ~/bin). Fossil was not installed in any system directory. I could not mess with the sshd configuration and the ssh PATH did not include any directories in which I had write access. Passing in something like:

PATH=bin:$PATH

was the only way to avoid the "?fossil=bin/fossil" query parameter.

(8) By Richard Hipp (drh) on 2024-02-05 18:23:11 in reply to 1 [link] [source]

What is the error message that your system gives when you try to "fossil sync" using the ssh: method? I'm looking for the output you get when you include the options: "-v -v -sshtrace". Yes, the -v option is repeated. For example, the output I need to see might be similar to this (but there will certainly be some differences):

RUN ssh -e none -T -- imac 'PATH=$HOME/bin:$PATH' nobin/fossil test-http Fossils/fossil.fossil
URL: ssh://imac/Fossils/fossil.fossil?fossil=nobin/fossil
Sending 193 byte header and 266 byte payload
bash: nobin/fossil: No such file or directory
Got line: []
server did not reply
Sync done, wire bytes sent: 459  received: 0  remote: imac
Uncompressed payload sent: 0  received: 0

(11) By Andy Bradford (andybradford) on 2024-02-05 19:35:31 in reply to 8 [link] [source]

> What  is the  error message  that your  system gives  when you  try to
> "fossil sync" using the ssh: method?

$ fossil sync -v -v --sshtrace
Sync with ssh://anonfsl@fossil//fossil
                Bytes      Cards  Artifacts     Deltas
RUN ssh -e none -T -- 'anonfsl@fossil' 'PATH=$HOME/bin:$PATH' fossil test-http /fossil
URL: ssh://anonfsl@fossil//fossil
Sending 182 byte header and 6698 byte payload
Got line: []
server did not reply
Sync done, wire bytes sent: 6880  received: 0  remote: fossil
Uncompressed payload sent: 0  received: 0
/***** Subprocess 16595 exit(0) *****/


When I use Fossil version b86d4da5a2 I get:

$ build/fossil-2.24 sync -v -v --sshtrace
Sync with ssh://anonfsl@fossil//fossil
                Bytes      Cards  Artifacts     Deltas
RUN ssh -e none -T -- 'anonfsl@fossil' fossil test-http /fossil
URL: ssh://anonfsl@fossil//fossil
Sending 182 byte header and 6695 byte payload
Got line: [Status: 200 OK]
Read: [Status: 200 OK]
Got line: [Cache-control: no-cache]
Read: [Cache-control: no-cache]
Got line: [X-Frame-Options: SAMEORIGIN]
Read: [X-Frame-Options: SAMEORIGIN]
Got line: [Content-Type: application/x-fossil]
Read: [Content-Type: application/x-fossil]
Got line: [Content-Length: 6686]
Read: [Content-Length: 6686]
Got line: []
Reading 6686 bytes with 0 on hand...  Got 6686 bytes
Sent:           12072        171          0          0
Received:       12024        172          0          0
Sync done, wire bytes sent: 6877  received: 6816  remote: fossil
Uncompressed payload sent: 12072  received: 12024
/***** Subprocess 34555 exit(0) *****/

Andy

(18) By Richard Hipp (drh) on 2024-02-05 22:09:04 in reply to 1 [link] [source]

The change on a branch seems to work (mostly - after checking in there are issues with "fossil sync -all").

Algorithm summary:

  1. If HOSTNAME is the hostname component of an ssh: URL and if the setting named "use-path-for-ssh:HOSTNAME" is false or does not exist, then the initial sync attempt does not use the PATH= prefix. If that initial attempt fails in a way that suggests that the fossil executable could not be found, then the sync attempt is retried after adding PATH=.

  2. If a setting named "use-path-for-ssh:HOSTNAME" exists and has a true value, however, then step one is inverted. PATH= is added on the first attempt and removed for the retry if the first attempt failed.

  3. The "use-path-for-ssh:HOSTNAME" is adjusted if a retry was necessary and the retry actually worked.
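
The three steps above can be pictured with a small shell sketch. This is an illustration only, not the C code on the branch. The function names, the "with-path"/"without-path" labels, and the set of values treated as true are all assumptions made for the example. One simplification: it retries on any failure, whereas step 1 retries only when the failure suggests the fossil executable was not found.

```shell
# Hypothetical sketch of the retry order, not Fossil's implementation.
# attempt_order CACHED: print the two variants in the order to try them.
# The accepted "true" spellings are an assumption for this example.
attempt_order() {
  case "$1" in
    1|on|true|yes) echo "with-path without-path" ;;   # step 2: cached true
    *)             echo "without-path with-path" ;;   # step 1: false or unset
  esac
}

# run_with_retry CACHED RUNNER: call RUNNER with each variant in turn.
# RUNNER stands in for "run ssh this way" and returns 0 on success.
# Prints the value the cache should hold afterwards (step 3).
run_with_retry() {
  rr_cached=$1
  rr_runner=$2
  for rr_variant in $(attempt_order "$rr_cached"); do
    if "$rr_runner" "$rr_variant"; then
      case "$rr_variant" in
        with-path) echo 1 ;;         # PATH= was needed this time
        *)         echo 0 ;;         # plain command worked
      esac
      return 0
    fi
  done
  echo "$rr_cached"                  # both attempts failed: leave cache alone
  return 1
}
```

With a runner that only succeeds for "with-path", run_with_retry 0 prints 1, so the next sync against that host would try PATH= first.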

I think this new approach will work well for everyone. Please try it out and report problems.

Note again: I noticed a problem with "fossil sync -all" that has some ssh: and some https: remotes after I committed the initial change. I'll fix that later.

(20) By Richard Hipp (drh) on 2024-02-05 23:30:01 in reply to 18 [link] [source]

The "fossil sync -all" bug is preexisting. Turns out that if you define multiple remotes using "fossil remote add ...." and some of them are HTTPS and others are SSH, the HTTPS entries that come after the first SSH entry fail to work. I don't know why yet.

So the branch is good to experiment with.

(21) By Andy Bradford (andybradford) on 2024-02-06 03:27:27 in reply to 20 [link] [source]

> The "fossil sync -all" bug is preexisting.

It looks like this  bug was a latent bug that only  got exposed when the
"fossil sync  -all" functionality was  introduced. I've fixed it  on the
branch, though I suppose I should have  placed it on trunk since the bug
isn't really introduced in this branch:

https://www.fossil-scm.org/home/info/bcf6abe5b6e0cc7b

Andy

(26) By Richard Hipp (drh) on 2024-02-06 11:39:35 in reply to 21 [link] [source]

Thanks for the fix. All is now merged to trunk.

(27) By Richard Hipp (drh) on 2024-02-06 14:11:58 in reply to 26 [link] [source]

Except: The fossil patch command (the "push" and "pull" sub-commands) also invokes ssh to connect to fossil executables on remote systems. It also adds the PATH= argument. I suppose I need to go in and fix that as well...

(33) By Richard Hipp (drh) on 2024-02-06 23:52:10 in reply to 27 [link] [source]

The "fossil ui HOSTNAME:directory/of/checkout" command also uses ssh and also injects a PATH= argument. It too needs to be fixed.

(34) By Andy Bradford (andybradford) on 2024-02-07 02:36:29 in reply to 33 [link] [source]

> The "fossil ui HOSTNAME:directory/of/checkout" command also uses ssh

Thanks, I wasn't even aware of this functionality.

(35.1) By Richard Hipp (drh) on 2024-02-07 15:12:59 edited from 35.0 in reply to 33 [link] [source]

Issue Resolved?

All instances of the use of PATH= have now been converted so that the PATH= argument is initially omitted and is only added if the initial attempt fails to find a fossil executable. The following commands have been updated:

  • fossil sync
  • fossil push
  • fossil pull
  • fossil clone
  • Autosync
  • fossil patch push REMOTE:...
  • fossil patch pull REMOTE:...
  • fossil ui REMOTE:...

Fossil remembers if the PATH= argument is required for each hostname that it uses. This is accomplished using the "use-path-for-ssh:REMOTE" setting in the "global_config" table. The new "fossil test-ssh-needs-path" command can be used to view or modify this cache.

As of check-in ed6495baa68341d2 (2024-02-07) I think the issue described by this thread has now been resolved. Post follow-ups if you discover otherwise.

(36) By Andy Bradford (andybradford) on 2024-02-07 14:15:39 in reply to 35.0 [link] [source]

> The following commands have been updated:

I believe  that this  list also  includes "fossil  clone" since  that is
the  most likely  place  that  people will  encounter  this.  I find  it
highly unlikely  that once someone  has successfully cloned from  an SSH
repository that the path on the remote host will change.

Thanks,

Andy

(37) By Richard Hipp (drh) on 2024-02-07 15:14:35 in reply to 36 [link] [source]

I confirm that "fossil clone" also works. Forum post 9f21572a809b8cce has been updated to include "fossil clone" in the list.

(38) By Andy Bradford (andybradford) on 2024-02-07 15:26:41 in reply to 37 [link] [source]

Thanks. I also  realized that while clone may be  the first interaction,
it's also  very possible  that after one  has successfully  cloned, that
this could be encountered after  adding, removing, or changing a "fossil
remote" (all intentional changes though).

Andy

(39) By Andy Bradford (andybradford) on 2024-02-07 15:36:55 in reply to 35.1 [link] [source]

> Post follow-ups if you discover otherwise.

Should Fossil not return the original error that the SSH server returns?

For example, when I login and mistype my password, I see:

$ ssh localhost
amb@localhost's password: 
Permission denied, please try again.
amb@localhost's password: 

But when I do the same with Fossil, I only see:

$ fossil clone ssh://localhost//tmp/new.fossil clone.fossil           
amb@localhost's password: 
amb@localhost's password: 

Only after I enable --sshtrace do I get to see the reason why it failed:

$ fossil clone --sshtrace ssh://localhost//tmp/new.fossil clone.fossil
RUN ssh -e none -T -- localhost fossil test-http /tmp/new.fossil
amb@localhost's password: 
Got line: [Permission denied, please try again.]
amb@localhost's password: 

Andy

(40) By Richard Hipp (drh) on 2024-02-07 15:51:26 in reply to 39 [link] [source]

I changed that back. I don't know what this will break. Please test.

(41) By Andy Bradford (andybradford) on 2024-02-08 06:00:39 in reply to 40 [link] [source]

> I changed that back. I don't know what this will break. Please test.

I tested a few scenarios (e.g. clone, sync, pull), including the new [to
me] "fossil ui HOSTNAME:" functionality and it seemed to work fine.

Andy

(42.1) By Warren Young (wyoung) on 2024-04-11 02:29:32 edited from 42.0 in reply to 27 [link] [source]

EDIT: After posting the below, I stopped being able to SSH in at all, even after closing everything else down, although it had been working all day, and other network connections continue working. If this report doesn't reproduce anywhere, please consider it bogus. The only reason I thought it might be relevant here is that I began seeing Fossil try opening the SSH tunnel twice, once without the path change, once with, even though I've had it installed to /usr/local/bin/fossil, avoiding the need for the PATH adjustment. Rebooting the client fixed it all; let's see how long it stays fixed.


Either that isn't done yet, or there's an odd interaction under this new scheme.

Simple reproduction: with the latest version, open two Terminal windows, one ssh'd into a remote macOS machine, then from the other tab running a shell on the local machine, say:

$ ssh -e none -T hostname 'fossil ver'
kex_exchange_identification: read: Connection reset by peer
Connection reset by 192.168.1.2 port 22

As soon as I drop the interactive SSH connection from the second Terminal window, the command succeeds.

I'm replying here because I discovered this in the context of trying to "fossil patch push" a change to that remote machine, with the intent of running "fossil ci" there through the second SSH connection after the changes had been processed on the remote successfully. This failure causes Fossil's new PATH fiddling to occur, trying twice before failing. The above repro case is simpler and sufficient, however.

This only began happening today, but I don't do this patch-push thing all that often, so there's no saying where it crept in short of doing a bisect. Do you need me to do that?

(43) By Andy Bradford (andybradford) on 2024-04-11 04:47:34 in reply to 42.1 [link] [source]

> As  soon as  I drop  the interactive  SSH connection  from the  second
> Terminal window, the command succeeds.

Are you  perhaps doing  anything with  ControlMaster and  ControlPath in
your ssh_config?  Maybe something was  misbehaving with SSH.  Given your
simple reproduction steps, I'm inclined to believe that it wasn't really
Fossil related at all. What happened when you ran something like:

ssh -e none -T hostname 'hostname'

Or any other commands while you had the first SSH into the remote?

I tried reproducing here and all I see is:

$ fossil ver
This is fossil version 2.24 [2304041e42] 2024-03-23 05:54:48 UTC

And now a remote 'fossil ver' while SSH is open in another terminal:

$ ssh -e none -T remote 'fossil ver'
This is fossil version 2.24 [d27cb05f6b] 2024-02-07 14:25:38 UTC

Seems like  it's working  fine at  least with  these versions  which are
perhaps slightly older than you have since I haven't had as much time to
work on Fossil lately.

(44) By Warren Young (wyoung) on 2024-04-11 06:46:22 in reply to 43 [link] [source]

It was really weird. The symptom misled me here, and it didn't help that I'd upgraded Fossil on both ends from mid-March to current earlier in the day, but now that the symptom is gone after the reboot, I don't know what it was I saw.

If it recurs, I'll try to chase it deeper before tossing all the relevant state like that.

I suspect you're right, though, and the Fossil aspect of the symptom was pure coincidence.

(45) By Warren Young (wyoung) on 2024-04-18 17:00:54 in reply to 44 [link] [source]

It just happened again, but pstree informed me of the surprising truth: a background process unrelated to Fossil had been slowly chewing up SSH handles between these two systems. At the time of failure, there were about forty active sessions, which isn't a huge amount, but it appears to be enough to prevent the remote system from accepting more under whatever restrictive sanity limit Apple shipped in their default sshd configuration.

Stopping the other process cleared it up and allowed Fossil to continue operating normally, no reboots required.

If this was a blocker to the coming release, I hereby release it. :)

(22) By Andy Bradford (andybradford) on 2024-02-06 03:34:11 in reply to 18 [link] [source]

> Please try it out and report problems.

Not necessarily a problem, but I  wonder why is it necessary to close(2)
here and redirect stderr to stdout?

https://www.fossil-scm.org/home/artifact?udc=1&ln=199&name=e6675fd982e390d4

Thanks,

Andy

(24) By Richard Hipp (drh) on 2024-02-06 11:32:06 in reply to 22 [link] [source]

Otherwise, an error about "unknown command: fossil" appears on the output of the "fossil sync" command when attempting a PATH-less ssh to a system that requires PATH=. That seems like it might be confusing to users.

ToDo: I need to make a corresponding change on the Windows side.

(28.1) By Andy Bradford (andybradford) on 2024-02-06 15:00:33 edited from 28.0 in reply to 24 [source]

> Otherwise, an  error about  "unknown command:  fossil" appears  on the
> output of the "fossil sync" command

I  don't  think  that's  necessarily  a bad  thing  as  Unix  users  are
accustomed to having  errors reported on stderr. Consider  that now they
will just get  prompted twice for the password  without realizing what's
going on. Previously they would get  an error, realize that fossil isn't
in an  appropriate PATH, fix it  by putting fossil where  it belongs [or
use ?fossil=] and then move on their merry way.

$ fossil clone ssh://localhost//tmp/fossil.fossil clone.fossil
amb@localhost's password: 
ksh: fossil: not found
server did not reply
Clone done, wire bytes sent: 311  received: 0  remote: localhost
server returned an error - clone aborted


Now, they get prompted twice for a password and nothing explains to them
why:

$ fossil clone ssh://localhost//tmp/fossil.fossil clone.fossil           
amb@localhost's password: 
amb@localhost's password: 
Round-trips: 10   Artifacts sent: 0  received: 60191
Clone done, wire bytes sent: 3308  received: 43372599  remote: localhost
Rebuilding repository meta-data...
  100.1% complete...
Extra delta compression... 23 deltas save 107,146 bytes
Vacuuming the database... 
project-id: CE59BB9F186226D80E49D1FA2DB29F935CCA0333
server-id:  7698daba523927c31017a456e9421126199b5c49
admin-user: amb (password is "Rtnish7QzM")


Would it not be better to let  them know what went wrong so they realize
that Fossil is performing some additional work on their behalf?

Here's what it looks like with popen left the way it is:

$ fossil clone ssh://localhost//tmp/fossil.fossil clone.fossil
amb@localhost's password: 
ksh: fossil: not found
amb@localhost's password: 

That  doesn't look  too  scary  to me,  though  arguably  this could  be
improved to say, "fossil not found in PATH, retrying with a new PATH".

Thanks,

Andy

(29) By Richard Hipp (drh) on 2024-02-06 15:05:21 in reply to 28.1 [link] [source]

A diagnostic message (less cryptic than "ksh: fossil: not found") is now printed if the ssh command is retried. It never occurred to me that anyone would still be using password authentication for SSH...
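
As a rough illustration of the idea (not the actual Fossil code), the captured output of the first attempt could be classified before deciding what to show the user. The function names are hypothetical and the matched strings are only the error texts quoted in this thread; a real check may well differ. Output that does not look like a missing executable, such as an authentication error, is passed through unchanged.

```shell
# Hypothetical sketch: does the first attempt's combined stdout+stderr
# suggest that the remote shell could not find a fossil executable?
# The patterns are just the messages quoted in this thread.
looks_like_missing_fossil() {
  case "$1" in
    *"not found"*|*"No such file or directory"*|*"unknown command"*) return 0 ;;
    *) return 1 ;;
  esac
}

# first_attempt_notice HOST OUTPUT: print a friendlier notice when the
# failure looks like a missing executable, otherwise show OUTPUT as-is.
first_attempt_notice() {
  if looks_like_missing_fossil "$2"; then
    printf 'First attempt to run fossil on %s using SSH failed.\n' "$1"
    printf 'Retrying with the PATH= argument.\n'
  else
    printf '%s\n' "$2"
  fi
}
```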

(30) By Andy Bradford (andybradford) on 2024-02-06 19:37:12 in reply to 29 [link] [source]

>  It never  occurred to me  that anyone  would still be  using password
> authentication for SSH

Hard to believe, I  know, but I know many people  who could never really
be bothered  to use SSH keys,  despite how convenient and  flexible they
are [maybe  this is  because of  the "just  works" principle  applied to
passwords].  Personally, I  prefer to  use SSH  keys where  possible and
especially with Fossil.

Andy

(31) By Andy Bradford (andybradford) on 2024-02-06 20:22:53 in reply to 29 [link] [source]

> A diagnostic message  (less cryptic than "ksh: fossil:  not found") is
> now printed if the ssh command is retried.

Yep, looks great, thanks:

$ fossil clone ssh://localhost//tmp/fossil.fossil clone.fossil
amb@localhost's password: 
First attempt to run fossil on localhost using SSH failed.
Retrying with the PATH= argument.
amb@localhost's password: 
Round-trips: 10   Artifacts sent: 0  received: 60191
Clone done, wire bytes sent: 3277  received: 43372599  remote: localhost

This way when they do run into this scenario, they will not be surprised
and  think that  they entered  their password  incorrectly or  something
else.


After successfully cloning I now get other errors:

$ fossil sync
Sync with ssh://localhost//tmp/fossil.fossil
amb@localhost's password: 
Round-trips: 1   Artifacts sent: 0  received: 0
Error: bad command: igot fb156a12bfa2aefbca413f9b7affa2b299d5eefb28ab0d16e92e9c97e3e8a00a

Round-trips: 1   Artifacts sent: 0  received: 0
Sync done, wire bytes sent: 4330  received: 4195  remote: localhost


But that's because Fossil is now finding my $HOME/bin which has an older
version of Fossil  in it (instead of my $HOME/altbin)  and so it chooses
the PATH=  option. To fix  it I  have to update  my URL to  use ?fossil=
after which it  works (though without the new notice  about using PATH I
see):


$ fossil sync                                                            
Sync with ssh://localhost//tmp/fossil.fossil?fossil=altbin/fossil
amb@localhost's password: 
amb@localhost's password: 
Round-trips: 1   Artifacts sent: 0  received: 0
Sync done, wire bytes sent: 4335  received: 256  remote: localhost

Should "fossil  sync" also issue  the new  message, or is  it sufficient
that cloning does it?

Thanks,

Andy

(32) By Andy Bradford (andybradford) on 2024-02-06 20:39:51 in reply to 31 [link] [source]

> $ fossil sync                                                            
> Sync with ssh://localhost//tmp/fossil.fossil?fossil=altbin/fossil
> amb@localhost's password: 
> amb@localhost's password: 

Oddly enough I don't seem to be able to make this happen again. Not sure
why it  happened the first  time. It now just  seems to always  send the
PATH,  so  I  see  this  after  cloning,  opening,  updating  the  query
parameter, then syncing:

$ fossil sync                                                            
Sync with ssh://localhost//tmp/fossil.fossil?fossil=altbin/fossil
amb@localhost's password: 
Round-trips: 1   Artifacts sent: 0  received: 0
Sync done, wire bytes sent: 4839  received: 3320  remote: localhost

If I see it again, I'll give better steps.

Thanks,

Andy