ext CGI feature segfaults
(1) By anonymous on 2020-03-18 18:42:12
> fossil ui --extroot `pwd`/cgi --nojail
where the extroot contents are:
~/Code/fossil-scm/cgi:
drwxr-xr-x 4 russki staff 128B Mar 18 18:18 ./
drwxr-xr-x 35 russki staff 1.1K Mar 18 18:19 ../
-rwxr-xr-x 1 russki staff 125B Mar 18 18:18 test*
-rw-r--r-- 1 russki staff 434B Mar 11 11:40 index.html
localhost:8080/ext/index.html renders static content fine, but localhost:8080/ext/test segfaults:
panic: Segfault
(0) 0 fossil 0x0000000102986d85 sigsegv_handler + 40
(1) 1 libsystem_platform.dylib 0x00007fff6b091f5a _sigtramp + 26
(2) 2 libdyld.dylib 0x00007fff6ad83149 _Z21dyldGlobalLockReleasev + 0
(3) 3 libsystem_c.dylib 0x00007fff6ae320f7 putenv + 121
(4) 4 fossil 0x000000010295c102 ext_page + 802
(5) 5 fossil 0x00000001029882dd process_one_web_page + 2636
(6) 6 fossil 0x0000000102988f47 cmd_webserver + 1416
(7) 7 fossil 0x0000000102985bb0 fossil_main + 1869
(8) 8 fossil 0x0000000102985463 fossil_main + 0
(9) 9 libdyld.dylib 0x00007fff6ad83015 start + 1
(10) 10 ??? 0x0000000000000005 0x0 + 5
Here's what's in the test script:
#!/usr/local/opt/tcl-tk/bin/tclsh
puts {Status: 200 Ok}
puts {Content-Type: text/html}
puts ""
puts {<span>Hey there</span>}
> fossil version
This is fossil version 2.10 [9d9ef82234] 2019-10-04 21:41:13 UTC
> sw_vers
ProductName: Mac OS X
ProductVersion: 10.13.6
(2) By anonymous on 2020-03-18 20:43:37 in reply to 1
#!/usr/local/opt/tcl-tk/bin/tclsh
Does your test script run properly by itself on the command line?
I just tested this case on Ubuntu (from a new repo), and fossil (2.10) properly renders the ext/test result.
Here's a test script (./test-ext.sh):
#!/bin/bash
trap on_ctrl_c INT
on_ctrl_c() {
cleanup
}
setup() {
mkdir test-ext && cd test-ext
fossil init ../test-ext.fossil
fossil open ../test-ext.fossil
mkdir cgi
cat <<EOF>>cgi/test
#!/usr/bin/tclsh
puts {Status: 200 Ok}
puts {Content-Type: text/html}
puts ""
puts {<span>Hey there</span>}
EOF
}
cleanup(){
rm cgi/test
rmdir cgi
fossil close
rm ../test-ext.fossil
cd ..
rmdir test-ext
}
##-------------
if [ "$1" == "clean" ]; then
cd test-ext || exit 0
cleanup
echo "$0:DONE $1"
exit 0
fi
setup
set -v
#TEST
chmod +x cgi/test
./cgi/test
## CTRL-C to exit
fossil ui --extroot `pwd`/cgi --nojail ##Open http://localhost:8080/ext/test
set +v
echo "$0:DONE"
(3) By anonymous on 2020-03-19 09:39:02 in reply to 2
Does your test script run properly by itself on command-line?
It runs perfectly fine when called as a script on the command line. I also suspected it was a case of being run in a jail, hence the --nojail flag.
(4) By vlad on 2020-03-19 12:51:27 in reply to 2
Original author of the thread here.
My attempt at running fossil under a debugger proved futile, sadly. Mind you, I've never programmed in C, so it could be just me.
Steps I took:
- run FOSSIL_BREAK=1 fossil ui --extroot /Users/russki/Code/fossil-scm/cgi --nojail
- attach to the fossil process with lldb: fine
- br set -n ext_page, since ext_page is the function that appeared in the backtrace of that segfault
- continue: it looks like it starts listening and loads the timeline as expected
Two problems I encountered here:
- run under the debugger like this, I can't even get static content;
- the above breakpoint never gets hit.
I also compiled "out of source". While the UI opens up, it looks like ext doesn't work at all: I can't even get static content with otherwise the exact same command-line args. But replace the custom build with whatever brew installed system-wide, and at least static content is served via ext/.
Another observation is that the segfault sometimes takes a minute to manifest itself: request ext/foo.html and you get it; a minute later I see a segfault in the terminal.
I expect I'll be working with Fossil going forward, so naturally I'd like to figure this out, but also to understand how to debug things like this. Adding a printf at the start of fossil_main at least prints something. Doing the same at the beginning of ext_page doesn't print. Maybe my assumption that it ought to be hit is wrong, but then why does it appear in the backtrace?
(5) By Richard Hipp (drh) on 2020-03-19 14:44:46 in reply to 4
The "fossil ui" command forks for each incoming HTTP request, which messes up debuggers.
The way to do this is to put the raw text of the HTTP request that causes the segfault into a file. Name the file anything you want, but here we will call it "x1.txt". The x1.txt file probably should look something like this:
GET /ext/test HTTP/1.0
Note that there must be a blank line (an extra \n) at the end. I just don't know how to show that blank line using Markdown. Verify that this causes the segfault by running:
fossil test-http --extroot /Users/russki/Code/fossil-scm/cgi <x1.txt
You might need to adjust the content of the x1.txt file to get it to fail. But once you do get it failing, then run the command in a debugger. If you can give us a detailed stack trace, that will be useful.
(7) By Warren Young (wyoung) on 2020-03-19 15:25:58 in reply to 5
I had to use a variation on that to get a useful result:
$ lldb -f ./fossil -- test-http --extroot ~/tmp
Given the OP's test script as ~/tmp/x.tcl and saying "run" at the LLDB prompt, I had to paste the following in, since there doesn't seem to be a way to tell LLDB to attach a file to the subprocess's stdin:
GET /ext/x.tcl HTTP/1.0
That done, I get the following from the debugger:
Process 83626 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x00007fff79b123e6 libsystem_c.dylib`__setenv_locked + 306
libsystem_c.dylib`__setenv_locked:
-> 0x7fff79b123e6 <+306>: cmpq $0x0, (%rsi)
0x7fff79b123ea <+310>: movq %r15, %r13
0x7fff79b123ed <+313>: je 0x7fff79b12404 ; <+336>
0x7fff79b123ef <+315>: xorl %r15d, %r15d
Target 0: (fossil) stopped.
This seems to confirm my guess that the environment is somehow "locked" in this state, but my search-fu turns up nothing useful about what's going on.
Maybe try setting the environment up before the fork() call?
(10) By vlad on 2020-03-19 15:42:11 in reply to 5
Here's the best I could do in the debugger:
> lldb fossil
(lldb) target create "fossil"
Current executable set to 'fossil' (x86_64).
(lldb) process launch -i x1.txt --stop-at-entry -- test-http ../fossil-scm.fossil --extroot /Users/russki/Code/fossil-scm/cgi
Process 90140 stopped
* thread #1, stop reason = signal SIGSTOP
frame #0: 0x000000010036a19c dyld`_dyld_start
dyld`_dyld_start:
-> 0x10036a19c <+0>: popq %rdi
0x10036a19d <+1>: pushq $0x0
0x10036a19f <+3>: movq %rsp, %rbp
0x10036a1a2 <+6>: andq $-0x10, %rsp
Target 0: (fossil) stopped.
Process 90140 launched: '/usr/local/bin/fossil' (x86_64)
(lldb) c
Process 90140 resuming
Process 90140 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x00007fff6ae35104 libsystem_c.dylib`__setenv_locked + 313
libsystem_c.dylib`__setenv_locked:
-> 0x7fff6ae35104 <+313>: cmpq $0x0, (%rsi)
0x7fff6ae35108 <+317>: je 0x7fff6ae3511e ; <+339>
0x7fff6ae3510a <+319>: xorl %ebx, %ebx
0x7fff6ae3510c <+321>: movq -0x38(%rbp), %rdi
Target 0: (fossil) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
* frame #0: 0x00007fff6ae35104 libsystem_c.dylib`__setenv_locked + 313
frame #1: 0x00007fff6ae320f7 libsystem_c.dylib`putenv + 121
frame #2: 0x000000010003c102 fossil`ext_page + 802
frame #3: 0x00000001000682dd fossil`process_one_web_page + 2636
frame #4: 0x0000000100065bb0 fossil`fossil_main + 1869
frame #5: 0x0000000100065463 fossil`main + 9
frame #6: 0x00007fff6ad83015 libdyld.dylib`start + 1
frame #7: 0x00007fff6ad83015 libdyld.dylib`start + 1
I guess EXC_BAD_ACCESS doesn't sound too good.
If I run it in the terminal, all I get is:
> fossil test-http ../fossil-scm.fossil --extroot /Users/russki/Code/fossil-scm/cgi --nojail <x1.txt
Illegal instruction: 4
If I change x1.txt to GET /ext/static.html ... etc it works fine, so static pages work.
I'll try all of this under OpenBSD next, but others have reported ext working, e.g. under Linux, so I don't expect the same error there.
(6) By Warren Young (wyoung) on 2020-03-19 15:15:16 in reply to 1
This reproduces on macOS 10.14.6.
I can say that the thread of execution enters the putenv() call underlying fossil_setenv() in src/file.c, and that the segfault occurs in that call.
What's most curious is that printing the parameters to the function doesn't segfault, so it's not a simple case of dereferencing a bad string, else my fprintf(stderr, ...) debug calls would also fail.
Is the environment somehow "locked" in this state?
I don't see how to proceed short of tracing into a debug build of libc at this point.
Incidentally, changing the implementation of this Fossil wrapper function to use setenv(3) doesn't help, which is why my latest commit is on a branch.
(8) By graham on 2020-03-19 15:31:51 in reply to 6
Thinking aloud... does macOS impose any (unusual) limits on the lengths of names/values/total-length of the environment? Is it the very first call to putenv() that fails, or do some get through? (On Windows, I see about 20 values being set, with HTTP_COOKIE being the longest at about 400 characters.)
(9) By Warren Young (wyoung) on 2020-03-19 15:38:00 in reply to 8
does macOS impose any (unusual) limits on the lengths of names/values/total-length of the environment?
I doubt it.
This macOS libc code appears to come straight from FreeBSD, which someone could confirm by attempting to reproduce the symptom there.
Is it the very first call to putenv() that fails
Yes, the one for DOCUMENT_ROOT, due to the use of the --extroot flag.
(11) By Warren Young (wyoung) on 2020-03-19 16:05:59 in reply to 9
Well, this is fun: the symptom does not reproduce on FreeBSD 11.3-p6, updated just now. It gives the expected result, the script's output.
I may attempt to upgrade to 12.1 later.
(12) By graham on 2020-03-19 16:28:55 in reply to 9
To me, that suggests memory corruption: either what mprintf() returns is invalid (did you try fprintf-ing zString?), or earlier corruption is triggering a segfault in putenv() (e.g. if it needs to resize the environment).
For a rather "hacky" exploration attempt, you could try replacing the call to mprintf() with something like:
static char zString[2000];
snprintf(zString, sizeof(zString), "%s=%s", zName, zValue);
and seeing if (at least the first call to) putenv() returns without segfaulting. If it does, you "just" have to find where memory got corrupted :-)
(13) By Warren Young (wyoung) on 2020-03-19 16:34:42 in reply to 12
did you try fprintf-ing zString?
Yes, per this post up-thread.
you could try replacing the call to mprintf()
I effectively did that already with my setenv-alternative branch, which doesn't use the mprintf'd string.
I probably also should have mentioned that I previously rebuilt Fossil with:
$ ./configure --with-sanitizer=address,enum,null,undefined
I got no complaints from any of the sanitizers.
And since my last post, I've updated my FreeBSD test VM to 12.1-p2, and it's still not failing the same as on macOS 10.14. That makes me wonder if Apple made some changes to this mechanism, since it appears to work properly on FreeBSD.
(14) By anonymous on 2020-03-23 01:28:44 in reply to 1
Please check the update in [6e7211a26]. This should address the SEGV you caught. I used the test-ext.sh script above to test the fix.
In brief, OSX seems to be peculiar about a NULL environ. Previously, we were setting environ to NULL to "empty" the environment, but that made environ itself a NULL pointer rather than an array whose first element is NULL, so the *environ dereference inside putenv() caused the SEGV.
So, environ[0] is what we meant to NULL ...sorry, OSX :) Meanwhile, Linux is fine with this either way.
By the way, it looks like I also found a clang optimizer bug (or maybe just a "usage issue") while diagnosing this issue (not directly related).
(15) By vlad on 2020-03-23 09:32:36 in reply to 14
That fixed it on OSX. Thank you very much for the fix!