panic: Segfault during process_one_web_page

(1.1) By Alfred M. Szmidt (ams) on 2022-05-11 09:49:30 edited from 1.0

I keep getting:

  panic: Segfault during process_one_web_page

with the latest trunk [0833f7225b] on OpenBSD 7.1 when trying to visit the Repository List. I can reproduce this using:

  fossil server /var/www/htdocs/fossil --repolist

Any tips on debugging?

(2) By mark on 2022-05-11 10:56:48 in reply to 1.1

I can't reproduce on OpenBSD 7.1-current or 6.9-release with either a
debug or a release build of trunk.

Can you run a backtrace on the core file?

(3) By Richard Hipp (drh) on 2022-05-11 11:09:25 in reply to 1.1

Please try again with the latest trunk check-in.

(4.1) By Stephan Beal (stephan) on 2022-05-11 11:12:49 edited from 4.0 in reply to 1.1

> Any tips on debugging?

The first thing to try, assuming you're working from a checkout which has been used to build multiple versions, is "make clean" and then reconfigure and rebuild. Once in a blue moon dependencies don't quite work and we end up linking an old/binary-incompatible object file in there somewhere. It doesn't happen often, but when it does it often results in weirdness like inexplicable segfaults.

Edit: never mind - Richard's concurrent response seems to point at the more likely culprit.

(5) By Alfred M. Szmidt (ams) on 2022-05-11 11:11:57 in reply to 2

No core dump is produced; this is a call to fossil_panic() or similar, and AFAIU that doesn't dump core. Did you compile with -static?

(6) By Alfred M. Szmidt (ams) on 2022-05-11 11:16:36 in reply to 3

That does the trick, thank you!

(7) By mark on 2022-05-11 11:31:35 in reply to 5

Both with ./configure --static and without. I've no idea why I can't
reproduce it. But I now see Richard's fix.

I just looked at the code: Richard's installed a segv handler, which is
why there's no core file. Though it looks like it provides a backtrace
on some platforms.

(8) By Alfred M. Szmidt (ams) on 2022-05-11 11:37:15 in reply to 7

Yeah, on GNU systems it will do a nicer backtrace. OpenBSD lacks backtrace(3) in base; I think some of the other BSDs might have it -- one can get around it by using libexecinfo on OpenBSD if one wants. That might be a nice addition to the configure script: check whether the library exists and, if so, define HAVE_BACKTRACE ...
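
A rough sketch of what the C side of that could look like, assuming the configure script defines HAVE_BACKTRACE and adds the right link flag (for example -lexecinfo where the library is separate). The function name dump_backtrace is hypothetical, and this is not taken from the Fossil sources.

```c
#include <stdio.h>
#ifdef HAVE_BACKTRACE
# include <execinfo.h>
#endif

/* Print up to 50 stack frames to stderr. Returns the number of frames
** printed, or 0 when built without HAVE_BACKTRACE. */
int dump_backtrace(void){
#ifdef HAVE_BACKTRACE
  void *aFrame[50];
  /* The cast covers implementations where backtrace() returns size_t. */
  int n = (int)backtrace(aFrame, 50);
  backtrace_symbols_fd(aFrame, n, 2);  /* fd 2 is stderr */
  return n;
#else
  fputs("(no backtrace support in this build)\n", stderr);
  return 0;
#endif
}
```

Built without -DHAVE_BACKTRACE, the function compiles everywhere and just prints the fallback line, so callers need no #ifdef of their own.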

(9) By Richard Hipp (drh) on 2022-05-11 11:40:08 in reply to 4.1

I found the problem by running:

valgrind ./fossil server /home/drh/www/repos --repolist

Valgrind told me the exact source code line where the problem was occurring. From there, the fix was easy.

(10.1) By mark on 2022-05-11 11:52:53 edited from 10.0 in reply to 8

I discovered that recently when looking to install a segv handler in
fnc. When I realised I couldn't show the trace on base OpenBSD, I
scrapped the idea. The BUGS section in 6.9's backtrace(3) is funny
but I can't bring it up on https://man.openbsd.org for some reason.

That's a good idea about tweaking the configure script though.

ETA: copypasta from my local manpage

BUGS
     As typical with GNU software the interface is clumsy and error prone.
     While writing a more sophisticated backtracing mechanism it was obvious
     that the GNU functionality could be trivially emulated.

     Due to a bug in gcc one has to compile applications with the following
     flags -Wl,--export-dynamic in order to get human readable function names.

(11) By george on 2022-05-11 21:39:58 in reply to 3

Thank you for handling the issue while I was away!
I'm sorry for the bother; I did not expect that g.zPath may be NULL.

@ams, thank you for reporting that issue.
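
For readers wondering what guarding against a NULL path looks like in general, here is a minimal sketch. It is not the actual fix that went into trunk; path_has_prefix is a hypothetical helper, and its zPath parameter merely stands in for a value like g.zPath that can turn out to be NULL.

```c
#include <string.h>

/* Return 1 if zPath begins with zPrefix, else 0. A NULL zPath is
** treated as an empty string rather than being passed to strncmp(),
** which would be undefined behavior and typically a segfault. */
int path_has_prefix(const char *zPath, const char *zPrefix){
  const char *z = zPath ? zPath : "";
  return strncmp(z, zPrefix, strlen(zPrefix))==0;
}
```

With this guard, a call such as path_has_prefix(NULL, "repolist") returns 0 instead of crashing.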