Fossil database gets corrupted by logging

(1) By anonymous on 2019-07-11 10:04:29

Rarely, but every now and then, my Fossil database gets corrupted after I have used it through the Fossil CGI (in a web browser).

The file no longer starts with the usual SQLite header ("SQLite format 3 ..."). Instead it starts with log output such as "no-match [REQUEST_URI] env-match [SCRIPT_NAME] = [/fossil.cgi] env-match [PATH_INFO] = [/] no-match [HTTP_COOKIE] no-match [QUERY_STRING]". The file size is unchanged; only the beginning has been overwritten by log content.

This seems to happen when I enable logging, i.e. when I set these options in the fossil.cfg CGI script:

    debug: FILE      Causing debugging information to be written into FILE.
    errorlog: FILE   Warnings, errors, and panics written to FILE.

It seems that some content that should be written to the "debug: FILE" log ends up at the beginning of the Fossil repository file itself.

I suspect this has something to do with the backoffice: around that time, the log file shows the CGI waiting for the backoffice to finish its job so that it can get access to the database.

Restoring from backup has always been my fix; I make backups often enough that I never lost anything.

Is this a known bug?

For now I have simply disabled logging, hoping that this fixes it. Maybe I could also just disable the backoffice.

Nevertheless, it would probably be a good idea to track down that bug.

Thank you.

(2) By Stephan Beal (stephan) on 2019-07-11 10:12:52 in reply to 1

> Is this a known bug?

Definitely not. My suspicion is that your repo is being hosted from a USB stick, SD card, or SMB network share, all of which are known to be problematic from time to time (or more often).

What fossil version are you using?

Sidebar/trivia: there was an ancient problem where any assert() triggered in C code could indeed overwrite part of the database file(!!!), but that was fixed ages ago (5+ years). You mention backoffice, which is a new feature, so that assert problem is not what's affecting you.

(3) By anonymous on 2019-07-11 10:23:27 in reply to 2

The version is quite new: fossil version 2.9 [5b6be64760] 2019-06-14 00:24:04 UTC
The repo is on a local ext4 file system.

(4) By Richard Hipp (drh) on 2019-07-11 10:44:38 in reply to 1

> It seems that some content which should be written to "debug: FILE" is now at the beginning of the fossil file itself.

Perhaps the backoffice is writing to a file descriptor that has been closed, but then later reopened by SQLite. I have to be away from the office for a couple of hours, but I will look into this when I get back.

(5) By Richard Hipp (drh) on 2019-07-11 12:20:54 in reply to 4

Please try patch https://www.fossil-scm.org/fossil/info/458ced35354314b1 and report back whether or not this seems to clear the problem. Thanks.

(6) By anonymous on 2019-07-11 14:16:32 in reply to 5

Thank you, Richard. This looks like you found it!

-> recompiled, testing now.

(As I wrote earlier, the corruption happens only rarely for me, so it may take some time before I can reliably report success.)

BTW: Shouldn't Fossil rather do a "double-fork"?

(7) By Richard Hipp (drh) on 2019-07-11 14:24:24 in reply to 6

I'd never heard of a "double-fork" before. Sounds like something that needs to be added to the backoffice implementation. This might clear some of the problems that (for example) OpenBSD was having.

But since that is a potentially destabilizing change, I'll hold off until after the 2.9 release, which will happen soon (perhaps this weekend).

(8) By Andy Bradford (andybradford) on 2019-07-12 01:57:01 in reply to 6

What problem would actually be solved in Fossil by using the double-fork as suggested in these articles?

Thanks,

Andy

(9) By Andy Bradford (andybradford) on 2019-07-12 02:08:23 in reply to 6

Also, I believe that using a double-fork in a daemon breaks the ability for daemon monitoring services to function properly. For example, the double-fork breaks things like runit, daemontools, and anything else that is built on a similar model. If Fossil does need a double-fork to daemonize, perhaps it should be optional?

For example, I have the following in a daemontools run script:

#!/bin/sh
exec 2>&1
exec envdir ./env setuidgid _fossil fossil server --repolist /repos

If "fossil server" were to employ the double-fork mechanism to daemonize, this would break my ability to effectively manage this service.

Or have I misunderstood the suggestion?

Thanks,

Andy

(10) By Warren Young (wyoung) on 2019-07-12 02:36:03 in reply to 9

> I believe that using a double-fork in a daemon breaks the ability for daemon monitoring services to function properly.

systemd can handle double-forked daemons. :)

> have I misunderstood the suggestion?

Isn't this proposal specific to the backoffice, and unrelated to daemonizing the Fossil server?

Fossil already does the right thing with regard to daemonization: it doesn't fork itself into the background when run as fossil server; it stays in the foreground if that's how its caller started it. The thing to avoid is automatic and unconditional forking of the process into the background, because that takes flexibility away from the caller. If a process's caller wants it double-forked, it should do the double forking itself!

The backoffice can't depend on someone else — e.g. your daemontools run script — to do that on its behalf, so it has to arrange to do it itself.

I also don't see that double-forking solves this file handle confusion. My understanding of double forking is that it's just about avoiding zombie processes. It's easier to double-fork than to ensure that you do all of the wait() and SIGCHLD stuff properly.

(11) By anonymous on 2019-07-12 05:50:48 in reply to 9

The suggestion applies only to the very spot in the code that drh touched in this thread, where the backoffice is started to go away, do its job, and be forgotten about. At the moment there is only a single fork() followed by setsid().

Other places where fork() is used would have to be checked separately, independently and carefully (if at all).

AFAIK, fossil server doesn't involve forking at all at the moment, so there is nothing to fix there (no going from one fork to two). The suggestion is not to introduce new forking.

(12) By anonymous on 2019-07-12 05:53:08 in reply to 10

Correct, wyoung.

Just an improvement for the backoffice, found as a by-catch. No new daemon features. And independent of the file handle bug/fix.

(13) By Richard Hipp (drh) on 2019-07-16 12:24:08 in reply to 6

Investigating further, I find that Fossil probably does not need a double-fork.

A double-fork is useful when the parent process continues running but does not invoke wait() to harvest dead children. The double-fork causes the daemon process (the child process) to disconnect from the parent, so that it does not become a zombie if it dies while the parent is still running.

But in fossil, the backoffice is only started as the parent process is shutting down. The parent will not continue running, but will itself die very shortly after launching the backoffice child. Hence, it seems the double-fork is superfluous and would accomplish nothing beyond consuming CPU cycles.

(14) By anonymous on 2019-07-16 12:45:50 in reply to 13

OK!

(BTW: no more corruption so far since https://www.fossil-scm.org/fossil/info/458ced35354314b1)

(15) By Warren Young (wyoung) on 2019-07-16 15:30:48 in reply to 13

That's plausible. I think what we'd want to see next is ps output showing a lot of zombies on someone's Fossil server. No zombies, no problem.

(16) By Andy Bradford (andybradford) on 2019-07-17 01:25:27 in reply to 10

> My understanding of double forking is that it's just about avoiding zombie processes. It's easier to double-fork than to ensure that you do all of the wait() and SIGCHLD stuff properly.

Among other things, yes, that's what it's about. I didn't think Fossil had a zombie problem because it uses wait() when appropriate, which is why I asked the question. It's actually both fork() and setsid() combined together that make it so the process cannot get a controlling terminal.

At any rate, I would like to know what the actual problem is that Richard hinted at earlier in this thread when he said:

> This might clear some of the problems that (for example) OpenBSD was having.

Thanks,

Andy

(17) By Andy Bradford (andybradford) on 2019-07-17 01:27:59 in reply to 10

> systemd can handle double-forked daemons. :)

Yeah, and some people think that storing PIDs in files in the filesystem and then killing what you find in that file later on is a good way to manage daemons too. :-)

Thanks,

Andy