Fossil Forum

Desynchronized working copy after restore of fossil-repo file from backup & pulling

(1) By Sergey G. Brester (sebres) on 2019-01-10 14:38:18 [source]

Prehistory:

My HDD with the fossil repository files went bad; the working copies were located on another HDD. So I restored the repository files from a (month-old) backup, pulled from the origin remote, and was a bit surprised that all my working copies are no longer "synchronized". I see neither the current commit nor the branch I am on (fossil branch current returns nothing), and fossil status marks all files as modified.

Effectively nothing changed at all in the working copies themselves (they were not affected by the disaster, being on the other HDD). Yet checking all of them (the working copies containing _FOSSIL_) gives the same result: the state of the current revision is lost.

Issue:

I'm searching for a way to "synchronize" my several working copies with the fossil repository restored from backup (@drh, it looks like a fossil bug to me):
After the restore of the backup & pull, each copy has effectively lost its reference to the current commit/branch, so e.g. fossil branch current returns nothing...


Bug description (if it is accepted as one):

Expected:

After a restore+pull or (re)clone of the repository file, all working copies that previously referenced this repo file should retain their current revision state.

Observed:

The current state is lost. If I execute fossil co -f --keep <branch-I-assume-was-checked-out>, it switches the pointer (so quasi "restores" the state), and the working copy works as expected thereafter (I see all the modifications, can pull/commit/push, etc).

But if I don't know which branch was checked out (and whether any modifications are present in the folder), it is very hard to find the corresponding branch to check out, to stash, or to do a soft reset in order to preserve the local changes (belonging to that branch), etc.

@drh: I assume this is a bug; if so, please open a ticket. I don't see a "new ticket" link on the main page here.

Thanks, sebres.

(2.1) By Stephan Beal (stephan) on 2019-01-10 14:56:16 edited from 2.0 in reply to 1 [link] [source]

If you have restored your repo from a month-old copy then _all revisions_ made between the time of the backup and the time of your HDD failure are _irrevocably lost_. The local checkouts _do not_ maintain that state; it is stored only in each copy/clone of the repository.

The best you can do here is something like the following, from each checkout:

  fossil close --force
  fossil open --keep /path/to/repo.fossil
  fossil commit -m ...

The "close" is required because restoring an old repo backup for use with a newer checkout causes problems: the checkout refers to blob IDs which are no longer in the repository.

The 'open --keep' will re-open the repo but keep any modified files.

Then check in those changes (presumably a month's worth).

(3) By Richard Hipp (drh) on 2019-01-10 15:05:30 in reply to 2.1 [link] [source]

I was about to say what Stephan said, but he beat me to it.

Just to be clear: Sebres, you do not have any up-to-date clones of your repository?

Perhaps we should accept this as a bug report with a resolution to enhance the documentation to encourage people to make use of the auto-sync feature to keep one (or more) remote clones that are always up-to-date, so that they do not rely on manual (and perhaps months-old) backups?

Also, is there something we can do to make it clearer that Fossil does not store complete project history in the local check-out, but only in the repository? I'm guessing the OP has had prior exposure to Git which does store complete project history in the local check-out, as Git does not have separate repositories and there is no other place to store the project history. With many people coming from a Git background, perhaps it behooves us to make this point more explicit.

(5) By Sergey G. Brester (sebres) on 2019-01-10 15:12:42 in reply to 3 [link] [source]

I had already answered Stephan (see above) by the time I saw your answer.

No, it was not a local-only repo (no private branches, no local commits there; all revisions were pushed to the remote repository). After the restore, I got a completely intact repository synchronized with the remote origin repository. So all SHA IDs are the same.

Just the working copies lost the states.

And as already said, I don't know any other SCM that would behave like fossil here.

(7) By Richard Hipp (drh) on 2019-01-10 15:25:22 in reply to 5 [link] [source]

Obviously, we are misunderstanding exactly what it is that you did.

Let me try again: You have a working checkout that points to some repository, say "/drive1/myproj.fossil". That repository file goes bad due to hardware problems. So you replace it with a clone from a different machine?

The _LOCAL_ database of a working checkout stores integer BLOB ids that reference the repository from which it was originally checked out. But those integer BLOB ids are not consistent across clones. So, yes, replacing a repository with a clone can disrupt the local check-out.

Is that a better description of what happened?
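The mismatch can be sketched with plain sqlite3 and a simplified stand-in for the repository's blob table (the column set here is illustrative, not Fossil's real schema): each clone assigns RIDs in the order it happens to receive artifacts, so the same hash can map to different integers in different clones.

```python
import sqlite3

# Simplified stand-in for a repository's artifact table: the RID is just
# the rowid, assigned in the order artifacts are received.
def make_clone(hashes_in_receive_order):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE blob (rid INTEGER PRIMARY KEY, hash TEXT UNIQUE)")
    db.executemany("INSERT INTO blob(hash) VALUES (?)",
                   [(h,) for h in hashes_in_receive_order])
    return db

def rid_of(db, h):
    return db.execute("SELECT rid FROM blob WHERE hash=?", (h,)).fetchone()[0]

original = make_clone(["aaa111", "bbb222", "ccc333"])  # local commit order
reclone  = make_clone(["aaa111", "ccc333", "bbb222"])  # sync delivered a different order

# The hashes agree everywhere, but a checkout that recorded RID 2 against
# the original repository now points at a different artifact in the reclone.
assert rid_of(original, "bbb222") == 2
assert rid_of(reclone, "bbb222") == 3
```

The checkout database stores only the integer, so after the swap it silently refers to whatever artifact happens to carry that RID in the replacement repository.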

(8) By Sergey G. Brester (sebres) on 2019-01-10 15:36:47 in reply to 7 [link] [source]

Yes, if I understand correctly what you said. (Only I don't have a _LOCAL_ database; I have just a _FOSSIL_ file in the working copy.) Possibly the scenario I already provided describes it better.

I have no problem at all with the internal refs being held as integer IDs; I am just searching for a way to "synchronize" the two (WC & repo file), or at least anything that can tell me the current state of affairs (branch/commit hash of the WC).

Is there some command that I could use?

Anyway, I assume I may now hit another issue: some of my repos were exported to git using marks... So, given your statement about integer BLOB ids, I suspect that the (fossil) marks are broken now?!
Am I right?

P.S. BTW, I got no email telling me that I have answers from you (does this forum send email)?

(9) By Richard Hipp (drh) on 2019-01-10 16:49:04 in reply to 8 [link] [source]

You can sign up for email notification separately: https://www.fossil-scm.org/forum/subscribe

(10) By Sergey G. Brester (sebres) on 2019-01-10 16:57:38 in reply to 9 [link] [source]

I don't want to subscribe to everything; I thought I could at least receive the answers to my own comments/threads.

(12) By Richard Hipp (drh) on 2019-01-10 17:06:30 in reply to 10 [link] [source]

We deliberately do not do that, because there are robots on the internet that will locate services that do work that way and then sign up unsuspecting email victims to receive notifications in order to harass them. This is called "backscatter spam".

(16.1) Originally by Sergey G. Brester (sebres) with edits by Richard Hipp (drh) on 2019-01-10 17:58:10 from 16.0 in reply to 7 [link] [source]

Originally by "sebres" - edited by "drh" for formatting

I tried to help myself, and I'm really confused now.

Here is a small PoC from my side to find the current state/refs:

   sqlite3 _FOSSIL_ "select value from vvar where name = 'checkout';"

This really returns an integer, as you correctly said.

But the weird thing is: there is no hash at all in the working-copy DB for the current checkout, and also **no other info** (in any of the 3 tables vfile, vmerge, vvar) that would help to find the current revision id.

The question: **how does fossil guarantee the consistency of the reference between the local DB and the referenced clone/repo?**

If these integer IDs get reordered in the clone for some reason, it may cause very unexpected consequences... I can remember a commit of Jan Nijtmans with a wrong timezone and/or UTC time in the tcl repo that changed the "topological" order on some clones, resulting in several wrong repo states on several clones. This also had many after-effects everywhere, for example in the git mirrors (it caused fossil fix c0a3e9ff), etc.

Another question is why not simply add one more value in `vvar`, something like `checkout-hash`, that would hold the hash of the latest revision. It could then be verified when fossil starts working with a local checkout, in order to avoid topological issues like this.

Something like this:

  int checkOut = (int)...get_local_co_value("select value from vvar where name = ?", "checkout");
  string checkOutHash = ...get_local_co_value("select value from vvar where name = ?", "checkout-hash");
  string cloneCheckOutHash = ...get_cloned_value("select hash from ... where id = ?", checkOut);
  if (checkOutHash != cloneCheckOutHash) {
    printf("WARNING: found inconsistency (topological order changed / working copy deviates from cloned repository).\n"
           "         trying to repair ...\n");
    /* ... update local checkout to the value from cloned repo, corresponding to the hash-value ... */
    ...
  }
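The same check can be expressed as a runnable sketch against throwaway sqlite3 databases (the table shapes and the `checkout-hash` variable are hypothetical; Fossil's real schemas differ):

```python
import sqlite3

# Hypothetical checkout db: vvar holds both the integer RID and, per the
# proposal above, the hash of the checked-out revision.
def open_checkout(rid, hash_):
    co = sqlite3.connect(":memory:")
    co.execute("CREATE TABLE vvar (name TEXT PRIMARY KEY, value TEXT)")
    co.executemany("INSERT INTO vvar VALUES (?,?)",
                   [("checkout", str(rid)), ("checkout-hash", hash_)])
    return co

# Simplified stand-in for the repository's artifact table.
def open_repo(rows):  # rows: (rid, hash)
    repo = sqlite3.connect(":memory:")
    repo.execute("CREATE TABLE blob (rid INTEGER PRIMARY KEY, hash TEXT)")
    repo.executemany("INSERT INTO blob VALUES (?,?)", rows)
    return repo

def checkout_is_consistent(co, repo):
    rid = int(co.execute(
        "SELECT value FROM vvar WHERE name='checkout'").fetchone()[0])
    want = co.execute(
        "SELECT value FROM vvar WHERE name='checkout-hash'").fetchone()[0]
    got = repo.execute("SELECT hash FROM blob WHERE rid=?", (rid,)).fetchone()
    return got is not None and got[0] == want

co = open_checkout(2, "bbb222")
assert checkout_is_consistent(co, open_repo([(1, "aaa111"), (2, "bbb222")]))
# After a re-clone that renumbered RIDs, the stored hash no longer matches:
assert not checkout_is_consistent(co, open_repo([(1, "aaa111"), (2, "ccc333")]))
```

With the hash stored redundantly, the mismatch is detectable at startup instead of surfacing as silently wrong file states.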

(6) By Sergey G. Brester (sebres) on 2019-01-10 15:25:12 in reply to 3 [link] [source]

Just to be sure, we are talking about the same thing...

The following scenario causes the bug I reported here:

   ## clone, backup and open:
   fossil clone https://.../test-repo /repos/test-repo.fossil
   cp /repos/test-repo.fossil /backup/test-repo.fossil
   cd /work/test
   fossil open /repos/test-repo.fossil

   ## change something & commit (possible multiple times) then push ...
   echo "change" >> a.txt
   fossil commit -m 'test'
   fossil push

   ## ... disaster case on /repos/ ...

   ## overwrite with backup & sync it with remote via pull
   cp -f /backup/test-repo.fossil /repos/test-repo.fossil
   cd /tmp/test && fossil open ...
   fossil pull
   
   ## check:
   cd /work/test
   fossil branch current
   ===> empty
   fossil status
   ===> all files are signaled as changed.

Oh, I almost forgot to add: I was on Windows (corporate PC).

(4) By Sergey G. Brester (sebres) on 2019-01-10 15:07:09 in reply to 2.1 [link] [source]

If you read my post carefully, you will realize that I restored the "made between the time" revisions from the remote origin (via pull).
All these revisions had been pushed there previously and surely have the same SHA hash values.

So I hope you can explain to me why this situation would be unimaginable with any other SCM system in the world (e.g. git).
Anyway, otherwise I do not really understand what the problem is with restoring the states in a case like this, if all the commit IDs (SHA hashes) are unique...

(11) By Richard Hipp (drh) on 2019-01-10 17:02:28 in reply to 1 [link] [source]

I am now inclined to consider this a real issue. The question becomes what to do about it. The origin of the problem is in the design of the local checkout database. That database contains references to integer BLOB-ids, which can be different from one clone to the next. So if the original repository is replaced by a clone, the references to the BLOB-ids in the local checkout database become obsolete.

This is a data design issue. The INTEGER fields here need to be hashes instead of integers so that the links do not break if the BLOB-ids get renumbered. There are also BLOB-ids in some entries of the VVAR table that would need to be changed to hashes as well.

Changing those fields means recoding every query that references the VFILE and VMERGE tables, and auditing references to selected entries in VVAR to see if they need to change. Doing this in a way that is backwards compatible would be very difficult.

As an interim measure, we can detect the problem by having the checkout database remember the latest entry in the RCVFROM table of the repository and then checking to make sure that entry still exists and is unaltered whenever the checkout uses the repository. That entry will be different if the BLOB-ids have changed in any way that will impact the checkout.
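That interim detection might be sketched like this (a deliberate simplification: the real RCVFROM table has more columns, and the remembered entry would live in the checkout database):

```python
import sqlite3

# The checkout remembers the latest RCVFROM entry at open time; before any
# operation, it verifies that entry still exists unaltered in the repository.
def latest_rcvfrom(repo):
    return repo.execute(
        "SELECT rcvid, mtime FROM rcvfrom ORDER BY rcvid DESC LIMIT 1").fetchone()

def repo_was_swapped(repo, remembered):
    row = repo.execute("SELECT rcvid, mtime FROM rcvfrom WHERE rcvid=?",
                       (remembered[0],)).fetchone()
    return row != remembered

repo = sqlite3.connect(":memory:")
repo.execute("CREATE TABLE rcvfrom (rcvid INTEGER PRIMARY KEY, mtime REAL)")
repo.executemany("INSERT INTO rcvfrom VALUES (?,?)", [(1, 100.0), (2, 200.0)])

remembered = latest_rcvfrom(repo)   # would be stored in the checkout db
assert not repo_was_swapped(repo, remembered)

# Simulate replacing the repo with a clone whose receive history differs:
repo.execute("UPDATE rcvfrom SET mtime=150.0 WHERE rcvid=2")
assert repo_was_swapped(repo, remembered)
```

Any clone with a different receive history fails the comparison, which is exactly the situation in which the BLOB-ids may have been renumbered.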

I do not know the impact on the marks associated with incremental Git export, but I suspect it will not be good.

(13) By Stephan Beal (stephan) on 2019-01-10 17:17:53 in reply to 11 [link] [source]

i have to wonder why this hasn't bitten all of us more often. Hypothetically, any long-time user should have stumbled over this several times, yet i can't remember ever having done so (since picking up Fossil during Christmas break of 2007).

Given that, while i fundamentally agree that it's a design issue worth addressing at some point, i personally categorize it as low-priority because of the scope of the required changes. It affects not only queries, but also C code which passes around RIDs, which is no small number of references:

   $ cat *.c | grep -w -c 'rid'
   2371

(That's a pessimistic upper bound, as many of those are in comments.)

It would negatively impact memory usage and, to a lesser degree, performance, but those are admittedly minor nitpicks for this type of app.

A deconstruct/reconstruct of a repo db "should" be enough to trigger the out-of-sync RIDs fairly reliably, but we've yet, insofar as i can remember, to receive a report of this problem which traces back to that (perhaps nobody's using de/reconstruct).

(15) By Richard Hipp (drh) on 2019-01-10 17:25:57 in reply to 13 [link] [source]

Yeah. Perhaps reworking the checkout-database schema to store hashes instead of RIDs is going too far. For the near term, let's focus on reliably detecting the situation and giving a sensible warning. Perhaps some recovery mechanism can be worked out in the (rare) event that a repository swap is detected.

(27) By anonymous on 2019-01-10 21:12:33 in reply to 13 [link] [source]

> i have to wonder why this hasn't bitten all of us more often.

I've run into it once in the 5+ years that I've been using Fossil.  I believe that I had a clone of a repository, I made changes, decided I didn't want them, so I replaced the entire repository with a different copy that I had.  This happened at a time when I was still new to Fossil so I didn't understand the implications of what was going on.

I believe someone (maybe you) explained in the old ML what had happened and since then I no longer try to replace fossils without using fossil close first.

(14) By Warren Young (wyoung) on 2019-01-10 17:20:37 in reply to 11 [link] [source]

Does re-opening the repository fix it?

If so, maybe all that's needed is for the checkout database to have a UUID or similar that's specific to the repository it opened, so it can detect that the repository it was "opened" from has been swapped out from under it.

The only irreplaceable thing we'd lose that way is the stash, and we can warn about that.

(17) By Stephan Beal (stephan) on 2019-01-10 17:33:00 in reply to 14 [link] [source]

> The only irreplaceable thing we'd lose that way is the stash, and we can warn about that.

A warning at that point is informative but doesn't offer any assistance in avoiding loss of the stash, as there's no(?) recovery strategy which would allow them to keep the stash once RIDs are out of sync.

As soon as a single RID is out of sync, any use of the checkout db (which includes the stash) risks what amounts to corruption unless fossil happens to recognize it (because an RID in the checkout is missing from the repo). The worst-case scenario is, IMO, that the RIDs are out of sync but the checkout's RIDs point to valid, but semantically different, blobs in the repo. "Cats and dogs living together."

(18) By Richard Hipp (drh) on 2019-01-10 17:40:16 in reply to 14 [link] [source]

Things you currently lose when the RIDs get renumbered:

  • The stash (I didn't notice this before - but the STASH and STASHFILE tables also record RIDs from the repository.)

  • History of all "merge", "add", "rm", and "mv" operations that have occurred since the last commit.

  • Bisect history.

  • The ability to "undo" the previous operation.

Perhaps some of the above can be remediated by storing additional information (example: storing hashes in addition to RIDs) and then fixing up the RIDs when a repository swap is detected.
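That fix-up idea can be sketched as follows (simplified stand-in tables; in real Fossil the hash column in the checkout db is the hypothetical part):

```python
import sqlite3

# If the checkout stored the hash alongside each RID, stale RIDs could be
# rewritten by hash lookup once a repository swap is detected.
def fix_up_rids(checkout, repo):
    for rid, h in checkout.execute("SELECT rid, hash FROM vfile").fetchall():
        row = repo.execute("SELECT rid FROM blob WHERE hash=?", (h,)).fetchone()
        if row is None:
            raise LookupError(f"artifact {h} missing from repository")
        checkout.execute("UPDATE vfile SET rid=? WHERE hash=?", (row[0], h))
    checkout.commit()

co = sqlite3.connect(":memory:")
co.execute("CREATE TABLE vfile (rid INTEGER, hash TEXT)")
co.executemany("INSERT INTO vfile VALUES (?,?)", [(2, "bbb222"), (3, "ccc333")])

repo = sqlite3.connect(":memory:")
repo.execute("CREATE TABLE blob (rid INTEGER PRIMARY KEY, hash TEXT)")
repo.executemany("INSERT INTO blob VALUES (?,?)",
                 [(1, "aaa111"), (2, "ccc333"), (3, "bbb222")])  # renumbered clone

fix_up_rids(co, repo)
assert dict(co.execute("SELECT hash, rid FROM vfile")) == {"bbb222": 3, "ccc333": 2}
```

The hash is the stable key across clones; the RID is merely a per-clone cache of it, so it can always be recomputed as long as the hash was recorded.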

(19) By Sergey G. Brester (sebres) on 2019-01-10 17:57:12 in reply to 18 [link] [source]

> (example: storing hashes in addition to RIDs) and then fixing up the RIDs when a repository swap is detected.

Exactly.

Additionally, storing the hashes would also help with possible manual repairs (if needed, or if no auto-repair is possible for some reason).

(20.1) By Warren Young (wyoung) on 2019-01-10 18:14:57 edited from 20.0 in reply to 18 [link] [source]

stash

If stash ls and stash cat also break when this happens, then I'd agree that re-opening isn't good enough.

I don't mind losing the record IDs of the stashes or being forced to push things back into the stash from patch files produced by stash cat, but being unable to extract the stash pieces before re-opening is a problem worth fixing.

undo

As with the stash, that's worth saving, as long as it's not difficult to achieve.

file operation history

I'm not considering that "irreplaceable" as it should be readily reconstructed from the shell history and the user's memory. I'm basing that on my philosophy of "Commit early, commit often," though, which I know not everyone shares.

bisect history

A bisect sequence isn't irreplaceable, in my view, since you'd presumably make the same good/bad decisions when restarting a bisect.

The window of vulnerability is pretty small here. You're multiplying the probability of a crash bad enough to require restoration from secondary-level backups by the probability of being in a bisect at the time by the average time you spend doing a bisect sequence.

This is one answer to Stephan's question above, why hasn't this problem been noticed more often before? Because it's rare and usually has only small practical impacts when it does happen.

(21) By Warren Young (wyoung) on 2019-01-10 18:21:54 in reply to 20.1 [link] [source]

I'm assuming above that the stash contains only textual diffs. If binary diffs in the stash are common enough, then even if stash cat still works without valid RIDs, that's still not good enough, since stash cat doesn't produce a format you can apply to a newly reopened checkout.

Personally, I doubt I've ever stashed a binary diff. There must be others that do that, though.

(25) By Kevin Kenny (kbk) on 2019-01-10 20:19:15 in reply to 20.1 [link] [source]

I'm surprised that I didn't have a disaster with this, myself, a couple of months ago when I replaced my SSD.

I have a RAID partition that is never backed up - because what it contains is stuff for which the recovery plan is 'the master copy of this is offsite or on other media, so I'll recover by recreating.' This includes movies, music, books, geodata, software distributions, ... and Fossil repositories. (For some stuff, like the geodata and the repos, I even maintain scripts that would govern the re-download process - make fresh clones, do a lot of wget's, etc.)

My working copies live on the SSD, and are backed up daily, and I've assumed incorrectly that they'd still be usable even if the repository were re-cloned. (The repositories are synced or at least pulled regularly, including by 'cron' jobs in case I forget to sync.)

I think I just was fortunate that I didn't have any uncommitted changes or stashes about. (I don't use the stash a lot - I tend just to make another working copy.)

Not being a regular here, I had no idea that local repos would have to be backed up. I had always presumed that fresh clones would suffice.

The impact of the disaster upon sebres is much greater than the loss of working copies: the invalidation of rids in the workflow used for git mirroring (together with the fact that he has a great many branches divided among multiple git remotes) has apparently thrown his exports into turmoil, and I am not familiar enough with how that stuff works to advise him at all. (For what it's worth, several local working copies were successfully recovered with 'fossil close --force'/'fossil open --keep'.)

(26) By Richard Hipp (drh) on 2019-01-10 20:30:43 in reply to 25 [link] [source]

If you consistently sync with the same remote repo, probably your RID values will end up being in the same order, even if you reclone. There is no guarantee of that, but with a reasonable amount of luck, a reclone could be harmless.

As for the Git clones, I'm thinking it is only incremental cloning that will be messed up. To recover, you just run a full clone. That takes a while, but you only have to do it once, and then incremental cloning should start working again. I think. (Surely Sergey will let us know if I'm wrong.)

(30) By Sergey G. Brester (sebres) on 2019-01-11 00:05:51 in reply to 26 [link] [source]

> Surely Sergey will let us know if I'm wrong.

Sure, he will :)

The main reason I used export marks was not performance (which is also important, but plays no primary role in my case).
The reasons are:

  1. I have already experienced fossil exporting a new history-tree for git 3 times:
    • once when fossil changed the user info regarding the mail (previously sebres <sebres>, then sebres@example.com <sebres>). Note that the mail and username are mixed up.
    • a second time when it was corrected (to sebres <sebres@example.com>), or it was something else, I'm not sure anymore.
    • a third time when the format (normalization algorithm) of the message changed again (from version 1.xx to some 2.x, and then once more, if I'm not wrong, in 2.13 or nearby)... regarding newlines and/or leading/trailing spaces.
  2. Each time the whole git repository got a new history-tree (with the effort, for me, of finding/rebuilding/resyncing all branches and git working copies, origins/remotes, sub-modules, etc).
  3. Since I switched to mark-based (incremental) export, it saves me and my colleagues a lot of work (in case fossil again does something wrong after an update to the next version); at worst the history would be rewritten only from some common base (which is quasi frozen for me).
    Only once did I get it broken this way (do you still remember, Richard? ;) And even that was only part of the whole tree (so I repaired it many times faster).

And once the format of the export marks changed (the previous fossil version wrote only hashes, the next wrote the rids too).

All this concerns the incremental export.

Additionally, my git repos are many times larger (the commit/branch count is 10x that of the fossil repo, because they contain all the commits of fossil plus 20+ years of corporate/private work), with many different remotes (which then have to be synchronized too), sub-modules, conflicts in foreign remotes and clones, several testing, CI & development clones, and so on.

So the export marks really help me avoid a complete disaster. If I tried today to do a full export/import (without the marks), I would produce a totally new history-tree (of fossil) in all my git repos from the beginning of the tcl epoch, and could then start searching for another job (e.g. cemetery gardener: quiet customers, no stress...).

> If you consistently sync with the same remote repo, probably your RID values will end up being in the same order, even if you reclone.

Not really. And I guess there are too many ways for them not to be the same:

  • as already said, Jan produced this once (with his wrong TZ), so a new clone gets a different topology;
  • if one does something wrong in a local repo (without auto-sync), for example merging and committing the wrong branch, and then removes (purges) this "failed" artefact (not a problem without auto-sync, even with pullonly)... in this case the identity counter is advanced once, so in the next clone every rid from this artefact onward equals the old rid minus 1;
  • several other cases I can imagine if I take a look at the fossil source code.

(33) By Kevin Kenny (kbk) on 2019-01-11 17:49:19 in reply to 26 [link] [source]

> If you consistently sync with the same remote repo, probably your RID values will end up being in the same order, even if you reclone. There is no guarantee of that, but with a reasonable amount of luck, a reclone could be harmless.

The rid's get out of sequence the first time a sync encounters both local and remote changes. If you occasionally work on a disconnected repo and then sync when you're back on the network, you'll eventually hit that case, and you likely will have rid sequence disagreements from that day forward. Or if you forget to pull just before committing, so that 'fossil commit && fossil sync' winds up both pushing and pulling changes.
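That first divergence can be made concrete (again with a simplified stand-in for the artifact table; real Fossil schemas differ): the disconnected clone numbers its own commit before the one it later pulls, while the remote received them in the opposite order.

```python
import sqlite3

def new_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE blob (rid INTEGER PRIMARY KEY, hash TEXT UNIQUE)")
    return db

def receive(db, h):
    # A sync skips artifacts the receiver already has.
    db.execute("INSERT OR IGNORE INTO blob(hash) VALUES (?)", (h,))

local, remote = new_db(), new_db()
receive(local, "base"); receive(remote, "base")   # common starting point

receive(local, "X")    # disconnected local commit: X becomes rid 2 locally
receive(remote, "Y")   # someone else pushes Y meanwhile: Y becomes rid 2 remotely

# Back online, 'fossil sync' exchanges what each side lacks:
receive(local, "Y")    # Y becomes rid 3 locally
receive(remote, "X")   # X becomes rid 3 remotely

rids = lambda db: dict(db.execute("SELECT hash, rid FROM blob"))
assert rids(local)  == {"base": 1, "X": 2, "Y": 3}
assert rids(remote) == {"base": 1, "Y": 2, "X": 3}
```

From that sync onward the two clones agree on every hash but permanently disagree on the RIDs, so a later re-clone from the remote renumbers everything under the checkout.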

Also, my cron jobs sync some of my repos with both core.tcl.tk and with mirrors on chiselapp, so I have some local repos that are routinely synced with multiple masters.

I'm still mildly surprised to have got away with the flawed plan of relying on a remote repo as the backup for a local one.

(34) By Richard Hipp (drh) on 2019-01-11 18:19:37 in reply to 33 [link] [source]

> I'm still mildly surprised to have got away with the flawed plan of relying on a remote repo as the backup for a local one.

I don't see that as a flawed plan.

No checked-in work is ever lost due to this problem. You might have to do a little finagling to salvage the edit that you were in the middle of in your checkout. The worst that might happen is that your uncommitted changes get so confused that you need to start over.

Do you regularly do remote backups of your checkouts, to preserve uncommitted changes in case your disk goes out prior to your next "fossil commit"? The worst-case scenario for the renumbering-RIDs problem is the same as losing your uncommitted changes in a check-out due to a disk problem.

(35) By Sergey G. Brester (sebres) on 2019-01-16 17:04:16 in reply to 33 [link] [source]

The rids definitely get out of sequence more often than assumed.

During my attempts to restore rids in the _FOSSIL_ DB (hash, current branch, etc) for several working copies, I have now found another strange case.

For better analysis I restored an even older backup of the tcl-core repo (from Apr. 2017), and after a pull I see, for example, the following strange picture:

My test working copy (continuation work on a 0-day vulnerability, 8.5-based) contains a large stash that I would like to repair/restore...

It has rid 163305, which in the temp fossil clone points to artifact [134e487b22] (committed by Jan, 2017-09-04 12:38:14).
But I found the real artifact where it should be (by the date of the stash and the known branch): it was definitely [707127bd2d] (committed by me, 2018-08-30 11:08:51).
And in the new temp fossil clone (after pull) it got rid 153293!

   sqlite> select * from artifact where hash='134e487b227864429a7c7ab99b77ed9aafb8690e' limit 1;
   rid|rcvid|size|atype|srcid|hash|content
   163305|254|577|1||134e487b227864429a7c7ab99b77ed9aafb8690e|
   sqlite> select * from artifact where hash='707127bd2d5bf27523922cb854210ab4d1b56e187b51d298b89387bc5ce8a47c' limit 1;
   rid|rcvid|size|atype|srcid|hash|content
   153293|254|1535|1|152421|707127bd2d5bf27523922cb854210ab4d1b56e187b51d298b89387bc5ce8a47c|

I know that the topology (between Apr. 2017 and today) was newly built and may therefore be totally different.

But the point is: how can a newer artifact have a lower rid than an older one?

Once again:

   707127bd2d, 2018-08-30 11:08:51 --> 153293
   134e487b22, 2017-09-04 12:38:14 --> 163305

If the synchronization occurs per branch, or grouped in some other way, the order in which the rids are created is mostly undefined.

And it looks that way: here is the output of the last 5 commits from this repo with the rids, ordered by commit time. As one can see, the rids are in a different order here:

   sqlite> SELECT datetime(e.mtime) date, e.objid rid,
      ...>   coalesce(e.euser,e.user) user,
      ...>   printf('%15s', (SELECT value FROM tagxref WHERE rid=e.objid AND tagid=8)) branch,
      ...>   quote(substr(coalesce(e.ecomment,e.comment),0,50)) comment
      ...>   FROM event e
      ...>   WHERE e.type='ci'
      ...>   ORDER BY e.mtime DESC limit 5;
   date|rid|user|branch|comment
   2019-01-15 12:34:27|166639|sebres|          trunk|'next amend to [3e0a2d99f3]: fixes TclGetIntForInd'
   2019-01-15 11:38:22|166622|sebres|          trunk|'fixes out-of-range index [string is ... -fail idx'
   2019-01-14 20:03:36|166651|sebres|          trunk|'merge 8.7 (mingw/win-autoconf build, etc)'
   2019-01-14 19:59:19|166656|sebres|  core-8-branch|'merge 8.6, conflicts resolved in win/Makefile.in '
   2019-01-14 19:51:02|166650|sebres|core-8-6-branch|'normalize package provide for tcltests 0.1 (decla'

Now compare with the output of the same statement in the fossil clone I used to make these commits:

   2019-01-15 12:34:27|166706|sebres|          trunk|'next amend to [3e0a2d99f3]: fixes TclGetIntForInd'
   2019-01-15 11:38:22|166703|sebres|          trunk|'fixes out-of-range index [string is ... -fail idx'
   2019-01-14 20:03:36|166701|sebres|          trunk|'merge 8.7 (mingw/win-autoconf build, etc)'
   2019-01-14 19:59:19|166697|sebres|  core-8-branch|'merge 8.6, conflicts resolved in win/Makefile.in '
   2019-01-14 19:51:02|166691|sebres|core-8-6-branch|'normalize package provide for tcltests 0.1 (decla'

As one can see, the rids here are in order (the same sequence in which I committed them).

From this point on, the rids get out of sequence every time you don't pull for a while (so even restoring a backup only a few days old could produce a totally different rid sequence and related topology than existed before the disaster).

(36) By Stephan Beal (stephan) on 2019-01-16 17:08:56 in reply to 35 [link] [source]

FWIW, Richard added a check for this mismatch condition to the trunk today, so it may be worth updating. There is not yet an automatic recovery, but with this addition fossil can recognize and report the problem.

(37) By anonymous on 2019-01-16 19:54:25 in reply to 35 [link] [source]

> From this point, the rid's get out of sequence every time you don't pull
> a bit longer (so even a restore of few days older backup could produce
> totally different rid's sequence and related topology as it was before
> the possible disaster).

You should not take a backup copy of a repository and expect it to have the same RIDs in your working checkout.  The only time that you would want to use your old backup is in an effort to *discover* potential things that you might want to recover out of the .fslckout (or _FOSSIL_ database).  If you ever replace the actual Fossil repository that a working checkout has opened, you cannot reliably use that working checkout any longer, except for forensic and recovery work at that point.

(38) By anonymous on 2019-01-16 20:05:42 in reply to 37 [link] [source]

> You should not take a backup copy of a repository and expect it to have the same RIDs in your working checkout.

Correction:

You should not take a backup copy of a repository and expect new modifications to that repository (e.g. via sync/pull) to generate the same RIDs in your working checkout.

(39) By Sergey G. Brester (sebres) on 2019-01-16 21:49:08 in reply to 37 [link] [source]

Now try to read the whole discussion carefully (always a good idea before writing a comment).

Starting situation: you have a local clone of some remote repository, with several working-copy checkouts of different branches, in which you are actively working.

The issue is that nobody makes regular backups of such cloned repos, because they are distributed and could at any time be "restored" from the remote/origin repository (via pull or a fresh clone). And one expects a certain consistency across all distributed copies, no matter which SCM/VCS is used (git, hg/mercurial, etc., or even fossil).
So much for the theory.

The worst case is: you lose the repository file for some reason and restore it from a somewhat older backup (properly synchronized afterwards via pull from the remote); your hashes/uuids are then consistent with the remote, but unfortunately the rids are not...

And that by itself is not the problem (it is really an internal matter of fossil), but (as long as there is no possibility to repair/restore)...

This means that several working copies will:

  • lose the reference to the current checkout, which makes it impossible to find any modifications (not yet committed, accidentally or because of WiP) in all the working copies (remember: you are actively working with several working copies on different branches, etc);
  • lose all stashes (because of the wrong id of the stash as well as the ids of the blobs involved in the stash);
  • if there are export marks (which internally reference the rids of commits/blobs), the import (into other mirrored or joined repositories, like git) will create a new history-tree (starting from the point where the new rid-related topology was created);
  • all the other things mentioned by Richard in /forum/forumpost/b5947863d2.

Additionally, drh as well as some others already see this, or are at least inclined to consider it a real issue.
I must therefore ask you to stop advising here what one "should not do" or "should expect"... Please believe me, I'm aware.

(40) By Richard Hipp (drh) on 2019-01-17 13:25:49 in reply to 39 [link] [source]

> Additionally, drh as well as some others already see this, or are at least inclined to consider it a real issue.

Indeed. And I'm increasingly aware of the seriousness of the issue. I hope to find time to fix it soon. I have plans to do so. In fact, the latest trunk version of Fossil does at least detect when a repository has been swapped out for a clone that has different RID values, and it issues a warning when this happens. It just cannot fix the problem, yet. I would have taken care of this already, but I am struggling with more important problems on other projects right this minute, and this problem has been in Fossil for over a decade without causing serious inconvenience, and so I think it can probably wait a week or so. Thank you for your patience.

(42) By anonymous on 2019-01-17 15:24:07 in reply to 40 [link] [source]

> the latest trunk version of Fossil does at least detect when a repository has been swapped out for a clone that has different RID values

What does one have to do to activate this?  Or does it happen the first time a pull is made with the new code?

(43) By Richard Hipp (drh) on 2019-01-17 15:36:18 in reply to 42 [link] [source]

Rebuild your fossil executable using the latest code from trunk. The new detection logic is completely automatic. The limitation is that it currently just gives you a warning, rather than actually fixing the problem.

(44) By anonymous on 2019-01-17 15:39:17 in reply to 43 [link] [source]

> Rebuild your fossil executable using the latest code from trunk. The new
> detection logic is completely automatic. The limitation is that it
> currently just gives you a warning, rather than actually fixing the
> problem.

Yes, I just discovered it (on an intentionally broken repository):

$ fossil status
Oops. It looks like the repository database file located at
    "/tmp/clone/../clone.fossil"
has been swapped with a clone that may have different
integer keys for the various artifacts. As of 2019-01-11,
we are working on enhancing Fossil to be able to deal with
that automatically, but we are not there yet. Sorry.

bad fingerprint

(49) By Stephan Beal (stephan) on 2019-01-17 19:49:43 in reply to 43 [link] [source]

@Richard, if there's no objection i'd like to extend the current warning to include:

    fossil_print(
      "As an interim workaround, try:\n"
      "  %s close --force\n"
      "  %s open \"%s\" --keep\n\n",
      g.argv[0],
      g.argv[0], zDbName
    );

Which comes out looking like:

[stephan@lapdog:~/fossil/fossil]$ ./fossil time
Oops. It looks like the repository database file located at
    "/home/stephan/fossil/fossil/../fossil.fsl"
has been swapped with a clone that may have different
integer keys for the various artifacts. As of 2019-01-11,
we are working on enhancing Fossil to be able to deal with
that automatically, but we are not there yet. Sorry.

As an interim workaround, try:
  ./fossil close --force
  ./fossil open "/home/stephan/fossil/fossil/../fossil.fsl" --keep

bad fingerprint


Is there any reason to believe that a close/open would be problematic for this case? AFAIK it's the only workaround we can currently offer.

(50) By Richard Hipp (drh) on 2019-01-17 19:54:49 in reply to 49 [link] [source]

Please do so. Thanks.

(52) By anonymous on 2019-01-17 21:20:00 in reply to 49 [link] [source]

>Is there any reason to believe that a close/open would be problematic for this case?

Will this leave whatever information might be useful for recovery/forensics in the .fslckout/_FOSSIL_ file around for use?  Or is that not useful information?

Seems like a last resort thing to recommend without investigating the potential loss and recovery options.

Specifically, even though --keep is used, it most likely won't open to the correct checkout which means the files that are actually modified won't diff correctly.

Might be better to copy the bad working checkout to a new location and run close/open there?

Just my thoughts.

(53) By Stephan Beal (stephan) on 2019-01-17 21:45:55 in reply to 52 [link] [source]

> Will this leave whatever information might be useful for recovery/forensics in the .fslckout/_FOSSIL_ file around for use?

'close' removes the checkout db, so those would be lost. Forensics are unlikely, IMO, to be of any use because there's literally no reliable way to match up mismatched RIDs, so there's not much one could do with the raw data in the checkout db. The causes of the mismatch are well-understood, so an investigation of why it happens is unnecessary.

> Seems like a last resort thing to recommend without investigating the potential loss and recovery options.

There are no recovery options until an auto-recover can be implemented, and there will be cases where auto-recovery cannot recover. In short, the checkout db needs to hold a map of RIDs to UIDs in order to be able to determine if a complete recovery is even possible. For any UIDs which the checkout has but cannot be mapped to the opened repo, recovery really isn't a possibility (IMHO). That could happen if, e.g., one recovers the repo from an old backup which doesn't contain all checkins which were made before the original repo db was "recovered".

> Specifically, even though --keep is used, it most likely won't open to the correct checkout which means the files that are actually modified won't diff correctly.

They will diff correctly vis-a-vis the newly-opened checkout, which is as close as we can currently get. Worst case, the diff is larger than intended. Once a mismatch has occurred, because one or the other db was replaced, there is no way to get them to diff "correctly", because the version they were intended to diff against is now "lost information".

(46) By Sergey G. Brester (sebres) on 2019-01-17 17:48:53 in reply to 40 [link] [source]

In case it helps in the search for a solution to restore export marks:

I wrote a small helper script repair-export-marks.tcl that has already successfully restored the export marks (fixing the swapped RIDs within).

It requires Tcl (tested with >= 8.5) and sqlite3 (I used 3.26.0)...

Example result from restoring the marks for my clone of the tcl-core repo:

  Repair fossil export-marks: .fossil2git-fssl -> .fossil2git-fssl.repair-export-marks-20190117
  Done. ** marks 113626 checkins 24004 blobs 89622 sane-rids 100668 wrap-rids 12958 wrap-checkins 3349 wrap-blobs 9609 **

Meaning that 12958 of the 113626 RIDs in the fossil marks file had been swapped.

And the main thing: the git import does not produce wrong check-ins, and the whole history tree is still consistent after the import (across all git refs: refs/remotes/fossil/*, refs/remotes/upstream/*, refs/remotes/sebres/*, etc).

(57) By Sergey G. Brester (sebres) on 2019-01-18 23:10:38 in reply to 40 [link] [source]

@Richard, in case you can use something from my work...

I made a new script, repair-local-wc.tcl, which can be used to analyze local working-copy state as well as to try to repair/re-swap working copies (current checkout, stashes including their referenced files, etc).
The script has a parameter "process" (default 0); if it is not set to 1/true, only an analysis (test run) is done, and both fossil databases are opened in read-only mode in that case.

I basically had no interest in undos (really temporary state in my case, mostly negligible after a few hours already), but if needed, they could be handled similarly to the stashes.

For example output of a test run of the script (using the generated map of export marks), see "repair-local-wc-example.txt".

Additionally, I've extended my other script mentioned above, "repair-export-marks.tcl", to generate a map (a Tcl dictionary of old/new values) for the RIDs from the export marks. This map can be used when restoring local working copies, because it is faster and yields more correct RIDs on the fly...

Because I have export marks for almost all the repos I had to repair, I used this RID map during the analysis/repair of my copies, so I actually don't know how well the script will do without it.

Do not hesitate to contact me should you have any questions about these scripts, etc. (email, Tcl chatroom, GitHub, whatever).

(41) By anonymous on 2019-01-17 15:13:18 in reply to 39 [link] [source]

> I must therefore ask you to stop advising here what one "should not do"
> or "should expect"... Please believe me, I'm aware.

My apologies, I did not mean you specifically, but "you" in the general sense of anyone who might be reading your comments.

My advice was directed generally at those who might read your comments and then think that it is safe to reconnect an old copy of a cloned repository with an existing working checkout. It isn't. It has never been recommended, and has been advised against.

As for whether or not it's a problem, I don't think anyone is denying it's a problem. Personally, I see it on the same level as what happens when one has file system corruption and all the files end up in lost+found, and one has to manually piece it all back together. It isn't pretty, and the OS can only do so much to make it easier on the admin who has to recover from it.

Also, the backup claim has always been directed at committed code in the repository; the loss of one Fossil repository among all the clones of a given Fossil repository only results in data loss if that clone hasn't synchronized its contents somewhere. This is why the auto-sync setting in Fossil defaults to on.

That being said, I think you have clearly pointed out a flaw in the working checkout portion of Fossil; over time I'm sure more improvements will be forthcoming.

(45) By Eric Junkermann (ericj) on 2019-01-17 17:11:59 in reply to 39 [link] [source]

> The issue is that nobody makes regular backups of such cloned repos, because they are distributed and can at any time be "restored" from the remote/origin repository (via pull or a fresh clone)

Really? What "distributed" guarantees is that you can get all the committed and synced objects back from the remote source. If the remote was unavailable when you did a local commit (with autosync), or you forgot the manual sync that is part of your process (or ...), you won't get it back. The repository you just lost, a new clone, and a restored old repository + pull will all have different RIDs, and possibly other differences as well.

Do you back up all your working copies regularly? A working copy cannot (even once the current issue is covered) be guaranteed to be independent of the repository it is based on, so you should be backing that up too.

(47) By Sergey G. Brester (sebres) on 2019-01-17 18:17:03 in reply to 45 [link] [source]

Really.
And I have never run into a situation like this using git or hg (or even the less "distributed" svn or cvs), working in 3 different places and syncing repos and working copies remotely.

Also note the emphasis was on the word "regularly".

And I already wrote about the fundamental case where even a 1-day-old backup/restore can swap the repo out.

I had days where I created 15 very large stashes while testing a new feature and hunting a bug, half of which were important.

Savvy?

And yes, I do back up my system in general (but only rarely the fully sync-able repos, fossil included). Until now, that is, with regard to the latter (and I'm still pretty confident about my git & hg clones).

Not to mention that backing up git/hg is pretty easy using a push to a remote which serves as the backup, since multiple remotes are supported:

  git push --all "git://my-backup-remotes/..."

Both remotes/clones are and remain consistent as regards history tree/topology/exports/etc.

This is the main reason I assumed fossil followed the same rules, and therefore neglected regular backups.

(22) By Sergey G. Brester (sebres) on 2019-01-10 19:30:18 in reply to 11 [link] [source]

Exactly.

But I don't see a big problem with "repairing" the situation in the future. Nor can I understand Stephan's arguments about an "impact [on] memory usage and, to a lesser degree, performance", etc.

One would just additionally store the RID of the last checkout in the local checkout DB, and verify it (re-swapping the others) only when it deviates from the corresponding RID stored in the cloned DB (so effectively the comparison takes place only once, when the fossil binary starts processing a command).

One comparison. Once. That's it.
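The one-time check being proposed can be sketched in miniature. The following is an illustrative Python/SQLite toy model, not fossil's actual schema or code; the table and function names are invented:

```python
import sqlite3

def make_repo(hash_order):
    """A toy 'repo db': RIDs are sequential integer keys assigned in
    insertion order; hashes are the artifacts' global identities."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE blob (rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE)")
    db.executemany("INSERT INTO blob(uuid) VALUES (?)", [(h,) for h in hash_order])
    return db

def checkout_is_stale(repo, checkout_rid, checkout_hash):
    """The proposed one-time startup check: does the RID recorded in the
    checkout db still resolve to the same hash in the repo db?"""
    row = repo.execute("SELECT uuid FROM blob WHERE rid=?", (checkout_rid,)).fetchone()
    return row is None or row[0] != checkout_hash

# The original repo assigned RID 3 to hash 'abcdef'; the checkout saved both.
orig = make_repo(["111111", "222222", "abcdef"])
print(checkout_is_stale(orig, 3, "abcdef"))      # prints: False

# A repo restored from backup received the artifacts in a different order,
# so the same hash now has a different RID -> the swap is detected.
restored = make_repo(["111111", "abcdef", "222222"])
print(checkout_is_stale(restored, 3, "abcdef"))  # prints: True
```

The check is O(1) per invocation, which is the point of the "one comparison, once" argument.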

(23) By Stephan Beal (stephan) on 2019-01-10 19:46:14 in reply to 22 [link] [source]

Memory/performance: fossil uses a _lot_ of queries internally to do its work, and allocating memory for numbers (RIDs) is less costly than allocating memory for hash codes (strings which are at least 5-6x the size of 64-bit numbers). The performance hit would come in the form of the extra memory manipulation required for allocating/copying/comparing strings, as opposed to allocating/copying/comparing numbers. In the grand scheme of things, though, that difference would likely be effectively immeasurable in the context of all of the goings-on in fossil. Those hashes require much more space in the db, and tables like the mlink table would grow considerably if they used hashes instead of RIDs (that said, that particular table is in the repo, not the checkout, so they could continue to use RIDs - only cross-references from within the checkout db are problematic in that regard).

RID comparison: think of an RID as, essentially, a shorthand form of a UID. An RID (Record ID) refers to an entry in the blob table, and all entries in the blob table have a unique UID, but there is no inherent mapping of RIDs to UIDs - RIDs are assigned sequentially by the current copy of the repo db. An RID by itself is absolutely meaningless. It can refer to any blob, which includes any checkin (a checkin record is itself a blob). If we only compared RID numbers, it would still be possible that both have the same number but that they semantically mean different things (e.g. point to different checkins). Thus an RID comparison is not only useless but potentially dangerous, in that relying on it could cause a checkout to inadvertently derive from the wrong parent checkin.

An auto-repair feature would require that all cross-referencing use hash codes rather than RIDs. The RIDs could potentially still be used, but they would _have_ to be backed (in the checkout db) by hash codes for their validity to be verifiable. If all cross-referencing is done with hashes instead of RIDs, then the need for using RIDs in the checkout db effectively goes away, which hypothetically means that auto-repair becomes unnecessary.

(24) By Richard Hipp (drh) on 2019-01-10 19:48:27 in reply to 22 [link] [source]

If the checkout-database tables are modified to store 32-byte SHA3 hashes instead of 2-to-4 byte integers, that takes up additional memory. Extra CPU cycles are consumed in moving that memory around and in comparing the values.

If the RID values change in the repository, it is not possible in general to compute the mapping from old values into new values, so there is not a general way to update the check-out database. We can pull some tricks to remap some of the RID values in the checkout database, but not all of them. In order to do a full remap, the checkout database will need to be enhanced to store additional information that is only used when it becomes necessary to remap the RIDs.

(28) By Sergey G. Brester (sebres) on 2019-01-10 23:19:42 in reply to 24 [link] [source]

> If the checkout-database tables are modified to store 32-byte SHA3 hashes instead of 2-to-4 byte integers...

Sure, but that was not my suggestion. I would only provide an additional map (RID <-> hash), similar to the export marks. So not replace the ints, but complement them.

> If the RID values change in the repository, it is not possible in general to compute the mapping from old values into new values?

Why not? For example, the cloned repo had the entry 123 <-> abcdef, and the same map entry is in the local working-copy DB. After a possible swap (fossil can determine this quickly, once at startup, simply by comparing the hash/RID of the last commit), the new entry would be e.g. 120 <-> abcdef; the cloned RIDs (120) could then be rewritten using the local RIDs (123), or vice versa. There are not too many references in the local DB, so possibly this way it could be implemented faster and with less effort.

Anyway, the detection step is the important one, to notify the user that something is wrong. Further work to synchronize the two could take the form of an extra fossil command. At least the important things could be retrieved and re-mapped easily: the current commit and the stashes in the working copy... Others, like export marks etc., may remain a to-do.

(29) By Stephan Beal (stephan) on 2019-01-10 23:28:37 in reply to 28 [link] [source]

> Why not?

Richard was referring to the current code/schema, where there is no checkout-side mapping of RIDs to hashes (checkouts currently use only the RIDs). In the current db schema this reverse mapping cannot be done. It would require changes to the db schema and a number of queries.

(31) By Sergey G. Brester (sebres) on 2019-01-11 00:26:09 in reply to 29 [link] [source]

... or, as already said, simply extend the local DB with a new table mapping RIDs <-> UUIDs (populated on the first local use of an RID, or when it is first written to the local DB),
and restore from there (if a swap of the cloned repo is detected), resulting in several statements like:

-- disable constraints (if some are there) --

update rid_uuid_map
   set rid = (select t.rid from tmp_tab t where t.uuid = rid_uuid_map.uuid);

update vvar set value = (
  select rid from tmp_tab t where t.uuid = (
    select value from vvar where name = 'checkout-uuid'
  )
) where name = 'checkout';

...

-- enable constraints --

and similar.

All other internal references in the local DB may then remain integer RIDs (as long as they are resolvable from the map).
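A runnable sketch of that remap (a Python/SQLite toy model; the table names follow the statements above, but the schema is hypothetical, not fossil's real checkout schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
  -- hypothetical checkout-side map of local RIDs to hashes
  CREATE TABLE rid_uuid_map (rid INTEGER, uuid TEXT);
  -- hypothetical staging table holding the NEW RIDs read from the swapped repo
  CREATE TABLE tmp_tab (rid INTEGER, uuid TEXT);
  CREATE TABLE vvar (name TEXT PRIMARY KEY, value TEXT);

  INSERT INTO rid_uuid_map VALUES (123, 'abcdef'), (124, '123456');
  INSERT INTO tmp_tab      VALUES (120, 'abcdef'), (121, '123456');
  INSERT INTO vvar VALUES ('checkout', '123'), ('checkout-uuid', 'abcdef');
""")

# Rewrite the local RIDs, using the hash as the join key.
db.execute("""
  UPDATE rid_uuid_map
     SET rid = (SELECT t.rid FROM tmp_tab t WHERE t.uuid = rid_uuid_map.uuid)
""")
# Point the current checkout at the new RID of the same hash.
db.execute("""
  UPDATE vvar SET value = (
    SELECT t.rid FROM tmp_tab t WHERE t.uuid = (
      SELECT value FROM vvar WHERE name = 'checkout-uuid'))
   WHERE name = 'checkout'
""")

print(db.execute("SELECT value FROM vvar WHERE name='checkout'").fetchone()[0])
# prints: 120
```

Note that the hash, not the RID, is the join key throughout; an RID on its own carries no information that survives a repo swap.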

(32) By Sergey G. Brester (sebres) on 2019-01-11 00:40:52 in reply to 1 [link] [source]

And I just want to amend: the primary role of any SCM (especially a distributed version control system) is to guarantee the consistency of the history tree to the outside.

Internally it can use whatever it wants, but to the outside it should always deliver the same data (no matter what "outside" actually is: clones, origins/remotes, working copies or exports). Emphasis on always.

Normally, a rebase is only acceptable if the user wants it explicitly (forces it), for example when local revisions (which one wants to rewrite/remove/rebase/amend) have not yet been synchronized with the remotes.

(48) By Stephan Beal (stephan) on 2019-01-17 19:09:59 in reply to 1 [link] [source]

(i'm replying to the top of the thread because this response is aimed at several of the deeper-nested responses which seem to be exaggerating and/or misunderstanding the significance/relevance of this "RID mismatch" issue...)

Management summary: Your data are safe. Don't panic. Keep fossiling.

To clarify exactly how important RIDs (Record IDs) are to fossil's data integrity: not at all.

RIDs are, in essence, the primary key for an internal cache which is unique to each copy of a repository, and they have zero relevance for the long-term integrity of the underlying data. They are an internal optimization for linking records which themselves have no ID other than their unique hash. At its lowest level, fossil only references its data using their hashes, but RIDs are used "everywhere" internally because they're more efficient and easier to work with in C code. Fossil never, ever shares RIDs between two copies of a repository, nor does fossil publish any RIDs to the outside world via links or such. Each clone/copy uses only its own internal RIDs, and never compares/syncs RIDs with any other clone. Using the "deconstruct" and "reconstruct" commands will literally eliminate all RIDs and recreate the DB from scratch, which re-creates/re-assigns the RIDs (remember, they're just part of an internal caching mechanism).
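The per-copy nature of RIDs versus the global nature of hashes can be demonstrated in miniature (an illustrative Python/SQLite sketch, not fossil's actual code): loading the same artifacts in a different order, as deconstruct/reconstruct might, yields different RIDs but identical hash-addressed lookups:

```python
import hashlib
import sqlite3

def new_copy():
    """A toy 'repository copy': rid is a sequential integer key,
    uuid is the content hash (the artifact's global identity)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE blob (rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE, content BLOB)")
    return db

def insert_artifact(db, content):
    # Content-addressed identity: the hash is the same in every copy.
    uuid = hashlib.sha3_256(content).hexdigest()
    db.execute("INSERT INTO blob(uuid, content) VALUES (?, ?)", (uuid, content))
    return uuid

artifacts = [b"checkin one", b"file blob", b"checkin two"]

# Like deconstruct/reconstruct: the same artifacts, loaded in different orders.
a, b = new_copy(), new_copy()
for art in artifacts:
    insert_artifact(a, art)
for art in reversed(artifacts):
    insert_artifact(b, art)

# Hash-addressed lookups agree between the two copies...
h = hashlib.sha3_256(b"checkin one").hexdigest()
row_a = a.execute("SELECT content FROM blob WHERE uuid=?", (h,)).fetchone()
row_b = b.execute("SELECT content FROM blob WHERE uuid=?", (h,)).fetchone()
assert row_a == row_b

# ...but the RID (the per-copy cache key) for the same artifact differs.
rid_a = a.execute("SELECT rid FROM blob WHERE uuid=?", (h,)).fetchone()[0]
rid_b = b.execute("SELECT rid FROM blob WHERE uuid=?", (h,)).fetchone()[0]
print(rid_a, rid_b)  # prints: 1 3
```

This is exactly why no data is at risk: the durable identity lives in the hashes, and the RIDs are reconstructible bookkeeping.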

Unlike a repository db, which is standalone and "durable", a checkout maintains its local state in a transient/disposable database which links its local state to the copy of the repo it was opened from, and it uses that copy's RIDs (artifact cache keys) to maintain that link. The relationship between repos and checkout databases is one-way: a checkout necessarily knows its repo, but a repo does not know (nor need to know) about any of its checkouts.

This "RID mismatch" problem occurs only when one or both databases in a given repo/checkout pair is/are replaced with a copy which has a different sequence of RIDs. Except for one known hypothetical mismatch case which, as far as we know, has never happened in the wild, this problem is "cosmetic", in that it does not lead to data loss or parent/child mismatches in the data hierarchy, just confusion (as is evidenced by this thread). Closing and re-opening an affected checkout db is all that's required to resolve it (see the first response in this thread for the exact commands and an explanation of their relevance).

The fossil devs, above all Richard, are aware of the problem and it will be resolved at some point. Until then, it is most certainly not an "end of the world" situation and has never, in and of itself, caused data loss (though it's certainly possible that the resulting confusion has led developers down paths which caused collateral loss). As an interim workaround, Richard has added a feature which "fingerprints" a repo, so that such a mismatch can be detected and the user warned - simply update your fossil binary to the current trunk version to get that feature.

(51) By Sergey G. Brester (sebres) on 2019-01-17 21:08:47 in reply to 48 [link] [source]

Agreed. This is indeed not an "end of the world" situation.

> nor does fossil publish any RIDs to the outside world via links or such.

Well, this is not wholly true: export marks contain RIDs, so yes, it does indirectly. Which, as already said, can cause a new history tree on mirrors, which is very bad if check-ins are referenced elsewhere (foreign repos and merges, submodules, etc).

And as for the statement that this problem is "cosmetic": sorry, but that sounds a bit understated to me.
From my point of view, I have not directly lost data at the moment (except several stashes that are important to me; and I still don't know the state of 2 working copies, because I still can't find the matching check-in, so I don't know whether some local modifications are present in those working copies or not), let alone some other minor issues... But I have lost a lot of time (almost half a week) on this problem (repairing, searching, checking, syncing with mirrors, etc.)...

(54) By Stephan Beal (stephan) on 2019-01-17 21:52:39 in reply to 51 [link] [source]

> Well, this is not wholly true: export marks contain RIDs, so yes, it does indirectly. Which, as already said, can cause a new history tree on mirrors, which is very bad if check-ins are referenced elsewhere (foreign repos and merges, submodules, etc). ... And as for the statement that this problem is "cosmetic": sorry, but that sounds a bit understated to me.

RIDs are used in git incremental exports (apparently - i've never done a git export). As Richard mentions somewhere above, doing a full re-export resolves it. It's a "cosmetic" problem in the sense that no information is irrevocably lost, provided one doesn't lose all of their up-to-date repo copies/backups.

This may be incorrect, but i believe the "marks" used in incremental exports are not retained by git after it imports them. If it does then the git export is fundamentally flawed for using RIDs as a reference point. If git doesn't retain them then this is, IMO, a non-problem.

(55) By Sergey G. Brester (sebres) on 2019-01-17 23:52:10 in reply to 54 [link] [source]

> doing a full re-export resolves it.

No, as I already mentioned above, it does not. A full re-export makes it still worse, because it creates a totally new history for me... This is caused by the many incompatible changes in fossil's export process over several years (committer name, messages, etc.), which produce new hash values.

After import into git, you'll see 2 completely independent trees (timelines): the previous tree is referenced in all foreign remotes and merges, submodules etc., while the new timeline (with new hashes) shows a totally new parallel tree (with not a single revision referenced). And this new tree grows with each subsequent import (while the old tree is no longer continued).

The incremental export protected me even from a situation like this. See here for more details on the consequences it had for me.

Don't forget that some long-lived branches cannot be rebased (due to conflicts).

Anyway, a re-export is not an option in my case, so unfortunately it is not a "cosmetic" issue for me.

(56) By Sergey G. Brester (sebres) on 2019-01-18 00:13:23 in reply to 54 [link] [source]

Fortunately, I could more or less successfully resolve the export issues...

See - https://fossil-scm.org/forum/forumpost/6d53c630c3?t=c

Just one older repo still has an issue during repair, and consequently during import. It comes down to a confusion of two check-ins with pure file blobs. Previously both RIDs had cNNNNN and bNNNNN marks. During repair only a blob can be found (and the UUIDs really reference only file artifacts in fossil)... And I cannot find event entries for these RIDs (matching the UUIDs in the artifacts). Strange, but...

I assume this repo got some "broken" export (a purge or something else in between), so now it is a bit out of "tree".

Luckily, under the circumstances, exactly this repo does not have too many foreign references, so if I don't find a way to convince git to continue the import incrementally, I can rewrite the foreign branches in the other remotes (using rebase, cherry-pick or squash) and update the refs in the submodules manually.

(58) By Richard Hipp (drh) on 2019-01-20 21:46:47 in reply to 1 [link] [source]

The latest check-in on the rid-renumbering branch (at the time of this writing b03652382a327740) should both automatically detect swapping out of the repository database AND make appropriate adjustments to the VFILE and VMERGE tables.

Caution: When you begin using the version of Fossil on that branch, it will automatically upgrade the schema of your local ".fslckout" or "_FOSSIL_" checkout database. This is supposed to be harmless, in the sense that you should still be able to use an older version of Fossil that only understands the older schema design. But there could yet be bugs, especially in the case where you repeatedly switch back and forth between a newer and an older "fossil".

Limitations: You cannot "fossil undo" across a repository swap. Also, a repository swap will erase your bisect history, so if you are in the middle of a bisect when the repository swap occurs, you will need to restart the bisect.

Your testing of these changes will be appreciated.

(59) By Sergey G. Brester (sebres) on 2019-01-21 10:11:51 in reply to 58 [link] [source]

Thank you, Richard.

I have a little question about the branch: is the field "vfile.mhash" really necessary? I mean, is the value of "checkout-hash" (now in the "vvar" table) not enough for the simplest restoration of the current vfile states of the repo?
Just curious, because I got it repaired (re-swapped the RIDs of vfile) via fossil checkout -f --keep $hash without this enhancement.

Simply to pursue an earlier argument of Stephan's:

> It would negatively impact memory usage and, to a lesser degree, performance.

The case where the repo is in a temporary state of an active merge/revert process is almost negligible here (either it has already been committed, or the merge "failed" due to many conflicts, or one could not complete it for some reason), and the target of such a merge is apparent from the checkout/modifications anyway.
In any case, checkout --keep does not change any files of the working copy, so all possible modifications are still visible after the checkout (repair). So in the worst case, if it really is affected, the following scenario does the job too: back up the local files -> re-merge/re-revert -> restore the local files. Clearly this requires manual activity, but IMHO it is better anyway for such (older) unfinished merge processes.

BTW, the stashes are another thing, because they can really exist for a considerable time, and different stashes are often based on different revisions.
But I believe the hash field could be added only to the stash table, while the stashfile table can be repaired using the RIDs of the stash's own base checkout (so the stashfile table does not need to be altered with the new hash field).

Just to hint at certain similarities:

  stash(hash) <-> vvar(checkout-hash)
  stashfile   <-> vfile

Regards, Serg.

(60) By Richard Hipp (drh) on 2019-01-21 10:25:38 in reply to 59 [link] [source]

The vfile.mhash field is only populated for files that have been modified by a merge operation, and for which vfile.rid!=vfile.mrid. Usually vfile.mhash is NULL, so it does not use up a lot of extra space.

For the case where the current check-out contains multiple merges from different branches, I could not figure out a reliable way to repair the vfile.mrid value after a repository swap without having access to the vfile.mhash. I could determine separate candidate vfile.mrid values for each of the merges, but I could not come up with a good way to determine which candidate vfile.mrid value was correct.
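That repair can be illustrated in miniature (a Python/SQLite toy model; the vfile/blob schema here is heavily simplified and hypothetical): with mhash present, each mrid can be re-derived unambiguously by joining on the hash:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
  -- toy vfile: merged files; mrid is the RID of the merge source,
  -- mhash is that artifact's hash (NULL for unmerged files)
  CREATE TABLE vfile (fname TEXT, mrid INTEGER, mhash TEXT);
  -- the swapped repo's blob table: same hashes, different RIDs
  CREATE TABLE blob (rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE);

  INSERT INTO vfile VALUES ('main.c', 57, 'aaa111'), ('util.c', 58, 'bbb222');
  INSERT INTO blob(rid, uuid) VALUES (7, 'aaa111'), (9, 'bbb222');
""")

# RIDs 57/58 are meaningless in the swapped repo; joining on the stored
# hash repairs each mrid unambiguously, even with multiple merge sources.
db.execute("""
  UPDATE vfile
     SET mrid = (SELECT b.rid FROM blob b WHERE b.uuid = vfile.mhash)
   WHERE mhash IS NOT NULL
""")
print(db.execute("SELECT fname, mrid FROM vfile ORDER BY fname").fetchall())
# prints: [('main.c', 7), ('util.c', 9)]
```

Without the stored hash there is nothing to join on, which matches the point that candidate mrid values from multiple merges cannot otherwise be told apart.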

(61) By Sergey G. Brester (sebres) on 2019-01-21 11:46:25 in reply to 60 [link] [source]

Sure.
My point was just to ignore this too, even in the single-merge case (because it is a bit hard to handle, and the case is rather rare). Besides, the merge process is repeatable anyway, and also apparent from the current checkout/modifications, once the working-copy checkout has been cleanly re-swapped.
But it's ok.

(62) By Richard Hipp (drh) on 2019-01-21 23:59:26 in reply to 58 [link] [source]

Just closing the loop: I have been using these changes all day, and they are now on trunk.