"fossil up" clobbers pending renames, and another bug
(1.1) By James Cook (falsifian) on 2024-01-06 02:49:48 edited from 1.0 [source]
I just ran into this; mildly annoying:
- Run
fossil mv --hard a b
- Try to commit; fossil says to update first.
- Okay, fossil update
- My rename went away!
Luckly I caught it before committing, and was able to find the target filename I wanted in my terminal scrollback. (The filename had a date in it that was a bit tricky for me to figure out, so I would have been annoyed if I'd lost it completely.)
Below is a script to reproduce. The punchline is the last "echo" command.
While writing the script I discovered another bug: If I modify a file very quickly after fossil open
, fossil sometimes thinks the file hasn't changed. I could imagine this could lead to lost data if the user doesn't notice.
#!/bin/sh
set -e
mkdir bug
cd bug
fossil new repo.fossil
# In working dir 0, create commit "v0".
mkdir wd0
cd wd0
fossil open ../repo.fossil
echo v0 > a
echo another file > b
fossil add a b
fossil commit -m v0
# In working dir 1, add commit "v1" on top of "v0".
cd ..
mkdir wd1
cd wd1
fossil open ../repo.fossil
sleep 1 # Otherwise fossil thinks nothing has changed.
echo v1 > a
fossil commit -m v1
# Back in working dir 0, start to rename b, but then update.
cd ../wd0
sleep 1
fossil mv --hard b c
# Running fossil commit at this point gives a suggestion to '"update" first'
fossil update
test -e c || echo 'c disappeared!'
(2) By James Cook (falsifian) on 2024-01-06 02:58:05 in reply to 1.1 [link] [source]
Just confirmed both behaviours still appear at current trunk (revision c8fda6a62c). Was previously running 4d9ede80be (2023-07-10).
(3) By Florian Balmer (florian.balmer) on 2024-01-06 07:50:26 in reply to 1.1 [link] [source]
While writing the script I discovered another bug: If I modify a file very quickly after
fossil open
, fossil sometimes thinks the file hasn't changed. I could imagine this could lead to lost data if the user doesn't notice.
At least this part is not a bug, but a speed optimization that can be flexibly adjusted:
To have Fossil detect changed files by contents (hash) instead of mtime, see the
mtime-changes
setting or the --hash
option for the changes
and
commit
commands.
The other part of your problem is outside of my field of knowledge about Fossil.
(4) By James Cook (falsifian) on 2024-01-06 15:37:01 in reply to 3 [link] [source]
At least this part is not a bug, but a speed optimization that can be flexibly adjusted:
I just poked around the code a bit, and as I suspected, it looks like mtime is being used at one-second resolution.
I think the problem would pretty much disappear if fossil used higher-precision times. At least I know those are available on POSIX systems.
(5) By James Cook (falsifian) on 2024-01-06 15:55:05 in reply to 4 [link] [source]
I think the problem would pretty much disappear if fossil used higher-precision times.
Actually, forget about that. Isn't this just a matter of <
vs <=
? If the current time equals the mtime (to one-second resolution), then consider the file as potentially modified.
(6) By James Cook (falsifian) on 2024-01-06 16:00:47 in reply to 5 [link] [source]
Oops, I guess it's a bit more complicated than that, since we're not comparing to the current time, we're comparing to a time recorded in the db.
(7) By Florian Balmer (florian.balmer) on 2024-01-06 16:46:52 in reply to 4 [link] [source]
I think the problem would pretty much disappear if fossil used higher-precision times.
For actions initiated by users on a keyboard, I think 1 second seems mostly sufficient.
But no matter how small a granularity you choose, you can always write a script that completes fast enough so changed files go undetected.
I'm using the --hash
option a lot in my scripts. If the script knows the
modified files (as is the case here), their names can be appended to the
commit
command, to avoid re-hashing the entire source tree to find the
modifications:
fossil commit -m v0 --hash a b
fossil commit -m v1 --hash a
(8) By James Cook (falsifian) on 2024-01-06 20:51:51 in reply to 7 [link] [source]
Thanks, I'll keep that in mind.
I think it should be possible to come up with an mtime mechanism that is safe as long as the system clock never goes backward, no matter how quickly you modify files. For example:
For each file, store values
hash
andhash_mtime
.hash_mtime
defaults to NULL, meaning thehash
should be ignored.Whenever you hash a file's contents, if the file's mtime is strictly less than the current time, then you can update its
hash
field to its hash and itshash_mtime
field to the file's mtime. If mtime = current system time, don't update the fields; that means next time you'll just have to hash again.If you ever want to check a file and notice its
hash_mtime
field matches the file's mtime, then you are guaranteed that itshash
field is correct, as long as time didn't go backward, filesystem didn't get corrupted, etc.
(9) By Preben Guldberg (preben) on 2024-01-07 11:40:49 in reply to 8 [link] [source]
For the mtime, the current logic includes (among others) currentMtime!=oldMtime
to decide if we need to compare using the hash of the file.
If the mtime is unchanged, I think it's simpler to hash anyway if the mtime matches the system time. That ought to be an infrequent occurrence - and in my workflows certainly more infrequent than the mtime differing for an unchanged file because I had tested something and reverted my changes.
I added such a check in the hash-if-mtime-is-current branch. Works for me, but mind confirming on your end?
FWIW, I'm looking at why the renamed file is being deleted, and think I understand what's going on. Haven't had time to properly code and check yet, though.
(10) By Preben Guldberg (preben) on 2024-01-09 18:58:45 in reply to 9 [link] [source]
James, mind testing the diff below against current trunk?
I think it resolves the rename problem for you.
I have not yet run tested thoroughly - beyond ensuring that it does not break the latest addition to the [testing-improvements branch][https://fossil-scm.org/home/timeline?r=testing-improvements]. (Which would also need a test for this diff, if it wind up committed).
--- src/update.c
+++ src/update.c
@@ -276,11 +276,11 @@
/* Add files found in the current version
*/
db_multi_exec(
"INSERT OR IGNORE INTO fv(fn,fnt,idv,idt,ridv,ridt,isexe,chnged,deleted)"
- " SELECT pathname, pathname, id, 0, rid, 0, isexe, chnged, deleted"
+ " SELECT pathname, COALESCE(origname,pathname), id, 0, rid, 0, isexe, chnged, deleted"
" FROM vfile WHERE vid=%d",
vid
);
/* Compute file name changes on V->T. Record name changes in files that
(11) By James Cook (falsifian) on 2024-01-10 22:12:25 in reply to 10 [link] [source]
This seems to solve half the problem.
I ran the script in my original post with your patch, and then I examined the directory bug/wd0
.
The file kept its new name c
, which is good.
However, fossil changes
now says MISSING b
and fossil extra
now lists c
. (Before fossil up
, fossil changes
said EDITED b -> c
and fossil extra
listed nothing.)
I guess that means I need to re-run fossil mv b c
(this with without --hard
) after fossil up
. It would be nice if that weren't needed, but this is better than before, where the new file name was lost completely.
(12) By Preben Guldberg (preben) on 2024-01-10 23:16:47 in reply to 11 [link] [source]
Thanks for testing!
I’ll have to look closer, but chances are that what I cooked up indeed only did the first part - and that the rest are undesired side effects of an incomplete fix.
(13.1) By James Cook (falsifian) on 2024-01-10 23:47:43 edited from 13.0 in reply to 9 [link] [source]
Thanks, I tried the hash-if-mtime-is-current
branch, and it indeed solves the problem in the case of my script.
I still feel a bit uneasy, though. I understand it may be that nobody has the time or inclination for a more thorough fix, but for the sake of argument, below are (a) an example scenario + repro script showing how this can still go wrong; (b) a very untested patch trying to show what I had in mind for a more sound fix.
(a)
For example, imagine a user devising a script that first runs "fossil ci" and then changes a file. (Maybe they are applying a series of patches to a file foo
, but want to manually change a different file bar
before committing each patch.)
In this case the hash-if-mtime-is-current
branch is ineffective, because there will be a delay between when the mtime gets updated and when the next commit is done.
Here is a script showing what I mean:
#!/bin/sh
set -e
mkdir bug
cd bug
fossil new repo.fossil
fossil open repo.fossil
echo v0 > a
fossil add a
fossil commit -m v0
echo v1 > a
sleep 1
fossil chan
The last fossil chan
command should say a
was changed, but it doesn't, even on the hash-if-mtime-is-current
branch.
The following change fixes it...
(b)
Here's a patch to be conservative by only updating the mtime
field in the vfile
table to a nonzero value when the current system time is strictly greater.
Warning: I don't fully understand the code I've modified, and I've barely tested this. Please consider it as an exposition of the idea only.
In cases where I think the mtime
was being updated right after the file was modified, comparing to current system time seems pointless so I just set mtime
to 0
or leave it unchanged.
The below patch causes the script from (a) to behave correctly. There may be a performance hit due to the extra hashing this would cause.
EDIT to add: this is against trunk
, commit 383087bfab
to be precise.
Index: src/update.c
==================================================================
--- src/update.c
+++ src/update.c
@@ -975,14 +975,14 @@
file_setexe(zFull, rvPerm==PERM_EXE);
fossil_print("REVERT %s\n", zFile);
mtime = file_mtime(zFull, RepoFILE);
db_multi_exec(
"UPDATE vfile"
- " SET mtime=%lld, chnged=%d, deleted=0, isexe=%d, islink=%d,"
+ " SET mtime=0, chnged=%d, deleted=0, isexe=%d, islink=%d,"
" mrid=rid, mhash=NULL"
" WHERE pathname=%Q OR origname=%Q",
- mtime, rvChnged, rvPerm==PERM_EXE, rvPerm==PERM_LNK, zFile, zFile
+ rvChnged, rvPerm==PERM_EXE, rvPerm==PERM_LNK, zFile, zFile
);
}
blob_reset(&record);
free(zFull);
}
Index: src/vfile.c
==================================================================
--- src/vfile.c
+++ src/vfile.c
@@ -170,10 +170,11 @@
** If the mtime of a file has changed, we still examine the on-disk content
** to see whether or not the edit was a null-edit.
*/
void vfile_check_signature(int vid, unsigned int cksigFlags){
int nErr = 0;
+ time_t now = time(0);
Stmt q;
int useMtime = (cksigFlags & CKSIG_HASH)==0
&& db_get_boolean("mtime-changes", 1);
db_begin_transaction();
@@ -273,11 +274,11 @@
}
}
#endif
if( currentMtime!=oldMtime || chnged!=oldChnged ){
db_multi_exec("UPDATE vfile SET mtime=%lld, chnged=%d WHERE id=%d",
- currentMtime, chnged, id);
+ currentMtime>=now ? 0 : currentMtime, chnged, id);
}
}
db_finalize(&q);
if( nErr ) fossil_fatal("abort due to prior errors");
db_end_transaction(0);
@@ -322,14 +323,11 @@
continue;
}
content_get(rid, &content);
if( file_is_the_same(&content, zName) ){
blob_reset(&content);
- if( file_setexe(zName, isExe) ){
- db_multi_exec("UPDATE vfile SET mtime=%lld WHERE id=%d",
- file_mtime(zName, RepoFILE), id);
- }
+ file_setexe(zName, isExe);
continue;
}
if( promptFlag && file_size(zName, RepoFILE)>=0 ){
Blob ans;
char *zMsg;
@@ -359,12 +357,10 @@
}else{
blob_write_to_file(&content, zName);
}
file_setexe(zName, isExe);
blob_reset(&content);
- db_multi_exec("UPDATE vfile SET mtime=%lld WHERE id=%d",
- file_mtime(zName, RepoFILE), id);
}
db_finalize(&q);
}
/*
(14) By Preben Guldberg (preben) on 2024-01-11 17:14:58 in reply to 13.1 [link] [source]
> echo v1 > a
> sleep 1
> fossil chan
Thanks for emphasising this. That had slipped my attention and I focussed on the immediate effect of committing and immediately updating and committing again.
I have been reviewing the code and think I evaluated all places where
vfile.mtime is updated. Here e.g. fossil merge
and fossil stash
already set mtime to 0, similar to what you proposed.
I have updated the hash-if-mtime-is-current branch to take this approach instead, In a manner similar to what your proposed. Rather than not updating the mtime in places, I explicitly set to 0. My worry is that an unchanged mtime may still expose the problem if you do multiple fossil operations in the same second.
I also added some extra debugging options to existing commands that let you inspect vfile.mtime.
(15) By Preben Guldberg (preben) on 2024-01-12 16:16:28 in reply to 14 [link] [source]
I think I cracked the rename issue as well.
James, I'd appreciate your input on the following branches:
- update-with-renames, which attempts to retain the rename status after an update.
- hash-if-mtime-is-current, which sets the mtime to 0, similar to what you proposed.
(16) By James Cook (falsifian) on 2024-01-13 18:04:30 in reply to 15 [link] [source]
Thank you!
I am not familiar enough with the code to say much intelligent (except: was shoeAll
supposed to be showAll
?) but I can confirm that these solve the respective issues, at least according to the scripts I posted. And they both make me more confident I won't lose work.
I wonder if using ctime
would make more sense since mtime
can be faked. But I guess that's a separate issue.
(17) By Preben Guldberg (preben) on 2024-01-13 19:03:48 in reply to 16 [link] [source]
Thanks for catching that typo.