--hash and identifying changed files
(1.1) By Kirill M (Kirill) on 2020-10-02 09:15:53 edited from 1.0 [source]
Hello everyone!
Having followed Fossil for a while (and also being a big fan of SQLite), I have finally started using Fossil myself.
I use it for my home-brewed DNS management system, where the previous reincarnation was using Mercurial since 2012 (and Subversion and CVS before that...).
I was about to ask a question about why some renames were causing strange errors, but discovered that adding --hash to fossil commit would solve the issue when rotating DNSSEC keys (the rotation did a rename, and two key files would have the same mtime and size, thereby confusing Fossil). Argh.
I am not quite sure how ls and changes work with regard to identifying changed files. The docs say that ls -v displays the change status, in the manner of the changes command. The changes command has a --hash option to verify file status using hashing rather than relying on file mtimes.
The tech overview says that the checkout database records mtime and size of files as they were originally checked out, in order to expedite checking which files have been edited.
So, is there a risk that ls or changes fail to identify changed files if --hash is not used? Should ls get --hash too?
Thanks for reading!
-- Kirill
(2) By Stephan Beal (stephan) on 2020-10-02 09:30:30 in reply to 1.0 [link] [source]
So, is there a risk that ls or changes fail to identify changed files if --hash is not used?
Coincidentally, this came up just yesterday: /forumpost/e41b2c4035
The risk, in practice, is very nearly zero. Personally i've never seen/noticed it happen in nearly 13 years of using fossil on an almost-daily basis. Similarly, it almost never comes up on the forum/mailing list. (That it's come up twice this same year is surprising.)
The docs say that ls -v displays the change status, in the manner of the changes command. The changes command has a --hash option to verify file status using hashing rather than relying on file mtimes.
ls does not use the hash for change determination - it relies solely on mtime and size for that. That's inconsistent with status/changes ("changes" is a variant of "status"). Until your post, i don't recall anyone ever having pointed it out before. In practice that flag is truly never needed, but i'll add it to ls shortly for consistency with status.
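To make the failure mode concrete, here is a minimal Python sketch of a check that relies only on mtime and size, next to one that hashes content. It is not Fossil's implementation: signature, content_hash and demo are made-up names, SHA-256 stands in for whatever hash the tool really uses, and comparing mtimes in whole seconds is an assumption.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def signature(path):
    """Cheap signature: (mtime in whole seconds, size in bytes)."""
    st = path.stat()
    return (int(st.st_mtime), st.st_size)


def content_hash(path):
    """Slower but reliable: a hash of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        cur = Path(tmp, "cur.key")
        old = Path(tmp, "old.key")
        # Same size, different content.
        cur.write_bytes(b"A" * 152)
        old.write_bytes(b"B" * 152)
        # Same mtime, as if both were written within the same second.
        os.utime(cur, (1_600_000_000, 1_600_000_000))
        os.utime(old, (1_600_000_000, 1_600_000_000))

        # What a checkout database might have recorded for old.key.
        recorded_sig = signature(old)
        recorded_hash = content_hash(old)

        # Rotate: cur takes the place of old; a rename keeps cur's mtime.
        os.replace(cur, old)

        sig_says_changed = signature(old) != recorded_sig
        hash_says_changed = content_hash(old) != recorded_hash
        return sig_says_changed, hash_says_changed
```

Here the cheap check reports no change while the hash comparison does, which is the situation the --hash option is meant to cover.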
(3.1) By Kirill M (Kirill) on 2020-10-02 09:50:50 edited from 3.0 in reply to 2 [link] [source]
Personally i've never seen/noticed it happen in nearly 13 years of using fossil on an almost-daily basis.
While I experienced it on my very first day of using Fossil... and this made me worry, to be honest.
Let me show it. In my use case, I periodically rotate DNSSEC ZSK by renaming files. Initially only cur and new keys are generated. Even though the former is generated before the latter, the time difference is not enough to make them differ. When the keys are rotated, the contents of cur go to old, new go to cur and new is populated with new keys.
In the file listing below, old and cur have same timestamp, because the keys have been rotated once already:
$ ls -l */uptime.is*
-rw-r--r-- 1 km km 152 Oct 1 08:14 cur/uptime.is.key
-rw-r----- 1 km km 114 Oct 1 08:14 cur/uptime.is.private
-rw-r--r-- 1 km km 152 Oct 2 09:15 new/uptime.is.key
-rw-r----- 1 km km 114 Oct 2 09:15 new/uptime.is.private
-rw-r--r-- 1 km km 152 Oct 1 08:14 old/uptime.is.key
-rw-r--r-- 1 km km 114 Oct 1 08:14 old/uptime.is.private
Now let's rotate them manually:
$ mv cur/uptime.is.private old/uptime.is.private
$ mv cur/uptime.is.key old/uptime.is.key
$ mv new/uptime.is.private cur/uptime.is.private
$ mv new/uptime.is.key cur/uptime.is.key
$ touch new/uptime.is.private new/uptime.is.key
fossil ls does not detect that the old keys have been changed!
$ fossil ls -v */uptime.is*
EDITED keys/zsk/cur/uptime.is.key
EDITED keys/zsk/cur/uptime.is.private
EDITED keys/zsk/new/uptime.is.key
EDITED keys/zsk/new/uptime.is.private
UNCHANGED keys/zsk/old/uptime.is.key
UNCHANGED keys/zsk/old/uptime.is.private
fossil changes --hash catches the differences:
$ fossil changes --hash */uptime.is*
EDITED cur/uptime.is.key
EDITED cur/uptime.is.private
EDITED new/uptime.is.key
EDITED new/uptime.is.private
EDITED old/uptime.is.key
EDITED old/uptime.is.private
Before I added --hash to commit, it would complain about an unchanged file not being equal to what had been checked out, but --hash solved that for me.
(4) By Stephan Beal (stephan) on 2020-10-02 09:53:27 in reply to 2 [link] [source]
but i'll add it to ls shortly for consistency with status.
ls --hash is now a thing, but notice that it increases the runtime by literally 10x in the main fossil tree (with 983 files in this version - that ratio will change, approximately linearly, with the number of files):
[stephan@lapdog:~/fossil/fossil/src]$ time f ls -v | grep -v UNCH
EDITED src/checkin.c
real 0m0.027s
user 0m0.016s
sys 0m0.015s
[stephan@lapdog:~/fossil/fossil/src]$ time f ls -v --hash | grep -v UNCH
EDITED src/checkin.c
real 0m0.271s
user 0m0.238s
sys 0m0.036s
Useful trivia: running changes/status/ls with the --hash flag updates the "is it changed?" flag in the checkout db. Thus if you plan to run those commands multiple times in a row, only the first one needs the --hash flag. On subsequent calls it will just waste time if there have been no real changes. Unless your changes are happening less than 1 second since the last invocation with --hash, that flag will just add runtime without any benefit.
And just to reiterate: --hash is, in practice, not needed. It could hypothetically be useful in certain automation where changes are being made rapidly and change status is being queried via script (or possibly via extremely fast typists). In manual usage, though, it's really never needed.
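For the scripted case, a small Python sketch of querying the status once with --hash and parsing the result might look like this. It assumes the output consists of lines of the form "STATUS path", as in the listings earlier in the thread; parse_status and changed_files are hypothetical helper names, and any other lines fossil may print are not handled.

```python
import subprocess


def parse_status(output):
    """Turn 'STATUS path' lines into a list of (status, path) tuples."""
    entries = []
    for line in output.splitlines():
        # Split on the first run of whitespace only, so paths may contain spaces.
        parts = line.split(None, 1)
        if len(parts) == 2:
            entries.append((parts[0], parts[1].strip()))
    return entries


def changed_files(extra_args=("--hash",)):
    """Run `fossil changes` (by default with --hash) in the current checkout."""
    result = subprocess.run(
        ["fossil", "changes", *extra_args],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_status(result.stdout)
```

A script would call changed_files() once after making its modifications; later status queries in the same run could pass extra_args=() to skip the hashing cost.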
Also note this bit of internal documentation from the routine which performs the real work of the status check:
** The mtime of the file is only a factor if the mtime-changes setting
** is false and the CKSIG_HASH flag is false. If the mtime-changes
** setting is true (or undefined - it defaults to true) or if CKSIG_HASH
** is true, then we do not trust the mtime and will examine the on-disk
** content to determine if a file really is the same.
The CKSIG_HASH flag it's talking about is analogous to the --hash flag. Thus if you do:
fossil set mtime-changes off
it will behave as if --hash is always in effect:
[stephan@lapdog:~/fossil/fossil/src]$ f set mtime-changes off
[stephan@lapdog:~/fossil/fossil/src]$ time f ls -v | grep -v UNC
real 0m0.276s
user 0m0.246s
sys 0m0.033s
[stephan@lapdog:~/fossil/fossil/src]$ f set mtime-changes on
[stephan@lapdog:~/fossil/fossil/src]$ time f ls -v | grep -v UNC
real 0m0.028s
user 0m0.014s
sys 0m0.018s
Note that "on" is the default and the setting applies only to the current repo unless the -g (global) flag is used to set it as the global default (for repos which don't otherwise set it).
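Going by the timings above (mtime-changes off behaving like a permanent --hash), the decision can be sketched roughly as follows. This is an illustration only, not Fossil's source: FileState and is_changed are made-up names, and the digest is precomputed here, whereas a real tool would hash the file only when it has to.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    mtime: int    # modification time in seconds
    size: int     # size in bytes
    digest: str   # content hash (precomputed for this illustration)


def is_changed(recorded, current, mtime_changes=True, hash_flag=False):
    """Decide whether a file differs from its recorded state.

    mtime_changes mirrors the setting of the same name (default on);
    hash_flag mirrors passing --hash on the command line.
    """
    if recorded.size != current.size:
        return True  # a different size always means different content
    if mtime_changes and not hash_flag and recorded.mtime == current.mtime:
        return False  # fast path: trust matching mtime and size
    return recorded.digest != current.digest  # slow path: compare content


# Same mtime and size, different content: the rotated-key situation.
recorded = FileState(mtime=100, size=152, digest="aaa")
rotated = FileState(mtime=100, size=152, digest="bbb")
```

With the defaults, is_changed(recorded, rotated) takes the fast path and misses the change; with hash_flag=True or mtime_changes=False it compares content and reports it.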
(5) By Kirill M (Kirill) on 2020-10-02 10:01:23 in reply to 4 [link] [source]
ls --hash is now a thing, but notice that it increases the runtime by literally 10x in the main fossil tree (with 983 files in this version - that ratio will change, approximately linearly, with the number of files):
That was quick, thank you!
Useful trivia: running changes/status/ls with the --hash flag updates the "is it changed?" flag in the checkout db. Thus if you plan to run those commands multiple times in a row, only the first one needs the --hash flag. On subsequent calls it will just waste time if there have been no real changes. Unless your changes are happening less than 1 second since the last invocation with --hash, that flag will just add runtime without any benefit.
Good. I only need one ls for all my files. :)
And just to reiterate: --hash is, in practice, not needed. It could hypothetically be useful in certain automation where changes are being made rapidly and change status is being queried via script (or possibly via extremely fast typists). In manual usage, though, it's really never needed.
Yeah, I just happened to hit the corner case immediately, and that made me uncertain.
(6) By Stephan Beal (stephan) on 2020-10-02 10:05:19 in reply to 3.1 [link] [source]
Let me show it. In my use case, I periodically rotate DNSSEC ZSK by renaming files. Initially only cur and new keys are generated. Even though the former is generated before the latter, the time difference is not enough to make them differ.
Now let's rotate them manually:
... fossil ls does not detect that old keys have been changed!
If their sizes and timestamps are the same then changes won't, by default, see them as modified until it's been run once with --hash (see my follow-up post for why it's only needed once between each change). Note that mv does not change any timestamps, so if both your old and new versions are from the same checkout they'll very likely have the same timestamps.
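The point about mv is easy to check in isolation. The following Python sketch renames a file inside a temporary directory and compares its mtime before and after; the file names and the timestamp are arbitrary, and mtime_after_rename is a made-up helper name.

```python
import os
import tempfile
from pathlib import Path


def mtime_after_rename():
    """Return a file's mtime before and after renaming it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "new.key")
        dst = Path(tmp, "cur.key")
        src.write_bytes(b"key material")
        # Pin the mtime to a known value so the comparison is unambiguous.
        os.utime(src, (1_601_540_040, 1_601_540_040))
        before = src.stat().st_mtime
        src.rename(dst)  # same operation mv performs within one filesystem
        after = dst.stat().st_mtime
        return before, after
```

Both values come back identical, so a renamed file carries its original mtime to the new name.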
Out of curiosity: why rename them? Why not just overwrite and check in the new changes? Following renames through the history is not always straightforward (and presumably the history is important or you wouldn't have the files in fossil).
(7) By Kirill M (Kirill) on 2020-10-02 10:25:03 in reply to 6 [link] [source]
If their sizes and timestamps are the same then changes won't, by default, see them as modified until it's been run once with --hash (see my follow-up post for why it's only needed once between each change). Note that mv does not change any timestamps, so if both your old and new versions are from the same checkout they'll very likely have the same timestamps.
Out of curiosity: why rename them? Why not just overwrite and check in the new changes?
Rename is just faster than overwriting (or rather writing into a temporary file and then renaming...). When rotating, a new new key (upcoming) is generated, the old new key becomes cur (and is used for signing), and whatever was in cur is moved to old in case someone wants to validate a signature made with the old cur key. In practice I think I only publish cur and new keys, though...
Overwriting was a workaround I used before figuring out that --hash is the key to the solution. Here's the diff:
 def rotkeys(self):
     keys = []
     for st in [['cur', 'old'], ['new', 'cur']]:
         for suf in ['.key', '.private']:
             ok = keydir.joinpath('zsk', st[0], self.zonefile.name + suf)
             nk = keydir.joinpath('zsk', st[1], self.zonefile.name + suf)
-            tk = keydir.joinpath('zsk', st[1], self.zonefile.name + suf + '.tmp')
-            tk.write_bytes(ok.read_bytes())
-            tk.chmod(0o640 if nk.suffix == 'key' else 0o644)
-            nk = tk.rename(nk)
+            nk = ok.rename(nk)
             keys.append(nk)
     for suf in ['.key', '.private']:
         nk = keydir.joinpath('zsk', 'new', self.zonefile.name + suf)
-        tk = keydir.joinpath('zsk', st[1], self.zonefile.name + suf + '.tmp')
-        tk.write_bytes(b'')
-        tk.chmod(0o640 if nk.suffix == 'key' else 0o644)
-        nk = tk.rename(nk)
+        nk.touch(mode=0o640 if suf == '.key' else 0o644)
         keys.append(nk)
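For anyone who wants to try the rename-based rotation outside the original class, here is a self-contained sketch. It is a simplification under assumptions, not the code from the DNS management system: rotate_keys and demo are made-up names, the zsk/old, zsk/cur and zsk/new layout is taken from the thread, and the file modes follow the ls -l listing shown for cur/ (adjust as needed).

```python
import tempfile
from pathlib import Path


def rotate_keys(keydir, zone):
    """Move cur -> old and new -> cur, then create empty placeholders in new."""
    keys = []
    for src, dst in [("cur", "old"), ("new", "cur")]:
        for suf in (".key", ".private"):
            ok = keydir / "zsk" / src / (zone + suf)
            nk = keydir / "zsk" / dst / (zone + suf)
            ok.replace(nk)  # rename, overwriting any existing target
            keys.append(nk)
    for suf in (".key", ".private"):
        nk = keydir / "zsk" / "new" / (zone + suf)
        # Assumed modes: private key 0o640, public key 0o644 (subject to umask).
        nk.touch(mode=0o640 if suf == ".private" else 0o644)
        keys.append(nk)
    return keys


def demo():
    """Build a throwaway key tree, rotate it, and return the resulting contents."""
    with tempfile.TemporaryDirectory() as tmp:
        keydir = Path(tmp)
        for state in ("old", "cur", "new"):
            (keydir / "zsk" / state).mkdir(parents=True)
            for suf in (".key", ".private"):
                (keydir / "zsk" / state / ("uptime.is" + suf)).write_text(state + suf)
        keys = rotate_keys(keydir, "uptime.is")
        contents = {
            str(p.relative_to(keydir / "zsk")): p.read_text() for p in keys
        }
        return len(keys), contents
```

Because the files are renamed rather than rewritten, each one keeps its mtime, which is why a status check that relies on mtime and size needs --hash after a rotation like this.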
Following renames through the history is not always straightforward (and presumably the history is important or you wouldn't have the files in fossil).
For keys I don't need history, I only need the ones used at the moment. I do, however, want to keep the keys together with the zone files in case I need to set up the system somewhere else, and I also want to have history for my DNS zones, and a way to easily see what zone files have been changed and whatnot.
(8) By Dan Shearer (danshearer) on 2020-10-02 14:43:11 in reply to 5 [link] [source]
Kirill M (Kirill) on 2020-10-02 10:01:23:
I was the one who hit this yesterday. It is quite disconcerting.
I just happened to hit the corner case immediately, and that made me uncertain.
There is a lot more than one corner case. This is a problem for all version control and build systems, and this discussion of corner cases demonstrates that there is no complete fix.
I suspect the particular corner case that got you was that "mv" doesn't update the mtime (although it does change ctime). And while ls --hash will catch that corner case there are others it does not.
Dan Shearer