Tag individual files with version in some way

(1) By anonymous on 2021-01-20 20:50:14

Hi, because of legal requirements (yeah, don't ask :) ) we have to give version numbers to individual source files. We are required to have some boilerplate text at the top of each file, the essential part being a line something like "MyFile V1.1.0" within MyFile.src.

It would be great if I could somehow tag the individual files with this version number in Fossil, so I could see it in the timeline and so on and keep track of it. Is this possible, or are there better ways to solve my problem? :)

Thanks!

(2) By Richard Hipp (drh) on 2021-01-20 21:31:05 in reply to 1

The SHA3-256 hash of the file (which you can obtain conveniently using the fossil sha3sum command) will give you an unforgeable version number. This number is not in the text of the file itself (as that would be impossible - the act of inserting the hash into the file would change the hash). But it is simple to compute the "version number" of any file, and that version number can never be fudged, modified, or altered.

If you have a random file on disk and want to know what role it plays in a Fossil-hosted project, you can use the fossil whatis command to look it up. In this way, you can unambiguously identify the source of the file, when it was created, who created it, what it was used for, and so forth.

Boilerplate text at the top of a file, while convenient to managers, does not provide the same forge-proof guarantee about the identity of the file as does a separate SHA3 hash. If you are truly concerned about provenance, then the correct way to do that is with a separate cryptographic hash of the file content. Embedding the version number in the file itself is subject to forgery and fraud.

(3) By Warren Young (wyoung) on 2021-01-21 00:07:05 in reply to 2

The SHA3-256 hash of the file (which you can obtain conveniently using the fossil sha3sum command) will give you an unforgeable version number.

True enough, but any two files with the same content will end up with the same hash.

One way around that would be to put a full project-relative path to the file in question into each file somehow, such as in a header comment for source files. Documentation and other file types can be trickier, though, as not all formats have a "comment" type feature.
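If the path-in-header convention is adopted, a simple check can catch files that are missing it. This is a minimal sketch assuming the project-relative path appears verbatim somewhere in the first five lines; `header_has_path` is a hypothetical helper name, not part of Fossil.

```sh
#!/bin/sh
# header_has_path FILE RELPATH
# Succeeds if RELPATH appears literally in the first 5 lines of FILE
# (hypothetical convention: the path sits in a header comment).
header_has_path() {
    head -n 5 "$1" | grep -qF -- "$2"
}
```

Because the path is part of the content, two otherwise identical files at different paths then hash differently.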

Another way would be to hash an aggregate that does have the desired uniqueness properties:

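```sh
#!/bin/sh
# fossil-file-version: hash a file's content together with its
# repo-root-relative path, so identical files at different paths differ.
usage() {
    echo "ERROR: $1" >&2
    cat >&2 <<USAGE

usage: fossil-file-version FILENAME

     Gives a SHA3-256 hash of the named file that changes any time any single
     bit of the input file changes.  The set of possible hashes is statistically
     unique per FILENAME path.
USAGE
    exit 1
}

gfn="$1"    # given file name, as typed by the user
test -z "$gfn" && usage "FILENAME not given"
# The first line of "fossil finfo" output is expected to be "History for PATH";
# cut -c13- drops that 12-character prefix, leaving the repo-relative name.
rfn="$(fossil finfo -n 1 "$gfn" | head -n 1 | cut -c13-)"
hash=$(fossil sha3sum "$gfn" | cut -f1 -d' ')
# Portable ERE instead of GNU-only "ggrep -P"
echo "$hash" | grep -Eq '^[0-9a-f]{64}$' || usage "$gfn not in repo"
# printf instead of the non-portable "echo -n"
printf '%s' "$rfn:$hash" | fossil sha3sum - | cut -f1 -d' '
```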

This script gives the same unique hash for a file regardless of where it is in the tree, where that tree was checked out, or your CWD within that tree.

(Achieving that is what all that gfn vs rfn stuff is about: given file name vs repo root relative file name.)

However..., I think I might be reinventing manifests here. (§2.2)

that would be impossible - the act of inserting the hash into the file would change the hash

It's not impossible, just fiddly, annoying, and prone to failure, as we know from the lesson of CVS keywords. OP is asking for a rebirth of $Revision$, where Fossil's equivalent would be the file artifact hash, as used in manifest files.

What we learned from that time is that it's more pain than it's worth to make this work reliably without corrupting files and such, which is why they were deprecated in Subversion and then never (?) copied by any subsequent VCS.
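For readers who never used CVS, the sketch below imitates what `$Revision$` expansion did, using `sed`. The function name is hypothetical, and CVS did this inside the tool rather than with sed. It shows the core difficulty: each expansion rewrites the file's content, so anything that hashes the file must decide whether it sees the expanded or the unexpanded form.

```sh
#!/bin/sh
# expand_revision REV FILE
# Illustration only: rewrites both the bare "$Revision$" form and an
# already-expanded "$Revision: ... $" form to "$Revision: REV $",
# printing the result on stdout. REV must not contain "/", "&" or "\".
expand_revision() {
    rev="$1"
    sed -e 's/\$Revision\(:[^$]*\)\{0,1\}\$/$Revision: '"$rev"' $/g' "$2"
}
```

Running `expand_revision 1.5 file.c` turns `$Revision$` or `$Revision: 1.4 $` into `$Revision: 1.5 $`.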

(4) By anonymous on 2021-01-21 07:29:08 in reply to 2

Hi, OP here. I totally agree, and I'm not advocating going back to the times before proper version control. The version number in the file is an "external" requirement brought upon us, and sadly something we have to live with.

I was just thinking that if I could tag the individual source-file artifact, this would be really useful for showing the version number in the timeline and so on, and it would make human errors relatively easy to spot (like a file getting a new hash without being assigned a new version-number tag, etc.).

Thanks!
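The human-error check OP describes (a file gets a new hash without a new version number) could be automated along these lines. This is a sketch under assumptions: the boilerplate line has the form `NAME Vx.y.z` within the first 20 lines, and the previous copy of the file is available as a plain file. `header_version` and `version_bump_missing` are hypothetical helper names.

```sh
#!/bin/sh
# header_version FILE NAME
# Prints the version from the first "NAME Vx.y.z" line in the first
# 20 lines of FILE, or nothing if there is no such line.
header_version() {
    head -n 20 "$1" | sed -n "s/.*$2 V\([0-9][0-9.]*\).*/\1/p" | head -n 1
}

# version_bump_missing OLD NEW NAME
# Returns success (i.e. "problem found") when NEW differs from OLD
# but both carry the same header version.
version_bump_missing() {
    if cmp -s "$1" "$2"; then
        return 1    # identical content: no bump needed
    fi
    old_v=$(header_version "$1" "$3")
    new_v=$(header_version "$2" "$3")
    [ "$old_v" = "$new_v" ]
}
```

For example, with the committed copy extracted first: `fossil cat MyFile.src > old.src && version_bump_missing old.src MyFile.src MyFile && echo "MyFile.src changed without a version bump"`.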

(5) By Richard Hipp (drh) on 2021-01-21 12:53:43 in reply to 4

Crazy idea

What if there was a project setting such that, when it is enabled and there is a "fossil commit", prior to computing the check-in content Fossil looks at the first N bytes of each file that has changed (where N is perhaps 1000 or so), and if that file appears to be text and its first N bytes contain text of the form:

    $Fossil-Timestamp: YYYY-MM-DD HH:MM:SSZ$

Then the value of the date/time string is automatically adjusted to the current date/time.

Could this be made to work?
Would it solve OP's problem?

More brainstorming:

Perhaps there is a separate command "fossil timetag" that goes through all the files in the check-out looking for any file that has been edited, appears to be text, and has the "$Fossil-Timestamp: ...$" string somewhere in the first N bytes, and it updates the time tag on all such files. You could run this at any time. Then the setting mentioned above might be named "auto-timetag", and it would have the effect of running the "fossil timetag" command automatically before each commit.
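A rough shell sketch of the per-file step proposed here. `timetag_file` and the `TIMETAG_NOW` override are hypothetical names invented for illustration; Fossil has no such command, and a real implementation would live in Fossil's C code. The sketch assumes "appears to be text" means "no NUL byte in the first N bytes", looks for the marker only in those N bytes, and rewrites the first marker it finds.

```sh
#!/bin/sh
# timetag_file FILE [N]
# If the first N bytes (default 1000) of FILE contain no NUL byte and
# contain "$Fossil-Timestamp: ...$", rewrite the first such marker with
# the current UTC time. TIMETAG_NOW, if set, overrides the timestamp.
timetag_file() {
    f="$1"
    n="${2:-1000}"
    now="${TIMETAG_NOW:-$(date -u '+%Y-%m-%d %H:%M:%SZ')}"

    # A NUL byte in the first N bytes means "binary": skip the file.
    total=$(dd if="$f" bs="$n" count=1 2>/dev/null | wc -c)
    text=$(dd if="$f" bs="$n" count=1 2>/dev/null | tr -d '\000' | wc -c)
    [ "$total" = "$text" ] || return 1

    # Act only if the marker appears within the first N bytes.
    dd if="$f" bs="$n" count=1 2>/dev/null |
        grep -q '\$Fossil-Timestamp: [^$]*\$' || return 1

    tmp=$(mktemp) || return 1
    awk -v ts="$now" '
        !done && sub(/\$Fossil-Timestamp: [^$]*\$/, "$Fossil-Timestamp: " ts "$") { done = 1 }
        { print }
    ' "$f" > "$tmp" && cat "$tmp" > "$f"    # cat keeps FILE's permissions
    rm -f "$tmp"
}
```

Writing back with `cat` rather than `mv` keeps the original file's permissions. The timestamp uses Zulu time only, in the `YYYY-MM-DD HH:MM:SSZ` shape from the proposal.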

(6) By anonymous on 2021-01-21 13:03:15 in reply to 5

That's the classic SCM feature of keyword substitution in text files, no?

The SCM I'm currently using can enable/disable it on a per-file basis,
and if I recall SVN, CVS, RCS, SCCS all supported it. Didn't know Fossil didn't.

(7) By Richard Hipp (drh) on 2021-01-21 13:26:15 in reply to 6

I'm not aware of any of the newer cryptographic-hash based distributed version control systems that support keyword substitution. The fact that these newer systems are based on a cryptographic hash of the file content makes keyword substitution more complex.

If something like this is possible, and if we decide to do it in Fossil, then it could be enabled using a comma-separated list of GLOB patterns to identify the files to which it applies.
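A sketch of how a comma-separated GLOB list like this might be applied when deciding which files get timetags. It leans on the shell's `case` pattern matching, which only approximates Fossil's own GLOB semantics; `matches_glob_list` is a hypothetical name, and the setting itself was only a proposal in this thread.

```sh
#!/bin/sh
# matches_glob_list PATH PATTERNS
# Succeeds if PATH matches any pattern in PATTERNS, a comma-separated list
# (spaces after commas are also accepted). Uses shell "case" matching,
# which only approximates Fossil's GLOB rules.
matches_glob_list() {
    path="$1"
    saved_ifs="$IFS"
    IFS=', '
    set -f    # split $2 on IFS without expanding patterns against files
    for pat in $2; do
        case "$path" in
            $pat) IFS="$saved_ifs"; set +f; return 0 ;;
        esac
    done
    IFS="$saved_ifs"
    set +f
    return 1
}
```

For instance, `matches_glob_list src/main.c '*.c,*.h'` succeeds, while `README.md` would not match that list.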

(9) By anonymous on 2021-01-21 13:46:10 in reply to 7

Didn't know that either. But why can't keyword substitution simply happen pre-hashing?
Substitution is a commit-time action whose result is the actual source the user wanted,
so it's normal to have the result hashed as if the user had filled in the values herself.

The only problem might be in conflict resolution I guess.

I have several multi-command CLIs, and usually I provide a debug-level
output that shows the revisions of all files implementing the various
commands, so that QA is sure to have the proper exe to test with, for a given
new feature or change. That's implemented using SCM keyword substitution.

Different use-case than the OP, but until now I didn't realize I would NOT
be able to do this in Fossil (or Git), given what you explained.

(11) By Stephan Beal (stephan) on 2021-01-21 14:02:46 in reply to 9

Didn't know that either. But why can't keyword substitution simply happen pre-hashing?

Because then what you are checking in is not what you tested. That is, the contents change between the time you type "fossil ci ..." and the time the file is actually checked in.

As Warren points out, no(?) SCM which began life this century supports that feature. I, for one, feel that we should continue to avoid it.

Editing of the files is the domain of the user. Flawlessly keeping track of their opaque contents is the domain of the SCM.

(13.1) By Richard Hipp (drh) on 2021-01-21 14:18:48 edited from 13.0 in reply to 11

what you are checking in is not what you tested

This is the biggest and most important problem, in my mind. Perhaps the compromise solution is this:

1. You must manually run "fossil timetag FILE ..." to adjust the timetags.
2. There is no auto-timetag. Instead, the "fossil commit" command warns you (and aborts unless you use --force) if you are committing changes with an out-of-date timetag.

In this way, there is always the opportunity to test your revisions before committing them.
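This compromise could be approximated today with a pre-commit check. "Out-of-date" is interpreted here, as an assumption, to mean that the file's content changed since the previous version while its `$Fossil-Timestamp$` value did not; files with no timetag are ignored. `timetag_of` and `check_timetag` are hypothetical helpers, and the OLD copy would have to come from somewhere such as `fossil cat`.

```sh
#!/bin/sh
# timetag_of FILE
# Prints the timetag value from the first line of FILE that carries a
# "$Fossil-Timestamp: ...$" marker.
timetag_of() {
    sed -n 's/.*\$Fossil-Timestamp: \([^$]*\)\$.*/\1/p' "$1" | head -n 1
}

# check_timetag OLD NEW [--force]
# If NEW has a timetag and differs from OLD but carries the same timetag
# value, warn on stderr and fail unless --force is given.
check_timetag() {
    new_tag=$(timetag_of "$2")
    [ -n "$new_tag" ] || return 0           # no timetag: nothing to enforce
    cmp -s "$1" "$2" && return 0            # content unchanged
    [ "$(timetag_of "$1")" != "$new_tag" ] && return 0
    echo "warning: $2 changed but its timetag is still $new_tag" >&2
    [ "$3" = "--force" ]
}
```

The check never edits files; it only refuses, which keeps "what you tested" identical to "what you commit".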

Note that Fossil will, sometimes, change content during a check-in. If you have inconsistent end-of-line characters (some \n and others \r\n) you will get a warning, and an opportunity to automatically fix the problem. If you choose to auto-fix, then you are committing untested code. We could do something similar with time-tag, I suppose. Or, we could take Stephan's advice and revise the end-of-line converter so that it is a separate command, and "commit" always aborts and asks you to run the separate command before trying again. I'm leaning toward the second approach.
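The inconsistent-line-ending condition described here can be detected with standard tools. This is a shell approximation, not how Fossil implements its check: it counts newline characters with `wc -l` and CR-terminated lines with `grep -c`. `line_ending_kind` is a made-up name.

```sh
#!/bin/sh
# line_ending_kind FILE
# Prints "none", "lf", "crlf" or "mixed". Rough check: compares the number
# of newline characters with the number of lines whose last byte is CR.
line_ending_kind() {
    cr=$(printf '\r')
    lines=$(wc -l < "$1" | tr -d ' ')
    crlf=$(grep -c "$cr\$" "$1")
    if [ "$lines" -eq 0 ]; then
        echo none
    elif [ "$crlf" -eq 0 ]; then
        echo lf
    elif [ "$crlf" -ge "$lines" ]; then
        echo crlf
    else
        echo mixed
    fi
}
```

A separate "fix line endings" command, as suggested, would run something like this first and refuse to proceed on `mixed`.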

(18) By ddevienne on 2025-06-12 15:35:51 in reply to 11

By that logic, newlines would not be converted/normalized by SCMs like Fossil or Git, for hashing and on checkout :)

(8) By Richard Hipp (drh) on 2021-01-21 13:32:42 in reply to 5

Perhaps instead of using "$Fossil-Timestamp: ...$" as the markup, we could use some text that could be harmlessly inserted into a Markdown or Fossil-Wiki document in such a way that the timestamp is actually displayed. Something like:

    <span class='fossil-timestamp'>YYYY-MM-DD HH:MM:SSZ</span>

This would enable embedded documentation to show the date of last modification as part of its displayed text.
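Updating the HTML-style marker is a plain text substitution. A sketch, assuming the marker always appears exactly as `<span class='fossil-timestamp'>...</span>` with single quotes and no nested markup; `set_span_timestamp` is a hypothetical name, and it prints the result instead of editing the file in place.

```sh
#!/bin/sh
# set_span_timestamp FILE TIMESTAMP
# Prints FILE with the contents of every
# <span class='fossil-timestamp'>...</span> replaced by TIMESTAMP.
# TIMESTAMP must not contain "|", "&" or "\".
set_span_timestamp() {
    sed "s|<span class='fossil-timestamp'>[^<]*</span>|<span class='fossil-timestamp'>$2</span>|g" "$1"
}
```

With Zulu time only: `set_span_timestamp doc.md "$(date -u '+%Y-%m-%d %H:%M:%SZ')" > doc.md.new`.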

(10) By Stephan Beal (stephan) on 2021-01-21 13:58:45 in reply to 8

This would enable embedded documentation to show the date of last modification as part of its displayed text.

And it would be amend-friendly, in the case of clock snafus.

The addition of $Replacement Markers$ seems fraught with peril (clock snafus immediately come to mind) and RFEs to me. Next they'll want $Fossil-User$, then $Fossil-Checkin-Comment$, then they'll want to customize the markers, then, then, then...

(12) By Richard Hipp (drh) on 2021-01-21 14:08:44 in reply to 10

And it would be amend-friendly, in the case of clock snafus.

Suppose the "fossil timetag FILENAME" command updates the time tag on FILENAME even if FILENAME had not otherwise changed. In other words, the "timetag" command is an override. Then

 fossil timetag FILENAME ... && fossil commit

Could be used to fix a clock snafu.

Next they'll want ...

I thought of that. In fact, I started to type in $Fossil-User:$ in my original message, but then thought better of it and erased that text before sending. I think we take a hard line here: The time-tag is just a time-tag and it is Zulu time only (no timezones).

I'm somewhat attracted to the time-tag idea because it solves item 14 in the Fossil To-Do List. Nevertheless, this is kind of a scary idea that needs to be thought through carefully before it lands on trunk.

(14) By anonymous on 2021-01-21 15:43:10 in reply to 12

OP again, could I tag individual artifacts (source files) in fossil ui without touching the contents of the artifact?

The version number is manual labour in the source file, I was just looking for a way of somehow annotating it in fossil to lessen the pain of keeping track of this.

(15) By Stephan Beal (stephan) on 2021-01-21 15:51:40 in reply to 14

OP again, could I tag individual artifacts (source files) in fossil ui without touching the contents of the artifact?

Hypothetically yes. The data model supports tagging any arbitrary hashed entity. It's hypothetically possible to tag a tag of a tag of a tag of a tag.

The catch, however, is that the UI/CLI interfaces for tagging currently intentionally limit themselves to checkins.

All that we need is a volunteer who's itched by that shortcoming enough to drill out the tag-related interfaces to allow them to accept non-checkin hashes.
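To make the data-model point concrete, here is a sketch that only prints a control artifact tagging a single file artifact. It assumes the control-artifact layout from Fossil's file-format documentation: D, T, U and Z cards in that order, a T card of the form `T +TAGNAME ARTIFACT-ID` for a singleton tag, and a Z card holding the MD5 of everything before it. The helper name, tag name, and user are hypothetical, and nothing is submitted to a repository, since the CLI does not currently accept non-check-in hashes for tagging.

```sh
#!/bin/sh
# make_tag_artifact TAGNAME ARTIFACT-HASH USER [TIMESTAMP]
# Prints a control artifact that adds the singleton tag TAGNAME to
# ARTIFACT-HASH. Sketch only: no tag value, not signed, not submitted.
# Requires md5sum (GNU coreutils).
make_tag_artifact() {
    ts="${4:-$(date -u '+%Y-%m-%dT%H:%M:%S.000')}"
    body=$(printf 'D %s\nT +%s %s\nU %s\n' "$ts" "$1" "$2" "$3")
    # Z card: MD5 of everything before it, including the final newline
    # (command substitution stripped it, so printf adds it back).
    sum=$(printf '%s\n' "$body" | md5sum | cut -d' ' -f1)
    printf '%s\nZ %s\n' "$body" "$sum"
}
```

Tag names beginning with `sym-` are Fossil's symbolic tags, so `make_tag_artifact sym-MyFile-V1.1.0 FILEHASH alice` would describe the kind of per-file version tag OP asked about. On systems without `md5sum`, `md5 -q` is a common substitute.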

(17) By Stephan Beal (stephan) on 2021-01-21 15:54:43 in reply to 15

All that we need is a volunteer who's itched by that shortcoming enough to drill out the tag-related interfaces to allow them to accept non-checkin hashes.

FWIW, that would be my strong preference over any code which modifies client-side files (beyond the necessary and justifiable evil of line-ending conversion, which is a fact of life we have to accept). That said, as I'm not volunteering to implement either one, I have little say-so in the matter ;).

(16) By Richard Hipp (drh) on 2021-01-21 15:54:17 in reply to 14

Yes, you can attach symbolic tags to any artifact you like. Fossil will correctly maintain and sync those tags. However, the current UI does not have any mechanism to show symbolic tags on individual files, at least not that I recall.

Anonymous OP: If you are working for a company and have resources to apply to this problem, we can probably help you. You can contact me directly at drh at sqlite dot org for further information.