Proposal for fossil improvement

(1) By ramsan on 2019-07-10 18:34:43

Hello,

There are some fossil commands that only allow working in an open checkout, like "commit", "add" and "rm". My proposal is to add option "--repository" to them in order to be able to work directly with a repository. In this way, it would be possible to do:

fossil commit -m "my message" FILE1 FILE2 -R myrepository.fossil
fossil add FILE1 FILE2 -R myrepository.fossil
fossil rm FILE1 FILE2 -R myrepository.fossil

When using option -R, it would always be necessary to include one or more files "FILE1 FILE2" in the list.

The operation would always be made in the most recent checkin for the selected branch or for trunk.

Why would it be useful?

a) When it is necessary to casually operate on one or a few files in a repository but without having an open checkout

b) It would make it easier to implement the commit, add and rm operations in the fossil server GUI, so that we would be able to perform a commit from the web browser

c) Similar to b) for other automation tools and other uses of fossil in web servers

What is your opinion on this?

(2) By Warren Young (wyetr) on 2019-07-10 20:11:30 in reply to 1

File names in Fossil are always fully-qualified relative paths from the root as seen from the checked-out UUID.

Take src/leaf.c in Fossil's own source tree as an example. That is its proper and only name, as far as Fossil is concerned, assuming it's never been renamed in the repo. It is never just "leaf.c". So, in order to check in a change against leaf.c, you'd have to pass the name to Fossil as "src/leaf.c".

That affects your proposal because now you implicitly need a local directory called src to hold the file leaf.c so that your FILE1 parameter can name both the local file name and the name it is known by in the Fossil repo.

Either that, or you need to be able to say something like this instead:

  $ fossil ci -m "my message" --local temp12345 --fqn src/leaf.c \
      -R repo.fossil

That is, you're checking in the contents of local file temp12345 as the new version of the file with a fully-qualified name ("fqn") of src/leaf.c.

That gives you a choice of two bad situations:

1. You really do need a local checkout directory, or at least a subset of it, so you can give Fossil names of local files that match up with their names in the repo.
2. You make it difficult to pass more than one name to a single command, which ends up nerfing Fossil's atomic commit feature back to CVS equivalence.

I think I'd prefer to just give up on a) and implement b) and c) in terms of ckout within a checkout directory.

(3) By ramsan on 2019-07-10 20:26:14 in reply to 2

I fail to see the bad situations.

Option 1:

create folder src
Have file leaf.c in folder src
fossil commit -m "my message" src/leaf.c -R repo.fossil

Creating folders is not difficult. The point here is to avoid creating a checkout that could be on the order of hundreds of MB just to commit a small file.

Option 2:

fossil ci -m "my message" leaf.c --as src/leaf.c FILE2 --as src/FILE2

Either of the two solutions is nice, beautiful and simple.
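
Fossil has no "--as" option, so Option 2 above is hypothetical. As a rough sketch only, the same local-name-to-repository-name mapping can be approximated with plain shell by copying each local file into a scratch tree under its repository name. The helper name stage_as and the LOCAL=REPO_NAME argument form are assumptions made up for this illustration:

```shell
# stage_as DEST LOCAL=REPO_NAME...
# Hypothetical helper: copy each LOCAL file to DEST/REPO_NAME, creating
# intermediate directories, so the scratch tree uses repository-style names.
stage_as() {
  dest=$1
  shift
  for pair in "$@"; do
    local_file=${pair%%=*}   # text before the first '='
    repo_name=${pair#*=}     # text after the first '='
    mkdir -p "$dest/$(dirname "$repo_name")" || return 1
    cp "$local_file" "$dest/$repo_name" || return 1
  done
}
```

For example, stage_as /tmp/stage leaf.c=src/leaf.c FILE2=src/FILE2 leaves /tmp/stage/src/leaf.c and /tmp/stage/src/FILE2 behind, which is the directory layout Option 1 asks the user to build by hand.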

(4) By Warren Young (wyetr) on 2019-07-10 20:34:49 in reply to 3

If you're going to take Option 1, then I don't see why you'd need to require file names at all. Fossil could just crawl the local tree and check in changes for any files it finds. Missing files can be ignored.

(5) By Richard Hipp (drh) on 2019-07-10 20:44:16 in reply to 3

> The point here is to avoid creating a checkout that could be on the order of hundreds of MB just to commit a small file.

I'm a little concerned about that. Fossil has a subsystem that verifies the proposed new check-in against files on disk prior to committing the transaction. Those cross-checks are designed to prevent coding errors from corrupting a repository and/or causing lost work, and they have kicked in once or twice over the past 12 years.

Your proposed enhancement would remove at least one of those cross-checks.

Of course, the cross-check is not required. It's really just a safety net to prevent problems if we make a silly coding error. But if we take the safety net away, that means that all future changes to Fossil become a lot more stressful, as we spend a lot of time worrying about whether or not we have created some bug by which work could be lost.

(6) By ramsan on 2019-07-11 09:49:33 in reply to 5

Option 1

Repository has 100 files. Local checkout has modifications in 10 files and we want to commit only 2 files.

fossil commit -m "only two" FILE1 FILE2

Option 2

Repository has 100 files and we want to commit only 2 files

fossil commit -m "only two" FILE1 FILE2 -R myrepo.fossil

Why would it be any easier or better to perform the cross-checks or verifications in Option 1 than in Option 2? I think that Option 1 is even more prone to errors, as there are modified files that must NOT be committed.

To avoid any ambiguity, it should be possible to select the parent checkin with:

fossil commit -m "only two" FILE1 FILE2 -R myrepo.fossil -r trunk

(7) By Stephan Beal (stephan) on 2019-07-11 10:21:01 in reply to 6

> Repository has 100 files. Local checkout has modifications in 10 files and we want to commit only 2 files.

If you have changes to commit, you necessarily have a copy of the checkout somewhere (presumably where those two modified files are stored).

> To avoid any ambiguity, it should be possible to select the parent checkin with:

Noting that the ambiguity only exists because of this feature.

> Why would it be any easier or better to perform the cross-checks or verifications in Option 1 than in Option 2? I think that Option 1 is even more prone to errors, as there are modified files that must NOT be committed.

That aspect of Option 1 is no more error-prone than forgetting to pass the -r trunk flag to Option 2.

FWIW, i'm thoroughly convinced that such a feature would not be even remotely worth the significant re-architecting which would be needed in Fossil to make the feature possible. The majority of the information involved in/required for making a checkin is sitting in the checkout db, and that db depends, in turn, on the files it represents being accessible to it.

(8.1) By ramsan on 2019-07-11 11:12:07 edited from 8.0 in reply to 7

> If you have changes to commit, you necessarily have a copy of the checkout somewhere

The main use of this capability would be, in my opinion, to add remote commit to server applications, even to the fossil server GUI itself. In this case, you do not have a checkout on the server and you do not want one.

> Noting that the ambiguity only exists because of this feature

There is no ambiguity if you write in the help:

Not including -r is equivalent to including "-r trunk"

> FWIW, i'm thoroughly convinced that such a feature would not be even remotely worth the significant re-architecting which would be needed in Fossil to make the feature possible. The majority of the information involved in/required for making a checkin is sitting in the checkout db, and that db depends, in turn, on the files it represents being accessible to it.

I fail to see why it would be necessary to significantly re-architect fossil in order to implement the second option compared to the first:

fossil commit -m "only two" FILE1 FILE2
fossil commit -m "only two" FILE1 FILE2 -R myrepo.fossil

They are exactly equal, the only difference being that in the first case the current checkin is obtained from _FOSSIL_, and in the second case it is obtained from "-r", defaulting to "-r trunk".

The first option CANNOT depend on the local copy of the files, except FILE1 and FILE2, because even if all the other files are modified, deleted or renamed (on disk, not with fossil mv or fossil rm), the commit will still work and will modify ONLY FILE1 and FILE2 inside the repository for the new commit.

(9) By ramsan on 2019-07-11 11:28:49 in reply to 8.1

Let's see it with an example.

Let's assume that we have a repository with files FILE1 FILE2 FILE3 FILE4

Option 1

fossil open myrepo.fossil mybranch
rm FILE3 FILE4
fossil commit -m "only two" FILE1 FILE2

Option 2

cp FILE1 FILE2 DIR
cd DIR
fossil commit -m "only two" FILE1 FILE2 -R myrepo.fossil -r mybranch

The code that deals with the commit in the first option, will see 3 files: _FOSSIL_, FILE1, FILE2. It will extract current checkin and current repository from _FOSSIL_ and will get the new version data of FILE1, FILE2 from the two local files.

The code that deals with the commit in the second option, will see 2 files: FILE1, FILE2. It will extract current repository from -R and current checkin from "-r mybranch" and will get the new version data of FILE1, FILE2 from the two local files

That's it. Same architecture, same verifications, same cross-checks, only a new function to extract checkin information from "-r" instead of from _FOSSIL_.

(11) By anonymous on 2019-07-12 02:15:01 in reply to 9

What's suggested above seems more like automation of a check-out step for this specific use pattern.

You could accomplish this already with a wrapper CGI script (per your remote use-case), pre-processing and then properly routing the existing fossil commands.

The Fossil client serves a broader purpose, and it fits the common SCM usage patterns well.
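
A minimal sketch of such a wrapper, under stated assumptions: it opens a throwaway checkout, copies the named files in, commits only those files, and closes the checkout again. The function names and the DRY_RUN variable are made up for this example; REPO should be an absolute path, because the function changes directory. Note that it still writes a full temporary checkout to disk, so it automates the checkout step rather than avoiding it.

```shell
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# commit_via_temp_checkout REPO BRANCH MESSAGE FILE...
# FILE paths are relative to the current directory and must equal the
# names the files have inside the repository (e.g. src/leaf.c).
commit_via_temp_checkout() {
  repo=$1; branch=$2; msg=$3
  shift 3
  origin_dir=$PWD
  work=$(mktemp -d) || return 1
  (
    cd "$work" || exit 1
    run fossil open "$repo" "$branch" || exit 1
    # Overwrite the checked-out copies with the caller's versions.
    for f in "$@"; do
      mkdir -p "$(dirname "$f")" && cp "$origin_dir/$f" "$f" || exit 1
    done
    run fossil commit -m "$msg" "$@"
    rc=$?
    run fossil close
    exit $rc
  )
  status=$?
  rm -rf "$work"   # the temporary checkout is always discarded
  return $status
}
```

A CGI script could call commit_via_temp_checkout /path/to/repo.fossil trunk "my message" src/leaf.c after saving the uploaded file under src/leaf.c. Files that are new to the repository would additionally need a fossil add step, which this sketch leaves out.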

(10) By Stephan Beal (stephan) on 2019-07-11 14:04:04 in reply to 8.1

> There is no ambiguity if you write in the help:

We could just as easily say in the help "always type the exact list of files you want to commit so that you don't commit as-yet-unready changes."

> I fail to see why it would be necessary to significantly re-architect fossil in order to implement the second option compared to the first:

i say with confidence, born of an above-average understanding of the fossil internals, that the way the checkout db is structured and used is not something which can simply be refactored over a lunch break to support this. It would require a significant effort, a significant risk of new bugs, and, so far, none of the 2-4 people currently capable of potentially making such a change has expressed any interest in doing so. (In fact, the 3 of those 2-4 people i'm thinking of have all voiced concerns about it in this thread. ;)

We do appreciate that you've taken the time to propose a feature which would be interesting for you, but i personally consider it far outside the scope of any SCM to allow one to commit changes to files from outside of a checkout. (But that's just a personal opinion, and not a guaranty that the feature will remain unimplemented!)

(12) By anonymous on 2019-07-15 04:47:19 in reply to 10

I had an idea a few years ago that it would be useful to allow the checkout db to sit outside the checkout. The motivation was to manage working files in locations which weren't amenable to having extra files such as the checkout db added in. A secondary motivation was using fossil as a sort-of versioned backup tool, again on some directory tree to which the checkout db couldn't be added.

I had a working prototype of this at one time, but I don't see the code here now so perhaps I threw it away. As I recall, I just added a setting indicating where the files were. You still had a checkout and the checkout db, but the files under management were in a different directory tree.

(13) By Eric Junkermann (ericj) on 2019-07-15 13:16:00 in reply to 12

> I had an idea a few years ago that it would be useful to allow the checkout db to sit outside the checkout. The motivation was to manage working files in locations which weren't amenable to having extra files such as the checkout db added in. A secondary motivation was using fossil as a sort-of versioned backup tool, again on some directory tree to which the checkout db couldn't be added.

What sort of directory is not "amenable" to having one extra hidden file added? And why?
Fossil is not a backup tool.

> I had a working prototype of this at one time, but I don't see the code here now so perhaps I threw it away. As I recall, I just added a setting indicating where the files were. You still had a checkout and the checkout db, but the files under management were in a different directory tree.

It can't be a setting in the repository, since it is perfectly possible (and normal) to have multiple working directories (checkouts) for the same repository, each of which needs its own separate "checkout db". Repositories don't know what working directories they have. It could perhaps somehow be in the per-user database (in the user's home directory) but I suspect that won't work either for what you are trying to do.

(22) By anonymous on 2019-07-16 00:07:13 in reply to 13

> Fossil is not a backup tool.

Since no one here at my employer, particularly no one from IT, has said anything about version control not being a backup tool, I asked the IT people here about that. Here's what I learned.

The rule "Version control is not a backup" arose because early VCSs, like RCS and SCCS, stored the revisions locally in a subdirectory of the files being versioned.

Later, VCSs like CVS and SVN introduced a client/server architecture to versioning systems. With a properly set-up server AND proper backups of the repositories, a VCS can be a backup tool for "pure documents", like source files and even office documents.[1]

Even though the distributed VCSs use a local repository, they can still be used as backup tools WHEN changes are regularly pushed to remote repositories. Fossil with auto-sync on and communicating fits this.

Also note, backing up a VCS repository can often be better than just backing up files. On a busy day, there can be more commits, even to the same file(s), than there are backups.[2]

[1] For system configuration files, our IT people have wrapper scripts, for SVN, to save/restore file attributes not tracked by SVN.

[2] For comparison, my employer stores many high-importance documents in a "document management system". Document changes and additions in the system are auto-sync'd between our 2 US offices and to a 3rd server in a server co-location center. Each of these 3 repositories is backed up daily. But often, many documents are updated more than once on any given day.

For what it's worth, our IT director says she could do 99.9% of the same with SVN as with the hugely expensive document management system. She said the main things SVN lacks are automatic document ID generation and an easy way to make TortoiseSVN display it and search by it. While a post-commit hook could add a docID property to each new file, enhancing TortoiseSVN as needed is a large unknown. (Access control in the DMS is per project, which can be simulated by controlling who has access to each per-project repository.)

(23) By stevel on 2019-07-16 00:54:37 in reply to 22

FWIW my policy is that Fossil is not a backup tool but it is part of a multi-layered backup strategy.

(26) By Stephan Beal (stephan) on 2019-07-16 07:14:30 in reply to 22

> The rule "Version control is not a backup" arose because early VCSs, like RCS and SCCS, stored the revisions locally in a subdirectory of the files being versioned.

The reason Fossil isn't a good backup tool is because it does not record file permission and ownership, and few backups can be fully functional without those bits of information. (Where "fully functional" means "can be used as-is for recovery.")

(14) By anonymous on 2019-07-15 14:42:25 in reply to 12

> in locations which weren't amenable to having extra files [...] As I recall, I just added a setting indicating where the files were

Added a setting where? To the check out DB?

Having that in the check out DB is only half the problem. From inside a check out, Fossil has to be able to find the check out DB.

I suppose fossil could look up the locations of the check out DBs in ~/.fossil then check each DB for where its files are. But that's more work than simply searching ./.fossil, ../.fossil, ../../.fossil, etc. to find it.

Also, what happens if the check out is in /web/www/foo and a system admin moves it to /web/ww2/foo? This has happened to me and several friends of mine at several businesses.
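
The upward search mentioned above can be sketched in a few lines of shell. This is an illustration, not Fossil's actual code; the function name find_checkout_root is made up. The file names tested are .fslckout and _FOSSIL_, the names Fossil gives the checkout database (the latter also appears earlier in this thread).

```shell
# find_checkout_root [START_DIR]
# Walk up from START_DIR (default: current directory) until a directory
# containing a checkout database is found; print it, or fail at "/".
find_checkout_root() {
  dir=$(cd "${1:-.}" && pwd -P) || return 1
  while :; do
    if [ -f "$dir/.fslckout" ] || [ -f "$dir/_FOSSIL_" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    [ "$dir" = / ] && return 1
    dir=$(dirname "$dir")   # step to the parent directory
  done
}
```

Because the search is relative to wherever the command is run, it keeps working when a whole checkout is moved from /web/www/foo to /web/ww2/foo, whereas a list of absolute checkout locations stored elsewhere would go stale.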

(24) By anonymous on 2019-07-16 03:41:40 in reply to 14

Hi. I'm the anonymous who "had an idea a few years ago". I actually mentioned this to suggest a possible starting approach to the OP's requirement -- instead of having no checkout, have a checkout that includes only the checkout db, and check in your individual files from there. Get around the difficulty of not having the checkout db by having one.

About my idea, there are obviously other ways to get the same effect, e.g., rsync into a checkout directory or use a passthrough mount to splice the directory tree in question into the checkout, but having the ability to point the checkout somewhere else had enough appeal to me for me to see how hard it would be to implement.

To clarify how my POC worked functionally, you'd run fossil from an empty checkout directory. The checkout db would have a setting indicating where the actual files were (/usr/local/share/xyz or whatever). fossil was modified to work on /usr/local/share/xyz/x.conf when the name given on the command-line was x.conf. In the end, it's more an interesting idea than a useful one, but I can't help feeling like it ought to be more useful than it probably is.

-- Patrick TJ McPhee

(15) By anonymous on 2019-07-15 15:29:42 in reply to 12

> I had an idea a few years ago that it would be useful to allow the checkout db to sit outside the checkout.

My approach to this is to have a "staging checkout" where I can test everything before making it live. Then I remove the .fossil file, pause production, rename the production directory to prev-production then rename the staging directory to production and resume production.

If you are talking about using Fossil to store snapshots of, for example, /etc, I have a script that does:

    cd ~/system/etc
    rsync -a /etc/ .    # -a recurses; the trailing slash copies the contents of /etc
    fossil ci

If I have to restore a previous configuration, I can use rsync -a --exclude .fossil ./ /etc/ - AFTER I review differences. (Very rarely, I might skip reviewing the differences before restoring. Even more rarely, I might skip making the snapshot before doing the restore.)

(16) By Stephan Beal (stephan) on 2019-07-15 15:36:24 in reply to 15

Regarding:

>     cd ~/system/etc
>     rsync -a /etc/ .
>     fossil ci

Fossil is a poor choice of tools for doing backups of anything other than your own personal files. Fossil does not retain file ownership/permission information, except for the executable flag, and many files under /etc require specific permissions and/or ownership. Certain SSH config files, etc., are not allowed to be world-readable and may require ownership by certain users, making a fossil-stored backup useless for recovery purposes without additional scripts which record and restore the permissions and ownership.

tar is a far better choice for doing backups of /etc. Put the tar in fossil, if you must, but that seems a bit extreme.

(17) By Martin Gagnon (mgagnon) on 2019-07-15 16:07:42 in reply to 16

I agree that for backups by themselves (to be able to restore in case of failure), Fossil is a poor choice.

But it can be used in addition to backups, to have a history of changes.

Fossil can be used to monitor changes, so you have meaningful comments when changing some service configurations. You can look at the diff between arbitrary versions from the timeline, or you can even do a bisect when a bad configuration is discovered late.

(18) By anonymous on 2019-07-15 17:38:16 in reply to 16

Yeah, /etc was a poor example. I don't know what the poster has in mind, so I used the easiest example that came to mind.

My script is actually more general than what I typed. I just wanted to show a simple example.

(25) By Keisi (keisi) on 2019-07-16 05:18:30 in reply to 16

etckeeper handles many of these things, and while it doesn’t currently support fossil, it’s meant to be extensible, if one is wedded to using fossil for this.

(28) By Andy Bradford (andybradford) on 2019-07-17 00:16:27 in reply to 16

> Fossil is a  poor choice of tools for doing  backups of anything other
> than your own personal files.

Actually,  combined with  mtree[1]  from OpenBSD,  it  could work  quite
nicely, but definitely tar is simpler.  But without mtree, yes, it would
be a  challenge. The nice  thing about  having your mtree  files checked
into Fossil  coupled with the  files that  you want preserved  in Fossil
means that you actually have a  record of what you think the permissions
should be and their files and you can track changes to both.

With mtree, a restore would look something like:

cd /etc
fossil open /etcrepo.fossil
mtree -U -f ./mtree/specfile

So  yes, you're  right,  you  certainly would  have  to have  additional
scripts and most people probably won't  want to go through the effort of
building  mtree files  to track  all the  permissions, but  it could  be
automated. Also, unfortunately, not all Unix-like OSes have mtree.

I used to use CVS for this type  of thing based on designs found in [2],
though Fossil would clearly be my choice today.

[1] https://man.openbsd.org/mtree
[2] http://infrastructures.org/

Thanks,

Andy

(19) By Marcelo Huerta (richieadler) on 2019-07-15 19:42:50 in reply to 15

"Staging checkout"?

The attempts to "git-ify" Fossil continue.

I'd venture that people like Fossil precisely for the lack of such esoteric "features".

(20) By anonymous on 2019-07-15 21:32:17 in reply to 19

This has nothing to do with git.

This is about deploying a new version to production.

In this case, "staging checkout" means that the version to be deployed is checked out to a directory where it can be fully tested prior to deployment. Ideally, no changes will be made in this checkout. Once all the tests pass, remove the .fossil file and make the staging directory the new production directory. (Or copy the files to the production directory.)

In the past, when using SVN, I would do a svn export to the staging directory. This creates a "check out" with no .svn directory. This really only mattered in early versions of SVN where every directory in check out had a .svn directory.

(I agree about git's esoteric features, especially "staging a commit". If I really have to commit a subset of the files I have changed, I will do a fossil stash on the files I want to exclude, then build/test/fix/repeat what's left, commit that, then fossil stash pop and continue. So far, the times I've had to use git, I check in a commit using git commit -a with no problems (of course, still have to use git add to add new files).)

(21) By Martin Gagnon (mgagnon) on 2019-07-15 22:03:56 in reply to 20

> In the past, when using SVN, I would do a svn export to the staging directory. This creates a "check out" with no .svn directory. This really only mattered in early versions of SVN where every directory in check out had a .svn directory.

Maybe not very elegant, but you can do something similar by piping the output of the fossil tar command (works on Unix-like systems).

Example:

 fossil tar <version> -R /path/to/repo.fossil --name <topleveldir> - | tar -xzf -

This will create the <topleveldir> with content of <version> without the fossil checkout db.

(27) By ckennedy on 2019-07-16 18:32:47 in reply to 21

Why not just fossil open … run your checks, then fossil close? It removes all fossil specific files while leaving all the source files behind.

(29) By Andy Bradford (andybradford) on 2019-07-17 00:18:21 in reply to 19

Unfortunate clash  of terms in this  case. "Staging checkout" is  more a
devops term than something unique to Git.

I used to stage  websites (before devops was a term)  and then use rsync
to push them live, but it had nothing to do with git.

Thanks,

Andy