RFE: Fossil submodule feature

(1) By Warren Young (wyetr) on 2018-09-07 19:40:01

Proposal

Fossil should have a feature similar to Git submodules except with all of the sharp corners knocked off.

Usage

Let's start with a concrete example of how this feature should work in practice:

  $ mkdir -p ~/src/mainproj/trunk
  $ cd ~/src/mainproj/trunk
  $ fossil open ~/museum/mainproj.fossil        # trunk == v2 in development
  $ mkdir lib
  $ cd lib
  $ fossil open --subrepo ~/museum/lib.fossil 
  $ mkdir ../../v1                              # Also check out v1 stable branch.
  $ cd ../../v1
  $ fossil open ~/museum/mainproj.fossil v1     # Doesn't open lib because that...
  $ mkdir lib                                   # ...subrepo is not yet associated...
  $ cd lib                                      # ...with the v1 branch.
  $ fossil open --subrepo ~/museum/lib.fossil 2017-04-01    # old stable version

Contrast with Nested Opens

We already have fossil open --nested, so why do we need fossil open --subrepo?

The primary difference is that --nested doesn't create any tie between the nested repo and the repo in its parent directory. The --subrepo option associates the branch checked out in the parent directory with any version string Fossil understands that is valid within the subrepo, using the same rules open currently uses for its ?VERSION? parameter. This allows the parent repo to either:

track trunk in the subrepo
track some named branch in the subrepo
be tied to some tag in the subrepo
be tied to some particular checkin ID in the subrepo
be tied to the checkin in the subrepo nearest to a given timestamp

The third option may be particularly interesting to some Fossil users: a repo holding helper libraries might have a stable tag that gets updated each time the libraries are considered stable. Updating a parent repo after the tag moves will update to the latest stable version of the libraries as well.

The --subrepo option has several advantages over --nested:

fossil up will update subrepos as well.
fossil ci will check for changes in subrepos in depth-first order; see below.
fossil open will automatically open subrepos as well, cloning them first if necessary; see below.

Checkin Workflow

If the contents of lib/* are modified in the above scenario, the behavior of fossil ci at the parent level depends on which checkout directory you did it in:

If you change ~/src/mainproj/trunk/lib/*, a checkin will become the tip of trunk in the lib repo, since we didn't specify a version when opening it as a subrepo.
If you change ~/src/mainproj/v1/lib/* instead, you'll create a fork in the lib subrepo rooted at the checkin that the version string 2017-04-01 resolves to.

If you cd into the subrepo's checkout directory, Fossil commands given there will affect only the subrepo, just as with --nested. This allows you to work on the subproject alone for a while, to temporarily switch it to another VERSION, bisect it, etc.

If you create a branch from one that already has one or more associated subrepos, Fossil copies the subrepo configuration to the new branch:

  $ fossil ci --branch feature-branch        # new branch gets lib subrepo association

Parent Repo Open Workflow

Having associated one or more branches in the parent repo with one or more subrepos, opening the parent repo will also create the subdirectory for each subrepo and open the specified VERSION there.

If there is already a local clone of the subrepo on the local machine, based on the URL stored in the parent repo's Fossil DB, it opens from that clone, else it makes a clone somewhere sensible. I'd suggest making the clone alongside the parent project's clone, named after the subrepo.
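The lookup described above could be sketched roughly as follows. This is only an illustration in Python, not Fossil code: the function name `local_clone_for`, the `known_clones` mapping (standing in for whatever list `fossil all` works from), and the `.fossil` naming rule are all assumptions made for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_clone_for(url, known_clones, parent_clone):
    """Pick the repository file to open a subrepo from.

    url          -- clone URL recorded for the subrepo
    known_clones -- hypothetical mapping of clone URL -> local repo file
    parent_clone -- path to the parent project's local repo file

    Returns (path, needs_clone). needs_clone is True when no local clone
    is known and one would have to be made at the returned path.
    """
    if url in known_clones:
        return known_clones[url], False

    # No local clone yet: place it alongside the parent project's clone,
    # named after the last component of the subrepo URL.
    name = PurePosixPath(urlsplit(url).path).name or "subrepo"
    target = PurePosixPath(parent_clone).parent / f"{name}.fossil"
    return str(target), True
```

With the layout from the usage example, a subrepo at a URL ending in /lib and a parent clone in ~/museum would land in ~/museum/lib.fossil unless a clone of that URL is already known.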

Update and Sync Workflow

In normal operation, update and sync are done from the parent repo level and are transparent to the user, except for the extra time they take.

To switch a subrepo to another version and record that version as the new association for the parent repo's currently checked-out branch:

  $ fossil up --subrepo 2017-03-15 lib      # roll back to earlier stable version

Without the --subrepo option, it just updates the subrepo directory without changing the association: saying fossil up at the parent level will roll it forward to either trunk or 2017-04-01, depending on which parent checkout directory we're in, under the usage scenario at the top of this proposal.

Updating at the parent level may change the subrepo contents:

  $ cd ~/src/mainproj/trunk
  $ fossil up feature-branch                # changes lib accordingly

To change the clone URL for a subrepo:

  $ cd subrepo ; fossil sync https://example.com/subrepo

Rollback Behavior

The initial version of this feature does not have to attempt to handle inter-repo rollbacks. It can be implemented with subprocesses, much as fossil all currently does.

If you have changes in both the parent repo and one of its subrepos, and then give a checkin command at the parent level:

A failure down in a subrepo stops processing. The changes to neither project are checked in: the subrepo because it was rolled back by the failure, and the parent because the child fossil instance died with a nonzero exit status due to the failure at the subrepo level. (If it were otherwise, the parent repo could have a checkin that depends on a change at the subrepo level that did not get checked in.)
A failure at the parent level results in the checkin to the subrepo (the leaf) being kept, with the parent repo changes not yet checked in.
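The two cases above fall out of a plain depth-first walk. Here is a minimal Python sketch of that ordering, assuming a hypothetical `Checkout` tree and a `commit` callback that stands in for running `fossil ci` in a subprocess and returning its exit status. Neither name exists in Fossil.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Checkout:
    """Hypothetical model of a checkout and the subrepos opened inside it."""
    path: str
    subrepos: List["Checkout"] = field(default_factory=list)


def checkin_depth_first(co: Checkout, commit: Callable[[str], int]) -> bool:
    """Commit every subrepo before its parent; stop at the first failure.

    `commit(path)` is assumed to behave like a child fossil process:
    it returns 0 on success and nonzero on failure.
    """
    for sub in co.subrepos:
        if not checkin_depth_first(sub, commit):
            # A subrepo failed, so the parent is never committed.
            return False
    # All subrepos are in; a failure here leaves their checkins in place.
    return commit(co.path) == 0
```

No cross-repo rollback is attempted: a subrepo checkin that succeeded stays checked in even if the parent's commit then fails.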

Inter-repo rollbacks might be neat, but it's an edge case that doesn't need to be solved any time soon.

DB Requirements

Fossil needs to remember only a few things per branch to support this:

The list of subrepos.
The VERSION string specified for each subrepo, if any.
The relative path of each subrepo within the parent.
The clone URL for each subrepo, with which it can look up the local clone file using the underpinnings of fossil all.

I've thought about whether this should be stored in the blockchain, and I think the answer is, "No."

The clone URLs definitely should not be stored in the blockchain: repo URLs may change, if only because a site stops allowing HTTP clones, so if the URL were in the blockchain, moving the repo would prevent you from checking out old versions of the parent project.

That then calls into question whether any of the rest should be in the blockchain, since now we've implicitly got a new DB table mapping branch names to subrepo clone URLs. Why not store the rest of it there, too? The only reason I can think of is that you might want subrepos to be tied to individual checkins, in the same way that branch names are currently handled.

That seems like a bogus argument to me, though: I'm happy associating a subrepo with a branch. I see no reason to make it work with individual checkins. The set of subrepos should not be changing that rapidly. If you need a particular checkin to have a different set of subrepos than its ancestor checkin, create a new branch for that checkin.
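To make the per-branch table concrete, here is one possible shape for it, sketched with Python's sqlite3 module. The table name `subrepo`, its column names, and the helper `subrepos_for_branch` are invented for illustration; nothing like this exists in Fossil's schema today.

```python
import sqlite3

# Hypothetical schema: one row per (parent branch, subrepo directory).
SCHEMA = """
CREATE TABLE subrepo (
    branch  TEXT NOT NULL,  -- parent branch the association belongs to
    path    TEXT NOT NULL,  -- subrepo directory, relative to the parent checkout
    version TEXT,           -- VERSION string given at open time; NULL if none
    url     TEXT NOT NULL,  -- clone URL, used to find or create the local clone
    PRIMARY KEY (branch, path)
);
"""


def subrepos_for_branch(db, branch):
    """Return (path, version, url) rows for one parent branch, sorted by path."""
    return db.execute(
        "SELECT path, version, url FROM subrepo WHERE branch = ? ORDER BY path",
        (branch,),
    ).fetchall()
```

For the usage example at the top, trunk would get a row for lib with a NULL version, and v1 would get a row for lib with version 2017-04-01.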

(2) By anonymous on 2018-09-07 23:16:12 in reply to 1

What problem do you intend to solve with the submodules?

Submodules could be thought of as symbolic links and as such are more external to the repository. So it's more of a system concern how to tie in all the pieces. Perhaps it could just be scripted.

Currently, Fossil repo contains all the data needed to restore its historical state. If submodules are by themselves independent repos, then in such a case the super-repo's dependencies are spilling out. Since external/submodule repos are not directly under control of the super-repo, the respective repos potentially could have differing remote-urls, usernames, schema versions or gone missing altogether.

Having Fossil manage these external concerns seems to bring in more complications than benefits.

(3) By Warren Young (wyoung) on 2018-09-08 00:14:05 in reply to 2

> What problem do you intend to solve with the submodules?

The same ones Git people use submodules for. :)

The main place I use nested opens is in a large repository that needs to share a small subset of its content among each open branch, with each branch seeing the same content. That rules out just checking that subset into the same tree, since then the small subset could have version drift between the branches.

As it stands, every time I open a new checkout, such as to test out a temporary feature branch, I have to remember to create the subdir for the subrepo and check it out too, else the main project won't build. It's a tedious annoyance; read: automatable.

Another common use would be to have a subrepo holding libraries shared across many projects, as I implied by my examples above. This is a less compelling case than my previous one, since you can just install all of the libraries' headers, shared objects, etc. in /usr/local and then link to them from everywhere. Still, it might be nice to vi lib/somehelper/foo.c, build it, test it against one project, then check the whole thing in with a single command. If you share libraries via /usr/local, you have to reinstall the library for each test, which means you're potentially overwriting good versions with bad.

I've also frequently seen Git submodules used to make one Git-based project depend on someone else's Git-based project. Without submodules, you must ask your users to clone from multiple locations, then build them in the proper order, then install them before your main project will build. With submodules, you say "clone && configure && make". Fossil pulls it all down into a known directory structure, where your configure and make scripts can find everything.

> If submodules are by themselves independent repos, then in such a case the super-repo's dependencies are spilling out.

External dependencies are a fact of almost all development. Even Fossil has external dependencies, and it goes to heroic efforts to avoid them.

> the respective repos potentially could have differing remote-urls

How?

I'll grant that external projects may move about from time to time, but I addressed that in my proposal above.

If you're saying that each end user of my project might somehow see the external dependency's repo at a different URL than I do, then that's on me as the project maintainer: I need to point to a stable source.

I considered something like that while writing the proposal. I might have a project where there are 2+ independent Fossil repos under my direct control, one dependent upon the other(s), so for speed I might prefer to use a LAN URL to set up the subrepo relationship — e.g. http://192.168.0.42/subrepo instead of https://example.com/subrepo — but that's no good. Since the subrepo relationship is part of the public repo, I need to clone both repos from my public server so that the subrepo relationship uses Internet-facing URLs.

> ...usernames...

The clone URL is part of the local Fossil configuration in ~/.fossil. Each user has their own clone URL.

There's a valuable thought in there, though: the DB table in the parent repo should have the user names stripped from it, and Fossil should do URL matching without considering the user name. If I clone something as https://wyoung@example.com/subrepo, my local Fossil knows my user name and may optionally know my password, but if I then sync that parent repo up to a public Internet server, it should only declare that the subrepo source is https://example.com/subrepo.

When someone clones that parent project, Fossil will do user name guessing just as it does in all other conditions.
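The user-name stripping and matching could look something like this in Python. It is a sketch only: `strip_userinfo` and `same_repo` are made-up names, and a real implementation would need to decide how to treat things like trailing slashes and host-name case, which this one ignores.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_userinfo(url: str) -> str:
    """Drop any 'user@' or 'user:password@' prefix from the URL's host part."""
    parts = urlsplit(url)
    # netloc is "user:pw@host:port"; keep only what follows the last '@'.
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def same_repo(a: str, b: str) -> bool:
    """Compare two clone URLs while ignoring who cloned them."""
    return strip_userinfo(a) == strip_userinfo(b)
```

Under this rule, https://wyoung@example.com/subrepo would be published as, and would match, https://example.com/subrepo.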

There's a complication where the subrepo needs to have a different user name than the parent repo for some users. I propose the following fallback procedure:

1. If you clone the parent project anonymously, clone the subrepos anonymously, too. If you give a user name with the parent project's repo URL, use that name for the subrepos, too.
2. If the parent clone succeeds but a subrepo clone fails, try an anonymous clone. It might be a read-only dependency, especially if it's a third-party dependency.
3. If that fails, it's a private subrepo, so prompt for an alternate user name.
4. If that fails, assume the password the user typed in response to step 1 is incorrect and re-prompt for the password. On failure, goto 4, not to 3, since the user name is in cleartext, so presumably it wasn't mistyped.
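Here is a rough Python rendering of that fallback chain, to show the control flow. Every name in it is hypothetical: `try_clone`, `ask_user` and `ask_password` are callbacks supplied by the caller, not Fossil functions. It also makes two assumptions the procedure above does not state: the alternate user name prompt in step 3 also asks for a password, and the password retry in step 4 is capped at `max_password_tries`.

```python
def clone_subrepo(url, parent_user, parent_password,
                  try_clone, ask_user, ask_password,
                  max_password_tries=3):
    """Return the user name the clone succeeded with (None means anonymous).

    try_clone(url, user, password) -> bool   attempts one clone
    ask_user(url) -> str                     prompts for a user name
    ask_password(url, user) -> str           prompts for a password
    Raises RuntimeError if every attempt fails.
    """
    # Step 1: reuse the identity of the parent clone (None = anonymous).
    if try_clone(url, parent_user, parent_password):
        return parent_user

    # Step 2: try anonymously, unless step 1 already was anonymous.
    if parent_user is not None and try_clone(url, None, None):
        return None

    # Step 3: private subrepo, so ask for an alternate user name.
    user = ask_user(url)
    if try_clone(url, user, ask_password(url, user)):
        return user

    # Step 4: keep the user name and re-prompt for the password only.
    for _ in range(max_password_tries):
        if try_clone(url, user, ask_password(url, user)):
            return user

    raise RuntimeError(f"could not clone subrepo {url}")
```

Passing the clone and prompt operations in as callbacks keeps the ordering logic separate from any network or terminal code, which also makes it easy to exercise with fakes.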

Most of the time, this should give the correct result in the first step or two. The only time you need to get down to step 3 is if you're cloning a repo with subrepos and you need to use two+ different names, perhaps because you're depending on a repo managed by someone who won't give you your preferred user name. That'll be a rare occurrence.

If you're in that situation but an anonymous clone works, so you get out early at step 2, you can later upgrade it to a "nymous" clone with a fossil sync command, just as with normal clones today. Even then, you might get away with an anonymous subrepo clone for quite some time, if you happen to not need to modify the subrepo very often.

> ...gone missing altogether.

That's no different from the alternative where you depend on a third-party tarball.

If your argument is that you should just host the third-party source in your main repo, then you don't need either third-party tarballs or submodules at all.

(4) By skywalk on 2018-09-08 00:27:55 in reply to 2

Fossil was created to support SQLite, a smaller project than an OS or other vastly complex control system.
Linking many Fossil repos by version seems a robust way to attack the scaling issue of 1 massive Fossil repo.