Fossil User Forum

Splitting a repository

(1) By Michael Durian (durian) on 2024-07-02 18:42:31

In a previous post, I asked about splitting a repository and was told there wasn't a built-in command to do it. I was also advised to look into reposurgeon to see if it might provide a way. I did check reposurgeon, but it will only split at a given check-in; it won't split based on directories. So now I'm getting ready to split the repository manually.

I'd like to reconstruct many of the existing branches and tags. Does anyone have any advice to offer on the best way to reproduce them? For branches, I guess that means determining which check-in originated the branch and recreating it at that change in the parent branch (which must also be determined). Tags seem more straightforward, though I'd want to make sure I recreate them in the correct branch.
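Finding where each branch started can be reduced to a small graph question, sketched below in Go. It assumes you have already extracted each check-in's hash, primary parent, branch name and timestamp from the original repository; the `CheckIn` and `Origin` types and the `forkPoints` function are hypothetical. The earliest check-in on a branch is its start, and that check-in's parent is the fork point in the parent branch (the same pair Fossil exposes as `start:BRANCH` and `root:BRANCH`).

```go
package main

import (
	"fmt"
	"sort"
)

// CheckIn is a hypothetical record extracted from the original repository.
type CheckIn struct {
	Hash   string
	Parent string // primary parent hash; "" for the initial check-in
	Branch string
	Time   string // "YYYY-MM-DD HH:MM:SS", so string order is time order
}

// Origin describes where a branch begins.
type Origin struct {
	Start        string // first check-in on the branch (like start:BRANCH)
	Root         string // parent of Start (like root:BRANCH)
	ParentBranch string // branch that Root belongs to
}

// forkPoints finds, for every branch, its earliest check-in and the
// check-in it was forked from.
func forkPoints(cis []CheckIn) map[string]Origin {
	byHash := make(map[string]CheckIn, len(cis))
	for _, c := range cis {
		byHash[c.Hash] = c
	}
	sorted := append([]CheckIn(nil), cis...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Time < sorted[j].Time })

	out := make(map[string]Origin)
	for _, c := range sorted {
		if _, seen := out[c.Branch]; seen {
			continue // only the earliest check-in per branch matters
		}
		o := Origin{Start: c.Hash, Root: c.Parent}
		if p, ok := byHash[c.Parent]; ok {
			o.ParentBranch = p.Branch
		}
		out[c.Branch] = o
	}
	return out
}

func main() {
	// Placeholder data, not real hashes.
	cis := []CheckIn{
		{"a1", "", "trunk", "2020-01-01 00:00:00"},
		{"b2", "a1", "trunk", "2020-01-02 00:00:00"},
		{"c3", "b2", "feature", "2020-01-03 00:00:00"},
		{"d4", "c3", "feature", "2020-01-04 00:00:00"},
	}
	origins := forkPoints(cis)
	for _, name := range []string{"trunk", "feature"} {
		o := origins[name]
		fmt.Printf("%s: start=%s root=%s parent=%s\n", name, o.Start, o.Root, o.ParentBranch)
	}
}
```

Ordering by timestamp assumes the recorded clocks were sane; a branch name that was reused after being closed would need to be split into separate runs first.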

Thanks, mike

(2) By Warren Young (wyoung) on 2024-07-03 02:42:48 in reply to 1

> it will only split at a given check-in

Nonsense. If reposurgeon allows keeping the history of a single file, you can certainly use it to keep the history of a single directory. Delete every path you don't want to keep, so that it doesn't appear in the output.

There's an example of this in the program's Quick Start:

    1..$ delete path /documents/.*.pdf/

(3) By Michael Durian (durian) on 2024-07-03 04:17:24 in reply to 2

Warren,

Thank you for the reply. When I last looked into this, I saw there was an explicit 'reposurgeon split' command and when I determined it did not do what I needed, I did not investigate further. I will do so now, with your advice.

However, I do have concerns about reposurgeon in general. I believe it uses the fossil git import/export feature to work its magic. When I first transitioned my git repository to fossil, I encountered problems that had lasting effects, so I'm a bit hesitant to use that feature again.

Maybe it won't be a problem any more, now that I've moved past the co-mingled branches problem. This time I will be going from fossil into reposurgeon (via the git format), instead of from git into fossil, which might be a smoother transition. Still, if anyone has advice on how I could do the split manually, I'd be interested in hearing it as an alternative.

mike

(4) By hanche on 2024-07-03 09:33:46 in reply to 3

> However, I do have concerns about reposurgeon in general. I believe it uses the fossil git import/export feature to work its magic.

What makes you think so? I find this in the reposurgeon docs (my emphasis):

> Fully supported systems (those for which reposurgeon can both read and write repositories and the support has been tested) include git, hg, bzr, brz, fossil, darcs, RCS, and SRC.

As far as I understand reposurgeon, it works by editing git-fast-import files. That is just a textual file format, presumably so named because it was a git feature first.

So yes, you are of course limited to what that file format supports. If that is what you meant to say, I agree with your reservation. But I don't think it uses a git repository as an intermediate, which is how I interpreted your statement at first. But now I am not so sure.

(6) By Michael Durian (durian) on 2024-07-03 15:35:55 in reply to 4

Hi hanche,

Looking at vcs.go in the reposurgeon source, I find the following:

                {
                        // Styleflags may need tweaking for round-tripping
                        name:         "fossil",
                        subdirectory: "", // There's a special case in manages()
                        requires:     newStringSet("fossil"),
                        exporter:     "fossil export --git",
                        quieter:      "",
                        styleflags:   newOrderedStringSet(),
                        extensions:   newOrderedStringSet(),
                        initializer:  "fossil init .fossil && fossil open .fossil",
                        pathlister:   "", // fossil extras is the inverse of this
                        taglister:    "fossil tag list",
                        branchlister: "fossil branch list", // Should we list with --all? Unclear...
                        importer:     "fossil import --git",
                        checkout:     "",
                        viewer:       "", // fossil ui looks tempting but has no clean exit.
                        prenuke:      newOrderedStringSet(),
                        preserve:     newOrderedStringSet(),
                        authormap:    "",
                        ignorename:   ".fossil-settings/ignore-glob",
                        dfltignores:  "", // ignore-glob is empty by default
                        cookies:      nil,
                        project:      "https://fossil-scm.org/",
                        notes:        "",
                        idformat:     "%s",
                        flags:        ignGLOB | ignQUES | ignCARET | ignESC | ignGSTAR,
                },

It looks to me like it interfaces with fossil via fossil's git import/export feature. That's what I meant; I did not mean that it creates an intermediate git repository.

As I mentioned in one of the earlier posts I linked to, there appears to be a case where fossil dereferences a null pointer during a git import/export. I have no knowledge of the git exchange format, so I wasn't able to track the problem down any further. I don't know what in my git repository triggered the bug, but it has made me a bit leery of using reposurgeon, since it appears to use the same git import/export feature.

Here's what I said in the linked post. That post is a couple years old now, so it might not be true any more:

> Taking a quick look at import.c and export.c, I see a potential problem. export_mark() in export.c calls mark_name_from_rid() to retrieve a mark before writing it out. It does not check that return value against NULL before using it.
>
> The comment for mark_name_from_rid() says NULL is returned if the rid does not have an associated UUID (i.e., is not valid). Determining why that might be the case is a bit beyond me.

mike

(5) By Warren Young (wyoung) on 2024-07-03 09:38:22 in reply to 3

Okay, new plan: fossil deconstruct your repo, then throw grep/sed/awk/perl/holy-hand-grenade… at the manifests until thine foes be vanquished. Reconstruct it from the battle debris and pray to $DEITY that it comes back together properly.

(7) By Michael Durian (durian) on 2024-07-03 15:37:20 in reply to 5

Thanks Warren,

I will investigate fossil deconstruct, too.

mike

(8) By Michael Durian (durian) on 2024-07-05 18:55:32 in reply to 7

To follow-up, I gave reposurgeon a try, but it panics and crashes trying to read the repository. Whatever problem was introduced when I first imported the git repository must still be present.

With the reposurgeon option eliminated, I'll need to do it the hard way. I think I'll start with a new, empty repository and re-introduce key events from the original repository (omitting anything from the directories I want to exclude). I'll use the root: and start: tag modifiers to reproduce the start of branches, and grep the output of fossil timeline for MERGE to locate when merges occurred. Reproducing tags should be straightforward.

I haven't decided yet if I want to try to reproduce every individual check-in. I might just aim for the key events mentioned above. I suppose it depends on how much I can automate and how much I need to do manually.

(9) By Andy Bradford (andybradford) on 2024-07-05 21:21:06 in reply to 8

> I think I'll start with a new, empty repository and re-introduce key
> events from the original repository (omitting anything from the
> directories I want to exclude).

Keep in mind that both "fossil init" (also available as "fossil new")
and "fossil commit" have a --date-override argument that you can use to
preserve the timestamps of the commits. So if/when you start with a new,
empty repository, you can make the "initial empty commit" have a date
that predates your first actual commit, e.g.:

fossil new --date-override "2019-12-31 00:00:00" project.fossil

And when committing:

fossil commit --date-override "2019-12-31 03:14:15" -m "real commit"

Andy
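If the replay is scripted, a small helper can keep each original timestamp and comment together. In the Go sketch below, `Event` and `commitCmd` are hypothetical; the Fossil options used are `--date-override` and `-m` from the commands above, plus `fossil commit --branch` for the first check-in of a new branch. The command is built with `os/exec` rather than a shell string, so comments containing quotes need no escaping.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// Event is a hypothetical record of one check-in to replay.
type Event struct {
	Date    string // "YYYY-MM-DD HH:MM:SS", copied from the original check-in
	Comment string
	Branch  string // non-empty only for the first check-in of a new branch
}

// commitCmd builds (but does not run) the fossil commit for one event.
func commitCmd(e Event) *exec.Cmd {
	args := []string{"commit", "--date-override", e.Date, "-m", e.Comment}
	if e.Branch != "" {
		args = append(args, "--branch", e.Branch)
	}
	cmd := exec.Command("fossil", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd
}

func main() {
	e := Event{Date: "2019-12-31 03:14:15", Comment: "real commit"}
	fmt.Println(commitCmd(e).Args)
	// In an open checkout with the files already in place, the command
	// would be executed with commitCmd(e).Run() and its error checked.
}
```

Replaying events in chronological order keeps each new check-in's date later than its parent's, which is what Fossil normally expects.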

(10) By Michael Durian (durian) on 2024-07-05 23:38:58 in reply to 9

Hi Andy,

Thanks, that will be useful.

The fossil timeline command can output a phase value, but I can't find where this is documented. It looks like interesting events such as tagging and branching happen in check-ins with phase LEAF, but there are also MERGE and BRANCH phases. I'm wondering if I can streamline things by only looking at check-in artifacts with these phases. Is this a valid approach (if I'm not trying to capture every individual file change), or do I need to inspect everything?