fossil repos: lump or split?

(1) By matt w. (maphew) on 2024-04-30 21:09:51

How do folks like to manage their fossil projects (particularly small to mid-size ones)? In general, is lumping or splitting your preferred default and why/how?

As my use of fossil for this, that, and the other thing is slowly growing I find myself wondering how to best manage the plethora of .fossil files I'm incubating. I'm asking myself if it's a good long term habit to lay a new one for each new idea, that may or may not prove out. Perhaps it's better to have one or few master repos that contain a lot of projects.

I fully expect the answer is "it depends" on a whole host of things that are idiosyncratic to each person. That said, if you have the time and inclination to share, I'm interested in what your experience is and what works for you.

thanks!

(2) By Stephan Beal (stephan) on 2024-04-30 21:19:41 in reply to 1

> I'm asking myself if it's a good long term habit to lay a new one for each new idea, that may or may not prove out.

FWIW, my current collection has very close to 70 repositories and the maintenance involved is about the same as it would be for a handful of "mono-repos." Every independent set of files has its own repo, without exception.

As you say, it's largely a matter of personal taste, and there's no accounting for personal taste.

(3) By Andy Bradford (andybradford) on 2024-05-01 05:11:44 in reply to 1

>  I'm asking myself if it's a good long term habit to lay a new one for
> each new idea, that may or may not prove out.

That's primarily how I organize them. I have one Fossil file per group
of related files. Sometimes that means just one file per Fossil. Notice
the breakdown here. In this data set, the first number is the total
number of files in a project, and the second number is how many projects
have that number of files:

$ fossil all dbstat | grep files: | awk '{ a[$2]++ } END { for (b in a) printf("%7s\t%3d\n", b, a[b]) }' | sort -k 2nr
      1  35
      2  17
      3   5
      4   3
      0   2
      9   1
     11   1
     13   1
     37   1
    159   1
    185   1
    243   1
    450   1
    788   1
    822   1
  1,274   1
  1,321   1
  1,392   1
  1,875   1
  1,876   1
528,030   1

From this you can see that of the 78 or so Fossil repositories that I
regularly use, the largest group (35) have just 1 file in them.
The last one, with 528,030 files, is the NetBSD SRC repository.
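
For readers who want to reproduce that breakdown, the one-liner above can be unpacked into a small function. This is only a sketch: the name files_histogram is made up for illustration, and it assumes, as the original pipeline does, that each repository's dbstat output contains a line whose first field is "files:" and whose second field is the count.

```shell
# Hypothetical helper: tally how many repositories contain each file count.
# Reads "fossil all dbstat" style output on stdin.
files_histogram() {
  awk '
    # $2 is the per-repo file count, kept as text so "1,274" survives intact
    $1 == "files:" { count[$2]++ }
    END { for (n in count) printf "%7s\t%3d\n", n, count[n] }
  ' | sort -k 2nr    # most common file count first
}
```

Usage would look like "fossil all dbstat | files_histogram".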

At one point in time I did have more files in a single "master"
repository, but then I found that I was making changes and the files
that were "grouped" within didn't really belong together, which made
tracking changes more challenging, and I ended up creating branches for
unrelated changes, if that makes any sense.

As you say, "it depends".

Andy

(4) By Doug (doug9forester) on 2024-05-01 15:18:56 in reply to 1

I use a separate subfolder for my "little" projects and one repo for all of them. I have individual makefiles for each project and one outside that runs them all. I can build individual projects or the whole lot. I have the whole thing in ~/Documents/AutoHotkey_fossil/trunk and my applications run the code from ../trunk/dss/build, for example. For major changes to a project, I make a test folder one level up and check out into it, make and test changes, then update the trunk. Been using it for a couple of years and love it. I set up a mintty.exe shortcut under Cygwin.
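
The "one outside that runs them all" arrangement can be approximated in shell. This is a stand-in rather than the actual outer makefile described above: the name build_all and the layout (one subfolder per project, each with its own Makefile) are assumptions.

```shell
# Hypothetical stand-in for an outer makefile: run make in every
# immediate subfolder of the given root that has its own Makefile.
build_all() {
  root=${1:-.}
  for dir in "$root"/*/; do
    [ -f "${dir}Makefile" ] || continue   # skip folders without a makefile
    echo "== building ${dir%/}"
    make -C "$dir" || return 1            # stop at the first failing project
  done
}
```

Calling "build_all ." from the trunk folder builds the whole lot; a single project is still built by running make inside its own folder.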

(5) By Warren Young (wyoung) on 2024-05-01 16:46:15 in reply to 1

> In general, is lumping or splitting your preferred default and why/how?

Yes, for a reason sufficiently simple that I'm willing to treat it as an axiom of software development: if two different directories contain files with independent lifetimes, they need to live in independent repos.

Corollary: The only time two or more files should live in a single repo is when they need to be versioned in lockstep.

This does not always occur here, but that's because it's an ideal I strive to uphold, not a law I'm bound by. The only penalties I pay for my failure to achieve the ideal are personal.

The primary exception is "junk drawer" repos, like the one for the locally-written contents of ~/bin. Each file has an independent lifetime, for the most part, since few of these programs interoperate, or even, for that matter, cooperate. Yet, I put them all together because I want them checked out together and updated together, in a single step.

That wish does not apply between separate code repos. If you want "update together," Fossil gives us "fossil all push/pull/sync", though not "fossil all update". You do still have to "fossil up" in all repos, but that's habit by now. Combined with a modern shell configured to flag changes in repo checkout directories, I even get a visual warning when I don't update after a cd.
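
The "fossil up in all repos" habit can be scripted. The following is a sketch, not a built-in Fossil feature: update_checkouts is a hypothetical name, and it expects one checkout directory per line on stdin, however that list is produced. ("fossil all ls --ckout" should print such a list on recent Fossil versions, but treat that as an assumption and check "fossil help all" first.)

```shell
# Hypothetical helper: run "fossil update" in each checkout directory
# named on stdin (one path per line).
update_checkouts() {
  while IFS= read -r dir; do
    if [ ! -d "$dir" ]; then
      echo "skipping missing checkout: $dir" >&2
      continue
    fi
    echo "== $dir"
    # The subshell keeps the cd local; </dev/null stops fossil from
    # consuming the rest of the directory list.
    (cd "$dir" && fossil update </dev/null) || echo "update failed: $dir" >&2
  done
}
```

A failure in one checkout is reported on stderr and the loop moves on, so one stale or broken checkout does not block the rest.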

> how to best manage

If the concern is keeping their skins and such in sync, Fossil gives you the "fossil conf export/import" pair to copy configuration changes from one to the other.
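
As a sketch of that export/import pair: the helper below exports one configuration area from a source repository and imports it into each destination. The name copy_config and the file names in the usage line are placeholders; "skin" is one of the areas the configuration command accepts, and -R selects the repository to operate on.

```shell
# Hypothetical helper: copy one configuration area (e.g. skin) from a
# source repository to one or more destination repositories.
copy_config() {
  area=$1 src=$2
  shift 2
  tmp=$(mktemp) || return 1
  if fossil configuration export "$area" "$tmp" -R "$src"; then
    for dst in "$@"; do
      fossil configuration import "$tmp" -R "$dst" || echo "import failed: $dst" >&2
    done
  fi
  rm -f "$tmp"
}
```

For example, "copy_config skin main.fossil other1.fossil other2.fossil" would push the skin from main.fossil into the other two.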

If you're trying to keep user table changes in sync, I suggest setting up login groups.

> Perhaps it's better to have one or few master repos

In my extremely arrogant and personal opinion, monorepos suck.

The fact that there's a word for what I rail against should tell you my opinion isn't universally held, however. Go do a search if you want rationalizations for the opposing opinion.

(6) By matt w. (maphew) on 2024-05-01 22:30:02 in reply to 5

Thank you for your thoughts everyone.

Combining those with my hot new discovery yesterday of "fossil ui /", courtesy of the Stupid Fossil Tricks thread, has me continuing down the path I started: manyrepos instead of monorepo. Only now my shoulders are lighter; it doesn't look dark ahead after all. ;-)

(7) By patmaddox on 2024-05-06 18:29:09 in reply to 1

I am a big fan of monorepo for my personal stuff. I have 200 GitHub repos - and that doesn't count however many hundreds of others I made locally and never published. There's overhead in keeping track of all of them - and critically, no good way of getting an overview of everything.

With Fossil, I stick any non-private stuff in a single repo, and have a separate repo for private stuff. I don't have to think of which repo something goes in, whether I need to create a repo, clone an existing one, etc. I make the file I want and commit. I've got config files, example files, scripts, source code for personal projects, and blog posts / articles. I frequently link to these from mailing lists, discussion forums, and chat rooms. I was having dinner with some friends the other night, we were talking about some configuration settings and I was able to pull it up on my website. I generally want to make my work as shareable as possible, and keeping it in a single repo published as a Fossil website makes that really easy to do.

Even if I did use multiple repos, I would still want a single higher level repo that treats the other repos as dependencies. In fact, my monorepo has some tooling to interact with git repos from open source projects. That approach is inspired by FreeBSD ports, which is pretty much a bunch of makefiles but fetches tarballs / git repos as dependencies.

I really like that I can view a single Fossil timeline to get an overview of the work I've been doing. I also tag commits to create timelines focused around a particular project or theme.

(8) By Warren Young (wyoung) on 2024-05-07 09:16:24 in reply to 7

> no good way of getting an overview of everything.

A single timeline for the --repolist and fossil ui / modes might be fun to have, though it feels more like a stunt than a necessity to daily operation. The only practical case I can think of where I might want to see all timelines merged is if I'm trying to find out where I've been idle, not working on any of my repos.¹

The fact is, when I'm looking at a timeline, I'm more concerned with what has been going on in that single project than what is going on in other projects, even when they're related in some manner.

The closest thing I have to a monorepo is my company's main work product, which is monolithic largely because it started in CVS, where the single $CVSROOT environment variable made having more than one repo a hassle. That translated into our migration to Subversion, which in principle would've freed us from this single-root tyranny, but the tools didn't exist at the time of conversion to dice it up. Migrating from there to Fossil, we were stuck with this decades-old decision; this occurred long before ESR created reposurgeon, and although we could use that now at this late date, it feels too risky.

My company does have other repos, created after the migration from Subversion, and I can't imagine any argument good enough to make me want to merge them into the main one.

> I don't have to think of which repo something goes in

This is the notion of "junk drawer repos" I brought up above. Much as every kitchen needs a drawer where all the random stuff goes, any long-term Fossil user is expected to end up with at least one of these.

Long-term users are also expected to end up with coherent projects which either have zero strong ties to any other, or at least communicate along well-defined boundaries. A good example is a C library installed to /usr/local/{include,lib} to produce an API/ABI that other programs can link to without being involved with the commits that led to the latest "sudo make install" call. Before that point, testing is done internally to that project via "make test" and such.

You bring up FreeBSD, which is a borderline case. In principle, it's a good thing that a change to libc can propagate instantly to all of the /bin utilities — to pick an example — but I'd still rather do that in a controlled fashion via "make install" than cause a complete-tree rebuild merely because someone inadvertently touched a central header like stdio.h while doing other things.

Even FreeBSD has multiple core repos, and as far as I can tell, they haven't vendored either GCC or Clang even though…

$ pdp11 /path/to/unix/v7/simh.conf
…
login: dmr
$ wc -l /usr/src/cmd/cc.c
467 cc.c

Things were different back then!

If even FreeBSD's main /src repo doesn't include core tools like the C compiler, you have to ask how much value monorepos have in practice.

> I frequently link to these from mailing lists, discussion forums, and chat rooms.

As do I for the manyrepos backing my public web site. This is why I have a complicated nginx-based setup which I documented for the Fossil project: it lets me tie the dozen Fossil repos backing it into a single URL scheme, interspersed among the static portions, which are stored in a thirteenth repo, separate from the others.

If you want to advocate for monorepos, tell me why there needs to be a unified timeline mixing commits to my MikroTik Solutions and PiDP-8/I repos. There is zero common basis between them. The only place I can conceive of where those two repos' contents come together is at my Internet gateway router, where the PiDP-8/I may create network traffic that crosses it. They are otherwise utterly independent.

Even then, the MikroTik Solutions site is primarily a blogging and software distribution platform. I've published elements of my gateway router's configuration there, but only the public parts. The full configuration contains private details held in still another repo, and it is that repo's content that is most likely to affect the operation of my PiDP-8/I, such as by the firewall rules it applies. Even then, I still don't need a single timeline to correlate why my PiDP-8/I suddenly can't talk to the Internet, or whatever.

> I generally want to make my work as shareable as possible

I have much the same sentiment, but I think it's a fair guess that I am a union set size of ONE who cares about all 13 repos backing my public web site. If even I don't care to see all timelines at once, then who else would?

I expect it's a rare case for any single other person on the planet to care about more than a few of them, and even then, when they visit one of my public Fossil timelines, they care to see what I've been doing on that single project, not everything I've been up to at once.

> I really like that I can view a single Fossil timeline to get an overview of the work I've been doing.

Me, too, but only within a single project.

¹ And if that is the case, then I was idle for a reason. To the extent that I need to justify gaps in my global timeline due to spending time playing games or shopping for groceries or sleeping or visiting people, Fossil is not the tool for doing it!

(9) By patmaddox on 2024-05-07 18:06:07 in reply to 8

> Even FreeBSD has multiple core repos

Yes, three: src (the operating system), ports (third-party software), and doc (documentation).

Each repo represents a broad-ranging "product" I suppose you could say... and I think each one is more monorepo-like than not.

Perhaps it's less useful to think about monorepo-vs-manyrepo, and instead think about properties of repos - similar to test desiderata rather than arguing over whether a particular test is in fact a unit test.

> and as far as I can tell, they haven't vendored either GCC or Clang even though ... If even FreeBSD's main /src repo doesn't include core tools like the C compiler, you have to ask how much value monorepos have in practice

clang is vendored.

That is a key property of BSD systems: kernel and world are part of the same source tree and are released in lockstep. Check out the src repo, make buildworld && make buildkernel and you have a working operating system.

It's one of the things that attracted me to FreeBSD in the first place. With two repos (src and ports), I have an index to nearly every line of code that runs on my system (wifi and GPU driver blobs being the main exception). If it's part of the operating system, I look at src. If it's third-party software, I look at ports, from which I can find the exact tarball versions used to build that third-party software. I value that cognitive simplicity of only having two entry points to every line of code that runs on my computer. Well three, when you include my monorepo ;)

Thus I came to the opposite conclusion as you: if the various BSDs work well as monorepos, then perhaps that approach would work well for me too. It has turned out to be true in practice - I've been more creative in the limited time I have available, which is ultimately what I care about.

> I have a complicated nginx-based setup

I have a simple haproxy-based setup.

I am aiming for simplicity in my setup, and minimal context required to operate it. I may not work on a part of it for weeks or months at a time - so I want to minimize the time required to re-load context. I want to write the code, back it up, and ship it. fossil commit handles the last two for me.

> If you want to advocate for monorepos, tell me why there needs to be a unified timeline mixing commits to my MikroTik Solutions and PiDP-8/I repos.

They're your repos man, rock on :)

As for me: I've already stated that I value having a single repo for my personal work, and timeline to provide context.

Some other things I like that result from this setup:

- I don't have to track projects and tasks in a separate tool - open leaves represent my WIP, and branch/tag wiki pages work well for documentation and task tracking.
- Tags let me create multiple timelines around themes of interest, such as my freebsd tags which pull together examples, config files, notes, articles, and utilities. I don't necessarily know what those themes will be upfront, and so fossil amend <commit> --tag <tagname> comes in handy.
- Fossil's built-in Markdown support makes it trivial to publish articles, no static site generator required. I can publish complete source code and build scripts alongside the articles.
- I don't have to classify ideas upfront - I can capture and evolve them. An idea might begin life as an experiment and then grow to become a code project or an article.
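
The retroactive tagging mentioned in the list above lends itself to a small loop. A sketch, assuming the commit hashes are already known and the command is run from inside a checkout; tag_commits is a made-up name, and the only Fossil command used is the "fossil amend <commit> --tag <tagname>" form quoted above.

```shell
# Hypothetical helper: attach one tag to several existing check-ins.
tag_commits() {
  tag=$1
  shift
  for commit in "$@"; do
    fossil amend "$commit" --tag "$tag" || return 1   # stop on the first failure
  done
}
```

For example, "tag_commits freebsd abc123 def456" (placeholder hashes) would add the freebsd tag to both check-ins.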

> I think it's a fair guess that I am a union set size of ONE who cares about all 13 repos backing my public web site. If even I don't care to see all timelines at once, then who else would?

Fair point - equally as fair as https://patmaddox.com primarily serving the interests of Pat Maddox :) who happens to really, really like having a single over-arching timeline that can be subdivided into more focused timelines. Couple that with fossil clone https://patmaddox.com to get the bulk of source code and content that I care about, and I'm a happy camper.

> When they visit one of my public Fossil timelines, they care to see what I've been doing on that single project

Sure, and for the person who somehow happens to be interested in my password-trainer tool, they can view the history because it's all in one subdir. I can also direct them to specific tags if need be.

(10) By ArchieT on 2024-05-08 21:49:44 in reply to 7

"no good way of getting an overview of everything."

fossil all ui

That gives me a nifty clickable list of all the repos on my machine, wherever they are, and, just as importantly, the columns are sortable. I sort by recent changes so I can see what I've been working on lately, or what I need to get back to. Effectively, it is a "Timeline" for all my work on all my projects.
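
For a terminal-only approximation of that "sort by recent changes" view, repository files can be ordered by modification time. This is a rough sketch: recent_repos is a hypothetical name, it expects one repository path per line (the output of "fossil all ls", for instance), and a repository file's mtime changes on any write, such as a sync, so it is only a proxy for recent work.

```shell
# Hypothetical helper: list repository files from stdin, most recently
# modified first.
recent_repos() {
  set --                                    # start with an empty argument list
  while IFS= read -r repo; do
    [ -f "$repo" ] && set -- "$@" "$repo"   # keep only paths that exist
  done
  if [ "$#" -gt 0 ]; then
    ls -t -- "$@"                           # newest modification time first
  fi
}
```

Usage would look like "fossil all ls | recent_repos | head".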

I open a repo for every project idea I have and begin throwing notes into Forums, sometimes opening Tickets, sometimes digesting my thoughts into Wiki pages with some Pikchr diagrams... sometimes before I write any code or CAD files or PCB files.

(11) By Imran Sher Rafique (Imran786) on 2024-05-09 09:09:19 in reply to 8

Long time lurker, but felt compelled to stick my nose in here. Excuse the interruption ...

> The only practical case I can think of where I might want to see all timelines merged is if I'm trying to find out where I've been idle, not working on any of my repos

Like ergonomics, working styles are very subjective & individualistic. As someone who often juggles multiple projects at any moment, I found that keeping a daily logbook of what I had done that day (not a TODO list, strictly what was actually done) was a great help. It unburdened the mind, and allowed one to quickly grasp what was going on in a particular area, days or even weeks later.

And a great storage area (with context) for scratch notes which did not make it into my personal wiki.

The one bugbear I always had was the repetition between commit msgs & my logbook. So much so that I took to just dumping the commit msgs for that day into the logbook's daily entry.

A fossil monorepo brings with it a unified timeline which basically becomes your daily logbook.

That's really addictive, if you like hacking on different things as your fancy takes you (guilty as charged)
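
Without a monorepo, a similar daily entry can be assembled from several repositories. A sketch under assumptions: daily_log is a hypothetical name, the repository paths are passed as arguments, and it relies on the "fossil timeline after DATE" form with -t ci to restrict output to check-ins and -R to pick the repository. Check "fossil help timeline" for the options and default entry limit in your version.

```shell
# Hypothetical helper: print check-in comments made since a given date,
# one section per repository, ready to paste into a logbook entry.
daily_log() {
  since=$1
  shift
  for repo in "$@"; do
    echo "## $repo"
    fossil timeline after "$since" -t ci -R "$repo" || echo "(timeline failed for $repo)" >&2
  done
}
```

For example, "daily_log 2024-05-09 project-a.fossil project-b.fossil" (placeholder file names) prints one section per repository.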

(12) By Kees Nuyt (knu) on 2024-05-09 10:07:16 in reply to 10

> fossil all ui

... and if you like the terminal, lynx can be your web browser:

fossil settings --global web-browser lynx

(13) By Thomas Hess (luziferius) on 2024-05-10 09:55:03 in reply to 1

Both.

My "proper" projects are in individual repositories. That is software with higher complexity, larger scope, multiple files, etc.

Then I have a bunch of one-file shell scripts or Python scripts that do one singular thing, like fix syntax errors in timestamp formats in SRT subtitles, wrap the CLI scanning tool to output post-processed PDFs, use ffmpeg to turn a network camera into a webcam as if it were a USB camera, etc.

Those live in a single repository named "Shell Tools".

(14) By brickviking on 2024-05-10 10:35:19 in reply to 12

I tried that. Viewing diffs in lynx is... well, it didn't look good, let's put it that way. Most of the rest of the stuff actually appears to work well enough, but diffs didn't appear to look good at all, and I couldn't tell what lines went with what other bits on the screen.

It's a mess. But I guess that's what you get if you use a program that doesn't support JavaScript or even CSS.

Still, yes it'd be a great option for anything that doesn't involve viewing the actual patches that go along with the timeline.

Cheers, brickviking
(Post 29)