Fossil User Forum

FreeBSD Ports conversion to Fossil (mostly successful)

(1.3) By Alastair Hogge (alastair) on 2025-06-24 08:23:56 edited from 1.2

Hello,

I recently tried the Git to Fossil conversion of the FreeBSD Ports Tree:

$ time (git fast-export --all | fossil import --git ../../../fossil/ports/ports.fossil)
8037m56.25s real 7704m10.18s user 278m32.19s system

So the conversion took ≅5.6 days, and the resulting Fossil repository is ≅2.8GiB; during the conversion, the repository grew to ≅6GiB.

The system I ran this on is a Ryzen 9 3950X running FreeBSD-15-CURRENT, with both the Git and Fossil repositories backed by tmpfs. I used cpuset to pin Fossil to one core, leaving the remaining 15 cores for the rest of the system.
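
The timing figures above can be converted into days and a CPU-utilisation figure. The following is a minimal sketch of that arithmetic; the helper name `to_seconds` is made up for illustration, and the CPU time is that of the whole pipeline (git and fossil together), not of Fossil alone.

```shell
# Convert "MmS.SSs" timings (minutes and seconds split by hand) to seconds.
to_seconds() { awk -v m="$1" -v s="$2" 'BEGIN { printf "%.2f\n", m * 60 + s }'; }

real=$(to_seconds 8037 56.25)   # wall-clock time
user=$(to_seconds 7704 10.18)   # CPU time in user mode
sys=$(to_seconds 278 32.19)     # CPU time in the kernel

days=$(awk -v r="$real" 'BEGIN { printf "%.2f", r / 86400 }')
busy=$(awk -v r="$real" -v u="$user" -v s="$sys" \
    'BEGIN { printf "%.1f", (u + s) / r * 100 }')

echo "wall clock: $days days"
echo "CPU time as a share of wall clock: $busy%"
```

A share close to 100% would be consistent with the conversion being CPU-bound on roughly one core rather than waiting on I/O.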

After the conversion, the fossil process reported:

fatal: encountered signed tag 5ad7189351dc2f643f8152cdeac97d686f16a2fe; use --signed-tags=<mode> to handle it
Rebuilding repository meta-data...
100.0% complete...
Vacuuming... ok
project-id: 1cd459054c5dcb97d1d8179448f32ec8a83c9221
server-id: 9f9eee11b2dee4b3a310539558e53a922484ec9d

I do not know what this means.

Opening the Fossil repository was quick:

$ time (fossil open ./ports.fossil)
0m10.65s real 0m06.96s user 0m03.67s system

I played around with fossil ui (it is so refreshing to be using Fossil again over Git) and noticed that the commit messages are missing the author. In FreeBSD, a non-committer might submit a patch to Bugzilla (becoming the author), and a Committer then commits the patch, so FreeBSD VCS logs record two people: the author and the Committer. I only see the Committer in the log.

My plan is to eventually have a Fossil replica of the FreeBSD Ports Tree Git repository for personal use, though I would like to make it publicly accessible too. I am not sure how feasible this is, how long sync times are, or even if it is at all possible.
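
To get a feel for how many commits are affected by the author/Committer distinction described above, one could list the Git commits whose author and committer differ. This is a rough sketch only: the function name `differing_authors` and the sample addresses are made up for illustration, and the input is assumed to come from `git log --format='%h|%ae|%ce'` run inside the Git clone.

```shell
# Read lines of the form "<hash>|<author email>|<committer email>" and
# print only those where the author and the committer are different people.
differing_authors() {
    awk -F'|' '$2 != $3 { print $1 ": author " $2 ", committer " $3 }'
}

# Stand-in sample input; in a real clone this would be:
#   git log --format='%h|%ae|%ce' | differing_authors
printf '%s\n' \
    'abc1234|contributor@example.org|committer@example.org' \
    'def5678|committer@example.org|committer@example.org' |
    differing_authors
```

Only the first sample line is printed, because its two addresses differ.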

(2) By Stephan Beal (stephan) on 2025-06-24 07:19:20 in reply to 1.1

I do not know what this means.

We'll need to wait for someone who knows git to enlighten us on that part, but...

I am not sure how feasible this is, how long sync times are, or even if it is at all possible.

Fossil does not scale well to projects with tens or hundreds of thousands of files. Every checkin has to record a list of every file in the checkin, and for large projects those lists are both huge (a meg or more each) and relatively slow to process[1]. You may have heard that the pkgsrc repository once attempted to use fossil, and that was the (or one of the) significant discoveries from that experiment.


  1. They effectively scale linearly with the number of files, if that means anything to you. E.g. if a manifest with N files takes X milliseconds and Y amount of RAM to process, processing a manifest with 5*N files will take around 5*X milliseconds and require about 5*Y of RAM.

(3.1) By Alastair Hogge (alastair) on 2025-06-24 11:53:27 edited from 3.0 in reply to 2

Fossil does not scale well to projects with tens or hundreds of thousands of files. Every checkin has to record a list of every file in the checkin, and for large projects those lists are both huge (a meg or more each) and relatively slow to process.

$ find ${PORTSDIR} -type f -print | wc -l
167182

That is a lot of files.

You may have heard that the pkgsrc repository once attempted to use fossil, and that was the (or one of the) significant discoveries from that experiment.

I remember the NetBSD pkgsrc experiment. I also remember that a NetBSD committer/user from #tendra on IRC was looking into this many, many years ago.

(4) By Stephan Beal (stephan) on 2025-06-24 08:46:40 in reply to 3.0

That is a lot of files.

Just out of curiosity, what does this say:

fossil artifact tip -R that-repo | wc

e.g. fossil's own repo says:

$ f artifact tip | wc 
    980    2958   83882

(6.2) By Alastair Hogge (alastair) on 2025-06-24 12:25:49 edited from 6.1 in reply to 4

Just out of curiosity, what does this say:

fossil artifact tip -R that-repo | wc

$ fossil artifact tip -R ports.fossil | wc
165712 662838 17761094

Also,

$ time (fossil dbstat -R ports.fossil)
repository-size: 3,059,585,024 bytes
artifact-count: 3,343,805 (stored as 725,756 full text and 2,618,049 deltas)
artifact-sizes: 2,642,228 average, 23,431,265 max, 8,835,094,373,531 total
compression-ratio: 2887:1
check-ins: 732,386
files: 452,277 across all branches
wiki-pages: 0 (0 changes)
tickets: 0 (0 changes)
events: 0
tag-changes: 162
latest-change: 2025-06-18 05:44:50 - about 6 days ago
project-age: 11,265 days or approximately 30.84 years.
project-id: 1cd459054c5dcb97d1d8179448f32ec8a83c9221
schema-version: 2015-01-24
fossil-version: 2025-04-30 16:57:32 [1205ec86cb] [2.26] (clang-19.1.7 (https://github.com/llvm/llvm-project.git llvmorg-19.1.7-0-gcd708029e0b2))
sqlite-version: 2025-04-30 14:37:00 [20abf1ec10] (3.50.0)
database-stats: 746,969 pages, 4096 bytes/pg, 0 free pages, UTF-8, delete mode
0m01.17s real 0m00.68s user 0m00.49s system

$ time (fossil status)
repository: /tmp/ports/ports.fossil
local-root: /tmp/ports/ports/
config-db: /home/agh/.fossil
checkout: 1770de579e2973acf7b90ab5aacef6c1271a26ca 2025-06-18 05:44:50 UTC
parent: 542da823673c706678c420cc3b478a9c1c3c8a0e 2025-06-18 05:09:51 UTC
tags: trunk, origin/2014Q3, origin/2015Q2, origin/2015Q4, origin/2016Q2, origin/2016Q4, origin/2017Q2, origin/2018Q1, origin/2018Q3, origin/2019Q1, origin/2019Q3, origin/2020Q1, origin/2020Q3, origin/2021Q3, origin/2022Q1, origin/2022Q3, origin/2023Q2, main
comment: dns/openresolv: update to 3.16.5 Changes: https://github.com/NetworkConfiguration/openresolv/releases/tag/v3.16.5 Changes: https://github.com/NetworkConfiguration/openresolv/compare/v3.14.0...v3.16.5 (user: driesm@FreeBSD.org)
0m00.41s real 0m00.13s user 0m00.27s system

The newlines in the commit comment are dropped in the output above, and likewise in fossil ui.

A one-line change:

$ time (fossil ci -m "Very small test")
New_Version: d35bbf4ee0b6fbb68fd4c1505845f19ca3741008db1d0ba8caec6b9ca001c2cd
0m11.36s real 0m09.01s user 0m02.33s system

I am happy to experiment and gather further data if there are suggestions. I will also look at the src repository at some point.
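
To put the two `wc` results in this thread side by side: each file in a check-in occupies one F-card line in the manifest, so the line count roughly tracks the file count. The sketch below only divides the numbers quoted above; the helper name `ratio` is made up for illustration.

```shell
# Print a/b to one decimal place.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'; }

# ports manifest vs. Fossil's own manifest, figures taken from the wc output
echo "lines: $(ratio 165712 980)x"
echo "bytes: $(ratio 17761094 83882)x"
```

Under the linear-scaling rule of thumb given earlier in the thread, those factors are a first guess at how much more work each manifest parse costs for the ports tree than for Fossil's own repository.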

(8) By Stephan Beal (stephan) on 2025-06-24 12:44:14 in reply to 6.2

compression-ratio: 2887:1

Ratios of 50:1, even 100:1, are not terribly uncommon, but 1000+:1 is amazing.
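
The reported ratio can be reproduced from the other dbstat figures, assuming it is simply the total uncompressed artifact size divided by the repository file size (an assumption, though the numbers agree). A minimal check using shell integer arithmetic:

```shell
total=8835094373531   # "artifact-sizes ... total" from dbstat, in bytes
repo=3059585024       # "repository-size" from dbstat, in bytes

# Integer division; requires a shell with 64-bit arithmetic.
echo "compression ratio: $(( total / repo )):1"
```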

time (fossil status)

"status" isn't really affected by the manifest size because it's comparing the checkout db's state to the filesystem. Where you'll probably see the biggest slowdowns is checkins. And a rebuild would take ages.

The db size and the number of checkins are not truly significant factors in scaling-related slowdowns. Parsing and traversing of the huge manifests are (AFAIK/IMO) the single biggest factors. Commands/pages which don't have to do that won't slow down noticeably for day-to-day operations.

I am happy to experiment and gather further data if there are suggestions.

None from me, was just curious about how many files you had.

(9) By Stephan Beal (stephan) on 2025-06-24 13:01:51 in reply to 6.2

By the way:

artifact-sizes: 2,642,228 average, 23,431,265 max, 8,835,094,373,531 total

Running fossil deconstruct on that repo would probably cause some grief.
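
The reason is visible in the numbers: fossil deconstruct writes every artifact out as its own uncompressed file, so the destination would need room for roughly the "total" figure spread over the full artifact count. A rough sketch of that estimate, using only the dbstat values quoted above:

```shell
total_bytes=8835094373531   # "artifact-sizes ... total" from dbstat
artifacts=3343805           # "artifact-count" from dbstat

# Integer division; requires a shell with 64-bit arithmetic.
echo "files to write: $artifacts"
echo "space needed:   about $(( total_bytes / 1024 / 1024 / 1024 )) GiB"
```

If the destination is backed by tmpfs, as the repositories in this thread were, that output would have to fit in memory plus swap.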

(10) By Alastair Hogge (alastair) on 2025-06-24 14:17:12 in reply to 9

Running fossil deconstruct on that repo would probably cause some grief.

With 64GiB of system memory and 104GiB of swap space, the deconstruction got to 4% before the kernel killed the host. I only saw 47GiB of active memory reported by top, and the host could not even shut down properly. I could try again on the AMD EPYC with 128GiB of system memory; however, it is a slower Zen2 than the Ryzen, and I am not keen on bringing that host down the same way.

(7) By Konstantin Khomutov (kostix) on 2025-06-24 12:24:13 in reply to 2

After the conversion, the fossil process reported:
fatal: encountered signed tag 5ad7189351dc2f643f8152cdeac97d686f16a2fe; use --signed-tags=<mode> to handle it
<…>
I do not know what this means.

We'll need to wait for someone who knows git to enlighten us on that part, but...

Git has three types of tag:

  • A simple tag is just a name pointing at a commit; it is like a Git branch but cannot move.
  • An annotated tag is a name pointing at an object resembling a commit object in its structure — as it includes the tagger's identity, a tag date and the hash name of an object (usually a commit but not necessarily) which is tagged.
  • A signed tag is an annotated tag whose message includes an ASCII-armoured PGP signature calculated over that part of the tag object which does not include the signature itself.

The error message supposedly comes from git fast-export, which is used to produce "a portable data stream" out of a Git repository, and the documentation on --signed-tags is sort-of self-explanatory:

--signed-tags=(verbatim|warn|warn-strip|strip|abort)
Specify how to handle signed tags. Since any transformation after the export can change the tag names (which can also happen when excluding revisions) the signatures will not match.

When asking to abort (which is the default), this program will die when encountering a signed tag. With strip, the tags will silently be made unsigned, with warn-strip they will be made unsigned but a warning will be displayed, with verbatim, they will be silently exported and with warn, they will be exported, but you will see a warning.

Basically, when one exports a signed tag it's expected that the exported data will somehow be transformed in a way that any authentication provided by the signature gets lost because such authentication is based on verifiable chaining of various bits of data based on the cryptographic hashes calculated over the pieces of these data. If the pieces get transformed, the hashes change.

Looks like what to do depends on whether it could be useful to keep the original signatures or not, so the choice is among any of the modes other than abort, which is the default and appears to be what was in effect.
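
As a sketch of what a re-run might look like, the helper below checks the chosen mode against the five values quoted from the documentation above and prints the resulting pipeline without executing it. The function name `export_pipeline` is made up for illustration, and `ports.fossil` is just the repository name used in this thread. Whether a fresh import is needed depends on whether the original export stream was cut short by the error.

```shell
# Dry run: validate the --signed-tags mode and print the command to run.
export_pipeline() {
    mode=${1:-warn-strip}
    repo=${2:-ports.fossil}
    case "$mode" in
        verbatim|warn|warn-strip|strip|abort) ;;
        *) echo "unknown --signed-tags mode: $mode" >&2; return 1 ;;
    esac
    printf 'git fast-export --all --signed-tags=%s | fossil import --git %s\n' \
        "$mode" "$repo"
}

# warn-strip drops the signatures but reports each tag it does so for.
export_pipeline warn-strip ports.fossil
```

Stripping is the natural choice when the signatures cannot survive the conversion anyway; verbatim or warn would keep the signature text in the tag message.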

(5.1) By Eduardo on 2025-06-24 10:41:09 edited from 5.0 in reply to 1.3

Deleted

(11) By pjm (PhilMaker) on 2025-07-02 00:46:52 in reply to 1.3

Alastair,

This is just a note/experiment related to yours, but keeping only the current and future state of the tracked ports collection. Why? Well, that's sometimes what people want to do.

One-line summary: it's plausible to track the current state using either a pile of fossils or one big one, even for something as large as FreeBSD ports.

It's just a data point for someone who wants to:

  1. Keep track of some large open source collection (e.g. OpenBSD ports, BLT, etc).
  2. Do local updates/changes, or at least have a manifest and be append-only for them (hence fossil).
  3. Review any changes from the source before pulling them into the local collection.
  4. Accept that history prior to the import remains on the source site.

So rough numbers on a normal laptop on a real filesystem are:

ports.tar.xz from FreeBSD.org is around 43M

Using 1 fossil per port component, it took 78m to build the fossil repos. This was generated using:

for f in */*; do
    fossil init "$f.fossil"
    fossil open "$f.fossil" --workdir "$f" -f
    # "a" is fed to fossil commit's stdin to answer its prompt
    (cd "$f" && fossil add . && echo "a" | fossil commit -m "In the beginning")
done

Alternatively, using 1 fossil for the entire tree gives a 265M repo:

    time fossil add . -f == 3.3s
    time fossil commit -m "XX" == 26m
    change a single file
    time fossil commit -m "Mod" == 50s