Splitting forum out of existing repository?
(1.1) By silasdb on 2021-02-18 18:53:32 edited from 1.0 [link] [source]
Hello.
I was wondering if it is possible to split the forum (or tickets, or wiki, etc.) out of a "master" repository. I ask this because I was wondering if a small project could become so popular that its forum turns out to be many times bigger than the project source code, so that the developer might choose to separate them. It seems that the SQLite and Fossil projects anticipated this, so you chose to keep the forum separate from everything else.
I've been reading the wiki for a while and I suppose, if not impossible, it might be difficult.
IIUC, forum, wiki, ticket and check-in artifact types (edit) are independent of each other (or not?), so it might be possible with some SQL wizardry, but I was wondering if there is an easier way (if it is possible at all).
Thanks!
(2) By Stephan Beal (stephan) on 2021-02-18 20:28:14 in reply to 1.1 [link] [source]
Very briefly, one-handed from a tablet...
I ask this because I was wondering if a small project could become so popular that its forum turns out to be many times bigger than the project source code, so that the developer might choose to separate them
This came up recently when deciding to set up libfossil's forum. i believe (untested) that it is possible, the only potentially painful part being references in forum posts to non-forum content in that same repo (such links would point to non-existent content afterwards).
In short, you could harvest the IDs of all forum posts and edits from the timeline, export the full contents of those blobs, then stuff them into another repo db file.
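The "harvest the IDs" step can be sketched roughly as below. This is only a toy, assuming the `blob` and `event` column names used in the query posted later in this thread (`rid`, `uuid`, `size`, `type`, `objid`); the tables here are hypothetical stand-ins with made-up rows, not Fossil's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy stand-ins for two repository tables, holding only the columns used here.
# The rows are invented for illustration.
conn.executescript("""
CREATE TABLE blob (rid INTEGER PRIMARY KEY, uuid TEXT, size INTEGER);
CREATE TABLE event (type TEXT, objid INTEGER);
INSERT INTO blob VALUES (1, 'aaa111', 120), (2, 'bbb222', 80), (3, 'ccc333', 45);
INSERT INTO event VALUES ('ci', 1), ('f', 2), ('f', 3);
""")

# Forum posts and edits are the timeline events of type 'f';
# event.objid points at the blob row holding the artifact.
forum_ids = [row[0] for row in conn.execute(
    "SELECT b.uuid FROM blob b JOIN event e ON e.objid = b.rid "
    "WHERE e.type = 'f' ORDER BY b.rid"
)]
print(forum_ids)
```

Against a real repository the same kind of read-only query would presumably be run through Fossil itself rather than through a plain SQLite connection.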
so it might be possible with some SQL wizardry
The db cannot actually be sensibly manipulated via SQL:
https://fossil-scm.org/home/doc/trunk/www/fossil-is-not-relational.md
It would hypothetically only need 1-2 read-only queries on the origin repo, one(?) of which writes to an attached target repo. Afterwards, rebuild the target repo and it "should" work.
Hypothetically.
Doing the opposite, merging the two, would be much easier.
(3) By Stephan Beal (stephan) on 2021-02-18 21:22:21 in reply to 2 [link] [source]
It would hypothetically only need 1-2 read-only queries on the origin repo, one(?) of which writes to an attached target repo. Afterwards, rebuild the target repo and it "should" work.
That said, my initial attempts at doing so have failed, resulting in what appear to be incomplete blobs for reasons i don't yet understand. If i can ever get this to work i'll post how it's done. Worst case it might require first exporting the forum posts as files then re-importing them, requiring an intermediary script.
(4) By Richard Hipp (drh) on 2021-02-18 21:30:43 in reply to 3 [link] [source]
Maybe we need a new Fossil command to help with this kind of thing - moving artifacts from one repository to another. Possible names:
- fossil transfer
- fossil extract
- Other ideas?
The new command would copy or move artifacts from one repository into another, creating the destination repository if it does not already exist. Command-line arguments would specify what content to move:
- Specific branches
- Specific check-ins
- Specific wiki pages
- All wiki pages
- Tickets meeting some parameter
- All tickets
- Specific forum threads
- All forum messages
- Chat content
(9) By jshoyer on 2021-02-19 00:04:31 in reply to 4 [source]
What I'd like is not the ability to copy or move artifacts from one repository into another but rather tools for doing things with two (or more) repositories at once.
- automatic detection of artifact-hash wiki links. The advantage over interwiki links would be the back-links that Fossil does so well.
- the ability to generate a timeline that spans two or more repositories.
- the ability to generate a finfo page that spans two or more repositories.
I appreciate that this type of thing can get complicated fast and I have not thought through the implementation details, but I'll try. I like the new ‘Not Relational/Intro to the data model’ document.
(10) By Stephan Beal (stephan) on 2021-02-19 00:22:12 in reply to 9 [link] [source]
What I'd like is not the ability to copy or move artifacts from one repository into another but rather tools for doing things with two (or more) repositories at once
Those are well outside of fossil's architectural reach. Its internals are very much designed for one app instance = one repository instance. Even the "all" command spawns new instances to deal with each repo.
Consider what happens if you open/attach 3 repository databases and then try to formulate queries which span all of them. Such queries would need to unambiguously address all of the same-named tables in each repository, leading to a huge spaghetti mess, if it could reasonably be made to work at all.
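A small sketch of that naming problem, using plain SQLite rather than Fossil. The one-column `event` table and the schema names `repo_b` and `repo_c` are hypothetical stand-ins chosen for illustration:

```python
import sqlite3

# Three in-memory databases standing in for three repositories.
conn = sqlite3.connect(":memory:")
for name in ("repo_b", "repo_c"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")

# Give each database a same-named table with one distinguishing row.
for schema, comment in (("main", "from A"), ("repo_b", "from B"), ("repo_c", "from C")):
    conn.execute(f"CREATE TABLE {schema}.event (comment TEXT)")
    conn.execute(f"INSERT INTO {schema}.event VALUES (?)", (comment,))

# An unqualified table name resolves to just one of the databases.
unqualified = [row[0] for row in conn.execute("SELECT comment FROM event")]

# Spanning all three means schema-qualifying every table reference.
spanning = [row[0] for row in conn.execute(
    "SELECT comment FROM main.event "
    "UNION ALL SELECT comment FROM repo_b.event "
    "UNION ALL SELECT comment FROM repo_c.event"
)]
print(unqualified)
print(spanning)
```

With a single table this is manageable. Every existing query that names a table without a schema prefix would need the same treatment, which is the "spaghetti mess" described above.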
(16) By Kevin (KevinYouren) on 2021-02-19 03:17:08 in reply to 4 [link] [source]
It is already the case that the repository used for testing a piece of software can be separate from its source repository.
I would suggest that "moving" of the forum would involve similar considerations.
Perhaps also re-instate "tickets" as a menu option?
(17) By Stephan Beal (stephan) on 2021-02-19 03:33:19 in reply to 16 [link] [source]
Perhaps also re-instate "tickets" as a menu option?
You are welcome to do so for your own repositories. In this project we prefer to promote the forum as the first place to report problems, falling back to a ticket if we need to track something long-term or have a need to specifically document it.
(18) By Dan Shearer (danshearer) on 2021-02-19 10:36:36 in reply to 4 [link] [source]
Richard Hipp (drh) said on 2021-02-18 21:30:43:
Maybe we need a new Fossil command to help with this kind of thing - moving artifacts from one repository to another
This sounds like it is a superset of a one-way (i.e. non-remergeable) shallow clone. Am I right?
Even a one-way shallow clone would be highly useful.
Dan Shearer
(19.1) By george on 2021-02-19 12:02:05 edited from 19.0 in reply to 4 [link] [source]
This could also be seen as an improvement and extension of the bundle command.
It would be terrific if the set of artifacts could be defined not only by type, but also by time range and author (values in the U-cards).
This would enable Fossil to operate in a true Sneakernet fashion (which unfortunately becomes more and more useful in some parts of the world): a user generates a bundle with relevant artifacts (say, those recently authored by this user), digitally signs it and copies it to transient media.
Later this command could be extended to purge from the repository all those artifacts that are in the bundle (to resist spam/vandalism).
(20) By Offray (offray) on 2021-02-19 16:18:11 in reply to 4 [link] [source]
I would go with fossil extract or fossil split from the previous discussions pointed out by Warren.
I really like this development direction as it deals with the organic nature of several software and documentation projects in smaller communities that, because of modularity and/or size, need to be split into different repositories.
(8.1) By Stephan Beal (stephan) on 2021-02-18 23:06:15 edited from 8.0 in reply to 3 [link] [source]
If i can ever get this to work i'll post how it's done.
Got it. My grief earlier was due to forgetting that the first 4 bytes of blob.content are the encoded value of the blob's size. With that resolved...
This exports all forum-type artifacts from an open checkout of repository A into the new/empty repository blah.fossil (but it "should" work just the same if the target repository already has content)...
# Create empty test repo ("f" is presumably a shell alias for the fossil binary):
$ rm -f blah.fossil ; f new blah.fossil
project-id: a7d817fa88899cda84c2e3640d67a001e03b26b8
...
# Copy all forum-post artifacts:
$ cat blah.sql
attach 'blah.fossil' as other;
INSERT INTO other.blob (uuid, size, content)
SELECT b.uuid, b.size, compress(content(b.uuid))
FROM repository.blob b, repository.event e
WHERE e.type='f' AND e.objid=b.rid;
$ f sql '.read blah.sql'
# Rebuild the repo to convert the "side-loaded" content
# into db-normalized data:
$ f rebuild blah.fossil
100.0% complete...
# Profit:
$ f timeline -t f -R blah.fossil
=== 2021-02-15 ===
04:32:03 [273d18e8f9] Edit: Milestone: lib-client-customizable timeline updates (user: stephan)
=== 2021-02-14 ===
03:04:49 [760be2ca7f] Reply: Milestone: lib-client-customizable timeline updates (user: stephan)
...
The part which eluded me was the call to compress(). Without that, the content is missing its 4-byte size header and ends up coming out "all wrong" in the target repository. The content() function takes a symbolic blob name (UUID or prefix, or 'rid:###') and expands the fossil-internal format of that content into its full uncompressed/undelta'd form.
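To make the size-header point concrete, here is a minimal sketch of a "4-byte size prefix plus compressed payload" layout. The thread only states that the first 4 bytes of blob.content encode the blob's size; the big-endian byte order and the use of zlib are assumptions here, and the function names are hypothetical, not Fossil's own.

```python
import struct
import zlib


def fossil_style_compress(data: bytes) -> bytes:
    """Prefix zlib output with a 4-byte size header (big-endian is assumed)."""
    return struct.pack(">I", len(data)) + zlib.compress(data)


def fossil_style_uncompress(stored: bytes) -> bytes:
    """Strip the size header, inflate, and check the recorded length."""
    (size,) = struct.unpack(">I", stored[:4])
    data = zlib.decompress(stored[4:])
    if len(data) != size:
        raise ValueError("size header does not match content length")
    return data


post = b"Hypothetical forum post body\n"
stored = fossil_style_compress(post)
restored = fossil_style_uncompress(stored)

# Raw zlib output has no such header, so a reader that expects one
# misinterprets the first 4 bytes of the zlib stream as a size.
bare = zlib.compress(post)
misread_size = struct.unpack(">I", bare[:4])[0]
print(len(post), misread_size)
```

That mismatch is one plausible way content inserted without compress() could look like an incomplete blob to the reader.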
Edit: hypothetically it would be "okay" to delete those transferred artifacts from the source repo (then rebuild that one). That would only work because the forum has no direct links (in a fossil-SCM-metadata sense) to non-forum content. The "worst that could happen" is that wiki pages, and similar, which link to forum posts would end up with dead links, but otherwise no SCM-side breakage. That would not work with checkin artifacts unless all of them (from the same DAG) were transferred, as they comprise an immutable chain in and of themselves.
(11) By silasdb on 2021-02-19 01:02:37 in reply to 8.1 [link] [source]
Thanks! It is great to know it is possible (although not trivial).
The very well designed storage format that the Fossil developers devised from the very beginning shows how powerful it is. Thanks!
(13) By Stephan Beal (stephan) on 2021-02-19 01:06:41 in reply to 11 [link] [source]
The very well designed storage format that the Fossil developers devised from the very beginning shows how powerful it is.
That was all Richard. The rest of us came along later.
(5.1) By Warren Young (wyoung) on 2021-02-18 21:48:15 edited from 5.0 in reply to 1.1 [link] [source]
I was wondering if a small project could become so popular...
You seem to be concerned about size, which I'll get to next, but let's also be clear about CPU time, memory, network bandwidth, and such. Everything on sqlite.org (including the Fossil project) is running in parallel on a shared VPS that cost $40/month the last time I heard. Do you expect your project to become larger than SQLite + Fossil + Pikchr and all the rest of the stuff HWACI runs?
Fossil is uncommonly efficient. You probably aren't going to need even so big a host as the $40/month shared VPS. I run all of my projects on a $5/month VPS.
EDIT: …And its CPU is mostly idle! I checked bandwidth usage last month, and I'm taking up about 5% of my allotment.
that its forum turns out to be many times bigger than the project source code
If we can go by the following data, the source code grows faster than the conversations about the code:
| Repo | Size (MiB) | Age (years) | MiB/yr | Repo Description |
|---|---|---|---|---|
| fossil-forum | 16 | 2.6 | 6.15 | this forum |
| fossil-scm | 146 | 13.6 | 10.7 | the Fossil SCM project |
| sqlite-forum | 8.9 | 0.9 | 9.9 | the SQLite forum |
| sqlite-src | 439 | 20.7 | 21.2 | the SQLite project |
This also means we cannot expect a 50/50 crossover point: forum data will always be less than all the other project data, combined.
It's possible your projects will be different, but "many times" different seems to stretch credulity.
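For anyone wanting to redo the arithmetic with their own repositories, the MiB/yr column is simply size divided by age. A quick sketch using the figures from the table above (the variable names are just for illustration):

```python
# (size in MiB, age in years), copied from the table above
repos = {
    "fossil-forum": (16.0, 2.6),
    "fossil-scm": (146.0, 13.6),
    "sqlite-forum": (8.9, 0.9),
    "sqlite-src": (439.0, 20.7),
}

# Average growth rate: total size divided by age
rates = {name: size / age for name, (size, age) in repos.items()}
for name, rate in rates.items():
    print(f"{name:13s} {rate:5.2f} MiB/yr")

# Forum growth relative to the matching source repository;
# a value below 1 means the forum grows more slowly than the code.
fossil_ratio = rates["fossil-forum"] / rates["fossil-scm"]
sqlite_ratio = rates["sqlite-forum"] / rates["sqlite-src"]
print(f"fossil: {fossil_ratio:.2f}, sqlite: {sqlite_ratio:.2f}")
```

Both ratios come out below 1 for these four repositories, which is the basis for the claim above. These are lifetime averages, so they say nothing about how the rates changed over time.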
(7) By Stephan Beal (stephan) on 2021-02-18 21:48:37 in reply to 5.0 [link] [source]
You seem to be concerned about size
FWIW, for libfossil it wasn't about size but organization. It simply feels odd to me to have to do a pull on my source repo to get the latest forum posts. If i can ever get over that inexplicable quirk it "should" (contrary to my current experiments) be straightforward to import the forum post artifacts and rebuild to merge the two. Joining them would be easier than separating them, and was the decisive factor in the decision.
(12) By silasdb on 2021-02-19 01:05:18 in reply to 5.1 [link] [source]
Indeed I was worried about size at first, but after other people's replies, I can see different scenarios for this. It is not yet clear to me if chat/forum/wiki should share the same repository and I would like to be able to change that if possible, without having to reset everything. Anyway, text compression (not diff so much, because it hardly affects forums, right?) does a very good job at keeping things small.
(14) By Stephan Beal (stephan) on 2021-02-19 01:08:57 in reply to 12 [link] [source]
Anyway, text compression (not diff so much, because it hardly affects forums, right?) does a very good job at keeping things small.
Delta compression typically doesn't come into play with most forum content because it applies to different versions of the same file/post, and most forum posts are never edited.
(15) By Warren Young (wyoung) on 2021-02-19 01:10:01 in reply to 12 [link] [source]
It is not yet clear to me if chat/forum/wiki should share the same repository
It is to me, because it completely avoids this problem.
Interwiki links help, but it's better if you can just put a hash in square brackets and not care whether it's a forum post, commit ID, wiki edit, ticket comment, or whatever. That's what Fossil was designed to do.
Breaking it up should only be done when you have some strong overriding concern, as with this forum, where almost all of the participants are not committers to the code repo, so you want a strong administrative wall between the two classes of users.
The cost of that wall is more difficult artifact linking.