By skywalk on 2018-07-31 17:57:27 and edited on 2018-07-31 18:04:51
Unsure about the maintenance aspect with single post deletions. Can this be treated as "unversioned" so deltas and bloat are minimized?
Is there a 5min delay to edit posts?
By drh on 2018-07-31 18:08:58 in reply to previous
The original version-1 implementation did that: it kept the forum messages as unversioned content. But with forum messages outside of the block chain, syncing becomes a much bigger issue. And I was unable to leverage all of the other infrastructure already in Fossil for dealing with block-chain content. So, after messing with the unversioned content approach for a while, I started over with a new branch named "forum-v2" that stores each individual post as an element in the DVCS block chain.
There are still concerns about bloat. But then I checked the sqlite-users mailing list and found that all traffic on that list for the past 10 years amounts to only about 35MB of text, which is easily within the scope of a repository.
Also, though you can include the forum in the same repository as your code, I was planning to put the forums for Fossil and SQLite in separate repos, so that users do not have to clone all of the historical forum traffic at the same time as they clone the code. Longer term, I might add features that allow cloning just the code from a repository that commingles the code with the forum traffic, but that is for another day.
I was planning to put the forums for Fossil and SQLite in separate repos
That's neat! I like that idea.
I for one will be keeping forum content with my normal repo content.
Think of it this way: source code, documentation, wikis, tickets...all have the same basic property that they're typed and maintained by humans. This means they're generally written in low information density languages, and there's an upper limit on how much text a given person can create in a day. Adding one more stream doesn't constitute "bloat."
If your repository is too big, it probably isn't because of the forum content. Look at your JPGs and ZIPs and EXEs and who knows what other big binary blobs you've got going on in there.
Yes, for projects smaller than Fossil and SQLite, that might make sense. If I'm cloning SQLite and Fossil, I might not care to have a copy of the forum posts, especially if I had to sync them daily to stay current.
all traffic on that list for the past 10 years amounts to only about 35MB of text
When I finally built Fossil with forum capability (5416287d18f94860), I was hoping that syncing with the main Fossil repository would bring in the Fossil/SQLite mailing list history for local perusal. As of this writing, I realized there is a clonable https://www.fossil-scm.org/forum, but it contains only the new forum posts.
Is there a way to get that historical treasure trove preferably as syncable repository?
Is there a way to get that historical treasure trove
Step 1: Subscribe to the mailing list in 2008. Step 2: Don't delete anything. :)
I joke, but the trick then is...
preferably as syncable repository?
...how would you convert your carefully hoarded mailing list archive to a Fossil Forum?
I was thinking about this just last night, and I decided that you'd have to write a script that loads each mail message and parses it for In-Reply-To headers to rebuild the reply graph of postings. Then the script would have to traverse that tree in depth-first order to construct the Fossil post/reply commands that rebuild that structure in Fossil.
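A minimal Python sketch of that first step, assuming the archive is a Unix mbox file in chronological order and that each In-Reply-To header holds exactly one Message-ID (real archives are messier). It only orders the messages so every parent comes before its replies; generating the Fossil post/reply commands is left out. All function names here are hypothetical.

```python
import email
import mailbox
from collections import defaultdict


def load_mbox(path):
    """Read every message from a Unix mbox archive, keeping file order."""
    return list(mailbox.mbox(path))


def parse_messages(raw_texts):
    """Parse raw RFC 5322 message strings (handy for testing)."""
    return [email.message_from_string(t) for t in raw_texts]


def build_reply_forest(messages):
    """Link messages through their Message-ID and In-Reply-To headers.

    Returns (roots, children): roots lists the Message-IDs that start a
    thread (or whose parent is missing from the archive); children maps a
    Message-ID to its direct replies, in archive order.
    """
    by_id = {}
    for msg in messages:
        mid = msg.get("Message-ID")
        if mid:
            by_id[str(mid).strip()] = msg

    roots, children = [], defaultdict(list)
    for mid, msg in by_id.items():
        # Assumption: In-Reply-To holds exactly one Message-ID.
        parent = str(msg.get("In-Reply-To", "")).strip()
        if parent in by_id:
            children[parent].append(mid)
        else:
            roots.append(mid)
    return roots, children


def depth_first(roots, children):
    """Yield (message_id, parent_id) pairs so each parent precedes its replies."""
    stack = [(root, None) for root in reversed(roots)]
    while stack:
        mid, parent = stack.pop()
        yield mid, parent
        # Push replies in reverse so they come out in archive order.
        for child in reversed(children.get(mid, [])):
            stack.append((child, mid))
```

An explicit stack is used instead of recursion so that very long threads cannot hit Python's recursion limit.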
Now you've got to figure out how to get Fossil to let the script backdate each new post to match its original mailing list submission date. One way to do that would be to add an extension to the current POST /forume1 and /forume2 handlers, which could be turned on only at configure time by those who know they need it temporarily, lest others abuse it if it were left enabled on a public Fossil instance.
I'm not sure if Fossil Forums will accept forum posts claiming to be from email addresses belonging neither to full Fossil "users" nor to subscribers. That might require another compile-time override. I think Fossil will cope with the post once inserted into the blockchain, though, since it's currently possible for a mere "subscriber" (i.e. not a full Fossil "user") to unsubscribe without deleting all their prior postings.
I don't think it would be right to actually re-subscribe all of those people to your new forum: many will have unsubscribed from the mailing list and wouldn't appreciate being signed back up just because you imported your old mailing list archive.
There are also subtleties like making sure the Content-Type header is suitable for the way Fossil Forums works, which currently assumes everything ends up as UTF-8. Your mailer probably copes with other encodings, so you wouldn't notice if one of the mailing list's subscribers sent in something else, like ISO 8859-x characters with the high bit set. The script should convert such things to UTF-8 before inserting each post.
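A sketch of that conversion in Python, using only the standard email package. It trusts the declared charset and falls back to ISO 8859-1 when the charset is missing, unknown, or wrong for the bytes; that fallback never fails, but it can mis-render text that was really in some other 8-bit encoding. The function name is hypothetical, and multipart messages are reduced to their first text/plain part.

```python
import email.message


def body_as_utf8(msg: email.message.Message) -> bytes:
    """Return the first text/plain part of msg, re-encoded as UTF-8 bytes.

    The declared charset is tried first; if it is missing, unknown, or
    does not match the bytes, ISO 8859-1 is used, since it maps every
    byte to some character and therefore cannot fail.
    """
    part = next((p for p in msg.walk() if p.get_content_type() == "text/plain"), None)
    if part is None:
        return b""
    # decode=True undoes quoted-printable/base64 transfer encoding.
    raw = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "iso-8859-1"
    try:
        text = raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        text = raw.decode("iso-8859-1")
    return text.encode("utf-8")
```

The mislabeled case matters in practice: a message declared as us-ascii but containing high-bit bytes fails the strict decode and lands in the fallback instead of being silently corrupted.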
I've spent all this time thinking about it because the problem is broader than just the old Fossil users' mailing list. I want to import the old and extensive mailing list for one of my public Fossil-based repos, too.
An alternate plan would be to build a scraper for one of the public mailing list archive services, but that basically just trades off the difficulty of constructing the graph — the archive service already figured that out for you — for the bandwidth required to re-traverse all of that data. You might have to battle your way past "features" in their software meant to prevent people from sucking down entire mailing list contents.
So, who's volunteering to write the script? 😯
Thanks, Warren, for the broader perspective and analytic insight. My hopes for the past Fossil mailing list as a repo were based on drh's claims about the amount of mailing-list text. I thought it might already be available.
Anyway, a mailing-list importer (and, while we are at it, perhaps a GitHub Issues importer) for Fossil repos seems like a really appealing idea.
One way to do that would be to add an extension to the current POST /forume1 and /forume2 handlers
I'd see an ML importer more as a direct-to-SQLite thing than going through the web app. I assume the forum has a relational data model for posts, and a command-line client that imports an ML archive by inserting directly into SQLite makes more sense. My $0.02.
I assume the forum has an relational data-model for posts
Nope. Fossil forum posts are stored as artifacts in the Fossil blockchain, which is not relational. Fossil blockchain artifacts are stored in a SQLite table, specifically the Fossil blob table (not to be confused with the SQLite BLOB type), but the content of each blob record is very much not plain SQL data.
That's one reason I was thinking you'd use the web API: then you don't have to rewrite the code for robustly modifying the block chain. Presumably you'd write this script in a language other than C, so you couldn't just copy-paste or link to the existing code.
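To make "not plain SQL data" concrete: as far as I know, Fossil stores each full blob.content as a 4-byte big-endian uncompressed size followed by a zlib stream, while rows listed in the delta table hold a delta against another blob instead. The Python sketch below relies on that assumption and deliberately refuses delta-encoded rows, which is exactly the part a non-C script would otherwise have to reimplement. It opens the repository read-only; the function names are hypothetical.

```python
import sqlite3
import struct
import zlib


def decompress_blob_content(content: bytes) -> bytes:
    """Undo Fossil's storage compression for one full (non-delta) blob.

    Assumed layout: a 4-byte big-endian uncompressed size, then a zlib stream.
    """
    (size,) = struct.unpack(">I", content[:4])
    data = zlib.decompress(content[4:])
    if len(data) != size:
        raise ValueError(f"size prefix says {size}, got {len(data)} bytes")
    return data


def read_artifact(repo_path: str, artifact_hash: str) -> bytes:
    """Fetch and decompress one artifact by hash, opening the repo read-only.

    Rows that appear in the delta table store a delta against another blob;
    applying Fossil's delta format is out of scope here, so they are refused.
    """
    con = sqlite3.connect(f"file:{repo_path}?mode=ro", uri=True)
    try:
        row = con.execute(
            "SELECT blob.content, delta.srcid FROM blob"
            " LEFT JOIN delta ON delta.rid = blob.rid"
            " WHERE blob.uuid = ?",
            (artifact_hash,),
        ).fetchone()
    finally:
        con.close()
    if row is None or row[0] is None:  # unknown hash, or a phantom with no content
        raise KeyError(artifact_hash)
    content, srcid = row
    if srcid is not None:
        raise NotImplementedError(f"blob is stored as a delta against rid {srcid}")
    return decompress_blob_content(content)
```

Even reading is only half the story: writing would also mean maintaining the cross-reference tables Fossil derives from each artifact, which is why going through Fossil itself is the safer route.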
OK, I see what you mean. Thanks for the clarification.
I meant doing it more directly in SQLite, but you're right: writing directly to SQLite would be ill-advised, since adding artifacts to Fossil is not that simple.
Which reminds me that it's a shame Fossil cannot be used as a library. Obviously DRH is great at API design, and a high-level API for adding, browsing, etc. Fossil artifacts programmatically would be fantastic.
Just as SQLite took over the world, so would Fossil's underlying tech.
Note that I'm talking about the "core" functionality of Fossil's management of artifacts, not the web app, TH1, wiki, etc. on top of the "core". I'd love to be able to reuse those too, but that's a different story. And just like with SQLite, DRH would expose a high-level API that still leaves him free rein to evolve the internals.
With a Fossil core API, ML archive import could and probably should be a C program then. My $0.02. --DD
On the other hand, if you did want to write something to format import candidates as forum post artifacts...
The bundle command is clearly not designed for this purpose, but the schema is simple and it looks like import will consume any valid artifacts that are in the bundle. Each forum thread starts a new hash tree, so I think anything generating such a bundle shouldn't need to know anything about the existing repository.
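For that route, here is a Python sketch that formats one forum post artifact. The card set (D, G, H, I, N, U, W, Z, in that order), the backslash escaping of card arguments, and SHA3-256 artifact names follow my reading of the Fossil file-format documentation; treat them as assumptions to verify before generating a bundle. A new thread gets an H (title) card; a reply instead carries G (thread root) and I (in-reply-to) cards naming earlier artifacts' hashes, so parents must be built before their replies, which is what a depth-first walk of the reply tree gives you. Edit (P) cards are omitted, and all names are hypothetical.

```python
import hashlib
from datetime import datetime
from typing import Optional


def fossil_encode(text: str) -> str:
    """Escape a card argument: backslash, space, tab, CR and LF get backslash codes."""
    return (text.replace("\\", "\\\\")
                .replace(" ", "\\s")
                .replace("\t", "\\t")
                .replace("\r", "\\r")
                .replace("\n", "\\n"))


def forum_post_artifact(user: str, when: datetime, body: str, *,
                        mimetype: str = "text/x-markdown",
                        title: Optional[str] = None,
                        thread_root: Optional[str] = None,
                        in_reply_to: Optional[str] = None) -> bytes:
    """Build one forum post artifact (assumed card layout; verify first).

    New thread: pass title.  Reply: pass thread_root and in_reply_to, the
    hashes of the thread's first post and of the post being answered.
    """
    if (title is None) == (thread_root is None):
        raise ValueError("pass title for a new thread, or thread_root for a reply")
    if (thread_root is None) != (in_reply_to is None):
        raise ValueError("a reply needs both thread_root and in_reply_to")

    # Cards must appear in sorted order; D carries the (possibly backdated) time.
    cards = ["D " + when.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3]]
    if thread_root is not None:
        cards.append("G " + thread_root)
    if title is not None:
        cards.append("H " + fossil_encode(title))
    if in_reply_to is not None:
        cards.append("I " + in_reply_to)
    cards.append("N " + mimetype)
    cards.append("U " + fossil_encode(user))

    content = body.encode("utf-8")
    text = ("\n".join(cards) + "\n").encode("utf-8")
    text += b"W %d\n" % len(content) + content + b"\n"
    # Z card: MD5 of everything before it, including the preceding newline.
    text += b"Z " + hashlib.md5(text).hexdigest().encode("ascii") + b"\n"
    return text


def artifact_hash(artifact: bytes) -> str:
    """Name of the artifact, assuming the repository uses SHA3-256 hashes."""
    return hashlib.sha3_256(artifact).hexdigest()
```

Since the date lives in the D card, an importer built this way could carry each message's original mailing-list date without changing the web handlers. Whether bundle import then accepts posts from senders unknown to the repository is the same open question raised earlier in the thread.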