All projects I have been involved in seem to be migrating due to fossil instability
(1) By anonymous on 2018-10-08 18:11:11 [link] [source]
Hello! I've always loved and used Fossil religiously. From the moment I converted from Subversion, I felt empowered, in control, like I could focus on coding again instead of wrestling with frustrating version-control ideologies. When I started contract work for my first company, I was thrilled to learn that their many repositories were all self-hosted Fossil repositories.

The problems set in a few months ago, and I've posted about this before. My personal projects, video games, were becoming too large for Fossil, causing it to crash randomly. My 3.7 GiB repository is not uncommon for game-design repositories, as many GB of binary images, maps, audio files, and other binary files are commonplace for video game projects. Though the contract work I am involved in is much different, they have their own reasons for including gigabytes' worth of binary files. Eventually the team's productivity was too compromised by these issues, so we switched to Git.

After encountering my sixth nightmarish scenario on my own projects, meanwhile, I was becoming very frustrated. The only way I could fix the issue was to make a copy of the working directory and re-clone the repository from scratch, which took several hours for each broken repository instance. Today, I regrettably decided enough was enough, and that my own productivity was also being too impaired by these issues to continue maintaining Fossil as a viable system for my needs.

I wanted to write about this because it is a poignant discovery for me that two dedicated Fossil users, one being a small team of 5+ people, have both essentially simultaneously abandoned an otherwise solid tool. I hope someday this can be resolved, and that I can perform the opposite of the fossil-to-git conversion I just completed an hour or two ago.

Technical info:
- Repositories: 3 GB+ in size
- OS: Windows 7 64-bit Professional
- Build: 64-bit with SSL
- fossil version 2.6 [9718f3b078] 2018-05-04 12:56:42 UTC
(2) By Richard Hipp (drh) on 2018-10-08 18:18:04 in reply to 1 [link] [source]
I'm sorry Fossil didn't work out for you. It continues to work very well for the project I manage (and for which Fossil was originally written).
If you are encountering problems, perhaps you could share reproducible test cases with us so that we could fix them?
(3) By anonymous on 2018-10-08 18:39:44 in reply to 1 [link] [source]
Sorry for the rocky journey you're going through.
Were your problems related to the individual size of the binary objects that you managed with Fossil or to some other aspects?
There are limits to any good tool, and it looks like your team pushed Fossil to some of its limits. Fossil's user base is, of course, not comparable in size to Git's, and some user problems are hard to generalize from a limited number of occurrences.
That's why your experience is still valuable to this project. I hope that, in retrospect, you'll find a chance to restate here the issues you encountered.
(4) By Warren Young (wyoung) on 2018-10-08 22:09:20 in reply to 1 [link] [source]
causing it to crash randomly.
How are you using that word, exactly? I can make several guesses:
The same operation done twice fails first, then succeeds the second time.
An operation that fails may succeed when attempted later, with no apparent difference between the two cases.
The operation fails reliably until some tiny change is made to the input, with no obvious connection between the input change and whether it fails.
Answering that is only the first step towards a repeatable test case, without which we're reduced to educated guesses about what needs fixing.
If the problem is reproducible on your end, but without any obvious way of recreating the problem short of shipping us your massive repository, how about you run Fossil under a debugger and produce a backtrace? If the problem disappears or changes in nature when run under a debugger, tell us that instead.
It amazes me how often software developers will fail to give other software developers the sort of bug report they'd like to receive themselves. Would you accept "crashes randomly" without a core dump, backtrace, or debug result as a useful bug report from a coworker?
At the very least, can we have the output from the /stat page?
the fossil-to-git conversion I just completed an hour or two ago.
Regardless of how this turns out, I'd like to know if that actually solves the problem. I have no reason to believe Git likes large binary files any more than Fossil does.
(5) By Warren Young (wyoung) on 2018-10-08 22:24:39 in reply to 4 [link] [source]
One other thing: FOSS projects advance most rapidly when the person making the changes cares about the problem because fixing it addresses a selfish need.
The largest file sizes in the repositories of substantial Fossil-managed projects immediately available to me are:
Repo | Largest file size in MiB
---|---
SQLite | 16.2
Fossil | 7.4
PiDP-8/I | 3.2
MySQL++ | 1.3
(That data comes from the /stat page, which is why I asked for that info on your repositories.)
I suspect these are typical results for most Fossil users, which means very few Fossil users would benefit if this problem was fixed. It also means very few Fossil users have a test case they can use in solving the problem.
If you aren't willing or able to provide debugging results — or better, reproducible test cases — then the best idea I have for attacking this problem is to just create dummy repos containing randomly generated data until it crashes. That sounds like an exercise in low-payoff disk-filling to me.
(6) By Stephan Beal (stephan) on 2018-10-09 00:03:37 in reply to 5 [link] [source]
Just to point out to anyone who doesn't know this already: "fixing" the limitation on "excessively large" files is not trivial (and, as Warren points out, is far removed from fossil's intended use cases). For many different operations, fossil loads the whole contents of a file into memory. For generating deltas, it can hold up to 3 copies in memory: the original copy, the new copy, and the delta. Because fossil stores most versions of a file as deltas, when it traverses older versions it has to unpack those, which means applying deltas, which means memory costs directly related to the sizes of those versions of the blobs. The delta generator also expects the whole body to be in memory as the delta algorithm scans it, and i cannot comment on how much effort it would take to port it to something which keeps only a portion of each blob in memory.
i.e. this is not something which can simply "be patched". It would require significant reworking in some of the deepest parts of fossil. Even so, i'm not certain whether sqlite supports the i/o operations fossil would need in order to support blobs larger than system memory.
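As a concrete illustration of that memory pattern, here is a minimal, self-contained C sketch. It is not Fossil source code and the slurp() helper is invented for illustration; like fossil, it reads each input file into a single in-memory buffer, and a delta pass would keep roughly three such buffers resident at once.

```c
/* Sketch only: demonstrates the read-the-whole-file-into-RAM pattern.
 * A real delta pass would also need a third buffer for the delta itself. */
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into one malloc'd buffer; NULL on I/O or OOM failure. */
static char *slurp(const char *path, long *pSize) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long sz = ftell(f);
    if (sz < 0) { fclose(f); return NULL; }
    rewind(f);
    char *buf = malloc(sz > 0 ? (size_t)sz : 1);   /* whole blob lives in RAM */
    if (buf && fread(buf, 1, (size_t)sz, f) != (size_t)sz) { free(buf); buf = NULL; }
    fclose(f);
    if (buf) *pSize = sz;
    return buf;
}

int main(int argc, char **argv) {
    if (argc != 3) { fprintf(stderr, "usage: %s OLDFILE NEWFILE\n", argv[0]); return 1; }
    long szOld = 0, szNew = 0;
    char *oldv = slurp(argv[1], &szOld);   /* copy #1: old version */
    char *newv = slurp(argv[2], &szNew);   /* copy #2: new version */
    if (!oldv || !newv) { fprintf(stderr, "out of memory or I/O error\n"); return 1; }
    /* copy #3 would be the generated delta; the encoder scans the full
     * old and new bodies while they are both resident. */
    printf("resident just for the two inputs: %ld bytes\n", szOld + szNew);
    free(oldv); free(newv);
    return 0;
}
```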
(7.1) By sean (jungleboogie) on 2018-10-09 00:46:49 edited from 7.0 in reply to 1 [link] [source]
Just curious, have you tried using the unversioned feature? I don't necessarily know if this is the answer, or if multiple copies are still loaded into memory, as pointed out above.
(8) By Kevin (KevinYouren) on 2018-10-09 03:46:03 in reply to 7.1 [link] [source]
Having followed this thread, I thought I would run a couple of tests. I had a go with a test fossil repo of 29M, to test the "unversioned file" method. I backed up the repo first.

First I tried adding a 3 GB movie. It took a few minutes, maxed out 7.7 GB of memory, went to swap, then finished with no message. It DID NOT work, but it didn't corrupt the fossil repo. Second, I added a 700 MB movie successfully. I have included the commands and the stats for reference.

    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ ls -lai /mnt/KCYPictures/101D3200/DSC_0147.MOV
    4325539 -rwxrwxrwx 1 kevin kevin 3273877775 Jun 7 2015 /mnt/KCYPictures/101D3200/DSC_0147.MOV

    cd /mnt/KCY/KCYPrograms/Concordance/
    fossil uv add /mnt/KCYPictures/101D3200/DSC_0147.MOV --as DSC_0147.MOV

Unversioned file listing after the first attempt:

    Name          Age           Size                  User   SHA1                                      rcvid
    DSC_0147.MOV  18.0 minutes  -1,021,089,521 bytes  kevin  1c5db44c6ec7b05d5ce5df2369bedc2fe4a2a5eb  52
    Total over 1 files          -1,021,089,521 bytes

/stat after the first attempt:

    Repository Size: 29,043,712 bytes
    Number Of Artifacts: 446 (132 fulltext and 314 deltas)
    Uncompressed Artifact Size: 517,123 bytes average, 110,990,434 bytes max, 230,637,197 total
    Compression Ratio: 7:1
    Unversioned Files: 1 files, 0 bytes compressed, 0% of total repository space
    Number Of Check-ins: 50
    Number Of Files: 148
    Number Of Wiki Pages: 1
    Number Of Tickets: 0
    Duration Of Project: 996 days or approximately 2.73 years.
    Project ID: 9bd7a868b6c55cc8cfc30a13bea135d5fbffb25f Concordance and File Directory DB
    Fossil Version: 2018-09-22 00:54:37 [c285cd0812] (2.7)
    SQLite Version: 2018-09-18 20:20:44 [2ac9003de4] (3.25.1)
    Schema Version: 2015-01-24
    Repository Rebuilt: 2018-09-16 10:13:07 By Fossil 2.6 [1bfd790352] 2018-08-17 01:27:46 UTC
    Database Stats: 28,363 pages, 1024 bytes/page, 10,158 free pages, UTF-8, delete mode
    Backoffice: Last run: 18 seconds ago

Unversioned file listing and /stat after adding the second (700 MB) file:

    Name          Age           Size                  User   SHA1                                      rcvid
    DSC_0278.MOV  2.9 minutes   723.2MB               kevin  9f26035149e86724df5a00b3b73f9a48b6b54276  54
    DSC_0147.MOV  35.5 minutes  -1,021,089,521 bytes  kevin  1c5db44c6ec7b05d5ce5df2369bedc2fe4a2a5eb  52
    Total over 2 files          -297,864,195 bytes

    Repository Size: 744,705,024 bytes
    Number Of Artifacts: 446 (132 fulltext and 314 deltas)
    Uncompressed Artifact Size: 517,123 bytes average, 110,990,434 bytes max, 230,637,197 total
    Compression Ratio: 3:10
    Unversioned Files: 2 files, 723.2MB compressed, 97% of total repository space
    Number Of Check-ins: 50
    Number Of Files: 148
    Number Of Wiki Pages: 1
    Number Of Tickets: 0
    Duration Of Project: 996 days or approximately 2.73 years.
    Project ID: 9bd7a868b6c55cc8cfc30a13bea135d5fbffb25f Concordance and File Directory DB
    Fossil Version: 2018-09-22 00:54:37 [c285cd0812] (2.7)
    SQLite Version: 2018-09-18 20:20:44 [2ac9003de4] (3.25.1)
    Schema Version: 2015-01-24
    Repository Rebuilt: 2018-09-16 10:13:07 By Fossil 2.6 [1bfd790352] 2018-08-17 01:27:46 UTC
    Database Stats: 727,251 pages, 1024 bytes/page, 0 free pages, UTF-8, delete mode
    Backoffice: Last run: 53 seconds ago

Full session transcript:

    kevin@KCYDell:/mnt/KCY/KCYPrograms$ cp /mnt/KCYdata/KCYRepos/Concordance.fossil /mnt/KCYdata/KCYRepos/Concordance_20180922.fossil
    kevin@KCYDell:/mnt/KCY/KCYPrograms$ cd /mnt/KCY/KCYPrograms/Concordance/
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ fossil uv add /mnt/KCYPictures/101D3200/DSC_0147.MOV --as DSC_0147.MOV
    ...

(Waited a few minutes; gnome-system-monitor showed memory max out at 7.7 GB and go to swap; it finished with no message.)

    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ sha256sum /mnt/KCYPictures/101D3200/DSC_0147.MOV
    31d72cf679bfeba60c9bd9bac68a042fea38785180921273c466eee68309cf5a /mnt/KCYPictures/101D3200/DSC_0147.MOV
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ fossil ui
    Listening for HTTP requests on TCP port 8080
    ^C
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ sha3sum /mnt/KCYPictures/101D3200/DSC_0147.MOV
    061e77deea580761d71f813c6b827e1b9a9d6b40a6b0ea4505e716ef /mnt/KCYPictures/101D3200/DSC_0147.MOV
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ fossil commit -m "try to add a 3G unversioned file"
    New_Version: 26bdb5ef9d99f43f1bd4a6ce8f44fab1a869d3d5
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ fossil ui
    Listening for HTTP requests on TCP port 8080
    ^C^C
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ sha1sum /mnt/KCYPictures/101D3200/DSC_0147.MOV
    1c5db44c6ec7b05d5ce5df2369bedc2fe4a2a5eb /mnt/KCYPictures/101D3200/DSC_0147.MOV
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ fossil version
    This is fossil version 2.7 [c285cd0812] 2018-09-22 00:54:37 UTC
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ fossil ui
    Listening for HTTP requests on TCP port 8080
    ^C
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ ls -lai /mnt/KCYPictures/101D3200/DSC_0147.MOV
    4325539 -rwxrwxrwx 1 kevin kevin 3273877775 Jun 7 2015 /mnt/KCYPictures/101D3200/DSC_0147.MOV
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ date;fossil uv add /mnt/KCYPictures/101D3200/DSC_0278.MOV --as DSC_0278.MOV;date
    Tue 9 Oct 14:19:45 AEDT 2018
    Tue 9 Oct 14:20:44 AEDT 2018
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ ls -lai /mnt/KCYPictures/101D3200/DSC_0278.MOV
    4325650 -rwxrwxrwx 1 kevin kevin 723225326 Jun 27 2015 /mnt/KCYPictures/101D3200/DSC_0278.MOV
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ ls -lai /mnt/KCYdata/KCYRepos/Concordance.fossil
    1317255 -rw-r--r-- 1 kevin kevin 744705024 Oct 9 14:20 /mnt/KCYdata/KCYRepos/Concordance.fossil
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$ fossil ui
    Listening for HTTP requests on TCP port 8080
    ^C
    kevin@KCYDell:/mnt/KCY/KCYPrograms/Concordance$
(9) By Warren Young (wyoung) on 2018-10-09 05:09:34 in reply to 8 [link] [source]
First I tried adding a 3G movie
That'll never work short of a SQLite format change.
The only thing Fossil should be doing here is getting the file size up front and refusing to even try if the size exceeds the limit compiled into that instance of Fossil. Arguably it should refuse for anything beyond the default, for broader compatibility.
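A hedged sketch of that up-front check, assuming a hypothetical compiled-in MAX_MANAGED_FILE_SIZE ceiling rather than any existing Fossil setting: stat the file and refuse before reading a byte.

```c
/* Sketch only: refuse over-size files before attempting to read them.
 * MAX_MANAGED_FILE_SIZE and the wording of the message are hypothetical. */
#include <stdio.h>
#include <sys/stat.h>

#define MAX_MANAGED_FILE_SIZE (25*1024*1024LL)   /* illustrative 25 MB ceiling */

/* Return 0 if the file may be added, -1 (with a message) otherwise. */
static int check_addable(const char *zPath) {
    struct stat st;
    if (stat(zPath, &st) != 0) {
        fprintf(stderr, "cannot stat %s\n", zPath);
        return -1;
    }
    if ((long long)st.st_size > MAX_MANAGED_FILE_SIZE) {
        fprintf(stderr,
            "refusing to add %s: %lld bytes exceeds the %lld-byte limit\n",
            zPath, (long long)st.st_size, (long long)MAX_MANAGED_FILE_SIZE);
        return -1;
    }
    return 0;
}

int main(int argc, char **argv) {
    for (int i = 1; i < argc; i++) {
        if (check_addable(argv[i]) == 0) printf("ok to add: %s\n", argv[i]);
    }
    return 0;
}
```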
(10) By Richard Hipp (drh) on 2018-10-09 12:32:10 in reply to 9 [link] [source]
If I make a new setting which is the maximum size of a managed file, what should the default value for that setting be? 25MB?
The upper bound for the setting will probably be 1GB.
(11) By jvdh (veedeehjay) on 2018-10-09 12:46:09 in reply to 10 [link] [source]
Why would such a setting help? To prevent accidental, stupid checkins of large binaries? Otherwise, as someone else already said, it would be good if fossil would simply check whether it can cope with the file size at all -- whatever the limit is (2 GB?) -- and refuse to add the file to the list of managed files. And what about the situation where the file is initially below the limit and then grows beyond it during the project's lifetime? Meaning: wouldn't the file-size check have to be done for all checkins?
(12) By Stephan Beal (stephan) on 2018-10-09 12:50:05 in reply to 11 [link] [source]
The limit is essentially the amount of memory which fossil is allowed (by the system) to allocate, ignoring (for the moment) any 32-bit overflows which are likely a problem (those are solvable at the code level). Fossil cannot reliably calculate, in advance, whether a given file's size is a problem. It can only try and see what happens. Even specifying some hard limit (say, 2GB) would fail for small devices like Raspberry Pis with 512MB memory and 1GB or less of swap space. (Yes, some people use fossil on such systems.)
(13) By skywalk on 2018-10-09 13:59:36 in reply to 7.1 [link] [source]
This was a use case I hoped 'unversioned' would address, but its focus is intentionally narrower. I would appreciate an expanded scope and utility for 'unversioned' to support checkin and checkout of both tracked and untracked files.
(14) By anonymous on 2018-10-09 14:36:54 in reply to 12 [link] [source]
Maybe it would make sense to revisit Fossil's error reporting so that it properly describes the failure in such cases?
If it's an out-of-memory condition, then fail with a message saying so. At present, Fossil appears to fail silently when adding the super-large file.
Most likely this means fixing those 32-bit overflows along the way.
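A minimal sketch of what such reporting could look like, as a generic checked-allocation pattern rather than a patch against Fossil's actual allocator; xmalloc() and its messages are hypothetical, and the 32-bit truncation check is the part that would make the 3 GB case fail loudly instead of silently.

```c
/* Sketch only: a checked allocator that reports the requested size instead
 * of letting a failed or truncated allocation pass silently. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

static void *xmalloc(uint64_t nByte) {
    /* Catch 32-bit truncation explicitly before calling malloc(). */
    if (nByte > (uint64_t)SIZE_MAX) {
        fprintf(stderr, "allocation of %llu bytes exceeds this platform's "
                        "address space\n", (unsigned long long)nByte);
        exit(1);
    }
    void *p = malloc((size_t)nByte);
    if (p == NULL) {
        fprintf(stderr, "out of memory: could not allocate %llu bytes\n",
                (unsigned long long)nByte);
        exit(1);
    }
    return p;
}

int main(void) {
    /* e.g. trying to buffer the 3.2 GB movie from the test above in one shot: */
    void *p = xmalloc(3273877775ULL);
    free(p);
    puts("allocation succeeded");
    return 0;
}
```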
(15.1) By Warren Young (wyoung) on 2018-10-09 17:46:14 edited from 15.0 in reply to 9 [link] [source]
First I tried adding a 3G movie
That'll never work short of a SQLite format change.
There may be a simple workaround for that. drh, what if the manifest parser is changed to allow more than one F card to refer to a given file name, with the meaning that each reference after the first just adds more content to the virtual file?
On checkout, Fossil could write a file out and close it when it finds the first F card for that file name, just as today, then on each subsequent F card for that file name, open the file in append mode instead. It could keep a hash table mapping file names to Boolean flags to determine whether to open with the append flag or not.
On diff, Fossil would have to keep the diff buffers around when it reaches the end of one file blob, completing the diff only after the last card in the manifest is processed.
Doing this would remove any motivation for a maximum file size setting, since you could stay well away from the SQLite BLOB limit. ZFS, for example, does just fine with a 1 meg maximum block size, with everything larger than that divided up as necessary. The limit only has to be large enough to allow for efficient file I/O, keeping DB fragmentation and manifest size down while not requiring massive I/O buffers.
This design might even let you get around the current 3× file size magnification problem, which in turn might allow repositories with huge files to work on operating systems for 32-bit CPUs which have filesystems that allow 64-bit file offsets. (e.g. ext3 on x86.)
EDIT: This idea would complicate the deltification algorithm, but I think the worst that would happen is that there would be hard barriers at the buffer boundaries beyond which deltas can't be computed. If only one byte in a 10 GiB file changes and the maximum blob size is set to 0.1 GiB, you'd end up with 99 zero-size blobs and one small blob to encode the 1-byte difference. (Plus 100 F cards referring to the pieces of the 10 GiB file.) That is still pretty efficient.
If one byte is inserted in that scenario, then I think you'd end up with 100 small delta blobs, with most of them encoding "added one byte at the beginning, removed one byte at the end" due to the shift-by-one nature of the diff. That means deltification only breaks down when you're inserting a large fraction of the maximum blob size, in which case you're probably going to end up with poor deltification even if you did the delta over the entire file in a single pass.
For backwards compatibility, the code shouldn't assume that incoming blobs are of any particular size. If the default maximum blob size is 1 MiB, all four repositories in my maximum blob size table above would have over-size blobs. The new code will have to be written to cope with this: I don't believe you could fix it with a rebuild, since that would change the hashes of any blobs it had to split.
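To make the splitting step concrete, here is a small self-contained sketch that divides a file into fixed-size pieces and reports what each piece would become. The 1 MiB chunk size and the output format are illustrative only; none of this is Fossil's actual manifest code.

```c
/* Sketch only: split a large file into fixed-size pieces, one per would-be
 * F card.  Reassembly on checkout would append the pieces in order, which
 * is the open-in-append-mode step described above. */
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE (1024*1024)   /* illustrative maximum blob size */

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s BIGFILE\n", argv[0]); return 1; }
    FILE *in = fopen(argv[1], "rb");
    if (!in) { perror("fopen"); return 1; }
    char *buf = malloc(CHUNK_SIZE);
    if (!buf) { fprintf(stderr, "out of memory\n"); fclose(in); return 1; }
    size_t got;
    int idx = 0;
    while ((got = fread(buf, 1, CHUNK_SIZE, in)) > 0) {
        /* In the proposal each piece would become its own blob/F card;
         * here we just report what would be stored for each piece. */
        printf("piece %d of %s: %zu bytes\n", idx, argv[1], got);
        idx++;
    }
    free(buf);
    fclose(in);
    printf("%s would need %d F cards at a %d-byte blob limit\n",
           argv[1], idx, CHUNK_SIZE);
    return 0;
}
```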
(16) By Roy Keene (rkeene) on 2018-10-09 15:51:32 in reply to 10 [link] [source]
I currently store several large files as unversioned files for the Linux distribution I maintain as a Fossil repository. The unversioned files are greater than 1GB in size in many cases (installer image ISOs).
(17) By Kevin (KevinYouren) on 2018-10-10 23:16:55 in reply to 15.1 [link] [source]
The following are my personal opinions, somewhat influenced by my experience of "software configuration management" since the 80s. There is a need for a message to indicate that a proposed addition is above 2 GB, or even above 100 MB, for example. I would suggest that "unversioned" (with implicit deltas) be extended with an "unversioned with no deltas" option. Also, "versioned with no deltas" -- for example, for Microsoft Word and Excel documents, which change constantly but for which binary "diffs" are not appropriate.
(18) By ddevienne on 2018-10-11 08:18:43 in reply to 17 [link] [source]
Also, "versioned with no deltas". For example, Microsoft Word and Excel documents, which change constantly, but binary "diffs" are not appropriate.
I don't see the point. You compute the delta, then decide whether to keep it, based on the size of the delta and some heuristics and thresholds. It shouldn't have anything to do with the type of the file. Office files are ZIPs with a bunch of files inside, and some edits touch only a small subset of the files inside, so it's possible the ZIP changes only in some places. Thus deltas may very well be appropriate. --DD
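For concreteness, the compute-then-decide step could look like this tiny sketch; the 75% threshold is an arbitrary illustration, not Fossil's real heuristic or delta encoder.

```c
/* Sketch only: keep the delta only when it is meaningfully smaller than
 * the full content, regardless of file type. */
#include <stdio.h>

/* Decide whether to store a delta or the full content, given their sizes. */
static const char *storage_choice(long long deltaSize, long long fullSize) {
    /* Keep the delta only if it is under ~75% of the full text (illustrative). */
    if (deltaSize >= 0 && deltaSize * 4 < fullSize * 3) {
        return "store delta";
    }
    return "store full content";
}

int main(void) {
    /* A zipped .docx where only one member changed: small delta, keep it.  */
    printf("%s\n", storage_choice(  40*1024, 2*1024*1024));
    /* A recompressed archive that changed throughout: delta buys nothing.  */
    printf("%s\n", storage_choice(1900*1024, 2*1024*1024));
    return 0;
}
```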
(19) By Kevin (KevinYouren) on 2018-10-11 12:03:43 in reply to 18 [link] [source]
DD, I see your point. I only used 'track changes' in Excel & Word, and that was over 4 years ago. Fossil's diff displays are far superior. I'm not sure about the "collaborative software", "workspaces", etc. The sales brochures looked good, but the guys running it kept asking us to delete files... Some tools, such as the job scheduler Control-M, have version control built into them, and they had extensions to let you keep the schedules in Oracle databases. However, our SCM guys wanted it put under GitHub control. I think I'll try adding two or three versions of Word- and Excel-like docs into Fossil and see what happens. I'll use LibreOffice instead of Microsoft, though. On the other hand, if other products such as Git/GitHub/SVN don't cater for Word and Excel, then maybe it's not an issue.
(20) By anonymous on 2018-10-23 16:54:32 in reply to 1 [link] [source]
Hey everyone! This is the OP. It's good to see the conversation this has sparked. I did want to drop in and say that Git is indeed working for me where Fossil did not. I know one of the posters said they had no proof that Git would handle this use case any better than Fossil. It is running significantly faster, however, and so far without error. Commits could take 2-4 minutes in Fossil, sometimes longer, on my local machine, not counting autosync. So far Git has never taken an entire minute. Of course this is, if I understand correctly, because Fossil is safer? Anyway, I just wanted to give that quick update.
(21) By sean (jungleboogie) on 2018-10-23 20:18:05 in reply to 20 [link] [source]
Happy to hear about your experience. Keep in mind that you can disable the checksum feature in Fossil, which would let your commits complete much more quickly.
Anyway, now that git is working well for you, how many times have you had to revert to a previous large binary file?
(22) By anonymous on 2018-10-23 21:20:08 in reply to 20 [link] [source]
Good to know that you've settled on a tool that works well for your purpose!
Now that you're back on track, could you list the details of the Fossil issues that you experienced?
I understand from your posts that your Fossil repo was rather large (~4 GB), and you experienced long commit times (~4 min). There were also some 'catastrophic' occurrences with your large repos.
- Were you versioning files that were individually large? How large?
- Would a commit take as long on any change (even to a single non-binary source file), or did the extra-long waits occur only when large binary files were part of the committed changes?
- Were you also experiencing long waits for 'fossil status' to complete, or was it not perceivably slower?
- What sort of 'catastrophes' did you have (sorry for the memories)? Was it repo corruption caused by some failed operation (add, commit, merge)? If so, was a large binary file involved in the failed operation, or was the corruption triggered by some simple non-binary file change?
- Did the general instability begin from the moment the repo was initially populated, or did things degrade as your project work progressed?
- Any other observations that may be specific to your project's use of Fossil?
This sort of info would help anyone with an intended use case similar to yours, and it would also help the Fossil devs see how Fossil dealt with it and possibly improve on that.
Thanks!
(23.1) By ddevienne on 2018-10-25 06:47:15 edited from 23.0 in reply to 20 [link] [source]
"[...] Git is running significantly faster [...]"
Not sure this is one of the reasons for the performance differences, but DRH often touts, for good reason, the "pile of files" vs "database" nature of Git vs Fossil, and by DRH's own testing [1], beyond a certain size SQLite just does not compete with the filesystem.
Also Git was recently made faster for very large repos, thanks to Microsoft's efforts [2].
Perhaps the docs should clearly state that Fossil does not scale beyond certain sizes. At work we have a Perforce repo in the TBs; those do happen in the corporate world.
FWIW. --DD
[1] https://www.sqlite.org/intern-v-extern-blob.html
[2] https://blogs.msdn.microsoft.com/bharry/2017/05/24/the-largest-git-repo-on-the-planet/
(24) By anonymous on 2018-10-24 14:00:56 in reply to 20 [link] [source]
"I did want to drop in and say that Git is indeed working for me where fossil did not."
If Git ceases to meet your needs, will you come back and report your experiences?
(25.2) By andygoth on 2018-10-24 22:02:56 edited from 25.1 in reply to 9 [source]
Unrelated to Fossil, I successfully stored very large files (many gigabytes in size) in an SQLite database by dividing large files into chunks. See this post for details. My point is that it's not necessary to change the SQLite format, though maybe when you say "SQLite format change" you mean changing Fossil's SQL schema.
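For anyone curious what such chunking looks like at the SQL level, here is a self-contained sketch using only the public SQLite C API; the schema and the 1 MiB chunk size are illustrative and are not taken from the post linked above.

```c
/* Sketch only: store a large file as many fixed-size rows rather than
 * one giant blob. */
#include <stdio.h>
#include <stdlib.h>
#include <sqlite3.h>

#define CHUNK_SIZE (1024*1024)

int main(int argc, char **argv) {
    if (argc != 3) { fprintf(stderr, "usage: %s DB FILE\n", argv[0]); return 1; }
    sqlite3 *db = NULL;
    if (sqlite3_open(argv[1], &db) != SQLITE_OK) {
        fprintf(stderr, "cannot open db: %s\n", sqlite3_errmsg(db));
        return 1;
    }
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS chunk("
        "  fname TEXT, seq INTEGER, data BLOB,"
        "  PRIMARY KEY(fname, seq));",
        0, 0, 0);

    FILE *in = fopen(argv[2], "rb");
    if (!in) { perror("fopen"); sqlite3_close(db); return 1; }

    sqlite3_stmt *ins = NULL;
    if (sqlite3_prepare_v2(db,
            "INSERT OR REPLACE INTO chunk(fname, seq, data) VALUES(?1, ?2, ?3);",
            -1, &ins, 0) != SQLITE_OK) {
        fprintf(stderr, "prepare failed: %s\n", sqlite3_errmsg(db));
        fclose(in); sqlite3_close(db); return 1;
    }

    char *buf = malloc(CHUNK_SIZE);
    size_t got;
    int seq = 0;
    sqlite3_exec(db, "BEGIN;", 0, 0, 0);            /* one transaction for speed */
    while (buf && (got = fread(buf, 1, CHUNK_SIZE, in)) > 0) {
        sqlite3_bind_text(ins, 1, argv[2], -1, SQLITE_STATIC);
        sqlite3_bind_int(ins, 2, seq++);
        sqlite3_bind_blob(ins, 3, buf, (int)got, SQLITE_TRANSIENT);
        if (sqlite3_step(ins) != SQLITE_DONE) {
            fprintf(stderr, "insert failed: %s\n", sqlite3_errmsg(db));
            break;
        }
        sqlite3_reset(ins);
    }
    sqlite3_exec(db, "COMMIT;", 0, 0, 0);
    printf("stored %d chunk(s) of %s\n", seq, argv[2]);

    free(buf);
    fclose(in);
    sqlite3_finalize(ins);
    sqlite3_close(db);
    return 0;
}
```

Reading the file back is then just a SELECT over the chunk table ordered by seq, streaming the pieces out in order.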
(26) By ddevienne on 2018-10-25 07:01:05 in reply to 25.2 [link] [source]
Sure you can, but why would you?
Given [1], and the fact that the filesystem is faster for larger sizes starting around 1 MB?
And the numbers from [1] are for a single-blob "file". Your manually chunked large files would fare worse, I'm betting.
You are basically emulating a filesystem in SQLite, which I'm sure is useful and cleverly done, but the code to access the files can't be normal file-based code; it must go through a different schema-aware, SQLite-based layer. Put a FUSE filesystem on top of your layer, compare the performance of your SQLite filesystem against the native filesystem using normal system tools like cp, tar, etc., and see what factor you get.
I'm a huge fan of SQLite, yet I still think its inline blobs are one of its weak spots, and I've been saying this for a while on the SQLite list -- especially when one wants to update a blob, rewriting only a small portion of it, for example. But blob rewriting is obviously not a use case Fossil cares much about :)
[1] https://www.sqlite.org/intern-v-extern-blob.html
(27) By andygoth on 2018-10-25 16:53:38 in reply to 26 [link] [source]
Chunking wasn't part of my original task. I discovered the file size limitation, along with the requirement for large files, as I went.
My original task was to archive many named/dated versions of a directory hierarchy in a single file. You'd think Fossil itself would be a good fit (ignoring the file-size issue, which I didn't realize at the start), but there was the usual pushback against third-party code, combined with an assumption of the goodness of in-house code and an unquestioning acceptance of code that shipped with the OS, which included older versions of Tcl and SQLite.
Thus, I implemented a mini-Fossil using Tcl and SQLite, then I had to add chunking when I found it to be necessary.
Simply using the filesystem normally (keeping every directory being "archived") wasn't working well for us because of the massive duplication of identical files between directories.
My first version used zip files rather than SQLite, until I found that given the size of my zip files, the performance of making updates to the archive was utterly execrable. It was far faster to just make new zip files every time. Plus, I was losing many gigabytes in duplication between archives.
Having a specialized filesystem tree associating hashes with contents (e.g. the output of fossil deconstruct) wouldn't work either, because in the name of Security they misconfigured their OS so that every file open, read, write, or attribute change was many orders of magnitude slower than you'd expect. Common tasks like untarring a directory (or even recursively changing its ownership!) went from taking seconds to taking hours. And no one is interested in fixing or even looking into why this is, since if the system is inconvenient, that's proof that it's Secure. I'm guessing it was trying to log every filesystem action, and the logger was timing out at every operation. Bonkers.
Thus, it made a lot of sense to put the filesystem inside an SQLite database. This solved all my issues nicely, and performance (both timing and disk utilization) was a rocketship compared to the baseline, but then again, anything would beat out what was being done before.
FUSE wasn't a requirement since I was working with entire archives at a time, never needing to extract just one file.