Performance Statistics
The questions will inevitably arise: How does Fossil perform? Does it use a lot of disk space or bandwidth? Is it scalable?
In an attempt to answer these questions, this report looks at five projects that use fossil for configuration management and examines how well fossil is working for them. The following table summarizes the results; explanation and analysis follow the table.
Project | Number Of Artifacts | Number Of Check-ins | Project Duration (as of 2009-08-23) | Average Check-ins Per Day | Uncompressed Size | Repository Size | Compression Ratio | Clone Bandwidth |
---|---|---|---|---|---|---|---|---|
SQLite | 28643 | 6755 | 3373 days (9.24 yrs) | 2.00 | 1.27 GB | 35.4 MB | 35:1 | 982 KB up, 12.4 MB down |
Fossil | 4981 | 1272 | 764 days (2.1 yrs) | 1.66 | 144 MB | 8.74 MB | 16:1 | 128 KB up, 4.49 MB down |
SLT | 2062 | 67 | 266 days | 0.25 | 1.76 GB | 147 MB | 11:1 | 1.1 MB up, 141 MB down |
TH3 | 1999 | 429 | 331 days | 1.30 | 70.5 MB | 6.3 MB | 11:1 | 55 KB up, 4.66 MB down |
SQLite Docs | 1787 | 444 | 650 days (1.78 yrs) | 0.68 | 43 MB | 4.9 MB | 8:1 | 46 KB up, 3.35 MB down |
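The derived columns of the table can be recomputed from the raw figures. The sketch below does that arithmetic; the names `PROJECTS`, `checkins_per_day`, and `compression_ratio` are illustrative only. Working it through suggests the table's ratios are truncated to whole numbers rather than rounded (SLT, for instance, comes out near 11.97:1 but is listed as 11:1).

```python
# Raw figures copied from the table above (sizes in decimal units: 1 MB = 10**6 bytes).
PROJECTS = {
    # name: (check-ins, duration in days, uncompressed bytes, repository bytes)
    "SQLite":      (6755, 3373, 1.27e9, 35.4e6),
    "Fossil":      (1272,  764, 144e6,  8.74e6),
    "SLT":         (  67,  266, 1.76e9, 147e6),
    "TH3":         ( 429,  331, 70.5e6, 6.3e6),
    "SQLite Docs": ( 444,  650, 43e6,   4.9e6),
}

def checkins_per_day(checkins: int, days: int) -> float:
    """Average number of check-ins per day over the life of the project."""
    return checkins / days

def compression_ratio(uncompressed: float, stored: float) -> float:
    """How many bytes of raw artifact content each stored byte represents."""
    return uncompressed / stored

for name, (checkins, days, raw, repo) in PROJECTS.items():
    print(f"{name:12} {checkins_per_day(checkins, days):5.2f} check-ins/day  "
          f"{compression_ratio(raw, repo):5.1f}:1")
```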
The Five Projects
The five projects listed above were chosen because they have been in existence for a long time (relative to the age of fossil) or because they have large amounts of content. The most important project using fossil is SQLite. Fossil itself is built on top of SQLite and so obviously SQLite has to predate fossil. SQLite was originally versioned using CVS, but recently the entire 9-year and 320-MB CVS history of SQLite was converted over to Fossil. This is an important datapoint because it demonstrates fossil's ability to manage a significant and long-running project. The next-longest running fossil project is fossil itself, at 2.1 years. The documentation for SQLite (identified above as "SQLite Docs") was split off from the main SQLite source tree and into its own fossil repository about 1.75 years ago. The "SQL Logic Test" or "SLT" project is a massive collection of SQL statements and their output used to compare the processing of SQLite against MySQL, PostgreSQL, Microsoft SQL Server, and Oracle. Finally, "TH3" is a proprietary set of test cases for SQLite used to give 100% branch test coverage of SQLite on embedded platforms. All projects except for TH3 are open-source.
Measured Attributes
In fossil, every version of every file, every wiki page, every change to every ticket, and every check-in is a separate "artifact". One way to think of a fossil project is as a bag of artifacts. Of course, there is a lot more than this going on in fossil. Many of the artifacts have meaning and are related to other artifacts. But at a low level (for example when synchronizing two instances of the same project) the only thing that matters is the unordered collection of artifacts. In fact, one of the key characteristics of fossil is that the entire project history can be reconstructed simply by scanning the artifacts in an arbitrary order.
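The "bag of artifacts" view can be illustrated with a toy model. This sketch assumes each artifact is identified by a hash of its content (Fossil does name artifacts by hash; SHA-1 is used here for simplicity), and the `make_repo` and `sync` functions are hypothetical simplifications, not Fossil's actual storage or sync protocol. The point it demonstrates is that the resulting collection is the same no matter what order the artifacts arrive in.

```python
import hashlib
import random

def artifact_id(content: bytes) -> str:
    # Simplification: name each artifact by the SHA-1 hash of its content.
    return hashlib.sha1(content).hexdigest()

def make_repo(artifacts) -> dict[str, bytes]:
    """A repository modelled as an unordered bag: artifact ID -> content."""
    return {artifact_id(a): a for a in artifacts}

def sync(a: dict[str, bytes], b: dict[str, bytes]) -> None:
    """Hypothetical two-way sync: each side receives the artifacts it lacks."""
    for aid in a.keys() - b.keys():   # set difference is computed before mutating b
        b[aid] = a[aid]
    for aid in b.keys() - a.keys():
        a[aid] = b[aid]

# Hypothetical project history: file versions, a wiki page, a ticket change, a check-in.
history = [b"file v1", b"file v2", b"wiki page", b"ticket change", b"check-in manifest"]
shuffled = history[:]
random.shuffle(shuffled)
assert make_repo(history) == make_repo(shuffled)  # arrival order does not matter

alice = make_repo(history[:3])
bob = make_repo(history[2:])
sync(alice, bob)
assert alice == bob == make_repo(history)  # both sides now hold the whole bag
```

Because the model is just a set keyed by content hash, synchronizing reduces to exchanging whatever IDs one side has and the other lacks, which is why ordering never enters into it.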
The number of check-ins is the number of times that the "commit" command has been run. A single check-in might change 3 or 4 files, or it might change several dozen different files. Regardless of the number of files changed, it still only counts as one check-in.
The "Uncompressed Size" is the total size of all the artifacts within the fossil repository, assuming they were all uncompressed and stored separately on the disk. Fossil uses delta compression between related versions of the same file, and then applies zlib compression to the resulting deltas. The resulting total repository size is shown in the "Repository Size" column, and the "Compression Ratio" is the uncompressed size divided by the repository size.
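The two-stage scheme (delta, then zlib) can be sketched as follows. The delta format here is a deliberately simple stand-in that keeps a shared prefix and suffix and stores only the changed middle; it is NOT Fossil's delta algorithm, and the functions `make_delta`, `apply_delta`, `store_chain`, and `load_chain`, as well as the sample file history, are hypothetical. Each version is delta'd against its predecessor for simplicity; the direction Fossil itself uses may differ.

```python
import zlib

def make_delta(base: bytes, target: bytes) -> bytes:
    """Toy delta: 4-byte prefix length, 4-byte suffix length, then the new middle bytes."""
    limit = min(len(base), len(target))
    p = 0
    while p < limit and base[p] == target[p]:
        p += 1
    s = 0
    # Bound the suffix so prefix and suffix never overlap in either input.
    while s < limit - p and base[-1 - s] == target[-1 - s]:
        s += 1
    middle = target[p:len(target) - s]
    return p.to_bytes(4, "big") + s.to_bytes(4, "big") + middle

def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild the target from the base plus a delta produced by make_delta."""
    p = int.from_bytes(delta[:4], "big")
    s = int.from_bytes(delta[4:8], "big")
    return base[:p] + delta[8:] + base[len(base) - s:]

def store_chain(versions: list[bytes]) -> list[bytes]:
    """Store the first version whole and each later one as a delta, all zlib-compressed."""
    stored = [zlib.compress(versions[0])]
    for prev, cur in zip(versions, versions[1:]):
        stored.append(zlib.compress(make_delta(prev, cur)))
    return stored

def load_chain(stored: list[bytes]) -> list[bytes]:
    """Decompress and replay the deltas to recover every version."""
    versions = [zlib.decompress(stored[0])]
    for blob in stored[1:]:
        versions.append(apply_delta(versions[-1], zlib.decompress(blob)))
    return versions

# Hypothetical history: 100 revisions of a 300-line file, one line changed per revision.
lines = [f"line {i}: some source text\n".encode() for i in range(300)]
versions = []
for n in range(100):
    lines[(n * 7) % 300] = f"line {n}: edited in revision {n}\n".encode()
    versions.append(b"".join(lines))

stored = store_chain(versions)
raw = sum(len(v) for v in versions)
packed = sum(len(b) for b in stored)
print(f"uncompressed {raw} bytes, stored {packed} bytes, ratio {raw / packed:.0f}:1")
```

A long revision chain like this one compresses far better than zlib alone, because each revision costs only a few dozen bytes. A file with a single version, as in SLT, produces no deltas, so only zlib's much smaller gain remains.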
On the right end of the table, we show the "Clone Bandwidth". This is the total number of bytes sent from client to server ("uplink") and from server back to client ("downlink") in order to clone a repository. These byte counts include HTTP protocol overhead.
In the table and throughout this article, "GB" means gigabytes (10^9 bytes), not gibibytes (2^30 bytes). Similarly, "MB" and "KB" mean megabytes (10^6 bytes) and kilobytes (10^3 bytes), not mebibytes and kibibytes.
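The difference between decimal and binary units is small but not negligible at these sizes. A minimal conversion sketch follows; the constant and function names are illustrative.

```python
# Decimal units, as used in this article, and their binary counterparts.
KB, MB, GB = 10**3, 10**6, 10**9
KiB, MiB, GiB = 2**10, 2**20, 2**30

def convert(n_bytes: float, unit: int) -> float:
    """Express a byte count in the given unit."""
    return n_bytes / unit

sqlite_uncompressed = 1.27 * GB   # "Uncompressed Size" of SQLite from the table
sqlite_repo = 35.4 * MB           # "Repository Size" of SQLite from the table
print(f"{convert(sqlite_uncompressed, GiB):.2f} GiB, {convert(sqlite_repo, MiB):.2f} MiB")
```

So the SQLite figure of 1.27 GB would be roughly 1.18 GiB in binary units.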
Analysis And Supplemental Data
Perhaps the two most interesting datapoints in the above table are SQLite and SLT. SQLite is a long-running project with long revision chains. Some of the files in SQLite have been edited close to a thousand times. Each of these edits is stored as a delta, and hence the SQLite project gets excellent 35:1 compression. SLT, on the other hand, consists of many large (megabyte-sized) SQL scripts that have one or maybe two versions. There is very little delta compression occurring and so the overall repository compression ratio is much lower. Note also that quite a bit more bandwidth is required to clone SLT than SQLite.
For the first nine years of its development, SQLite was versioned by CVS. The resulting CVS repository measured over 320 MB in size. So, the developers were pleasantly surprised to see that this entire project could be cloned in fossil using only about 13 MB of network traffic. The "sync" protocol used by fossil has turned out to be surprisingly efficient. A typical check-in on SQLite might use 3 or 4 KB of network bandwidth total. Hardly worth measuring. The sync protocol is efficient enough that, once cloned, fossil could easily be used over a dial-up connection.
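The bandwidth figures above can be checked with some back-of-the-envelope arithmetic. This sketch assumes a nominal 56 kbit/s dial-up link treated as a single pipe in both directions (real modems upload more slowly and rarely reach 56 kbit/s), and ignores latency and protocol round trips; `transfer_seconds` and `DIALUP_BPS` are illustrative names.

```python
KB, MB = 10**3, 10**6

def transfer_seconds(n_bytes: float, bits_per_second: float) -> float:
    """Time to move n_bytes over a link, ignoring latency and retransmission."""
    return n_bytes * 8 / bits_per_second

DIALUP_BPS = 56_000  # assumption: nominal 56 kbit/s modem, an optimistic best case

clone_bytes = 982 * KB + 12.4 * MB   # SQLite clone, uplink plus downlink, from the table
cvs_bytes = 320 * MB                 # size of the former CVS repository
checkin_bytes = 4 * KB               # upper end of the "3 or 4 KB" per check-in

print(f"clone traffic: {clone_bytes / MB:.1f} MB, "
      f"about {cvs_bytes / clone_bytes:.0f}x smaller than the CVS repository")
print(f"one check-in over dial-up: {transfer_seconds(checkin_bytes, DIALUP_BPS):.2f} s")
print(f"initial clone over dial-up: {transfer_seconds(clone_bytes, DIALUP_BPS) / 60:.0f} min")
```

Under these assumptions a check-in takes well under a second, while the initial clone takes about half an hour, which is why the "once cloned" qualifier matters.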