Fossil

View Ticket
Login

View Ticket

Ticket Hash: 6c5471445173a17fffa6aaf867f2f7d3da8f8151
Title: fossil clone is slow
Status: Open Type: Code_Defect
Severity: Important Priority:
Subsystem: Resolution: Open
Last Modified: 2011-05-10 12:42:40
Version Found In: c492eab395
Description:
"fossil clone" over wire is extremely slow for large repositories.

Compare:

117.76s user 3.48s system 90% cpu 2:13.33 total

for clone from local file to local file and

349.86s user 17.71s system 59% cpu 10:16.45 total

for clone from http://localhost:8080.

Total network traffic is around 20MB (for a 40MB repository), so that would be 8min for the data transfer or 42KB/s, which is clearly not acceptable.


drh added on 2010-10-03 18:04:00:
Once again, this is completely outside of our 3-year experience deploying Fossil. It suggests that your repository has a very different structure that what we are used to seeing. If you could share your repository with us, that would be helpful in getting to the root of the problem.


anonymous added on 2010-10-05 16:16:54:
For the record, with the recent small stack fix, I see:

% time ~/work/fossil/fossil clone http://localhost:8080 test-clone
                Bytes      Cards  Artifacts     Deltas
Send:              49          1          0          0
Received:     1236613      26882          0          0
Send:           10025        225          0          0
Received:      104025        206         34        166
Send:           18849        405          0          0
Received:      430475        402         47        353
Send:           37649        801          0          0
Received:      391721        802         53        747
Send:           75249       1601          0          0
Received:     1401506       1602        133       1467
Send:          150449       3201          0          0
Received:     8329951       3202        527       1132
Send:          155995       3319          0          0
Received:     5152386       3320         19          0
Send:            9449        201          0          0
Received:    11672693        202         51          4
Send:            9449        201          0          0
Received:     5011269        202         19          0
Send:            9449        201          0          0
Received:     4747929        202         91        109
Send:           18849        401          0          0
Received:     2544348        402        246        154
Send:           37649        801          0          0
Received:     3850220        802        323        477
Send:           75249       1601          0          0
Received:     2532722       1602        317       1283
Send:          150449       3201          0          0
Received:     2519081       3202        216       2984
Send:          300849       6401          0          0
Received:    11316003       6402        577       2470
Send:          286467       6095          0          0
Received:     5879597       6096          2          3
Send:            9449        201          0          0
Received:     2362448        202         98        102
Send:           18849        401          0          0
Received:     1664619        402        156        244
Send:           37649        801          0          0
Received:     3444746        802        368        432
Send:           75249       1601          0          0
Received:     5068184       1602        144         21
Send:           15559        331          0          0
Received:     2669945        332        106        224
Send:           31069        661          0          0
Received:      236198        662          0        660
Send:           62089       1321          0          0
Received:      506061       1322          0       1320
Send:          124129       2641          0          0
Received:      988722       2642          0       2640
Send:          248209       5281          0          0
Received:     2033552       5282          0       5280
Send:           50903       1083          0          0
Received:     1696392       1084          4       1078
Total network traffic: 1066680 bytes sent, 28570580 bytes received
Rebuilding repository meta-data...
26881 (100%)...
project-id: 72f6eddf4e758c90816c58765010339df04f22f9
server-id:  006b3b3f853f6333ad52939d9d2643dd2980db40
admin-user: joerg (password is "***")
11633 (99%)...
~/work/fossil/fossil clone http://localhost:8080 test-clone  388.29s user 17.44s system 79% cpu 8:31.89 total

E.g. better, but still a lot of time. Another observation is the space usage of the new clone:

Database Stats: 197170 pages, 1024 bytes/page, 149894 free pages, UTF-8, delete mode

The clone is four times the necessary size.


wolfgang added on 2010-10-05 17:13:36:
i did a small performance test on my windows machine with a DMC compiled version [63d91f0b87] 2010-10-05 16:37:40 UTC:

  • Using localhost http clone for sqlite(~20MB transfered) 3 minutes running against a local "fossil server" service, CPU load goes up to 90%
  • Using file copy clone for sqlite 1 minutes
    which is infact a filecopy followed by a rebuild

I think, these times are OK(2 minutes for 20MB), if we remember, that running the http-variant uses 2 fossil instances and all traffic is checked by firewall and virus scanner and that fossil isn't designed as a high-speed html server.


anonymous added on 2010-10-05 20:25:56:
For fossil to be a practical alternative to cvs/svn/git/..., it has to perform well for at least two network related tasks: (1) Initial clone (2) Update of a recent enough clone.

The test case here is not effected by network bandwidth or latency. The test case is not bound by disk bandwidth either, since it is small enough to fit comfortably into memory. As such, it should be a lot faster. For example, I don't understand why it needs so many round trips. The initial clone is the easy problem, both side should know exactly what the other side has.

I wouldn't be surprised if this kind of problems don't happen with a less skewed distribution of the commits (1/3 of all commits to one file), but that doesn't mean it can't be done better. I'm going to try later with a repository that I can publish, so hopefully it can be sorted out.


anonymous claiming to be Alexey Pechnikov added on 2010-10-06 18:08:11:
On my Atom netbook 1,6Ggh I have:

$ time fossil clone http://sqlite.org/src sqlite-src.fossil
                Bytes      Cards  Artifacts     Deltas
Send:              49          1          0          0
Received:     1607865      34952          0          0
Send:           10025        225          0          0
Received:      138945        244          7        193
Send:           18849        415          0          0
Received:      153088        430          3        397
Send:           37649        801          0          0
Received:      876957        830         71        729
Send:           75249       1601          0          0
Received:     1018521       1630         63       1537
Send:          150449       3201          0          0
Received:     1896100       3230         63       3137
Send:          300849       6401          0          0
Received:     3211644       6430         91       6309
Send:          601649      12801          0          0
Received:     5286633      12830        176       6867
Send:          662091      14087          0          0
Received:     5726286      14116        183       5387
Send:          457735       9739          0          0
Received:     5348552       9768        166       2971
Send:          294927       6275          0          0
Received:     5248224       6304        281        629
Send:           85589       1821          0          0
Received:     5006033       1850        486       1323
Send:          170095       3619          0          0
Received:     5147248       3648        132        472
Send:           56825       1209          0          0
Received:     3077764       1238        123       1085
Send:           97339       2071          0          0
Received:     5100687       2100        291       1027
Send:           35393        753          0          0
Received:     5010920        782        170        445
Send:            6488        138          0          0
Received:     3018371        167         66         71
Total network traffic: 1607303 bytes sent, 19503853 bytes received
Rebuilding repository meta-data...
34951 (100%)...
project-id: 2ab58778c2967968b94284e989e43dc11791f548
server-id:  fc36ac9c5028182ec9aa22eed764f8f253d0db91
admin-user: veter (password is "******")

real	7m3.915s
user	2m22.817s
sys	0m10.765s

$ ls -hl sqlite-src.fossil 
-rw-r--r-- 1 veter veter 32M Окт  6 21:58 sqlite-src.fossil

I think the speed of 77 kB/s is really slow. Well, the "Rebuilding repository meta-data..." state can be slow... but why is slow the artifacts downloading and inserting?

See this test:

$ time sqlite3 sqlite-src.fossil '.dump'>sqlite.dump

real	0m3.276s
user	0m2.780s
sys	0m0.460s

$ time sqlite3 sqlite-src2.fossil '.read sqlite.dump'

real	0m14.839s
user	0m13.357s
sys	0m0.396s

Yes, SQLite is fast and Fossil is very slow.


anonymous added on 2010-10-08 10:24:59:
Trying the clone the initial conversion of the pkgsrc repository:

fossil: *** time skew *** server time differs by 4.0 minutes

after the initial "Send". That's when talking with localhost.


anonymous added on 2010-10-08 12:43:55:
The initial trace of the pkgsrc repository can be found at ftp://ftp.netbsd.org/pub/NetBSD/misc/repositories/fossil/pkgsrc-fossil for download. fossil clone from localhost has consumed 2h of CPU time now with no visual progress. I'll upload a newer version later with all branches, but that's still work-in-progress.


drh added on 2010-10-08 14:31:42:
A very impressive repository - thanks for sharing it.

pkgsrc.fossil is about 10x or 100x larger than any repository we have dealt with before. SQLite is really the biggest repository that Fossil deals with on a regular basis. SQLite has 10.3 years of history compared to 13.1 years of history with pkgsrc. But there have been an average of just 2.2 checkins per day to SQLite for a total of 8446 checkins, whereas pkgsrc has had on average almost 40 checkins per day for a total of 190281 checkins. Fossil likes to store the original content of the latest checkin as full-text and then store prior versions as deltas from the latest. (Most other VCSes do the same - RCS in particular.) In SQLite, the longest delta chain is therefore 8446 deep - which is deep but not unmanageable. With pkgsrc, the longest change is a whopping 190281 deep - 22x deeper. It makes me wonder if we shouldn't trick fossil into storing a full-text version of each file after some fixed number of deltas - for example store full text after each 500 or 1000 deltas. That will make the respository larger, but it will also make a 100x performance improvement when trying to access files that are very deep in the delta stack.

SQLite has 1057 separate files. We host a few other repositories with more than this, but never more than a couple thousand. pksrc, on the other hand has over 100,000 separate files. Many of the file browsing links scan the entire list of files. With only a thousand or so, this is not problem. But with 100x as many, those pages are slow. We might need to reconsider the way some of the file browsing pages are rendered.

The huge number of files also makes for large manifests. I looked at one of the more recent manifests in pkgsrc and it was 60936 lines long and contains nearly 5MB of text. There are about 340MB of content in a checkout. Fossil does lots of consistency checks for each check-in, which involves computing MD5 checksums over the entire checkout, twice. Running MD5 over 720MB takes some time. A simple checkin on my 4-year-old desktop used 18 seconds of CPU time and 55 seconds of real-time. (Note that a clone of the repository was running in parallel while doing the performance test of the previous sentence.) Checkins would be very fast if we turned off the error checks. Maybe that needs to be an option.

My clone (via localhost) has made substantial progress in 33 minutes of CPU time. But it seems to be stuck now.


anonymous claiming to be urmil added on 2010-10-08 14:42:53:
I was thinking of getting some help to import our ~10yr old cvs repository to fossil. After reading above post, I'm thinking other way... My repository, when checked out, takes 1.7 GB. I'm scared to import it to fossil - wonder how long will it take to commit :(


anonymous added on 2010-10-09 07:48:14:
urmil: The pkgsrc tree has a checkout size of 500MB. The CVS repository is around 4GB and the temporary database that contains all revisions undeltafied (but compressed) is around 8GB. I plan to make my tools public at some point to help others with migrations, but you might want to wait a bit if you have many files.


drh added on 2010-10-09 10:21:06:
The problem with pkgsrc is not the size of the checkout. The SQL Logic Test program for SQLite is in a Fossil repository where each checkout is about 1.3GB in size and it works fine. The issue Fossil has with pkgsrc is the huge number of files in a check-out, about 60k. SQL Logic Test is a much larger checkout in total size, but it only has 645 files. In other words, pkgsrc has a huge number of very small files and SQL Logic Test has a small number of very large files. Fossil currently handles the latter case fine - it is the former case that is giving trouble.

I have a strategy, which I will describe separately, to address the pkgsrc scaling issue of a large number of small files.


renez added on 2010-10-10 11:07:53:
Pkgsrc is a very special application. I don't think there many more like that. Not your typical run of the mill source code. I wonder has it been migrated to GIT? What are there the performance figures. I hope we are not going to optimize fossil for this special purpose!


anonymous added on 2010-10-10 12:44:41:
The pkgsrc repository is not that special. The very same kind of issue you will find for pretty much any large scale repository to some extend. I strongly suspect that the fixes for the issues found here are going to help smaller repositories too. It just won't be that visible in them.


renez added on 2010-10-10 18:50:17:
No it is special because in essence it are all small unrelated projects

  • x11/kde is unrelated to x11/gtk
  • editor/abiword not related editor/xxxx

It is not about source code control. I have the impression that pkgsrc is using the wrong tool for the job.


anonymous added on 2010-10-10 23:35:53:
Given that there are a lot of commits that span all top-level directories and up to 2000 files or so, your assumptions obviously don't hold. The repository is also only one of two majors modules on (anon)cvs.netbsd.org and the main tree (src) is even larger in almost every regard.


anonymous added on 2010-10-11 19:25:05:
I've replaced ftp://ftp.netbsd.org/pub/NetBSD/misc/repositories/fossil/pkgsrc-fossil with a newer trace that contains all non-vendor branches.


anonymous claiming to be Joerg Sonnenberger added on 2010-12-21 14:04:19:
With [d16ccde510] and a fresh trace of pkgsrc, fossil clone http://localhost:8080 now takes:

real	5m41.922s
user	1m52.690s
sys	0m15.750s

...until the rebuild starts. IO delays can be neglected and this is completely dominated by insufficient asynchronous processing between client and server. This time is not masked by network speed either.


anonymous added on 2011-05-10 12:42:40 UTC:
The DragonFly BSD project maintains a git mirror of pkgsrc at git://git.dragonflybsd.org/pkgsrcv2.git

Here are some git clone times for comparison. First from the network (slow connection 389 KB/s) and then from local.

$ cd /tmp
$ time git clone --mirror git://git.dragonflybsd.org/pkgsrcv2.git pkgsrcgit
Cloning into bare repository pkgsrcgit...
remote: Counting objects: 1590487, done.
remote: Compressing objects: 100% (470270/470270), done.
remote: Total 1590487 (delta 1117330), reused 1564721 (delta 1096191)
Receiving objects: 100% (1590487/1590487), 473.92 MiB | 389 KiB/s, done.
Resolving deltas: 100% (1117330/1117330), done.
real	26m57.789s
user	2m35.646s
sys	0m45.874s
$ time git clone --mirror file:///tmp/pkgsrcgit pkgsrcgit2
Cloning into bare repository pkgsrcgit2...
remote: Counting objects: 1590487, done.
remote: Compressing objects: 100% (449131/449131), done.
Receiving objects: 100% (1590487/1590487), 473.92 MiB | 8.93 MiB/s, done.
remote: Total 1590487 (delta 1117330), reused 1590487 (delta 1117330)
Resolving deltas: 100% (1117330/1117330), done.
real	7m0.143s
user	3m24.122s
sys	0m47.543s

Then the equivalent of a "fossil open" which actually checked out 63027 files into the local directory.

$ time git clone /tmp/pkgsrcgit pkgsrcopen
Cloning into pkgsrcopen...
done.
real	0m56.812s
user	0m5.374s
sys	0m15.589s

Then a "git status"

$ cd pkgsrcopen
$ time git status
# On branch master
nothing to commit (working directory clean)
real	0m10.397s
user	0m0.818s
sys	0m3.721s