"fossil clone" over wire is extremely slow for large repositories.
Compare:
117.76s user 3.48s system 90% cpu 2:13.33 total
for clone from local file to local file and
349.86s user 17.71s system 59% cpu 10:16.45 total
for clone from http://localhost:8080.
Total network traffic is around 20MB (for a 40MB repository), so that would be 8min for the data transfer or 42KB/s, which is clearly not acceptable.
<hr /><i>drh added on 2010-10-03 18:04:00:</i><br />
Once again, this is completely outside of our 3-year experience deploying
Fossil. It suggests that your repository has a very different structure
that what we are used to seeing. If you could share your repository with
us, that would be helpful in getting to the root of the problem.
<hr /><i>anonymous added on 2010-10-05 16:16:54:</i><br />
For the record, with the recent small stack fix, I see:
<verbatim>
% time ~/work/fossil/fossil clone http://localhost:8080 test-clone
Bytes Cards Artifacts Deltas
Send: 49 1 0 0
Received: 1236613 26882 0 0
Send: 10025 225 0 0
Received: 104025 206 34 166
Send: 18849 405 0 0
Received: 430475 402 47 353
Send: 37649 801 0 0
Received: 391721 802 53 747
Send: 75249 1601 0 0
Received: 1401506 1602 133 1467
Send: 150449 3201 0 0
Received: 8329951 3202 527 1132
Send: 155995 3319 0 0
Received: 5152386 3320 19 0
Send: 9449 201 0 0
Received: 11672693 202 51 4
Send: 9449 201 0 0
Received: 5011269 202 19 0
Send: 9449 201 0 0
Received: 4747929 202 91 109
Send: 18849 401 0 0
Received: 2544348 402 246 154
Send: 37649 801 0 0
Received: 3850220 802 323 477
Send: 75249 1601 0 0
Received: 2532722 1602 317 1283
Send: 150449 3201 0 0
Received: 2519081 3202 216 2984
Send: 300849 6401 0 0
Received: 11316003 6402 577 2470
Send: 286467 6095 0 0
Received: 5879597 6096 2 3
Send: 9449 201 0 0
Received: 2362448 202 98 102
Send: 18849 401 0 0
Received: 1664619 402 156 244
Send: 37649 801 0 0
Received: 3444746 802 368 432
Send: 75249 1601 0 0
Received: 5068184 1602 144 21
Send: 15559 331 0 0
Received: 2669945 332 106 224
Send: 31069 661 0 0
Received: 236198 662 0 660
Send: 62089 1321 0 0
Received: 506061 1322 0 1320
Send: 124129 2641 0 0
Received: 988722 2642 0 2640
Send: 248209 5281 0 0
Received: 2033552 5282 0 5280
Send: 50903 1083 0 0
Received: 1696392 1084 4 1078
Total network traffic: 1066680 bytes sent, 28570580 bytes received
Rebuilding repository meta-data...
26881 (100%)...
project-id: 72f6eddf4e758c90816c58765010339df04f22f9
server-id: 006b3b3f853f6333ad52939d9d2643dd2980db40
admin-user: joerg (password is "***")
11633 (99%)...
~/work/fossil/fossil clone http://localhost:8080 test-clone 388.29s user 17.44s system 79% cpu 8:31.89 total
</verbatim>
E.g. better, but still a lot of time. Another observation is the space usage of the new clone:
Database Stats: 197170 pages, 1024 bytes/page, 149894 free pages, UTF-8, delete mode
The clone is four times the necessary size.
<hr /><i>wolfgang added on 2010-10-05 17:13:36:</i><br />
i did a small performance test on my windows machine with a DMC compiled version [63d91f0b87] 2010-10-05 16:37:40 UTC:
* Using localhost http clone for sqlite(~20MB transfered) 3 minutes running against a local "fossil server" service, CPU load goes up to 90%
* Using file copy clone for sqlite 1 minutes<br>which is infact a filecopy followed by a rebuild
I think, these times are OK(2 minutes for 20MB), if we remember, that running the http-variant uses 2 fossil instances and all traffic is checked by firewall and virus scanner and that fossil isn't designed as a high-speed html server.
<hr /><i>anonymous added on 2010-10-05 20:25:56:</i><br />
For fossil to be a practical alternative to cvs/svn/git/..., it has to perform well for at least two network related tasks:
(1) Initial clone
(2) Update of a recent enough clone.
The test case here is not effected by network bandwidth or latency. The test case is not bound by disk bandwidth either, since it is small enough to fit comfortably into memory. As such, it should be a lot faster. For example, I don't understand why it needs so many round trips. The initial clone is the easy problem, both side should know exactly what the other side has.
I wouldn't be surprised if this kind of problems don't happen with a less skewed distribution of the commits (1/3 of all commits to one file), but that doesn't mean it can't be done better. I'm going to try later with a repository that I can publish, so hopefully it can be sorted out.
<hr /><i>anonymous claiming to be Alexey Pechnikov added on 2010-10-06 18:08:11:</i><br />
On my Atom netbook 1,6Ggh I have:
<verbatim>
$ time fossil clone http://sqlite.org/src sqlite-src.fossil
Bytes Cards Artifacts Deltas
Send: 49 1 0 0
Received: 1607865 34952 0 0
Send: 10025 225 0 0
Received: 138945 244 7 193
Send: 18849 415 0 0
Received: 153088 430 3 397
Send: 37649 801 0 0
Received: 876957 830 71 729
Send: 75249 1601 0 0
Received: 1018521 1630 63 1537
Send: 150449 3201 0 0
Received: 1896100 3230 63 3137
Send: 300849 6401 0 0
Received: 3211644 6430 91 6309
Send: 601649 12801 0 0
Received: 5286633 12830 176 6867
Send: 662091 14087 0 0
Received: 5726286 14116 183 5387
Send: 457735 9739 0 0
Received: 5348552 9768 166 2971
Send: 294927 6275 0 0
Received: 5248224 6304 281 629
Send: 85589 1821 0 0
Received: 5006033 1850 486 1323
Send: 170095 3619 0 0
Received: 5147248 3648 132 472
Send: 56825 1209 0 0
Received: 3077764 1238 123 1085
Send: 97339 2071 0 0
Received: 5100687 2100 291 1027
Send: 35393 753 0 0
Received: 5010920 782 170 445
Send: 6488 138 0 0
Received: 3018371 167 66 71
Total network traffic: 1607303 bytes sent, 19503853 bytes received
Rebuilding repository meta-data...
34951 (100%)...
project-id: 2ab58778c2967968b94284e989e43dc11791f548
server-id: fc36ac9c5028182ec9aa22eed764f8f253d0db91
admin-user: veter (password is "******")
real 7m3.915s
user 2m22.817s
sys 0m10.765s
$ ls -hl sqlite-src.fossil
-rw-r--r-- 1 veter veter 32M Окт 6 21:58 sqlite-src.fossil
</verbatim>
I think the speed of 77 kB/s is really slow. Well, the "Rebuilding repository meta-data..." state can be slow... but why is slow the artifacts downloading and inserting?
See this test:
<verbatim>
$ time sqlite3 sqlite-src.fossil '.dump'>sqlite.dump
real 0m3.276s
user 0m2.780s
sys 0m0.460s
$ time sqlite3 sqlite-src2.fossil '.read sqlite.dump'
real 0m14.839s
user 0m13.357s
sys 0m0.396s
</verbatim>
Yes, SQLite is fast and Fossil is very slow.
<hr /><i>anonymous added on 2010-10-08 10:24:59:</i><br />
Trying the clone the initial conversion of the pkgsrc repository:
<verbatim>
fossil: *** time skew *** server time differs by 4.0 minutes
</verbatim>
after the initial "Send". That's when talking with localhost.
<hr /><i>anonymous added on 2010-10-08 12:43:55:</i><br />
The initial trace of the pkgsrc repository can be found at ftp://ftp.netbsd.org/pub/NetBSD/misc/repositories/fossil/pkgsrc-fossil for download. fossil clone from localhost has consumed 2h of CPU time now with no visual progress. I'll upload a newer version later with all branches, but that's still work-in-progress.
<hr /><i>drh added on 2010-10-08 14:31:42:</i><br />
A very impressive repository - thanks for sharing it.
pkgsrc.fossil is about 10x or 100x larger than any repository we have dealt
with before. SQLite is really the biggest repository that Fossil deals with
on a regular basis. SQLite has 10.3 years of history compared to 13.1 years
of history with pkgsrc. But there have been an average of just 2.2 checkins
per day to SQLite for a total of 8446 checkins, whereas pkgsrc has had on
average almost 40 checkins per day for a total of 190281 checkins.
Fossil likes to store the original content of the latest checkin as
full-text and then store prior versions as deltas from the latest. (Most
other VCSes do the same - RCS in particular.) In SQLite, the longest delta
chain is therefore 8446 deep - which is deep but not unmanageable. With
pkgsrc, the longest change is a whopping 190281 deep - 22x deeper. It
makes me wonder if we shouldn't trick fossil into storing a full-text
version of each file after some fixed number of deltas - for example
store full text after each 500 or 1000 deltas. That will make the
respository larger, but it will also make a 100x performance improvement
when trying to access files that are very deep in the delta stack.
SQLite has 1057 separate files. We host a few other repositories with
more than this, but never more than a couple thousand. pksrc, on the
other hand has over 100,000 separate files. Many of the file browsing
links scan the entire list of files. With only a thousand or so, this
is not problem. But with 100x as many, those pages are slow. We might
need to reconsider the way some of the file browsing pages are rendered.
The huge number of files also makes for large manifests. I looked at
one of the more recent manifests in pkgsrc and it was 60936 lines long
and contains nearly 5MB of text. There are about 340MB of content in
a checkout. Fossil does lots of consistency checks for each check-in,
which involves computing MD5 checksums over the entire checkout, twice.
Running MD5 over 720MB takes some time. A simple checkin on my
4-year-old desktop used 18 seconds of CPU time and 55 seconds of
real-time. (Note that a clone of the repository was running in parallel
while doing the performance test of the previous sentence.) Checkins
would be very fast if we turned off the error checks. Maybe that
needs to be an option.
My clone (via localhost) has made substantial progress in 33 minutes
of CPU time. But it seems to be stuck now.
<hr /><i>anonymous claiming to be urmil added on 2010-10-08 14:42:53:</i><br />
I was thinking of getting some help to import our ~10yr old cvs repository to fossil. After reading above post, I'm thinking other way... My repository, when checked out, takes 1.7 GB. I'm scared to import it to fossil - wonder how long will it take to commit :(
<hr /><i>anonymous added on 2010-10-09 07:48:14:</i><br />
urmil: The pkgsrc tree has a checkout size of 500MB. The CVS repository is around 4GB and the temporary database that contains all revisions undeltafied (but compressed) is around 8GB. I plan to make my tools public at some point to help others with migrations, but you might want to wait a bit if you have many files.
<hr /><i>drh added on 2010-10-09 10:21:06:</i><br />
The problem with pkgsrc is not the size of the checkout. The
[http://www.sqlite.org/slt/timeline | SQL Logic Test] program for SQLite
is in a Fossil repository where each checkout is about 1.3GB in size
and it works fine. The issue Fossil has with pkgsrc is the huge number
of files in a check-out, about 60k. SQL Logic Test is a much larger
checkout in total size, but it only has 645 files. In other words,
pkgsrc has a huge number of very small files and SQL Logic Test has
a small number of very large files. Fossil currently handles the latter
case fine - it is the former case that is giving trouble.
I have a strategy, which I will describe separately, to address the
pkgsrc scaling issue of a large number of small files.
<hr /><i>renez added on 2010-10-10 11:07:53:</i><br />
Pkgsrc is a very special application. I don't think there many more like that. Not your typical run of the mill source code. I wonder has it been migrated to GIT?
What are there the performance figures.
I hope we are not going to optimize fossil for this special purpose!
<hr /><i>anonymous added on 2010-10-10 12:44:41:</i><br />
The pkgsrc repository is not that special. The very same kind of issue you will find for pretty much any large scale repository to some extend. I strongly suspect that the fixes for the issues found here are going to help smaller repositories too. It just won't be that visible in them.
<hr /><i>renez added on 2010-10-10 18:50:17:</i><br />
No it is special because in essence it are all small unrelated projects
* x11/kde is unrelated to x11/gtk
* editor/abiword not related editor/xxxx
It is not about source code control. I have the impression that pkgsrc is using the wrong tool for the job.
<hr /><i>anonymous added on 2010-10-10 23:35:53:</i><br />
Given that there are a lot of commits that span all top-level directories and up to 2000 files or so, your assumptions obviously don't hold. The repository is also only one of two majors modules on (anon)cvs.netbsd.org and the main tree (src) is even larger in almost every regard.
<hr /><i>anonymous added on 2010-10-11 19:25:05:</i><br />
I've replaced ftp://ftp.netbsd.org/pub/NetBSD/misc/repositories/fossil/pkgsrc-fossil with a newer trace that contains all non-vendor branches.
<hr /><i>anonymous claiming to be Joerg Sonnenberger added on 2010-12-21 14:04:19:</i><br />
With [d16ccde510] and a fresh trace of pkgsrc, fossil clone http://localhost:8080 now takes:
<verbatim>
real 5m41.922s
user 1m52.690s
sys 0m15.750s
</verbatim>
...until the rebuild starts. IO delays can be neglected and this is completely dominated by insufficient asynchronous processing between client and server. This time is not masked by network speed either.
<hr /><i>anonymous added on 2011-05-10 12:42:40 UTC:</i><br />
The DragonFly BSD project maintains a git mirror of pkgsrc at <tt>git://git.dragonflybsd.org/pkgsrcv2.git</tt>
Here are some git clone times for comparison. First from the network (slow connection 389 KB/s) and then from local.
<pre>
$ cd /tmp
</pre>
<pre>
$ time git clone --mirror git://git.dragonflybsd.org/pkgsrcv2.git pkgsrcgit
</pre>
<pre>
Cloning into bare repository pkgsrcgit...
remote: Counting objects: 1590487, done.
remote: Compressing objects: 100% (470270/470270), done.
remote: Total 1590487 (delta 1117330), reused 1564721 (delta 1096191)
Receiving objects: 100% (1590487/1590487), 473.92 MiB | 389 KiB/s, done.
Resolving deltas: 100% (1117330/1117330), done.
</pre>
<pre>
real 26m57.789s
user 2m35.646s
sys 0m45.874s
</pre>
<pre>
$ time git clone --mirror file:///tmp/pkgsrcgit pkgsrcgit2
</pre>
<pre>
Cloning into bare repository pkgsrcgit2...
remote: Counting objects: 1590487, done.
remote: Compressing objects: 100% (449131/449131), done.
Receiving objects: 100% (1590487/1590487), 473.92 MiB | 8.93 MiB/s, done.
remote: Total 1590487 (delta 1117330), reused 1590487 (delta 1117330)
Resolving deltas: 100% (1117330/1117330), done.
</pre>
<pre>
real 7m0.143s
user 3m24.122s
sys 0m47.543s
</pre>
Then the equivalent of a "fossil open" which actually checked out 63027 files into the local directory.
<pre>
$ time git clone /tmp/pkgsrcgit pkgsrcopen
</pre>
<pre>
Cloning into pkgsrcopen...
done.
</pre>
<pre>
real 0m56.812s
user 0m5.374s
sys 0m15.589s
</pre>
Then a "git status"
<pre>
$ cd pkgsrcopen
</pre>
<pre>
$ time git status
</pre>
<pre>
# On branch master
nothing to commit (working directory clean)
</pre>
<pre>
real 0m10.397s
user 0m0.818s
sys 0m3.721s
</pre>
|