Ticket Hash: | 6c5471445173a17fffa6aaf867f2f7d3da8f8151 | |||
Title: | fossil clone is slow | |||
Status: | Open | Type: | Code_Defect | |
Severity: | Important | Priority: | ||
Subsystem: | Resolution: | Open | ||
Last Modified: | 2011-05-10 12:42:40 | |||
Version Found In: | c492eab395 | |||
Description: | ||||
"fossil clone" over wire is extremely slow for large repositories.
Compare: 117.76s user 3.48s system 90% cpu 2:13.33 total for clone from local file to local file and 349.86s user 17.71s system 59% cpu 10:16.45 total for clone from http://localhost:8080. Total network traffic is around 20MB (for a 40MB repository), so that would be 8min for the data transfer or 42KB/s, which is clearly not acceptable. drh added on 2010-10-03 18:04:00: anonymous added on 2010-10-05 16:16:54: % time ~/work/fossil/fossil clone http://localhost:8080 test-clone Bytes Cards Artifacts Deltas Send: 49 1 0 0 Received: 1236613 26882 0 0 Send: 10025 225 0 0 Received: 104025 206 34 166 Send: 18849 405 0 0 Received: 430475 402 47 353 Send: 37649 801 0 0 Received: 391721 802 53 747 Send: 75249 1601 0 0 Received: 1401506 1602 133 1467 Send: 150449 3201 0 0 Received: 8329951 3202 527 1132 Send: 155995 3319 0 0 Received: 5152386 3320 19 0 Send: 9449 201 0 0 Received: 11672693 202 51 4 Send: 9449 201 0 0 Received: 5011269 202 19 0 Send: 9449 201 0 0 Received: 4747929 202 91 109 Send: 18849 401 0 0 Received: 2544348 402 246 154 Send: 37649 801 0 0 Received: 3850220 802 323 477 Send: 75249 1601 0 0 Received: 2532722 1602 317 1283 Send: 150449 3201 0 0 Received: 2519081 3202 216 2984 Send: 300849 6401 0 0 Received: 11316003 6402 577 2470 Send: 286467 6095 0 0 Received: 5879597 6096 2 3 Send: 9449 201 0 0 Received: 2362448 202 98 102 Send: 18849 401 0 0 Received: 1664619 402 156 244 Send: 37649 801 0 0 Received: 3444746 802 368 432 Send: 75249 1601 0 0 Received: 5068184 1602 144 21 Send: 15559 331 0 0 Received: 2669945 332 106 224 Send: 31069 661 0 0 Received: 236198 662 0 660 Send: 62089 1321 0 0 Received: 506061 1322 0 1320 Send: 124129 2641 0 0 Received: 988722 2642 0 2640 Send: 248209 5281 0 0 Received: 2033552 5282 0 5280 Send: 50903 1083 0 0 Received: 1696392 1084 4 1078 Total network traffic: 1066680 bytes sent, 28570580 bytes received Rebuilding repository meta-data... 26881 (100%)... project-id: 72f6eddf4e758c90816c58765010339df04f22f9 server-id: 006b3b3f853f6333ad52939d9d2643dd2980db40 admin-user: joerg (password is "***") 11633 (99%)... ~/work/fossil/fossil clone http://localhost:8080 test-clone 388.29s user 17.44s system 79% cpu 8:31.89 total E.g. better, but still a lot of time. Another observation is the space usage of the new clone: Database Stats: 197170 pages, 1024 bytes/page, 149894 free pages, UTF-8, delete mode The clone is four times the necessary size. wolfgang added on 2010-10-05 17:13:36:
I think, these times are OK(2 minutes for 20MB), if we remember, that running the http-variant uses 2 fossil instances and all traffic is checked by firewall and virus scanner and that fossil isn't designed as a high-speed html server. anonymous added on 2010-10-05 20:25:56: The test case here is not effected by network bandwidth or latency. The test case is not bound by disk bandwidth either, since it is small enough to fit comfortably into memory. As such, it should be a lot faster. For example, I don't understand why it needs so many round trips. The initial clone is the easy problem, both side should know exactly what the other side has. I wouldn't be surprised if this kind of problems don't happen with a less skewed distribution of the commits (1/3 of all commits to one file), but that doesn't mean it can't be done better. I'm going to try later with a repository that I can publish, so hopefully it can be sorted out. anonymous claiming to be Alexey Pechnikov added on 2010-10-06 18:08:11: $ time fossil clone http://sqlite.org/src sqlite-src.fossil Bytes Cards Artifacts Deltas Send: 49 1 0 0 Received: 1607865 34952 0 0 Send: 10025 225 0 0 Received: 138945 244 7 193 Send: 18849 415 0 0 Received: 153088 430 3 397 Send: 37649 801 0 0 Received: 876957 830 71 729 Send: 75249 1601 0 0 Received: 1018521 1630 63 1537 Send: 150449 3201 0 0 Received: 1896100 3230 63 3137 Send: 300849 6401 0 0 Received: 3211644 6430 91 6309 Send: 601649 12801 0 0 Received: 5286633 12830 176 6867 Send: 662091 14087 0 0 Received: 5726286 14116 183 5387 Send: 457735 9739 0 0 Received: 5348552 9768 166 2971 Send: 294927 6275 0 0 Received: 5248224 6304 281 629 Send: 85589 1821 0 0 Received: 5006033 1850 486 1323 Send: 170095 3619 0 0 Received: 5147248 3648 132 472 Send: 56825 1209 0 0 Received: 3077764 1238 123 1085 Send: 97339 2071 0 0 Received: 5100687 2100 291 1027 Send: 35393 753 0 0 Received: 5010920 782 170 445 Send: 6488 138 0 0 Received: 3018371 167 66 71 Total network traffic: 1607303 bytes sent, 19503853 bytes received Rebuilding repository meta-data... 34951 (100%)... project-id: 2ab58778c2967968b94284e989e43dc11791f548 server-id: fc36ac9c5028182ec9aa22eed764f8f253d0db91 admin-user: veter (password is "******") real 7m3.915s user 2m22.817s sys 0m10.765s $ ls -hl sqlite-src.fossil -rw-r--r-- 1 veter veter 32M Окт 6 21:58 sqlite-src.fossil I think the speed of 77 kB/s is really slow. Well, the "Rebuilding repository meta-data..." state can be slow... but why is slow the artifacts downloading and inserting? See this test: $ time sqlite3 sqlite-src.fossil '.dump'>sqlite.dump real 0m3.276s user 0m2.780s sys 0m0.460s $ time sqlite3 sqlite-src2.fossil '.read sqlite.dump' real 0m14.839s user 0m13.357s sys 0m0.396s Yes, SQLite is fast and Fossil is very slow. anonymous added on 2010-10-08 10:24:59: fossil: *** time skew *** server time differs by 4.0 minutes after the initial "Send". That's when talking with localhost. anonymous added on 2010-10-08 12:43:55: drh added on 2010-10-08 14:31:42: pkgsrc.fossil is about 10x or 100x larger than any repository we have dealt with before. SQLite is really the biggest repository that Fossil deals with on a regular basis. SQLite has 10.3 years of history compared to 13.1 years of history with pkgsrc. But there have been an average of just 2.2 checkins per day to SQLite for a total of 8446 checkins, whereas pkgsrc has had on average almost 40 checkins per day for a total of 190281 checkins. Fossil likes to store the original content of the latest checkin as full-text and then store prior versions as deltas from the latest. (Most other VCSes do the same - RCS in particular.) In SQLite, the longest delta chain is therefore 8446 deep - which is deep but not unmanageable. With pkgsrc, the longest change is a whopping 190281 deep - 22x deeper. It makes me wonder if we shouldn't trick fossil into storing a full-text version of each file after some fixed number of deltas - for example store full text after each 500 or 1000 deltas. That will make the respository larger, but it will also make a 100x performance improvement when trying to access files that are very deep in the delta stack. SQLite has 1057 separate files. We host a few other repositories with more than this, but never more than a couple thousand. pksrc, on the other hand has over 100,000 separate files. Many of the file browsing links scan the entire list of files. With only a thousand or so, this is not problem. But with 100x as many, those pages are slow. We might need to reconsider the way some of the file browsing pages are rendered. The huge number of files also makes for large manifests. I looked at one of the more recent manifests in pkgsrc and it was 60936 lines long and contains nearly 5MB of text. There are about 340MB of content in a checkout. Fossil does lots of consistency checks for each check-in, which involves computing MD5 checksums over the entire checkout, twice. Running MD5 over 720MB takes some time. A simple checkin on my 4-year-old desktop used 18 seconds of CPU time and 55 seconds of real-time. (Note that a clone of the repository was running in parallel while doing the performance test of the previous sentence.) Checkins would be very fast if we turned off the error checks. Maybe that needs to be an option. My clone (via localhost) has made substantial progress in 33 minutes of CPU time. But it seems to be stuck now. anonymous claiming to be urmil added on 2010-10-08 14:42:53: anonymous added on 2010-10-09 07:48:14: drh added on 2010-10-09 10:21:06: I have a strategy, which I will describe separately, to address the pkgsrc scaling issue of a large number of small files. renez added on 2010-10-10 11:07:53: anonymous added on 2010-10-10 12:44:41: renez added on 2010-10-10 18:50:17:
It is not about source code control. I have the impression that pkgsrc is using the wrong tool for the job. anonymous added on 2010-10-10 23:35:53: anonymous added on 2010-10-11 19:25:05: anonymous claiming to be Joerg Sonnenberger added on 2010-12-21 14:04:19: real 5m41.922s user 1m52.690s sys 0m15.750s ...until the rebuild starts. IO delays can be neglected and this is completely dominated by insufficient asynchronous processing between client and server. This time is not masked by network speed either. anonymous added on 2011-05-10 12:42:40 UTC: Here are some git clone times for comparison. First from the network (slow connection 389 KB/s) and then from local. $ cd /tmp $ time git clone --mirror git://git.dragonflybsd.org/pkgsrcv2.git pkgsrcgit Cloning into bare repository pkgsrcgit... remote: Counting objects: 1590487, done. remote: Compressing objects: 100% (470270/470270), done. remote: Total 1590487 (delta 1117330), reused 1564721 (delta 1096191) Receiving objects: 100% (1590487/1590487), 473.92 MiB | 389 KiB/s, done. Resolving deltas: 100% (1117330/1117330), done. real 26m57.789s user 2m35.646s sys 0m45.874s $ time git clone --mirror file:///tmp/pkgsrcgit pkgsrcgit2 Cloning into bare repository pkgsrcgit2... remote: Counting objects: 1590487, done. remote: Compressing objects: 100% (449131/449131), done. Receiving objects: 100% (1590487/1590487), 473.92 MiB | 8.93 MiB/s, done. remote: Total 1590487 (delta 1117330), reused 1590487 (delta 1117330) Resolving deltas: 100% (1117330/1117330), done. real 7m0.143s user 3m24.122s sys 0m47.543s Then the equivalent of a "fossil open" which actually checked out 63027 files into the local directory. $ time git clone /tmp/pkgsrcgit pkgsrcopen Cloning into pkgsrcopen... done. real 0m56.812s user 0m5.374s sys 0m15.589s Then a "git status" $ cd pkgsrcopen $ time git status # On branch master nothing to commit (working directory clean) real 0m10.397s user 0m0.818s sys 0m3.721s |