Fossil Forum

Repository sizes - experience reports appreciated

(1) By MBL (RoboManni) on 2020-09-02 15:05:26

I now have a use case where I expect growth of about 1 MB per day. I have just started gaining experience with this new feature for production use. The data should be kept for about 10 years or more.

My case will make heavy use of delta-compression.

  1. How large have Fossil repositories around the world grown so far?
  2. Have any noticeable performance issues been observed?
  3. How could a long-lived, huge repository be split into smaller historical portions and a current, active one? Longer access times for the historical portions would be acceptable.

(2) By Stephan Beal (stephan) on 2020-09-02 15:14:11 in reply to 1

I now have a use case where I expect growth of about 1 MB per day. ... My case will make heavy use of delta-compression.

If you're changing the same text files (even large ones) repeatedly, you may be surprised how well delta compression, together with zlib compression, works. If you're making lots of changes to poorly-compressible data, e.g. image files, encrypted files, zips, and whatnot, 1MB/day will add up very quickly.
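To see why the kind of data matters, here is a small Python sketch that zlib-compresses some made-up, log-like text and the same number of random bytes (a stand-in for images, zips or encrypted files). Fossil also zlib-compresses what it stores, but its delta step is not modeled here, so the numbers only illustrate the zlib half of the story.

```python
import os
import zlib

def zlib_ratio(data: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, 9))

# Hypothetical log-like text: many similar lines, as in a typical text file.
text = "".join(
    f"2020-09-02 15:{i % 60:02d} device-7 status=OK temp={20 + i % 5}\n"
    for i in range(5000)
).encode()

# Random bytes behave like already-compressed or encrypted content.
noise = os.urandom(len(text))

print(f"text : {zlib_ratio(text):7.1f}:1")
print(f"noise: {zlib_ratio(noise):7.2f}:1")
```

The text compresses by a large factor, while the random data does not shrink at all (zlib adds a few bytes of framing), which is why a daily 1MB of such data is stored almost at full size.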

How large have Fossil repositories around the world grown so far?

Fossil's core repo, cloned, is approx. 72MB. The original core one is larger because of unversioned content.

sqlite3's repo was 88MB about 2-3 weeks ago. It's one of the most active repos, though tcl's seems to surpass it:

https://core.tcl-lang.org/tcl/timeline

The stat page says that it's ... it's... still calculating... HTTP error 520.

I haven't cloned that one in years, so I'm not sure how big it is. Apparently it is big enough that /stat takes long enough to time out.

(3) By Richard Hipp (drh) on 2020-09-02 15:16:25 in reply to 1

Some examples:

(4) By MBL (RoboManni) on 2020-09-02 15:28:18 in reply to 3

Those are really impressive figures. Here are my figures from the last few hours:

Repository Size:	6,356,992 bytes
Number Of Artifacts:	204 (29 fulltext and 175 deltas)
Uncompressed Artifact Size:	3,384,431 bytes average, 7,836,603 bytes max, 690,423,962 total
Compression Ratio:	108:1
Number Of Check-ins:	37
Number Of Files:	21
Number Of Wiki Pages:	0
Duration Of Project:	1 days or approximately 0.00 years.
Project ID:	3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b Device-files
Fossil Version:	2020-08-25 00:01:25 [88ff2642d3] (2.13)
SQLite Version:	2020-08-14 13:23:32 [fca8dc8b57] (3.33.0)
Schema Version:	2015-01-24
Repository Rebuilt:	2020-09-02 08:27:45 By Fossil 2.13 [88ff2642d3] 2020-08-25 00:01:25 UTC
Database Stats:	1,552 pages, 4096 bytes/page, 102 free pages, UTF-8, delete mode

(6) By Stephan Beal (stephan) on 2020-09-02 15:43:23 in reply to 4

The most interesting thing about:

Compression Ratio: 108:1

... is that that metric tends to improve over time! That is to say, in the few repos where I've watched that number, the ratio has tended to get a larger left-hand value as the repo ages, a sign that the delta compression is doing its thing.

Fossil's is 68:1, sqlite3's is 84:1, and my single most active old repo (8+ years, 3338 commits) has a modest 37:1.

Your 108 is an outstanding value, which suggests mostly small changes to large (and/or highly compressible) files.

(10) By MBL (RoboManni) on 2020-09-02 16:18:59 in reply to 6

Yes, you are right: one hour and a few commits later, the value has already increased:

Repository Size:	6,455,296 bytes
Number Of Artifacts:	233 (30 fulltext and 203 deltas)
Uncompressed Artifact Size:	3,332,065 bytes average, 7,836,603 bytes max, 776,371,176 total
Compression Ratio:	120:1
Number Of Check-ins:	45
Number Of Files:	21
Number Of Wiki Pages:	0
Duration Of Project:	1 days or approximately 0.00 years.
Project ID:	3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b Device-files
Fossil Version:	2020-08-25 00:01:25 [88ff2642d3] (2.13)
SQLite Version:	2020-08-14 13:23:32 [fca8dc8b57] (3.33.0)
Schema Version:	2015-01-24
Repository Rebuilt:	2020-09-02 08:27:45 By Fossil 2.13 [88ff2642d3] 2020-08-25 00:01:25 UTC
Database Stats:	1,576 pages, 4096 bytes/page, 99 free pages, UTF-8, delete mode

My files are mostly similarly structured and overlap in time: they are log files that grow at one end and shrink at the other.

(13) By MBL (RoboManni) on 2020-09-02 18:26:40 in reply to 10

Compression really becomes better with each hourly run: 142:1

Repository Size:	6,651,904 bytes
Number Of Artifacts:	290 (32 fulltext and 258 deltas)
Uncompressed Artifact Size:	3,268,697 bytes average, 7,836,603 bytes max, 947,922,301 total
Compression Ratio:	142:1
Number Of Check-ins:	61
Number Of Files:	21
Number Of Wiki Pages:	0
Duration Of Project:	1 days or approximately 0.00 years.
Project ID:	3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b Device-files
Fossil Version:	2020-08-25 00:01:25 [88ff2642d3] (2.13)
SQLite Version:	2020-08-14 13:23:32 [fca8dc8b57] (3.33.0)
Schema Version:	2015-01-24
Repository Rebuilt:	2020-09-02 08:27:45 By Fossil 2.13 [88ff2642d3] 2020-08-25 00:01:25 UTC
Database Stats:	1,624 pages, 4096 bytes/page, 99 free pages, UTF-8, delete mode

(14) By Stephan Beal (stephan) on 2020-09-02 18:42:53 in reply to 13

Compression really becomes better with each hourly run: 142:1

That's the compression ratio, not the compression itself. It's essentially a comparison of "how big is this data right now in our db?" vs "how much space would we need to unpack all of these data in their entirety?" Looking at the numbers in the first 3 lines of your output:

947,922,301 (undelta'd/uncompressed data)

divided by

6,651,904 (db size)

= 142.5

If you deconstruct that repository you'll have nearly 1GB of files.

For repositories which mostly edit the same files (which is most repos), that ratio will slowly climb over time, but your numbers are true outliers. I don't recall ever having seen a "genuine source repository" with a ratio over 90-ish to 1, and certainly not a repo where that value rises so quickly.
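The ratio Fossil prints can be reproduced from the "Uncompressed Artifact Size" total and the "Repository Size". The sketch below assumes Fossil uses integer division and switches to an "N:10" form when the ratio drops below 5; that assumption is inferred from the figures reported in this thread (including the "10:10" and "4:10" values), not copied from Fossil's source, so check src/stat.c before relying on it.

```python
def compression_ratio(uncompressed_total: int, repo_size: int) -> str:
    """Format a compression ratio the way Fossil's stats output appears to.

    Assumption: integer division, and an "N:10" form when the ratio is below 5.
    repo_size must be at least 10 bytes for the N:10 branch.
    """
    if uncompressed_total // repo_size < 5:
        return f"{uncompressed_total // (repo_size // 10)}:10"
    return f"{uncompressed_total // repo_size}:1"

print(947_922_301 / 6_651_904)                          # exact quotient, about 142.5
print(compression_ratio(947_922_301, 6_651_904))         # 142:1
print(compression_ratio(8_721_678_596, 8_602_718_208))   # 10:10
```

The truncation explains why 142.5 is shown as 142:1, and the N:10 form explains why a repository barely smaller than its content shows up as "10:10" rather than "1:1".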

(15) By MBL (RoboManni) on 2020-09-09 16:09:00 in reply to 13

Here is a short update after 8 days of data collection:

Repository Size:	11,718,656 bytes
Number Of Artifacts:	808 (162 fulltext and 646 deltas)
Uncompressed Artifact Size:	3,091,737 bytes average, 7,836,603 bytes max, 2,498,124,266 total
Compression Ratio:	213:1
Number Of Check-ins:	207
Number Of Files:	21
Number Of Wiki Pages:	0
Duration Of Project:	8 days or approximately 0.02 years.

(16) By Stephan Beal (stephan) on 2020-09-09 16:24:23 in reply to 15

Compression Ratio: 213:1

That is an unheard of compression ratio. You've packed 2.5GB of data into less than 12MB.

Even if this repo grows to 20GB under your current usage patterns, you won't have the slowdowns associated with the behemoth repos mentioned in a recent thread, because your active file count is small ("active" meaning "number of files in the tip check-in"). A rebuild would almost certainly take a good long while, and a deconstruct (should you ever want one, for whatever reason) would become impossible, because with a compression ratio of 213:1, a 20GB repository would equate to 4,260GB of SCM'd data.

Deconstructing the db would require that much free drive space and would take an age and a half to finish.

(17) By MBL (RoboManni) on 2020-09-27 17:23:39 in reply to 16

A short update on my long-running project of daily captured logfiles.

The compression ratio has gone down to 198:1 but is still very high. After 26 days the repository is only 23MB, compared with my initial estimate of 1MB per day.

Repository Size:	23,838,720 bytes
Number Of Artifacts:	1582 (271 fulltext and 1,311 deltas)
Uncompressed Artifact Size:	2,998,318 bytes average, 7,836,603 bytes max, 4,743,339,712 total
Compression Ratio:	198:1
Number Of Check-ins:	414
Number Of Files:	23
Number Of Wiki Pages:	0
Duration Of Project:	26 days or approximately 0.07 years.
Project ID:	3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b Device-files
Fossil Version:	2020-09-16 13:58:31 [449ab5d600] (2.13)
SQLite Version:	2020-09-15 20:48:30 [3d35fa0be8] (3.34.0)
Schema Version:	2015-01-24
Repository Rebuilt:	2020-09-17 11:58:52 By Fossil 2.13 [449ab5d600] 2020-09-16 13:58:31 UTC
Database Stats:	2,910 pages, 8192 bytes/page, 31 free pages, UTF-8, wal mode
Backoffice:	Last run: never

(5) By Alfred M. Szmidt (ams) on 2020-09-02 15:34:42 in reply to 1

I have some strange outliers where Fossil still works well enough. Neither of them is a very old repository, but both are on the extreme side in size. Fossil does take significantly more time to handle them, especially the 8.6G one, where a commit takes a good few seconds, but that is an extreme case of me having been very lazy.

Repository Size:	591,265,792 bytes
Number Of Artifacts:	28326 (24,208 fulltext and 4,118 deltas)
Uncompressed Artifact Size:	110,364 bytes average, 40,166,123 bytes max, 3,126,072,743 total
Compression Ratio:	5:1
Unversioned Files:	2 files, 763.7KB compressed, 0% of total repository space
Number Of Check-ins:	1,139
Number Of Files:	45,405
Number Of Wiki Pages:	9
Duration Of Project:	612 days or approximately 1.68 years.

Repository Size:	8,602,718,208 bytes
Number Of Artifacts:	713 (428 fulltext and 285 deltas)
Uncompressed Artifact Size:	12,232,368 bytes average, 919,055,670 bytes max, 8,721,678,596 total
Compression Ratio:	10:10
Number Of Check-ins:	105
Number Of Files:	1,504
Number Of Wiki Pages:	1
Number Of Tickets:	0
Duration Of Project:	27 days or approximately 0.07 years.

(7) By Stephan Beal (stephan) on 2020-09-02 15:48:53 in reply to 5

Fossil does take significantly more time to handle them, especially the 8.6G one, where a commit takes a good few seconds, but that is an extreme case of me having been very lazy.

Number Of Files: 45,405

Your case is somewhat pathological. The number of files is very likely the main culprit for the slowness.

Every time you commit a change, fossil builds a manifest of that version's content, which includes a list of every file in that version. You can probably speed up that process with:

fossil set repo-cksum off

That will disable the so-called R-card part of each manifest, which is a corruption-detection mechanism but is 3rd(?) in a line of such mechanisms and is exceedingly costly to calculate, especially for repos with large and/or very many files. It is the only one of fossil's self-protections which is optional.
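Why the R-card gets expensive can be seen from a sketch of what it hashes. Assuming the recipe described in Fossil's file-format documentation (for every file of the check-in, in name order: the name, a space, the size in decimal, a newline, then the full content, all fed into a single MD5), the work grows with the total size of every file in the check-in, not just the changed ones. This is an illustration in Python, not Fossil's implementation.

```python
import hashlib

def r_card(files: dict[str, bytes]) -> str:
    """MD5 over all files of a check-in in name order (sketch of the R-card recipe).

    Assumed per-file input: "<name> <size>\\n" followed by the file content.
    """
    md5 = hashlib.md5()
    for name in sorted(files):
        content = files[name]
        md5.update(f"{name} {len(content)}\n".encode())
        md5.update(content)  # every byte of every file is read and hashed
    return md5.hexdigest()

checkout = {"src/main.c": b"int main(void){ return 0; }\n", "README": b"hello\n"}
print(r_card(checkout))
```

With 45,405 files, every commit has to read and hash all of them, which is what `fossil set repo-cksum off` avoids.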

(9) By MBL (RoboManni) on 2020-09-02 15:52:44 in reply to 7

especially for repos with large and/or very many files

Many differently named files, or many check-ins changing the same few files?

(11) By Stephan Beal (stephan) on 2020-09-02 16:19:11 in reply to 9

Many differently named files, or many check-ins changing the same few files?

Many distinct files (i.e. file names). The manifest has to list all of the files contained in that version, and if you have 45k files then that manifest is going to take appreciable time to generate and validate, especially if the repo-cksum setting is on (which it is by default), and even more so if storage is relatively slow.

You can quickly get the number of files in a commit with:

[pi@pi4b8:~/fossil/fossil]$ fossil ls | wc -l
1012

(12) By Alfred M. Szmidt (ams) on 2020-09-02 16:31:18 in reply to 7

Yep, it was very much pathological, mainly due to laziness: I was geared towards just getting stuff done, with no care for a proper setup. It was a short-lived project that spun off several other projects, which now live in separate, properly set up repositories.

Good to know about the repo-cksum option for the next time I do something :-)

(8.1) By MBL (RoboManni) on 2020-09-03 09:46:10 edited from 8.0 in reply to 5

A few seconds per commit do not matter in my use case; I can spread the commits from several sources over time. The block chaining is more important to me.

The commits arrive once per hour, each into its own branch (one separate checkout branch per device), and at the end I merge them into a kind of summary branch (or into trunk at the end of all summaries). Because the summary branches receive no changes other than these merges, there are no merge conflicts to handle. That makes it very easy for me to automate this with a command script.
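The merge step described above could be scripted roughly as follows. This is a sketch only: the branch names are hypothetical, it assumes it is run inside an open checkout, and it uses nothing beyond the basic `fossil update`, `fossil merge` and `fossil commit -m` commands. By default it only prints the commands.

```python
import subprocess

def merge_plan(summary_branch: str, device_branches: list[str]) -> list[list[str]]:
    """Commands that merge every device branch into the summary branch, one commit per merge."""
    plan = [["fossil", "update", summary_branch]]  # switch the checkout to the summary branch
    for branch in device_branches:
        plan.append(["fossil", "merge", branch])
        plan.append(["fossil", "commit", "-m", f"merge {branch} into {summary_branch}"])
    return plan

def run(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them and stop at the first failure."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

# Hypothetical branch names; pass dry_run=False inside a real checkout to execute.
run(merge_plan("summary", ["device-a", "device-b"]))
```

The case of a device branch with nothing new to merge (where the commit may be refused) is not handled here.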

One of the bigger files contains about 20,000 long lines. Over 12 hours, 1,000 lines disappeared and 1,000 new lines appeared, which means that 19,000 lines are unchanged. That is why the compression ratio can be so high: for those 12 hours, only the 1,000 new lines of that one file need to go into the delta. This file spans a period of about 14 days, so to keep consecutive snapshots overlapping I could reduce the rate to one commit every 2 weeks without losing anything in time. Any faster commit rate just gives users more of a real-time feel; the number of delta blob artifacts will increase, of course.
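The sliding-window effect can be illustrated with Python's difflib as a stand-in for Fossil's own delta encoder, which works differently (byte-level copy and insert commands). The line format below is made up. With a copy/insert style delta, the dropped lines cost nothing because they are simply not copied; only the inserted lines have to be stored literally.

```python
import difflib

def window(first: int, count: int) -> list[str]:
    """A hypothetical log snapshot: `count` consecutive records starting at serial `first`."""
    return [f"record {n:06d} ...payload..." for n in range(first, first + count)]

old = window(0, 20_000)      # snapshot at time t
new = window(1_000, 20_000)  # 12 hours later: 1,000 records dropped, 1,000 added

ops = difflib.SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
unchanged = sum(i2 - i1 for tag, i1, i2, j1, j2 in ops if tag == "equal")
inserted = sum(j2 - j1 for tag, i1, i2, j1, j2 in ops if tag in ("insert", "replace"))
deleted = sum(i2 - i1 for tag, i1, i2, j1, j2 in ops if tag in ("delete", "replace"))
print(unchanged, inserted, deleted)  # 19000 1000 1000
```

So only about 5% of the new snapshot (1,000 of 20,000 lines) is genuinely new content.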

Is fossil grep also available through the web interface?

The Annotate and Blame buttons exist, but the More... button does not offer such search and filter capabilities.

What would be the best way to get something like this into the web interface?

(18) By MBL (RoboManni) on 2020-10-27 16:18:45 in reply to 1

An update on my growing repository use case:

The compression ratio has gone down to 154:1 but is still very good: less than 50 MB instead of 7.5 GB uncompressed.

D:\Data-Server>fossil dbstat
project-name:      Data-files
repository-size:   48,762,880 bytes
artifact-count:    2,544 (stored as 196 full text and 2,348 deltas)
artifact-sizes:    2,965,460 average, 7,836,603 max, 7,544,131,710 total
compression-ratio: 154:1
check-ins:         684
files:             23 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            3
tag-changes:       1
latest-change:     2020-10-27 15:48:50 - about 0 days ago
project-age:       56 days or approximately 0.15 years.
project-id:        3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b
schema-version:    2015-01-24
fossil-version:    2020-10-20 10:08:01 [67a4c1d313] [2.13] (mingw32-3022004L-gcc-5.3.0)
sqlite-version:    2020-10-19 20:49:54 [75a0288871] (3.34.0)
database-stats:    11,905 pages, 4096 bytes/pg, 0 free pages, UTF-8, wal mode
Figure: min and max of serial numbers per check-in (rendered from the Pikchr source below)
L1: "chkin 1:"
B1: box "1 - 1000" wid 500% ht 50% fill 0xc6e2ff thin
move down
B2: box same as B1 "200 - 1100" "(100 new)" at -1 right of previous
move down
B4: box same as B1 "460 - 1200" "(100 new)" at -1 right of previous fill 0xC0C010 thick thick color green
move down
B6: box same as B4 "680 - 1300" "(100 new)" at -1 right of previous fill 0xE0E010 thin
move down
B8: box same as B6 "920 - 1400" "(100 new)" at -1 right of previous
move down
B10: box same as B6 "1080 - 1500" "(100 new)" at -1 right of previous
move down
B11: box same as B1 "1170 - 1600" "(100 new)" at -1 right of previous

TITLE: "min and max of serial numbers per check-in" big big big ljust at (B1.s.x,B1.n.y+0.2)
L2:  "chkin 2:" at (L1.s.x,B2.c.y)
L4:  "chkin 3:" at (L1.s.x,B4.c.y)
L6:  "chkin 4:" at (L1.s.x,B6.c.y)
L8:  "chkin 5:" at (L1.s.x,B8.c.y)
L10: "chkin 6:" at (L1.s.x,B10.c.y)
L11: "chkin 7:" at (L1.s.x,B11.c.y)

A1: arrow from first box.ne right 220%
line color red down until even with last box.s
"1120 (the value to find)" ljust bold at A1.e + (0.05,0)

T1: "bad +25%" ljust color red at B1.e + (0.05,0)
T2: "bad +5%" ljust color red at B2.e + (0.05,0)
T3: "best 80%" ljust bold color green at B4.e + (0.05,0)
T4: "good 60%" ljust color blue at B6.e + (0.05,0)
T6: "good 40%" ljust color blue at B8.e + (0.05,0)
T8: "good 10%" ljust color blue at B10.e + (0.05,0)
T10: "bad -10%" ljust color red at B11.e + (0.05,0)

Each daily check-in spans a range of consecutive serial numbers, and a given serial number is usually contained in more than one check-in. The oldest numbers drop out and new numbers come in; the numbers will keep increasing over the coming years. Each check-in covers a few days in time, so delta compression only has to add the new portion. Each check-in contains 23 files like this, and the automated commits run once a day.

When searching for a serial number, each bisect step shows whether the number lies to the left or right of that check-in's range, so the fossil bisect good/bad decision is deterministic. So far I have had to do this by hand on the command line, using a synchronized repository on my local notebook.
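Because both the minimum and the maximum serial number only grow from one check-in to the next, the check-ins containing a given number form one contiguous run, and both ends of that run can be found by binary search, which is what a manual fossil bisect session does one good/bad answer at a time. A minimal sketch, assuming the (min, max) range of every check-in is already known; the `ranges` list reuses the values from the diagram above.

```python
def first_true(n: int, pred) -> int:
    """Smallest i in [0, n) with pred(i) true, for pred shaped False...False True...True.

    Returns n if pred is never true. Each probe is one good/bad style decision.
    """
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

def checkins_containing(ranges: list[tuple[int, int]], target: int) -> list[int]:
    """Indices of check-ins whose (min, max) serial range contains target."""
    first = first_true(len(ranges), lambda i: ranges[i][1] >= target)  # range has reached target
    stop = first_true(len(ranges), lambda i: ranges[i][0] > target)    # range has moved past it
    return list(range(first, stop))

# (min, max) per check-in, chkin 1 .. chkin 7, as in the diagram.
ranges = [(1, 1000), (200, 1100), (460, 1200), (680, 1300),
          (920, 1400), (1080, 1500), (1170, 1600)]
print([i + 1 for i in checkins_containing(ranges, 1120)])  # [3, 4, 5, 6]
```

The result matches the "best"/"good" check-ins marked in the diagram for the value 1120.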

So far so standard.

A new fossil bisect good/bad --command SCRIPTFILE option (or a hook stored as a bisect option) could make the remaining steps fully automatic, but no such options or parameters exist yet; native fossil.exe 2.13 has no such support. In the future, however, I plan an enhancement using the /ext CGI extension feature (--extroot) so that non-experienced users can also run the search via the web.

(19) By MBL (RoboManni) on 2020-11-26 18:41:49 in reply to 18

Another update, after 86 days (Duration Of Project: 86 days or approximately 0.24 years).

The repository is less than 80MB for 10GB of checked-in log files.

D:\Data-Server>fossil dbstat
project-name:      Data-files
repository-size:   79,233,024 bytes
artifact-count:    3,501 (stored as 253 full text and 3,248 deltas)
artifact-sizes:    2,944,852 average, 7,836,603 max, 10,309,929,677 total
compression-ratio: 130:1
check-ins:         954
files:             23 across all branches
...

(20) By MBL (RoboManni) on 2020-12-22 17:03:59 in reply to 19

Another dbstat update after about one more month of daily updates:

project-name:      Data-files
repository-size:   150,081,536 bytes
artifact-count:    4,313 (stored as 634 full text and 3,679 deltas)
artifact-sizes:    2,926,704 average, 7,836,603 max, 12,622,877,228 total
compression-ratio: 84:1
check-ins:         1,182
files:             23 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            4
tag-changes:       1
latest-change:     2020-12-22 11:20:17 - about 0 days ago
project-age:       112 days or approximately 0.31 years.
project-id:        3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b
schema-version:    2015-01-24
fossil-version:    2020-12-04 18:35:21 [815b4fc493] [2.14] (mingw32-3022004L-gcc-5.3.0)
sqlite-version:    2020-12-01 16:14:00 [a26b6597e3] (3.34.0)
database-stats:    36,641 pages, 4096 bytes/pg, 0 free pages, UTF-8, wal mode


D:\File-Folder>fossil rebuild --compress
  100.0% complete...
Extra delta compression... done
Vacuuming the database... done

D:\File-Folder>fossil dbstat
project-name:      Data-files
repository-size:   107,991,040 bytes
artifact-count:    4,313 (stored as 306 full text and 4,007 deltas)
artifact-sizes:    2,926,704 average, 7,836,603 max, 12,622,877,228 total
compression-ratio: 116:1
check-ins:         1,182
files:             23 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            4
tag-changes:       1
latest-change:     2020-12-22 11:20:17 - about 0 days ago
project-age:       112 days or approximately 0.31 years.
project-id:        3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b
schema-version:    2015-01-24
fossil-version:    2020-12-04 18:35:21 [815b4fc493] [2.14] (mingw32-3022004L-gcc-5.3.0)
sqlite-version:    2020-12-01 16:14:00 [a26b6597e3] (3.34.0)
database-stats:    26,365 pages, 4096 bytes/pg, 0 free pages, UTF-8, wal mode

Compressing the local copy reduced its size by 28% and raised the compression ratio by 32, from 84:1 to 116:1.
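The figures above are consistent with each other: the repository size reported by dbstat is the page count times the page size from the database-stats line, and the 28% and the jump from 84:1 to 116:1 follow from it. A short arithmetic check in Python using only numbers from the two dbstat outputs:

```python
def repo_bytes(pages: int, page_size: int) -> int:
    """Repository size as page count times page size."""
    return pages * page_size

TOTAL = 12_622_877_228              # uncompressed artifact total, same before and after

before = repo_bytes(36_641, 4096)  # database-stats before "fossil rebuild --compress"
after = repo_bytes(26_365, 4096)   # database-stats after it
print(before, after)               # 150081536 107991040
print(f"{1 - after / before:.0%} smaller")                 # 28% smaller
print(f"{TOTAL // before}:1 -> {TOTAL // after}:1")        # 84:1 -> 116:1
```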

(21) By aue oiae (cregox) on 2020-12-24 19:35:06 in reply to 20

I just want to point out page size, since it hasn't been mentioned here, and I hope it's relevant to you too.

cheers! 😘

(22) By MBL (RoboManni) on 2020-12-27 09:58:08 in reply to 21

Thanks for the hint. I just tried it and, surprisingly, neither the page size nor the repository size changed, as if the --pagesize 512 parameter had been ignored.

D:\File-Folder>fossil rebuild --pagesize 512
  100.0% complete...
Vacuuming the database... done

D:\File-Folder>fossil dbstat
project-name:      Data-files
repository-size:   107,995,136 bytes
artifact-count:    4,313 (stored as 306 full text and 4,007 deltas)
artifact-sizes:    2,926,704 average, 7,836,603 max, 12,622,877,228 total
compression-ratio: 116:1
check-ins:         1,182
files:             23 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            4
tag-changes:       1
latest-change:     2020-12-22 11:20:17 - about 4 days ago
project-age:       117 days or approximately 0.32 years.
project-id:        3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b
schema-version:    2015-01-24
fossil-version:    2020-12-23 18:27:12 [e8ba89b168] [2.14] (mingw32-3022004L-gcc-5.3.0)
sqlite-version:    2020-12-16 14:20:45 [31cd1bbfa5] (3.35.0)
database-stats:    26,366 pages, 4096 bytes/pg, 0 free pages, UTF-8, wal mode

Could it be that WAL mode does not allow a page size change? Or did I do something wrong when calling it as shown in the thread you referred to? Or have I found a bug?

As you can see from the statistics, my project has neither wiki pages nor tickets, so the related tables could shrink as described (or even be omitted?), but they didn't.

(23) By MBL (RoboManni) on 2020-12-30 13:05:07 in reply to 22

After switching from WAL to delete mode with PRAGMA journal_mode=delete, the page size did change during the rebuild, but the repository size did not change very much. After the page size change, the journal mode could be set back to WAL.

It looks to me as if the --pagesize parameter of fossil rebuild does NOT work in WAL mode but does work in delete mode. I do not know whether this dependency is described anywhere in the documentation.

D:\File-Folder>fossil sql
SQLite version 3.35.0 2020-12-16 14:20:45
Enter ".help" for usage hints.
sqlite> Pragma journal_mode;
'wal'
sqlite> Pragma journal_mode=delete;
'delete'
sqlite> .exit

D:\File-Folder>fossil rebuild --pagesize=512 --compress -R D:/REPO/Data-files.fossil
  100.0% complete...
Extra delta compression... done
Vacuuming the database... done

D:\File-Folder>fossil dbstat
project-name:      Data-files
repository-size:   106,994,688 bytes
artifact-count:    4,313 (stored as 306 full text and 4,007 deltas)
artifact-sizes:    2,926,704 average, 7,836,603 max, 12,622,877,228 total
compression-ratio: 117:1
check-ins:         1,182
files:             23 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            4
tag-changes:       1
latest-change:     2020-12-22 11:20:17 - about 8 days ago
project-age:       120 days or approximately 0.33 years.
project-id:        3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b
schema-version:    2015-01-24
fossil-version:    2020-12-23 18:27:12 [e8ba89b168] [2.14] (mingw32-3022004L-gcc-5.3.0)
sqlite-version:    2020-12-16 14:20:45 [31cd1bbfa5] (3.35.0)
database-stats:    208,974 pages, 512 bytes/pg, 0 free pages, UTF-8, delete mode

D:\File-Folder>fossil rebuild --compress --wal
  100.0% complete...
Extra delta compression... done
Vacuuming the database... done

D:\File-Folder>fossil dbstat
project-name:      Data-files
repository-size:   106,994,688 bytes
artifact-count:    4,313 (stored as 306 full text and 4,007 deltas)
artifact-sizes:    2,926,704 average, 7,836,603 max, 12,622,877,228 total
compression-ratio: 117:1
check-ins:         1,182
files:             23 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            4
tag-changes:       1
latest-change:     2020-12-22 11:20:17 - about 8 days ago
project-age:       120 days or approximately 0.33 years.
project-id:        3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b
schema-version:    2015-01-24
fossil-version:    2020-12-23 18:27:12 [e8ba89b168] [2.14] (mingw32-3022004L-gcc-5.3.0)
sqlite-version:    2020-12-16 14:20:45 [31cd1bbfa5] (3.35.0)
database-stats:    208,974 pages, 512 bytes/pg, 0 free pages, UTF-8, wal mode

In the end, I set the repository to a page size of 512 bytes but kept WAL mode, which is recommended for backoffice, hooks and CGI access to the repository.
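The workaround in this post matches what the SQLite documentation (quoted later in this thread) says: VACUUM applies a pending page_size only when the database is not in WAL mode. A minimal sketch with Python's built-in sqlite3 module reproduces both outcomes on throwaway files; the file names, the table and the two helper functions are made up for the demonstration, and Fossil itself is not involved.

```python
import os
import sqlite3
import tempfile

def make_db(path: str, journal_mode: str, page_size: int = 4096) -> None:
    """Create a small throwaway database with the given page size and journal mode."""
    con = sqlite3.connect(path)
    con.execute(f"PRAGMA page_size={page_size}")  # honoured because the file is still empty
    con.execute("CREATE TABLE t(x TEXT)")
    con.executemany("INSERT INTO t VALUES (?)", [(f"log line {i}",) for i in range(1000)])
    con.commit()
    con.execute(f"PRAGMA journal_mode={journal_mode}")
    con.close()

def vacuum_to(path: str, page_size: int, leave_wal: bool) -> tuple[int, str]:
    """Request a new page size and VACUUM; optionally leave WAL first and return to it after."""
    con = sqlite3.connect(path)
    try:
        was_wal = con.execute("PRAGMA journal_mode").fetchone()[0] == "wal"
        if leave_wal and was_wal:
            con.execute("PRAGMA journal_mode=DELETE")
        con.execute(f"PRAGMA page_size={page_size}")
        con.execute("VACUUM")  # the new page size is only applied here, and only outside WAL
        if leave_wal and was_wal:
            con.execute("PRAGMA journal_mode=WAL")
        size = con.execute("PRAGMA page_size").fetchone()[0]
        mode = con.execute("PRAGMA journal_mode").fetchone()[0]
        return size, mode
    finally:
        con.close()

with tempfile.TemporaryDirectory() as tmp:
    for leave_wal in (False, True):
        path = os.path.join(tmp, f"demo-{leave_wal}.db")
        make_db(path, "wal", page_size=512)
        print("leave WAL first:", leave_wal, "->", vacuum_to(path, 8192, leave_wal))
```

Staying in WAL mode keeps the 512-byte pages, while leaving WAL for the VACUUM yields 8192-byte pages and the file can then go back to WAL, which is the same sequence as the fossil sql / fossil rebuild steps above.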

(24) By MBL (RoboManni) on 2021-02-17 18:15:09 in reply to 23

Another month and a half later, a short status update for whomever it may interest:

The original running repository, still at a 135:1 compression ratio:

Repository Size:	127,639,552 bytes
Number Of Artifacts:	6024 (415 fulltext and 5,609 deltas)
Uncompressed Artifact Size:	2,872,049 bytes average, 7,836,603 bytes max, 17,301,227,920 total
Compression Ratio:	135:1
Number Of Check-ins:	1,670
Number Of Files:	23
Number Of Wiki Pages:	0
Number Of Chat Messages:	1 (1 still alive, 27 bytes in size)
Duration Of Project:	169 days or approximately 0.46 years.
Project ID:	3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b Data-Cron-files
Fossil Version:	2021-01-20 15:34:40 [487776dc45] (2.14)
SQLite Version:	2021-01-18 12:35:16 [c1862abb44] (3.35.0)
Schema Version:	2015-01-24
Repository Rebuilt:	2021-01-28 08:37:50 By Fossil 2.14 [487776dc45] 2021-01-20 15:34:40 UTC
Database Stats:	15,581 pages, 8192 bytes/page, 29 free pages, UTF-8, wal mode
Backoffice:	Last run: never

And the clone, after several synchronizations, with "only" an 86:1 compression ratio:

Repository Size:	199,739,392 bytes
Number Of Artifacts:	6025 (973 fulltext and 5,052 deltas)
Uncompressed Artifact Size:	2,871,578 bytes average, 7,836,603 bytes max, 17,301,262,871 total
Compression Ratio:	86:1
Unversioned Files:	5 files, 806.3KB compressed, 0% of total repository space
Number Of Check-ins:	1,670
Number Of Files:	23
Number Of Wiki Pages:	0
Number Of Chat Messages:	3 (3 still alive, 74 bytes in size)
Duration Of Project:	169 days or approximately 0.46 years.
Project ID:	3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b Data-Cron-files
Fossil Version:	2021-01-20 15:34:40 [487776dc45] (2.14)
SQLite Version:	2021-01-18 12:35:16 [c1862abb44] (3.35.0)
Schema Version:	2015-01-24
Repository Rebuilt:	2020-12-30 12:57:34 By Fossil 2.14 [e8ba89b168] 2020-12-23 18:27:12 UTC
Database Stats:	390,116 pages, 512 bytes/page, 62 free pages, UTF-8, wal mode
Backoffice:	Last run: never

Any suggestions on how to reduce the size of the clone repository? Should I also change from 512 to 8192 bytes per page? That is the biggest difference.

(25) By Stephan Beal (stephan) on 2021-02-17 18:25:07 in reply to 24

Any suggestions on how to reduce the size of the clone repository?

You can try:

fossil rebuild --compress-only -R theclone.fossil

As far as the real effect of changing the page size goes, that's above my pay grade.

The original's 135:1, while much lower than your initial results, is still extremely high. This particular (forum) repo has a ratio of 4:10 (weird). The main fossil source repo is currently 35:1 and my local clone of it is 39:1. Running --compress-only on it changes that to... 93:1(!!!).

(26) By Stephan Beal (stephan) on 2021-02-17 18:30:48 in reply to 25

This particular (forum) repo has a ratio of 4:10 (weird).

It actually makes sense that the forum has a much lower compression ratio: the majority of posts are never edited, and delta compression happens only between two versions of the same "file" (wiki page, forum post, etc.). Though the underlying delta system supports creating deltas between arbitrary versions of arbitrary files/posts, there's no good/efficient/fast way of determining which combinations of deltas would give the best results, whereas applying the rule "always delta from the previous version of yourself" is easy/efficient and is almost always a win in terms of compression.

(27) By MBL (RoboManni) on 2021-05-10 06:24:41 in reply to 18

Time for another short update, after 250 days of live operation with check-ins done once each day.

The compression ratio is high and stays stable at 134:1.

The repository is still reasonably small, at 170 MB.

Repository Size:	170,598,400 bytes
Number Of Artifacts:	8295 (500 fulltext and 7,795 deltas)
Uncompressed Artifact Size:	2,757,440 bytes average, 7,836,603 bytes max, 22,872,970,651 total
Compression Ratio:	134:1
Number Of Check-ins:	2,323
Number Of Files:	23
Number Of Wiki Pages:	0
Number Of Chat Messages:	1 (1 still alive, 27 bytes in size)
Duration Of Project:	250 days or approximately 0.68 years.
Project ID:	3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b Data-files
Fossil Version:	2021-01-20 15:34:40 [487776dc45] (2.14)
SQLite Version:	2021-01-18 12:35:16 [c1862abb44] (3.35.0)
Schema Version:	2015-01-24
Repository Rebuilt:	2021-01-28 08:37:50 By Fossil 2.14 [487776dc45] 2021-01-20 15:34:40 UTC
Database Stats:	20,825 pages, 8192 bytes/page, 28 free pages, UTF-8, wal mode
Backoffice:	Last run: never

Unfortunately, I have not yet had time to work on a bisect-supported search through the CGI interface, but "postponed is not canceled".
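For the original 10-year question, a rough linear projection can be made from two dbstat reports of the original repository in this thread (day 26: 23,838,720 bytes; day 250: 170,598,400 bytes). This is only a sketch: it assumes growth stays linear, which the drifting compression ratio suggests it may not, and rebuilds or compression runs will shift the numbers.

```python
def growth_per_day(snapshots: list[tuple[int, int]]) -> float:
    """Average bytes per day between the first and last (day, size) snapshots."""
    (d0, s0), (d1, s1) = snapshots[0], snapshots[-1]
    return (s1 - s0) / (d1 - d0)

def projected_size(snapshots: list[tuple[int, int]], day: int) -> float:
    """Linear extrapolation from the last snapshot to the given project day."""
    last_day, last_size = snapshots[-1]
    return last_size + growth_per_day(snapshots) * (day - last_day)

# (project day, repository size in bytes) from the dbstat reports above.
snapshots = [(26, 23_838_720), (250, 170_598_400)]
ten_years = int(10 * 365.25)  # 3652 days

print(f"{growth_per_day(snapshots) / 1e6:.2f} MB/day")                     # 0.66 MB/day
print(f"{projected_size(snapshots, ten_years) / 1e9:.1f} GB in 10 years")  # 2.4 GB in 10 years
```

Under that assumption the growth stays below the initially estimated 1MB per day.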

(28) By MBL (RoboManni) on 2021-07-22 15:48:34 in reply to 27

A suspicious observation when I tried to rebuild my local clone with compression, a change to WAL mode, and a page size change from 512 to 8192 bytes/page:

Compression and the change to WAL mode worked as expected, but the page size change did NOT.

D:\Files-Folder>fossil dbstat
project-name:      Data-files
repository-size:   308,322,816 bytes
artifact-count:    10,541 (stored as 1,434 full text and 9,107 deltas)
artifact-sizes:    2,665,324 average, 7,836,603 max, 28,095,183,467 total
compression-ratio: 91:1
check-ins:         2,961
files:             23 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            4
tag-changes:       1
latest-change:     2021-07-22 07:59:18 - about 0 days ago
project-age:       324 days or approximately 0.89 years.
project-id:        3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b
schema-version:    2015-01-24
fossil-version:    2021-07-02 12:46:01 [7aedd56758] [2.16] (mingw32-3022004L-gcc-5.3.0)
sqlite-version:    2021-06-18 18:36:39 [5c9a6c0687] (3.36.0)
database-stats:    602,193 pages, 512 bytes/pg, 0 free pages, UTF-8, wal mode

D:\Files-Folder>fossil rebuild --pagesize 8192 --compress --wal
  100.1% complete...
Extra delta compression... done
Vacuuming the database... done

D:\Files-Folder>fossil dbstat
project-name:      Data-files
repository-size:   227,387,904 bytes
artifact-count:    10,541 (stored as 616 full text and 9,925 deltas)
artifact-sizes:    2,665,324 average, 7,836,603 max, 28,095,183,467 total
compression-ratio: 123:1
check-ins:         2,961
files:             23 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            4
tag-changes:       1
latest-change:     2021-07-22 07:59:18 - about 0 days ago
project-age:       324 days or approximately 0.89 years.
project-id:        3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b
schema-version:    2015-01-24
fossil-version:    2021-07-02 12:46:01 [7aedd56758] [2.16] (mingw32-3022004L-gcc-5.3.0)
sqlite-version:    2021-06-18 18:36:39 [5c9a6c0687] (3.36.0)
database-stats:    444,117 pages, 512 bytes/pg, 0 free pages, UTF-8, wal mode

D:\Files-Folder>fossil rebuild --pagesize 8192
  100.0% complete...
Vacuuming the database... done

D:\Files-Folder>fossil dbstat
project-name:      Data-files
repository-size:   228,562,944 bytes
artifact-count:    10,541 (stored as 619 full text and 9,922 deltas)
artifact-sizes:    2,665,324 average, 7,836,603 max, 28,095,183,467 total
compression-ratio: 122:1
check-ins:         2,961
files:             23 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            4
tag-changes:       1
latest-change:     2021-07-22 07:59:18 - about 0 days ago
project-age:       324 days or approximately 0.89 years.
project-id:        3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b
schema-version:    2015-01-24
fossil-version:    2021-07-02 12:46:01 [7aedd56758] [2.16] (mingw32-3022004L-gcc-5.3.0)
sqlite-version:    2021-06-18 18:36:39 [5c9a6c0687] (3.36.0)
database-stats:    446,412 pages, 512 bytes/pg, 0 free pages, UTF-8, wal mode

D:\Files-Folder>
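A quick arithmetic check on the three dbstat outputs above. This is a small sketch, and the numbers are copied from those outputs. In every run the repository size is exactly the page count times the page size, and the page size stays at 512 bytes even after `--pagesize 8192`. None of the size change can therefore be attributed to a larger page size.

```python
import re

# (label, repository-size, database-stats) copied from the three dbstat runs above
RUNS = [
    ("before rebuild", 308_322_816,
     "602,193 pages, 512 bytes/pg, 0 free pages, UTF-8, wal mode"),
    ("rebuild --pagesize 8192 --compress --wal", 227_387_904,
     "444,117 pages, 512 bytes/pg, 0 free pages, UTF-8, wal mode"),
    ("rebuild --pagesize 8192", 228_562_944,
     "446,412 pages, 512 bytes/pg, 0 free pages, UTF-8, wal mode"),
]

def parse_db_stats(line):
    """Return (page_count, page_size) from a dbstat 'database-stats' value."""
    m = re.match(r"([\d,]+) pages, (\d+) bytes/pg", line)
    if m is None:
        raise ValueError(f"unrecognised database-stats line: {line!r}")
    return int(m.group(1).replace(",", "")), int(m.group(2))

for label, repo_size, stats in RUNS:
    pages, page_size = parse_db_stats(stats)
    # File size == page_count * page_size; page_size is still 512 in each run
    print(f"{label}: {pages} x {page_size} = {pages * page_size} (reported {repo_size})")
```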

But the help says the pagesize can be a value up to 65536:

  --pagesize N      Set the database pagesize to N. (512..65536 and power of 2)

Question

How to obtain a compressed repository clone with a pagesize of 8192 bytes/page?

It seems to be impossible, or what am I doing wrong or not understanding? As you can see, I did this on Windows. The main repository file lives on Linux, and there it was possible; the compression ratio there is a lot higher.

(30) By Florian Balmer (florian.balmer) on 2021-07-23 11:45:37 in reply to 28

How to obtain a compressed repository clone with a pagesize of 8192 bytes/page?

You already got it in post 23, and also see here:

"... the new page size is remembered and is used ... at the next VACUUM command that is run on the same database connection while not in WAL mode."

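The quoted rule can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch on a throwaway database, not on a Fossil repository; the table name, row contents and sizes are made up for the demonstration. It requests 8192-byte pages and runs VACUUM once in WAL mode and once in rollback ("delete") journal mode.

```python
import os
import sqlite3
import tempfile

def page_size_after_vacuum(journal_mode, old_size=512, new_size=8192):
    """Create a small database with old_size pages in the given journal mode,
    request new_size, run VACUUM, and return the resulting page size."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: autocommit, so VACUUM is not inside a transaction
        con = sqlite3.connect(path, isolation_level=None)
        try:
            con.execute(f"PRAGMA page_size={old_size}")    # used when the file is created
            con.execute(f"PRAGMA journal_mode={journal_mode}")
            con.execute("CREATE TABLE t(x BLOB)")
            con.executemany("INSERT INTO t VALUES (?)", [(b"x" * 200,)] * 50)
            con.execute(f"PRAGMA page_size={new_size}")    # only remembered for now
            con.execute("VACUUM")                          # applies it, except in WAL mode
            return con.execute("PRAGMA page_size").fetchone()[0]
        finally:
            con.close()

print("wal:   ", page_size_after_vacuum("wal"))     # 512: page size unchanged
print("delete:", page_size_after_vacuum("delete"))  # 8192: page size changed
```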
... and the clone after several times of synchronization with "only" 86:1 compression ratio: ... Any suggestion of how to improve the size of the clone repository? Should I change from 512 also to 8192 bytes per page? That is the biggest difference.

Also see from src/stat.c:

The Compression Ratio represents the total size of all blobs (artifacts) vs. their total size with delta-encoding and zlib-compression applied, and is independent of the repository database page size.

The disk storage space required for the repository database file to hold all the delta-encoded and zlib-compressed blobs (plus the other auxiliary tables) may (or may not) vary with the page size.

So two repository database files with different page sizes should have the same compression ratio, but different disk file sizes. (But maybe delta-encoding and zlib compression depend on some unstable conditions, such as different per-clone RIDs resulting in a different artifact processing order, or the memory available on the system during compression, which could explain the different compression ratios of each of your repository clones.)

(31) By Florian Balmer (florian.balmer) on 2021-07-23 13:42:42 in reply to 30

Sorry, the second part is wrong: the compression ratio is calculated relative to the repository database file size.

So there must be other factors that account for the difference; maybe the clone-specific RID sequence affects the parent (full) → child (delta) determination, even across rebuilds? No idea ...
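The correction can be checked against the numbers posted in this thread. The sketch below divides the uncompressed artifact total by each reported repository size. It assumes that the ratio is truncated to a whole number; that assumption is inferred from the figures here, not taken from src/stat.c. Under it, every reported ratio is reproduced, which fits the statement that the ratio depends on the file size rather than on the compression alone.

```python
# Uncompressed total of all artifacts; identical in every dbstat output in this thread
TOTAL_ARTIFACT_BYTES = 28_095_183_467

# Repository file size -> compression ratio reported next to it
REPORTED = {
    308_322_816: 91,    # clone before any rebuild, 512-byte pages
    227_387_904: 123,   # after rebuild --pagesize 8192 --compress --wal
    228_562_944: 122,   # after rebuild --pagesize 8192
    234_184_704: 119,   # post 32, 8192-byte pages, delete mode
    233_431_040: 120,   # post 32, after rebuild --wal --compress
    195_198_976: 143,   # main repository (post 29)
}

def ratio(total, file_size):
    # Assumed: whole-number ratio of uncompressed bytes to repository file bytes
    return total // file_size

for size, shown in REPORTED.items():
    print(f"{size:>13,} bytes -> {ratio(TOTAL_ARTIFACT_BYTES, size)}:1 (reported {shown}:1)")
```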

(32) By MBL (RoboManni) on 2021-07-23 14:40:37 in reply to 31

Thanks for the hints. First I switched from journal_mode=wal back to journal_mode=delete, and then I reran the rebuild just for the page size change to 8192 bytes/page:

sandbox>fossil sql
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
sqlite> pragma journal_mode=delete;
'delete'
sqlite> .exit
sandbox>fossil rebuild --pagesize 8192
  100.0% complete...
Vacuuming the database... done

sandbox>fossil dbstat
project-name:      Data-files
repository-size:   234,184,704 bytes
artifact-count:    10,541 (stored as 618 full text and 9,923 deltas)
artifact-sizes:    2,665,324 average, 7,836,603 max, 28,095,183,467 total
compression-ratio: 119:1
check-ins:         2,961
files:             23 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            4
tag-changes:       1
latest-change:     2021-07-22 07:59:18 - about 1 days ago
project-age:       325 days or approximately 0.89 years.
project-id:        3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b
schema-version:    2015-01-24
fossil-version:    2021-07-02 12:46:01 [7aedd56758] [2.16] (mingw32-3022004L-gcc-5.3.0)
sqlite-version:    2021-06-18 18:36:39 [5c9a6c0687] (3.36.0)
database-stats:    28,587 pages, 8192 bytes/pg, 0 free pages, UTF-8, delete mode

sandbox>

That changed the page size, and it shows that rebuild does NOT change the page size in WAL mode, regardless of the --pagesize parameter; in journal_mode=delete it worked as expected.

Once I had my 8192 bytes/page, I switched back to WAL mode by passing --wal to the rebuild.

--vacuum does not need to be supplied; the vacuum is done implicitly, as you can see:

sandbox>fossil rebuild --wal --compress
  100.0% complete...
Extra delta compression... done
Vacuuming the database... done

sandbox>fossil dbstat
project-name:      Data-files
repository-size:   233,431,040 bytes
artifact-count:    10,541 (stored as 616 full text and 9,925 deltas)
artifact-sizes:    2,665,324 average, 7,836,603 max, 28,095,183,467 total
compression-ratio: 120:1
check-ins:         2,961
files:             23 across all branches
wiki-pages:        0 (0 changes)
tickets:           0 (0 changes)
events:            4
tag-changes:       1
latest-change:     2021-07-22 07:59:18 - about 1 days ago
project-age:       325 days or approximately 0.89 years.
project-id:        3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b
schema-version:    2015-01-24
fossil-version:    2021-07-02 12:46:01 [7aedd56758] [2.16] (mingw32-3022004L-gcc-5.3.0)
sqlite-version:    2021-06-18 18:36:39 [5c9a6c0687] (3.36.0)
database-stats:    28,495 pages, 8192 bytes/pg, 0 free pages, UTF-8, wal mode

sandbox>

Now I have what I wanted from the beginning: a bigger page size and WAL mode. The compression ratio is still not as high as in the master repository, but I can accept that.
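The sequence that finally worked (leave WAL mode, change the page size with a vacuum, then re-enable WAL) can be mirrored at the plain SQLite level. This is a hedged sketch with Python's sqlite3 module on a scratch database. It does not call fossil and makes no claim about how `fossil rebuild` implements `--pagesize` or `--wal`; `create_wal_db`, `enlarge_pages_then_wal`, `demo` and the table name are hypothetical helpers for illustration.

```python
import os
import sqlite3
import tempfile

def create_wal_db(path, page_size=512):
    """Hypothetical helper: create a small WAL-mode database with the given page size."""
    con = sqlite3.connect(path, isolation_level=None)
    try:
        con.execute(f"PRAGMA page_size={page_size}")  # applied when the file is created
        con.execute("PRAGMA journal_mode=wal")
        con.execute("CREATE TABLE demo(content BLOB)")
    finally:
        con.close()

def enlarge_pages_then_wal(path, new_size=8192):
    """Leave WAL mode, VACUUM with a new page size, then switch back to WAL.
    Returns (page_size, journal_mode) after the last step."""
    con = sqlite3.connect(path, isolation_level=None)  # autocommit so VACUUM is allowed
    try:
        con.execute("PRAGMA journal_mode=delete")      # 1. leave WAL mode
        con.execute(f"PRAGMA page_size={new_size}")    # 2. request the new page size...
        con.execute("VACUUM")                          #    ...which VACUUM now applies
        mode = con.execute("PRAGMA journal_mode=wal").fetchone()[0]  # 3. back to WAL
        size = con.execute("PRAGMA page_size").fetchone()[0]
        return size, mode
    finally:
        con.close()

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scratch.db")
        create_wal_db(path)
        return enlarge_pages_then_wal(path)

print(demo())  # (8192, 'wal')
```

Re-enabling WAL after the VACUUM keeps the new page size, because the page size only has to be changed at the moment the file is rewritten.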

(29) By MBL (RoboManni) on 2021-07-22 16:06:28 in reply to 27

For comparison, here is the stat info from the main repository, which has 8192 bytes/page:

Repository Size:	195,198,976 bytes
Number Of Artifacts:	10541 (616 fulltext and 9,925 deltas) Details
Uncompressed Artifact Size:	2,665,324 bytes average, 7,836,603 bytes max, 28,095,183,467 total
Compression Ratio:	143:1
Number Of Check-ins:	2,961
Number Of Files:	23
Number Of Wiki Pages:	0
Number Of Chat Messages:	1 (1 still alive, 27 bytes in size)
Duration Of Project:	324 days or approximately 0.89 years.
Project ID:	3c03bf7ad4cc6f57c062bae6355ab111b28c5f0b LCC-Cron-files
Fossil Version:	2021-04-29 12:52:16 [9322a0bc20] (2.15) (details)
SQLite Version:	2021-04-28 17:37:26 [65ec39f0f0] (3.36.0) (details)
Schema Version:	2015-01-24
Repository Rebuilt:	2021-05-19 13:55:00 By Fossil 2.15 [9322a0bc20] 2021-04-29 12:52:16 UTC
Database Stats:	23,828 pages, 8192 bytes/page, 25 free pages, UTF-8, wal mode
Backoffice:	Last run: never