Fossil Forum

importing huge git repo is ok, but the same SVN repo triggers OOM

(1.1) Originally by anonymous with edits by Stephan Beal (stephan) on 2024-06-03 05:54:10 from 1.0 [source]

hello.

i am still trying to import Sauerbraten repository to Fossil (just4fun, tbh), and i have a strange problem.

this repo contains a lot of binary files, and it is quite huge. yes, i know that this is not the best scenario for Fossil, but...

if i try to import it with svnrdump dump https://svn.code.sf.net/p/sauerbraten/code/ | fossil import --svn s-svn.fossil, Fossil OOMs around revision 450 or so (i am using 32-bit GNU/Linux).

but if i first import it as git repo with git svn clone https://svn.code.sf.net/p/sauerbraten/code/, and then import with git fast-export --all | fossil import --git s-git.fossil, it works! no OOMs, and the result is ~3GB repo file.

it's not something urgent, but i'd like to know why it is so. both repos contain the same history and are full of binary files, so why does the SVN import fail? can it be fixed/improved? maybe you have some directions for me to dig deeper into the issue?

((edit by admin: subject updated at OP's request.))

(2) By Stephan Beal (stephan) on 2024-06-02 15:44:45 in reply to 1.0 [link] [source]

it's not something urgent, but i'd like to know why it is so.

Though i cannot answer your question, this might be a workaround until it's resolved:

Reposurgeon can convert from svn to git, or even possibly between svn and fossil (probably using git as the intermediary), but its "recent changes" list says:

  • Fossil import/export from checkouts is supported; --format=fossil is gone.

Whether that means that it can now only import/export fossil via a checkout, and what exactly the implications of that are, is unclear to me.

PS: that's not a product recommendation, per se, as i've never used it.

(3) By anonymous on 2024-06-02 16:08:55 in reply to 2 [link] [source]

thank you! simply importing the SVN repo via git svn clone works too. i've never used Reposurgeon either; it looks like it could do this without a full conversion.

but i am interested in what is so different between two kinds of importing, and why Fossil constantly eats memory with SVN import, but not with git import. something strange is going on there.

my wild guess (not checked, just guessing) is that git exports each commit as a set of new files (this fact is not checked), and SVN exports as deltas (this fact is checked, SVN does exactly that when dumping a repo) to some previous commit. so Fossil eats up more and more memory as it reconstructs files from the deltas in the SVN dump, and eventually runs out of memory.

but shouldn't the content cache take care of that? as we only need to reconstruct one file at a time, and we don't need to keep the old file version after applying a delta, it shouldn't be a problem... unless all file versions are kept until the very final delta is applied. this may be the case, but i'm not yet familiar with that part of the Fossil code.

(4) By anonymous on 2024-06-03 04:35:38 in reply to 2 [link] [source]

btw. is there any way to log memory allocations in Fossil? i mean, some define or something as easy as this. ;-) as far as i remember, Fossil doesn't bother to free all resources, but it definitely does free some temp ones. and it would be great to see what kind of resource "accumulates" more and more in any case.

i tried to force content cache cleanup after processing each new SVN revision, but that didn't help. not that i expected it to help, just wanted to rule out "inter-revision cache issues".

now i need to dive much deeper into Fossil code, and it would be great to have some tool to log all allocations. if not by type, then at least by source file position. ;-)

(7) By Stephan Beal (stephan) on 2024-06-03 05:59:47 in reply to 4 [link] [source]

and it would be great to have some tool to log all allocations. if not by type, then at least by source file position.

We don't currently have such a thing but a patch which replaces fossil_malloc() and friends with a macro which passes on __FILE__ and __LINE__ to their real impls, such that we could add that info to the oom crash, would be thoughtfully considered.
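A minimal sketch of the kind of wrapper described above, not an actual Fossil patch. All names here (demo_malloc_at(), DEMO_MALLOC(), demo_total_bytes) are hypothetical and stand in for whatever the real allocator and its macro would be called. The idea is only that the macro captures the call site and the implementation reports it when an allocation fails.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names throughout: demo_malloc_at(), DEMO_MALLOC() and
** demo_total_bytes are illustrations only, not part of Fossil. */

/* Running total of bytes requested, so a caller can see what accumulates. */
static size_t demo_total_bytes = 0;

/* The "real" implementation: allocate n bytes and, on failure, report
** the call site that asked for them before aborting. */
static void *demo_malloc_at(size_t n, const char *zFile, int iLine){
  void *p;
  assert( zFile!=0 );
  p = malloc(n ? n : 1);
  if( p==0 ){
    fprintf(stderr, "out of memory: %lu bytes requested at %s:%d\n",
            (unsigned long)n, zFile, iLine);
    abort();
  }
  demo_total_bytes += n;
  return p;
}

/* Call sites use the macro, which passes on their own file and line. */
#define DEMO_MALLOC(n) demo_malloc_at((n), __FILE__, __LINE__)
```

A call such as DEMO_MALLOC(16) then behaves like a plain allocation, except that an out-of-memory abort names the source position that requested the memory.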

(5) By anonymous on 2024-06-03 04:40:45 in reply to 2 [link] [source]

p.s.: could you please edit the topic title to "OOMs" instead of "segfaults" (or something)? because this is not technically a segfault bug, it just manifests itself this way.

(6) By anonymous on 2024-06-03 05:55:13 in reply to 2 [link] [source]

ok, it seems that i found it! ;-)

looks like the problem is not that hard: svn import never frees delta and target blobs, and they simply accumulate. you only need two small changes to fix it.

find two svn_apply_svndiff() invocations in import.c, and insert:

blob_reset(&deltaSrc);
blob_reset(&target);

right after the call to svn_handle_symlinks(), and before the closing }.

this seems to fix the memory leaks, and 600 revisions are imported without unbounded memory consumption. i will try to import the whole Sauerbraten repo now (6000+ revisions), and report the results. but i believe that this is it.

the call to svn_handle_symlinks() inserts the target content into the database, so we don't need to have it in memory anymore. and we don't need deltaSrc after calling svn_apply_svndiff(). it is easier to free them both after calling svn_handle_symlinks(), though. and just in case: do not free rec.content there, it is properly freed at the end of the loop.
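The accumulation pattern can be shown with a self-contained toy, which is a sketch and not Fossil's code. ToyBlob and the toy_*() helpers are hypothetical; they only model a buffer that is initialized empty at the top of each loop iteration, filled, stored elsewhere, and then either released or forgotten.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a growable buffer; NOT Fossil's Blob type. */
typedef struct ToyBlob { char *a; size_t n; } ToyBlob;

/* Bytes currently held by all ToyBlobs that have not been reset. */
static size_t toy_live_bytes = 0;

/* Mark the blob as empty. Any buffer it pointed to is forgotten, not freed. */
static void toy_zero(ToyBlob *p){ p->a = 0; p->n = 0; }

/* Give the blob a fresh buffer of n bytes (n must be greater than 0). */
static void toy_fill(ToyBlob *p, size_t n){
  p->a = malloc(n);
  assert( p->a!=0 );
  memset(p->a, 0, n);
  p->n = n;
  toy_live_bytes += n;
}

/* Release the blob's buffer and mark it empty again. */
static void toy_reset(ToyBlob *p){
  free(p->a);
  toy_live_bytes -= p->n;
  toy_zero(p);
}

/* Process nRev "revisions", each producing one target buffer of nByte
** bytes. Returns the number of bytes still held afterwards. */
static size_t toy_import(int nRev, size_t nByte, int doReset){
  int i;
  for(i=0; i<nRev; i++){
    ToyBlob target;
    toy_zero(&target);
    toy_fill(&target, nByte);  /* stands in for applying a delta */
    /* ... the content would be written to the repository here ... */
    if( doReset ) toy_reset(&target);
  }
  return toy_live_bytes;
}
```

With doReset set, the held byte count stays flat no matter how many revisions are processed; without it, every iteration leaves one more buffer behind, which is the growth pattern described in this post.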

i am not providing the patch, so you don't have to invent some way to rewrite two trivial calls. ;-) but just in case, i donate my analysis to public domain too (if it even matters from the legal standpoint ;-), you may use it as you like, without any credits.

(8) By Stephan Beal (stephan) on 2024-06-03 06:10:22 in reply to 6 [link] [source]

you only need two small changes to fix it.

Thank you very much for the debugging and report. We'll get this fixed Real Soon Now (very possibly today).

i am not providing the patch, so you don't have to invent some way to rewrite two trivial calls. ;-)

No worries, we excel at reformulating patches ;).

(9.1) By Stephan Beal (stephan) on 2024-06-03 06:14:33 edited from 9.0 in reply to 6 [link] [source]

find two svn_apply_svndiff() invocations in import.c, and insert:

Just to confirm, is this what you meant?

Index: src/import.c
==================================================================
--- src/import.c
+++ src/import.c
@@ -1584,12 +1584,16 @@
             }else{
               blob_zero(&deltaSrc);
             }
             svn_apply_svndiff(&rec.content, &deltaSrc, &target);
             rid = svn_handle_symlinks(zPerm, &target);
+            blob_reset(&deltaSrc);
+            blob_reset(&target);
           }else if( rec.contentFlag ){
             rid = svn_handle_symlinks(zPerm, &rec.content);
           }else if( zSrcPath ){
             if ( zPerm==0 ){

(10) By anonymous on 2024-06-03 06:25:06 in reply to 9.1 [link] [source]

yes, exactly! and there is another such code snippet around line 1630.

(11) By Stephan Beal (stephan) on 2024-06-03 06:48:59 in reply to 10 [link] [source]

yes, exactly! and there is another such code snippet around line 1630.

Thank you for the report and analysis. After looking those sections over closely, the leak is now obvious and both blocks have been patched on the trunk.

(12) By anonymous on 2024-06-03 08:39:50 in reply to 11 [link] [source]

thank you! just to finish with this, the promised report: i successfully imported 6860 revisions from Sauerbraten SVN, with Fossil comfortably sitting in 20-40 MB of RAM. the final metadata rebuild and deltification took slightly more (sometimes peaking at 100 MB due to huge binary files), but Fossil released the memory properly. the final repository size is about 6GB (without vacuuming).