Git importing questions
(1) By Julian Heinken (Schneckers) on 2023-09-10 15:50:33 [link] [source]
I would like to migrate some of my repositories from Git to Fossil. However, it seems that git-submodules or Git-LFS aren't supported, right? (I could workaround LFS, but I use submodules quite a lot.)
Also: The suggested git fast-export --all | fossil import --git new-repo.fossil doesn't work with PowerShell on Windows. It does, however, work fine with the Windows Git-Bash, for example. That wasn't really obvious to me, and I think it would help to be clear about this in the docs.
(2) By Marcelo Huerta (richieadler) on 2023-09-10 18:14:30 in reply to 1 [link] [source]
I think difficulties and the need to use Git for Windows are more or less generally implied in the relevant part of the "Fossil vs. Git" document.
(4) By Warren Young (wyoung) on 2023-09-11 07:21:25 in reply to 2 [link] [source]
I suppose you mean this part? I don't see how that explains the PowerShell problem.
The only problem I'm aware of in PowerShell is the lack of a "redirect in from" operator, <. Piping between a stdio type source and sink should work fine.
Not being a Windows user except under duress, I can offer only a nearly uninformed guess: that this is due to someone playing games with temporary files under an obsolete perception that "Windows doesn't have pipes." Fossil itself does this in certain cases.
If there's a good reason why this should not work and cannot be expected to work in the future, I'll be happy to explain the problem and the workarounds in the docs.
(3.2) By Warren Young (wyoung) on 2023-09-11 07:15:31 edited from 3.1 in reply to 1 [source]
git-submodules…aren't supported, right?
Right.
The reason I didn't mention this lack in our Git to Fossil translation guide is that there is no philosophical reason for it, thus no "better way" for me to suggest.[1] Submodules are simply missing until someone decides to add the feature.
Will you be the contributor who provides this?
Git-LFS…I could workaround [the lack]…
This one's less clear-cut. There is a philosophical problem involved, but I have no "better way" to suggest, having never run myself into the problem solved by Git-LFS or its various competitors. I am therefore curious what your workaround would be. That might spark a new section in the translation guide.
I can confirm that out of the box, Fossil is indeed a poor way to store lots of huge files. Part of it is due to the SQLite blob size limit, and part of it is inherent to DVCS operation, where every user gets a copy of every historical version of every tracked file. Add atop that the binary file delta problem, and you do end up with a philosophical problem. DVCSes fall apart when you try to treat them as a general-purpose distributed filesystem.
The trick is, what to suggest instead? A purpose-engineered alternative like Syncthing? I'm so far out of this problem area that I don't even know the shape and size of the solution space.
Help?
[1] Contrast rebase or the staging area/index, where Fossil has such a superior alternative that, in our opinion, it completely obviates the lack. The problem in these cases isn't the lack of a feature, it's the perceived need to have the bogus affordance in the first place, motivating the guide's authors (me, primarily) to attempt the reader's reeducation. I don't see that submodules fall into this category.
(19) By Srikumar (srikumarks) on 2024-07-17 04:17:49 in reply to 3.2 [link] [source]
Colin Percival's bsdiff could perhaps be useful for adding binary file support to Fossil with minimal delta growth. For projects that involve image assets and such, tracking versions is still useful. Of course, very large files would still be a problem, and though bsdiff might still be useful for those (not sure about memory usage), the SQLite blob limits would kick in.
(20) By Warren Young (wyoung) on 2024-07-17 04:40:20 in reply to 19 [link] [source]
Fossil already has a binary diff algorithm. Switching to bsdiff wouldn't solve the problem described in the linked document.
(5) By Florian Balmer (florian.balmer) on 2023-09-11 07:42:11 in reply to 1 [link] [source]
This problem has already come up before, see:
(6.1) By Warren Young (wyoung) on 2023-09-11 08:19:21 edited from 6.0 in reply to 5 [link] [source]
Ouch!
This being an inherent design issue with PowerShell that Fossil cannot fix, I've expanded the relevant doc, adding the new "Converting Repositories on Windows" section.
Does this work, or are improvements needed?
(7) By Florian Balmer (florian.balmer) on 2023-09-11 18:59:36 in reply to 6.1 [link] [source]
For me, it works!
I was about to mention that since PowerShell is available for other platforms, this could affect other systems as well -- but it looks like at least PowerShell on Ubuntu doesn't have this problem.
(8.1) By Warren Young (wyoung) on 2023-09-11 19:31:45 edited from 8.0 in reply to 7 [link] [source]
While that does mean the "on Windows" qualifier on my section name covers us here, doesn't that undermine my theory of the core problem? It cannot be that it inherently pulls everything into RAM, decides how to process it, then sends it out if the Ubuntu port doesn't exhibit the same symptom.
While slagging on Windows has a certain amusement value, I prefer to be correct when I indulge in this pleasure. 😛
(10) By Florian Balmer (florian.balmer) on 2023-09-11 20:59:08 in reply to 8.1 [link] [source]
Yes, this seems somewhat strange, indeed.
But the only case where output is delayed until all input is read is PowerShell on Windows piping to an external program.
This does not seem to be the case on Ubuntu, where the external program starts processing immediately (I verified this by piping through sh -c cat -n to make sure more and similar are not aliases to internal PowerShell functionality).
(11) By Warren Young (wyoung) on 2023-09-11 21:40:53 in reply to 10 [link] [source]
I wish I knew the why of this, but regardless, I've decided to dial back the strength of the new prose, to speak only of facts actually in evidence.
(12) By Daniel Dumitriu (danield) on 2023-09-11 21:58:16 in reply to 10 [link] [source]
This should also work on Windows some time soon. Of course, you will need to install the latest PS 7 then...
(13) By Warren Young (wyoung) on 2023-09-11 22:27:31 in reply to 12 [link] [source]
So…it's not related to data volume at all, but to data encoding? It's possible to make the conversion choke even with an all-but-empty repository?
(14) By Florian Balmer (florian.balmer) on 2023-09-12 19:05:38 in reply to 13 [link] [source]
So…it's not related to data volume at all, but to data encoding?
Yes, much to my surprise! Here are two tests with https://github.com/drhsqlite/fossil-mirror.git and PowerShell 5.1.22000.282 on Windows 11.
A: Export all from Git → Fossil: full pipe buffering causes graceful OOM.
PS C:\...> git fast-export --all | fossil import --git test1.fossil
fatal: Out of memory, malloc failed (tried to allocate 4203555 bytes)
Exception of type 'System.OutOfMemoryException' was thrown.
B: Export only part from Git → Fossil: data encoding problem.
PS C:\...> git fast-export master~1..master | fossil import --git test2.fossil
]ad fast-import line: [blob
So my earlier assumption that PowerShell's full pipe buffering may cause the limit proved wrong. The strange error message "]ad fast-import line: [blob", quite similar to the earlier example "]ad fast-import line: [JSON", tricked me.
(15) By Warren Young (wyoung) on 2023-09-12 21:08:54 in reply to 14 [link] [source]
I dunno; it looks like there are two failure cases here.
Thanks for testing and reporting. I would have never bothered to fire up a Windows VM and try it myself, but I am glad to know the answer.
(16) By Florian Balmer (florian.balmer) on 2023-09-13 06:09:13 in reply to 15 [link] [source]
I dunno; it looks like there are two failure cases here.
Yes, there are, but so far I think only case B was reported on this forum.
Anyway, as soon as PowerShell connects pipelined processes directly, both problems should be solved.
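To illustrate what "connecting pipelined processes directly" means, here is a minimal Python sketch that wires one program's stdout to another program's stdin through an OS-level pipe, with no shell in between to buffer or re-encode the bytes. The helper name pipe_directly is made up for this example, and the sketch assumes git and fossil are on the PATH; it has not been tested against the failing PowerShell setup described above.

```python
import subprocess


def pipe_directly(producer_argv, consumer_argv):
    """Run `producer | consumer` over an OS pipe and return both exit codes.

    The bytes travel from one process to the other untouched: no shell sits
    in the middle, so nothing can buffer the whole stream or rewrite line
    endings.
    """
    producer = subprocess.Popen(producer_argv, stdout=subprocess.PIPE)
    try:
        # The consumer reads straight from the producer's stdout pipe.
        consumer = subprocess.run(consumer_argv, stdin=producer.stdout)
    finally:
        # Drop our copy of the read end, then reap the producer.
        producer.stdout.close()
        producer_rc = producer.wait()
    return producer_rc, consumer.returncode


# Hypothetical usage for the conversion discussed in this thread:
# pipe_directly(["git", "fast-export", "--all"],
#               ["fossil", "import", "--git", "new-repo.fossil"])
```

Run from inside the Git working tree, the commented call at the bottom should behave like the pipeline does under Git Bash, since the data never passes through PowerShell.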
(17) By Florian Balmer (florian.balmer) on 2023-09-13 07:21:29 in reply to 16 [link] [source]
Now I see where the strange error message comes from: PowerShell injects a CR before each LF, the resulting CRLF gets trimmed to the CR alone, and that CR instructs the terminal to move the cursor back to the start of the current line.
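The effect of a stray CR can be reproduced with a small Python sketch that approximates how a terminal draws one line. The function name render_on_terminal is made up for this example, and the message text is an assumed reconstruction of what Fossil emits: the offending line wrapped in square brackets, with the CR still attached to it.

```python
def render_on_terminal(text):
    """Approximate how a terminal displays one line containing CRs.

    A carriage return moves the cursor back to column 0; characters
    printed afterwards overwrite whatever is already there.
    """
    cells = []
    col = 0
    for ch in text:
        if ch == "\r":
            col = 0  # back to the start of the line
        else:
            if col < len(cells):
                cells[col] = ch  # overwrite an existing character
            else:
                cells.append(ch)
            col += 1
    return "".join(cells)


# Assumed reconstruction: the input line "blob" kept its trailing CR,
# so the closing bracket is printed after the cursor jumps back.
message = "bad fast-import line: [blob\r]"
print(render_on_terminal(message))  # prints: ]ad fast-import line: [blob
```

The closing bracket lands on top of the first character of the message, which matches the garbled output seen in test B.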
(18) By Konstantin Khomutov (kostix) on 2023-09-13 07:22:17 in reply to 5 [link] [source]
(9) By anonymous on 2023-09-11 19:13:17 in reply to 1 [link] [source]
... I use submodules quite a lot
Not to discourage your choice of migrating from Git, however I wonder if you considered just transitioning to Fossil instead of full-history migration.
Basically, select a list of recent releases and just port them by checking out from Git and re-commit them into Fossil repo using some custom scripts. It may be possible to maintain some sort of xref to Git commit-id via Fossil tags.
This approach should also take care of submodules naturally per version integrated into the main repo release.
In case more granularity is needed, the Git repo(s) are still there for continuity.
Just an idea.
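A rough Python sketch of such a custom script follows. The function names, the commit message, and the git-<id> tag scheme are all invented for illustration. It assumes the working directory is at once a Git working tree and an open Fossil checkout, and it relies on fossil addremove skipping dot-named files such as .git by default, which is worth verifying on your own setup before trusting the result.

```python
import subprocess


def port_commands(ref, git_commit_id):
    """Build the command sequence that ports one Git release into Fossil.

    The Fossil tag "git-<abbreviated id>" is a made-up convention for
    cross-referencing the original Git commit.
    """
    tag = "git-" + git_commit_id[:12]
    return [
        # Put the release's files, submodules included, into the work tree.
        ["git", "checkout", ref],
        ["git", "submodule", "update", "--init", "--recursive"],
        # Let Fossil pick up added and deleted files, then commit them.
        ["fossil", "addremove"],
        ["fossil", "commit", "-m", f"Port {ref} from Git", "--tag", tag],
    ]


def port_release(ref, workdir):
    """Resolve ref to a Git commit id, then run the porting commands."""
    commit_id = subprocess.run(
        ["git", "rev-parse", f"{ref}^{{commit}}"],
        cwd=workdir, check=True, capture_output=True, text=True,
    ).stdout.strip()
    for argv in port_commands(ref, commit_id):
        subprocess.run(argv, cwd=workdir, check=True)


# Hypothetical usage, oldest release first:
# for ref in ["v1.0", "v1.1", "v2.0"]:
#     port_release(ref, "/path/to/shared/worktree")
```

Because the submodule contents are checked out as ordinary files before fossil addremove runs, each Fossil check-in captures the submodule state that belonged to that release.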
(21) By John Horn (jx0horn) on 2024-07-17 14:04:07 in reply to 1 [link] [source]
Julian, there is an app called git-annex, which specifically addresses the Git large file issue. https://git-annex.branchable.com/