Beginner Help: `backup`
(1) By anonymous on 2023-06-21 17:04:59 [link] [source]
Hi! I'm new to `fossil`, because I can't get `git` to do what I expect to happen with `clone`. (Although reading about having the bug tracking, etc. of `fossil` is making me into a fan even before I've done anything.) It seems like it MAY not work here either, but I think it can get much closer.
I know I'm not using `git` nor `fossil` for their intended purpose, but I'm trying to use version control to manage all my personal local files. So, YES, I can use a "normal" backup program; however, what those backup programs DON'T do (so far as I'm aware) is help me know WHAT changed between backups.
For instance, say I accidentally mis-click and delete a folder in my 2005 year folder. I shouldn't be touching this folder, but it happens by accident and I don't realize it. I want my backup system to help me KNOW this accident occurred, so that if I'm running out of space and decide to DELETE old backups, I don't accidentally delete the only uncorrupted copy of my data. Instead, it can help me SAVE it to my most recent copy.
Anyway, I have been trying to read the following page:
https://fossil-scm.org/home/doc/trunk/www/backup.md
I'm thinking that the `backup` command is what I want, but I'm still not quite sure if it truly captures EVERYTHING. Since I'm not sharing this repo with anyone and only trying to back it up, I would really like to have absolutely everything: user settings, sensitive information, etc.
I just wanted to confirm that `backup` is what I want.
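From my reading of that page, I think this is all I'd have to run (the path is just made up by me; please correct me if I've misunderstood):

```sh
# copy the whole repository database, settings and all, to external storage
fossil backup /mnt/usb/personal-files-backup.fossil
```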
The reason I can't do this with `git` is because of how it handles submodules. I also have real (code) repositories in my personal files, and I just want to blindly copy and back up all of it; I don't want the version control system to pick and choose what is "too much" code and leave my "backup" copy EMPTY (which is what `git` will do SILENTLY, equaling complete data loss!).
FYI: I haven't done anything in `fossil` yet; I think I want to convert all my `git` directories into it. I would be grateful for any "dumbed down" explanation of the process. I'm a person who has used a command line before, but I prefer GUIs because they give my brain visual efficiency.
(2) By Stephan Beal (stephan) on 2023-06-21 18:36:23 in reply to 1 [link] [source]
I know I'm not using git nor fossil for their intended purpose, but I'm trying to use version control to manage all my personal local files.
Neither git nor fossil is well-suited to that, primarily because certain files require specific permissions or they won't work (like anything under `~/.ssh`). Additionally, it's not uncommon (is it?) to have tens or hundreds of thousands of files under a home directory[^1], and fossil gets unbearably slow with certain operations for repositories with such long lists of files. Fossil also has built-in limits on individual blob sizes of somewhere just shy of 2GB, so it literally cannot be used for storing such huge files.
For instance, say I accidentally mis-click and delete a folder in my 2005 year folder. I shouldn't be touching this folder, but it happens by accident and I don't realize it.
Cloud storage to the rescue!
I just wanted to confirm that backup is what I want.
FWIW, i don't think fossil is really what you want for what you're trying to do with it. Fossil is made for, and very much geared toward, maintaining small-to-medium source-code projects. It is the proverbial square peg for your use case's round hole.
[^1]: This laptop has 398k files in my home dir, totaling 97GB.
(3) By James Cook (falsifian) on 2023-06-21 20:31:48 in reply to 2 [link] [source]
As a slight counterpoint, I'll mention I've been using DVCS (git, darcs, now trying Fossil) to manage some of my personal files for over a decade.
However, I am quite careful about it. I certainly don't let a VCS loose on my entire home directory. See stephan's point about `~/.ssh`, for example. Instead, I deliberately choose to put certain files in my repository, and I deliberately choose when to commit changes, and even write semi-coherent commit messages. I also generally only track plain text files, with limited exceptions.
I am especially careful not to add large files, since that tends to result in storage space permanently taken up, even if you delete them later. (Actually I manage my large files with git-annex, but that's another story.)
I think overall stephan is right, but some people may find that something like Fossil is indeed a good way to manage their miscellaneous files.
There are actually projects designed to wrap a version control system as a user-friendly file synchronization tool. SparkleShare is an example, and I think git-annex has an assistant mode. However, you'll still have all the caveats mentioned by stephan and myself.
(4) By Stephan Beal (stephan) on 2023-06-21 20:50:17 in reply to 3 [source]
As a slight counterpoint, I'll mention I've been using DVCS (git, darcs, now trying Fossil) to manage some of my personal files for over a decade.
And to clarify, in case i came across as a "never use scm for backups" proponent: i'm all for the highly selective use of scm for certain backups. My objections are centered around using scm in place of what we generally know of as full backups.
(5) By anonymous on 2023-06-22 02:30:57 in reply to 4 [link] [source]
Would you happen to know how to go about this problem if I don't want to use cloud storage?
I want my backup system to help me KNOW this accident occurred, so that if I'm running out of space and decide to DELETE old backups, I don't accidentally delete the only uncorrupted copy of my data. Instead, it can help me SAVE it to my most recent copy.
Regarding the file size limit, what do you do if you have large assets then? Like when making a game? I've always wondered why so many version control systems have this limitation.
Also, regarding the following quote, if I don't care about how slow the system is (since technology will eventually march forward), will it still RELIABLY work if I just wait long enough for it to complete?
Additionally, it's not uncommon (is it?), to have tens or hundreds of thousands of files under a home directory1, and fossil gets unbearably slow with certain operations for repositories with such long lists of files.
Does anyone know of backup systems out there that handle this scenario? (Sorry for asking this, if it's too off-topic.) I briefly looked at SparkleShare, but it still says it has a difficult time in the same areas as `fossil`. How does everyone handle trying to detect and prevent a possible corruption of an older file? Or does everyone just assume all their past files are never "touched" by accident? I'm really interested if anyone else has had the same concerns as me.
I think it would be really helpful to know what the limitations are regarding things like this:
Certain files require specific permissions or they won't work (like anything under ~/.ssh).
Could you point me to the right documentation that has an itemized list of what `fossil` will NOT copy? I really don't know of any other system that can help me with what I'm asking for, and I'm completely clueless about what `~/.ssh` even is. I don't know if it helps any to say that I'm mainly using Windows OS.
Thank you so much for your time, responses, and wisdom!
(6) By Stephan Beal (stephan) on 2023-06-22 07:49:06 in reply to 5 [link] [source]
Would you happen to know how to go about this problem if I don't want to use cloud storage?
External storage like USB drives. The backups have to go somewhere (preferably on a different computer and more preferably a different geographic location in case of a disaster like a fire or flood).
Regarding the file size limit, what do you do if you have large assets then? Like when making a game?
They presumably use a tool geared towards such media. Source code projects, aside from high-end games, do not (as a rule) use files anywhere near that large.
I've always wondered why so many version control systems have this limitation.
Possibly for the same reason fossil does: its blob internals (developed long before 64-bit machines were in common use) use 32-bit signed integers for sizes, which are limited to 2GB. Fossil also performs many of its operations directly in memory. For example, if you check in a one-byte change to a 1.5GB file, fossil needs at least 3GB of RAM, plus an amount proportionate to the size of the change, to calculate the differences to that file.
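To put numbers on that (a back-of-the-envelope sketch, not fossil's actual source):

```sh
# the largest value a signed 32-bit integer can hold, in bytes
echo $(( (1 << 31) - 1 ))   # 2147483647, i.e. just shy of 2GB
# and the 3GB figure: diffing needs the old and new versions of the
# file resident at once, so 1.5GB + 1.5GB = 3GB before the delta
# computation itself
```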
Or does everyone just assume all their past files are never "touched" by accident?
Making the files read-only using OS-/filesystem-level mechanisms virtually eliminates all possibility of them being modified, barring storage media corruption.
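For example (a sketch; the paths are hypothetical and the right command depends on your OS):

```sh
# Unix-like systems: remove write permission for everyone, recursively
chmod -R a-w ~/archive/2005
# Linux, as root: make files immutable even against the owner's own writes
chattr -R +i ~/archive/2005
# Windows: mark everything in the folder tree read-only
attrib +R *.* /S
```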
Could you point me to the right documentation that has an itemized list of what fossil will NOT copy?
Fossil will copy any files you tell it to, up to the limits of its own internals and the computer it is running on[^1]. However, it will not record any file-level metadata, like file permissions, except for the "executable bit."
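A quick way to see that limitation in action (hypothetical file name; this just illustrates the point above):

```sh
chmod 600 secret.txt              # owner-only permissions
fossil add secret.txt
fossil commit -m "add secret.txt"
# in a fresh checkout on another machine, secret.txt reappears
# with default permissions, not 600: only the executable bit
# survives the round trip
```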
I don't know if it helps any to say that I'm mainly using Windows OS.
Yes, you should have started with that ;). On Windows, file permissions are far less important than they are on Unix-like systems. Even so, fossil is not a great tool for doing All The Files backups due to its other scaling-related limitations.
If you decide on using an SCM for backups, i recommend breaking it down into multiple repositories (one per group of files, however you like to group them) and only SCM files which might actually change (e.g. not photos or movies). For static files, remote/external backups are every bit as good and don't require 3rd-party software to manage.
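A minimal sketch of that layout (repository names and paths are hypothetical):

```sh
# one repository per group of files that actually change
fossil init ~/repos/documents.fossil
mkdir -p ~/checkouts/documents && cd ~/checkouts/documents
fossil open ~/repos/documents.fossil
cp -r ~/Documents/taxes .            # bring in the files to track
fossil add .                         # schedule everything for check-in
fossil commit -m "initial snapshot of documents"
```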
[^1]: It won't be able to store a 1GB blob on a Pi Zero with only 512MB RAM unless that Pi has an appropriate amount of virtual memory.
(7) By Daniel Dumitriu (danield) on 2023-06-22 07:50:40 in reply to 5 [link] [source]
Regarding the file size limit, what do you do if you have large assets then? Like when making a game?
You look for substitute solutions. Look up lfs, largefiles, annex, attic, and the like.
I've always wondered why so many version control systems have this limitation.
Because they store differences between versions, among other reasons, in order to improve over, e.g., just storing a snapshot of your files after every change in a separate directory (presumably what programmers used to do before Rochkind). That allows for smart and desirable operations: after all, SCMs were created specifically for source code, where the user needs deeper answers (diff, blame, merge) than just how files looked at timestamp T. Computing and managing all that amounts to longer run times and more disk space.
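For instance, here is the sort of question an SCM can answer that a snapshot-in-a-directory cannot (a sketch; the file and check-in names are made up):

```sh
fossil diff --from release --to trunk src/main.c   # what changed between versions?
fossil blame src/main.c                            # who last touched each line?
```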
Also, regarding the following quote, if I don't care about how slow the system is (since technology will eventually march forward), will it still RELIABLY work if I just wait long enough for it to complete?
Define "long enough" and get acquainted with computational complexity; the bottleneck here are some quadratic time routines. Although we do believe Fossil will function correctly, we have very limited experience with its reliability in such cases - since most of us tend to give up after some time ranging from hours to days.
(By the way "technology will eventually march forward" is no panacea; no technology will help you too much with an exponential time algorithm (or cubic, for what is worth) - quantum computing be here bracketed out.)
Does anyone know of backup systems out there that handle this scenario?
I do not think using an SCM is the right solution here. Look at dedicated backup solutions: they can switch between full, incremental, and differential backups, use deduplication, and apply other approaches geared toward this problem. You can get encryption on top of that and thus maybe also consider cloud storage as an extra layer. Check out duplicati, duplicacy, rsync, rclone, restic, etc.
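As one concrete illustration, restic (one of the tools named above) can directly answer the "what changed between backups?" question from the original post; a sketch with hypothetical paths and snapshot IDs:

```sh
restic -r /mnt/usb/backups init               # create a deduplicating repository
restic -r /mnt/usb/backups backup ~/files     # take a snapshot
restic -r /mnt/usb/backups snapshots          # list snapshot IDs
restic -r /mnt/usb/backups diff 1a2b3c 4d5e6f # show what changed between two
```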
(8) By Konstantin Khomutov (kostix) on 2023-06-22 10:45:22 in reply to 1 [link] [source]
So, YES, I can use a "normal" backup program; however, what those backup programs DON'T do (so far as I'm aware) is help me know WHAT changed between backups.
Since it's block-level deduplicating, you can also have a flexible policy of backup expiration: say, keep 7 daily backups, 12 monthly backups and 2 yearly backups.
Again, since it's block-level deduplicating, it works reasonably fine in cloud setups: I personally back up to another machine I own and then sync the resulting repo to a cloud using `rclone` to have an "off-site" copy.
Sure, what I cannot do is have something like `git status` or `fossil status`: that is, something that shows you have deleted something locally compared to the last backed-up state.
(9) By mark on 2023-06-22 11:14:22 in reply to 6 [link] [source]
Would you happen to know how to go about this problem if I don't want to use cloud storage?
External storage like USB drives. The backups have to go somewhere (preferably on a different computer and more preferably a different geographic location in case of a disaster like a fire or flood).
I think the general theme consistent in the answers from Stephan, Daniel, and James is to use specific tools suited to distinct tasks; for example, creating versionable backups of source code and system configuration files is distinct from backing up photos and other media.
Personally, for system configs, I use RCS, which I feel is tailor-made for such a task. For research, school projects, financial records, and some documentation, Fossil is perfect. And for source code, I find Got to be ideal. But for things like media and backups of system images, I use a mix of external drives, remote and local self-hosted Nextcloud servers, rsync, and dump(8) and restore(8). That's another way of saying I don't think it best to use Fossil for everything on the file system that needs to be tracked and preserved, but it is excellent for managing some of that data at different stages in its lifecycle.
(10) By sean (jungleboogie) on 2023-06-22 17:50:00 in reply to 1 [link] [source]
A couple cross-platform utilities that may be better suited for your need:
I've used `fossil` for system configurations, .vimrc stuff, application configuration files, etc. before.
As others have said, you'll need to be mindful of the permissions and of what you're copying into the repo. The main reason I used fossil this way was to share the configurations across a few computers without scp'ing everything.
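That workflow looks roughly like this (server URL hypothetical):

```sh
fossil clone https://config.example.com/configs configs.fossil
fossil open configs.fossil
# ...edit files, then:
fossil commit -m "update .vimrc"
fossil sync     # exchange changes with the server and other machines
```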
RCS is a fine choice for system config files as well and I've used that on systems where I didn't install `fossil`, but that's a non-Windows option only, AFAIK.
My most recent fossil repo is for recipes. I've typed them up in markdown (mostly with fileedit :D) and use my iPad's browser to see the steps/ingredients when I'm in the kitchen.
So fossil can be used in many (unintended) ways, but be mindful of its limitations.
(11) By anonymous on 2023-06-22 18:39:28 in reply to 1 [link] [source]
Thank you all so much for your patience with me and taking the time to write all of your wonderful responses!
Getting to know more terminology of what I need to look for definitely helps, as well as all of the suggestions regarding this "square peg in a round hole"!
I never thought of using a mix of systems. That makes awesome sense! For some unenlightened reason, I thought I had to just pick one and stick with it.
Have a blessed day everyone!
P.S. If you have any more information, now or later, feel free to still respond to this thread for the likes of others like me or even the future me! :)
(12.1) By Andy Bradford (andybradford) on 2023-06-23 01:53:43 edited from 12.0 in reply to 5 [link] [source]
How does everyone handle trying to detect and prevent a possible corruption of an older file? Or does everyone just assume all their past files are never "touched" by accident? I'm really interested if anyone else has had the same concerns as me.

Data protection, data backups, and security are some of the most challenging things to think about and actually get right. You are not alone in these concerns. There's no one-size-fits-all solution. I found the following to be thought-provoking:

https://research.exoticsilicon.com/articles/backup_strategies

Andy