fossil add myFile.xlsx results in: SKIP myFile.xlsx

(1) By anonymous on 2019-03-21 11:05:58

hi,

The title is basically the problem. I had the extension listed in binary-glob, and adding the file wasn't successful. When I removed .xlsx from binary-glob, I still could not add the file. I'm not using .fossil-settings.

Thanks, Gert

(2) By Warren Young (wyoung) on 2019-03-21 14:33:27 in reply to 1

I can't reproduce it. Do you get any other output from Fossil?

The source code says this only happens when its attempt to modify the repository DB results in zero DB changes, and that happens after it consults the --ignore flag and the ignore-glob setting.

Incidentally, you'll get more efficient storage by unzipping that file before checking it into Fossil. It'll have roughly the same compression level, and subsequent changes will have smaller diffs. That'll mean you have to reconstitute the *.xlsx from its contents, of course, but that's easily automated.

(4) By Gert (gertvanoss) on 2019-03-22 17:09:26 in reply to 2

Not sure I understand this. It is an Excel file. I made a workaround by putting it in a folder outside the repository.

(6) By Warren Young (wyoung) on 2019-03-22 18:38:14 in reply to 4

> Not sure I understand this. It is an Excel file.

Are you saying that you don't understand why I would ask you to unzip an xlsx file? It's because it is a Zip file!

If you're running this on a system without an "unzip" command, install Info-ZIP, then run `unzip -l my-sheet.xlsx`. That will list the contents of that Zip file.
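If installing a tool isn't convenient, the same check can be sketched with Python's standard zipfile module. The archive below is a hypothetical stand-in built in memory, with member names borrowed from the kind of paths an xlsx contains; it is not a real spreadsheet, and `list_zip_members` and `make_stand_in_archive` are illustrative helper names only.

```python
import io
import zipfile


def list_zip_members(data):
    """Return the member names stored in a Zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


def make_stand_in_archive():
    """Build a tiny Zip archive that mimics the layout of an .xlsx file.

    Assumption: the member names and contents are placeholders, chosen
    only so the listing has something to show.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("xl/workbook.xml", "<workbook/>")
        archive.writestr("xl/styles.xml", "<styleSheet/>")
        archive.writestr("docProps/app.xml", "<Properties/>")
    return buffer.getvalue()


stand_in = make_stand_in_archive()
# True for any Zip container, which is what an .xlsx file is.
looks_like_zip = zipfile.is_zipfile(io.BytesIO(stand_in))
for name in list_zip_members(stand_in):
    print(name)
```

To try it on a real file, read the file's bytes and pass them to `list_zip_members` instead of the stand-in.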

If you're asking why that's a good idea, it's because binary data compression turns a file into pseudorandom noise, which defeats Fossil's delta compression algorithm. Depending on the way the binary compression algorithm (Zip in this case) works, you can get pathological conditions where a single-byte change near the start of the raw data ends up changing nearly every byte after it in the output, so that every Fossil checkin results in a nearly completely redundant copy of the input data instead of delta compressing it as it should.
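A rough way to see this for yourself is to change one byte of some raw data, compress both versions, and count how many bytes differ before and after compression. This is only a sketch under stated assumptions: it uses Python's zlib rather than Fossil's delta encoder or a real xlsx, the sample data is a made-up stand-in for spreadsheet XML, and how far the change spreads through the compressed output depends on the data and on where the edit lands.

```python
import zlib


def count_differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any difference in length."""
    common = min(len(a), len(b))
    differing = sum(1 for i in range(common) if a[i] != b[i])
    return differing + abs(len(a) - len(b))


def single_byte_edit_impact(raw: bytes, position: int = 10):
    """Flip one byte of raw, then return (raw_diff, compressed_diff).

    raw_diff is the number of bytes that differ between the two raw
    inputs; compressed_diff is the same count for their zlib output.
    """
    edited = bytearray(raw)
    edited[position] ^= 0xFF  # guaranteed to change exactly this byte
    edited = bytes(edited)
    return (
        count_differing_bytes(raw, edited),
        count_differing_bytes(zlib.compress(raw), zlib.compress(edited)),
    )


# Hypothetical stand-in for the XML inside a spreadsheet.
sample = b"".join(
    b"<row id='%d'><c>%d</c></row>\n" % (i, i * i) for i in range(2000)
)

raw_diff, compressed_diff = single_byte_edit_impact(sample)
print("bytes changed in raw data:       ", raw_diff)
print("bytes changed in compressed data:", compressed_diff)
```

Moving `position` around and swapping in your own data gives a feel for how unevenly a small edit can propagate once compression is involved.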

Unzipping the file and checking in the pieces avoids this problem. Some files contained in the Zip archive will not change at all between checkins, and those that do will change only in proportion to the amount of data that was changed within Excel.

This advice reflects a general principle, not specific to Fossil or Excel at all: when using binary data compression with any tool that can do delta compression, the second and subsequent copies are often more efficient when you use uncompressed input. The on-disk footprint is bigger, but the on-wire footprint can be much lower.

To take another example, it's more efficient to store Windows BMP files in Fossil than PNGs. PNG and Fossil use similar binary compression methods, and a change to a few pixels in the BMP will result in only a small checkin, whereas a change to a few pixels in a PNG could cause the whole PNG to be stored again, mostly redundantly.

All of this means you then need a way to reconstitute your xlsx files from the unzipped contents, or your PNGs from your BMPs, but that's a small matter of scripting. Personally, I'd do it with a Makefile, since my projects normally have one of those anyway, and with that I can tie changes to individual files to just the steps needed to reconstitute the outputs:

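    all: my-sheet.xlsx my-pic.png

    # Rebuild the spreadsheet whenever any of its unpacked parts change.
    my-sheet.xlsx: xl/workbook.xml xl/styles.xml docProps/app.xml ...etc...
        zip my-sheet.xlsx $^

    # Regenerate the PNG from the checked-in BMP: input first, then output.
    my-pic.png: my-pic.bmp
        convert $< $@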

(7) By Warren Young (wyoung) on 2019-03-22 18:43:10 in reply to 6

Why is Fossil rendering the above <pre> block so poorly? I've tried two other rendering engines on the raw artifact text, and they give reasonable output.

(8) By Warren Young (wyoung) on 2019-03-22 19:24:52 in reply to 6

> The on-disk footprint is bigger

Clarification: the on-disk footprint of the checkouts will be bigger, but the resulting repository sizes will be smaller unless you only ever check in a single version of each such file.

And that is why the on-the-wire sizes are smaller: Fossil's xfer mechanism will be sending properly delta compressed artifacts instead of multiple highly-redundant copies of the same data.

I said this problem isn't specific to Fossil. It'll also affect Git, rsync, Unison...anything with a delta compression feature.

> Personally, I'd do it with a Makefile

I now see that the above doesn't address the reverse direction. You can't write a set of dependencies to undo the above because that'll create a loop: the inputs to the compressed files change, so the uncompressed outputs are generated, which changes the inputs to the compressed files, so they get recompressed, which changes the inputs to the uncompressed files...

Instead, I'd add a manually-run target to unpack things on demand:

    .PHONY: unpack
    unpack:
        unzip my-sheet.xlsx
        convert my-pic.png my-pic.bmp

Adding a loop to do this for a list of PNGs, Excel spreadsheets, or whatever is a trivial extension.
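For anyone who would rather not do that loop in Make, here is a minimal sketch of the same idea in Python, covering only the Zip-based files (image conversion is left out). The helper names `unpack`, `unpack_all` and `repack`, the choice to unpack into a directory named after the archive, and the stand-in archive in the demo are all assumptions made for illustration. Whether Excel accepts a repacked file is not something this sketch checks.

```python
import tempfile
import zipfile
from pathlib import Path


def unpack(archive_path, dest_dir):
    """Extract one Zip-based file (for example an .xlsx) into dest_dir."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest_dir)


def unpack_all(archive_paths):
    """Unpack each archive into a sibling directory named after it.

    my-sheet.xlsx is unpacked into my-sheet/ (an assumed convention).
    """
    unpacked = []
    for archive_path in map(Path, archive_paths):
        dest_dir = archive_path.with_suffix("")
        unpack(archive_path, dest_dir)
        unpacked.append(dest_dir)
    return unpacked


def repack(src_dir, archive_path):
    """Rebuild a Zip archive from the files under src_dir, in sorted order."""
    src_dir = Path(src_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(src_dir).as_posix())


# Demo on a hypothetical stand-in archive, in a throwaway directory.
with tempfile.TemporaryDirectory() as work:
    sheet = Path(work) / "my-sheet.xlsx"
    with zipfile.ZipFile(sheet, "w") as archive:
        archive.writestr("xl/workbook.xml", "<workbook/>")
        archive.writestr("docProps/app.xml", "<Properties/>")

    (unpacked_dir,) = unpack_all([sheet])
    unpacked_files = sorted(
        p.relative_to(unpacked_dir).as_posix()
        for p in unpacked_dir.rglob("*")
        if p.is_file()
    )

    rebuilt = Path(work) / "rebuilt.xlsx"
    repack(unpacked_dir, rebuilt)
    with zipfile.ZipFile(rebuilt) as archive:
        rebuilt_names = archive.namelist()

print(unpacked_files)
print(rebuilt_names)
```

The unpacked directory is what you would check in; `repack` plays the role of the `zip` rule when you need the spreadsheet back.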

This technique does increase your build-time dependencies, but it has side benefits like the ability to change output formats easily. I've got a web app containing 4 different versions of almost every PNG, because they were created prior to the revelations I'm preaching here:

1. An indexed-color version with single-color transparency, matted onto a static background color, from the early days of the web when you couldn't count on support for 24-bit PNG at all.
2. A second copy of the above on a different background color for a customer that contracted to resell our software under their branding, with their color scheme.
3. A 24-bit version, still matted onto our company standard background color from the days where IE could handle 24-bit PNGs but not transparency.
4. A 24-bit alpha-blended version once we finally dropped support for those old versions of IE, so now they'll work on any background color.

If I'd had the foresight to store the final version from the start, I could have programmatically generated the first three from it as a build step and saved the wasted space in the repository.

(9) By Warren Young (wyoung) on 2019-03-24 17:30:10 in reply to 8

I've expanded this information into a new Fossil document based on a practical experiment showing how use of compressed files in a Fossil repo balloons the repo size.

If you don't want to read the new document, here's the sizzle:

[Image: results bar graph]

I don't know how to make my point more clear than that. :)

(10) By Gert (gertvanoss) on 2019-03-26 17:05:17 in reply to 9

Appreciate the in-depth answers, and I am certainly going to experiment with this.

(3) By anonymous on 2019-03-21 20:20:48 in reply to 1

Was the .xlsx file still open in Excel?

If that's the case, it may be locked, so Fossil could not process it. Try adding it after making sure it has been closed in Excel.

(5) By anonymous on 2019-03-22 17:09:59 in reply to 3

Not sure. I changed my setup but I'll definitely pay attention to that next time.