Using LibreOffice FODT and Fossil for document revisioning
(1) By Andy Bradford (andybradford) on 2025-04-03 02:58:40
A few months ago I suggested that some people who were planning to collaborate on a document, which they intended to write in LibreOffice, use Fossil as the VCS for it. After some investigation I discovered that LibreOffice has a "flat" format called FODT, which looked promising, so I thought it might actually work well. After a few commits, however, it became apparent that some things about LibreOffice made managing the file with Fossil a nightmare, even though the FODT format seemed ideal for this. The biggest challenges we discovered were:

1) LibreOffice has a counter of sorts called the rsid that is updated every time the document is saved.
2) LibreOffice generates automatic styles for just about every paragraph, and sometimes even for individual words and characters within a paragraph, and these change for almost every paragraph whose text you alter. This happens even if you think you have applied a specific style to something, because the automatic style is linked to the original style through a parent-child relationship.
3) The rsid is part of the automatic styles, so every time the document is changed, dozens of new styles get inserted into it automatically.

As a result, a one-character change to a paragraph (e.g. a spelling correction) could end up being committed to Fossil as a 500-line change. Completely useless and unmanageable.

These things, among others, result in automatic styles that are identical in "function" but have entirely different names and different rsids. After just a few commits there were over 300 such automatic styles in the document, which made comparing revisions impossible: even though two paragraphs visually had the same style, in the actual source of the document they had different names and rsids.

What we found, however, is that it is possible to avoid all of these behaviors through very careful and deliberate use of user-defined styles.
For example, rather than highlighting a word in a paragraph and changing it to an italic font, one should instead apply a predefined character style to the word. Similarly, instead of highlighting a paragraph and changing its font size or font name, one should simply click in the paragraph and change its style to a user-defined one. In other words, one should have explicit styles and always be deliberate in assigning styles to words, paragraphs and characters. If a style doesn't exist for some desired text effect, create it first and then assign it to the text in question.

Finally, it is possible to disable the rsid, which eliminates another source of noise in the diff between commits. The setting is called "Store it when changing the document", and disabling it prevents LibreOffice from saving the rsid with the document.

This approach even works with images, because images are embedded in the document as base64-encoded blobs. That is not really ideal, because compression is likely not going to be as good as it could be, but some sacrifices have to be made to get the useful history that Fossil can provide.

Before every commit, it is necessary to run "fossil diff" or use the new "fossil ui" to look at the diff and verify that no new automatic styles have been introduced. If any are found, it's easy to track them down in the document, highlight the text, use the "Clear direct formatting" option to reset it to the user-defined style, and then apply a proper style if necessary. Repeat this until the diff shows no new automatic styles, and then commit.

After all of this, I'm going to consider this a success: as long as one is careful not to let LibreOffice run away with automatic styles, Fossil can actually be quite useful for comparing revisions of the document.
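The pre-commit check described above can be partly automated. What follows is a minimal Python sketch, assuming the usual FODT layout: automatic styles are `<style:style>` elements under `<office:automatic-styles>`, and rsid values live in LibreOffice's `officeooo:rsid` and `officeooo:paragraph-rsid` attributes. The script name `fodt_check.py` and its output format are hypothetical; this is not part of Fossil or LibreOffice. It compares an older copy of the file against the working copy, reports automatic style names that are new, and lists groups of automatic styles that are identical apart from their names and rsids.

```python
#!/usr/bin/env python3
"""fodt_check.py: hypothetical pre-commit helper for FODT files (a sketch).

Usage: fodt_check.py OLD.fodt NEW.fodt
"""
import sys
import xml.etree.ElementTree as ET

# OpenDocument namespaces, plus LibreOffice's "officeooo" extension
# namespace, which holds the rsid attributes.
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
OFFICEOOO = "http://openoffice.org/2009/office"
STYLE_NAME = "{%s}name" % STYLE


def automatic_styles(xml_data):
    """Return the <style:style> children of <office:automatic-styles>."""
    root = ET.fromstring(xml_data)
    auto = root.find("{%s}automatic-styles" % OFFICE)
    return [] if auto is None else auto.findall("{%s}style" % STYLE)


def is_rsid(attr):
    """True for officeooo:rsid and officeooo:paragraph-rsid."""
    return attr.startswith("{%s}" % OFFICEOOO) and attr.endswith("rsid")


def signature(elem):
    """Canonical form of an element that ignores the style name and any
    rsid values, so styles differing only in those compare equal."""
    attrs = tuple(sorted((k, v) for k, v in elem.attrib.items()
                         if k != STYLE_NAME and not is_rsid(k)))
    return (elem.tag, attrs, tuple(signature(child) for child in elem))


def duplicate_groups(xml_data):
    """Groups of automatic style names identical apart from name/rsid."""
    groups = {}
    for style in automatic_styles(xml_data):
        groups.setdefault(signature(style), []).append(style.get(STYLE_NAME))
    return [names for names in groups.values() if len(names) > 1]


def new_style_names(old_xml, new_xml):
    """Automatic style names present in NEW but not in OLD."""
    old = {s.get(STYLE_NAME) for s in automatic_styles(old_xml)}
    return sorted(s.get(STYLE_NAME) for s in automatic_styles(new_xml)
                  if s.get(STYLE_NAME) not in old)


def check(old_path, new_path):
    """Print findings; return 1 if anything looks suspicious, else 0."""
    with open(old_path, "rb") as f:
        old_xml = f.read()
    with open(new_path, "rb") as f:
        new_xml = f.read()
    added = new_style_names(old_xml, new_xml)
    dups = duplicate_groups(new_xml)
    for name in added:
        print("new automatic style:", name)
    for names in dups:
        print("functionally identical:", ", ".join(names))
    return 1 if added or dups else 0


if __name__ == "__main__":
    if len(sys.argv) == 3:
        sys.exit(check(sys.argv[1], sys.argv[2]))
    print(__doc__, file=sys.stderr)
```

With `doc.fodt` standing in for your file, something like `fossil cat doc.fodt > /tmp/old.fodt` followed by `python3 fodt_check.py /tmp/old.fodt doc.fodt` would compare the working copy against the checked-out version. LibreOffice may renumber automatic styles (P1, P2, T1, ...) when it saves, so the name comparison is only a heuristic; the "fossil diff" output remains the final word.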
I'm sure I've left something out, but I thought I would share this experience just in case someone else gets the notion to combine LibreOffice FODT documents with Fossil revision management. Andy
(2) By Mike Swanson (chungy) on 2025-04-03 03:17:29 in reply to 1
That's some pretty good advice. I encountered similar drawbacks a couple of years ago when I tried to store versioned *.fod? files (it was Git at the time, but the problems with the Flat OpenDocument format will show up with every VCS). I didn't really dig into workarounds and ended up just storing the binary *.od? files instead. One drawback I found particularly damaging was that *.fods files did not reliably keep conditional formatting; the behavior wasn't always consistent. Using *.ods never seemed to exhibit a similar problem.
I'm glad that there is a way, with care, to maintain diff-able *.fod? files; I may have to keep this post in mind for the future.
For text documents, I think there are a few alternatives that make it easier to maintain sensible diffs, in order from easiest to hardest:
- Plain text files, especially the Markdown variety.
  - Pros: Any text editor in the world can open them, diffs look natural, and Fossil can render Markdown files as HTML on its own.
  - Cons: Pandoc can be difficult to use, and Markdown can be much more limiting than a normal word processor.
- TeXmacs
  - Pros: Stores files in a fairly readable plain text format, so diffs look natural. Brings almost all the features of LaTeX to a nice-ish GUI.
  - Cons: Uncommon software that needs installation, and the GUI isn't as nice as LibreOffice's.
- (La)TeX
  - Pros: Stable format (documents from the 1980s should still render fine in modern versions), and the plain text format makes diffs look natural.
  - Cons: The language is infamously difficult to learn; even LaTeX, which was created to provide macros for common things that TeX doesn't offer out of the box, did not solve the problem completely.
(3) By Andy Bradford (andybradford) on 2025-04-03 03:32:20 in reply to 2
> For text documents, I think there are a few alternatives that make it
> easier to maintain sensible diffs, in order from easiest to hardest:

Absolutely, and in fact I tend to prefer using TeX, but this wasn't an option for the others who wanted to contribute. Searching this forum for FODT will turn up comments by others (including some by me from when I first started investigating FODT with Fossil).

Andy
(4) By anonymous on 2025-04-05 01:29:34 in reply to 2
You could also use Org mode or Djot as alternatives to Markdown, which in theory would allow you to dispense with Pandoc. Both Org mode and Djot have the advantage of not being tied to HTML output; they are more general lightweight markup formats.
Or just use a smaller, more focused Markdown utility like lowdown(1).
(5) By Vadim Goncharov (nuclight) on 2025-06-24 17:39:15 in reply to 2
BTW, has anyone seriously considered the *roff language (nroff, groff), or does it have no place in the modern UTF-8 world?
(6) By Andy Bradford (andybradford) on 2025-06-24 19:33:38 in reply to 5
> BTW, has someone seriously considered *roff language (nroff, groff)

That's another good idea, and one that would lend itself nicely to version control. Unfortunately, I had to find something that was otherwise easily consumed by others with less time to dedicate to learning.

Andy
(7) By Warren Young (wyoung) on 2025-06-25 18:05:02 in reply to 5
Quoting the groff man page:

> Input to GNU troff…must be in the character encoding it recognizes: ISO Latin-1 (8859-1).
It then goes on to offer a preprocessor that smashes UTF-8 down to Latin-1.
These tools were written decades before Unicode existed, much less UTF-8, and there seems to be no interest in updating them to fix that, doubtless out of backwards compatibility concerns.
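The preprocessor in question is presumably preconv(1), which groff runs when given `-k`. Its man page describes the output as plain ASCII plus `\[uXXXX]` escapes, where XXXX is a hexadecimal Unicode code point of four to six digits. A rough Python illustration of that mapping follows; it is a sketch of the idea only, not preconv's actual code, and it ignores the input-encoding detection preconv also performs.

```python
def to_groff_escapes(text: str) -> str:
    """Rough illustration of preconv's documented output format:
    ASCII characters pass through unchanged, and every other code point
    becomes a groff \\[uXXXX] escape (uppercase hex, four to six digits)."""
    return "".join(
        ch if ord(ch) < 0x80 else "\\[u%04X]" % ord(ch)
        for ch in text
    )


print(to_groff_escapes("caf\u00e9"))  # prints caf\[u00E9]
```

Whatever the exact mechanism, the point stands: troff itself never sees raw UTF-8, only an ASCII (or Latin-1) stream with escapes standing in for everything else.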
(8) By aitap on 2025-06-27 14:49:23 in reply to 5
There's neatroff (on GitHub), which does speak UTF-8 (and only UTF-8), but it isn't widely used. There's even an equation typesetter by the same author.