Code base refactoring?

(1) By PChemGuy on 2024-05-07 06:43:39 [link] [source]

I have come across this comment about the fossil app being extremely monolithic in nature. Lately, I needed to go through some of the Fossil sources to better understand certain behaviors and got an impression myself that the Fossil code base is indeed very monolithic. I understand that refactoring it to have better "separation of concerns", DRY, and whatnot would be a huge effort. I also understand that the limited Fossil niche means limited resources. At the moment, I do not know whether I can get into this.

I wanted to bring up this matter to see if the core dev team is, generally, open to this kind of code base refactoring and accepting/integrating associated contributions.

(2) By Stephan Beal (stephan) on 2024-05-07 07:22:54 in reply to 1 [source]

... got an impression myself that the Fossil code base is indeed very monolithic

That comment (i'm it's author) was specifically about the feature set, though, rather than the source tree. Literally every feature of fossil is in that one binary.

I understand that refactoring it to have better "separation of concerns", DRY, and whatnot would be a huge effort.

Anecdote...

The first time i met Richard, in 2011, he asked me, "what does Fossil need next?" to which i quickly responded, "a library interface."

We both quickly agreed, however, that reimplementing it as such would require a "herculean effort" (his words).

A few years later i started work on libfossil (which you link to above) and found that to be absolutely true. Literally person-years of work. Not only that, but the code required for implementing and using library-style APIs is very often 2-3x as long as the equivalent code in the monolithic fossil app, primarily because the library has to check, and appropriately respond to, every single result code whereas the app largely relies on die-on-error behavior for any important operations (most significantly, db ops and out-of-memory errors, which essentially never happen in practice but have to be accounted for in almost every single library-level API call). The savings in lines of code and developer productivity between the two cannot be overstated. Yes, the library approach is "cleaner" but the monolithic app approach is most definitely a tremendous time saver for the fossil developers.

I wanted to bring up this matter to see if the core dev team is, generally, open to this kind of code base refactoring and accepting/integrating associated contributions.

Major overhauls would very probably not be well-received in the core tree. Any major refactoring would, at least short term, hinder the handful of long-time developers by pulling the proverbial rug out from under their feet. e.g. Fossil is a critical tool in the (literally) daily development of sqlite, and features are sometimes spontaneously added to fossil to support that. If Richard were to one day require a new Feature X for sqlite's development, and could no longer navigate a source tree he's been using for 17-ish years, grief would ensue. We also have a few folks who maintain custom patches and forks, and any major restructuring would almost certainly break their copies.

That said: such separation is the raison d'etre of libfossil and we absolutely are open to new developers there. Feel free to get in touch with me off-list and i'll be happy to get you set up.

(3) By Warren Young (wyoung) on 2024-05-07 09:48:33 in reply to 1 [link] [source]

[I] got an impression myself that the Fossil code base is indeed very monolithic…

Debatable. While it does all build into a single executable, it's reasonably well factored already.

The primarily level is the subcommands, each of which has a single C file under src/ named after it. If you want to know how cloning works, you look first at src/clone.c. Simple, no?

The primary problem is that then leads you down into the sync mechanism, defined in src/sync.c, and thence into the transfer protocol defined in src/xfer.c, which updates the repo DB via the routines in src/db.c. It's all very sensible in retrospect, but I can understand if you're having trouble finding which of these four C files has the one item you're interested in.

A similar problem occurs in the code that builds up the Fossil UI. That's broken up into src/{cgi,http,style}.c plus one additional C file for each top-level page URL. (e.g. src/timeline.c for /timeline URLs.) Where do you go looking for any given bit of functionality? Experience tells.

refactoring

If anything, I worry that Fossil has too many source files rather than not enough, but let's get concrete: what single refactoring would you begin with if you were granted Setup powers on the source repo right now? Wave a magic wand, what is it that needs doing first?

separation of concerns

That I think is reasonably well covered above, but once again, I challenge you to give me a concrete example of where this isn't already being done. Where do you see cross-cutting concerns in the Fossil source base?

DRY

I'm not aware of any single place where Fossil violates the DRY principle. It is, in fact, one of the primary advantages of a monolithic code base: any single bit of functionality is available for reuse by other parts. Fossil's source base makes this uncommonly easy in that header files are generated rather than written by hand, so that merely defining a non-static C function makes it available for calling everywhere in the source base.

see if the core dev team is, generally, open to this kind of code base refactoring

When we have good reason, functions do get moved between C files from time to time. Combined with the automated header file generation, other files calling that function go on as if nothing changed. That's about as close to Martin Fowler's ideals of refactoring as you could wish for.

The question is, what needs to move in your estimation, and why?

Make a good argument, and then we can discuss it.

(4.1) By PChemGuy on 2024-05-07 11:49:51 edited from 4.0 in reply to 3 [link] [source]

Just to clarify, I have nothing against the single executable. In fact, I like this design decision about both Fossil and SQLite.

One example that I would consider refactoring is "main.c". IMHO, the main module should be fairly focused on the general process startup, such as execution environment and global variables setup. Anything related to user interaction, for example, command line processing may (or may not?) deserve a dedicated module (even just a few generic routines).

But my main concern about the "main.c" module is that most of its code deals with web/http/network. I strongly feel that this code does not belong to the main module, this is, IMHO, a completely separate concern.

(5) By Warren Young (wyoung) on 2024-05-07 12:03:24 in reply to 4.1 [link] [source]

my main concern about the "main.c" module is that most of its code deals with web/http/network.

Yes, that one's a bit junk-drawerish. I can see a use for creating:

cmdopts.c: command line parsing
server.c: handler for server/ui commands
version.c: ditto for "version" command

Plus:

move routines like fossil_redirect_home() and fossil_wants_https() to http.c or cgi.c, as best fit
move cgi command handler to cgi.c
ditto http and http.c

I'm surprised to find the handlers for server/ui, cgi, and http mislocated like this. I must've been shielded from it before by use of ctags/cscope jumping.

(6) By Richard Hipp (drh) on 2024-05-07 12:35:54 in reply to 5 [link] [source]

"Junk-drawish" is a great adjective to describe main.c. Yes.

But there is no compelling reason to refactor at this point. It is still easy to find the code that implements any particular command or webpage: just search for "COMMAND: $name" or "WEBPAGE: $name". On the other hand, moving routines from one file to another makes it more difficult to track the change history of those routines. The "annotate" command does not work well through a refactor.

Would it be better if various routines like "COMMAND: ui" were in a different file? Probably so. But that's water under the bridge at this point. There is no reason to move it, other than aesthetics, but there are reasons (Ex: "annotate") to leave it alone.

(7) By Warren Young (wyoung) on 2024-05-07 13:26:48 in reply to 6 [link] [source]

If you're not going to do it, then it shouldn't be done, by anyone. No other committer has justification to be shown as the new "owner" of those lines of code merely because they moved them to a new file.

(8) By anonymous on 2024-05-07 16:42:07 in reply to 1 [link] [source]

...I needed to go through some of the Fossil sources to better understand certain behaviors

Fossil's source code is fairly modular. However the run-time behavior may not be intuitive at first.

I find best to orient myself around the Fossil codebase is by using a reasonable IDE environment, even Geany should work. This will allow cross-referencing the symbols and source navigation.

As for the run-time, there are two major cases 1) command-line -- main() calls corresponding registered command by function pointer; 2) server-mode -- that's the http/CGI jungles.

In both cases, it's possible to attach a debugger to either already running Fossil process or to break in main(), then step further.