Fossil Forum

[RFC] new feature: diff -p|--prototype
Login

[RFC] new feature: diff -p|--prototype

[RFC] new feature: diff -p|--prototype

(1) By jamsek on 2022-01-16 10:12:32 [link] [source]

See commit 792674372e7d6a8e on branch diff-show-func

This implements the diff(1) -p|--show-c-function for fossil diff:

-p  --show-c-function
              Show which C function each change is in.

With this change, you no longer need fossil diff --command "diff -up".

This only applies to command line unified diffs. But there's no reason we couldn't expand it to display in the web ui like GitHub.

Tested on OpenBSD 6.9-release and 7.0-current, Linux Ubuntu 20.04, and macOS Catalina 10.15.7.

Feedback, comments, objections, ok?

(2) By Stephan Beal (stephan) on 2022-01-16 10:29:49 in reply to 1 [link] [source]

Feedback, comments, objections, ok?

Any language-specific features seem way out of place to me. Once that door is opened, it can't be closed, and there are lots of languages out there. Some people use it to manage python code or just markdown files. Most fossil repos probably aren't C code, so the whole idea feels very dubious to me. The next person is going to want the same for Java, then python, then ruby, and each one requires new support code.

Consider me unconvinced, doubly because the patch applies to the older diff code, which will ideally be replaced by a diff-builder implementation at some point. How such a feature might be shoehorned into the diff-builder interface is unclear to me.

(3) By jamsek on 2022-01-16 10:39:42 in reply to 2 [link] [source]

Fair enough. Though it's an explicit option so I don't quite get the way out-of-place critique. If you're not diffing a C file, you wouldn't pass it. It's been around along time in diff(1), and from the looks of it, it's a popular feature (cf. GitHub, which displays it by default), and we've made strides in replicating other such features recently.

That said, if someone wants a Java one, or a Python one, or a this and that one, they can implement it in this do-ocracy.

Nonetheless, I'm happy to add it to my growing pile of local patches. And I appreciate your feedback, so thank you!

(4) By Stephan Beal (stephan) on 2022-01-16 11:02:25 in reply to 3 [link] [source]

Though it's an explicit option so I don't quite get the way out-of-place critique.

Though i didn't expressly say it (and should have), my doubt centers on 2 things:

  1. Code maintenance. Every feature added to the older diff code entrenches that code deeper into the tree, to the point that we'll never be rid of it and will have two diff impls forever.

  2. Just like platform-specific discrepancies in fossil make me ill (namely symlinks support), so does supporting one programming language over another internally. Ideally, IMO, all content should be completely opaque to fossil, with the obligatory distinction between "text" and "binary" needed for both diffs and merges.

Nonetheless, I'm happy to add it to my growing pile of local patches.

i'm not the one who needs to be convinced in order to get it merged in, i'm just expressing doubt.

(5) By jamsek on 2022-01-16 12:03:53 in reply to 4 [source]

Though i didn't expressly say it (and should have), my doubt centers on 2 things:

Code maintenance. Every feature added to the older diff code entrenches that code deeper into the tree, to the point that we'll never be rid of it and will have two diff impls forever.

This is the reality of every new feature, so it's up to someone to decide whether the juice is worth the squeeze.

FWIW, I'd be happy to implement it in the new diff driver, but for whatever reason drh@ kept the old implementation for the unified diff, so I had to write it there. No idea whether he intends to keep it around for the foreseeable future or not—but were this to land on trunk, I'd take up the task of porting it to the new diff when/if that happens. This is just the (hopeful) start, though, I wouldn't be averse to adding it the web ui like GitHub.

Just like platform-specific discrepancies in fossil make me ill (namely symlinks support), so does supporting one programming language over another internally. Ideally, IMO, all content should be completely opaque to fossil, with the obligatory distinction between "text" and "binary" needed for both diffs and merges.

I also feel the same way for the general case, but when it comes to options I'm more liberal. They're by nature optional, so I think added options for specific use-cases make projects better. Also, Fossil itself is in C, so it adds utility to Fossil development. And it relates to the recent change in increasing context lines: I often expand context just to see which function the change is in—this replaces that task much more efficiently.

i'm not the one who needs to be convinced in order to get it merged in, i'm just expressing doubt.

No, but your opinion counts nonetheless and I'm not giving up on changing your mind either. I've successfully done so in the past—as recently as this morning!

(6.1) By Marcelo Huerta (richieadler) on 2022-01-16 13:11:43 edited from 6.0 in reply to 4 [link] [source]

Deleted

(7) By Marcelo Huerta (richieadler) on 2022-01-16 13:11:53 in reply to 4 [link] [source]

I'll add my bystander +1 to point 2. Such a feature gives a privileged status to C source code, which has never been the case for any other language. It feels inappropiate somehow. (And as a Pythonista myself, having quite a few personal Python projects kept in Fossil, I would feel somewhat slighted.)

(8) By jamsek on 2022-01-16 13:35:06 in reply to 7 [link] [source]

You could always add an option to do the same thing for Python if you wanted to display the function in diff chunk headers. This is adding something to the project for C developers, not denying other developers the ability, so I don't get the slight.

And it is an option—it is not automatic for every diff—if you don't want to see functions in diff chunk headers, you wouldn't pass the explicit option to your diff commands.

I can understand reluctance for novel ideas, but this exists in every other diff command that I know of, and has for some time. That's not to say we should do it because they have it—it's to point out that it is not novel but common, and part of many developers workflow. That said, there is (recent) precedence for adding features to Fossil because they were observed in other projects and determined to be worthwhile additions. If that's not the case in this instance, that's fine.

(9) By Martin Gagnon (mgagnon) on 2022-01-16 13:42:42 in reply to 8 [link] [source]

I can understand reluctance for novel ideas, but this exists in every other diff command that I know of, and has for some time

You beat me on it, I was about to say that.

It's not something new, I had this on CVS (cli and cvsweb/viewcs). SVN has it too.

(10) By jamsek on 2022-01-16 14:00:19 in reply to 9 [link] [source]

It's not something new, I had this on CVS (cli and cvsweb/viewcs). SVN has it too.

Yes, and, IIRC, Git displays it by default. And I think CVS does too.

(13) By Richard Hipp (drh) on 2022-01-16 15:47:49 in reply to 10 [link] [source]

I did a lot of experiments with GitHub while I was doing the recent diff refactoring in Fossil. My impression was that GitHub got the containing function name wrong about half the time.

I don't think this proposal is a good idea.

  1. It is specific to C/C++ syntax. Fossil is not a version control system for C/C++. It is a general-purpose version control system.

  2. Even for C/C++ source code, the name of the containing function is often incorrectly deduced.

(14) By jamsek on 2022-01-16 15:55:55 in reply to 13 [link] [source]

I'm not sure about GitHub's algo, but I checked the results of this implementation against diff(1) and they were equivalent, with both returning the correct function in 100% of the admittedly limited diffs I checked.

Nonetheless, your opinion is the only one that matters, so please scrap this proposal. Thank you.

(18) By Warren Young (wyoung) on 2022-01-16 17:38:50 in reply to 13 [link] [source]

I don’t particularly care to have this feature, but here’s a doable way to achieve it more reliably than GNU diff does: delegate to ctags, cscope, etags, et al.

If the tag file for a recognized tool exists, use its database of files, function entry points and line numbers to infer which function the diff hunk must be in.

These tools support the many languages wanted, and we’d need to support just a few common ones to get broad support.

(19) By jamsek on 2022-01-17 02:07:27 in reply to 18 [link] [source]

I'm not sure about the GNU implementation, but in the limited diffs I tested this with (~6 a piece from the Fossil and libfossil trees, and about double that in fnc), it produced 100% accuracy in 100% of the diffs. Even when outside a function block, you get the appropriate struct.

Ironically, thinking about it, the heuristic would likely "work" with Java, C#, Python, etc. Pretty much all C-like languages. It would not work in its current form with Lisp syntax, and it'd also fall down in markdown documents. That said, I only tested it in C projects.

Nonetheless, Stephan has allowed me to move libfossil's diff v1 code into fnc! As a result, I can add this feature to fnc, which is how I view my diffs, and a better outcome for me personally. I also plan to implement my own diff algorithm. This was just one of the patches I carry in my fork that I thought would be of interest to the project.

Your idea would certainly be a great way to implement this; however, it also depends on external tooling. While I use ctags and a language server in my Vim development environment, some devs might not. In any case, I'm happy with the outcome, so have no further desire to advocate this change or write your proposed implementation. But I think any such feature will add value to Fossil, so would love to see someone do it!

(20) By Stephan Beal (stephan) on 2022-01-17 03:58:11 in reply to 19 [link] [source]

Nonetheless, Stephan has allowed me to move libfossil's diff v1 code into fnc! As a result, I can add this feature to fnc, which is how I view my diffs, and a better outcome for me personally.

Well, "allowed" is too strongly worded ;). i've been wanting to get rid of the v1 diff code but couldn't because fnc still relies on it, so i'm happy to help move those bits moved over to fnc :). That sounds like a win-win for all involved.

(11.1) By Marcelo Huerta (richieadler) on 2022-01-16 14:28:09 edited from 11.0 in reply to 8 [link] [source]

You could always add an option to do the same thing for Python if you wanted to display the function in diff chunk headers. This is adding something to the project for C developers, not denying other developers the ability, so I don't get the slight.

By you you mean specifically me? I don't have commit privileges to the repo, nor the necessary experience in C, nor the time or interest of commiting to the maintenance such a feature (nor the tools to keep such a change updated in my own clone).

Requesting this to be added creates a burden on the core developers, who AFAIK haven't seen the need to add this functionality for the management of SQLite.

But again, I'm just another happy user and I wouldn't presume to talk for the Fossil team.

(12) By jamsek on 2022-01-16 14:31:46 in reply to 11.1 [link] [source]

The proverbial you in this do-ocracy.

It's a maintenance burden I am willing to undertake as a contributor to the project. I'm not asking anyone to assume the burden. The code has been written, and would be maintained by me—a happy user and developer.

(15.1) By Larry Brasfield (larrybr) on 2022-01-16 16:28:07 edited from 15.0 in reply to 1 [link] [source]

I have an alternative to suggest. But first a little background:

There are many file differencing tools. An endless variety, it seems, and for good reason since so many kinds of text are managed incrementally.

I see it as a mistake for Fossil to go down the path of catering to specific file types among the "plain" text category, for reasons stated in this thread among others.

However, it would be nice if Fossil would help with the task of using an external differencing tool for either its tracked artifact versions or not yet committed versions of the same. Fossil is clearly well poised to cough up its stored versions and identify that there are differences.

I can imaginea Fossil having a configuration option whereby an external differencing tool, and how to invoke it using some special, %X format specs, could be set by users who want to use such a tool. Fossil would put renditions of its stored files into temporary files, then invoke the differencer when that is requested (via a web UI button or a CLI flag). It might wait for the differencer to exit and cleanup the temp files.

It's not hard to imagine the Fossil docs identifying a set of differencing tools that work well with Fossil, along with the config string to use them.

To sum up: I think jamsek's proposal is well-motivated, and that support for the capability he hopes to see would be well received. I just don't think the support should be the direct, Fossil-does-it-all kind.


a. Not much imagination is needed. I have seen and gladly used a feature much like this with another SCM. (I think it was Perforce, or maybe an internal precursor to a SCM tool that Microsoft eventually published.)

(16) By Richard Hipp (drh) on 2022-01-16 16:35:40 in reply to 15.1 [link] [source]

There is already (and has been for a long time):

fossil setting diff-command ...

(21) By ddevienne on 2022-01-17 08:09:03 in reply to 16 [link] [source]

That's not how I understood Larry's point (which I came to myself reading that thread, FWIW).

The point is not the replace the built-in diff command.
The point is to augment the diff command with a semantic diff option,
that's file-type specific, and returns higher-level changes than purely textual ones.

And I imagine there might be different alternatives to such a semantic diff.
I.e. not only do you want it to be per-file-type, but also to configure different
ways to compute and display it. So one would want different named configurations.

This invoking fossil diff --semantic config file.ext (of course, replace semantic with
whatever option name is more appropriate) would call an external program based on config and ext,
IIF the regular built-in diff found any differences in the first place. And the external program could possibly
be invoked using information gathered by the built-in diff, like offset of the 1st diff for example
(although for proper semantic, probably the whole file needs to be processed).
Which might involved new % variables to use in the invoking command part of the config.

Anyway, that's how I think the OP's use case can be addressed,
w/o modifying Fossil itself, and that's how I read Larry's post too.

(17) By jamsek on 2022-01-16 16:46:26 in reply to 15.1 [link] [source]

I'm legitimately surprised at, albeit accepting of, the consensus. If for no other reason than the diff tool itself is used to report changes in source code—a very general-purpose tool—yet it provides the option to show C function names in headers. So I don't view the feature as being exclusive, but as another tool that a subset of developers might find useful.

Fossil is a version control system that goes above and beyond to promote situational awareness, which is the primary objective of this feature, and my motivation for its implementation.

This isn't another attempt at convincing anyone. I wanted to express my genuine surprise and appreciation for another reminder that we don't always see things the same way. And more often than not, it's when you think you have a shared understanding of the ethos behind a thing, you're wrong! As Stephan often says, "it'd be a boring place if we were all the same."

And, Larry, we already have that (see diff --command), and it's what I use. Also, thank you and the other posters for your contribution to the discussion.