Show check-ins that modify file

(1) By anonymous on 2019-11-25 22:10:02

Hello,

Is there a way to show all the check-ins in chronological order that alter a specific file?

I've tried the finfo?name report, but the file I am interested in was reverted back to a previous state, so in that report that file looks like a dead end fork.

The closest thing I've seen that gives me what I want is the MLink report (mlink?name).  But it isn't the easiest thing to use.

I guess what I'm looking for is a timeline with a filter that only shows check-ins that modified a file, given its filename.

As a related question is there a way to detect these dead-end files?  (Something like "fossil leaves" for individual files)

I think what happened was one of the developers does not close their editor, and leaves whatever files he was working on open.  He did a pull and an up, but the editor did not update the file.  So when the file was saved again, it saved the older version.  I'd like to be able to detect any of these unintentional file reverts.

In other words detect where a file goes from A -> B -> A
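The A -> B -> A check can be sketched in a few lines. This assumes you have already extracted a chronological list of (check-in, artifact hash) pairs for one filename on one line of development, for example from the MLink report. The function name and input shape are hypothetical and not part of Fossil. It flags any check-in whose content matches an earlier, different version, so it also catches A -> B -> C -> A.

```python
def find_reverts(history):
    """Return (checkin, artifact) pairs where a file's content returns to
    a version it had earlier, after changing to something else.

    history: chronological list of (checkin_id, artifact_hash) pairs for
    one filename on one line of development (hypothetical input shape).
    """
    reverts = []
    seen = set()          # every artifact hash observed so far
    previous = None       # artifact hash at the previous check-in
    for checkin, artifact in history:
        if artifact != previous:        # the content actually changed here
            if artifact in seen:        # ...back to an earlier version
                reverts.append((checkin, artifact))
            seen.add(artifact)
        previous = artifact
    return reverts


history = [("c1", "A"), ("c2", "B"), ("c3", "A"), ("c4", "C")]
print(find_reverts(history))  # [('c3', 'A')]
```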

Thanks

(2) By Richard Hipp (drh) on 2019-11-26 01:04:56 in reply to 1

I don't think there is an existing webpage that does exactly what you want. The closest would be the /finfo page, which shows the first occurrence of each distinct version of a file. Example:

https://fossil-scm.org/fossil/finfo?name=src/finfo.c

There is also the uf=HASH query parameter on the /timeline page, which shows all check-ins that use the file artifact identified by HASH. Example:

https://fossil-scm.org/fossil/timeline?uf=4387af681d114ed7

But neither of those seem to be quite what you are looking for.

As you observe, you could probably write a query against the MLINK table, joined with EVENT and maybe BLOB, to figure out what you want. If you can write such a query, we can easily add it as a new webpage or as a new query parameter on an existing web page, and we'll likely do so if there is a consensus that it is useful. The timeline generating logic in Fossil is modular, so once you have an SQL query that identifies the check-ins (or other artifacts) you want to display, it is normally quite simple to expose that capability as a new or enhanced web interface.
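One way such a query might look, sketched against cut-down, hypothetical copies of the MLINK, EVENT, BLOB and FILENAME tables that keep only the columns used here (the real Fossil schema has more). The sketch assumes that mlink.pid = 0 means the file was added and mlink.fid = 0 means it was deleted. A self-join finds an "immediate undo": a later change to the same filename that starts from what an earlier change produced and goes back to what that earlier change replaced.

```python
import sqlite3

# Cut-down, hypothetical versions of Fossil's tables: only the columns this
# sketch needs. Assumed meaning: mlink.pid = 0 when the file was added,
# mlink.fid = 0 when it was deleted, event.mtime orders the check-ins.
SCHEMA = """
CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT);
CREATE TABLE filename(fnid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE mlink(mid INTEGER, fid INTEGER, pid INTEGER, fnid INTEGER);
CREATE TABLE event(objid INTEGER, mtime REAL);
"""

REVERT_SQL = """
SELECT m2.mid AS revert_ci, m1.mid AS change_ci, b.uuid AS restored
  FROM filename f
  JOIN mlink m1 ON m1.fnid = f.fnid
  JOIN mlink m2 ON m2.fnid = m1.fnid
               AND m2.pid = m1.fid     -- m2 starts from what m1 produced
               AND m2.fid = m1.pid     -- ...and goes back to what m1 replaced
  JOIN event e1 ON e1.objid = m1.mid
  JOIN event e2 ON e2.objid = m2.mid
  JOIN blob b   ON b.rid = m2.fid
 WHERE f.name = ?
   AND m1.pid <> 0 AND m1.fid <> 0     -- ignore add/delete pairs
   AND e2.mtime > e1.mtime
 ORDER BY e2.mtime
"""


def find_immediate_reverts(conn, filename):
    """Return (revert check-in, undone check-in, restored hash) rows."""
    return conn.execute(REVERT_SQL, (filename,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA + """
INSERT INTO blob VALUES (100, 'aaaa'), (101, 'bbbb');
INSERT INTO filename VALUES (1, 'src/foo.c');
-- check-in 10 adds A, 11 changes A -> B, 12 changes B -> A
INSERT INTO mlink VALUES (10, 100, 0, 1), (11, 101, 100, 1), (12, 100, 101, 1);
INSERT INTO event VALUES (10, 1.0), (11, 2.0), (12, 3.0);
""")
print(find_immediate_reverts(conn, "src/foo.c"))  # [(12, 11, 'aaaa')]
```

A self-join only catches a change that is undone by the very next change to that file; the sequence-based check earlier in the thread covers longer loops.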

(3) By Warren Young (wyetr) on 2019-11-26 15:45:45 in reply to 2

What would also be useful is to be able to track backwards across file renames in fossil finfo and /finfo. That'll also currently result in apparent dead ends.

(4) By Richard Hipp (drh) on 2019-11-26 16:28:19 in reply to 3

One difficulty is how to specify which file you want to track. Suppose you have two files in your project that have each been renamed three times:

"abc" → "def" → "ghi" → "jkl"
"uvw" → "xyz" → "def" → "abc"

If you ask /finfo to show you the history of file "def", does it show you the history of file 1 or the history of file 2? Both files were called "def" at some point in history. Which one does it show?

Your immediate instinct is to require /finfo to specify the "latest" name of the file. But that doesn't really work either because the file might have different names in different branches.

Apparently what you would have to do is identify the file using both a check-in name and the on-disk name of the file within the check-in. That makes constructing URLs a little more complicated. Perhaps there should be a new URI. We would still have /finfo which means "show me all files whose on-disk name is 'blahblah'" and then some new URI that means "show me all files that are in the same file history tree as file 'blahblah' of check-in 'whatever'".

New terminology is needed. Notice the confusion in the previous paragraph. What does "file" mean? Is it:

(1) A specific instance of an on-disk artifact in a particular check-in?
(2) All on-disk artifacts that have the same on-disk filename?
(3) All artifacts that are within a single file history graph?

Right now, I normally use "artifact" for (1) and "file" for (2) and I do not really have a good name for (3). As this is complicated, we really do need a good name for (3) before we can have an efficient conversation about the technical details. Suggestions? In the sequel, I'll call it a "file-3".

To make this efficient, it seems likely that we would need to change the Fossil repository schema to add a new column to the MLINK table that specifies a canonical name for the "file-3" in question. I propose that the canonical name of a "file-3" be the artifact hash of the first place that the file is inserted into the block-chain. In other words, the canonical name is the artifact hash for the root of the history tree for the file-3. The MLINK table would need a new column that is an integer that references the BLOB.RID for the canonical name of the "file-3". Adding a new column to MLINK and keeping it consistent is going to be a big change, but it would make some of the existing logic a little easier, and in particular would facilitate doing things like showing "blame" across renames.
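The proposed canonical name can be computed by following each artifact back to the root of its history tree. This is a minimal in-memory sketch, assuming a hypothetical `parent_of` map (artifact to the artifact it was derived from, or None for a fresh insertion); it uses hash strings where the proposal would store a BLOB.RID, and it ignores merge parents.

```python
def canonical_roots(parent_of):
    """Map every artifact to the root artifact of its history tree.

    parent_of: dict of artifact -> the artifact it was derived from, or
    None for an artifact that was inserted fresh (hypothetical input; a
    real implementation would derive it from MLINK). Merge parents are
    ignored; a revert cycle (A -> B -> A) is cut where it first repeats.
    """
    roots = {}
    for artifact in parent_of:
        path, seen = [], set()
        node = artifact
        # Walk up until we reach a known root, a fresh insertion, or a cycle.
        while (node not in roots and parent_of.get(node) is not None
               and node not in seen):
            seen.add(node)
            path.append(node)
            node = parent_of[node]
        root = roots.get(node, node)
        roots[node] = root
        for n in path:                 # cache the answer for the whole path
            roots[n] = root
    return roots


parent_of = {
    "a1": None, "a2": "a1", "a3": "a2",   # "abc" -> "def" -> "ghi"
    "u1": None, "u2": "u1", "u3": "u2",   # "uvw" -> "xyz" -> "def"
}
roots = canonical_roots(parent_of)
print(roots["a3"], roots["u3"])  # a1 u1
```

Both "def" artifacts (a2 and u3) get different canonical names, which is exactly the disambiguation the new column would provide.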

But before we even begin to journey down that road, we need good names to distinguish files according to definition (2) from files according to definition (3).

So what say y'all?

(7) By Warren Young (wyetr) on 2019-11-26 18:02:41 in reply to 4

An internal assumption in my request is that Fossil records the rename in some way that makes this:

  $ mv abc def
  $ fossil rm abc
  $ fossil add def
  $ fossil ci -m 'renamed abc to def'

...distinct from this:

  $ fossil mv --hard abc def
  $ fossil ci -m 'renamed abc to def'

If they result in the same block-chain insertion, then I think your assessment above stands.

If not, then when finfo hits the end of the line for a file while backtracking its history, can't it check the parent of that check-in for the rename info, get the old name from there, and continue backtracking?

And if not, then maybe all that's needed is an extension to the manifest format that does record this information. It won't help going back in history, but you would then be able to back-track through a rename.

As to your point about multiple files called "def" at various points in history, that's solved by looking at parent check-ins only, not at the index of file names to check-in IDs.
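The "parent check-ins only" idea can be sketched as a walk that starts from a (check-in, on-disk name) pair and follows primary parents, switching to the old name whenever a check-in records a rename. The `checkins` structure below is a hypothetical in-memory model, not Fossil's schema or manifest format; merge parents are ignored.

```python
def history_across_renames(checkins, start, name):
    """Walk back from (start check-in, on-disk name) through primary parents,
    following renames, and return (checkin, name, artifact) for each
    check-in where the tracked file changed content or name.

    checkins: hypothetical dict of check-in id -> {
        "parent": parent id or None (merge parents ignored),
        "files": {name: artifact hash},
        "renames": {new name: old name} recorded by that check-in }
    """
    out = []
    ci = start
    while ci is not None:
        info = checkins[ci]
        artifact = info["files"].get(name)
        if artifact is None:          # file not present: history ends here
            break
        old_name = info["renames"].get(name, name)
        parent = info["parent"]
        parent_artifact = (checkins[parent]["files"].get(old_name)
                           if parent is not None else None)
        if artifact != parent_artifact or old_name != name:
            out.append((ci, name, artifact))
        ci, name = parent, old_name   # continue under the old name
    return out


checkins = {
    "c1": {"parent": None, "files": {"abc": "h1"}, "renames": {}},
    "c2": {"parent": "c1", "files": {"abc": "h2"}, "renames": {}},
    "c3": {"parent": "c2", "files": {"def": "h2"}, "renames": {"def": "abc"}},
    "c4": {"parent": "c3", "files": {"def": "h3"}, "renames": {}},
    "c5": {"parent": "c4", "files": {"def": "h3"}, "renames": {}},
}
print(history_across_renames(checkins, "c5", "def"))
# [('c4', 'def', 'h3'), ('c3', 'def', 'h2'), ('c2', 'abc', 'h2'), ('c1', 'abc', 'h1')]
```

Because the walk starts from a specific check-in, two unrelated files that were both called "def" at some point are never confused.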

(8) By Richard Hipp (drh) on 2019-11-26 18:15:56 in reply to 7

Fossil remembers the "fossil mv" command. That is part of the block chain. "mv" holds more information than does "rm" followed by "add".

(9) By Joel Dueck (joeld) on 2019-11-26 19:40:45 in reply to 4

> But before we even begin to journey down that road, we need good names to distinguish files according to definition (2) from files according to definition (3).

Some ideas for what we could call the artifacts within a single file history graph (#3 in your list):

Ancestor-files, ancestors
Precursor-files, precursors
Progenitors

(10) By Richard Hipp (drh) on 2019-11-26 20:05:22 in reply to 9

Those names seem to imply only objects that are closer to the root of the tree. I want a name that means all objects in the tree: ancestors, descendants, and cousins.

An example of cousins would be a file originally called "abc" but whose name was changed to "def" in one branch and into "ghi" in a different branch. The "ghi" would be a cousin-file of "def". (Technically a sibling file, but I'm trying to be generic.)

I'm looking for a name for the set of file-like objects that have a common ancestor.

Maybe "clan"?

So then we have:

Artifact → A single version of a single file. A specific pattern of bits that get stored on disk.
File → One or more artifacts having the same on-disk name.
Clan → The set of all artifacts that derive from a common ancestor.

With these definitions, the /finfo page would show a graph of all artifacts that have the same on-disk name, and the /claninfo page would show a graph of all artifacts that derive from a common source. In the absence of file renames, the two graphs would be the same. But when renames are present, the graphs can be wildly different.
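The difference between the two graphs can be shown with drh's two rename chains. This sketch assumes each rename also edited the content (so every step is a new artifact) and uses hypothetical (artifact, on-disk name, derived-from artifact) records. It groups them once by name, as /finfo does, and once by derivation with a union-find, as the proposed /claninfo page would.

```python
from collections import defaultdict


def group_by_name(records):
    """/finfo-style grouping: artifacts that share an on-disk name."""
    groups = defaultdict(set)
    for artifact, name, _derived_from in records:
        groups[name].add(artifact)
    return dict(groups)


def group_by_clan(records):
    """Clan-style grouping: artifacts connected by derivation, found with a
    union-find over (artifact, derived_from) edges."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for artifact, _name, derived_from in records:
        find(artifact)                      # register the artifact
        if derived_from is not None:
            union(artifact, derived_from)

    clans = defaultdict(set)
    for artifact in list(parent):
        clans[find(artifact)].add(artifact)
    return list(clans.values())


records = [
    ("a1", "abc", None), ("a2", "def", "a1"), ("a3", "ghi", "a2"), ("a4", "jkl", "a3"),
    ("u1", "uvw", None), ("u2", "xyz", "u1"), ("u3", "def", "u2"), ("u4", "abc", "u3"),
]
print(sorted(group_by_name(records)["def"]))            # ['a2', 'u3']
print(sorted(sorted(c) for c in group_by_clan(records)))
# [['a1', 'a2', 'a3', 'a4'], ['u1', 'u2', 'u3', 'u4']]
```

Grouping by name mixes the two files under "def", while grouping by clan keeps them apart.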

These names still don't seem quite right to me, though...

(11) By Warren Young (wyetr) on 2019-11-26 20:28:55 in reply to 10

I think file names only have a meaning at one point in time. They're the starting point for a finfo query, but they're not necessarily consistent through the complete query.

Thus, finfo shouldn't be pulling up the set of all artifacts called foo but the one that happens to be called foo at some point in the DAG and all of its parents, no matter what they were once named.

All that info is already in the block chain, no?

(14) By Erik (elechak) on 2019-11-26 23:10:48 in reply to 10

Hello,

Maybe these names are too technical.  I've tried not to over-purpose the terms Artifact and File.  I'm not suggesting that these names be used in the fossil ui.  But rather to facilitate conversation about the concepts.

FileBlob -> A unique pattern of bits stored in the blob table, only exists in fossil repository

FileRecord -> An object internal to the fossil repository that has three main components (not stored on disk):
    1) A file name or path (filename.name)
    2) Data associated with the file name or path (blob.content)
    3) A time or check-in component that ensures that a FileLink ties to only one blob.content

File -> An object stored on disk that has two main component parts:
    1) A file name or path
    2) Data associated with the file name or path
    
    Files are generated by fossil using filename.name as the name and a single FileBlob as its content (see FileRecord).
    Files that have identical content will get content from the same FileBlob
    
FileRecordHistory -> The genealogy of this FileRecord (based on name)
 
FileBlobHistory -> The genealogy of this FileBlob (based on content)

(15) By Scott Robison (sdr) on 2019-11-27 00:00:16 in reply to 10

Just a thought:

Artifact: Current meaning remains intact.
Name or Named Artifact(s): One or more artifacts having the same on-disk name.
File: The set of all artifacts that derive from a common ancestor (including renamed / moved files).

(16) By ramsan on 2019-11-27 07:50:09 in reply to 10

What about the most simple:

Artifact → A single version of a single file. A specific pattern of bits that get stored on disk.
File name → The name that a file has on disk for a particular version. A file can have different file names in its lifetime
File → The set of all artifacts that derive from a common ancestor.

Then, fossil finfo myFileName would give information about the file whose file name is myFileName in the current version.

(13) By Erik (elechak) on 2019-11-26 20:55:57 in reply to 4

Hello All,

I wrote the initial question and was hoping that the feature already existed. After the initial response, I started working on some SQL that could be used to generate a new report meeting my requirements.

I don't think I've ever used rename. So it's not really a concern of mine.

I am interested in the cases where the content pointed to by filename.name changes. If the file is renamed, that file ceases to exist, and the last check-in would show a rename, a delete, or whatever Fossil does to signify removal of a file.

Even if the same filename exists in multiple branches or forks, I would be OK with a timeline that showed me every time the contents of the file specified by a filename changed. Obvious nice-to-have features would be limiting the results to a specific branch, or even showing the descendant relationships with connector lines like in the timeline.

Here is a query that I pieced together. I have not taken a look at the internals of fossil for quite some time, so this query is just a quick stab at the problem. I would imagine that others would want to modify it before it makes it into the system. It also sounds like this might have to wait until more design thought goes into it.

  SELECT DISTINCT
         datetime(event.mtime),                          -- check-in time (UTC)
         (SELECT uuid FROM blob WHERE rid = mlink.mid),  -- check-in hash
         coalesce(event.ecomment, event.comment),        -- edited comment, if any
         event.user
    FROM mlink, filename, event
   WHERE mlink.fnid = filename.fnid
     AND filename.name = :filename   -- bind the file path of interest here
     AND event.objid = mlink.mid
     AND event.type = 'ci'           -- check-ins only
   ORDER BY event.mtime DESC;

I have submitted code to fossil in the past, but I'm not sure whether my Contributor Agreement form is on file. Let me know if you need me to fill anything out. Do what you want with the query. I'd be happy to add it to the fossil code, but it might be better to have someone more familiar with the internals do it.

Can someone point me to any documentation about the fossil tables and fields?

(17) By Stephan Beal (stephan) on 2019-11-27 15:36:23 in reply to 13

The closest thing we have to documentation for the db is schema.c.

(18) By Richard Hipp (drh) on 2019-11-27 15:49:25 in reply to 13

So I went to implement your query and discovered that there is something like it already in the code....

If you add the "chng=FILENAME" query parameter to /timeline it restricts the output to just those check-ins that involve a change to FILENAME. Is that what you wanted?
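For scripts or bookmarks, the chng= and uf= timeline URLs mentioned in this thread can be built with the standard library. The helper name is hypothetical; it only assembles a query string and does not know anything about Fossil beyond the two parameter names given above.

```python
from urllib.parse import urlencode


def timeline_url(repo_url, **params):
    """Build a /timeline URL for a Fossil repository.

    repo_url is the repository's base URL; params are query parameters
    such as chng= (check-ins that change a filename) or uf= (check-ins
    that use a file artifact), both mentioned in this thread.
    """
    query = urlencode(params, safe="/")   # keep '/' readable in file paths
    return f"{repo_url.rstrip('/')}/timeline?{query}"


print(timeline_url("https://fossil-scm.org/fossil", chng="src/finfo.c"))
# https://fossil-scm.org/fossil/timeline?chng=src/finfo.c
```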

FILENAME in this context means the on-disk filename, not the "clan name".

(20) By Erik (elechak) on 2019-11-27 16:26:07 in reply to 18

Thank you. Yes that is exactly what I was looking for.

It would be neat if it had connector lines showing the ancestry of the file (like the finfo connector lines) rather than the ancestry of the check-in. But I understand why it does not, and it's not a problem at all.

I expect the answer is no, but is there a way to access this via the UI?

If not I would recommend putting a link next to the "MLink Table" link at the top of the "finfo?name=xxxxx" page.

I would also recommend that on the "finfo?name=xxxxx" page the title "History of filename" be changed to "Artifact History of filename". The reason for this is that it took me a while to figure out that I was looking at a history based on blob.uuid rather than filename.name.

I'd be more than happy to add these to the code.

(21) By Richard Hipp (drh) on 2019-11-27 17:05:04 in reply to 20

I'm always looking for new ways to make features of Fossil more easily discoverable, via links or whatnot. But at the same time, it is important not to overwhelm the user with too many links, such that they all get lost in the crowd.

Perhaps we need to add a general mechanism to pop up "Advanced-Search" style dialog boxes with lots of links and form entries for doing useful but more obscure displays. Then the main page that most people see is not overly cluttered with links and buttons and entry boxes, but there is always an "Advanced" button to show more options if what you want is not readily available. We already have something like that on the timeline sub-menu which can toggle between "Basic" and "Advanced". But maybe the mechanism needs to be generalized and improved.

(22) By Erik (elechak) on 2019-11-27 19:14:41 in reply to 21

A while ago I used to seriously mess with the dynamic ticket system to allow me to add features to fossil. I would adjust the "new ticket page", "view ticket page", "edit ticket page" and "report list page", to not only give me more control over the ticket system, but also add reports that gave me insight into the code or users.

Have you considered generalizing the dynamic nature of the "html ticket page" system to a more abstract "reporting system"? Since it can execute queries, only admins would be allowed to add high-level reports, or a new permission could be created for it.

I think I saw a while ago that you can run TH1 scripts from embedded docs. But I like to keep my project code separate from my version control system. So I'd rather not load up each one of my projects with the same set of TH1 scripts to provide extra functionality. To be honest, I've never tried this option because of the aforementioned reason.

If there was a mechanism where people could experiment with and refine useful reports using TH1 and SQL, they could present them to the fossil community in their completed form. And you would not have to edit the foundational C code to add the extra functionality. Reports that everyone likes could be distributed with the system, and more esoteric or specialized reports could be maintained on your site and plugged into a running fossil system at will.

If the timeline connector code is modular, that could even be a feature that people could tap into for their reports. Special functionality could use the existing special character ability of the ticket system reports ("#" assumed to be a ticket number, "_" show character-for-character). You could have "$" mean this is a check-in hash, "*" is an artifact uuid.
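The column-prefix convention proposed above could be sketched as a small renderer. Everything here is hypothetical: it is not an existing Fossil feature, the function name is made up, and the link targets (Fossil's /tktview, /info and /artifact pages) are assumptions about where each kind of value would point.

```python
import html


def render_cell(column, value, base=""):
    """Render one report cell based on the first character of its column
    name, following the convention proposed above (hypothetical).
    base is the repository URL prefix used for links."""
    text = html.escape(str(value))
    prefix = column[:1]
    if prefix == "#":                       # ticket number
        return f'<a href="{base}/tktview?name={text}">{text}</a>'
    if prefix == "$":                       # check-in hash
        return f'<a href="{base}/info/{text}">{text}</a>'
    if prefix == "*":                       # artifact hash
        return f'<a href="{base}/artifact/{text}">{text}</a>'
    if prefix == "_":                       # shown character-for-character
        return f"<pre>{text}</pre>"
    return text                             # plain, HTML-escaped value


print(render_cell("$checkin", "abc123", base="/fossil"))
# <a href="/fossil/info/abc123">abc123</a>
```

Escaping every value before wrapping it keeps user-supplied report data from injecting HTML into the page.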

These are just some ideas, but it's been something I've wanted for years. I gave up tinkering with the ticket system pages because I always felt like I was stretching their intended scope and was kludging my version control system.

I guess a downside would be the possible loss of standardization across fossil instances, and maybe a performance decrease.

(23) By anonymous on 2019-12-03 18:45:01 in reply to 22

> I gave up tinkering with the ticket system pages because I always felt like I was stretching their intended scope

The intended scope of Fossil's ticket system is to be useful. If your changes make the ticket system more useful to you, that's certainly within the intended scope.

> Have you considered generalizing the dynamic nature of the "html ticket page" system to a more abstract "reporting system"?

This is too vague. Yes, Fossil's ticket system is rather basic, but at the places I've worked, and at places where people I know work, for all the power of tools like Jira, the reports managers seem to prefer are spreadsheets. They also like summary stats shown as graphs. Fossil can give you the summary stats; you just have to use another tool to produce the graphs.

(24) By Erik (elechak) on 2019-12-04 18:38:03 in reply to 23

I think I have to disagree with you. The intended scope of Fossil's ticket system is to be a useful and customizable ticket system. I was using the few customizable HTML/TH1 pages of the ticket system to generate all kinds of reports that realistically did not belong in a ticket system. I work in a highly regulated industry, so our code and developer actions have to be audit-ready.

Example Reports:

the last time each user checked in code
the files modified by a certain user over a given time span
all the users that touched a certain file over a given time span
list all the technotes and corresponding code checkin for specific delivery tags
show where check-in comments were edited
show all check-ins where the comments contained certain words
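As an illustration, the "all the users that touched a certain file over a given time span" report could be a single query in the same style as the MLINK query elsewhere in the thread. This sketch runs it against hypothetical cut-down tables with made-up sample data; it assumes, as in Fossil, that event.mtime is a Julian day number, so SQLite's julianday() converts the date bounds.

```python
import sqlite3

# Hypothetical cut-down tables; only the columns this report needs.
# event.mtime is assumed to be a Julian day number, as in Fossil.
REPORT_SQL = """
SELECT event.user, count(DISTINCT mlink.mid) AS checkins
  FROM mlink
  JOIN filename ON filename.fnid = mlink.fnid
  JOIN event ON event.objid = mlink.mid AND event.type = 'ci'
 WHERE filename.name = :name
   AND event.mtime BETWEEN julianday(:since) AND julianday(:until)
 GROUP BY event.user
 ORDER BY event.user
"""


def users_touching_file(conn, name, since, until):
    """Users who checked in changes to `name` between two dates."""
    params = {"name": name, "since": since, "until": until}
    return conn.execute(REPORT_SQL, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE filename(fnid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE mlink(mid INTEGER, fid INTEGER, pid INTEGER, fnid INTEGER);
CREATE TABLE event(objid INTEGER, type TEXT, mtime REAL, user TEXT);
INSERT INTO filename VALUES (1, 'src/foo.c'), (2, 'src/bar.c');
INSERT INTO mlink VALUES (1, 100, 0, 1), (2, 101, 100, 1), (3, 102, 101, 1),
                         (3, 200, 0, 2), (4, 103, 102, 1), (5, 201, 200, 2);
INSERT INTO event VALUES
  (1, 'ci', julianday('2019-11-05'), 'alice'),
  (2, 'ci', julianday('2019-11-10'), 'bob'),
  (3, 'ci', julianday('2019-11-20'), 'alice'),
  (4, 'ci', julianday('2019-12-15'), 'carol'),
  (5, 'ci', julianday('2019-11-12'), 'dave');
""")
print(users_touching_file(conn, "src/foo.c", "2019-11-01", "2019-11-30"))
# [('alice', 2), ('bob', 1)]
```

Carol's change falls outside the window and Dave only touched src/bar.c, so neither appears.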

Those reports are not in the scope of a ticket system (or subsystem). Cramming a bunch of reports that require user input into the ticket system because it happens to be useful doesn't mean that the ticket system is conducive to that type of reporting. Shoehorning them into the "New Ticket Page", "View Ticket Page", "Edit Ticket Page" and "Report List Page" was not optimal, but those were the only places where I could modify the HTML/TH1 code to accept input elements. I also modified Fossil's header.

What I propose is a General Fossil Reporting System that ties the version control, ticketing, technotes, and forum features together. Admins could create new HTML/TH1 pages displaying reports or showing input elements for reporting. This would provide a foundation to support custom reusable reporting.

I don't think this proposal is too vague. For the people that like Jira, spreadsheets, or graphs, that's fine. I like fossil. I just wish it made it easier to create custom reports based on user input.

(5) By jvdh (veedeehjay) on 2019-11-26 16:58:37 in reply to 3

implementing this would be really appreciated and very helpful.

currently I abstain from file renames as far as possible (even where they would be desirable) for the sole reason that fossil does not backtrack across the rename. I view this as a real limitation compared to mercurial, e.g., where that never was a problem.

since fossil actually seems to keep track of the renames (as far as I can tell from the manifests...), I wonder what the principal obstacle is that makes this difficult/tedious to implement or a problem performance-wise?

(6) By Richard Hipp (drh) on 2019-11-26 17:12:50 in reply to 5

The first gate is, what distinct names do you provide for the three definitions of "file" in my previous post.

If you really want Fossil to do a better job of showing the history of files across renames, then help out by inventing some descriptive and reasonably self-explanatory names for three distinct meanings of "file", for use in documentation and in comments.

(12) By jvdh (veedeehjay) on 2019-11-26 20:43:15 in reply to 6

disclaimer: I can't propose a better nomenclature right now and have not thought through the situation/problem fully. nevertheless, my previous experience with mercurial tells me that the usual (maybe the only really relevant?) desire of the user is to backtrack the timeline (on the command line, too) across renames that have happened in the past: just "finfo" for the full history of the file currently named "symbol" and formerly known as "prince", the "file" being distinct from the name tag currently attached to it.

where I want/miss this, is of course when I want to see the full history of the current file entity residing in the checkout with current name tag "symbol" back to when "prince" first was checked into the repo.

not sure whether this helps, and there might be severe complications or ambiguities I overlook, but the above alone (i.e. tracking "mv" actions properly etc.) would help a lot.

another observation: mercurial got away without having special terminology in place to talk about this feature (backtracking across renames). is it really helpful/important, or might it only confuse the average user?

my 2c ...

(19) By jshoyer on 2019-11-27 16:17:25 in reply to 6

I would say ‘file lineage’ rather than ‘clan’ for your third category. However that phrase works better for the ancestors of a given artifact (tracking across renames) rather than all relatives. Being a biologist, I would call the set of all artifacts that derive from a common ancestor a ‘clade’.

Making it easier to see history of a file across renames will be a great improvement!