Fossil: The Annotate Algorithm

1.0 Introduction

The fossil annotate, fossil blame, and fossil praise commands, and the /annotate, /blame, and /praise web pages, all show the most recent check-in that modified each line of a particular file. This article gives an overview of the algorithm Fossil uses to compute the annotation for a file.

2.0 Algorithm

1. Locate the check-in that contains the file that is to be annotated. Call this check-in C0.
2. Find all direct ancestors of C0. The direct ancestors are the closure of the primary-parent relation starting at C0: the primary parent of C0, that check-in's primary parent, and so on. Merged-in branches are not part of the direct ancestors of C0.
3. Prune the list of ancestors of C0 so that it contains only check-ins in which the file to be annotated was modified.
4. Load the complete text of the file to be annotated from check-in C0. Call this version of the file F0.
5. Parse F0 into lines. Mark each line as "unchanged".
6. For each ancestor of C0 on the pruned list (call the ancestor CX), beginning with the most recent ancestor and moving toward the oldest ancestor, do the following steps:
   a. Load the text for the file to be annotated as it existed in check-in CX. Call this text FX.
   b. Compute a diff going from FX to F0.
   c. For each line of F0 that is changed in the diff and that was previously marked "unchanged", update the mark to indicate that the line was modified by CX.
7. Show each line of F0 together with its change mark, appropriately formatted.
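
The marking loop can be illustrated with a short Python sketch. It is not Fossil's implementation: the function name annotate, the history argument, and the max_versions cap are hypothetical, and Python's difflib stands in for Fossil's own diff engine. The sketch also makes one interpretive choice that is an assumption rather than something stated above: a line of F0 that the diff against FX reports as changed is credited to the next-newer version on the list, since that is the oldest version examined that already contains the line as it stands in F0.

```python
import difflib


def annotate(history, max_versions=None):
    """Annotate the newest version of a file (illustrative sketch only).

    history: list of (checkin, text) pairs, newest first, already pruned to
             versions in which the file changed; history[0] is (C0, F0).
    max_versions: optional cap on how many versions are examined.
    Returns one (checkin_or_None, line) pair per line of F0. None means the
    line is "unchanged" across every version that was examined.
    """
    if max_versions is not None:
        history = history[:max_versions]
    f0 = history[0][1].splitlines()
    marks = [None] * len(f0)  # None plays the role of the "unchanged" mark

    # Walk from the most recent version toward the oldest one.
    for (newer, _), (_, fx_text) in zip(history, history[1:]):
        fx = fx_text.splitlines()
        # Diff going from FX to F0.
        matcher = difflib.SequenceMatcher(None, fx, f0, autojunk=False)
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
            if tag in ("replace", "insert"):
                for j in range(j1, j2):
                    if marks[j] is None:  # only the first change is recorded
                        marks[j] = newer
    return list(zip(marks, f0))


# Hypothetical three-version history, newest first.
history = [
    ("c3", "alpha\nbeta2\ngamma\ndelta\n"),
    ("c2", "alpha\nbeta2\ngamma\n"),
    ("c1", "alpha\nbeta\ngamma\n"),
]
result = annotate(history)
# result == [(None, "alpha"), ("c2", "beta2"), (None, "gamma"), ("c3", "delta")]
```

Lines still marked None were not changed by any of the newer versions examined, so they date from the oldest version examined or earlier. Passing max_versions=N limits the loop to N-1 diffs.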

3.0 Discussion and Notes

The time-consuming part of this algorithm is step 6b, computing the diff from each historical version of the file to the version of the file under analysis. For a large file that has many historical changes, this can take several seconds. For this reason, the default /annotate web page only shows those lines that were changed by the 20 most recent modifications to the file. This allows the loop on step 6 to terminate after only 19 diffs instead of the hundreds or thousands of diffs that might be required for a frequently modified file.

As currently implemented (as of 2015-12-12), the annotate algorithm does not follow files across name changes. File name change information is available in the database, so the algorithm could be enhanced to follow files across name changes by modifying step 3.

Step 2 is interesting in that it is implemented using a recursive common table expression.
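
As a rough illustration of how a recursive common table expression can walk a chain of primary parents, here is a sketch using Python's sqlite3 module. The parent_link table, its columns, and the direct_ancestors function are invented for this example and are not Fossil's actual schema or query.

```python
import sqlite3


def direct_ancestors(db, c0):
    """Return the direct ancestors of c0, nearest first (sketch only).

    Assumes a hypothetical table parent_link(parent, child, is_primary)
    holding one row per parent/child edge of the check-in graph.
    """
    rows = db.execute(
        """
        WITH RECURSIVE ancestor(id, depth) AS (
            SELECT ?, 0                      -- start at C0 itself
            UNION ALL
            SELECT parent_link.parent, ancestor.depth + 1
              FROM parent_link
              JOIN ancestor ON parent_link.child = ancestor.id
             WHERE parent_link.is_primary    -- skip merged-in parents
        )
        SELECT id FROM ancestor WHERE depth > 0 ORDER BY depth
        """,
        (c0,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical graph: c0 <- c1 <- c2 <- c3 on the main line, plus branch
# check-in b1 (child of c1) merged into c3 as a non-primary parent.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE parent_link(parent TEXT, child TEXT, is_primary INTEGER)")
db.executemany(
    "INSERT INTO parent_link VALUES (?, ?, ?)",
    [
        ("c0", "c1", 1),
        ("c1", "c2", 1),
        ("c2", "c3", 1),
        ("c1", "b1", 1),
        ("b1", "c3", 0),  # merge parent, not primary
    ],
)
ancestors = direct_ancestors(db, "c3")
# ancestors == ["c2", "c1", "c0"]; the merged-in check-in b1 is excluded
```

Filtering on the is_primary flag inside the recursive step is what keeps merged-in branches out of the result.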