Fossil User Forum

challenges doing automation with fossil

(1) By mattwell on 2020-05-28 17:02:47 [link] [source]

The "friction" to do automation using fossil is higher than necessary.

Can't use sqlite3 - fossil is always ahead of the locally installed sqlite3 versions, so the language sqlite3 bindings do not work. Bummer. While directly accessing the db has some risk, it is the most flexible and performant method and I really wish it were available. In fact, when I first ran into Fossil I considered the accessible sqlite3 database format a feature, but that turns out to be 100% not the case. The "fossil sqlite3" command does not solve this.

The text cli interface is not friendly for automation. E.g. the timeline puts dates on separate lines, so a simple line-based approach won't work. Fair enough; using the text cli output is a generally poor solution.

The json interface is a decent solution; I much prefer to go direct to the data files, but json is ok. Unfortunately json support is not available in the shipped binaries, so you are forced to compile. Tiresome.

I compiled without json, realized my mistake, ran configure --json;make, but the new binary still does not have json support. You must first run make clean, not a big deal but annoying.

All in all, a simple ask (convert the timeline to a spreadsheet) has been unnecessarily difficult.

I suggest either defaulting to the sqlite3 from the OS install (my preference for sure) or at least enabling json by default in the shipped binaries.

(2) By sean (jungleboogie) on 2020-05-28 17:24:32 in reply to 1 [link] [source]

Hi,

What's the problem you want to solve? Just having json support enabled by default for a better view of the timeline?

(3) By mattwell on 2020-05-28 17:29:54 in reply to 2 [link] [source]

Make it easy and reliable to build fossil into automation out of the box (i.e. no special install needed)

(7) By Stephan Beal (stephan) on 2020-05-28 18:06:51 in reply to 3 [link] [source]

Make it easy and reliable to build fossil into automation out of the box

i think Sean's question was intended as, "what specific automation problem is currently giving you grief?" Major changes to fossil's output are not going to happen overnight, but someone here might already have a solution to the specific problem you're trying to solve.

(4) By Warren Young (wyetr) on 2020-05-28 17:34:36 in reply to 1 [link] [source]

Can't use sqlite3...The "fossil sqlite3" command does not solve this.

I'm confused: if the standalone sqlite3 binary would do what you want, then why does fossil sql not do what you want? I wasn't aware that fossil sql was missing anything compared to standalone sqlite3, and in fact adds some things the latter lacks, like Fossil-specific custom SQL functions.

using the text cli output is a generally poor solution.

Yes, it's the old "Perl parsing command output" problem.

Perhaps Fossil could gain a reliably parseable output format, in the way that some commands have JSON, XML, or CSV output modes. Someone just has to want this enough to put the time into creating it.

Unfortunately json support is not available in the shipped binaries and thus you are forced to compile. Tiresome.

I assume there is a good reason for it not being enabled by default, but I don't know what it is. Hopefully someone will explain, which may then lead us to a possible solution.

ran configure --json;make, but the new binary still does not have json support.

Sounds like a dependency problem, easily corrected. I'd go after it myself, but I don't use the JSON API, so my incentive to put time into it is low.

default to using sqlite3 from the OS install

There's a build option for that, but you've already rejected it.

To the extent that this is even possible, it means we'd probably need to compile binaries for each OS to achieve it, which then means what we really want is for Fossil to get into official OS package repos.

That said, Fossil often makes use of bleeding-edge SQLite features, so your wish isn't very practical. You'll either end up with old Fossil versions so you can link to old platform SQLite versions, or you'd be back to the current solution: build against an internal up-to-date copy of SQLite.

My preference is to abandon this plan and figure out why your fossil sql and JSON options are failing.

(5) By Stephan Beal (stephan) on 2020-05-28 17:49:58 in reply to 1 [link] [source]

Unfortunately json support is not available in the shipped binaries and thus you are forced to compile. Tiresome.

Related: the json API was mostly implemented by yours truly back in 2011 and 2012, and hasn't (to the best of my knowledge) been used all that much since then, aside from 3(?) wikis where i use it to host custom front-ends to wiki-only repos.

sqlite recently got its own JSON API, and i've kinda-sorta been idly wondering what it would be like to reimplement the current JSON API on top of that, rather than on the 3rd-party JSON library it currently uses (which was also implemented by me, so it's only "3rd party" in a pedantic sense).
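
(For anyone who hasn't looked at it, that sqlite3 JSON support looks roughly like this - a minimal sketch using Python's sqlite3 module and a throwaway table, assuming the underlying SQLite build includes the JSON functions, as recent builds do.)

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (hash TEXT, comment TEXT)")
con.executemany("INSERT INTO demo VALUES (?, ?)",
                [("abc123", "fix typo"), ("def456", "add feature")])

# json_object() builds one JSON object per row; json_group_array() aggregates
# them into a single JSON array.
row = con.execute("""
    SELECT json_group_array(json_object('hash', hash, 'comment', comment))
    FROM demo
""").fetchone()
print(row[0])
# -> [{"hash":"abc123","comment":"fix typo"},{"hash":"def456","comment":"add feature"}]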

Sidebar: the reason JSON is not on by default is because i never wrote complete tests for it. Richard told me, way back when, that if it had thorough automated testing, including fuzz testing for malicious inputs, it could be enabled. So... if someone wants to volunteer to do that...

We "could," and maybe even "should," reimplement it on top of the sqlite3 JSON support. That would be, in terms of code, a completely rewrite, but we at least have an existing implementation to model it on, so it would be a port in which 100% of the code needs to be rewritten to use another API. It would, however, be making use of a JSON API which is inherently covered by sqlite3's extreme testing infrastructure.

That might also be a good chance to re-architect it. The current JSON API is admittedly somewhat over-engineered because (A) that's admittedly my general tendency and (B) the idea was that it could/should be used with arbitrary HTTP-capable clients. In practice it's simply not being used that way (not to my knowledge, anyway), so its interface could probably be drastically simplified to eliminate its over-formal request/response structure and error reporting mechanism.

That said: i'm not volunteering to do that. It would be a considerable undertaking and my RSI-plagued hands are kind of hit-and-miss these days - they work some days and not others. i would, of course, offer any possible support i could to someone(s) interested in taking it on, though.

(6) By Richard Hipp (drh) on 2020-05-28 17:50:59 in reply to 1 [link] [source]

I suggest either default to using sqlite3 from the OS install

In addition to serving as the version-control system for SQLite, Fossil is also a test platform for SQLite. Fossil is where we do our earliest beta-testing of new features. And so Fossil typically requires a very recent version of SQLite, if not the very latest version. The SQLite installed by vendors typically lags by several years. Hence, your suggestion is unlikely to happen.

But, there are work-arounds.

  1. You can use the "fossil sql" command. When you do this, you also get access to some custom functions and virtual tables to help you decode the underlying database.

  2. You can update the SQLite installed on your OS to the very latest trunk version.

Be warned: The Fossil repository schema is not an API. We reserve the right to change it in the future. It has not changed (much) in a long time and we have no plans to do so anytime soon. But you should be aware that changes might happen. So if you access the database directly, there might come a time when you would need to modify your scripts.
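
With that caveat in mind, here is a minimal sketch of the kind of query a script might run through "fossil sql" to turn the timeline into spreadsheet-friendly CSV. The event-table columns come from the current schema and, per the warning above, could change; REPO is a placeholder path, and the sketch assumes the default pipe-separated list output of the underlying sqlite3 shell.

import csv
import subprocess
import sys

REPO = "/path/to/project.fossil"   # placeholder: your repository file

# Check-ins are rows of the event table with type 'ci'; mtime is a julian-day
# number, which datetime() converts to a readable timestamp.
SQL = """
SELECT datetime(event.mtime),
       coalesce(event.euser, event.user),
       coalesce(event.ecomment, event.comment)
FROM event
WHERE event.type = 'ci'
ORDER BY event.mtime DESC
LIMIT 50;
"""

out = subprocess.run(["fossil", "sql", "-R", REPO],
                     input=SQL, capture_output=True, text=True, check=True).stdout

writer = csv.writer(sys.stdout)
writer.writerow(["date", "user", "comment"])
for line in out.splitlines():
    parts = line.split("|", 2)     # default sqlite3 shell list-mode separator
    if len(parts) == 3:
        writer.writerow(parts)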

(8) By mattwell on 2020-05-28 18:55:41 in reply to 6 [link] [source]

fossil sql doesn't help. I would like to be able to use the Ruby, Python, etc. language bindings, which won't be possible. Based on the policy that fossil will always be on the latest SQLite, I understand that using the sqlite3 db for automation is not a good idea. We can let that possibility go.

There is currently no automation mechanism for fossil.

  • sqlite3 db - not an option
  • json - not tested and not officially supported
  • cli - not usable

It would be nice to add a bullet to the "What is Fossil?" page with something about how to do automation, as this is a common need. For example, I have code in my Megatest project that detects commits in fossils and triggers runs of testsuites, builds, etc. for continuous integration and continuous deployment. These are common requirements for integrating tools like fossil into CI systems such as TeamCity and Jenkins.

(9) By Richard Hipp (drh) on 2020-05-28 19:11:48 in reply to 8 [link] [source]

What types of information are you hoping to extract?

We are open to adding a new command ("fossil lowlevel-data ..." or similar) that provides information in an easy to parse format, if it is helpful. But we need to have a better idea of what kind of information you need in order to do that.

(10) By mattwell on 2020-05-28 19:37:49 in reply to 9 [link] [source]

Everything is fair game, and the json approach is a pretty good one - but here is a short list of automation needs I've had over the years:

  • automatic commit of changes after build and regression testsuite QA pass.
  • detect a commit to a branch
  • detect a new tag
  • extract timeline details (e.g. I needed the timeline as a spreadsheet)
  • update, modify other fossil views such as tickets, wiki pages, unversioned files etc.

Basically if you can do it from the command line or web page there is a good chance we'll want to do it automatically sooner or later.

I use Megatest's ability to detect fossil commits to autobuild and QA Megatest itself. At work we have code to detect fossil commits under TeamCity and wish we had out-of-the-box support for Jenkins. I've used the wiki interface to gather control data, and we now use unversioned files extensively. We do all this automation via the command line, which has worked ok. The timeline is the most problematic, as its output is not automation friendly.
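
As one illustration of the "detect a new tag" item above, here is a polling sketch that wraps the fossil CLI from Python; the repository path and cache file are placeholders.

import subprocess
from pathlib import Path

REPO = "/path/to/project.fossil"            # placeholder: your repository file
CACHE = Path.home() / ".seen-tags.txt"      # placeholder: tags already reported

def current_tags():
    # Sync with upstream first, then list the tags known to the repository.
    subprocess.run(["fossil", "pull", "-R", REPO], check=True)
    out = subprocess.run(["fossil", "tag", "list", "-R", REPO],
                         capture_output=True, text=True, check=True).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}

seen = set(CACHE.read_text().splitlines()) if CACHE.exists() else set()
tags = current_tags()
for tag in sorted(tags - seen):
    print("new tag:", tag)                  # kick off CI/CD work here
CACHE.write_text("\n".join(sorted(tags)) + "\n")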

(11) By Warren Young (wyetr) on 2020-05-28 21:01:44 in reply to 10 [link] [source]

The /timeline.rss feed is readily automated for such purposes:

#!/usr/bin/perl
use strict;
use warnings;

use LWP::Simple;
use XML::RSS::Parser;

# Pull the check-in ("ci") timeline RSS feed and extract the newest commit's
# artifact hash from its /info link.
my $url = 'https://fossil-scm.org/fossil/timeline.rss?y=ci';
my $rss = LWP::Simple::get($url);
die "Unable to pull RSS feed: $!\n" unless $rss;
my $cis = XML::RSS::Parser->new->parse_string($rss);
my $vurl = $cis->query('/channel/item/link')->text_content;
my ($ver) = $vurl =~ m{
    ([0-9a-f]{40,64})    # Fossil version (commit artifact hash)
    $                    # ...at the end of the version /info URL
}x;

# Read the hash we saw on the previous run, if any.
my $last = '';
my $lfile = $ENV{HOME} . '/.new-ci-last.txt';
if (-r $lfile) {
    open my $lfh, '<', $lfile;
    $last = <$lfh>;
    chomp $last;
}

if ($last ne $ver) {
    # A new commit has appeared since the last check; remember it.
    print 'New version committed since last check: ',
            substr($ver, 0, 10), "\n";
    open my $lfh, '>', $lfile;
    print $lfh $ver, "\n";
}
else {
    print "Nothing new since last check; get busy!\n";
}

Install deps with:

   $ sudo cpanm LWP::Simple LWP::Protocol::https XML::RSS::Parser

Add CI/CD and other such actions inside the "if" branch at the end.

Want to monitor something other than commits? Change the URL.

(12) By mattwell on 2020-05-29 02:19:15 in reply to 11 [link] [source]

Thanks Warren. I agree that the rss feed is useful. We tried using that early on but, as I recall, load impact and complexity were concerns. I think we are still using rss in some automation.

ASIDE: For anyone using fossil in a Linux-based compute center with NFS file sharing - one lesson learned was to do as much work on /tmp as possible. File locks from fossil and other tools can swamp the filer lock queues, causing problems. If the locks are done on the fossil files in /tmp, there are no lock calls to the filer and thus no swamping of the filers.

My solution for commit detection was a fossil sync with upstream, then looking at the timeline using json to find new nodes derived from the last node where QA ran. In an environment with hundreds of fossil commits to dozens of fossils every day, efficiency is a high priority.
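
For concreteness, a minimal sketch of that kind of JSON-timeline poll (the URL is a placeholder and requires a JSON-enabled build; the /json/timeline/checkin endpoint and its limit parameter should be checked against the JSON API docs, and the response fields are best inspected rather than assumed, so this just dumps the envelope):

import json
import urllib.request

# Placeholder URL: needs a JSON-enabled fossil build serving the repository.
url = "https://example.org/repo/json/timeline/checkin?limit=20"

with urllib.request.urlopen(url) as resp:
    data = json.loads(resp.read().decode("utf-8"))

# The JSON API wraps results in a response envelope; inspect it rather than
# hard-coding field names, since details vary between builds.
print(sorted(data.keys()))
print(json.dumps(data.get("payload", data), indent=2)[:800])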

Based on this discussion I think it'd be great if the fossil devs decided on one path and threw some effort at making that path full featured, trusted and available out of the box. I'd vote for the json support but maybe the rss feed would be a viable alternative.

Just my $0.02.

(13) By Warren Young (wyetr) on 2020-05-29 03:15:52 in reply to 12 [link] [source]

load impact and complexity were concerns.

I modified the program to point to a local Fossil server and added "&n=5" to limit the amount of XML it pulls, then benchmarked 100 iterations of the core loop. It takes about 15 ms per iteration. That's the XML generation from disk, the HTTP pull, the XML parsing, and the XML query, all in roughly the same time that your monitor refreshes.

Meanwhile, you're talking about CI/CD, which involves rebuilding and reinstalling the whole system.

If you want something to quibble about, it's that the program as-is only works with repos where user category "nobody" has check-out capability. You'd have to add a login sequence and/or cookie management to get around that. LWP makes that pretty easy.

My solution for commit detection was a fossil sync with upstream...

This is faster than pulling /timeline.rss?

In an environment with hundreds of fossil commits...

...you have an average of one commit every 2 minutes. 0.015 seconds doesn't even show as a blip on the radar at that rate.

Run the script once a minute, and you'll never even notice the overhead.

I'd vote for the json support....

Why would a hypothetical /timeline.json run materially faster than the current code? I get that XML is heavier than JSON, but surely the bottleneck is the DB manipulation and network I/O, which will be approximately equal for both versions.

Let me guess: you don't like Perl? So what? Try it! Then if it works, you can rewrite the thing in your favorite programming language.

If you don't like XML, and you're rejecting the RSS option on those grounds, realize that RSS is the tech underlying the world's blogging and podcasting infrastructure. The tools made to parse it are robust, plentiful, and fast.

(14) By anonymous on 2020-06-17 19:37:51 in reply to 12 [source]

but maybe the rss feed would be a viable alternative

In addition to /timeline.rss there is the fossil rss command.

fossil rss has the advantages that it is available "out of the box" and avoids using HTTP.

So your commit detection script could do:

    fossil pull
    fossil rss -y ci -n 1 >fossil.rss

Optionally, you add -tag YOUR_TAG to only see commits tagged with "YOUR_TAG". (Where YOUR_TAG is whatever tag you want to filter on.)

On the other hand, querying the upstream /timeline.rss allows:

    $rss=LWP::Simple::get($rssurl);
    # code to get versions
    if ($ver ne $last)
    {
        system 'fossil pull';
        # code to process latest commit
    }

So you avoid unnecessary pulls. If you are only looking for commits with a specific tag, this might save you some overhead.

On the third hand, if TH1 hooks were enabled in the published executables, then the upstream repo could send your CI/deployment server a notification.

(15.1) By Warren Young (wyoung) on 2020-06-17 20:33:08 edited from 15.0 in reply to 14 [link] [source]

fossil rss -y ci -n 1 >fossil.rss

There's no need to redirect the RSS out to a file: Perl lets you pipe a command's output into a variable. It's a small modification to the above script:


#!/usr/bin/env perl
use strict;
use warnings;

use English;
use XML::RSS::Parser;

my $rbase = 'reponame';
my $repo = $ENV{HOME} . "/museum/$rbase.fossil";
system("fossil pull -R $repo") == 0 or die "Failed to update repo: $!\n";
my $cmd = "fossil rss -type ci -limit 1 -R $repo";
open my $fossil, '-|', $cmd or die "Failed to pull RSS from $repo: $!\n";
my $rss = do { local($INPUT_RECORD_SEPARATOR); <$fossil> };     # slurp!
close $fossil;

my $cis = XML::RSS::Parser->new->parse_string($rss);
my $vurl = $cis->query('/channel/item/link')->text_content;
my ($ver) = $vurl =~ m{
    /info/               # sanity check: only care about info URLs
    ([0-9a-f]{40,64})    # Fossil version (commit artifact hash)
    $                    # ...at the end of the check-in's info URL
}x;

my $last = '';
my $lfile = $ENV{HOME} . "/.new-ci-$rbase-last.txt";
if (-r $lfile) {
    open my $lfh, '<', $lfile;
    $last = <$lfh>;
    chomp $last;
}

if ($last ne $ver) {
    print 'New version committed since last check: ',
            substr($ver, 0, 10), "\n";
    open my $lfh, '>', $lfile;
    print $lfh $ver, "\n";
}
else {
    print "Nothing new since last check; get busy!\n";
}

This version changes a few incidental things:

  1. The shebang line is more portable.
  2. It pins the /info URL more reliably in the regex.
  3. Adds $rbase, used both in the repo path name and the cache for last-seen commit ID, so a single host can monitor multiple repos.
  4. It now pulls only a single RSS record, which should be all it needs, given that it already filters for commit records. The last one should be the only one relevant to the script.

EDIT: By exchanging potentially-remote HTTP access for local CLI access, you no longer need to either give user category nobody cloning rights or work out the LWP cookie stuff that allows you to remember a login cookie. The downside is that this version requires two separate transactions against the repo rather than one: the HTTP method doesn't need the explicit "pull" operation, because the RSS is always up-to-date when pulling from a remote server rather than a potentially out-of-date local clone.

(16) By Warren Young (wyoung) on 2020-06-17 20:36:44 in reply to 15.1 [link] [source]

Ironically, this version is probably slower than the HTTP version, even if you do login and cookie handling, because this one has to prospectively pull the upstream changes before fossil rss can be relied upon to give trustworthy results.

With the HTTP method, the CI/CD buildbot can put off the "pull" until after the remote host declares that the latest commit has a new commit ID compared to the last one the script saw.

(18) By anonymous on 2020-06-17 20:52:25 in reply to 15.0 [link] [source]

chomp $last;

Side note: It is generally more reliable to use $last =~ s/[\x0a\x0d]+\z//; instead because it matches any of the 3 commonly used line endings while chomp only matches the current value of $INPUT_RECORD_SEPARATOR, which is not a regular expression.

(17) By Erik (elechak) on 2020-06-17 20:51:19 in reply to 10 [link] [source]

I find it quite easy to automate fossil (except for the whole check-in thing currently being discussed in a separate thread).  I use the command line functionality and python's subprocess module.

Here's a quick script I threw together to help you with your timeline issue.  You can use xlsxwriter to convert the output to Excel or write it to csv.

I've done some crazy automation with fossil and the command line output has been consistently predictable and stable.  Two important attributes for parsing subprocess output.

I hope this helps.


import subprocess

def run(cmd):
    # Run a shell command and return its output as a list of lines.
    x = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = x.communicate()
    return out.decode("utf-8", "replace").split("\n")

def fossil_timeline(checkin="", limit=50):
    rows = []
    date = ""
    for line in run(f"fossil timeline {checkin} -n {limit}"):
        if line.startswith("---"): break      # stop at the trailing "---" marker line
        if line.strip() == "": continue
        if line.startswith("=="):             # "=== YYYY-MM-DD ===" date header
            date = line.strip("=").strip()
            continue
        if line[:1].isspace():                # wrapped comment text is indented;
            if rows:                          # fold it into the previous entry
                rows[-1][3] += " " + line.strip()
            continue
        time, checkout, message = line.split(None, 2)
        checkout = checkout[1:-1]             # strip the [...] around the hash
        rows.append([date, time, checkout, message])
    return rows

for r in fossil_timeline():
    print(r)
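
A small follow-up on the csv suggestion above, reusing the fossil_timeline() helper from the script (the output file name is arbitrary):

import csv

with open("timeline.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "time", "checkin", "comment"])
    writer.writerows(fossil_timeline(limit=200))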

(19) By anonymous on 2020-06-17 21:16:55 in reply to 17 [link] [source]

I think the output of fossil rss is a better choice.

Though, I have been forced to handle the output of fossil changes and fossil info in build scripts (to control insertion of version info into the build product).