Parsable output
(1) By Warren Young (wyoung) on 2021-02-06 23:40:06
In an attempt to cut off a threadjacking attempt...
Parsable output has come up several times, but the proposals tend to be either vague or overambitious.
I think what we want is for someone sufficiently interested to audit the command set, collect the commands where they think this should happen, and then give example outputs for those commands that meet the goal.
In many cases, I think we'll need to specify an alternate output format, the emitted details being too mixed to permit simple parsing.
Take fossil stat output, for instance. How can we make that reliably parsable without an option like --format=json to handle the disparate key/value pairs, some of which may have newlines and other embedded characters that make parsing difficult?
I think we can strap all of this atop SQLite's output formatting logic, building virtual tables for the output and then sending them to it, to minimize the amount of new code. Then we get 14 different output formats for free, many of which are parsable.
I'm fairly interested in this myself. I tire of passing output thru cut -f12- and such to strip off human-readable row headers and the like.
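To make the embedded-newline problem concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the keys, the values, and the status table are invented and are not real Fossil output, and the in-memory table merely stands in for the virtual-table idea above.

```python
import json
import sqlite3

# Hypothetical key/value pairs, loosely modelled on status-style output.
# None of these keys or values are taken from real Fossil output.
PAIRS = [
    ("repository", "/home/user/my project.fossil"),
    ("comment", "Fix parser\nSecond line: with a colon"),
    ("tags", "trunk, release"),
]

def load_pairs(pairs):
    """Put the pairs in an in-memory table standing in for a virtual table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE status(key TEXT, value TEXT)")
    db.executemany("INSERT INTO status VALUES (?, ?)", pairs)
    return db

def as_human_text(db):
    """Aligned 'key: value' lines: easy to read, ambiguous to parse."""
    rows = db.execute("SELECT key, value FROM status ORDER BY rowid")
    return "\n".join((k + ":").ljust(14) + v for k, v in rows)

def as_json(db):
    """The same rows as one JSON object; newlines are escaped by the encoder."""
    rows = db.execute("SELECT key, value FROM status ORDER BY rowid")
    return json.dumps(dict(rows.fetchall()))
```

With these three pairs the human-readable form comes out as four lines, and the extra line looks like a "Second line" key to a naive line splitter. The JSON form round-trips to the original pairs.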
(2) By Larry Brasfield (LarryBrasfield) on 2021-02-07 00:29:51 in reply to 1
In an attempt to cut off a threadjacking attempt...
As the would-be threadjacker, I would say it was more a gentle, closely related diversion and extension of a thread that was more or less dead if restricted to its narrow original topic. [a]
[a. Given the common understanding of "threadjacking" as "The act of taking over an e-mail list or discussion thread with a subject unrelated to the original posting", it was a pathetic "attempt". I disown it as such. ]
On vagueness, ambition, deciding where automaton formatting should happen, and how it might be done:
It seems to me that the target outputs are precisely [b] those which are created by stepping through a DB query result set to produce output, even if the row count is expected to be 0 or 1 sometimes.
[b. This is the polar opposite of vague. There would be no harm if a few commands that get result sets for output respected the new option even if nobody could think of a needed output-consuming tool. ]
If there were an effectively global flag, named output_is_for_an_automaton_consumer, a very simple format suggests itself: CSV (or TSV with appropriate escapes). Every command that does work to get the meat of its output ready to be blurted row by row would instead hand the DB connection off to the automaton feeder if it saw that flag set. A leading column-names header would be nice, but nicer yet would be the ability to skip it optionally.
I don't see this as a particularly ambitious change. If I did it, I would probably add two flags: --csv-output and --sql-insert-output. As the SQLite shell already has those, it would be but a few minutes more work.
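As a rough illustration of the CSV idea, here is a minimal Python sketch of a command-agnostic row dumper with an optional header. The function name query_to_csv and its parameters are hypothetical; this is not Fossil or SQLite shell code, just the standard sqlite3 and csv modules.

```python
import csv
import io
import sqlite3

def query_to_csv(db, sql, params=(), header=True, delimiter=","):
    """Step through a query's result set and return it as delimited text.

    Hypothetical helper: it knows nothing about which command issued the query.
    Pass delimiter="\t" for TSV; quoting of awkward fields is left to csv.writer.
    """
    cur = db.execute(sql, params)
    out = io.StringIO()
    # A fixed "\n" terminator keeps the output to a single newline convention.
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    if header:
        # cursor.description holds one 7-tuple per column; item 0 is the name.
        writer.writerow([d[0] for d in cur.description])
    writer.writerows(cur)
    return out.getvalue()
```

Fields containing the delimiter or a newline come out quoted, so a consumer using any conforming CSV reader gets the original values back. Passing header=False skips the column-names row.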
(3) By Stephan Beal (stephan) on 2021-02-07 07:24:21 in reply to 2
I don't see this as a particularly ambitious change.
Except that it doubles the amount of work needed for output-generating routines, and requires a good deal of dentistry if we're going to go retrofit all commands.
If you dig deep into the forum and mailing lists you'll find several warnings from myself, in response to people parsing fossil's output, that we've never had/made any stability guarantees regarding output and that scripting is therefore done at one's own risk.
Ideal? Not at all, but few, if any, of us have the wherewithal to commit to keeping any given output stable for years to come, nor the desire to be constrained by it.
That's not to say that i'm against seeing options added (as opposed to adding options!) like --csv or --sql to get certain output shapes for certain commands, but being compelled to support X formats of output, and keep them stable against a written spec, for arbitrary commands would be a tremendous drag, both on productivity and motivation.
(4) By skywalk on 2021-02-07 16:28:10 in reply to 1
Wow, many uptoots for this! Echo the input and delimited output with only 1 NewLine type please!!
(5) By Larry Brasfield (LarryBrasfield) on 2021-02-07 17:35:32 in reply to 3
I don't see this as a particularly ambitious change.
Except that it doubles the amount of work needed for output-generating routines,
I would agree that doubling the output generation work should be deemed ambitious. But the way I would regularize the output for a certain class of commands would not require reformatting all the little printf()-like (or blob_append()-like) calls. That's too much work, and would seriously clutter the code. What I have in mind, and submit is much less work, is this: make the automaton_feeder(...) function know nothing about individual commands. Once written, it requires little ongoing change. It would accept a prepared statement, or possibly the built/constant SQL for a pure query (a "SELECT"). It would also need a mapping, for each result column whose value must be transformed for output, from possible result value to string value. [a] For the sake of making automaton output stability easier, it might also accept a mapping from query column ordinals to formal (and stable) column names.
[a. A vector of callbacks would work, where the referenced functions would be usable in ordinary, prettified, human-readable output. NULL callbacks would mean "no unusual transformation". With a modicum of "dentistry", use of those callbacks would make the code targeting humans more readable. (There is a much repeated pattern of translating flags to text, ripe for some refactoring.) ]
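The shape of such a feeder can be sketched in a few lines. This is a minimal Python sketch rather than the C it would be in Fossil, and every name in it other than automaton_feeder is an assumption: flag_to_text, the table f, and the stable names "name" and "is_exec" are made up for illustration.

```python
import csv
import io
import sqlite3
import sys

def flag_to_text(value):
    """Hypothetical flag translation, reusable by the human-readable output too."""
    return "yes" if value else "no"

def automaton_feeder(cursor, transforms=None, names=None, out=sys.stdout, header=True):
    """Write every row of an executed query as CSV, knowing nothing of the command.

    transforms: one callable or None per column; None means no transformation.
    names: maps column ordinal to a stable published name; unmapped columns
           keep whatever name the query gave them.
    """
    ncols = len(cursor.description)
    transforms = transforms or [None] * ncols
    names = names or {}
    writer = csv.writer(out, lineterminator="\n")
    if header:
        writer.writerow([names.get(i, d[0]) for i, d in enumerate(cursor.description)])
    for row in cursor:
        writer.writerow([f(v) if f else v for f, v in zip(transforms, row)])

def demo():
    """Feed a made-up query through the feeder and return the CSV text."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE f(n TEXT, x INTEGER)")
    db.executemany("INSERT INTO f VALUES (?, ?)", [("a.c", 1), ("b.txt", 0)])
    cur = db.execute("SELECT n, x FROM f ORDER BY n")
    buf = io.StringIO()
    automaton_feeder(cur, transforms=[None, flag_to_text],
                     names={0: "name", 1: "is_exec"}, out=buf)
    return buf.getvalue()
```

The query's own column names (n, x) can change freely without affecting consumers, since only the ordinal-to-name mapping is published. The per-command cost is the transforms list and the names mapping; the feeder itself stays untouched.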
requires a good deal of dentistry if we're going to go retrofit all commands.
Ughh. Too ambitious. What I recommended be affected was "precisely [command outputs] which are created by stepping through a DB query result set to produce output, even if the row count is expected to be 0 or 1 sometimes."
As for dentistry, there would be low impact on the existing output generation code except for the refactoring I mention in footnote a. I dare say, just as is usually true after work by a competent dentist, the result would be an improved state of affairs.
If you dig deep into the forum and mailing lists you'll find several warnings from myself, in response to people parsing fossil's output, that we've never had/made any stability guarantees regarding output and that scripting is therefore done at one's own risk.
That's the situation that goes without saying, absent a stability guarantee. It was the absence of such a guarantee, coupled with Mr. Hipp's creation/publication of a fossil-output-consuming script, that led to my suggestion that there should be such a guarantee for a subset of the commands. As I mentioned there, the Subversion project (leaders) pledged such a guarantee early on.
Ideal? Not at all, but few, if any, of us have the wherewithal to commit to keeping any given output stable for years to come, nor the desire to be constrained by it.
I submit that, with the exception of the output column transformation mappings, which would not need to change [b], there would be very little maintenance required to keep the automaton feeder output stable. That would be much of the reason to get its generation into a single (and stable) function. Your conviction that I propose a maintenance nightmare tempts me to show how it can be done with minimal ongoing impact on existing output generation or the freedom to prettify and adorn it differently.
[b. The names with which Fossil denotes its working concepts are highly likely to survive until the Fossil conceptual behavior model is drastically altered. In that case, some name changes and subsequent breakage of some output-consumers would be a good outcome. ]
... but being compelled to support X formats of output, and keep them stable against a written spec, for arbitrary commands would be a tremendous drag, both on productivity and motivation.
You again tempt me to show that it would not be even a moderate drag. I believe that the slight refactoring of flag-to-text code I anticipate would prove to reduce the present drag on productivity. Of course, I cannot speak to motivation, except to say that I have worked with depressingly messy code, code that is a joy to see and evolve, and much in between. I do not propose, and would not condone or participate in, any shift of the Fossil codebase toward the messy end of that spectrum.