Fossil User Forum

Is there any direct SQL Query to get the content of file

(1) By anonymous on 2019-05-20 04:19:34 [link] [source]

I have many repositories and I'm trying to build some tools that parse a file in each repo using direct SQLite3 queries. Using the fossil command, for example, I can run:

fossil cat js/app.js -r trunk -R my-repo.fossil | grep version: and it returns a string such as: version: 1.2.3

Is there an alternative SQL query to get such information, or a query to get the content of a file from the latest trunk?

(2) By Florian Balmer (florian.balmer) on 2019-05-20 06:55:54 in reply to 1 [link] [source]

I've tried this:

SELECT content(blob.uuid) FROM blob WHERE blob.rid = "RID";

However, it turned out that the line endings were modified, see this post for details:

[fossil-users] Direct SQL dumps have modified line endings

(4) By Richard Hipp (drh) on 2019-05-20 11:21:41 in reply to 2 [link] [source]

The line endings are only changed if you use the "fossil sql" command to extract the content manually on a Windows machine. I think you can overcome that by using the command ".binary on" first, before extracting the content. You can also use something like this:

SELECT writefile('out.txt',content(uuid)) FROM blob WHERE rid=:rid

(7) By Florian Balmer (florian.balmer) on 2019-05-20 17:55:53 in reply to 4 [link] [source]

Thank you for the information, I'll try your suggestions.

My script is not pure SQL, so using fossil artifact with the RIDs extracted by SQL queries was an alternative.

(3) By Stephan Beal (stephan) on 2019-05-20 06:59:05 in reply to 1 [link] [source]

The short answer is "no", but...

Some years ago i asked Richard how to get a list of all files for a specific version of the repo, and he explained that doing so would require at least 1 db record for each file for each checkin. For moderately-sized repositories that would mean millions and millions of db records. For example, the core fossil repo currently has 12367 checkins × 874 files ≈ 10.8 million entries. For larger repositories that number could easily grow into the hundreds of millions.

One thing to keep in mind about fossil's data is that ALL of the SCM-related content is actually only in the blob table. ALL other SCM-related data in the database is derived from the raw data stored in the blob table. The whole database, except for the blob table, is a cached/normalized "rephrasing" of the blob table.

However, there may be a way to get what you're looking for...

fossil ls

can list the names of all files in a given checkin, but it can't list their versions (because of the limitation mentioned above):

fossil ls -R ~/fossil/fossil.fsl -r trunk | head

However, fossil can output the raw content of a checkin's "manifest":

fossil artifact trunk | head

So you could do something like:

[stephan@lapdog:~/fossil/fossil]$ f artifact trunk | grep src/json.c | cut -d' ' -f3
356dc00fca5481569a58c65a23f85b880edc11e363526226de62da844afe74e9
f53fa14ab92730d30be35311b52a17848eeabe44

Oops, we need to be careful to escape the period:

[stephan@lapdog:~/fossil/fossil]$ f artifact trunk | grep 'src/json\.c' | cut -d' ' -f3
356dc00fca5481569a58c65a23f85b880edc11e363526226de62da844afe74e9

or use grep's -w flag:

[stephan@lapdog:~/fossil/fossil]$ f artifact trunk | grep -w src/json.c | cut -d' ' -f3
356dc00fca5481569a58c65a23f85b880edc11e363526226de62da844afe74e9
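If you prefer to avoid grep's escaping pitfalls entirely, the manifest's F-cards are easy to parse directly. A minimal sketch in Python (the second filename and its hash below are made up for illustration; real manifests escape spaces in filenames as "\s", which this sketch ignores):

```python
# Sketch: look up a file's artifact hash in Fossil manifest text
# (as produced by `fossil artifact trunk`). F-cards have the form
# "F <filename> <hash> [<permissions>]".

def file_hash(manifest_text, filename):
    for line in manifest_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "F" and parts[1] == filename:
            return parts[2]
    return None

# Abbreviated manifest; "src/other.c" and its hash are invented.
sample = """F src/json.c 356dc00fca5481569a58c65a23f85b880edc11e363526226de62da844afe74e9
F src/other.c 0000000000000000000000000000000000000000
"""
print(file_hash(sample, "src/json.c")[:11])  # 356dc00fca5
```

An exact string comparison on the filename cannot accidentally match src/json_artifact.c the way an unescaped grep pattern can.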

Once you know the version of a file, there are multiple ways to get its raw contents. Using the file json.c, version 356dc00fca5, as an example:

fossil artifact 356dc00fca5

Or, using SQL:

echo "select content('356dc00fca5')" | fossil sqlite  | head

Hmmmm... that last one doesn't work like i thought it did: it outputs the hex-encoded form of the content.

Note that simply outputting the content of the blob.uuid record 356dc00fca5 would probably not work, because that record might be compressed and/or delta-encoded:

sqlite> select uuid, size, length(content) from blob where uuid like '356dc00fca5%';
'356dc00fca5481569a58c65a23f85b880edc11e363526226de62da844afe74e9',73408,50

But note that SQLite's length() function counts characters (not bytes) for text values and stops at an embedded NUL byte, so its output is not necessarily reliable in this context. It does, however, demonstrate that the value of the size field (which records the "real" size of the content) differs from the length() of the blob's content field (the file content, compressed and possibly delta-encoded).
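The size-vs-length(content) mismatch is easy to reproduce outside Fossil. A toy sketch with a simplified schema (Fossil's real blob table has more columns and may additionally delta-encode content; this is not Fossil's actual storage format):

```python
# Toy illustration of why length(content) is smaller than the recorded
# size: the stored blob is zlib-compressed, while the size column
# records the uncompressed length.
import sqlite3
import zlib

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blob (uuid TEXT, size INT, content BLOB)")
raw = b"version: 1.2.3\n" * 1000            # 15000 bytes, very repetitive
db.execute("INSERT INTO blob VALUES ('deadbeef', ?, ?)",
           (len(raw), zlib.compress(raw)))
size, stored = db.execute(
    "SELECT size, length(content) FROM blob").fetchone()
print(size, stored)  # stored length is far smaller than the real size
```

Repetitive text compresses extremely well, so the stored blob is a tiny fraction of the recorded size, just as in the fossil repo example above.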

(5) By Richard Hipp (drh) on 2019-05-20 11:26:53 in reply to 3 [link] [source]

Some years ago i asked Richard how to get a list of all files for a specific version of the repo, and he explained that doing so would require at least 1 db record for each file for each checkin.

Yeah, but afterwards I added a new virtual table "files_of_checkin" to do this for you without adding all those additional records to the database.

See the header comment for a terse description of how to use this virtual table. The files_of_checkin virtual table is actually used by Fossil itself in the "search" logic.
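For readers who want to try it, a query against that virtual table might look something like the following. This is an untested sketch: the hidden checkinID column and the symbolic_name_to_rid() helper are assumptions based on the header comment in Fossil's src/foci.c, and the exact interface may differ between Fossil versions.

```sql
-- Run inside "fossil sql"; files_of_checkin is provided by Fossil
-- itself, not by plain SQLite.
SELECT filename, uuid
  FROM files_of_checkin
 WHERE checkinID = symbolic_name_to_rid('trunk');
```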

(6) By Stephan Beal (stephan) on 2019-05-20 11:39:02 in reply to 5 [link] [source]

Yeah, but afterwards I added a new virtual table "files_of_checkin" to do this for you without adding all those additional records to the database.

To my shame, i had forgotten about that.

(8) By Florian Balmer (florian.balmer) on 2019-05-23 07:58:38 in reply to 3 [source]

It's necessary to use the -list option for non-hexdump output:

echo SELECT content(blob.uuid) FROM blob WHERE blob.rid = "RID"; | fossil sql -list

But I haven't found out how to use .binary on to preserve line endings with scripting:

echo .binary on; SELECT content(blob.uuid) FROM blob WHERE blob.rid = "RID"; | fossil sql -list
→ Usage: .binary on|off

Moreover, redirection to a file produces an error, but the file seems to be written correctly:

echo SELECT content(blob.uuid) FROM blob WHERE blob.rid = "RID"; | fossil sql -list > output.txt
→ Error: sqlite3_close() returns 5: unable to close due to unfinalized statements or unfinished backups

(The writefile trick mentioned above works to preserve line endings, but I would like to redirect the output of the SQL statement to a pipe.)