Learning TH1: getting started, a study path

(1) By matt w. (maphew) on 2021-08-20 17:57:16 [link] [source]

Having wrapped some of my brain around the various features and differences between wiki pages and tech docs, I'm ready to start exploring how I might leverage TH1 scripts to extend fossil-as-cms to cover my use case (porting my website of static HTML files to fossil rendering of markdown files).

I've been looking through the docs and fossil code for *.th1 and sending interesting-looking fragments through the /admin_th1 page. Very little of what I've tried so far works, though, so it's not proving to be a fruitful learning path.

What are some existing scripts that work out of the box that I can look at and study and learn from?

If it helps, here's a pseudo code description of my initial goal:

for *.md recursive:
    get first "#" to end of line as Title
    get next nn characters as Intro
    append [Title, Intro] to Entries

with "index.md" open for writing:
    for Entries:
        write "[Title][path/to/my_page.md]"
        write "_Intro ..._"
    save and close

render index.md

I'm using Fossil 2.17 on Windows.

(2) By Stephan Beal (stephan) on 2021-08-20 22:27:21 in reply to 1 [link] [source]

What are some existing scripts that work out of the box that I can look at and study and learn from?

We don't really have a catalog of th1 scripts other than those in the skins/*/* files.

Of use, however:

fossil test-th-render
fossil test-th-eval
fossil test-th-script (what the real difference is between this and test-th-render, i'm not sure)

The difference between render and eval is that the latter expects its input to be only th1 code, whereas the former expects it to be a text file with th1 code wrapped in <th1>...</th1> blocks.

Be sure to see fossil help test-th-... for details.

with "index.md" open for writing:

To the best of my recollection, th1 doesn't(?) offer any functions which either read or write files. The type of amalgamation you're trying to do needs to be done as part of a build process or by adding full TCL support to fossil's TH1: see ./configure --help for some details but i have zero experience with adding TCL support so can't say anything useful about it.
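As a rough illustration of the "build process" route, here is a minimal Python sketch of the pseudocode from the first post, run outside of Fossil before committing. The function names (`extract_entry`, `build_index`), the 100-character intro length, and the use of inline `[Title](path)` markdown links are assumptions for illustration, not anything Fossil provides.

```python
from pathlib import Path


def extract_entry(md_path, intro_chars=100):
    """Return (title, intro) for one markdown file, or None if it has no heading.

    The title is the text of the first line starting with '#'; the intro is
    the next `intro_chars` characters after that line, whitespace-collapsed.
    """
    lines = md_path.read_text(encoding="utf-8").splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("#"):
            title = line.lstrip("#").strip()
            rest = "".join(lines[i + 1:]).strip()
            intro = " ".join(rest[:intro_chars].split())
            return title, intro
    return None


def build_index(root, index_name="index.md", intro_chars=100):
    """Write root/index_name with one linked entry per *.md file under root."""
    root = Path(root)
    entries = []
    for md in sorted(root.rglob("*.md")):
        if md == root / index_name:
            continue  # do not index the index itself
        entry = extract_entry(md, intro_chars)
        if entry is None:
            continue  # skip files without a '#' heading
        title, intro = entry
        rel = md.relative_to(root).as_posix()
        entries.append(f"[{title}]({rel})\n\n_{intro} ..._\n")
    (root / index_name).write_text("\n".join(entries), encoding="utf-8")
    return entries
```

Calling `build_index("path/to/checkout")` would regenerate index.md, which can then be committed and rendered by Fossil like any other embedded doc. Note that this naive version would also treat a `#` line inside a code block as a heading.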

(3) By mistachkin on 2021-08-21 00:59:38 in reply to 2 [link] [source]

I think you meant "test-th-source" (which evaluates a file), not "test-th-script"...

(4) By Stephan Beal (stephan) on 2021-08-21 01:36:31 in reply to 3 [link] [source]

I think you meant "test-th-source" (which evaluates a file), not "test-th-script"...

Indeed! test-th-eval accepts either a file or script string in the meantime, so test-th-source might now be redundant.

(5.1) By matt w. (maphew) on 2021-08-26 03:11:56 edited from 5.0 in reply to 2 [link] [source]

Thanks for that much speedier path to testing and seeing results Stephan.

Ok, so I've learned how to do a simple for loop:

puts "\n---------------------------------------\n"
puts "# A simple FOR loop, list the items in VALUES variable\n"
set values {one two three four five}
foreach x $values {puts "  - $x\n"}
puts "---------------------------------------\n"

result:

---------------------------------------
# A simple FOR loop, list the items in VALUES variable
  - one
  - two
  - three
  - four
  - five
---------------------------------------

What I haven't managed to figure out is capturing the result of another command into something that could be used in a loop. I've gotten as far as capturing the names of environment variables into a th1 var, but am stuck on getting to the values inside the var.

# find and save environment variable names
catch "info vars" env_vars
# print the var names
puts $env_vars

Now let's try to retrieve var contents:

puts "\n---------------------------------------\n"
puts " # Environment variable values:\n"
foreach x $env_vars {puts "  - $x\n"}
foreach x $env_vars {setting $x}

Natch:

---------------------------------------
 # Environment variable values:
  - values
  - x
  - tcl_platform

(6) By Stephan Beal (stephan) on 2021-08-26 03:54:10 in reply to 5.1 [link] [source]

What I haven't managed to figure out is capturing the result of another command into something that could be used in a loop.

catch "info vars" env_vars

Try (no pun intended):

set env_vars [info vars]

(7) By stevel on 2021-08-26 03:58:00 in reply to 5.1 [link] [source]

There is a tutorial on the Tclers Wiki at https://wiki.tcl-lang.org/page/Tcl+Tutorial+Lesson+0. While th1 is only a subset of Tcl, that will nevertheless get you started.

(8) By matt w. (maphew) on 2021-08-26 09:04:55 in reply to 6 [link] [source]

That seems to have identical results? (though I welcome learning that [...] can be used like backticks in a bash shell).

(9) By matt w. (maphew) on 2021-08-26 09:05:49 in reply to 7 [link] [source]

Thank you! What a marvelously clean and informative site.

(10.1) By matt w. (maphew) on 2021-08-26 09:49:20 edited from 10.0 in reply to 5.1 [link] [source]

After much experimentation I've learned a critical point: one must initialize or open the repo within the session using checkout 1 before th1-eval can use it. After this, things like globalState checkout will work. More importantly for my project, dir can now work for identifying the list of files to operate on.

checkout 1
catch "globalState checkout" checkout
catch "globalState repository" repository
puts "Checkout: $checkout\n"
puts "Repository: $repository\n"
puts "\n---------------------------------------\n"
puts "Can we get list of filenames?\n"
catch "dir trunk *Install*.md" posts
foreach p $posts {puts "$checkout/$p\n"}
puts "\n---------------------------------------\n"

Out:

Checkout: C:/Users/matt/www/maphew.com/
Repository: C:/Users/matt/www/maphew.com.fossil

---------------------------------------
Can we get list of filenames?
C:/Users/matt/www/maphew.com//blog/HowTo/Installing_Leo_Editor_on_Windows.md
...snip...

Closing in on showing the contents of the files, it looks like we need to use the artifact ID. However, there is still a critical unsolved bit: how to get the all-important ID? In the example below, 'ec82fe99a4' is the .md file's ID. I haven't yet found a way to get that hash number using the path info we retrieved above.

set fileid ec82fe99a4
catch "artifact $fileid" article

This next bit does the same thing, but going from the checkout tag ID instead of a specific file. The ID was manually recorded from the fossil info command line:

set trunkid 0ed9438f5
catch "artifact $trunkid blog/HowTo/Installing_Leo_Editor_on_Windows.md" article

# just print the first 100 characters
puts [string range "$article" 0 100]...

Out:

---------------------------------------
#  Installing Leo Editor on Windows

*This is for a virgin system without python etc installed. ...
---------------------------------------

(11) By Stephan Beal (stephan) on 2021-08-26 11:53:56 in reply to 10.1 [link] [source]

I haven't yet found a way to get that hash number using the path info we retrieved above.

i think the only way to do that with th1 is the query command. However, it will require "getting somewhat comfortable" with the fossil db schema (keeping in mind that most of the schema is an internal implementation detail which can change at any time (but rarely does)).

In any case, there is no 1-to-1 mapping of file name to hash. Any given file name can reference any number of hashes across any number of checkins, and may even refer to semantically different files which had, at some point in their history, the same name.

From the CLI:

sqlite> select * from files_of_checkin where checkinID=53032 limit 2;
53032,'.dockerignore','cd9c1b2de67ec60db244641291e2b5d8288e1d25fca4bf5a322592a98e3680d5',NULL,NULL
53032,'.editorconfig','132c5a213aa3ce13dcc9c19f8a7ea306e3640bec4ae693378116cee339c34a1a',NULL,NULL

where 53032 is the blob.rid value of a checkin. The hashes seen there are those of the file for that checkin.

My personal opinion is that trying to do long/complex logic in th1 is generally not worth the hassle, primarily because th1's error reporting capabilities are extremely limited. Imagine having 200 lines of th1 and the result of running them is simply "Error," with no indication of which line the problem is on.

If you're wanting to do any serious amount of scripting, i'd strongly recommend enabling the full tcl integration, which allows th1 to run tcl scripts.

[stephan@nuc:~/fossil/fossil]$ ./configure --help
Usage: configure [options] [settings]

...
  --with-tcl=path                     Enable Tcl integration, with Tcl in the specified path
  --with-tcl-stubs                    Enable Tcl integration via stubs library mechanism
  --with-tcl-private-stubs            Enable Tcl integration via private stubs mechanism

However, i have zero experience with that and cannot recommend how to go about adding this feature and then making use of it. The th1 docs describe several commands for running tcl code, though (search that file for "tcl").

(12) By matt w. (maphew) on 2021-08-27 04:07:54 in reply to 11 [link] [source]

A possible workaround for getting the file hash ID without having to learn the database's internal structure: "the 'fossil setting manifest on' command will cause the manifest file to be materialized to disk" (ref).

The resulting manifest file uses a very simple space-delimited filename/ID format, so parsing it to get an individual file ID is straightforward, provided we can get to the filesystem with th1.

Oh! Better yet is manifest.uuid, which contains one thing: the current checkout ID. So we don't have to do the step of searching for a file ID ourselves, as th1 can do that for us in one step: artifact $trunkid path/to/readme.md

Still have to solve opening and reading the file though.
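For reference, reading those two files from outside th1 (for example in a build script run inside a checkout) might look like the Python sketch below. It assumes the manifest setting is on so that manifest and manifest.uuid exist in the checkout root, and that file entries are the "F" lines of the form `F filename hash ...`, with spaces in names encoded as `\s`. The function names are hypothetical, other escape sequences are not decoded, and F lines without a hash are skipped.

```python
from pathlib import Path


def parse_manifest_files(manifest_text):
    """Map file names to artifact hashes using the F lines of a manifest."""
    files = {}
    for line in manifest_text.splitlines():
        parts = line.split(" ")
        # An F line looks like: F <filename> <hash> [permissions] [old-name]
        if len(parts) >= 3 and parts[0] == "F":
            name = parts[1].replace("\\s", " ")  # simplification: only \s is decoded
            files[name] = parts[2]
    return files


def read_checkout_id(checkout_dir):
    """Return the check-in hash stored in manifest.uuid of a checkout."""
    return (Path(checkout_dir) / "manifest.uuid").read_text(encoding="utf-8").strip()
```

With these, `parse_manifest_files(Path("manifest").read_text())["blog/index.md"]` would give the artifact ID of that file in the current checkout, assuming the file is listed there.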

(13) By Stephan Beal (stephan) on 2021-08-27 04:36:38 in reply to 12 [link] [source]

provided we can get to the filesystem with th1.

We can't, though. th1 is intended primarily for use in skins and tickets, and those are not intended to have any access to filesystem-level resources. From a security perspective, every possibility to access filesystem-level resources via the web is a security hole waiting to happen. (Just ask the PHP guys and their clients, like WordPress (probably the single-most-often-cracked software on the planet).) Fossil's only access to the filesystem is sandboxed via the sqlite dbs it uses, with the optional exception of the /ext path, which can provide access to whatever the client cares to stick in that path.

Still have to solve opening and reading the file though.

You can stop looking right now ;). Even if th1 provided filesystem-level access, that wouldn't help you with a remote repository, as they have no associated checkout and therefore no manifest.* files.

Hypothetically the th1 tcl integration offers a possibility for filesystem access, but i've never used/needed it.

(14) By matt w. (maphew) on 2021-08-27 15:00:03 in reply to 13 [link] [source]

Aww, well. Nothing to do then but keep diving. Can you suggest a doc or discussion that outlines the db table and field structure? Reading the results of 'database' search hasn't been too revealing. (Well, it's revealed a lot, and I'm learning, but table and columns names have not been discussed so far.)

(15) By Stephan Beal (stephan) on 2021-08-27 18:31:27 in reply to 14 [link] [source]

Can you suggest a doc or discussion that outlines the db table and field structure?

It isn't documented in detail because it's all private implementation details which are subject to change at any time, but src/schema.c contains the terse docs. Fossil has a global --sqltrace flag which tells it to send all queries it runs to stdout, which can be extremely helpful in studying how it uses its db but can also, depending on which command is running, generate tons and tons of output.

(16.1) By matt w. (maphew) on 2021-08-31 10:16:18 edited from 16.0 in reply to 1 [source]

YES! Thanks to patient guidance from Stephan Beal, here is a th1 script that fetches and displays (to console) the first few characters from a markdown file from the latest commit to trunk.

puts "\n=======================================\n"
puts "Reinitializing TH1 interpreter [reinitialize]\n"
# open the repo
puts "[repository 1]\n"

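# tag value to look up (assumed to be 'trunk'; not set in the script as posted)
set postBranch trunk

# get the highest reference id associated with 'trunk' tag (latest commit)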
query {
    SELECT max(rid) as rid, value FROM tagxref where value like $postBranch
    } {
    puts "$rid, $value\n"
    }

# get the blob uuid associated with that ref-id 
# (The same hash as would be reported from `fossil info trunk`)
query {
    SELECT rid,uuid FROM blob WHERE rid=$rid
    } {
    puts "$rid, $uuid\n"
    }

# from that uuid get content of a file
catch "artifact $uuid blog/index.md" article
puts "\n---------------------------------------\n"
puts [string range "$article" 0 100]...
puts "\n---------------------------------------\n"
puts "\n=======================================\n"
puts ""

Output:

=======================================
Reinitializing TH1 interpreter
D:/Matt/www/maphew.com-fossil/maphew.com.fossil
trunk, 432
432, 9f6d0f581b494a1101f8eb473551f847eecaeada64545ef7ea134af17ab37dd6,
---------------------------------------
# Sitemap

-   [Home](index.md)
    -   [About Maphew](About_Maphew.md)
    -   Batch files:
        ...
---------------------------------------

=======================================

2021-08-31: updated to fix bug in max(rid), courtesy of Andreas