Fossil: serverext.wiki

File www/serverext.wiki from the latest check-in

<title>CGI Server Extensions</title>

<h2>1.0 Introduction</h2>

If you have a [./server/|Fossil server] for your project,
you can add [./aboutcgi.wiki|CGI]
extensions to that server.  These extensions work like
any other CGI program, except that they also have access to the Fossil
login information and can (optionally) leverage the "[./customskin.md|skins]"
of Fossil so that they appear to be more tightly integrated into the project.

An example of where this is useful is the 
[https://sqlite.org/src/ext/checklist|checklist application] on
the [https://sqlite.org/|SQLite] project.  The checklist
helps the SQLite developers track which release tests have passed,
or failed, or are still to be done.  The checklist program began as a
stand-alone CGI which kept its own private user database and implemented
its own permissions and login system and provided its own CSS.  By
converting checklist into a Fossil extension, the same login that works 
for the [https://sqlite.org/src|main SQLite source repository] also works
for the checklist.  Permission to change elements of the checklist
is tied on permission to check-in to the main source repository.  And
the standard Fossil header menu and footer appear on each page of
the checklist.

<h2>2.0 How It Works</h2>

CGI Extensions are disabled by default.
An administrator activates the CGI extension mechanism by specifying
an "Extension Root Directory" or "extroot" as part of the 
[./server/index.html|server setup].
If the Fossil server is itself run as 
[./server/any/cgi.md|CGI], then add a line to the 
[./cgi.wiki#extroot|CGI script file] that says:

<pre>
extroot: <i>DIRECTORY</i>
</pre>

Or, if the Fossil server is being run using the 
"[./server/any/none.md|fossil server]" or
"[./server/any/none.md|fossil ui]" or 
"[./server/any/inetd.md|fossil http]" commands, then add an extra 
"--extroot <i>DIRECTORY</i>" option to that command.

The <i>DIRECTORY</i> is the DOCUMENT_ROOT for the CGI.
Files in the DOCUMENT_ROOT are accessed via URLs like this:

<pre>
https://example-project.org/ext/<i>FILENAME</i>
</pre>

In other words, access files in DOCUMENT_ROOT by appending the filename
relative to DOCUMENT_ROOT to the [/help?cmd=/ext|/ext]
page of the Fossil server.
Files that are readable but not executable are returned as static
content.  Files that are executable are run as CGI.

<h3>2.1 Example #1</h3>

The source code repository for SQLite is a Fossil server that is run
as CGI.  The URL for the source code repository is [https://sqlite.org/src].
The CGI script looks like this:

<verbatim>
#!/usr/bin/fossil
repository: /fossil/sqlite.fossil
errorlog: /logs/errors.txt
extroot: /sqlite-src-ext
</verbatim>

The "extroot: /sqlite-src-ext" line tells Fossil that it should look for
extension CGIs in the /sqlite-src-ext directory.  (All of this is happening
inside of a chroot jail, so putting the document root in a top-level
directory is a reasonable thing to do.)

When a URL like "https://sqlite.org/src/ext/checklist" is received by the
main webserver, it figures out that the /src part refers to the main
Fossil CGI script and so it runs that script.  Fossil gets the remainder
of the URL to work with: "/ext/checklist".  Fossil extracts the "/ext"
prefix and uses that to determine that this a CGI extension request.
Then it takes the leftover "/checklist" part and appends it to the
"extroot" to get the filename "/sqlite-src-ext/checklist".  Fossil finds
that file to be executable, so it runs it as CGI and returns the result.

The /sqlite-src-ext/checklist file is a
[https://wapp.tcl.tk|Wapp program].  The current source code to the
this program can be seen at
[https://www.sqlite.org/src/ext/checklist/3070700/self] and
recent historical versions are available at
[https://sqlite.org/docsrc/finfo/misc/checklist.tcl] with
older legacy at [https://sqlite.org/checklistapp/timeline?n1=all]

There is a cascade of CGIs happening here.  The web server that receives
the initial HTTP request runs Fossil as a CGI based on the
"https://sqlite.org/src" portion of the URL.  The Fossil instance then
runs the checklist sub-CGI based on the "/ext/checklists" suffix.  The
output of the sub-CGI is read by Fossil and then relayed on to the
main web server which in turn relays the result back to the original client.

<h3>2.2 Example #2</h3>

The [https://fossil-scm.org/home|Fossil self-hosting repository] is also
a CGI that looks like this:

<verbatim>
#!/usr/bin/fossil
repository: /fossil/fossil.fossil
errorlog: /logs/errors.txt
extroot: /fossil-extroot
</verbatim>

The extroot for this Fossil server is /fossil-extroot and in that directory
is an executable file named "fileup1" - another [https://wapp.tcl.tk|Wapp]
script.  (The extension mechanism is not required to use Wapp.  You can use
any kind of program you like.  But the creator of SQLite and Fossil is fond
of [https://www.tcl.tk|Tcl/Tk] and so he tends to gravitate toward Tcl-based
technologies like Wapp.)  The fileup1 script is a demo program that lets
the user upload a file using a form, and then displays that file in the reply.
There is a link on the page that causes the fileup1 script to return a copy
of its own source-code, so you can see how it works.

<h2 id="cgi-inputs">3.0 CGI Inputs</h2>

The /ext extension mechanism is an ordinary CGI interface.  Parameters
are passed to the CGI program using environment variables.  The following
standard CGI environment variables are supported:

  *  AUTH_TYPE
  *  AUTH_CONTENT
  *  CONTENT_LENGTH
  *  CONTENT_TYPE
  *  DOCUMENT_ROOT
  *  GATEWAY_INTERFACE
  *  HTTPS
  *  HTTP_ACCEPT
  *  HTTP_ACCEPT_ENCODING
  *  HTTP_COOKIE
  *  HTTP_HOST
  *  HTTP_IF_MODIFIED_SINCE
  *  HTTP_IF_NONE_MATCH
  *  HTTP_REFERER
  *  HTTP_USER_AGENT
  *  PATH_INFO
  *  QUERY_STRING
  *  REMOTE_ADDR
  *  REMOTE_USER
  *  REQUEST_METHOD
  *  REQUEST_SCHEME
  *  REQUEST_URI
  *  SCRIPT_DIRECTORY
  *  SCRIPT_FILENAME
  *  SCRIPT_NAME
  *  SERVER_NAME
  *  SERVER_PORT
  *  SERVER_PROTOCOL
  *  SERVER_SOFTWARE

Do a web search for
"[https://duckduckgo.com/?q=cgi+environment_variables|cgi environment variables]"
to find more detail about what each of the above variables mean and how
they are used.
Live listings of the values of some or all of these environment variables
can be found at links like these:

  *  [https://fossil-scm.org/home/test_env]
  *  [https://sqlite.org/src/ext/checklist/top/env]

In addition to the standard CGI environment variables listed above, 
Fossil adds the following:

  *  FOSSIL_CAPABILITIES
  *  FOSSIL_NONCE
  *  FOSSIL_REPOSITORY
  *  FOSSIL_URI
  *  FOSSIL_USER

The FOSSIL_USER string is the name of the logged-in user.  This variable
is missing or is an empty string if the user is not logged in.  The
FOSSIL_CAPABILITIES string is a list of 
[./caps/ref.html|Fossil capabilities] that
indicate what permissions the user has on the Fossil repository.
The FOSSIL_REPOSITORY environment variable gives the filename of the
Fossil repository that is running.  The FOSSIL_URI variable shows the
prefix of the REQUEST_URI that is the Fossil CGI script, or is an
empty string if Fossil is being run by some method other than CGI.

The [https://sqlite.org/src/ext/checklist|checklist application] uses the
FOSSIL_USER environment variable to determine the name of the user and
the FOSSIL_CAPABILITIES variable to determine if the user is allowed to
mark off changes to the checklist.  Only users with check-in permission
to the Fossil repository are allowed to mark off checklist items.  That
means that the FOSSIL_CAPABILITIES string must contain the letter "i".
Search for "FOSSIL_CAPABILITIES" in the
[https://sqlite.org/src/ext/checklist/top/self|source listing] to see how
this happens.

If the CGI output is one of the forms for which Fossil inserts its own
header and footer, then the inserted header will include a
Content Security Policy (CSP) restriction on the use of javascript within
the webpage.  Any &lt;script&gt;...&lt;/script&gt; elements within the 
CGI output must include a nonce or else they will be suppressed by the
web browser.  The FOSSIL_NONCE variable contains the value of that nonce.
So, in other words, to get javascript to work, it must be enclosed in:

<verbatim>
<script nonce='$FOSSIL_NONCE'>...</script>
</verbatim>

Except, of course, the $FOSSIL_NONCE is replaced by the value of the
FOSSIL_NONCE environment variable.

<h3>3.1 Input Content</h3>

If the HTTP request includes content (for example if this is a POST request)
then the CONTENT_LENGTH value will be positive and the data for the content
will be readable on standard input.


<h2>4.0 CGI Outputs</h2>

CGI programs construct a reply by writing to standard output.  The first
few lines of output are parameters intended for the web server that invoked
the CGI.  These are followed by a blank line and then the content.

Typical parameter output looks like this:

<verbatim>
Status: 200 OK
Content-Type: text/html
</verbatim>

CGI programs can return any content type they want - they are not restricted
to text replies.  It is OK for a CGI program to return (for example)
image/png.

The fields of the CGI response header can be any valid HTTP header fields.
Those that Fossil does not understand are simply relayed back to up the
line to the requester.

Fossil takes special action with some content types.  If the Content-Type
is "text/x-fossil-wiki" or "text/x-markdown" then Fossil
converts the content from [/wiki_rules|Fossil-Wiki] or 
[/md_rules|Markdown] into HTML, adding its
own header and footer text according to the repository skin.  Content
of type "text/html" is normally passed straight through
unchanged.  However, if the text/html content is of the form:

<verbatim>
<div class='fossil-doc' data-title='DOCUMENT TITLE'>
... HTML content there ...
</div>
</verbatim>

In other words, if the outer-most markup of the HTML is a &lt;div&gt;
element with a single class of "fossil-doc", 
then Fossil will adds its own header and footer to the HTML.  The
page title contained in the added header will be extracted from the
"data-title" attribute.

Except for the three cases noted above, Fossil makes no changes or
additions to the CGI-generated content.  Fossil just passes the verbatim
content back up the stack towards the requester.

<h3>4.1 <tt>GATEWAY_INTERFACE</tt> and Recursive Calls to fossil</h3>

Like many CGI-aware applications, if fossil sees the environment
variable <tt>GATEWAY_INTERFACE</tt> when it starts up, it assumes it
is running in a CGI environment and behaves differently than when it
is run in a non-CGI interactive session. If you intend to run fossil
itself from within an extension CGI script, e.g. to run a query
against the repository or simply fetch the fossil binary version, make
sure to <em>unset</em> the <tt>GATEWAY_INTERFACE</tt> environment
variable before doing so, otherwise the invocation will behave as if
it's being run in CGI mode.

<h2>5.0 Filename Restrictions</h2>

For security reasons, Fossil places restrictions on the names of files
in the extroot directory that can participate in the extension CGI
mechanism:

  1.  Filenames must consist of only ASCII alphanumeric characters,
      ".", "_", and "-", and of course "/" as the file separator.
      Files with names that includes spaces or
      other punctuation or special characters are ignored.

  2.  No element of the pathname can begin with "." or "-".  Files or
      directories whose names begin with "." or "-" are ignored.

If a CGI program requires separate data files, it is safe to put those
files in the same directory as the CGI program itself as long as the names
of the data files contain special characters that cause them to be ignored
by Fossil.

<h2>6.0 Access Permissions</h2>

CGI extension files and programs are accessible to everyone.

When CGI extensions have been enabled (using either "extroot:" in the
CGI file or the --extroot option for other server methods) all files
in the extension root directory hierarchy, except special filenames
identified previously, are accessible to all users.  Users do not
have to have "Read" privilege, or any other privilege, in order to
access the extensions.

This is by design.  The CGI extension mechanism is intended to operate
in the same way as a traditional web-server.

CGI programs that want to restrict access 
can examine the FOSSIL_CAPABILITIES and/or FOSSIL_USER environment variables.
In other words, access control is the responsibility of the individual
extension programs.


<h2>7.0 Trouble-Shooting Hints</h2>

Remember that the /ext will return any file in the extroot directory
hierarchy as static content if the file is readable but not executable.
When initially setting up the /ext mechanism, it is sometimes helpful
to verify that you are able to receive static content prior to starting
work on your CGIs.  Also remember that CGIs must be
executable files.

Fossil likes to run inside a chroot jail, and will automatically put
itself inside a chroot jail if it can.  The sub-CGI program will also
run inside this same chroot jail.  Make sure all embedded pathnames
have been adjusted accordingly and that all resources needed by the
CGI program are available within the chroot jail.

If anything goes wrong while trying to process an /ext page, Fossil
returns a 404 Not Found error with no details.  However, if the requester
is logged in as a user that has <b>[./caps/ref.html#D | Debug]</b> capability 
then additional diagnostic information may be included in the output.

If the /ext page has a "fossil-ext-debug=1" query parameter and if
the requester is logged in as a user with Debug privilege, then the
CGI output is returned verbatim, as text/plain and with the original
header intact.  This is useful for diagnosing problems with the
CGI script.