Fossil: How CGI Works In Fossil

Introduction

CGI or "Common Gateway Interface" is a venerable yet reliable technique for generating dynamic web content. This article gives a quick background on how CGI works and describes how Fossil can act as a CGI service.

This is a "how it works" guide: it provides background on the CGI protocol so that you can better understand what is going on behind the scenes. If you just want to set up Fossil as a CGI server, see the Fossil Server Setup page. Or, if you want to develop CGI-based extensions to Fossil, see the CGI Server Extensions page.

A Quick Review Of CGI

An HTTP request is a block of text that is sent by a client application (usually a web browser) and arrives at the web server over a network connection. The HTTP request contains a URL that describes the information being requested. The URL in the HTTP request is typically the same URL that appears in the URL bar at the top of the web browser that is making the request. The URL might contain a "?" character followed by query parameters. The HTTP request will usually also contain other information, such as the name of the application that made the request, whether or not the requesting application can accept a compressed reply, POST parameters from forms, and so forth.
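To make the "block of text" concrete, here is a minimal Python sketch that splits a raw HTTP/1.1 request into its request line, headers, query parameters, and body. The function name `parse_http_request` and the "ExampleBrowser/1.0" client name are hypothetical, and the sketch deliberately ignores details a real server must handle (Content-Length, chunked encoding, repeated headers). The User-Agent header carries the name of the requesting application, and Accept-Encoding says whether it accepts a compressed reply.

```python
from urllib.parse import urlsplit, parse_qsl

def parse_http_request(raw: bytes) -> dict:
    """Split a raw HTTP/1.x request into its parts (simplified sketch)."""
    # Headers and body are separated by the first empty line.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: METHOD SP URL SP VERSION
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    url = urlsplit(target)  # separates the path from the "?" query string
    return {
        "method": method,
        "path": url.path,
        "query": dict(parse_qsl(url.query)),
        "version": version,
        "headers": headers,
        "body": body,
    }

raw = (b"GET /one/two/timeline/four?n=20 HTTP/1.1\r\n"
       b"Host: example.com\r\n"
       b"User-Agent: ExampleBrowser/1.0\r\n"
       b"Accept-Encoding: gzip\r\n"
       b"\r\n")
req = parse_http_request(raw)
print(req["path"], req["query"], req["headers"]["accept-encoding"])
# /one/two/timeline/four {'n': '20'} gzip
```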

The job of the web server is to interpret the HTTP request and formulate an appropriate reply. The web server is free to interpret the HTTP request in any way it wants. But most web servers follow a similar pattern, described below. (Note: details may vary from one web server to another.)

Suppose the filename component of the URL in the HTTP request looks like this:

/one/two/timeline/four

Most web servers will search their content area for files that match some prefix of the URL. The search starts with /one, then goes to /one/two, then /one/two/timeline, and finally /one/two/timeline/four is checked. The search stops at the first match.

Suppose the first match is /one/two. If /one/two is an ordinary file in the content area, then that file is returned as static content. The "/timeline/four" suffix is silently ignored.

If /one/two is a CGI script (or program), then the web server executes the /one/two script. The output generated by the script is collected and repackaged as the HTTP reply.
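The prefix search and the static-versus-CGI decision can be sketched in a few lines of Python. This is an illustration of the pattern described above, not any particular web server's code: the function name `find_handler` is hypothetical, two in-memory sets stand in for the server's content area, and the sketch assumes the URL path starts with "/" and contains no repeated slashes.

```python
def find_handler(url_path, static_files, cgi_scripts):
    """Return (kind, matched_prefix, suffix) for the shortest matching prefix.

    static_files and cgi_scripts are sets of paths that stand in for the
    web server's content area (a real server would check the disk).
    """
    terms = url_path.strip("/").split("/")
    # Try /one, then /one/two, then /one/two/timeline, ... and stop at the first hit.
    for i in range(1, len(terms) + 1):
        prefix = "/" + "/".join(terms[:i])
        suffix = url_path[len(prefix):]
        if prefix in cgi_scripts:
            return ("cgi", prefix, suffix)      # run the script; suffix becomes PATH_INFO
        if prefix in static_files:
            return ("static", prefix, suffix)   # serve the file; suffix is ignored
    return ("not-found", None, url_path)

print(find_handler("/one/two/timeline/four", static_files=set(), cgi_scripts={"/one/two"}))
# ('cgi', '/one/two', '/timeline/four')
```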

Before executing the CGI script, the web server will set up various environment variables with information useful to the CGI script:

Variable	Meaning
GATEWAY_INTERFACE	Always set to "CGI/1.0"
REQUEST_URI	The input URL from the HTTP request.
SCRIPT_NAME	The prefix of the input URL that matches the CGI script name. In this example: "/one/two".
PATH_INFO	The suffix of the URL beyond the name of the CGI script. In this example: "timeline/four".
QUERY_STRING	The query string that follows the "?" in the URL, if there is one.

There are other CGI environment variables beyond those listed above. Many Fossil servers implement the test_env webpage that shows some of the CGI environment variables that Fossil pays attention to.

In addition to setting various CGI environment variables, if the HTTP request contains POST content, then the web server relays the POST content to standard input of the CGI script.

In summary, the task of the CGI script is to read the various CGI environment variables and the POST content on standard input (if any), figure out an appropriate reply, then write that reply on standard output. The web server will read the output from the CGI script, reformat it into an appropriate HTTP reply, and relay the result back to the requesting application. The CGI script exits as soon as it generates a single reply. The web server will (usually) persist and handle multiple HTTP requests, but a CGI script handles just one HTTP request and then exits.
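A minimal CGI script following that summary might look like the Python sketch below. It reads a few environment variables, reads any POST content from standard input, writes one reply (header lines, a blank line, then the body) to standard output, and exits. It relies on CONTENT_LENGTH, a standard CGI variable (RFC 3875) that is not listed in the table above; the helper name `make_reply` and the plain-text reply format are illustrative assumptions.

```python
#!/usr/bin/env python3
import os
import sys

def make_reply(environ, post_body):
    """Build one complete CGI reply from the environment and POST content."""
    text = (
        f"PATH_INFO={environ.get('PATH_INFO', '')}\n"
        f"QUERY_STRING={environ.get('QUERY_STRING', '')}\n"
        f"POST bytes={len(post_body)}\n"
    )
    # A CGI reply is header lines, a blank line, then the body.
    return "Content-Type: text/plain\n\n" + text

def main():
    # POST content (if any) arrives on standard input; CONTENT_LENGTH says how much.
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    body = sys.stdin.buffer.read(length) if length > 0 else b""
    sys.stdout.write(make_reply(os.environ, body))
    # The script then simply exits: one request, one reply.

if __name__ == "__main__":
    main()
```

The web server, not the script, turns this output into a full HTTP reply with a status line.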

The above is a rough outline of how CGI works. There are many details omitted from this brief discussion. See other on-line CGI tutorials for further information.

How Fossil Acts As A CGI Program

An appropriate CGI script for running Fossil will look something like the following:

#!/usr/bin/fossil
repository: /home/www/repos/project.fossil

The first line of the script is a "shebang" that tells the operating system what program to use as the interpreter for this script. On Unix, when you execute a script that starts with a shebang, the operating system runs the program identified by the shebang with a single argument that is the full pathname of the script itself. In our example, the interpreter is Fossil, and the argument might be something like "/var/www/cgi-bin/one/two" (depending on how your particular web server is configured).

The Fossil program that is run as the script interpreter is the same Fossil that runs when you type ordinary Fossil commands like "fossil sync" or "fossil commit". But in this case, as soon as it launches, Fossil sees that the GATEWAY_INTERFACE environment variable is set to "CGI/1.0", concludes that it is being run as CGI rather than as an ordinary command-line tool, and behaves accordingly.

When Fossil recognizes that it is being run as CGI, it opens and reads the file identified by its sole argument (the file named by argv[1]). In our example, the second line of that file tells Fossil the location of the repository it will be serving. Fossil then starts looking at the CGI environment variables to figure out what web page is being requested, generates that one web page, then exits.
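The startup sequence of the last two paragraphs can be sketched in Python as follows. This is not Fossil's source code: the names `parse_cgi_script` and `fossil_like_startup` are hypothetical, the parser only understands simple "key: value" lines (skipping blank lines and, as an assumption of this sketch, lines starting with "#", which covers the shebang), and real Fossil recognizes more directives than "repository:".

```python
import os
import tempfile

def parse_cgi_script(text):
    """Parse "key: value" lines from a Fossil-style CGI script (simplified)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skips the shebang line too
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            settings[key.strip()] = value.strip()
    return settings

def fossil_like_startup(argv, environ):
    """Return the repository to serve, or None when not running as CGI."""
    if environ.get("GATEWAY_INTERFACE") != "CGI/1.0":
        return None  # ordinary command-line use, e.g. "fossil commit"
    # argv[1] is the CGI script itself, passed in by the shebang mechanism.
    with open(argv[1]) as f:
        settings = parse_cgi_script(f.read())
    return settings.get("repository")

# Demonstration: write the example script to a temporary file and "run" it.
script = "#!/usr/bin/fossil\nrepository: /home/www/repos/project.fossil\n"
with tempfile.NamedTemporaryFile("w", suffix=".cgi", delete=False) as f:
    f.write(script)
    script_path = f.name
print(fossil_like_startup(["/usr/bin/fossil", script_path], {"GATEWAY_INTERFACE": "CGI/1.0"}))
# /home/www/repos/project.fossil
os.remove(script_path)
```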

Usually, the webpage being requested is the first term of the PATH_INFO environment variable. (Exceptions to this rule are noted below.) For our example, the first term of PATH_INFO is "timeline", which means that Fossil will generate the /timeline webpage.

With Fossil, terms of PATH_INFO beyond the webpage name are converted into the "name" query parameter. Hence, the following two URLs mean exactly the same thing to Fossil:

(A) /fossil/info/c14ecc43
(B) /fossil/info?name=c14ecc43

In both cases, the CGI script is called "/fossil". For case (A), the PATH_INFO variable will be "info/c14ecc43" and so the "/info" webpage will be generated and the suffix of PATH_INFO will be converted into the "name" query parameter, which identifies the artifact about which information is requested. In case (B), the PATH_INFO is just "info", but the same "name" query parameter is set explicitly by the URL itself.
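The routing rule just described can be sketched in Python: the first term of PATH_INFO selects the webpage, and any remaining terms become the "name" parameter. The function name `route` is hypothetical, and so is the precedence choice: this sketch lets the PATH_INFO suffix overwrite an explicit name= query parameter, which the document does not specify. It accepts PATH_INFO with or without a leading "/".

```python
from urllib.parse import parse_qsl

def route(path_info, query_string):
    """Return (webpage, params): first PATH_INFO term is the page, the rest is "name"."""
    terms = path_info.strip("/").split("/", 1)
    page = terms[0]
    params = dict(parse_qsl(query_string))
    if len(terms) > 1 and terms[1]:
        params["name"] = terms[1]  # assumption: the path suffix wins over ?name=
    return page, params

a = route("/info/c14ecc43", "")       # case (A)
b = route("/info", "name=c14ecc43")   # case (B)
print(a, b, a == b)
# ('info', {'name': 'c14ecc43'}) ('info', {'name': 'c14ecc43'}) True
```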

Serving Multiple Fossil Repositories From One CGI Script

The previous example showed how to serve a single Fossil repository using a single CGI script. On a website that wants to serve multiple repositories, one could simply create multiple CGI scripts, one script for each repository. But it is also possible to serve multiple Fossil repositories from a single CGI script.

If the CGI script for Fossil contains a "directory:" line instead of a "repository:" line, then the argument to "directory:" is the name of a directory that contains multiple repository files, each ending with ".fossil". For example:

#!/usr/bin/fossil
directory: /home/www/repos

Suppose the /home/www/repos directory contains files named one.fossil, two.fossil, and subdir/three.fossil. Further suppose that the name of the CGI script (relative to the root of the webserver document area) is "cgis/example2". Then to see the timeline for the "three.fossil" repository, the URL would be:

http://example.com/cgis/example2/subdir/three/timeline

Here is what happens:

1. The input URI on the HTTP request is /cgis/example2/subdir/three/timeline.
2. The web server searches prefixes of the input URI until it finds the "cgis/example2" script. The web server then sets PATH_INFO to the "subdir/three/timeline" suffix and invokes the "cgis/example2" script.
3. Fossil runs and sees the "directory:" line pointing to "/home/www/repos". Fossil then starts pulling terms off the front of the PATH_INFO looking for a repository. It first looks at "/home/www/repos/subdir.fossil", but there is no such repository. So then it looks at "/home/www/repos/subdir/three.fossil" and finds a repository. The PATH_INFO is shortened by removing "subdir/three/", leaving just "timeline".
4. Fossil looks at the rest of PATH_INFO to see that the webpage requested is "timeline".
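Step 3 above can be sketched in Python as a loop that tries longer and longer prefixes of PATH_INFO until "<prefix>.fossil" exists. This is an illustration, not Fossil's implementation: the name `find_repository` is hypothetical, the `exists` callback is injected so the example runs without real files, and the sketch does no sanitizing of terms such as "..", which a real server must handle.

```python
import os

def find_repository(directory, path_info, exists=os.path.isfile):
    """Pull terms off the front of PATH_INFO until <prefix>.fossil exists.

    Returns (repository_path, remaining_path_info), or (None, path_info).
    """
    terms = path_info.strip("/").split("/")
    for i in range(1, len(terms) + 1):
        candidate = directory.rstrip("/") + "/" + "/".join(terms[:i]) + ".fossil"
        if exists(candidate):
            return candidate, "/".join(terms[i:])  # e.g. "timeline"
    return None, path_info

repos = {
    "/home/www/repos/one.fossil",
    "/home/www/repos/two.fossil",
    "/home/www/repos/subdir/three.fossil",
}
print(find_repository("/home/www/repos", "subdir/three/timeline", exists=repos.__contains__))
# ('/home/www/repos/subdir/three.fossil', 'timeline')
```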

The web server sets many environment variables in step 2 in addition to just PATH_INFO. The following diagram shows a few of these variables and their relationship to the request URL:

Request URL: https://example.com/cgis/example2/subdir/three/timeline?c=55d7e1

URL component	CGI variable
example.com	HTTP_HOST
/cgis/example2	SCRIPT_NAME
/subdir/three/timeline	PATH_INFO
?c=55d7e1	QUERY_STRING (the part after the "?")
/cgis/example2/subdir/three/timeline?c=55d7e1	REQUEST_URI
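The same decomposition can be computed with Python's standard `urllib.parse.urlsplit`, given the script name that the web server matched during its prefix search. The function name `cgi_variables` is hypothetical, and the sketch covers only the variables shown above. Note that, as in the example URL above, PATH_INFO keeps its leading "/".

```python
from urllib.parse import urlsplit

def cgi_variables(url, script_name):
    """Split a request URL into a few CGI variables, given the matched script name."""
    parts = urlsplit(url)
    path = parts.path
    # The script name must be a whole-term prefix of the path.
    if path != script_name and not path.startswith(script_name + "/"):
        raise ValueError("script_name is not a prefix of the URL path")
    return {
        "HTTP_HOST": parts.netloc,
        "SCRIPT_NAME": script_name,
        "PATH_INFO": path[len(script_name):],
        "QUERY_STRING": parts.query,                      # without the "?"
        "REQUEST_URI": path + ("?" + parts.query if parts.query else ""),
    }

env = cgi_variables("https://example.com/cgis/example2/subdir/three/timeline?c=55d7e1",
                    "/cgis/example2")
for key, value in env.items():
    print(f"{key}={value}")
# HTTP_HOST=example.com
# SCRIPT_NAME=/cgis/example2
# PATH_INFO=/subdir/three/timeline
# QUERY_STRING=c=55d7e1
# REQUEST_URI=/cgis/example2/subdir/three/timeline?c=55d7e1
```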

Additional CGI Script Options

The CGI script can have additional options used to fine-tune Fossil's behavior. See the CGI script documentation for details.

Additional Observations

Fossil does not distinguish between the various HTTP methods (GET, PUT, DELETE, etc.). Fossil figures out what it needs to do purely from the webpage term of the URI.
Fossil does not distinguish between query parameters that are part of the URI, application/x-www-form-urlencoded or multipart/form-data encoded parameters that are part of the POST content, and cookies. Each information source is seen as a space of key/value pairs which are loaded into an internal property hash table. The code that runs to generate the reply can then reference various property values. Fossil does not care where the value of each property comes from (POST content, cookies, or query parameters), only that the property exists and has a value.
The "fossil ui" and "fossil server" commands are implemented using a simple built-in web server that accepts incoming HTTP requests, translates each request into a CGI invocation, then creates a separate child Fossil process to handle each request. In other words, CGI is used internally to implement "fossil ui/server".

SCGI is processed using the same built-in web server, just modified to parse SCGI requests instead of HTTP requests. Each SCGI request is converted into CGI, then Fossil creates a separate child Fossil process to handle each CGI request.
Fossil is itself often launched using CGI. But Fossil can also then turn around and launch sub-CGI scripts to implement extensions.
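The "single property table" observation can be illustrated with a small Python sketch that loads query parameters, url-encoded POST fields, and cookies into one dictionary standing in for Fossil's internal hash table. The name `load_properties` is hypothetical; the sketch handles only application/x-www-form-urlencoded POST bodies (not multipart/form-data), and the rule that later sources overwrite earlier ones is an assumption, since the document does not say which source wins on a collision.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl

def load_properties(query_string, post_body, cookie_header):
    """Merge query parameters, url-encoded POST fields, and cookies into one dict."""
    props = {}
    props.update(parse_qsl(query_string))   # from the URI
    props.update(parse_qsl(post_body))      # from url-encoded POST content
    cookies = SimpleCookie()
    cookies.load(cookie_header)             # from the Cookie: header
    props.update({key: morsel.value for key, morsel in cookies.items()})
    return props  # callers see key/value pairs, not where each came from

props = load_properties("name=c14ecc43", "n=20", "theme=dark")
print(props["name"], props["n"], props["theme"])
# c14ecc43 20 dark
```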