Preface

This document is aimed at people who wish to write their own REXX script that utilizes the WWW's Common Gateway Interface (CGI).

Along with this document are some "helper" REXX scripts that your main REXX (CGI) script may call to facilitate dealing with a CGI interface. These helper scripts are based upon the cgi-lib.rxx REXX library of functions, modified to suit the Reginald REXX interpreter for Windows.

For more information on Server Side CGI, see the WWW Virtual Library. Also, you may wish to peruse the Web Development Center.

Since there are security and other risks associated with executing user scripts in a WWW server, you may wish to first view a document providing information on a SLAC Security Wrapper for users' CGI scripts. Besides improving security, this wrapper also simplifies the task of writing a CGI script for a beginner.

Before embarking on writing a script, you may also want to check out some rough notes on SLAC Web Utilities Provided by CGI Scripts.

What is a CGI interface?

The CGI is an interface for running external programs, or gateways, under an information server. Currently, the supported information servers are HTTP (HyperText Transfer Protocol, the protocol used by the WWW) servers.

Gateway programs are executable programs which can be run by themselves (but you wouldn't want to except for debugging purposes). They have been made executable to allow them to run under various (possibly very different) information servers interchangeably. The CGI can run such a gateway program, pass it data from a form on a web page, and have the gateway program return content to be sent to the client.

To be able to run a REXX script as a gateway program, you need to tell the CGI to run Reginald's script launcher as the gateway program (and also pass the name of the script that you wish to run). A special version of Reginald's script launcher has been included, called REGINALD.EXE. You should use this to run your script as a gateway program (rather than using RXLAUNCH.EXE).

To determine how to set up your server to run REGINALD.EXE and pass it the name of the script to run, consult the documentation supplied with your server.

Getting data (input) from the CGI interface

The input may be sent to your script in several ways depending on the client's Uniform Resource Locator (URL) or HyperText Markup Language (HTML) Form.

QUERY_STRING

The CGI may set the environment variable named QUERY_STRING to any text which follows the first ? in the URL used to access your gateway. Such text could be added by an HTML ISINDEX document, or by an HTML Form (with METHOD="GET"). It could also be manually embedded in an HTML hypertext link, or anchor, which references your gateway. This text will usually be an information query (e.g. what the user wants to search for in databases) or perhaps the encoded results of your feedback Form. The text can be retrieved in REXX like so:
string = GETENV('QUERY_STRING')
This text is encoded in the standard URL format, which changes spaces to + and encodes special characters as %xx, where xx is the character's hexadecimal code. You will need to decode it in order to use it. The included REXX function CgiDeweb.rex shows how to decode those special characters.
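
As a rough illustration of the decoding just described, here is a minimal sketch. UrlDecode is a hypothetical name chosen for this example and is not the bundled CgiDeweb.rex helper, whose implementation may differ. It first turns each + into a space, then replaces every valid %xx sequence with the character it stands for.

```rexx
/* Sketch only: UrlDecode is a hypothetical routine, not CgiDeweb.rex */
SAY UrlDecode('hello+world%21')      /* displays: hello world! */
EXIT 0

UrlDecode: PROCEDURE
  PARSE ARG text
  /* Do this first, so that an encoded plus sign (%2B) survives as '+' */
  text = TRANSLATE(text, ' ', '+')
  out = ''
  DO FOREVER
    p = POS('%', text)
    IF p = 0 THEN LEAVE
    hex = SUBSTR(text, p + 1, 2)     /* padded with blanks if text is short */
    out = out || LEFT(text, p - 1)
    IF LENGTH(STRIP(hex)) = 2 & DATATYPE(hex, 'X') THEN DO
      out = out || X2C(hex)          /* valid %xx: emit the character */
      text = SUBSTR(text, p + 3)
    END
    ELSE DO
      out = out || '%'               /* malformed: keep the % literally */
      text = SUBSTR(text, p + 1)
    END
  END
  RETURN out || text
```

When decoding Form data, apply a routine like this to each name and each value after splitting, not to the whole string, so that an encoded & or = inside a value is not mistaken for a separator.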

If your server is not decoding results from a Form, the CGI will also pass the query string (decoded for you) onto the command line. This means that your REXX script can get the query string via a PARSE ARG instruction.

For example, if you have the URL...

http://www.slac.stanford.edu/cgi-bin/foo?hello+world
...and you use the REXX instruction...
PARSE ARG Arg1 Arg2
...then Arg1 will contain hello and Arg2 will contain world (i.e. the + sign is replaced with a space, so PARSE ARG will break up the argument at that space, into Arg1 and Arg2). If you choose to use the PARSE ARG to retrieve the input, you need to do less processing on the data before using it because it is already decoded.

PATH_INFO

Much of the time, you will want to send data to your gateways which the client shouldn't muck with. Such information could be the name of the Form which generated the results they are sending.

CGI allows for extra information to be embedded in the URL for your gateway which can be used to transmit extra context-specific information to the scripts. This information is usually made available as "extra" information after the path of your gateway in the URL. This information is not encoded by the server in any way. The CGI sets the environment variable named PATH_INFO to this extra text, and your REXX script can retrieve it like so:

string = GETENV('PATH_INFO')
To illustrate, let's say I have a REXX script which is accessible to my server with the name foo. When I access foo from a particular document, I want to tell foo that I'm currently in the English language directory, not the Pig Latin directory. In this case, I could access my script in an HTML document as:

<A HREF="http://www/cgi-bin/foo/language=english">foo</A>

When the server executes foo, it will set PATH_INFO to /language=english, and my script can retrieve this with GETENV(), parse it, and act accordingly.
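
A minimal sketch of how foo might act on that value follows. The variable names, the piglatin value, and the default choice are assumptions made for illustration. Because the client can edit the URL, the sketch only accepts values it recognizes and falls back to a default otherwise.

```rexx
/* In a real CGI run this would be: pathInfo = GETENV('PATH_INFO') */
pathInfo = '/language=english'

/* Skip the leading slash, then split at the first equal sign */
PARSE VAR pathInfo '/' name '=' value

/* Accept only known values; anything else gets the default */
SELECT
  WHEN name \== 'language' THEN docDir = 'english'
  WHEN value == 'piglatin' THEN docDir = 'piglatin'
  OTHERWISE docDir = 'english'
END

SAY 'Serving documents from:' docDir
```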

The PATH_INFO and the QUERY_STRING may be combined. For example, consider the URL:

http://www/cgi-bin/htimage/usr/www/img/map?404,451
The above URL will cause the server to run the script called htimage. It would pass the remaining path information "/usr/www/img/map" to htimage in the PATH_INFO environment variable, and pass "404,451" in the QUERY_STRING variable. In this case, htimage is a script for implementing active maps supplied with the CERN HTTPD.
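
To show how a script could pick apart the two pieces in the example above, here is a small sketch. It is not how htimage itself works; the values are hard-coded so it can be tried from a command line, and in a CGI run they would come from GETENV().

```rexx
pathInfo = '/usr/www/img/map'    /* GETENV('PATH_INFO') in a CGI run */
query    = '404,451'             /* GETENV('QUERY_STRING') in a CGI run */

/* Split the query at the comma and check both parts are whole numbers */
PARSE VAR query x ',' y
IF DATATYPE(x, 'W') & DATATYPE(y, 'W') THEN
  SAY 'Map' pathInfo 'clicked at x=' || x 'y=' || y
ELSE
  SAY 'Expected two whole numbers, got:' query
```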

Standard Input

If a web page has METHOD="POST" in its FORM tag, your REXX script will receive the encoded Form input via the standard input stream. (That is, the CGI will write the data to standard input before your script starts, and that data will be waiting for your script to read it via CHARIN(). You omit the filename argument to CHARIN() in order to read from the standard input stream.) The CGI will also set the environment variable named CONTENT_LENGTH to the number of characters in the data.

Here's how you would read the data:

data = CHARIN(, 1, GETENV('CONTENT_LENGTH'))

Note: Once you have read that data via CHARIN(), you cannot retrieve it again with another call to CHARIN(). So, if you need to call another script that requires access to that data, pass the data to the other script as an argument; the other script can use a USE ARG (or PARSE ARG, or ARG()) instruction to access it.
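
The following sketch puts the note into practice: it reads the posted data once, guarding against a missing or non-numeric CONTENT_LENGTH, and then hands the same string to a second routine. Report is a hypothetical internal routine standing in for another script; an external script would be called in the same way.

```rexx
/* Read the POSTed data exactly once */
len = GETENV('CONTENT_LENGTH')
formData = ''
IF DATATYPE(len, 'W') THEN
  IF len > 0 THEN formData = CHARIN(, 1, len)

SAY 'Content-type: text/plain'
SAY
CALL Report formData      /* pass the data along rather than re-reading it */
EXIT 0

/* Report: hypothetical routine that receives the data as an argument */
Report: PROCEDURE
  PARSE ARG data
  SAY 'Received' LENGTH(data) 'characters of Form data'
  RETURN
```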

You can review the REXX script testinput.rex for an example of how to read the various forms of input into your script.

The helper functions CgiReadForm.rex and CgiReadPost() may be used to simplify the task of reading input from a Form.

Parsing forms input

When you write a Form, each of your input items has a name tag. When the user places data in these items in the Form, that information is encoded into the Form data. The data entered by the user in each input item is called its value.

Form data is a stream of name=value pairs separated by the ampersand (&) character. Each name=value pair is URL encoded (i.e. spaces are changed into plus signs and some characters are encoded into hexadecimal). To decode the Form data, you must first parse the Form data block into separate name=value pairs tossing out the ampersands. Then you must parse each name=value pair into the separate name and value. Use the first equal sign you encounter to split the data. (If there is more than one, then something is wrong with the data). Toss out the equal sign. Finally, undo the URL encoding of each name and value. The helper function CgiGetVariables.rex can perform this task and stuff the results into stem variables of your choosing.
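
The steps above can be sketched as follows. This is an illustration only and not the bundled helper. The stem names fname. and fvalue. and the UrlDecode routine are hypothetical names chosen for this example.

```rexx
formData = 'name=Les+Cottrell&lang=REXX%21'   /* sample input */

count = 0
DO WHILE formData \= ''
  /* Step 1: peel off one pair, tossing out the ampersand */
  PARSE VAR formData pair '&' formData
  IF pair = '' THEN ITERATE
  /* Step 2: split at the first equal sign, tossing it out */
  PARSE VAR pair name '=' value
  /* Step 3: undo the URL encoding of each half separately */
  count = count + 1
  fname.count  = UrlDecode(name)
  fvalue.count = UrlDecode(value)
END
fname.0 = count                               /* number of pairs found */

DO i = 1 TO fname.0
  SAY fname.i '=' fvalue.i
END
EXIT 0

/* UrlDecode: hypothetical decoder for + and %xx sequences */
UrlDecode: PROCEDURE
  PARSE ARG text
  text = TRANSLATE(text, ' ', '+')
  out = ''
  DO FOREVER
    p = POS('%', text)
    IF p = 0 THEN LEAVE
    hex = SUBSTR(text, p + 1, 2)
    out = out || LEFT(text, p - 1)
    IF LENGTH(STRIP(hex)) = 2 & DATATYPE(hex, 'X') THEN DO
      out = out || X2C(hex)
      text = SUBSTR(text, p + 3)
    END
    ELSE DO
      out = out || '%'
      text = SUBSTR(text, p + 1)
    END
  END
  RETURN out || text
```

Decoding after the split, rather than before, keeps an encoded & or = inside a value from being treated as a separator.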

When parsing the name and value information in the script, you need to be aware that:

Sending a document back to the CGI interface (client)

Your script can return a myriad of document types. You can send back an image to the client, an HTML document, a plaintext document, a PostScript document, or perhaps even an audio clip of your bodily functions. You can also return references to other documents (to save space we will ignore this latter case here; more information may be found in NCSA's CGI Primer). The client must know what kind of document you're sending it so it can present it accordingly. In order for the client to know this, your script must tell the CGI interface what type of document it is returning.

To send data back to the server, you simply use the SAY instruction (followed by the data you wish to send). Each time you use SAY, another line of data is sent to the CGI interface (and ultimately, the client).

In order to tell the CGI interface what kind of document you are sending back, you must first send back a "header". This consists of two lines that you must SAY (i.e., you'll use two SAY instructions to send the header).

The first line must indicate the MIME type of the document you will be outputting. Typically, there is a content type, followed by a slash, and then a sub-type.

Some common MIME types are:

In order to tell the CGI interface your output's content type, the first line you SAY should be in the format:
Content-type: type/subtype
where type/subtype is the MIME type and subtype for your output, as listed above.

The second line should be blank (i.e., you use a lone SAY instruction, with nothing after it). Once the CGI interface retrieves this line, it knows that you're finished telling it about your document type, and that you will now begin SAY'ing the actual content of your document. If you skip this second line, the CGI interface will attempt to parse your output trying to find further information about your document type and you will become very unhappy.

For example, if you wish to send back an HTML document, your header would be sent like so:

SAY 'Content-type: text/html'
SAY
/* Here you would SAY the actual HTML contents starting with an <HTML> tag */
The helper function CgiPrintHeader.rex can assist in outputting a header.

After these two lines have been SAY'ed, anything more you SAY will be included in the document sent to the client. This output must be consistent with the Content-type header. For example, if the header specified Content-type: text/html then the following lines you SAY must include HTML formatting such as using <BR> or <P> for starting new lines or <PRE> to remove HTML's automatic formatting.

For example, here we write a simple web page that says "Hello world":

MyTitle = 'Hello World'   /* set the title; an unset REXX variable would output its own name */
SAY 'Content-type: text/html'
SAY
SAY "<HTML><HEAD><TITLE>"
SAY MyTitle
SAY "</TITLE></HEAD><BODY><H1>Hello World</H1></BODY></HTML>"
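
For contrast, here is a sketch of the same idea with a plain text document. With Content-type: text/plain the client should show each line as written, so no HTML formatting is needed, and any tags in the output would normally appear literally rather than being interpreted.

```rexx
SAY 'Content-type: text/plain'
SAY
SAY 'Hello World'
SAY 'Each SAY starts a new line, and <B>tags</B> are shown as typed.'
```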

Diagnostics and reporting errors

If your script encounters errors (e.g. no input provided when you need it, invalid characters found in the input, a request to execute an invalid command, or invalid syntax or an undefined variable encountered in the REXX script), it should SAY detailed information about what went wrong to the CGI interface so that the information is relayed to the client.

CgiError.rex demonstrates writing out an HTML document with an error message.
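
As a sketch of the kinds of error reporting described above, the script below traps REXX SYNTAX and NOVALUE conditions and also reports missing input. It does not use the bundled helper; ErrorPage and Trouble are hypothetical names for this example. Note that if a header has already been sent when the error occurs, the second header will simply appear as text in the page.

```rexx
/* Trap REXX-level failures so the client sees a page instead of nothing */
SIGNAL ON SYNTAX  NAME Trouble
SIGNAL ON NOVALUE NAME Trouble

query = GETENV('QUERY_STRING')
IF query = '' THEN CALL ErrorPage 'No input was provided.'

SAY 'Content-type: text/plain'
SAY
SAY 'You asked for:' query
EXIT 0

Trouble:
  errLine = SIGL                    /* line of the failing instruction */
  CALL ErrorPage CONDITION('C') 'condition raised at line' errLine

/* ErrorPage: hypothetical routine that reports a message and stops */
ErrorPage:
  PARSE ARG message
  SAY 'Content-type: text/html'
  SAY
  SAY '<HTML><HEAD><TITLE>Script error</TITLE></HEAD><BODY>'
  SAY '<H1>Script error</H1><P>' || message || '</P>'
  SAY '</BODY></HTML>'
  EXIT 1
```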

Running your script on the server

To get your Web server to execute a CGI script you must:

Other Sources of Interest

Helper functions and examples

Index of REXX CGI Functions
Function               Owner     Group  Comment
minimal.rex            cottrell  sf     A simple example of a Form CGI Script
testinput.rex          Mwww      oh     An example to show processing of input
CgiCleanQuery.rex      cottrell  sf     Removes all occurrences of unassigned variables from a CGI query string
CgiError.rex           cottrell  sf     Sends an error HTML page to the CGI interface
CgiDelQuery.rex        cottrell  sf     Removes an item from a CGI query string
CgiDeweb.rex           cottrell  sf     Converts ASCII Hex coded %XX to ASCII characters
CgiFullUrl.rex         cottrell  sf     Returns the complete CGI query URL
CgiHtmlBot.rex         cottrell  sf     Returns the HTML tags at the end of a page
CgiHtmlTop.rex         cottrell  sf     Returns the HTML title and h1 tags at the top of a page
CgiHTtab               cottrell  sf     Converts a tab delimited file to an HTML table
CgiMethGet.rex         cottrell  sf     Returns true if the form is using METHOD="GET"
CgiMethPost.rex        cottrell  sf     Returns true if the form is using METHOD="POST"
CgiMyUrl.rex           cottrell  sf     Adds the URL of the script to the page
CgiPrintHeader.rex     cottrell  sf     Returns the Content-type header (to SAY)
CgiPrintVariables.rex  cottrell  sf     Adds a listing of the Form name=value& variables to the page
CgiReadForm.rex        cottrell  sf     Reads a Form's "GET" or "POST" input and returns it decoded
CgiReadPost            cottrell  sf     Reads the standard input for a form with METHOD="POST"
CgiStripHtml.rex       cottrell  sf     Removes HTML tags from a string
CgiWebify              cottrell  sf     Encodes special characters in hex ASCII %XX form

REXX Routines to Manipulate CGI input
cottrell@slac.stanford.edu
http://www.slac.stanford.edu/~cottrell.html/cottrell.html

These routines are modelled on a set of Perl routines from S.E.Brenner@bioc.cam.ac.uk, with some additions suggested by "Gateway Programming I: ..." in "HTML and CGI Unleashed" by John December and Mark Ginsburg, published by Sams/Macmillan.

For more information on Steve's functions, see:
http://www.bio.cam.ac.uk/web/form.html
http://www.seas.upenn.edu/~mengwong/forms/

For more information on "HTML and CGI Unleashed" see:
http://www.rpi.edu/~decemj/works/wdg.html

This document and/or portions of the material and data furnished herewith, was developed under sponsorship of the U.S. Government. Neither the U.S. nor the U.S.D.O.E., nor the Leland Stanford Junior University, nor their employees, nor their respective contractors, subcontractors, or their employees, makes any warranty, express or implied, or assumes any liability or responsibility for accuracy, completeness or usefulness of any information, apparatus, product or process disclosed, or represents that its use will not infringe privately-owned rights. Mention of any product, its manufacturer, or suppliers shall not, nor is it intended to, imply approval, disapproval, or fitness for any particular use. The U.S. and the University at all times retain the right to use and disseminate same for any purpose whatsoever.

Copyright (c) Stanford University 1995, 1996.

Permission granted to use and modify this library so long as the copyright above is maintained, modifications are documented, and credit is given for any use of the library.

Acknowledgements

Much of the text on the Common Gateway Interface and Forms comes from NCSA documents. Useful information and text was also obtained from The World-Wide Web: How Servers Work, by Mark Handley and John Crowcroft, published in ConneXions, February 1995.


Les Cottrell