CGI Web Applications with Python, Part Two

Server Side Web Programming

Introduction

The previous article, CGI Web Applications with Python, Part One, explained the workings of the Common Gateway Interface (CGI), demonstrated how HTML forms are processed, and described a Python library you can use to make development of Python CGI web applications a snap.

This time, let's build on that expertise, look at some advanced CGI topics, such as CGI environment variables, HTML templating, and Unicode, and develop a complete CGI application.

All code in this article is intended to work with Python 2.2 and beyond.

On With the Show

Let's write a CGI application that displays an HTML form and interprets the results. In the Example Form shown last month, the target of the form was a script called formprocessor.py. Let's write that script.

When the form is submitted, it should call the script, which should be stored in an appropriate CGI directory on your web server (often named, aptly, cgi-bin). If the script is called without parameters, it should display a blank form. However, if the script receives parameters, it should interpret the parameters and display them.

Using the functions defined last time, receiving the parameters is easy: just use getform() and isblank(). The other requirements demand some new code, which is described just after the listing.

Formprocessor.py

Here is the complete code for formprocessor.py.

#! /usr/bin/python
import os
import sys
import cgi
import cgitb; cgitb.enable()

def getrequest(formvaluelist):
     """
     Initialize the FieldStorage and return the specified list of
     values.
     """

     return getform(cgi.FieldStorage(), formvaluelist)


def ucgiprint(inline='', unbuff=False, encoding='UTF-8'):
     """Print to the stdout.
     Includes keywords to define the output encoding
     (UTF-8 default, set to None to switch off encoding)
     and also whether we should flush the output buffer
     after every write (default not).
     """

     line_end = '\r\n'
     if encoding:
         inline = inline.encode(encoding)
         # prob. not necessary as line endings will be the
         # same in most encodings
         line_end = line_end.encode(encoding)
     sys.stdout.write(inline)
     sys.stdout.write(line_end)
     if unbuff:
         sys.stdout.flush()

# you also need getform(), isblank(), and contentheader
# from last month

def replace(instring, indict):
     """
     A convenient way of doing multiple replaces in a
     single string. E.g. for html templates. Takes a
     string and a dictionary of replacements. In the
     dictionary - each key is replaced with its value.
     We can also accept a list of tuples instead of a
     dictionary (or anything accepted by the dict
     function).
     """

     indict = dict(indict)
     for key in indict:
         instring = instring.replace(key, indict[key])
     return instring

# let's define our html values and templates
pagetemplate = '''
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
    Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml"
            lang="en" xml:lang="en">
            <head>
                    <title>Form Processor Script</title>
                    <meta http-equiv="Content-Type"
                            content="text/html;
                            charset=UTF-8" />
            </head>
            <body>
                    <h1>Welcome to Form Processor</h1>
                    <h2>A Python CGI</h2>
                    **body**
            </body>
    </html>
    '''

# note - the character encoding has been hardwired
# into the meta tag

htmlform = '''
<div class="form">
    <form action="**scriptname**" method="get">
            What is Your Name :
            <input name="name" type="text"
                    value="**the name**" />
            <br />
            <input name="radio_param" type="radio" value="this"
                    **checked1** /> Select This<br />
            <input name="radio_param" type="radio" value="that"
                    **checked2** />or That<br />
            <input name="check1" type="checkbox" **checked3**/>
                    Check This<br />
            <input name="check2" type="checkbox" **checked4**/>
                    and This Too ?<br />
            <input name="hidden_param" type="hidden"
                    value="some_value" />
            <br />
            <input type="hidden" name="_charset_" />
            <input type="reset"  />
            <input type="submit" />
    </form>
</div>
'''


divider = '<hr width="50%" />'

welcome = '''
<div class="welcome">
    <h2>Please Fill in Our Form</h2>
    <p>This CGI is an example CGI written for a <a
      href="http://www.pyzine.com">Pyzine</a> article.
      It was written by <a
      href="http://www.voidspace.org.uk/python/index.shtml">
      Michael Foord </a>.
    </p>
</div>
'''


results = '''
<div class="results">
    <h2>You Submitted a Form</h2>
    <p>Your Name is "%s" (so you claim).</p>
    <p>You selected "%s" rather than "%s".
            (The radio buttons)</p>
    <p>You %s check "This". (first checkbox)</p>
    <p>You %s check "This Too". (second checkbox)</p>
    <p>A hidden value was sent - "%s".</p>
    <p>It was all sent in "%s" character encoding.</p>
</div>
'''


# let's set up some variables
scriptname = os.environ.get('SCRIPT_NAME', '')
checked = ' checked="checked" '

formvalues = ['name', 'radio_param', 'check1', 'check2', \
   'hidden_param', '_charset_']

def main():
     """This function forms the main body of the script."""
     # print the content header
     ucgiprint(contentheader, encoding=None)
     # followed by a blank line
     ucgiprint('', encoding=None)

     # Instantiate cgi.FieldStorage() *and*
     # extract all our parameters from it
     formdict = getrequest(formvalues)

     # do we have a form submission to process ?
     if isblank(formdict):
         displaywelcome()        # no form
     else:
         displayform(formdict)   # yes form

def displaywelcome():
     """The script has been called without parameters.
     We should display the welcome message."""

     replacedict = {'**scriptname**' : scriptname,
                    '**the name**' : 'Joe Bloggs',
                    '**checked1**' : checked,
                    '**checked2**' : '',
                    '**checked3**' : checked,
                    '**checked4**' : checked
                    }

     # put the correct values into our form
     thisform = replace(htmlform, replacedict)

     # these three lines could all be done in one step,
     # but this is clearer
     pagebody = welcome + thisform
     wholepage = pagetemplate.replace('**body**', pagebody)
     ucgiprint(wholepage)

def displayform(formdict):
     """The script has been called with a form submission.
     Display the results of the form submission.
     """


     # encoding may not be straightforward,
     # let's find out what it is before we do anything else
     encoding = formdict['_charset_'] or 'UTF8'

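     # let's extract our parameters; values that are echoed back
     # into the page are HTML-escaped so that user input cannot
     # inject markup
     thename = cgi.escape(formdict['name'].decode(encoding), True)
     thisorthat = formdict['radio_param'].decode(encoding)
     this = formdict['check1'].decode(encoding)
     thistoo = formdict['check2'].decode(encoding)
     hidval = cgi.escape(formdict['hidden_param'].decode(encoding), True)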

     if thisorthat == 'this':
         check1 = checked
         check2 = ''
         radselect = 'This'
         unrad = 'That'
     else:
         check1 = ''
         check2 = checked
         radselect = 'That'
         unrad = 'This'

     if this == 'on':
         check3 = checked
         did1 = 'did'
     else:
         check3 = ''
         did1 = "didn't"

     if thistoo == 'on':
         check4 = checked
         did2 = 'did'
     else:
         check4 = ''
         did2 = "didn't"

     replacedict = {'**scriptname**' : scriptname,
                    '**the name**' : thename,
                    '**checked1**' : check1,
                    '**checked2**' : check2,
                    '**checked3**' : check3,
                    '**checked4**' : check4
                    }

     # put the previous values *back* into the form
     thisform = replace(htmlform, replacedict)
     fullresults = results % (thename, radselect, unrad, did1, did2, \
       hidval, encoding)
     pagebody = fullresults + divider + thisform
     wholepage = pagetemplate.replace('**body**', pagebody)
     ucgiprint(wholepage)

if __name__ == '__main__':
     main()

This might seem like a lot of code, but it's quite easy to digest. The code defines some constants and some variables, most of which look remarkably like HTML. Next, main() is called.

You may notice some differences from last month's code, so let's cover the changes before proceeding further.

Buffering and Character Encodings

The first change in the code is the addition of a new convenience function, getrequest(). This function instantiates the cgi.FieldStorage() and extracts the parameters in one step.

Next, the cgiprint() function has metamorphosed into ucgiprint(). In ucgiprint(), the call to sys.stdout.flush() is no longer automatic. Now, the default is that output is buffered, unless you set the unbuff keyword when you call ucgiprint. In buffered mode, the server automatically adds a Content-Length header for you.

Also, you can now specify an encoding for anything that's printed. Using an encoding, you can receive Unicode strings as input, yet all output is automatically translated. Here, the default output encoding is set to UTF-8, for reasons that will become clear momentarily. (See the Character Encodings section for more information.) To switch this behaviour off, set encoding=None.
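To make the encoding step concrete, here is a minimal sketch of the same idea. It is written for a recent Python, where text and bytes are separate types, and the helper name encode_line() is hypothetical rather than part of the article's library. It only shows how a Unicode string becomes UTF-8 bytes with a CRLF line ending, and how encoding=None leaves the text untouched.

```python
def encode_line(text, encoding='UTF-8'):
    """Return text plus a CRLF line ending.

    Hypothetical helper for illustration only. With an encoding the
    result is a byte string; with encoding=None it is returned as text.
    """
    line = text + '\r\n'
    if encoding:
        return line.encode(encoding)
    return line


# 'cafe' with an accented e, written as a Unicode string
encoded = encode_line(u'caf\xe9')
# headers are plain ASCII, so encoding can be switched off
plain = encode_line('Content-Type: text/html', encoding=None)
```

The accented character occupies two bytes in the UTF-8 result, which is why the output encoding has to match the encoding the page declares.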

Good HTML

Some chunks of HTML output are the same no matter what "mode" the script is in. So, the HTML output of the script has been divided into reusable chunks, where each chunk is assigned to a variable.

pagetemplate is the template of the page. It's a model of the page into which different things will be inserted depending on what the script is doing. The page template has been given a proper DOCTYPE declaration [1], and the <html> tag declares the XHTML namespace and the language of the page.

The HTML stored in pagetemplate also specifies the character set encoding of the page using a <meta http-equiv> tag. (If you want to make the character set encoding dynamic rather than static, as done here, you can specify the character encoding using another header. [2])

If you glance through the HTML chunks, all are structured using <div> tags with class names. All of the HTML is straightforward XHTML (which is understood properly by all modern browsers and is handled gracefully by older ones), with no extraneous style markup. This means the page is very lightweight and can be easily styled using a separate CSS stylesheet. (For more about designing with CSS, consult the book Designing With Web Standards by Jeffrey Zeldman. [3] ) Combining CSS and XHTML is quite easy and allows you to easily alter the look and feel of your applications with minimal hassle. (There isn't a stylesheet for this script -- that's left as an exercise for the reader!)

Templating

Templating is one answer to the question, "How do you insert dynamically varying information into a web page (essentially) defined by static elements?"

There are several good templating systems available for Python, [4] and Python itself has a perfectly good string interpolation operator, %. Unfortunately % is used in HTML, which royally messes things up if you try to use it for string interpolation. (Alternative page templating systems allow the embedding of code into HTML pages, but there's something to be said for keeping code and markup separate!)
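The clash between % and HTML is easy to reproduce. The following sketch reuses the divider markup from the listing and a made-up greeting; it is an illustration of the problem, not code from formprocessor.py.

```python
divider = '<hr width="50%" />'

try:
    # the '%"' inside width="50%" is read as a (bogus) format code
    page = (divider + '<p>Hello %s</p>') % 'Joe'
    failed = False
except ValueError:
    failed = True

# Doubling every literal percent sign is the usual workaround,
# but it makes the HTML harder to read and easy to get wrong
escaped = ('<hr width="50%%" />' + '<p>Hello %s</p>') % 'Joe'
```

Having to double every literal percent sign in a large template is exactly the kind of hassle that a marker-based scheme avoids.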

The templating system implemented in formprocessor.py is very simple, but suffices for this and many other simple CGI applications. The template(s) have various "replaceable" elements, most of which are in the variable htmlform. The parts of htmlform that look like **the name** or **checked1** are to be replaced with real values before the output page is sent.

The full page is created from components; the components used vary depending on what the form must respond with. (Sometimes the welcome message is displayed, and other times the result of processing the form is displayed.) The function replace() and a dictionary map all of the markers to the contents they're being replaced with.
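As a small, self-contained illustration of the marker scheme, the sketch below repeats the logic of replace() and applies it to a cut-down template. The one-line template is hypothetical; the real htmlform is the one in the listing.

```python
def replace(instring, indict):
    """Replace every key of indict found in instring with its value."""
    for key, value in dict(indict).items():
        instring = instring.replace(key, value)
    return instring


# A cut-down, hypothetical template in the style of htmlform
template = '<input name="name" value="**the name**" **checked1** />'

filled = replace(template, {'**the name**': 'Joe Bloggs',
                            '**checked1**': 'checked="checked"'})

# A list of pairs works too, because dict() accepts it
same = replace(template, [('**the name**', 'Joe Bloggs'),
                          ('**checked1**', 'checked="checked"')])
```

Because the markers are ordinary strings, a marker that is missing from the dictionary is simply left in the output, which makes mistakes easy to spot in the rendered page.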

Caring for Your Environment

Among the variables defined is scriptname, set in the code scriptname = os.environ.get('SCRIPT_NAME', ''). scriptname names the target of the form.

The code os.environ.get('SCRIPT_NAME', '') fetches the name of the script from an environment variable. Part of the CGI specification is that a CGI is run in a particular "environment". In addition to defining the input, output, and error streams, the environment defines several environment variables that the server sets with each incoming request. [5] These are available (assuming your server provides them) in the os.environ dictionary of environment variables.

SCRIPT_NAME is one of the more useful ones, but you can also use ones like HTTP_ACCEPT and HTTP_USER_AGENT to get information about each client. REQUEST_METHOD tells you whether the incoming request was a POST or a GET.

(If the script is run locally on your machine rather than accessed from a server, these variables won't be available, hence the use of the dictionary get() method to access the SCRIPT_NAME, which avoids a KeyError when testing.)
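The same defensive pattern extends to the other variables. The sketch below gathers a few of them into a dictionary; the function name request_info() and the default values are assumptions chosen for illustration, and the simulated environment contains made-up values.

```python
import os


def request_info(environ=None):
    """Collect a few CGI variables, with defaults for local testing."""
    if environ is None:
        environ = os.environ
    return {
        'script': environ.get('SCRIPT_NAME', ''),
        'method': environ.get('REQUEST_METHOD', 'GET'),
        'agent': environ.get('HTTP_USER_AGENT', 'unknown'),
        'accept': environ.get('HTTP_ACCEPT', ''),
    }


# Simulated server environment (made-up values)
fake = {'SCRIPT_NAME': '/cgi-bin/formprocessor.py',
        'REQUEST_METHOD': 'POST'}
info = request_info(fake)
```

Passing the environment in as a parameter, rather than reading os.environ directly, makes the function easy to exercise from the command line.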

What The Program Does

formprocessor.py is a nice and simple script. The function main() is always called and sends the Content-Type header and the blank line that terminates the headers section of the response.

The variable formvalues holds a list of all the parameters the script expects in a form submission. main() uses the new function getrequest(), which instantiates cgi.FieldStorage() (this must be done only once) and calls getform() to extract the values from the submission. main() then calls isblank() to check if a form submission was actually made.

If a form submission wasn't made, the code displays the welcome message and the form; this is done in displaywelcome().

If a form submission was made, the code must display the results and display the form again. (As a nice touch, the form is repopulated with the values the user sent previously. This is done in displayform().)

Assuming the script is running for the first time, no form has been submitted, so the displaywelcome() function is called without arguments. The function sets up the dictionary of values to be inserted into the form. The actual insert is performed with the line thisform = replace(htmlform, replacedict). The whole page body consists of the welcome message and the form. This is inserted into the main page template using wholepage = pagetemplate.replace('**body**', pagebody).

When the user hits submit, the form is sent. This is a slightly more complicated situation. getrequest() retrieves all the values from the form as a dictionary, which main() passes to displayform(). Using the _charset_ parameter, the code can deduce what character encoding was used to send the form. That encoding is used to decode all the parameters into Unicode strings.

Next, the Unicode strings are used to build replacedict in the same way as in displaywelcome(). This time, however, the code displays the results as well. Simple string interpolation puts them into results with fullresults = results % (thename, radselect, unrad, did1, did2, hidval, encoding). When displaying the results, the page is composed of fullresults + divider + thisform, which is displayed in the same way as in displaywelcome().

And that's it.

Because the form is sent using the GET method, you can see how the results are encoded into the URL when you hit Submit. Check out the online version at http://www.voidspace.org.uk/cgi-bin/article/formprocessor.py . Download the source from formprocessor.py.
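To see what that URL encoding looks like without a browser, the standard library can build and parse a query string. This is a sketch using a few of the field names from htmlform with sample values; the import falls back to the older module names when the Python 3 ones are unavailable.

```python
try:
    from urllib.parse import urlencode, parse_qs   # Python 3
except ImportError:
    from urllib import urlencode                   # Python 2
    from urlparse import parse_qs

# Sample values; check2 is left out because browsers do not send
# unchecked checkboxes
fields = [('name', 'Joe Bloggs'),
          ('radio_param', 'this'),
          ('check1', 'on'),
          ('hidden_param', 'some_value')]

query = urlencode(fields)
url = '/cgi-bin/formprocessor.py?' + query

# parse_qs reverses the encoding; each value comes back as a list
decoded = parse_qs(query)
```

Note that the space in the name becomes a plus sign in the URL, and that a missing key is how the script learns a checkbox was left unchecked.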

Character Encodings

It's no longer possible to just talk about "a string of text". [6] Different languages use different character sets (as you may have noticed). Computers cope with this by using different character encodings, each of which maps numbers to the characters of a particular character set.

In Python, like most computer languages, a normal string is just a series of bytes. It only has meaning if you know what character set it is encoded with; that is, which numbers represent which characters. So, to interpret any text sent from a form to a CGI, you need to know what encoding was used.

There are lots and lots (and lots!) of different encodings. Unicode is the grandpappy of them all, though: a scheme for representing almost all of the world's character sets. The only Unicode-compatible encoding that browsers commonly use is UTF-8, which stores each character as one or more 8-bit bytes.

There used to be no official way (that worked) to determine the character set a browser is sending information in. Luckily, both Mozilla and Internet Explorer provide a way around this. [7] If you include a hidden field with the name _charset_ in a form, it's automatically set to the character encoding name whenever the form is submitted. (Pay attention to the single underscores rather than the double underscores you are more used to for magic names.)

So, to process a form, you might think it's sufficient to set the encoding of the page. After all, shouldn't the browser honor that encoding and return the form in the exact same encoding? Uh, no. Most modern browsers let you see and even change what character encoding is used to display a web page. Indeed, some users set their browsers to automatically change the encoding.

So, when sending a page, you must declare an encoding, probably UTF-8. When you receive a form submission, you must check the encoding using the _charset_ field and decode the submission into Unicode strings.
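The receiving half of that rule might be sketched as follows. The helper decode_field() is hypothetical, and it assumes the form values arrive as byte strings, as they do in this article's code. It adds one precaution the listing does not have: an unknown or empty charset name falls back to UTF-8 instead of raising LookupError.

```python
import codecs


def decode_field(raw, charset, fallback='UTF-8'):
    """Decode raw form bytes using the submitted charset name.

    Falls back to `fallback` when the charset is empty or unknown.
    Bytes that are invalid in the chosen encoding still raise
    UnicodeDecodeError.
    """
    encoding = charset or fallback
    try:
        codecs.lookup(encoding)
    except LookupError:
        encoding = fallback
    return raw.decode(encoding)


latin = decode_field(b'Jos\xe9', 'ISO-8859-1')
utf = decode_field(b'Jos\xc3\xa9', '')
odd = decode_field(b'Jos\xc3\xa9', 'no-such-charset')
```

All three calls produce the same Unicode string, even though the bytes on the wire differ.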

You can specify the encoding for sending using the Content-Type header. By using Content-Type: text/html; charset=iso-8859-1, say, you can specify a Latin-1 (ISO-8859-1) character set. In formprocessor.py, the encoding is hard-coded into a <meta http-equiv...> tag. (This method works for static web pages as well as pages sent by a program. Besides, it's nice to have several techniques under your belt.)
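For the header approach, a small helper can build the line so that the charset is chosen at run time. The function name content_type() is an assumption for illustration; it is not the contentheader value used by formprocessor.py.

```python
def content_type(charset='UTF-8', mimetype='text/html'):
    """Build a Content-Type header line that declares the charset."""
    return 'Content-Type: %s; charset=%s' % (mimetype, charset)


header = content_type('iso-8859-1')

# A CGI response sends the header, then a blank line, then the page
response_head = content_type() + '\r\n\r\n'
```

Whichever charset is declared here, the page body must actually be encoded with it before it is written out.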

formprocessor.py uses the string method decode() to turn the form parameters into Unicode objects. Inside the application then, all strings can be kept as Unicode objects.

When the strings are sent, the encode() method turns them back into UTF-8 encoded strings. Both "normal" strings and Unicode strings have an encode() method. Calling it on an ordinary string makes Python first decode the string implicitly using sys.getdefaultencoding() (usually ASCII), which can unexpectedly raise UnicodeDecodeError if the string contains characters outside that encoding. It's much better to first decode the string into Unicode by explicitly specifying the encoding used.
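The decode-then-encode discipline can be sketched in a few lines. The function name transcode() is hypothetical, and the example assumes the input is a Latin-1 byte string; the ASCII decode in the second half imitates the implicit step that causes the surprise.

```python
def transcode(raw, source_encoding, target_encoding='UTF-8'):
    """Decode bytes explicitly, then encode them for output."""
    text = raw.decode(source_encoding)
    return text.encode(target_encoding)


latin1_bytes = b'caf\xe9'
utf8_bytes = transcode(latin1_bytes, 'iso-8859-1')

# Decoding with the wrong (ASCII) codec is what fails
try:
    latin1_bytes.decode('ascii')
    implicit_ok = True
except UnicodeDecodeError:
    implicit_ok = False
```

Naming the source encoding explicitly is what turns an unpredictable failure into a predictable conversion.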

Who Are You ?

formprocessor.py is all well and good, but it's a "one-shot" script: it runs once and its job is done. This is unlike a typical computer application, which runs in the background waiting for the user to fiddle with its user interface widgets. Typical GUI applications maintain state from one interaction to the next because the application is persistent. A web application does not maintain state, as HTTP is a stateless protocol. Nothing in an HTTP transaction tells you who the user is or what they were doing previously. [8]

When programming with CGI, you'll likely want some way to preserve the state of the application in between accesses. In the case of an online shop, this might be the identity of the user and the contents of his or her shopping cart. Maintaining state information between each unique HTTP request is usually called session management.

Assuming you need to store some information on the server about each active session, you need a way of tying each request to a specific user. You also need to have a way of cleaning up old session information.
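One way to picture the clean-up requirement is a table of last-seen times that is pruned periodically. The sketch below uses a plain dictionary as a stand-in; because a CGI script exits after every request, a real application would have to keep this table in a file or database. The function names and the thirty-minute lifetime are assumptions.

```python
import time

SESSION_LIFETIME = 30 * 60   # seconds; an arbitrary choice


def touch(sessions, session_id, now=None):
    """Record that session_id was active at time `now`."""
    sessions[session_id] = time.time() if now is None else now


def purge(sessions, now=None, lifetime=SESSION_LIFETIME):
    """Delete sessions idle for longer than `lifetime`; return their ids."""
    now = time.time() if now is None else now
    stale = [sid for sid, seen in sessions.items() if now - seen > lifetime]
    for sid in stale:
        del sessions[sid]
    return stale


# Fixed timestamps keep the example deterministic
sessions = {}
touch(sessions, 'abc', now=0)
touch(sessions, 'def', now=1000)
removed = purge(sessions, now=2000)
```

Here the first session has been idle for 2000 seconds and is removed, while the second, idle for 1000 seconds, survives.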

There are two easy ways of managing sessions, both of which you might have seen and both of which have pros and cons. You can either place a session identifier in a hidden form field on every page or send a cookie as one of the HTTP headers.

The former has the disadvantage that every new page access must be through a form submission. The disadvantage of the latter is that some users switch off cookies because they are sometimes used for privacy-invading advertising and tracking purposes.
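The hidden-field approach needs only two pieces: an identifier that is hard to guess and a form field to carry it. The sketch below uses the standard uuid module for the identifier; the function names and the field name session_id are hypothetical.

```python
import uuid


def new_session_id():
    """Return a random 32-character hexadecimal identifier."""
    return uuid.uuid4().hex


def hidden_session_field(session_id, name='session_id'):
    """Return the hidden <input> that carries the id in each form."""
    return '<input type="hidden" name="%s" value="%s" />' % (name, session_id)


sid = new_session_id()
field = hidden_session_field(sid)
```

The resulting markup would be inserted into every form the application sends, for example through one more marker in the form template.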

Sending cookies is fairly, though not entirely, simple using the Python Cookie module. Once you have sent a cookie, the browser returns it with the user's next request to your application. You can access it from the HTTP_COOKIE environment variable.
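Reading a returned cookie might look like the sketch below. The helper read_cookie() and the cookie names are assumptions; SimpleCookie is the standard library class, imported from http.cookies on Python 3 and from Cookie on Python 2.

```python
try:
    from http.cookies import SimpleCookie   # Python 3
except ImportError:
    from Cookie import SimpleCookie         # Python 2


def read_cookie(environ, name):
    """Return the value of cookie `name`, or None if it was not sent."""
    jar = SimpleCookie()
    jar.load(environ.get('HTTP_COOKIE', ''))
    if name in jar:
        return jar[name].value
    return None


# Simulated request environment with made-up cookie values
fake_environ = {'HTTP_COOKIE': 'session=abc123; theme=dark'}
session = read_cookie(fake_environ, 'session')
missing = read_cookie({}, 'session')
```

In a real script the environ argument would be os.environ; returning None for a missing cookie lets the caller treat the visitor as new.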

When sending a cookie, you can specify a maximum age for the cookie, in seconds, after which it will be disposed of. If you don't specify a maximum age, the browser is likely to dispose of the cookie when the browser is closed. Cookies sent without a maximum age are called "session cookies".

You can also specify a cookie path. If this is set, the browser only returns the cookie if the URL accessed is on the specified path. This can be useful if you have several applications in the same domain (although in fact, you can have several cookies in the same domain anyway).
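Both attributes can be set through SimpleCookie before the header is printed. In this sketch the function name session_cookie_header(), the cookie name session, and the sample path are assumptions chosen for illustration.

```python
try:
    from http.cookies import SimpleCookie   # Python 3
except ImportError:
    from Cookie import SimpleCookie         # Python 2


def session_cookie_header(session_id, max_age=None, path=None):
    """Return a Set-Cookie header line for the given session id."""
    jar = SimpleCookie()
    jar['session'] = session_id
    if max_age is not None:
        jar['session']['max-age'] = max_age   # lifetime in seconds
    if path is not None:
        jar['session']['path'] = path         # limit where it is returned
    return jar.output()


# Kept for an hour and only returned for URLs under /cgi-bin/
persistent = session_cookie_header('abc123', max_age=3600, path='/cgi-bin/')
# No maximum age: a session cookie
until_close = session_cookie_header('abc123')
```

The returned line is sent along with the Content-Type header, before the blank line that ends the headers.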

Conclusion

CGI is the simplest way to write web applications. (Additionally, if you don't have control over the server you are using [9], then it may be your only choice.) Once you've solved basic problems like user authentication, session management, and the right degree of separation between code and your HTML, your framework ought to scale easily up to bigger applications.

Note

Some of the functions used in these two articles have formed the basis of a module called cgiutils.

It is a set of functions that make writing CGIs easier. Some of the functions are also generally useful.

You can see more example CGI programs and modules over at the Voidspace CGI Page.


Footnotes

[1] For the full lowdown on DOCTYPE declarations go to http://www.alistapart.com/articles/doctype/.
[2] For a full discussion of how to send the character set declaration, see http://www.w3.org/TR/REC-html40/charset.html.
[3] As well as being the author of Designing With Web Standards, Jeffrey Zeldman runs http://www.alistapart.com, a very useful resource for attractive and standards-compliant web design.
[4] Some Python templating systems include Cheetah, Kid, and PTL from Quixote.
[5] See http://hoohoo.ncsa.uiuc.edu/cgi/env.html for a list of the CGI environment variables that ought to be available.
[6] For the minimum that every programmer needs to know about Unicode, read this "Joel On Software" article: http://www.joelonsoftware.com/articles/Unicode.html. For a good "Unicode over the web" reference, you could do worse than the official FAQ located at http://www.unicode.org/faq/unicode_web.html.
[7] See https://bugzilla.mozilla.org/show_bug.cgi?id=18643 for some of the Mozilla discussion about accepting this change.
[8] If the server supports Basic Authentication you can access the username through the REMOTE_USER environment variable. You'll have to manually configure this though and you can't access/modify/control it from within CGI.
[9] Like a normal shared hosting account.


Last edited Tue Aug 2 00:51:34 2011.
