Using the Python Object Publisher

Paul Everitt, paul@digicool.com

Digital Creations, L.C.
910 Princess Anne Street Suite 300
Fredericksburg, VA 22401
http://www.digicool.com


Abstract

The Python Object Publisher (Bobo) allows objects to be published without any Common Gateway Interface (CGI) or Hypertext Transfer Protocol (HTTP) specific code. Complex object hierarchies can be published with Uniform Resource Locators (URLs) that mimic the object hierachies. Form data, including file upload data, are marshalled into method parameters. Most tasks associated with the interaction between an application (such as URL traversal, parsing form data, parsing queries, parsing headers, parsing cookie data, access control, and error handling) are performed automatically. This paper provides an overview and simple examples.

Background

Publishing dynamic information on the web is straightforward. Publishing it coherently is unnecessarily difficult.

Generally, connecting code to the Web involves the use of the Common Gateway Interface[1] (CGI). For Python[2], this involves using the cgi.py module. The purpose of this module is to assist in bringing HTML form data into your Python application.

For many uses, this is a wonderful boost. However, the scope of the module doesn't assist in returning the necessary HTTP headers. Nor does it provide support for things such as parsing cookies, or handling exceptions.

Moreover, your code is wired to use CGI. What if you would like to use FastCGI[3], or ILU[4], or COM/DCOM[5] as a server-extension mechanism? Or, if you would like to use a native Python-HTTP service? Or, when you would like to use your application without the Web (e.g. from the command line for debugging, or in a GUI framework?

This gives some background on the Python Object Publisher[6], affectionately known as Bobo. The original premise was to provide a level of abstraction between our ways to publish objects. To date, we have used:

and are looking at COM/DCOM as well as CORBA[7]. The layer of abstraction would immunize that actual application from the infrastructure. Or, as the docstring says:
The Python object publisher provides a simple mechanism for
publishing a collection of Python objects as World Wide Web (Web)
resources without any plumbing (e.g. CGI) specific code.

Next, we wanted the ability to prevent application errors from returning strange traceback messages. As well, we wanted to find a way to map a URL into a full message sent to invoke an operation on an object. For instance, having:

http://www.here.com/cgi-bin/example/Cars/Pinto/purchase?name=Bob

maps into an object traversal of Cars to invoke the purchase method on the instance Pinto, sending in an argument of name with a value of Bob.

Subsequently, we began adding many more new features, and integrating other work to turn Bobo into an excellent environment for publishing Python objects. The next section discusses the benefits.

Benefits

The following is an overview of some reasons to use Bobo:

Typical Bobo Setup

To setup a module to be published by Bobo, you first decide what mechanism you'd like to use. Presuming you'd like to use CGI, you first setup your CGI directory, and a different directory that contains your application and its modules. This distinction enables you to protect your code by placing it outside of your Web root. Also, it allows many CGI areas to share the same code base.

Next, you join the two by making a symbolic link. The source would be a generic module that implemented the Bobo plumbing for CGI, which is named cgi-module-publisher. This generic module would be in the same directory as your application. The target of the symbolic link would be the application's name in the CGI directory.

For instance, in the URL mentioned above, you might have the application installed in /apps/mambo. The main module is example.py, located in that directory. However, your Web root is in /httpd/docs, with a CGI directory of /httpd/docs/cgi-bin. You would publish example.py with the command:

ln -s /apps/mambo/cgi-module-publisher /httpd/docs/cgi-bin/example

What happens? The CGI call executes example, which is a symbolic link to the CGI module publisher. This standard Bobo module then looks for a module in its directory that has the same name as the CGI script (basically, the Python variable sys.argv[1]). Bobo then imports that module, and starts looking for objects that match the request. We call this publishing your object "under Bobo control", as Bobo is encapsulating your objects.

A final note is that the ".", or current working directory, is still over in the CGI directory, even though your module can import things from the application directory.

Example

Of course, everyone would like to start out with a code snippet. Here is an overview of a sample Bobo application. Let's say we have the following module:


#!/usr/local/bin/python 
'''An example module for the Bobo paper.

This gives a simple example of nested objects, and publishing objects
which receive messages from the Web. The URL might be:

   http://yourplace.mars/cgi-bin/example/Cars/Pinto/purchaseForm'''

class Car:
    '''Vehicle, four wheels, you know what I mean.'''

    def purchaseForm(self,PARENT_URL):
	'''Give a short form back to collect purchase info.'''

	form = (
	    '<h1>Purchase Information</h1>\n'
	    '<p>Please enter the information below:\n'
	    '<form action="%s/purchase" method="GET">\n'
	    'Name: <input name="name"> Age: <input name="age:int">\n'
	    '<input type="submit" </form></p>')

	# PARENT_URL is supplied by Bobo, and tracks location in PATH_INFO
	response = form % PARENT_URL

	# Set TITLE, and return body
	return ('Purchase Information Form',response)

    def purchase(self,name,age):
	'''Send back some calculation.'''

	form = ('<h1>Thank You For Your Purchase</h1>'
		'<p>Well, %s, I think you are %s in dog years.</p>')

	# Bobo converted age to an integer, as directed in the form
	dog_years = age / 7
	response = form % (name,dog_years)
	return ('Purchase made',response)

pinto = Car()
taurus = Car()
lecar = Car()

# Make a dictionary, but give it a fake docstring to make it public
Cars__doc__ = "Clever hack, n'est-ce pas?"
Cars = {'Pinto':pinto, 'Taurus':taurus, 'LeCar':lecar}

The following URLs will give back the listed results:

  1. cgi-bin/example/help: Returns the docstring for the module.
  2. cgi-bin/example/Cars/help: Returns the docstring for the Cars dictionary.
  3. cgi-bin/example/Cars/Pinto/help: Returns the docstring for the Pinto instance.
  4. cgi-bin/example/Cars/Pinto/purchaseForm: Give the form to interact with a Car instance.

For example, here is what the Web server returns for the last URL:


GET /cgi-bin/example/Cars/Pinto/purchaseForm HTTP/1.1

HTTP/1.0 200 OK
Date: Tuesday, 08-Oct-96 15:37:19 GMT
Server: Open-Market-Secure-WebServer/2.0.0.RC3
MIME-version: 1.0
Security-Scheme: S-HTTP/1.1
Content-Length: 450
Content-Type: text/html

<html>
<head>
<title>Purchase Information Form</title>
	<base href="http://here.com/cgi-bin/">
</head>
<body>
<h1>Purchase Information</h1>
<p>Please enter the information below:
<form action="http://here.com/cgi-bin/example/Cars/Pinto/purchase" method="GET">
Name: <input name="name"> Age: <input name="age:int">
<input type="submit" </form></p>
</body>
</html>

This example shows several interesting Bobo features:

Publishing Objects

Objects are published by including them in a published module. When a module is published, any objects that:

are published.

Alternatively, a module variable named web_objects can be defined. If this variable is defined, it should be bound to a mapping object that maps published names to published objects. Objects that are published through a module's web_objects are not subject to the restrictions listed above. For example, modules or objects without documentation strings may be published by including them in a module's web_objects attribute.

Subobjects (or sub-sub objects, ...) of published objects are also published, as long as the subobjects:

A subobject that cannot have a docstring may be published by including a special attribute in the containing object named: subobject_name__doc__. For example, if foo.bar.spam doesn't have a doc string, but foo.bar has a non-empty attribute foo.bar.spam__doc__, then foo.bar.spam can be published.

Note that object methods are considered to be subobjects.

Object-to-subobject traversal is done by converting steps in the URI path to get attribute or get item calls. For example, in traversing from http://some.host/some_module/object to http://some.host/some_module/object/subobject, the module publisher will try to get some_module.object.subobject. If the access fails with other than an attribute error, then the object publisher raises a "NotFound" exception. If the access fails with an attribute error, then the object publisher will try to obtain the subobject with: some_module.object["subobject"]. If this access fails, then the object publisher raises a "Not Found" exception. If either of the accesses suceeds, then, of course, processing continues.

If the final object encountered when traversing the URL has an index_html attribute, the object traversal will continue to this attribute. This is useful for providing default methods for objects.

In some cases, a parent object may hold special attributes for a subobject. This may be the case either when a subobject cannot have the special attribute or when it is convenience for the parent object to manage attribute data (e.g. to share attribute data among multiple children). When the object publisher looks for a special attribute, it first trys to get the attribute from the published object. If it fails to get the special attribute, it uses the same access mechanism used to extract the subobject from the parent object to get an attribute (or item) using a name obtained by concatenating the subobject name with the special attribute name. For example, let foo.bar be a dictionary, and foo.bar.spam an item in the dictionary. When attempting to obtain the special attribute __realm__, the object publisher will first try to evaluate foo.bar.spam.__realm__, and then try to evaluate: foo.bar["spam"+"__realm__"].

Controlling Returned HTML

A published object, or the returned value of a called published object can be of any Python type. If the returned value has an asHTML method, then this method will be called to convert the object to HTML; otherwise, the returned value will be converted to a string and examined to see if it appears to be an HTML document. If it appears to be an HTML document, then the response content-type will be set to text/html. Otherwise, the content-type will be set to text/plain.

A special case is when the returned object is a two-element tuple. If the return object is a two-element tuple, then the first element will be converted to a string and treated as an HTML title, and the second element will be converted to a string and treated as the contents of an HTML body. An HTML document is created and returned (with type text/html) by adding necessary html, head, title, and body tags.

If the returned object is None or the string representation of the returned object is an empty string, then the HTTP return status will be set "No Content", and no body will be returned. On some browsers, this will cause the displayed document to be unchanged.

For instance, if an invoked method ends with the statement:


return ('Your car has been purchased',
        '<h2>Thank you!</h2><p>Your money will look great in our account.</p>'
       )
then an HTML document will be returned. A TITLE of "Your car has been purchased" will be inserted into a skeletal HTML snippet.

Bobo will also assist in setting the BASE tag to alleviate the affects of nested object references in PATH_INFO.

Exception Handling

Unhandled exceptions are caught by the object publisher and are translated automatically to nicely formatted HTTP output.

When an exception is raised, the exception type is mapped to an HTTP code by matching the value of the exception type with a list of standard HTTP status names. Any exception types that do not match standard HTTP status names are mapped to "Internal Error" (500). The standard HTTP status names are: "OK", "Created", "Accepted", "No Content", "Multiple Choices", "Redirect", "Moved Permanently", "Moved Temporarily", "Not Modified", "Bad Request", "Unauthorized", "Forbidden", "Not Found", "Internal Error", "Not Implemented", "Bad Gateway", and "Service Unavailable", Variations on these names with different cases and without spaces are also valid.

An attempt is made to use the exception value as the body of the returned response. The object publisher will examine the exception value. If the value is a string that contains some white space, then it will be used as the body of the return error message. It appears to be HTML, the error content type will be set to text/html, otherwise, it will be set to text/plain. If the exception value is not a string containing white space, then the object publisher will generate it's own error message.

The exceptions to the above are not covered here.

Handling Request Data

One of the essential benefits of Bobo is that you get to write Python code that looks like Python code. That is, if you have a method that expects some arguments, Bobo will get those values out of the request and pass them into your object.

How does Bobo do it? First, it traverses the PATH_INFO to get to the final object. Let's say the final object is a method of an instance. This method in your module is expecting 'name' and 'age' to be passed in, as stated in the method signature (self, name, age). Bobo inspects the incoming form data, finds the values for the 'name' and 'age' fields, and passes them in as arguments as it calls your method "under Bobo control".

If you put an argument in your method signature, it must be in the request. Otherwise, Bobo will return a HTTP error with a nicely-formatted error message. You can get around this by providing default arguments for those variables that are not mandatory. For instance, you could have a method signature of (self, name, age=None).

Normally, string arguments are passed to called objects. The called object must be prepared to convert string arguments to other data types, such as numbers.

If file upload fields are used; however, then FileUpload objects will be passed instead for these fields. FileUpload objects bahave like file objects and provide attributes for inspecting the uploaded file's source name and the upload headers, such as content-type.

If field names in form data are of the form: name:type, then an attempt will be to convert data from from strings to the indicated type. The data types currently supported are:

float

Python floating point numbers

int

Python integers

long

Python long integers

string

python strings

regex

Python case-sensitive regular expressions

Regex

Python case-insensitive regular expressions

date

Date-time values

For example, if the name of a field in an input form is age:int, then the field value will be passed in argument, age, and an attempt will be made to convert the argument value to an integer. This conversion also works with file upload, so using a file upload field with a name like myfile:string will cause the UploadFile to be converted to a string before being passed to the object.

Additionally, Bobo will bind special values to variables and make them available in method signatures. The most common are those supported in the standard cgi.py module. However, other interesting arguments might be the REQUEST and RESPONSE objects can be obtained. These allow finer control of Bobo's handling of the operation.

The most interesting additions to the extended variables are those that provide new services. For instance, supporting cookies is a breeze. If you expect a value from a cookie, just put the name of the cookie in the method signature. Bobo will get it for you. Also, there are variables that can help in object traversal, such as BASE, PARENT_URL (the URL to the spot above you in the hierarchy), and others.

Access Control

Access to an object (and it's subobjects) may be further restricted by specifying an object attribute named __allow_groups__. If set, this attribute should contain a collection of authorization groups. The __allow_groups__ attribute may be a mapping object, in which case it is a collection of named groups. Alternatively, the __allow_groups__ attribute may be a sequence, in which case it is a collection of named groups. Each group must be a dictionary that use names as keys (i.e. sets of names). The values in these dictionaries may contain passwords for authenticating each of the names. Alternatively, passwords may be provided in separate "realm" objects. If no realm is provided, then basic authentication will be used and the object publisher will attempt to authenticate the access to the object using one of the supplied name and password pairs. The basic authentication realm name used is module_name.server_name, where module_name is the name of the module containing the published objects and server_name is the name of the web server.

The module used to publish an object may contain it's own __allow_groups__ attribute, thereby limiting access to all of the objects in a module.

If multiple objects in the URI path have __allow_groups__ attributes, then the __allow_groups__ attribute from the last object in the path that has this attribute will be used. The __allow_groups__ attribute for a subobject overrides __allow_groups__ attributes for containing objects, however, if named groups are used, group data from containing objects may be inherited by contained objects. If a published object uses named groups, then for each named group in the published object, group data from groups with the same name in contained objects will be inherited from container objects if:

If the name of a group is the python object, None, then data from named groups in container objects will be inherited even if the grouped don't appear in the inheriting object, subject to the restrictions above.

When group data are inherited, then inherited data is appended to the existing data. When groups contain names and passwords, individual user names may have multiple passwords if they appear in multiple groups.

Note that an object may have an __allow_groups__ attribute that is set to None, in which case the object will be public, even if containing objects are not.

Debugging and Testing

Bobo provides some facilities to assist in tracking errors. At the base level, Bobo provides traceback information in the HTML that is returned to the user. The traceback information is sent as an HTML comment, so that the average user does not see it.

Also, Bobo provides some command-line facilities. There is a bobo.py module that simulates that HTTP request, thus recreating most of the environment of an actual request. This command-line Bobo also provides:

Related Services Integrated With Bobo

We have invested serious time in using Bobo for fielding commercial work. Along the way, we have tackled several other problem areas and integrated their solutions with Bobo.

For instance, the area of dynamic generation of HTML is addressed with our DocumentTemplate module. These objects are particularly integrated with Bobo, and have been designed to work with Bobo features such as nested traversal of objects.

Also, our persistent object storage effort (affectionately nicknamed Bobobase) is integrated. Specifically, other components are ensured to be pickleable and to work with the transaction capabilities of Bobobase.

Issues

Below is a brief treatment of Bobo issues:

  1. Arguments are collected and passed in using metainfo available in functions. The interface for this metainfo is not documented, and Guido has asserted that it might be deprecated. However, the interface has been discussed in the newsgroup.
  2. Some interesting objects currently can have no metadata, except docstrings. This means that Bobo applications have to adopt a convention to provide method metainformation. For instance, there is a special way to add docstrings to things that cannot have docstrings, such as dictionaries. Also, the __allow_groups__ for a method is assigned outside the method, using a specially-named variable.
  3. The current behavior of Python's getattr has created some curious Bobo hacks, specifically to support lazy access of persistent data. Furthermore, the act of traversing a hierarchy of objects, as represented by the URL, could be more tightly controlled using improved attribute referencing.
  4. Class attributes.

Current Status

Bobo is freely-available. It currently exists as a set of modules, with some extensions for optimization (though the extensions are not necessary). It has been heavily tested, in support of paid-for applications.

However, there has been very little optimization. Currently, we have a research project to add subclassable C types. The hope is that several pieces of Bobo could be tightly-controlled in ways not possible due to the issues raised above.

Finally, we are working on pushing Bobo out of the nest. This first involves getting it working well on Windows. Also, we need to work on our ILU support, and some of the other infrastructure items. As well, we have a lot of documentation and proselytizing to do.

References

  1. Common Gateway Interface - 1.1
    http://www.ast.cam.ac.uk/%7Edrtr/cgi-spec.html
  2. Python Home Page
    http://www.python.org/
  3. FastCGI
    http://www.fastcgi.com/
  4. ILU
    ftp://ftp.parc.xerox.com/pub/ilu/ilu.html
  5. COM/ActiveX/DCOM
    http://www.activex.org/
  6. Python Object Publisher, alias Bobo
    http://www.digicool.com/
  7. PersistentCGI
    http://www.digicool.com/
  8. CORBA
    http://www.omg.org/


ID: $Id: UsingBobo.html,v 1.2 1996/10/09 20:03:28 paul Exp $
paul@digicool.com
Last modified: Wed Oct 9 16:03:01 1996