Paul Everitt, paul@digicool.com
Digital Creations, L.C.
910 Princess Anne Street Suite 300
Fredericksburg, VA 22401
http://www.digicool.com
The Python Object Publisher (Bobo) allows objects to be published without any Common Gateway Interface (CGI) or Hypertext Transfer Protocol (HTTP) specific code. Complex object hierarchies can be published with Uniform Resource Locators (URLs) that mimic the object hierarchies. Form data, including file upload data, are marshalled into method parameters. Most tasks associated with the interaction between an application and the Web (such as URL traversal, parsing form data, parsing queries, parsing headers, parsing cookie data, access control, and error handling) are performed automatically. This paper provides an overview and simple examples.
Publishing dynamic information on the web is straightforward. Publishing it coherently is unnecessarily difficult.
Generally, connecting code to the Web involves the use of the Common Gateway Interface[1] (CGI). For Python[2], this involves using the cgi.py module. The purpose of this module is to assist in bringing HTML form data into your Python application.
For many uses, this is a wonderful boost. However, the module does not help with returning the necessary HTTP headers, nor does it provide support for things such as parsing cookies or handling exceptions.
Moreover, your code is wired to use CGI. What if you would like to use FastCGI[3], or ILU[4], or COM/DCOM[5] as a server-extension mechanism? Or if you would like to use a native Python-HTTP service? Or if you would like to use your application without the Web (e.g. from the command line for debugging, or in a GUI framework)?
This gives some background on the Python Object Publisher[6], affectionately known as Bobo. The original premise was to provide a level of abstraction between our ways to publish objects. To date, we have used:
The Python object publisher provides a simple mechanism for publishing a collection of Python objects as World Wide Web (Web) resources without any plumbing (e.g. CGI) specific code.
Next, we wanted the ability to prevent application errors from returning strange traceback messages. As well, we wanted to find a way to map a URL into a full message sent to invoke an operation on an object. For instance, having:
http://www.here.com/cgi-bin/example/Cars/Pinto/purchase?name=Bob
maps into an object traversal of Cars to invoke the purchase method on the instance Pinto, sending in an argument of name with a value of Bob.
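As a rough illustration of this mapping, the sketch below splits the example URL into traversal steps and keyword arguments. The function name url_to_message and the script_prefix parameter are hypothetical and are not part of Bobo; only the standard library is used.

```python
from urllib.parse import urlsplit, parse_qsl


def url_to_message(url, script_prefix="/cgi-bin/example"):
    """Split a URL into (traversal steps, keyword arguments).

    script_prefix (an assumption of this sketch) is the part of the
    path that names the published module; everything after it is
    treated as the object path.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.startswith(script_prefix):
        path = path[len(script_prefix):]
    # Each non-empty path segment is one step of the object traversal.
    steps = [step for step in path.split("/") if step]
    # Query-string fields become the arguments of the final call.
    arguments = dict(parse_qsl(parts.query))
    return steps, arguments


steps, arguments = url_to_message(
    "http://www.here.com/cgi-bin/example/Cars/Pinto/purchase?name=Bob")
```

Here steps names the objects to walk through (Cars, then Pinto, then purchase) and arguments carries the value to pass to the method.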
Subsequently, we began adding many more new features, and integrating other work to turn Bobo into an excellent environment for publishing Python objects. The next section discusses the benefits.
The following is an overview of some reasons to use Bobo:
To set up a module to be published by Bobo, you first decide what mechanism you'd like to use. Presuming you'd like to use CGI, you first set up your CGI directory, and a different directory that contains your application and its modules. This distinction enables you to protect your code by placing it outside of your Web root. Also, it allows many CGI areas to share the same code base.
Next, you join the two by making a symbolic link. The source would be a generic module that implemented the Bobo plumbing for CGI, which is named cgi-module-publisher. This generic module would be in the same directory as your application. The target of the symbolic link would be the application's name in the CGI directory.
For instance, in the URL mentioned above, you might have the application installed in /apps/mambo. The main module is example.py, located in that directory. However, your Web root is in /httpd/docs, with a CGI directory of /httpd/docs/cgi-bin. You would publish example.py with the command:
ln -s /apps/mambo/cgi-module-publisher /httpd/docs/cgi-bin/example
What happens? The CGI call executes example, which is a symbolic link to the CGI module publisher. This standard Bobo module then looks for a module in its directory that has the same name as the CGI script (basically, the Python variable sys.argv[0]). Bobo then imports that module, and starts looking for objects that match the request. We call this publishing your object "under Bobo control", as Bobo is encapsulating your objects.
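The lookup can be pictured with the following minimal sketch. It is not the cgi-module-publisher source; the two helper names are hypothetical, and the sketch only shows how a script path can yield both the module name and the directory that holds the real publisher file.

```python
import os


def module_name_for_script(script_path):
    """Return the module name implied by the CGI script's file name."""
    base = os.path.basename(script_path)
    name, _extension = os.path.splitext(base)
    return name


def publisher_directory(script_path):
    """Return the directory of the real file behind a symbolic link."""
    return os.path.dirname(os.path.realpath(script_path))


name = module_name_for_script("/httpd/docs/cgi-bin/example")
```

A publisher written along these lines could add the result of publisher_directory to sys.path and then import the named module with importlib.import_module.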
A final note is that the ".", or current working directory, is still over in the CGI directory, even though your module can import things from the application directory.
Of course, everyone would like to start out with a code snippet. Here is an overview of a sample Bobo application. Let's say we have the following module:
#!/usr/local/bin/python
'''An example module for the Bobo paper.
This gives a simple example of nested objects, and publishing
objects which receive messages from the Web. The URL might be:
  http://yourplace.mars/cgi-bin/example/Cars/Pinto/purchaseForm'''
class Car:
    '''Vehicle, four wheels, you know what I mean.'''
    def purchaseForm(self,PARENT_URL):
        '''Give a short form back to collect purchase info.'''
        form = (
            ' Purchase Information\n'
            'Please enter the information below:\n'
            '
The following URLs will give back the listed results:
cgi-bin/example/help: Returns the docstring for the module.
cgi-bin/example/Cars/help: Returns the docstring for the Cars dictionary.
cgi-bin/example/Cars/Pinto/help: Returns the docstring for the Pinto instance.
cgi-bin/example/Cars/Pinto/purchaseForm: Gives the form to interact with a Car instance.
For example, here is what the Web server returns for the last URL:
GET /cgi-bin/example/Cars/Pinto/purchaseForm HTTP/1.1
HTTP/1.0 200 OK
Date: Tuesday, 08-Oct-96 15:37:19 GMT
Server: Open-Market-Secure-WebServer/2.0.0.RC3
MIME-version: 1.0
Security-Scheme: S-HTTP/1.1
Content-Length: 450
Content-Type: text/html
Purchase Information Form
Purchase Information
Please enter the information below:
This example shows several interesting Bobo features:
Objects are published by including them in a published module. When a module is published, any objects that:
Alternatively, a module variable named web_objects can be defined. If this variable is defined, it should be bound to a mapping object that maps published names to published objects. Objects that are published through a module's web_objects are not subject to the restrictions listed above. For example, modules or objects without documentation strings may be published by including them in a module's web_objects attribute.
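A module using this mechanism might look like the sketch below. The names status and Inventory are invented for illustration; only the web_objects variable itself comes from the description above.

```python
def status():
    # Deliberately has no documentation string.
    return "All systems go."


class Inventory:
    '''Cars currently on the lot.'''

    def __init__(self):
        self.cars = ["Pinto"]


# Published name -> published object.
web_objects = {
    "status": status,
    "inventory": Inventory(),
}
```

Because status is listed explicitly, it can be published even though it carries no documentation string.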
Subobjects (or sub-sub objects, ...) of published objects are also published, as long as the subobjects:
A subobject that cannot have a docstring may be published by including a special attribute in the containing object named: subobject_name__doc__. For example, if foo.bar.spam doesn't have a doc string, but foo.bar has a non-empty attribute foo.bar.spam__doc__, then foo.bar.spam can be published.
Note that object methods are considered to be subobjects.
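The docstring fallback just described can be sketched as follows. The helper name find_docstring and the classes Spam and Bar are hypothetical; the sketch only shows the order of the two lookups, not Bobo's actual code.

```python
def find_docstring(parent, name, subobject):
    """Return the subobject's docstring, or parent.<name>__doc__ if any."""
    doc = getattr(subobject, "__doc__", None)
    if doc:
        return doc
    # Fall back to the specially named attribute on the container.
    return getattr(parent, name + "__doc__", None) or None


class Spam:
    pass  # no docstring, so instances report __doc__ as None


class Bar:
    '''Container for spam.'''

    def __init__(self):
        self.spam = Spam()
        self.spam__doc__ = "Spam, published via its parent."


bar = Bar()
doc = find_docstring(bar, "spam", bar.spam)
```

Note that in current Python, built-in values such as integers and dictionaries report the docstring of their type, so the fallback is only exercised here by an instance of a class without one.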
Object-to-subobject traversal is done by converting steps in the URI path to get-attribute or get-item calls. For example, in traversing from http://some.host/some_module/object to http://some.host/some_module/object/subobject, the module publisher will try to get some_module.object.subobject. If the access fails with other than an attribute error, then the object publisher raises a "Not Found" exception. If the access fails with an attribute error, then the object publisher will try to obtain the subobject with: some_module.object["subobject"]. If this access fails, then the object publisher raises a "Not Found" exception. If either of the accesses succeeds, then, of course, processing continues.
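The traversal rule can be sketched in a few lines. This is a simplified illustration, not the publisher's source: the NotFound class, the function names, and the Module and Car classes are assumptions made for the example.

```python
class NotFound(Exception):
    """Stands in for the publisher's "Not Found" error."""


def traverse_step(parent, name):
    """Get one subobject: attribute access first, then item access."""
    try:
        return getattr(parent, name)
    except AttributeError:
        pass  # fall through to item access
    except Exception:
        raise NotFound(name)
    try:
        return parent[name]
    except Exception:
        raise NotFound(name)


def traverse(root, path):
    """Walk a slash-separated path starting from root."""
    obj = root
    for name in path.strip("/").split("/"):
        obj = traverse_step(obj, name)
    return obj


class Car:
    def purchase(self, name):
        return "Sold to " + name


class Module:
    pass


example = Module()
example.Cars = {"Pinto": Car()}
method = traverse(example, "Cars/Pinto/purchase")
```

In this example Cars is found as an attribute, Pinto as a dictionary item, and purchase as an attribute again.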
If the final object encountered when traversing the URL has an index_html attribute, the object traversal will continue to this attribute. This is useful for providing default methods for objects.
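A minimal sketch of this final step, assuming a hypothetical helper named apply_default and an invented Showroom class:

```python
def apply_default(obj):
    """Continue to index_html when the final object has one."""
    return getattr(obj, "index_html", obj)


class Showroom:
    '''A place to look at cars.'''

    def index_html(self):
        return "Welcome to the showroom."


target = apply_default(Showroom())
plain = apply_default("no default here")
```

An object without index_html is simply returned unchanged.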
In some cases, a parent object may hold special attributes for a subobject. This may be the case either when a subobject cannot have the special attribute or when it is convenient for the parent object to manage attribute data (e.g. to share attribute data among multiple children). When the object publisher looks for a special attribute, it first tries to get the attribute from the published object. If it fails to get the special attribute, it uses the same access mechanism used to extract the subobject from the parent object to get an attribute (or item) using a name obtained by concatenating the subobject name with the special attribute name. For example, let foo.bar be a dictionary, and foo.bar.spam an item in the dictionary. When attempting to obtain the special attribute __realm__, the object publisher will first try to evaluate foo.bar.spam.__realm__, and then try to evaluate: foo.bar["spam"+"__realm__"].
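The lookup order can be sketched as below. The helper get_special is hypothetical, and as a simplification it tries attribute access and then item access on the parent rather than remembering which mechanism produced the subobject.

```python
_missing = object()


def get_special(parent, name, subobject, special):
    """Look for a special attribute on the subobject, then on its parent."""
    value = getattr(subobject, special, _missing)
    if value is not _missing:
        return value
    combined = name + special  # e.g. "spam" + "__realm__"
    try:
        return getattr(parent, combined)
    except AttributeError:
        pass
    try:
        return parent[combined]
    except (KeyError, IndexError, TypeError):
        return None


class Spam:
    pass


bar = {"spam": Spam(), "spam__realm__": "Showroom"}
realm = get_special(bar, "spam", bar["spam"], "__realm__")
```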
A published object, or the returned value of a called published object, can be of any Python type. If the returned value has an asHTML method, then this method will be called to convert the object to HTML; otherwise, the returned value will be converted to a string and examined to see if it appears to be an HTML document. If it appears to be an HTML document, then the response content-type will be set to text/html. Otherwise, the content-type will be set to text/plain.
A special case is when the returned object is a two-element tuple. If the return object is a two-element tuple, then the first element will be converted to a string and treated as an HTML title, and the second element will be converted to a string and treated as the contents of an HTML body. An HTML document is created and returned (with type text/html) by adding necessary html, head, title, and body tags.
If the returned object is None or the string representation of the returned object is an empty string, then the HTTP return status will be set to "No Content", and no body will be returned. On some browsers, this will cause the displayed document to be unchanged.
For instance, if an invoked method ends with the statement:
return ('Your car has been purchased', ' Thank you! Your money will look great in our account. ' )
then an HTML document will be returned. A TITLE of "Your car has been purchased" will be inserted into a skeletal HTML snippet.
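The response rules in this section can be summarized in a short sketch. It is not Bobo's implementation: the function names are hypothetical, the order of the checks is our reading of the text, and the test used to decide whether a string "appears to be" HTML is an assumption, since the paper does not say how that decision is made.

```python
def looks_like_html(text):
    # Assumed heuristic; the real test is not described in the paper.
    head = text.lstrip().lower()
    return head.startswith("<html") or head.startswith("<!doctype html")


def render_result(result):
    """Return (status, content_type, body) for a returned value."""
    if result is None or str(result) == "":
        return 204, None, ""  # "No Content"
    if hasattr(result, "asHTML"):
        return 200, "text/html", result.asHTML()
    if isinstance(result, tuple) and len(result) == 2:
        title, body = str(result[0]), str(result[1])
        document = ("<html>\n<head><title>%s</title></head>\n"
                    "<body>\n%s\n</body>\n</html>\n" % (title, body))
        return 200, "text/html", document
    text = str(result)
    content_type = "text/html" if looks_like_html(text) else "text/plain"
    return 200, content_type, text


status, content_type, page = render_result(
    ("Your car has been purchased", "Thank you!"))
```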
Bobo will also assist in setting the BASE tag to alleviate the effects of nested object references in PATH_INFO.
Unhandled exceptions are caught by the object publisher and are translated automatically to nicely formatted HTTP output.
When an exception is raised, the exception type is mapped to an HTTP code by matching the value of the exception type with a list of standard HTTP status names. Any exception types that do not match standard HTTP status names are mapped to "Internal Error" (500). The standard HTTP status names are: "OK", "Created", "Accepted", "No Content", "Multiple Choices", "Redirect", "Moved Permanently", "Moved Temporarily", "Not Modified", "Bad Request", "Unauthorized", "Forbidden", "Not Found", "Internal Error", "Not Implemented", "Bad Gateway", and "Service Unavailable". Variations on these names with different cases and without spaces are also valid.
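The name matching can be sketched as follows. The table uses the HTTP/1.0 status numbers for the names listed above; "Redirect" is left out because the paper does not state which number it maps to. The function names are hypothetical, and the exception type is passed as a string for simplicity.

```python
import re

HTTP_STATUS = {
    "OK": 200, "Created": 201, "Accepted": 202, "No Content": 204,
    "Multiple Choices": 300, "Moved Permanently": 301,
    "Moved Temporarily": 302, "Not Modified": 304,
    "Bad Request": 400, "Unauthorized": 401, "Forbidden": 403,
    "Not Found": 404, "Internal Error": 500, "Not Implemented": 501,
    "Bad Gateway": 502, "Service Unavailable": 503,
}


def _normalize(name):
    """Ignore case and white space when comparing status names."""
    return re.sub(r"\s+", "", name).lower()


_BY_NAME = {_normalize(name): code for name, code in HTTP_STATUS.items()}


def status_for_exception(type_name):
    """Map an exception type name to an HTTP status code (default 500)."""
    return _BY_NAME.get(_normalize(type_name), 500)


code = status_for_exception("NotFound")
```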
An attempt is made to use the exception value as the body of the returned response. The object publisher will examine the exception value. If the value is a string that contains some white space, then it will be used as the body of the return error message. If it appears to be HTML, the error content type will be set to text/html; otherwise, it will be set to text/plain. If the exception value is not a string containing white space, then the object publisher will generate its own error message.
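A sketch of this decision is shown below. The function name error_body, the HTML test, and the wording of the generated page are all assumptions made for illustration.

```python
def error_body(exception_value, status_name="Internal Error"):
    """Return (content_type, body) for an error response."""
    if (isinstance(exception_value, str)
            and any(ch.isspace() for ch in exception_value)):
        # Assumed heuristic for "appears to be HTML".
        head = exception_value.lstrip().lower()
        is_html = (head.startswith("<html")
                   or head.startswith("<!doctype html"))
        return ("text/html" if is_html else "text/plain"), exception_value
    # Otherwise generate a page; this wording is invented for the sketch.
    generated = ("<html><head><title>%s</title></head>"
                 "<body><h1>%s</h1></body></html>"
                 % (status_name, status_name))
    return "text/html", generated


content_type, body = error_body("That car is not in stock")
```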
The exceptions to the above are not covered here.
One of the essential benefits of Bobo is that you get to write Python code that looks like Python code. That is, if you have a method that expects some arguments, Bobo will get those values out of the request and pass them into your object.
How does Bobo do it? First, it traverses the PATH_INFO to get to the final object. Let's say the final object is a method of an instance. This method in your module is expecting 'name' and 'age' to be passed in, as stated in the method signature (self, name, age). Bobo inspects the incoming form data, finds the values for the 'name' and 'age' fields, and passes them in as arguments as it calls your method "under Bobo control".
If you put an argument in your method signature, it must be in the request. Otherwise, Bobo will return an HTTP error with a nicely-formatted error message. You can get around this by providing default arguments for those variables that are not mandatory. For instance, you could have a method signature of (self, name, age=None).
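The idea can be sketched with the standard inspect module. This is an illustration of the technique, not Bobo's code: the names call_with_form and BadRequest are hypothetical, and the form data is represented as a plain dictionary.

```python
import inspect


class BadRequest(Exception):
    """Raised when a required argument is missing from the request."""


def call_with_form(method, form):
    """Call method with the form fields its signature asks for."""
    arguments = {}
    for name, parameter in inspect.signature(method).parameters.items():
        if parameter.kind in (parameter.VAR_POSITIONAL,
                              parameter.VAR_KEYWORD):
            continue  # *args and **kwargs are not filled from the form
        if name in form:
            arguments[name] = form[name]
        elif parameter.default is inspect.Parameter.empty:
            raise BadRequest("missing required field: %s" % name)
    return method(**arguments)


class Car:
    def purchase(self, name, age=None):
        return (name, age)


result = call_with_form(Car().purchase, {"name": "Bob", "color": "red"})
```

Fields the method does not name (color here) are ignored, and age falls back to its default.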
Normally, string arguments are passed to called objects. The called object must be prepared to convert string arguments to other data types, such as numbers.
If file upload fields are used, however, then FileUpload objects will be passed instead for these fields. FileUpload objects behave like file objects and provide attributes for inspecting the uploaded file's source name and the upload headers, such as content-type.
If field names in form data are of the form: name:type, then an attempt will be made to convert data from strings to the indicated type. The data types currently supported are:
Python floating point numbers
Python integers
Python long integers
Python strings
Python case-sensitive regular expressions
Python case-insensitive regular expressions
Date-time values
For example, if the name of a field in an input form is age:int, then the field value will be passed in the argument age, and an attempt will be made to convert the argument value to an integer. This conversion also works with file upload, so using a file upload field with a name like myfile:string will cause the FileUpload object to be converted to a string before being passed to the object.
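The name:type convention can be sketched as below. Only the int and string suffixes appear in the text above; the float suffix and the helper name convert_field are assumptions, and the remaining types are omitted.

```python
# Suffix -> converter. "float" is an assumed suffix name.
CONVERTERS = {"int": int, "float": float, "string": str}


def convert_field(field_name, raw_value):
    """Split "name:type" and convert the raw string accordingly."""
    name, separator, type_name = field_name.partition(":")
    if not separator:
        return name, raw_value  # no type suffix: pass the string through
    converter = CONVERTERS[type_name]  # KeyError for unsupported types
    return name, converter(raw_value)


field = convert_field("age:int", "42")
```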
Additionally, Bobo will bind special values to variables and make them available in method signatures. The most common are those supported in the standard cgi.py module. However, other interesting arguments, such as the REQUEST and RESPONSE objects, can also be obtained. These allow finer control of Bobo's handling of the operation.
The most interesting additions to the extended variables are those that provide new services. For instance, supporting cookies is a breeze. If you expect a value from a cookie, just put the name of the cookie in the method signature. Bobo will get it for you. Also, there are variables that can help in object traversal, such as BASE, PARENT_URL (the URL to the spot above you in the hierarchy), and others.
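One way to picture this is as a single pool of named values from which method arguments are drawn. In the sketch below, the function name available_values and the precedence (form data over cookies over special values) are assumptions; the paper does not say how Bobo resolves a name that appears in more than one place.

```python
from http.cookies import SimpleCookie


def available_values(form, cookie_header, extras):
    """Merge special values, cookies, and form fields into one mapping."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    values = dict(extras)
    values.update({key: morsel.value for key, morsel in cookies.items()})
    values.update(form)  # assumed: form data wins over cookies
    return values


values = available_values(
    {"name": "Bob"},
    "session=abc123; name=Eve",
    {"PARENT_URL": "http://www.here.com/cgi-bin/example/Cars"},
)
```

A method whose signature names session would then receive the cookie value without any cookie-parsing code of its own.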
Access to an object (and its subobjects) may be further restricted by specifying an object attribute named __allow_groups__. If set, this attribute should contain a collection of authorization groups. The __allow_groups__ attribute may be a mapping object, in which case it is a collection of named groups. Alternatively, the __allow_groups__ attribute may be a sequence, in which case it is a collection of anonymous groups. Each group must be a dictionary that uses names as keys (i.e. sets of names). The values in these dictionaries may contain passwords for authenticating each of the names. Alternatively, passwords may be provided in separate "realm" objects. If no realm is provided, then basic authentication will be used and the object publisher will attempt to authenticate the access to the object using one of the supplied name and password pairs. The basic authentication realm name used is module_name.server_name, where module_name is the name of the module containing the published objects and server_name is the name of the web server.
The module used to publish an object may contain its own __allow_groups__ attribute, thereby limiting access to all of the objects in a module.
If multiple objects in the URI path have __allow_groups__ attributes, then the __allow_groups__ attribute from the last object in the path that has this attribute will be used. The __allow_groups__ attribute for a subobject overrides __allow_groups__ attributes for containing objects; however, if named groups are used, group data from containing objects may be inherited by contained objects. If a published object uses named groups, then for each named group in the published object, group data from groups with the same name in container objects will be inherited if:
If the name of a group is the Python object None, then data from named groups in container objects will be inherited even if the groups don't appear in the inheriting object, subject to the restrictions above.
When group data are inherited, the inherited data are appended to the existing data. When groups contain names and passwords, individual user names may have multiple passwords if they appear in multiple groups.
Note that an object may have an __allow_groups__ attribute that is set to None, in which case the object will be public, even if containing objects are not.
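The override rule can be sketched as below. The sketch covers only the selection of the governing __allow_groups__ value along the path, not group inheritance or authentication; the helper name effective_groups and the example classes are hypothetical.

```python
_unset = object()


def effective_groups(objects_on_path):
    """Return the __allow_groups__ value that governs the final object.

    The last object on the path that defines the attribute wins.
    None means the object is public.
    """
    groups = _unset
    for obj in objects_on_path:
        value = getattr(obj, "__allow_groups__", _unset)
        if value is not _unset:
            groups = value
    return None if groups is _unset else groups


class Module:
    __allow_groups__ = {"staff": {"paul": "secret"}}


class Cars:
    pass  # inherits the restriction from the module


class Pinto:
    __allow_groups__ = None  # explicitly public


restricted = effective_groups([Module(), Cars()])
public = effective_groups([Module(), Cars(), Pinto()])
```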
Bobo provides some facilities to assist in tracking errors. At the base level, Bobo provides traceback information in the HTML that is returned to the user. The traceback information is sent as an HTML comment, so that the average user does not see it.
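The technique of hiding a traceback in an HTML comment can be sketched with the standard traceback module. The function name error_page and the page layout are assumptions; this is not the markup Bobo produces.

```python
import traceback


def error_page(message):
    """Build an error page; call this from inside an except block."""
    # Break up "--" so the traceback text cannot end the comment early.
    details = traceback.format_exc().replace("--", "- -")
    return ("<html><body>\n%s\n<!--\n%s\n-->\n</body></html>"
            % (message, details))


try:
    1 / 0
except ZeroDivisionError:
    page = error_page("Sorry, something went wrong.")
```

A browser shows only the message, while a developer viewing the page source can read the traceback.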
Also, Bobo provides some command-line facilities. There is a bobo.py module that simulates the HTTP request, thus recreating most of the environment of an actual request. This command-line Bobo also provides:
We have invested serious time in using Bobo for fielding commercial work. Along the way, we have tackled several other problem areas and integrated their solutions with Bobo.
For instance, the area of dynamic generation of HTML is addressed with our DocumentTemplate module. These objects are particularly integrated with Bobo, and have been designed to work with Bobo features such as nested traversal of objects.
Also, our persistent object storage effort (affectionately nicknamed Bobobase) is integrated. Specifically, other components are ensured to be pickleable and to work with the transaction capabilities of Bobobase.
Below is a brief treatment of Bobo issues:
__allow_groups__ for a method is assigned outside the method, using a specially-named variable.
getattr has created some curious Bobo hacks, specifically to support lazy access of persistent data. Furthermore, the act of traversing a hierarchy of objects, as represented by the URL, could be more tightly controlled using improved attribute referencing.
Bobo is freely available. It currently exists as a set of modules, with some extensions for optimization (though the extensions are not necessary). It has been heavily tested in support of paid-for applications.
However, there has been very little optimization. Currently, we have a research project to add subclassable C types. The hope is that several pieces of Bobo could be tightly-controlled in ways not possible due to the issues raised above.
Finally, we are working on pushing Bobo out of the nest. This first involves getting it working well on Windows. Also, we need to work on our ILU support, and some of the other infrastructure items. As well, we have a lot of documentation and proselytizing to do.