Persistent Objects Spec: A case study :-)

Jim Roskind (jar@infoseek.com)
Wed, 27 Jul 1994 17:31:26 -0700

The following is the spec I made reference to in my earlier email. It
is the spec for a module that supports persistence (a case study?). I
convinced my management to let me post both the source and the spec,
but I thought it would be more interesting to discuss the spec for a
bit without confusing the issue by examining some of the
implementation flaws ;-).

The spec was initially drafted by Ed Miller here at InfoSeek. The
initial implementation was started by me in collaboration with Todd
Jonz, also here at InfoSeek. The current implementation (and
resulting spec) is mostly my fault ;-).

From my perspective, the interesting commentary will take the form of
"Why the heck did you need ... method when you can ...?" or "How the
heck can you get ... with only ... to help you?" or "What is the
advantage of ... in your ... method over ...?" I know, if I were
reading this spec for the first time, a *lot* of my comments would be
questions of this sort. In truth, it was only *after* we started
implementing this stuff that we started to see how we *should* have
spec'd things (and then we went back and fixed the spec).

It is also fair to argue about our definition of persistence, and ask
questions which (possibly) can't be addressed by our perspective of
the world.

By the way, the current implementation is done using only Python, and
no embedded C code. (i.e., nothing up my sleeves, ... and the hands
never leave the wrists ;-) ). Our intention is to post it (or make
it ftp'able) shortly.

Jim

Jim Roskind
voice: 408.982.4469
fax: 408.986.1889
jar@infoseek.com

cut here--------------------------------------------------------
(c) Copyright InfoSeek Corp 1994, All rights reserved.

Permission is granted to electronically reproduce this document
provided the copyright notice is intact and applicable in all such
copies. Printed redistribution is only permitted with express
written consent of InfoSeek Corp.

MODULE
pobject

Module Overview

The pobject module implements the PersistentObject class. All
identifiers described in this specification are bound in the pobject
module scope. A variety of private static functions are included in
this module to provide a persistent representation of data. The names
of these functions all begin with Prepr, a take-off on the Python
repr() function. Data that is much more complex than repr() can
process (i.e., recursive or self-referential data structures) is
handled automatically by this module.
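The Prepr...() functions are private, and this spec does not define their output format. As a rough illustration of the idea only, here is a minimal sketch in modern Python (not the pobject code) of a repr()-like encoder that emits executable statements, so that shared and self-referential lists and dictionaries come back with the same sharing. The names prepr and reincarnate are hypothetical.

```python
import math


def prepr(obj):
    """Encode obj as (lines, expr): run lines, then eval expr, to rebuild it.

    Supports None, bool, int, finite float, str, tuple, list, and dict.
    Lists and dicts keep their identity, so shared and cyclic references
    survive; tuples are rebuilt by value.
    """
    lines = []
    names = {}  # id(container) -> variable name already assigned

    def visit(x):
        if x is None or isinstance(x, (bool, int, str)):
            return repr(x)
        if isinstance(x, float):
            if not math.isfinite(x):
                raise TypeError("non-finite floats have no literal form")
            return repr(x)
        if isinstance(x, tuple):
            return "(" + "".join(visit(item) + ", " for item in x) + ")"
        if isinstance(x, (list, dict)):
            if id(x) in names:           # seen before: refer to it by name
                return names[id(x)]
            name = "_v%d" % len(names)
            names[id(x)] = name          # register before visiting children
            if isinstance(x, list):
                lines.append("%s = []" % name)
                for item in x:
                    lines.append("%s.append(%s)" % (name, visit(item)))
            else:
                lines.append("%s = {}" % name)
                for key, value in x.items():
                    lines.append("%s[%s] = %s" % (name, visit(key), visit(value)))
            return name
        raise TypeError("cannot encode %s" % type(x).__name__)

    expr = visit(obj)
    return lines, expr


def reincarnate(lines, expr):
    """Execute the statements in a fresh namespace and evaluate expr."""
    namespace = {}
    exec("\n".join(lines), namespace)
    return eval(expr, namespace)
```

Each mutable container gets a variable name the first time it is seen, so a second reference to it (including a reference to itself) becomes a reference to that name rather than a fresh copy.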

There are some restrictions on the classes that may be derived from
this base class and stored persistently. It must be possible to
refer to the class directly, and to create a "hollow" shell of such an
object at reincarnation time. As a result, the most derived class
*must* have a constructor that accepts an empty argument list. The
class must be present in the module dictionary of the module in which
it is defined. This requirement guarantees that the name of the class
and the module name can be used to specify the actual class when
reincarnating an instance. Finally, the most derived class must define
at least one method (this is an internal Python restriction: the
methods are used to find the name of the module in which the most
derived class is defined).
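Modern Python records the defining module on each class (cls.__module__), so the "at least one method" trick is no longer needed for that step. A hypothetical sketch of the module-name plus class-name lookup described above, assuming the class is bound at module scope and accepts an empty argument list; EncodingError here is a local stand-in for the module's exception.

```python
import importlib


class EncodingError(Exception):
    """Raised when an object's class cannot be named persistently."""


def class_reference(cls):
    """Return (module_name, class_name) that names cls from module scope."""
    if "." in cls.__qualname__:
        # Nested or function-local classes (qualname like "f.<locals>.C")
        # are not bound in their module's dictionary.
        raise EncodingError("class %s is not defined at module scope" % cls.__qualname__)
    module = importlib.import_module(cls.__module__)
    if getattr(module, cls.__name__, None) is not cls:
        raise EncodingError("%s is not bound in module %s" % (cls.__name__, cls.__module__))
    return cls.__module__, cls.__name__


def make_hollow(module_name, class_name):
    """Create an empty instance by calling the class with no arguments."""
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls()  # relies on the most derived class accepting no arguments
```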

The current implementation uses a directory to store the persistent
images of objects. This directory is selected by setting the
environment variable 'PYTHON_PDS' to point to the desired directory.
If no environment variable is found, the system defaults to using the
/tmp directory.
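A sketch of how DataStorePath might be computed. The helper name data_store_path is hypothetical; only the PYTHON_PDS variable and the /tmp default come from this spec.

```python
import os


def data_store_path(environ=os.environ):
    """Return the persistent store directory: $PYTHON_PDS, else /tmp."""
    return environ.get("PYTHON_PDS", "/tmp")


DataStorePath = data_store_path()
```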

Module Global Data
DataStorePath # The root directory of the persistent store area.

Module Functions
There are no public functions provided by the pobject module.

Exceptions

The PersistError exception may be raised in certain
conditions. See the Find(), FindAlias(), Save(), and SetAlias()
methods below.

If a programmer attempts to use this module in ways that are
beyond its scope, an EncodingError may result. This typically
happens only under fairly bizarre circumstances, such as attempts
to save "code objects," or attempts to save an instance of a class
that defines no methods (a subtle internal Python limitation).

CLASS
PersistentObject Basic class
WriteOnceObject Fancy alternate base class

Class Overview

The PersistentObject class methods allow instance objects to be
created in such a way that the state of these objects is
automatically saved persistently. We say such state is saved
persistently when that state survives across process boundaries.

The only possible way to refer to data across process boundaries
is to have a persistent name for an object. Hence, for each
instance, it is possible to retrieve a persistent name that can be
used externally to refer to such persistently stored state. Two
kinds of names are available. One name is automatically generated
on request (see GetID()), and the other is a caller-specified name
(see SetAlias()). For each kind of persistent name, there is a
method that can be used to "find" or "reincarnate" an object given
its persistent name: Find() and FindAlias(), respectively.

The PersistentObject class is intended to be a base class from
which other object classes can be derived. These derived classes
will inherit the properties of persistence from this base class.
Derived classes may override the public methods, as long as the
semantics (e.g., return value type and functionality) are
preserved.

In addition to the fundamental persistent object class, there is a
derived class WriteOnceObject that is commonly used in place of
PersistentObject as a base class. This alternate class should
only be used if instances of the programmer's derived class are
only modified in the process in which they are created. This
restriction allows this alternate base class to optimize away the
additional saving of state when secondary processes examine
instances.

In all cases, if an external name is not requested for a given
object during the life of the creating process, then there is
*no* way to refer to the object in future processes. As a result,
in such situations, the state of the object is never saved (unless
it is referenced by another object that *can* be reincarnated.
In such situations, the saved object requests the persistent name
of the otherwise lost object, which in turn forces the lost
object to be saved as well).
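The rule above amounts to a reachability computation: start from every object that has handed out a persistent name and follow references. A minimal sketch, assuming a hypothetical table that maps each object to whether it has a name and which objects it references:

```python
def objects_to_save(objects):
    """Return the keys of objects that must be saved at process exit.

    `objects` maps an object key to (has_persistent_name, referenced_keys).
    An object is saved if it has a persistent name, or if a saved object
    refers to it (the referrer then requests its persistent name).
    """
    to_save = set()
    stack = [key for key, (named, _) in objects.items() if named]
    while stack:
        key = stack.pop()
        if key in to_save:
            continue                     # already scheduled (handles cycles)
        to_save.add(key)
        stack.extend(objects[key][1])    # referenced objects must follow
    return to_save
```

An object that only refers *to* a named object, but is itself unnamed and unreferenced, is still discarded.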

Users of derived classes are also given some assurances about the
representation of reincarnated objects. Specifically, a user that
refers to a single persistent object via several distinct
persistent objects can be assured that exactly one such central
object will be created during a reincarnation. Note that in the
case of write-once objects, the programmer is *not* permitted to
modify the reincarnated instance (by definition), and hence the
implementation is free to create multiple copies of such a
WriteOnceObject if efficiency suggests it would be useful.
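One common way to provide the exactly-one-instance guarantee is an identity map keyed by persistent ID; this spec does not say whether pobject works this way. In this hypothetical sketch, an object is registered before its references are loaded, so cycles resolve to the instance already being built.

```python
class IdentityMap:
    """One live instance per persistent ID (hypothetical helper)."""

    def __init__(self, make_hollow, fill):
        self._make_hollow = make_hollow  # persistent ID -> empty instance
        self._fill = fill                # (instance, ID, find) -> None
        self._resident = {}              # persistent ID -> live instance

    def find(self, object_id):
        if object_id in self._resident:
            return self._resident[object_id]     # already in RAM: reuse it
        obj = self._make_hollow(object_id)
        self._resident[object_id] = obj          # register before filling
        self._fill(obj, object_id, self.find)    # may call find() for references
        return obj
```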

Protected Class Name Space Items

Save() # store persistently the state of an object
Prepr() # build executable string representation
Load(Data) # load data file (list of lines) into self
Prepr_members() # build executable list of lines, representing settings of members to values

Private Class Name Space Items

GetID(ObjectID) # Access method
CreateObjectId() # Create pseudo random ID
AliasToID(Alias) # Translate an alias into an ID

Public Data Attributes
There is no public global data provided by the PersistentObject
class.

Class Methods

__init__() # initialize a persistent object
Find( id ) # find or load the object associated with an id
FindAlias( alias ) # find or load the object associated with an alias
GetID() # retrieve the persistent id of an object
SetAlias( alias ) # associate an alias with an object

====================================

Details
------------------------------------

Name: __init__

Arguments:

The __init__() method takes no arguments (other than
the implicit instance object).

Description:

This is the instance initializer method for the
PersistentObject class. The new instance registers
itself in such a way that its state will automatically
be saved when the process exits (see also the Save()
method, below) if ever a persistent name is directly
or indirectly acquired.

This function is normally invoked by Python when it
creates a new PersistentObject instance
object. However, implementors of classes derived from
the PersistentObject class should be aware that this
method must be invoked by the initializer method of
the derived class. For example, if the class Foo is
derived from the PersistentObject class, then the
Foo.__init__() method should invoke the method
PersistentObject.__init__(). It is a system error if a
class derived from the PersistentObject class does not
initialize its instance objects with this method.

Return value:

The __init__() method returns no value.

Exceptions:

The __init__() method does not explicitly raise any
exceptions.

Public Data Attributes (side effects):

The PersistentObject class defines no public data
attributes.

Name: Find

Arguments:

The Find() method takes an external, persistent object
name (a string) as its argument (as well as the
implicit instance object). The name should have been
generated with the GetID() method, described below.

Description:

The Find() method typically causes the state of the
persistent object known by the given identifier to be
loaded into the given instance. Any external,
persistent object references stored in the object's
state will be resolved before this method completes.
If the specified object is already resident in RAM,
then this method will discard the supplied instance
(re: self) and return the previously loaded instance.

Return value:

The Find() method returns the instance object whose
state has been found and loaded. In some cases, this
is the instance supplied with the call, in other cases
it is a previously created object.

Exceptions:

The Find() method raises the PersistError exception if
the state of the persistent object cannot be
restored.

Since other persistent objects that are referenced by
slots of the loaded object get resolved (loaded) by
this method, other Find() methods may be invoked. It
is possible that those methods may also raise the
PersistError exception. If a PersistError exception
occurs, the Find() method will ensure that no
unresolved persistent object references remain in
memory. Although the Find() method handles such an
exception (only to the extent of returning with a
fully consistent RAM state), it will pass the
exception on to the caller. This lets the caller
know that a portion of the persistent object could not
be reincarnated, and hence NONE of the object is
reincarnated.
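One way to get this all-or-nothing behavior is to remember which objects a top-level Find() made resident, and to evict exactly those if a PersistError escapes. A hypothetical sketch, with stored state reduced to a list of referenced IDs per object and loaded objects represented as plain dictionaries:

```python
class PersistError(Exception):
    """Raised when persistent state cannot be restored."""


class Loader:
    def __init__(self, store):
        self.store = store       # persistent ID -> list of referenced IDs
        self.resident = {}       # persistent ID -> the live object

    def find(self, object_id):
        """Return the object for object_id, loading references; all or nothing."""
        pending = []             # IDs first made resident during this call
        try:
            return self._find(object_id, pending)
        except PersistError:
            for pid in pending:  # evict every partially loaded object
                del self.resident[pid]
            raise

    def _find(self, object_id, pending):
        if object_id in self.resident:
            return self.resident[object_id]
        if object_id not in self.store:
            raise PersistError("no persistent state for %r" % object_id)
        obj = {"id": object_id, "refs": []}       # hollow shell, filled below
        self.resident[object_id] = obj            # lets cycles find it
        pending.append(object_id)
        obj["refs"] = [self._find(ref, pending) for ref in self.store[object_id]]
        return obj
```

Objects that were already resident before the call are never evicted, so a failed Find() leaves RAM exactly as it found it.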

Public Data Attributes (side effects):

The PersistentObject class defines no public data
attributes.

Name: FindAlias

Arguments:

The FindAlias() method takes a string as an argument
(as well as the implicit instance object). This string
represents a unique alias that was associated with an
object via the SetAlias() method.

Description:

The FindAlias() method typically causes the state of
the persistent object known by the given alias to be
loaded into the given instance. If the selected object
is already in RAM, then a reference to the previously
loaded object will be returned. Any external
persistent object references stored in the object's
state will be resolved before this method completes.

Return value:

The FindAlias() method returns either the instance it
is supplied with, or a previously loaded instance. In
any case, the returned reference is the selected
object instance.

Exceptions:

The FindAlias() method will raise a PersistError
exception if the state of the persistent object
cannot be restored. See the Find() method, described
above, for details.

Public Data Attributes (side effects):

The PersistentObject class defines no public data
attributes.

Name: GetID

Arguments:

The GetID() method takes no arguments (other than the
implicit instance object).

Description:

The GetID() method returns a string containing a
unique persistent identifier that represents the
instance object. The identifier can then be used to
retrieve the state of the persistent object at a later
date with the Find() method, described above. The
only current constraints on the returned string are that
it contain no carriage returns and contain only
printable ASCII characters plus white space.

Calling this function has a significant performance
side effect. Once the ID for an object has been
extracted, the object must *really* be saved across
process boundaries. If neither GetID nor SetAlias has
ever been called for an object (and no saved object
refers to it), then there is no way to refer to the
object in the future, and it can simply be discarded
at the termination of the process.
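A sketch of the lazy ID behavior, assuming a random hex string as the ID format (the spec only requires printable ASCII without carriage returns). create_object_id and Named are hypothetical stand-ins for CreateObjectId() and a PersistentObject instance; 128 random bits make collisions very unlikely but do not strictly rule them out.

```python
import secrets


def create_object_id():
    """Hypothetical CreateObjectId(): 32 hex digits, printable, no newlines."""
    return secrets.token_hex(16)


class Named:
    """Sketch of GetID(): create the ID on first request, then mark for saving."""

    def __init__(self):
        self._id = None
        self.must_save = False   # becomes True once a persistent name escapes

    def GetID(self):
        if self._id is None:
            self._id = create_object_id()
        self.must_save = True    # the performance side effect described above
        return self._id
```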

Return value:

The GetID() method returns a string containing the
persistent identifier.

Exceptions:

The GetID() method does not explicitly raise any
exceptions.

Public Data Attributes (side effects):

The PersistentObject class defines no public data
attributes.

Name: SetAlias

Arguments:

The SetAlias() method takes a string as an argument
(as well as the implicit instance object). This string
represents a unique persistent alias to be associated
with the instance object's state. Note that a single
object can have an arbitrary number of aliases. The
only constraint on the selection of alias names is
that, for a *single* class, an alias string cannot be
reused. Note that use of an alias string for
one class does NOT prevent reusing the alias for an
instance of another class.

Description:

The SetAlias() method associates the given alias with
the state of the given instance object. The alias may
then be used to restore the state of the instance
object via the FindAlias() method, described above.

Calling this function has a significant performance
side effect. Once an alias for an object has been
established, the object must *really* be saved across
process boundaries. If neither GetID nor SetAlias has
ever been called for an object (and no saved object
refers to it), then there is no way to refer to the
object in the future, and it can simply be discarded
at the termination of the process.

Return value:

The SetAlias() method returns no value.

Exceptions:

The SetAlias() method will raise a PersistError
exception ONLY if the alias string has already been
used for the given class.
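The per-class alias rule can be modeled as a table keyed by (class name, alias). This is a hypothetical sketch of SetAlias() and AliasToID() bookkeeping, not the module's storage format; it treats any second use of an alias within a class as an error, even for the same object.

```python
class PersistError(Exception):
    """Raised on alias reuse or an unknown alias."""


class AliasTable:
    """Hypothetical alias registry: one alias namespace per class."""

    def __init__(self):
        self._ids = {}   # (class name, alias) -> persistent ID

    def set_alias(self, class_name, alias, object_id):
        key = (class_name, alias)
        if key in self._ids:
            raise PersistError("alias %r already used for class %s" % (alias, class_name))
        self._ids[key] = object_id

    def alias_to_id(self, class_name, alias):
        try:
            return self._ids[(class_name, alias)]
        except KeyError:
            raise PersistError("unknown alias %r for class %s" % (alias, class_name)) from None
```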

Public Data Attributes (side effects):

The PersistentObject class defines no public data
attributes.

Name: Save

Arguments:

The Save() method takes no arguments (other than the
implicit instance object).

Description:

The Save() method causes the state of the instance
object to be stored in a persistent way. This state is
stored such that it can be retrieved via the Find()
method, described above. Note that all
implementations of the Save methods are "lazy," and
hence only perform a save *if* a persistent name
exists for the given object (see GetID and SetAlias).

It is never necessary to invoke this method
explicitly. The PersistentObject __init__() method
registers each instance so that its Save() method will
be invoked when the process exits. This means that
persistent objects are saved automatically.
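A sketch of the lazy save, assuming one file per object named by its persistent ID and a body of Python assignment statements. Both the file layout and the LazySaver class are hypothetical; only the "save only if a persistent name exists" rule comes from this spec.

```python
import os


class LazySaver:
    """Sketch of a lazy Save(): write state only if a persistent name exists."""

    def __init__(self, store_dir):
        self.store_dir = store_dir
        self.persistent_id = None   # would be set by GetID() or SetAlias()
        self.value = 0

    def Save(self):
        if self.persistent_id is None:
            return                  # nobody can refer to this object later: skip
        path = os.path.join(self.store_dir, self.persistent_id)
        with open(path, "w") as f:
            f.write("self.value = %r\n" % (self.value,))
```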

Note that it is not possible to save the entire state
of an instance object. No methods are saved, and only
certain data attribute members can be stored (the
common types of data members *are* saved). The members
that can be stored include those of the data types
None, Numbers, Sequences, and Mapping Types. (See
section 3.2 of the Python Reference Manual for a
description of these data types.) Additionally,
references to other instance objects (that is, data of
the Class Instance type) are automatically stored.
For arbitrary classes, reincarnation of such
referenced instances may cause duplicate copies of
such objects to be constructed. To prevent duplicate
copies from being constructed, such referenced
instances must be instances of a class derived from
the PersistentObject class.

This method does not modify or destroy the instance
object. Thus, it is possible to Save() the instance,
then continue to refer to it. If an object is changed
after it is saved, then such changes might not be
saved automatically. This caveat is only
significant if a Save method in one object (which
runs at the end of a process) induces a change in
another object that has already been saved. This is
never a problem unless the derived class redefines
this (Save) method.

Return value:

The Save() method returns no value.

Exceptions:

The Save() method will raise a PersistError exception
if the instance object's data attribute members
contain data that cannot be stored persistently. Data
of the following types cannot be stored: Callable Types,
Files, and Internal Types.
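A hypothetical checker for the storable types listed above. It follows the spec's categories loosely: None, numbers and strings, lists and tuples, dictionaries, and ordinary instances (whose attribute dictionaries are checked in turn) pass, while callables, file-like objects, modules, and other objects without a __dict__ are rejected.

```python
import io
import types


class PersistError(Exception):
    """Raised when an object holds data that cannot be saved."""


def check_storable(value, _seen=None):
    """Raise PersistError if value (or anything it holds) cannot be stored."""
    if value is None or isinstance(value, (int, float, complex, str, bytes)):
        return                                   # atomic values are fine
    if _seen is None:
        _seen = set()
    if id(value) in _seen:
        return                                   # already checked (cycles)
    _seen.add(id(value))
    if isinstance(value, (list, tuple)):
        for item in value:
            check_storable(item, _seen)
    elif isinstance(value, dict):
        for key, item in value.items():
            check_storable(key, _seen)
            check_storable(item, _seen)
    elif (callable(value) or isinstance(value, (io.IOBase, types.ModuleType))
          or not hasattr(value, "__dict__")):
        raise PersistError("cannot store %s" % type(value).__name__)
    else:
        check_storable(vars(value), _seen)       # instance: check its members
```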

Public Data Attributes (side effects):

The PersistentObject class defines no public data
attributes.

Notes
-----

There are several possible settings within this class which can change
the definition of the WriteOnceObject. These various implementations
are not guaranteed to be compatible. The variations are provided to
enable rapid prototyping and optimization experiments.

Bugs
----

Only the data in each object is stored, not the methods. This
design saves a lot of storage, but imposes some odd (but simple)
requirements on derived classes. Classes that are not visible at file
scope (for example, classes defined within functions) typically cannot
be referred to directly. Hence there is no persistent way to
describe such classes, and an EncodingError will result from an
attempt to save an instance of such a class.

Some data structures can be created in C, but not in Python (for
example, self-referential tuples with no intervening mutable data
structures). Since such objects cannot be created in Python, they are
not candidates for persistent storage (and resurrection via Python).
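For contrast, Python code can build a self-referential tuple only by routing the cycle through a mutable object, which is the kind of structure this module can save:

```python
# The tuple cannot refer to itself directly, but it can through a list.
t = ([],)
t[0].append(t)   # cycle: t -> list -> t
```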

The current implementation does not distinguish (in its generation of
persistent names) between identical class names defined in different
modules. As a result, reusing a class name in a second module can
cause SetAlias() calls on instances of the identically named classes
to fail.
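A hypothetical illustration of the collision: a key built from the class name alone cannot tell apart two classes named Foo from different modules, while a module-qualified key can. The class_key helper is an assumption for illustration, not the module's naming scheme.

```python
def class_key(cls, qualified=True):
    """Key used to group persistent names by class (hypothetical)."""
    if qualified:
        return "%s.%s" % (cls.__module__, cls.__qualname__)
    return cls.__name__  # the behavior described above: module ignored


# Two distinct classes that share a name but come from different modules.
ShopFoo = type("Foo", (), {"__module__": "shop"})
BankFoo = type("Foo", (), {"__module__": "bank"})
```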

Change History
--------------

ed: 11 Mar 94 initial draft
ed: 14 Mar 94 several mods; removed Exists() and Load()
ed: 15 Mar 94 copied into new template; several mods; added
SetAlias() and FindAlias()
jar: 24 Apr 94 Updated to reflect status of prototype
jar: 2 Jun 94 Updated toward the current doc template
jar: 6 Jun 94 Updated based on review by group 6/6/94