Re: Persistent Objects Spec: A case study :-)

David Ziegler (david@hookup.net)
Mon, 8 Aug 1994 15:25:43

-------------------------------------------------------------------------

In your Q/A document you say:

>In a future process, to reincarnate the above instance:
>
>a_born_again = MyPer().Find("the name that printed above")
>
>Notice that in the creation of a_born_again, we had to call a
>constructor MyPer() just to make a dummy instance so that we could
>call the Find() method. (Too bad Python doesn't have static member
>functions :-) ).

I would prefer a ``Find'' function that:

1) Looks for an in-core copy of the object and, if one is found, returns it.
2) If no in-core copy is found, looks up the object in the object file.
3) Uses the type information stored with the filed object to create an
object of the appropriate type, then uses the Find method of that object
to load the data.

Because:

1) It respects the fact that Python is a dynamically typed language. In
Python a variable (parameter, dictionary cell, etc.) is not constrained
to holding references to only one type of value. Your database system
mostly maintains this property (e.g. the types of the objects referred to by
member variables can vary over the lifetime of the program or database),
but being forced to specify a type when an object is explicitly loaded
breaks it. If you want this feature for type safety, an optional type
assertion parameter can be added.
2) It allows demand-loaded sub-objects (i.e. objects referred to by member
variables) to be loaded using the same mechanism as explicitly loaded
objects, which is consistent with how demand-loaded sub-objects must be
loaded anyway. (There is no way for the programmer to step in and specify
the type for demand-loaded sub-objects, so your current implementation will
have to convert sub-object pointers to an objectID/objectTYPE pair.)
3) It gets rid of the slightly confusing dummy instance.

However:

One use of an overloaded ``Find'' method that you give, and that would not be
possible with this implementation, is to store the entire object in the
object ID. I do not consider this much of a loss, however. As your databases
grow to substantial size, and as you come to support multiple object storage
mechanisms, I think you will want the contents of the object ID to be the
sole responsibility of the object storage mechanism. (For example, some
existing OODBs use a unique 64-bit integer as their object ID but, to make
finding objects as fast as possible, object references also contain the
disk address of the object.)
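
As a rough illustration of an object reference whose contents belong to the
storage mechanism, the sketch below pairs an integer ID with an optional
disk-address hint. It does not reproduce the format of any particular OODB;
ObjectRef, fetch, read_at and address_of are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ObjectRef:
    """Opaque reference; only the storage mechanism looks inside it."""
    oid: int                      # the unique object ID
    # Address hint: excluded from comparison, so two references to the
    # same object are equal whether or not they carry the hint.
    disk_address: Optional[int] = field(default=None, compare=False)


def fetch(ref, read_at, address_of):
    """Read the record for ref, using the address hint when present."""
    address = ref.disk_address
    if address is None:
        address = address_of(ref.oid)   # slower path: consult the ID table
    return read_at(address)
```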

-------------------------------------------------------------------------

In your spec. you say:

> The Find() method typically causes the state of the
> persistent object known by the given identifier to be
> loaded into the given instance. Any external,
> persistent object references stored in the object's
> state will be resolved before this method completes.

In a system where all sub-object references are automatically loaded, the
entire (non-garbage) database will be loaded when the root object is loaded,
unless covert tricks are used. This is clearly not workable for large
databases or databases that are to be shared with other processes. It will
force programmers to resort to covert tricks such as converting object
references to strings before saving an object, then looking up the object just
before use.

Storing database pointers (object IDs) in covert form creates
a situation in which the database system cannot understand the structure of
the data represented in the database (without approximate techniques similar
to those used for C-language conservative garbage collectors). This makes
database garbage-collectors, packers, recoverers, converters (all basically the
same thing) that much harder to write. It will also make it impossible to
piggy-back on top of one of the many existing OODBs that assume that
the object ID belongs to them.

It could also be much more convenient for the programmer if you provided
some way of specifying sub-objects that were not demand-loaded. This is
a little tricky in Python because there is no convenient way to associate
an attribute with a member variable declaration. One (far from ideal)
possibility would be to prefix all demand-loaded member variable names
with the word ``Demand'' (i.e. embed the attribute in the name). One could
then make a new class encapsulating the object ID, and give this class a
``Get'' method that would look up the referred-to object (it could also cache
the pointer to this object inside the object ID class, so future Gets for
the same member variable would not have to perform the lookup).

In use this mechanism would look like:

students = teacher.students.Get()

As opposed to:

students = teacher.students
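
A minimal sketch of the class encapsulating the object ID might look like
the following. The class name DemandRef and the lookup callable passed to
it are my own inventions for illustration; only the ``Get'' method and the
caching behaviour come from the description above.

```python
class DemandRef:
    """Wraps an object ID and loads the target on the first Get()."""

    def __init__(self, object_id, lookup):
        self.object_id = object_id
        self._lookup = lookup     # hypothetical callable: object ID -> object
        self._loaded = False
        self._target = None

    def Get(self):
        # Perform the lookup once, then serve the cached object.
        if not self._loaded:
            self._target = self._lookup(self.object_id)
            self._loaded = True
        return self._target
```

Because the wrapper keeps the object ID in a known place, the database can
still see the reference, which the string-conversion trick would hide.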

-------------------------------------------------------------------------

You allow every object to have an ``alias'' name that can also be used to
look up the object. Why not separate this out from the objects themselves
and allow for general-purpose indexes of objects? This is probably not
much more work for your initial version (it sounds like you are planning to
keep things like the alias table in-core anyway?), but it will result in a
much more general-purpose system (databases without indexes are much less
useful).
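
One way to picture an index kept apart from the objects is the in-core
sketch below. The Index class and its add, remove and lookup methods are
hypothetical; the alias table then becomes just one index among several.

```python
class Index:
    """A named mapping from keys to object IDs, kept outside the objects."""

    def __init__(self, name):
        self.name = name
        self._entries = {}        # key -> set of object IDs

    def add(self, key, object_id):
        self._entries.setdefault(key, set()).add(object_id)

    def remove(self, key, object_id):
        ids = self._entries.get(key)
        if ids is not None:
            ids.discard(object_id)
            if not ids:
                del self._entries[key]   # drop keys with no objects left

    def lookup(self, key):
        # Return a copy so callers cannot disturb the index.
        return set(self._entries.get(key, ()))


# The alias table is one index; others can sit alongside it.
aliases = Index("alias")
by_subject = Index("subject")
```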

-------------------------------------------------------------------------

One thing missing from your spec. is a description of its
interface to the database engine. It would be nice if this interface were
designed to allow for the substitution of different backends.
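
To suggest what a substitutable interface could look like, here is a sketch
in present-day Python. The Backend class, its four methods and the
MemoryBackend stand-in are assumptions of mine, not something from your
spec; a real engine would need more (transactions, locking and so on).

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical minimal interface between the persistence layer and an engine."""

    @abstractmethod
    def new_id(self):
        """Return a fresh object ID; its contents belong to the backend."""

    @abstractmethod
    def store(self, object_id, type_name, state):
        """Save an object's type name and state under object_id."""

    @abstractmethod
    def load(self, object_id):
        """Return (type_name, state); raise KeyError if the ID is unknown."""

    @abstractmethod
    def delete(self, object_id):
        """Remove the record stored under object_id."""


class MemoryBackend(Backend):
    """Dictionary-backed stand-in, useful for testing the layer above."""

    def __init__(self):
        self._records = {}
        self._next = 0

    def new_id(self):
        self._next += 1
        return self._next

    def store(self, object_id, type_name, state):
        self._records[object_id] = (type_name, dict(state))

    def load(self, object_id):
        type_name, state = self._records[object_id]
        return type_name, dict(state)

    def delete(self, object_id):
        del self._records[object_id]
```

A file-based or OODB-based engine would subclass Backend in the same way,
and the rest of the system would not need to know which one is in use.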

-------------------------------------------------------------------------

Persistence is one thing I really miss in Python, so I am really looking
forward to your system.

- David Ziegler (dziegler@hookup.net)