Re: Persistent Objects Spec: A case study :-)

Jim Roskind (jar@infoseek.com)
Tue, 9 Aug 1994 14:00:20 -0700

Good set of questions. Some I even have good answers for, and some
are a mixed bag of "current implementation" and "we didn't find a need
for it yet."

> From: david@hookup.net (David Ziegler)
> Date: Mon, 8 Aug 1994 15:25:43
>
> >Jim Roskind wrote:
>
> >In a future process, to reincarnate the above instance:
> >
> >a_born_again = MyPer().Find("the name that printed above")
> >
> >Notice that in the creation of a_born_again, we had to call a
> >constructor MyPer() just to make a dummy instance so that we could
> >call the Find() method.
>
> I would prefer a ``Find'' function that:
>
> 1) 2) [same semantics as find method]
>
> 3) Use the type information stored in the filed object to create an
> object of the appropriate type and use the Find method in this object
> to load the data.

Our (questionable) predilection for C/C++ caused us to think more along
the lines of statically typed objects. As a result, there was some
desire to know *something* about the type of the return value from
"Find()". We have yet to run into a problem with this
implementation, but I can understand your request. We just recently
found a related problem with our FindAlias() method, and an extension
(in the direction you're going) will be discussed RSN in another
posting ;-). For a related reason, I may *soon* have to extend the
contents of the persistent store to contain the type, so your request
may fall out.

I will note that it was *very* convenient to have the store-name
(a.k.a., file-name in our implementation) reflect the type of the
object, as this made it very easy for programmers to examine the
contents of objects between runs (use the file system as a class
browser).
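
A minimal sketch, in present-day Python, of how a free-standing find
function could recover the class from a store-name that reflects the
type, so no dummy instance is needed. This is not the implementation
discussed in this thread: the "ClassName.identifier" naming scheme, the
REGISTRY dict, the save()/find() helpers, and the pickle-based state
format are all assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical registry mapping class names to classes (an assumption,
# not part of the spec under discussion).
REGISTRY = {}

def persistent(cls):
    """Register a class so find() can look it up by name."""
    REGISTRY[cls.__name__] = cls
    return cls

@persistent
class MyPer:
    def __init__(self):
        self.value = None

def save(obj, ident, root):
    """Write obj's state to <root>/<ClassName>.<ident>; return the store-name."""
    store_name = "%s.%s" % (type(obj).__name__, ident)
    with open(os.path.join(root, store_name), "wb") as f:
        pickle.dump(obj.__dict__, f)
    return store_name

def find(store_name, root):
    """Build an instance of the class named in store_name and load its state."""
    class_name = store_name.split(".", 1)[0]
    obj = REGISTRY[class_name]()  # the caller supplies no dummy instance
    with open(os.path.join(root, store_name), "rb") as f:
        obj.__dict__.update(pickle.load(f))
    return obj

root = tempfile.mkdtemp()
original = MyPer()
original.value = 42
name = save(original, "example", root)
born_again = find(name, root)
```

Because the class name is part of the file name, listing the directory
still works as a crude class browser.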

> Because:
>
> 1) It respects the fact that python is a dynamically-typed language.

I think your philosophical point may be correct here. Alas, we
haven't run into a situation where we resurrected an object by name,
and had no idea what it was :-/. I am indeed fearful that our static
typing bias is showing a bit too strongly.

> 2) It allows demand-loaded sub-objects (ie, objects referred to by member
> variables)

Our current spec/implementation supports this. The trick is that the
Prepr() function bundles the requisite dummy instance creation, ...
Bottom line: It already works (but your suggestion *might* be
cleaner).

> 4) It gets rid of the slightly confusing dummy instance.

As would static class methods ;-).

> However:
>
> One use of an overloaded ``Find'' method ... would not be
> possible with this implementation

This seemed like a nice feature. I think we have to go further with
our implementation (re: using in a bigger system) and see just how
powerful a feature it really is. Indeed, if it is not used, then we
will probably be hard pressed to maintain our current approach.

> -------------------------------------------------------------------------
>
> In your spec. you say:
>
> > The Find() method typically causes the state of the
> > persistent object known by the given identifier to be
> > loaded into the given instance. Any external,
> > persistent object references stored in the object's
> > state will be resolved before this method completes.
>
>
> In a system where all sub-object references are automatically loaded, unless
> covert tricks are used, the entire (non-garbage) database will be loaded when
> the root object is loaded.

Yes and no. IF you have a root object that connects everything, you
will indeed have this problem. IF you have (as verbally suggested by
Ed Miller here at InfoSeek) a forest of rooted objects, then you will
*only* get the trees that you'll be working on in your process. We
felt it was pretty critical to *NOT* impose extra coding constraints
on programmers working in our persistent world. It seemed much more
reasonable to have high level design constraints (re: partitioning
your forest intelligently) than low level coding constraints (always
check to be sure a member is loaded before accessing it).

> This will
> force programmers to resort to covert tricks like converting object
> references to strings before saving an object, then looking up the object just
> before use.

Note that such "low level" changes are *only* done during an
optimization phase, and are *not* part of the day-to-day programming
experience. This allows for a much more focused (attentive?) coding
activity during these non-python-like coding frenzies.

> Storing database pointers (object IDs) in covert form creates
> a situation in which the database system cannot understand the structure of
> the data represented in the database (Without approximate techniques similar
> to those used for C-language conservative garbage collectors). This makes
> database garbage-collectors, packers, recoverers, converters (all basically the
> same thing) that much harder to write.

I think that it does make it harder to write a GC but, as you point
out, techniques similar to what is done in C can be used. We had
anticipated that methods in classes would assist in identifying (as
you call them) "covertly referenced objects," but the "conservative"
approaches taken in C could also be applied (assume the worst case
that every string in every object *is* a persistent ID).
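
The "assume the worst case" idea can be sketched roughly as follows.
The store layout here (a dict mapping each persistent ID to a flat dict
of attributes) and the function name are assumptions for illustration
only. Any string attribute that happens to equal a stored ID is treated
as a reference, whether or not it really is one.

```python
def conservative_live_set(store, roots):
    """Return the IDs reachable from roots, treating every string value
    that matches a stored ID as if it were a persistent reference."""
    live = set()
    pending = list(roots)
    while pending:
        ident = pending.pop()
        if ident in live or ident not in store:
            continue
        live.add(ident)
        for value in store[ident].values():
            # Conservative step: a matching string is assumed to be an ID.
            if isinstance(value, str) and value in store:
                pending.append(value)
    return live

# Hypothetical store contents.
store = {
    "teacher-1": {"name": "Smith", "student": "student-1"},
    "student-1": {"name": "Jones", "note": "student-2"},  # maybe just text
    "student-2": {"name": "Brown"},
    "student-3": {"name": "orphan"},
}
live = conservative_live_set(store, ["teacher-1"])
garbage = set(store) - live
```

The price of being conservative is that "student-2" is kept alive even
if the "note" string was never meant as a reference.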

> It also will make it impossible to
> piggy-back on top of one of the many existing OODBs that assume that
> the object ID belongs to them.

You have *definitely* got me here, as we have yet to try to use an
"existing OODB" and have no understanding of the "interface
requirements." We have done all our implementation on top of a unix
file system (before you groan, recall the resulting ease of object
browsing :-) ).

> It could also be much more convenient for the programmer if you provided
> some way of specifying sub-objects that were not demand-loaded.

This approach clearly begins to put major league constraints on the
programmer. The days of "acting like it is ordinary python" would end
:-(.

> This is
> a little tricky in python because there is no convenient way to associate
> an attribute with a member variable declaration. One (far from ideal)
> possibility would be to prefix all demand-loaded member variable names
> with the word ``Demand'' (ie. embed the attribute in the name). One could
> then make a new class encapsulating the object ID, and give this class a
> ``Get'' method that would lookup the referred to object (it could also cache
> the pointer to this object inside the object ID class so future Gets for
> the same member variable would not have to perform the lookup).
>
> In use this mechanism would look like:
>
> students = teacher.students.get()
>
> As opposed to:
>
> students = teacher.students

We were *really* hoping to avoid adding lots of "persistent coding"
tricks to objects. The approach you propose may turn out to be a
pretty coding-efficient way to do so, *when* we see that optimization
requires it.
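
A rough sketch of the object-ID wrapper David describes, written in
present-day Python. The class name DemandRef, the lookup callable, and
the sample data are hypothetical; the method is spelled get() to match
the usage line quoted above.

```python
class DemandRef:
    """Holds a persistent ID and looks the object up on the first get()."""

    def __init__(self, ident, lookup):
        self.ident = ident
        self._lookup = lookup   # callable: ID -> object (assumed)
        self._loaded = False
        self._target = None

    def get(self):
        if not self._loaded:
            self._target = self._lookup(self.ident)
            self._loaded = True  # cache so later calls skip the lookup
        return self._target

class Teacher:
    def __init__(self, students_ref):
        self.students = students_ref

lookups = []  # records each lookup so the caching is visible

def lookup(ident):
    lookups.append(ident)
    return {"students-of-smith": ["Jones", "Brown"]}[ident]

teacher = Teacher(DemandRef("students-of-smith", lookup))
students = teacher.students.get()
students_again = teacher.students.get()
```

Only the first get() touches the store; the second returns the cached
object.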

> -------------------------------------------------------------------------
>
> You allow every object to have an ``alias'' name that can also be used to
> lookup the object. Why not separate this out from the objects themselves
> and allow for general purpose indexes of objects. This is probably not
> much more work for your initial version (It sounds like you are planning to
> keep things like the alias table in-core anyway (?)), but will result in a
> much more general-purpose system (Databases without indexes are much less
> useful).

Our implementation does not keep the Alias lists in core. Since we
currently use the file-system, it is pretty clear that we have a lot
of freedom in how we name objects in the persistent-store. It turned
out that different classes of objects *wanted* their own name space
privacy for naming their objects, and hence it was very useful to have
class-based aliasing, which is what the methods SetAlias() and FindAlias() provide.
If I understood your question "Why not separate this out..." then you
are asking for the same function based interface that was requested
for Find(). Here the establishment of separate name spaces is *VERY*
desirable. Our current partitioning, with one name space per class
(the name space being selected by the class of an instance), may
be extended to allow repositioning in other name spaces (such as
those associated with a base class). (This is the topic of the other
posting that I promised RSN).
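
To make the one-name-space-per-class idea concrete, here is a small
sketch. The method signatures, the optional "space" argument for
choosing a base-class name space, and the in-memory ALIASES dict are
all invented for illustration; the implementation described above keeps
its aliases in the file system, not in core.

```python
# Hypothetical stand-in for the store: class name -> {alias: identifier}.
ALIASES = {}

class Persistent:
    def SetAlias(self, alias, ident, space=None):
        """Record alias -> ident in the name space of `space`
        (default: the instance's own class)."""
        cls = space or type(self)
        ALIASES.setdefault(cls.__name__, {})[alias] = ident

    def FindAlias(self, alias, space=None):
        """Return the identifier for alias, or None if it is not set."""
        cls = space or type(self)
        return ALIASES.get(cls.__name__, {}).get(alias)

class Person(Persistent):
    pass

class Student(Person):
    pass

class Course(Persistent):
    pass

# The same alias in two classes does not collide.
Student().SetAlias("bob", "id-17")
Course().SetAlias("bob", "id-99")
# Repositioning: file a Student's alias in the base-class name space.
Student().SetAlias("alice", "id-18", space=Person)
```

With this layout "bob" resolves differently for a Student and a Course,
and "alice" is visible through Person but not through Student's own
name space.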

> -------------------------------------------------------------------------
>
> One thing that was missing from your spec. was a description of its
> interface to the database engine, it would be nice if this interface was
> designed to allow for the substitution of different backends.

Our current implementation tries to (functionally) isolate the
interface to the underlying data store. I guess it would be useful to
provide such a spec, so that the underlying code could be effectively
"swapped out," but we (to date) have been more concerned with the
programmer visible functionality. I think we'll soon be concerned
about speed with a large DB and will have to consider a *real* data base
instead of a file system, and then your request will come true. We
also thought that there was a chance that, once we published our
module, some bright folks might do the knitting for us ;-).
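
One possible shape for a swappable backend interface, sketched in
present-day Python. The two-method read()/write() interface and both
class names are assumptions, not the interface of the module discussed
here; the point is only that the persistence layer would call nothing
else on the store.

```python
import os
import tempfile

class DictBackend:
    """In-memory backend, handy for tests."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data[key]

    def write(self, key, data):
        self._data[key] = data

class FileBackend:
    """One file per object under a root directory."""

    def __init__(self, root):
        self._root = root

    def read(self, key):
        with open(os.path.join(self._root, key), "rb") as f:
            return f.read()

    def write(self, key, data):
        with open(os.path.join(self._root, key), "wb") as f:
            f.write(data)

def round_trip(backend, key, data):
    """Code written against read()/write() works with either backend."""
    backend.write(key, data)
    return backend.read(key)

backends = (DictBackend(), FileBackend(tempfile.mkdtemp()))
results = [round_trip(backend, "MyPer.example", b"state bytes")
           for backend in backends]
```

A real data base could then be dropped in by writing a third class with
the same two methods.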

> -------------------------------------------------------------------------
>
> - David Ziegler (dziegler@hookup.net)