Re: Pickling classes

Jim Roskind (jar@iapp201.mcom.com)
Thu, 20 Apr 1995 13:22:38 -0700

> Sender: python-list-request@cwi.nl
> From: mufti@cis.ohio-state.edu (saad mufti)
> Date: 20 Apr 1995 13:41:05 -0400
>
> I need to pickle a class instance object that contains a reference to
> another class instance object of a different class imported from a
> different module.
>
> Is this possible using the pickle module? I wasn't sure after reading
> the documentation for the pickle module. It talks about the class
> needing to be defined at the top-level in a module. What exactly does
> this mean?

I *think* that Guido took the same tack as we did with our persistent
store at InfoSeek. When the object is "reincarnated" it must be
possible to construct an identically typed object, hence it must be
possible to refer to the class in question (at some future point). To
achieve this, the restriction is placed that the associated class must
be defined at the "top-level," which might also have been said (less
correctly, but perchance more meaningfully to C programmers) at the
"file scope." To pickle (or save persistently) an instance of an
object, it must be possible to refer to the associated class via
module_name.class_name, so that this info can be socked away in the
pickled package (i.e., persistent representation).
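
As a rough illustration of the module_name.class_name idea (a sketch only, not how the pickle module is actually implemented), the following records the module and class name of an instance and later looks the class up again by those two strings. It assumes a modern Python 3 interpreter; describe() and reincarnate() are hypothetical helper names, and argparse.Namespace is used merely as a convenient class that is known to be defined at the top level of its module.

```python
import importlib
from argparse import Namespace  # stand-in for any class defined at a module's top level


def describe(instance):
    """Record where the class lives (module and name) plus the instance state."""
    cls = instance.__class__
    return (cls.__module__, cls.__name__, dict(instance.__dict__))


def reincarnate(record):
    """Rebuild an instance by resolving the class as module_name.class_name."""
    module_name, class_name, state = record
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)  # this lookup only works for top-level classes
    obj = cls.__new__(cls)             # create the instance without calling __init__
    obj.__dict__.update(state)
    return obj


original = Namespace(x=1, y="two")
record = describe(original)
copy = reincarnate(record)
```

The getattr() step is where the top-level restriction bites: a class that is not reachable as an attribute of its module cannot be found from the stored names.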

It might be simplest to give an example:

class top_level_class:
    pass

def foo():
    class local_fun_class:
        pass
    return local_fun_class()

tl_i = top_level_class()  # can be pickled away
hidden_i = foo()          # can't be pickled (I think)

The hassle with reincarnating hidden_i comes when you note that all
you can tell is that the *name* of the class is local_fun_class, but
you have no way of directly referring to this class (in general) based
solely on its name. In fact, given the above example, it would be
impossible to re-incarnate the instance if someone else had already
called foo() in the re-incarnation process :-(. The subtlety in the
above code is that a brand-new and distinct class is created each time
foo() is run! For example:

Python 1.2 (Apr 11 1995) [GCC 2.6.3]
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> def foo():
...     class a:
...         pass
...     return a()
...
>>> foo()
<a instance at 4000e660>
>>> foo()
<a instance at 4000e710>
>>> a = foo()
>>> b = foo()
>>> a.__class__
<class a at 40017348>
>>> b.__class__
<class a at 40017428>

Notice that the two classes have the same name, but have distinct
ids, and hence are *different* classes :-(.
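
The "(I think)" above can be checked directly. The following is a small sketch, assuming a modern Python 3 interpreter rather than the 1.2 release shown in the session; can_pickle() is a hypothetical helper, and the exact exception type raised for a local class has varied between Python versions, so both likely candidates are caught.

```python
import pickle


class top_level_class:
    pass


def foo():
    class local_fun_class:
        pass
    return local_fun_class()


def can_pickle(obj):
    """Return True if pickle.dumps() accepts obj, False if it refuses."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # The exception type for unpicklable local classes differs by version.
        return False


# Should succeed when this file is run as a script or imported normally.
tl_ok = can_pickle(top_level_class())

# The class lives only inside foo(), so the pickler cannot name it.
hidden_ok = can_pickle(foo())
```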

There are also hassles with stuff like:

class outer:
    class inner:
        pass
    def __init__(self):
        self.i = outer.inner()

Notice how hard it is for the unpickler (reincarnation program) to
talk about the class "inner". All that is really known about the
class is that its name is "inner," but since it is not defined at the
top-level (file scope), there is no general way to refer to it. Note
that even if you tried to fix the above situation with:

class outer:
    class inner:
        pass
    def __init__(self):
        self.i = outer.inner()

easy_way_to_refer_to_inner = outer.inner

the pickler would have *no* idea about the existence of the "short
cut" reference to the class. Fundamentally, given an instance, all
you can get is info about where its class was defined and what it
was called when it was *defined* (such as the "class name" and, with
a little work, the module it was defined in).
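
To make "all you can get" concrete, here is a minimal sketch of what an instance reveals about its class, assuming a modern Python 3 interpreter. The __qualname__ attribute did not exist when this was written; it is a later addition that records the nesting path, and newer pickle protocols can reportedly use it to locate nested classes such as outer.inner. The variable names on the left are illustrative only.

```python
class outer:
    class inner:
        pass


# A module-level alias; it does not change what the class knows about itself.
easy_way_to_refer_to_inner = outer.inner

i = outer.inner()
cls = i.__class__

defined_name = cls.__name__      # the name used in the class statement
defined_module = cls.__module__  # the module in which the class statement ran
nested_path = cls.__qualname__   # Python 3 only: dotted path through enclosing classes

# The class still reports the name it was defined with, not the alias.
alias_changes_nothing = easy_way_to_refer_to_inner.__name__ == "inner"
```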

Bottom line:

Be nice to your pickler: Define classes at file scope if you want them
to persist.

Jim

-- 
Jim Roskind						voice: 415.528.2546
jar@netscape.com					fax:   415.528.4133
----------------------------------------------------------------------------
PGP 2.6.2 Key fingerprint =  0E 2A B2 35 01 9B 5C 58  2D 52 05 9A 3D 9B 84 DB
To get my PGP 2.6 Public Key, "finger -l jar@infoseek.com | pgp -kaf"