Re: repr question

Guido.van.Rossum@cwi.nl
Tue, 26 Jul 1994 10:41:04 +0200

Jim Roskind:
> >In general, it should be the case (IMHO) that exec'ing the repr string
> >should create an instance of something that (at a minimum) compares
> >equal with the original. The first example requires that the

Jaap Vermeulen:
> I think that repr() was not necessarily designed for recreating
> objects. Although it does so for certain simple built-in objects, it
> doesn't work for a whole slew of other objects (e.g.: sockets, files).
>
> To use a single function for both readability and recreatability (this
> must be a new word... :-) is impossible. I would settle for a function
> that prints a readable description of the object (a la Smalltalk
> 'printString'), and a function that prints a recreatable description of
> the object (a la Smalltalk 'storeString').

Note that there are two functions to convert an object into a string:
repr() and str(). For classes, you can implement them differently
by defining __repr__() and __str__(). The default for __str__() is
__repr__(), the default for __repr__() is the built-in "<ClassName
instance at XXXXXX>". So it's not entirely unreasonable to say that
repr() should yield a string that recreates the object, if possible.
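
A minimal sketch of that split, written in present-day Python rather than
the Python of this message. The class Point and its fields are made up
for illustration; only __repr__() and __str__() come from the text above.

```python
class Point:
    """Hypothetical example class, not taken from the original message."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Aim for a string that recreates the object when evaluated.
        return "Point(%r, %r)" % (self.x, self.y)

    def __str__(self):
        # Aim for a readable description instead.
        return "(%s, %s)" % (self.x, self.y)

    def __eq__(self, other):
        # Equality by value, so a recreated object compares equal.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
q = eval(repr(p))      # rebuild an equal object from the repr string
readable = str(p)      # human-oriented form, not meant to be evaluated
```

This only works because every field of Point has a repr that can itself
be evaluated; it says nothing about objects that are shared or recursive.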

However, for general persistency of objects (what this thread is really
about) you need more, e.g. sharing of objects must be represented.

Bill Janssen:
> "marshal.dump" already does this (storeString) for some things. I'd
> like to see it extended so that there's a simple way to have everything
> dump-able and load-able.

Indeed. I am thinking about something like this. Here are some ideas
(thinking aloud):

With every object dumped I also dump its "id" (this is just its
address, but any unique identifier will do). I maintain a hash table
(Python dict) of objects already dumped, and if a request to dump an
object requires me to dump a sub-object that I've already dumped, I
dump a reference to the id instead. (With sub-object I just mean an
object contained in another object, e.g. 1 is a sub-object of
[1,2,3].) This allows me to dump recursive objects (e.g. lists that
have been inserted into themselves), saves space when dumping an
object that contains many references to the same sub-object, and (most
importantly) retains pointer semantics, so that e.g. graphs represented
as objects pointing to each other can be dumped correctly.
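
The id-table idea can be sketched as follows, in present-day Python and
restricted to lists and atomic values to keep it short. The record
layout and the names dump() and load() are assumptions made for this
illustration, not a proposed interface.

```python
def dump(obj, memo, records):
    """Append records describing obj; return obj's index in records.

    memo maps id(obj) to the index of an object already dumped, so a
    repeated or recursive sub-object becomes a reference to that index.
    """
    key = id(obj)
    if key in memo:
        return memo[key]            # already dumped: emit a reference only
    index = len(records)
    memo[key] = index               # register first, so recursion terminates
    if isinstance(obj, list):
        children = []
        records.append(("list", children))
        for item in obj:
            children.append(dump(item, memo, records))
    else:
        records.append(("atom", obj))
    return index


def load(records, root=0):
    """Rebuild the object graph described by records."""
    # Pass 1: create every object, lists still empty.
    objs = [[] if kind == "list" else payload for kind, payload in records]
    # Pass 2: fill the lists with references to the created objects.
    for obj, (kind, payload) in zip(objs, records):
        if kind == "list":
            obj.extend(objs[i] for i in payload)
    return objs[root]


shared = [1, 2]
graph = [shared, shared]
graph.append(graph)                 # a list inserted into itself

records = []
root = dump(graph, {}, records)
copy = load(records, root)
```

After loading, copy[0] and copy[1] are the same list object and copy[2]
is copy itself, which is the pointer semantics described above.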

I don't yet know the best approach to dumping things that
contain references to the OS environment, like sockets, windows, pipes
or open files -- I'll probably punt on these, or create a dummy object
instead. Open files may be recreatable by recording the filename,
mode and seek position, but for the others recreation requires too
much context.
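
For the open-file case, a rough sketch in present-day Python of
recording the filename, mode and seek position. The helper names
describe_file() and reopen_file() are hypothetical, and the sketch only
makes sense for read modes: reopening with mode "w" would truncate the
file.

```python
import os
import tempfile


def describe_file(f):
    """Record what is needed to reopen f later (hypothetical helper)."""
    return {"name": f.name, "mode": f.mode, "pos": f.tell()}


def reopen_file(desc):
    """Reopen a file from a description and restore its seek position."""
    f = open(desc["name"], desc["mode"])
    f.seek(desc["pos"])
    return f


# Create a scratch file to demonstrate with.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as scratch:
    scratch.write(b"hello world")

original = open(path, "rb")
original.read(6)                    # advance past "hello "
desc = describe_file(original)
original.close()

restored = reopen_file(desc)
rest = restored.read()              # continues where the original stopped
restored.close()
os.remove(path)
```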

For class instances, a default dumping method will simply dump all the
instance variables. Classes can override this by providing a
__dump__() method (though I'm not sure about the exact interface
here). There will also be a __load__() method. A problem is how to
represent the class to which the instance belongs. I don't want to
dump the class definition itself -- usually, the receiving end has
already imported the module that contains it anyway, and I feel that
it must be possible to use a different version of the class (as long
as it's backwards compatible; the __load__() method could do
conversion if necessary). (The same applies to functions and modules.)
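
One possible shape for these hooks, as a sketch in present-day Python.
The message leaves the interface open, so the signatures used here
(__dump__() returning a dict, __load__() receiving it) and the helpers
dump_instance() and load_instance() are assumptions; Python itself does
not call methods with these names.

```python
def dump_instance(obj):
    """Return the state of obj as a dict."""
    hook = getattr(obj, "__dump__", None)
    if hook is not None:
        return hook()               # class-specific override
    return dict(vars(obj))          # default: all instance variables


def load_instance(cls, state):
    """Create an instance of cls from a dumped state."""
    obj = cls.__new__(cls)          # allocate without running __init__
    hook = getattr(obj, "__load__", None)
    if hook is not None:
        hook(state)                 # class-specific override
    else:
        vars(obj).update(state)     # default: restore instance variables
    return obj


class Plain:
    """Relies on the default behaviour."""

    def __init__(self, x):
        self.x = x


class Account:
    """Overrides both hooks; hypothetical example class."""

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
        self.cache = {}             # derived data, not worth dumping

    def __dump__(self):
        return {"owner": self.owner, "balance": self.balance}

    def __load__(self, state):
        self.owner = state["owner"]
        # Convert state written by an older version that said "amount".
        self.balance = state.get("balance", state.get("amount", 0))
        self.cache = {}


restored = load_instance(Account, dump_instance(Account("ann", 10)))
```

The "amount" fallback shows the kind of conversion a __load__() method
could do when the receiving end has a newer version of the class.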

Representing the class in a form that will enable the receiving end to
find it will be difficult though -- not all classes can be expressed
in the form "modulename.classname", and even this information isn't
readily available (unless I put it in the class definition as a hidden
attribute, I'll have to search the class for a method and then look in
the method's "globals" dictionary...). Possibly the __dump__() method
(or a separate method) will be allowed to override this too, for full
flexibility.
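
A sketch of both lookups in present-day Python, where a class does carry
its module name in the __module__ attribute (which is not the situation
described above). The helper names are hypothetical, and
fractions.Fraction is used only because it is a standard-library class
written in Python.

```python
import importlib
import types
from fractions import Fraction


def class_path(cls):
    """Name a class as "modulename.classname"."""
    # Not every class can be found again this way, e.g. one defined
    # inside a function.
    return "%s.%s" % (cls.__module__, cls.__name__)


def module_via_method(cls):
    """Find the defining module by looking in a method's globals."""
    for value in vars(cls).values():
        if isinstance(value, types.FunctionType):
            return value.__globals__["__name__"]
    return None                     # class has no plain Python methods


def find_class(path):
    """Receiving end: import the module and fetch the class by name."""
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


path = class_path(Fraction)
found = find_class(path)
```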

Any bright contributions? Pointers to literature covering persistency
in other languages will be gratefully accepted, but I have a hunch
that most work in this area requires you to redesign your language
before the method will work :-( I seem to recall that Modula-3
(or was it M2+?) has a concept of "pickled" objects that is similar to
this. I wonder if they ever managed pickling sockets or windows?

--Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>
<URL:http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>