Re: Why don't stringobjects have methods?

Mark Lutz (lutz@KaPRE.COM)
Fri, 1 Apr 94 17:36:39 MST


> On Fri, Apr 1 1994 Mark Lutz wrote:
>
> > I don't have the Python source code with me, so I could be dead-wrong
> > about this, but I suspect there may be some implementation reasons
> > for making strings immutable. (Someone correct me if I'm wrong.)
> >
> > If you store strings in a hash table, you have to delete them, and
> > rehash/re-store them, to change their value (so later occurrences hash
> > to the new bucket).
>
> Strings are not stored in a hash table. The implementation of all
> Python objects (including strings) consists of a header plus data. For
> mutable objects (lists, dictionaries) the data is pointed at by a
> pointer in the header; for immutable objects (strings, tuples, numeric
> types), the data is part of the structure of which the header is the
> start. During the lifetime of an object the address of the header
> does not change. (This header contains information such as the type
> of the object and the reference count.)
>
> Internally, references to objects only contain the pointer to the
> header. Since the header's address does not change during the
> object's lifetime, objects don't have to be chased when they are
> modified.

Ok, good. (open mouth, insert foot... :-)

But what I'd still like to know is this: what's the lifetime/scope of
string objects, or equivalently, where do they live?

As amrit@xvt.com pointed out already, you can use 'is' to compare
strings quickly, in some contexts. But if strings come and go with
a scope, this can't be used reliably: you can't be sure a string
you input or create in one scope will map to the same string
object when input/created in the future, in a different scope.

It would seem that the 'is' test would work, if strings are mapped
globally, and only go away when no longer referenced (if there are no
more references, 'is' makes no sense anyhow). But if strings are
mapped (hashed, whatever) and/or stored locally in a scope, or if they
are not mapped at all (each creation of a string "xyz" makes a new
string object), then you are out of luck with 'is', unless you can
determine all your symbols statically (not usually possible).
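
(A sketch of the "mapped globally" case, for illustration only. It assumes
a later Python interpreter that provides sys.intern, which is not something
discussed in this thread; the names s1, s2, i1 and i2 are made up here.
Whether two equal strings built at run time share one object is an
implementation detail, so the sketch only relies on what interning
guarantees.)

```python
import sys

# Build two equal strings at run time. They have the same value, but
# nothing promises that they are the same object.
parts = ['x', 'y', 'z']
s1 = ''.join(parts)
s2 = ''.join(parts)

# sys.intern returns the canonical copy for a given spelling, so equal
# strings map to one shared object for as long as it is referenced.
i1 = sys.intern(s1)
i2 = sys.intern(s2)

print(s1 == s2)   # True: same value
print(i1 is i2)   # True: same object after interning
```

With strings mapped this way, 'is' behaves like an atom comparison no matter
which scope created the string.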

This may seem an esoteric issue, but it does come up in practice. In a
symbol manipulation program, it would be very handy to assume that all
strings map to the same object always, for the entire course of the
program run. If so, you can essentially use python strings like Lisp
'atoms' (which means a lot to AI folks). If not, you need to manually
manage a symbol table (which isn't too bad, given python's dictionaries,
but is still extra work).

For example, given:

    x = 'abc'
    y = x
    z = 'abc'
    x is z          # fails?

of course x and y 'share' the same object, but do 'x' and 'z' refer to
the same object internally (will 'x is z' work)? If not, 'is' is mostly
useless for symbol comparison. (Again, I don't have access to Python to
test this at the moment.)

Similarly,

    def b():
        return 'abc'    # make/read a string in a different scope

    def a():
        x = 'abc'
        y = b()
        x is y          # fails?

Am I being just too "Lisp-headed" here?

Mark L.