Good points!
> This behavior actually opens the door for some optimizations in Python.
> ... the idea is that a symbol is like a string but comparison for equal
> is done in constant time because a symbol is really a pointer. You can
> do this in Python simply by using constants for frequently used string
> vaules:
>
> A = 'a'
> B = 'b'
> C = 'c'
>
> x = A
>
> ...
>
> if x is A:
> process_a()
>
> elif x is not C:
> process_c()
>
>
> The key here is that we wonly assign a symbol one of those other constants
> (A, B, C). Then we can do comparisons using "is" and "is not" instead of a
> more expensive ==, != comparison. This is only possible due to the
> reference sharing nature of strings.
There's more than one way to make this work, and while Python's way isn't
as flexible as some (as you say, you have to assign the symbol to one of
the (A,B,C) constants), it's probably the cheapest (as Mark said later,
some languages (like SNOBOL4) guarantee to store a given string uniquely,
which makes _all_ string equality tests simple pointer comparisons (but at
the cost of storing all strings uniquely!)).
In any case, the technique you sketched deserves to be better known --
one of those neat tricks that gets discovered independently over & over.
People should note that Python itself uses this trick for matching
conditions on "except" blocks: that's precisely why
try: ...
except 'NameError': ...
doesn't work (exception strings on except clauses are compared by object
identity instead of by string value, so you _have_ to use the string
object initially bound to __builtin__.NameError).
> [sjoerd]
> ...
> Guido may come up with other good reasons for not allowing mutable
> strings. My guess is that he just doesn't like them.
I've also gotten the impression that string-crunching algorithms weren't
a particular interest of Guido's when he conceived of Python (else, e.g.,
I expect he would have adapted more ideas from Icon).
> [mark]
> But what I'd still like to know is this: what's the lifetime/scope of
> string objects,
"It depends" <wink>.
> ...
> As amrit@xvt.com pointed out already, you can use 'is' to compare
> strings quickly, in some contexts. But if strings come and go with
> a scope, this can't be used reliably:
Amrit@xvt.com's usage was reliable: study his msg carefully, and mix it
in with the understanding that Python's exceptions work in exactly the
same way.
When I've used this trick, I've created a module of constants like
ATOMtag = 'atom'
IMPLtag = 'impl'
NOTtag = 'not'
etc
and then imported that. Assignments of values to variables _must_ use
the imported symbolic names on the RHS, and ditto for comparisons:
import consts
x = ATOMtag
...
if x is ATOMtag:
...
If you think there's nothing magic about using _strings_ here, you
understand it completely: _any_ objects could have been used.
> you can't be sure a string you input or create in one scope will map to
> the same string object when input/created in the future, in a different
> scope.
Exactly right, and that's why everyone has to use the same string
_objects_ consistently for this to work. And defining them in a module
guarantees they'll "stay in scope" for the life of the program.
> It would seem that the 'is' test would work, if strings are mapped
> globally, and only go away when no longer referenced (if there's no
> more references, 'is' makes no sense anyhow).
While true, Python doesn't do this.
> ...
> This may seem an esoteric issue, but it does come up in practice.
In Amrit@xvt.com's and my applications, and in Python's use for exception
catching, the set of strings is pretty much known _in advance_. For an
essentially dynamic application, I think you'll be happier using (as you
suggest) a dictionary.
> ...
> For example, given:
>
> x = 'abc'
> y = x
> z = 'abc'
> x is z <<-- fails?
>
> of course x and y 'share' the same object, but do 'x' and 'z' refer to
> the same object internally (will 'x is z' work)?
It so happens that, today, this returns true if you execute it in a
single function or module or file, but returns false if you enter it
interactively. The Reference Manual is clear that Python reserves the
right to "share storage" for values of an immutable type, but does not
_guarantee_ to.
> If not, 'is' is mostly useless for symbol comparison.
That's true: the purpose of 'is' is _object_ identity comparison, and
everything (both the parts you do & don't like) follows more-or-less
directly from that.
> def b():
> return 'abc' <<-- make/read a string in a different scope
>
> def a():
> x = 'abc'
> y = b()
> x is y <<-- fails?
Today this will "fail"; tomorrow it may not.
> Am I being just too "Lisp-headed" here?
Na, you're just being curious. That's not against the rules <smile>.
the-python-killed-the-cat?-ly y'rs - tim
Tim Peters tim@ksr.com
not speaking for Kendall Square Research Corp