Re: Why don't stringobjects have methods?

Mark Lutz (lutz@KaPRE.COM)
Mon, 4 Apr 94 12:01:57 MDT

> > As amrit@xvt.com pointed out already, you can use 'is' to compare
> > strings quickly, in some contexts. But if strings come and go with
> > a scope, this can't be used reliably:
>
> Amrit@xvt.com's usage was reliable: study his msg carefully, and mix it
> in with the understanding that Python's exceptions work in exactly the
> same way.
>
> When I've used this trick, I've created a module of constants like
> ...

Yes, good stuff.


> In Amrit@xvt.com's and my applications, and in Python's use for exception
> catching, the set of strings is pretty much known _in advance_. For an
> essentially dynamic application, I think you'll be happier using (as you
> suggest) a dictionary.

This was my main point: "Amrit's Trick" is clearly useful, but it doesn't work
unless you can statically (at program coding time) determine the set of strings
the application will use. This is almost _never_ the case in typical symbol
manipulation programs.

For instance, in most AI-type programs, you load a knowledge-base from some
external source, which is full of user-defined symbols you can't predict
(database programs may or may not have a set of symbols defined up-front;
language programs almost never do). To make symbol strings map to the same
place reliably, I'd need to augment the reader to store every token/symbol
in some global dictionary, and use a reference to the stored value, instead
of the original string:

    dict = {}

    def reader():
        global dict
        pattern = []
        while not <end of input>:
            x = <read a symbol string>
            if not dict.has_key(x):        # map all 'x' to same string
                dict[x] = x                # or a property list/dictionary,..
            pattern.append(dict[x])        # use dict entry for the symbol
        return pattern
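
For readers who want something they can run, here is a minimal sketch of the same reader idea in present-day Python. It assumes whitespace-separated tokens in an in-memory string; the names `read_symbols` and `symbols` are made up for illustration and are not part of the original post.

```python
def read_symbols(text, table):
    """Split text on whitespace and map every token to one canonical string object."""
    pattern = []
    for token in text.split():
        # setdefault stores the token the first time it is seen and
        # returns the already-stored object on every later occurrence.
        pattern.append(table.setdefault(token, token))
    return pattern


symbols = {}  # hypothetical global symbol table
first = read_symbols("block a on block b", symbols)
second = read_symbols("block c on block a", symbols)

# Equal symbols now share one object, within and across reads.
print(first[0] is first[3], first[0] is second[0])
```

The table lookup still hashes and compares each token once at read time, which is the overhead being traded against cheaper comparisons later.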

If I do this, I can use 'is' to avoid lexical comparisons in a pattern
matcher:

    def matcher(patt1, patt2):
        while patt1 and patt2:
            if patt1[0] is patt2[0]:       # use 'is', not '=='
                <matched literals>
            else:
                return 0
            patt1, patt2 = patt1[1:], patt2[1:]
        return (not patt1) and (not patt2)

    def main():
        while 1:
            x = reader()
            y = reader()
            print matcher(x, y)
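
A runnable sketch of the identity-based matcher, again in present-day Python and under the assumption that both patterns were built through one shared table. The helper `canonical` and the name `match` are hypothetical, chosen only for this example.

```python
def canonical(tokens, table):
    """Replace each token with the single stored object for its value."""
    return [table.setdefault(tok, tok) for tok in tokens]


def match(patt1, patt2):
    """Return True when both patterns hold the same symbols in the same order."""
    if len(patt1) != len(patt2):
        return False
    # 'is' compares object identity, so no character-by-character scan happens here.
    return all(a is b for a, b in zip(patt1, patt2))


table = {}
x = canonical("move block onto table".split(), table)
y = canonical("move block onto table".split(), table)
z = canonical("move block onto floor".split(), table)

print(match(x, y), match(x, z))
```

Note that this is only safe because every symbol went through `table`; two equal strings that bypassed it could be distinct objects and would fail the `is` test.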

Again, not too much work, but it's extra space, time, and complexity
overhead, which may offset the gain from avoiding lexical comparisons
(the time overhead is incurred at read-time, which may or may not
justify the match-time improvement).

So my point is this: if strings in Python are to be immutable (and so
increase the syntactic complexity of programs which change their values),
why couldn't we also automatically map them globally, so 'is' can _always_
be used for string comparisons (i.e., make each occurrence of a string map
to the same object)?

This would avoid slow lexical scans, and both speed up and simplify
symbol and string manipulation programs in general. One can use a dictionary,
but why should the language force it, when strings can't be changed anyhow?
Mapping strings globally would be handy whenever you're interested in
comparing two strings [of course, if you need to associate additional
information with the string (like a property list, etc.), you have to
manually map symbols to a dictionary anyhow].
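
As an illustration of what a global string map looks like when it is requested explicitly: Python 3's standard library provides `sys.intern`, which returns one canonical object per string value. This is opt-in, not the automatic mapping proposed above, and the helper `build` below is hypothetical, used only to create strings at run time.

```python
import sys


def build(parts):
    # Join at run time so the result is not a compile-time constant.
    return "".join(parts)


a = build(["know", "ledge"])
b = build(["know", "ledge"])

# Equal values, but the language makes no promise that they are one object.
print(a == b)

# After interning, equal values are guaranteed to map to the same object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)
```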

I know, I know; I hate it when people try to change an already-great language
too :-) But this seems a minor, backward-compatible semantic extension; there
may be prohibitive implementation complexities I'm unaware of.

Mark Lutz