Re: Unique string storage (was Re: Why don't stringobjects have methods?)

Mark Lutz (lutz@KaPRE.COM)
Mon, 4 Apr 94 15:01:27 MDT


> Whether that's a win overall depends on how often string (in)equality
> testing is done, compared to how often strings are created. I don't
> know, but suspect that _most_ programs have an unfavorable ratio.

Hard to say; I've found performance tuning in Python difficult, so
it's not clear how much of a win/loss this would be.

> I also worry that if Python were changed to uniquify strings, a future
> implementation of a mutable string class would be awful painful, in that
> how to implement
> mutable_str[i] = 'b'
> would become a real baffler if mutable strings and regular (uniquified)
> strings are to live together happily.
>
> How about trying this? Here's module "symbol":
> ...

Good points.

I think I have to concur with you here. Your 'intern()' function
would do the trick, and we wouldn't be adding yet another extension
to Python (simplicity is a Good Thing). But I have a few
qualifications to that statement:

-- A feature implemented in the interpreter usually runs
orders of magnitude faster than the same feature written in Python
code. So if it's a generally useful feature (and that's not
my call to make), it might be worth some thought (and a few of
Guido's cycles).

-- Space might be a concern for *large* string/symbol based
programs (but probably not: you'd need a global hash-table
to uniquify [a new word is born! :-)] strings internally
anyhow, and Python already imposes a lot of space/time cost).

-- I really meant to imply that I'd like to see (as if my opinion
really mattered anyhow :-) strings either be:

[ mutable XOR uniquified ]

Obviously, these schemes would be mutually exclusive. But it's
been my experience that immutable strings are somewhat painful
to manipulate (you have to take them apart and put them back
together). It would be nice if immutability actually bought
us something; uniquifying strings would (albeit not in every program,
as you point out).

Mark Lutz