If you store strings in a hash table, you have to delete and
rehash/re-store them to change their value (so later occurrences hash
to the new bucket). Now, if string variables point to the hash-table
entry directly, you're in trouble when you delete and rehash it:
there's no (easy) way to find all references to the old value and make
them point to the new, rehashed table entry. It's not as simple as
just changing a shared array of characters. You could implement some
sort of 'forwarding' indirection scheme, but it's non-trivial.
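
To make that concrete, here's a toy sketch (my own illustration, not
any particular implementation): a dict-based intern table, where the
'intern' helper and all the names are made up for the example.

    _intern_table = {}

    def intern(text):
        # Return the one shared entry for this text, creating it if needed.
        return _intern_table.setdefault(text, text)

    a = intern("spam")
    b = intern("spam")
    assert a is b            # both names share the one table entry

    # "Changing" the value means deleting and re-storing under a new key;
    # nothing forwards 'b' to the new entry, so it keeps the old object.
    del _intern_table["spam"]
    a = intern("spam!")
    assert a is not b        # 'b' still points at the stale entry
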
Storing strings as hash-table entries performs well on space: there's
just a single copy of a given string in a given scope's string table.
This is how 'symbols' are stored in lisp/prolog implementations, so
each occurrence of a symbol maps to the same memory object. But it
makes it very difficult to allow strings to be changed 'in-place',
since doing so changes the string's hash key.
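
Here's a rough sketch of why (MutableStr is a made-up toy class, just
to stand in for a string whose characters can be changed in place):

    class MutableStr:
        # Hashes and compares on its current contents, as a string would.
        def __init__(self, text):
            self.chars = list(text)
        def __hash__(self):
            return hash("".join(self.chars))
        def __eq__(self, other):
            return isinstance(other, MutableStr) and self.chars == other.chars

    table = {}
    s = MutableStr("spam")
    table[s] = "entry for spam"

    s.chars[-1] = "!"         # change the string 'in-place'
    print(s in table)         # False: s now hashes to a different bucket,
                              # and the old entry is stranded under the old key
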
Some applications can make use of the fact that all occurrences of a
string map to the same item, but this may not always work in Python,
if strings are put in a local table for a scope rather than in a
global string table (since a scope's string table is popped when the
scope is exited). Anyone know for sure? It would be nice to do
comparisons using 'is', instead of lexical equality, when those
strings are used as symbols.
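
For what it's worth, a quick sketch of the difference, using current
CPython's explicit sys.intern call (whether plain literals happen to
share an object without it is an implementation detail, so don't
count on it):

    import sys

    a = "symbol"
    b = "".join(["sym", "bol"])   # equal text, but built at run time

    print(a == b)    # True: lexical equality always holds
    print(a is b)    # typically False: two distinct string objects

    # Explicit interning forces both through one shared table entry:
    a = sys.intern(a)
    b = sys.intern(b)
    print(a is b)    # True: both names now map to the same object
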
BTW: if I remember right, strings worked the same way in SNOBOL,
because they were hashed; you needed to make a new string to change a
string value. [Again, not sure; my SNOBOL exposure was a while ago...]
Mark L.
lutz@kapre.com