Re: Unique string storage (was Re: Why don't stringobjects have methods?)

Mark Lutz (lutz@KaPRE.COM)
Mon, 4 Apr 94 15:49:34 MDT

> SNOBOL4 had a quite different reason for doing this: that language (like
> tcl) allowed any string to be a variable name (e.g., <blank><tab><return>),
> and wanted to store strings uniquely specifically to support that strange
> language feature.

Right; it stored the variable's value under the string name. This
isn't much different than Lisp's "property lists", which allow the
user to associate lists of named things with a symbol.

In fact, in most symbol-manipulation programs, you wind up storing
away data with the symbol/string, which is much more than just doing
symbol equality tests, as you mention. It's really built-in symbol
tables.

So here's a crazy idea: if (and ONLY if) we'd uniquify strings (and
keep them immutable), we could support the idea of data associated with
a string, via a built-in string method:

string.getprop("key")
string.putprop("key", value)

Internally, this might be implemented as a simple Python dictionary,
attached to each string entry in a hash-table, created on demand.

Now, like everything (in a theoretical sense), this could easily be
coded in Python, as an extension to your 'intern()' function:

Module Symbol.py---

class Symbol(str):
def __init__(self):
self.data = intern(str)
def __repr__(self):
return self.getprop('name')
def getprop(self, key):
try:
return self.data[key]
except KeyError:
return None
def putprop(self, key, value):
self.data[key] = value


_tab = {}

def intern(str):
try:
return _tab[str]
except KeyError: /* a new dictionary object */
_tab[str] = {'name':str} /* add a name property first */
return _tab[str] /* dictionary is shared */

Example use---

x = Symbol('abc')
y = Symbol('a' + 'bc')
x is y -> yes: same dictionary/property-list
print y -> 'abc'
y.getprop('name') -> 'abc'
x.putprop('planet', 'mars')
x.getprop('planet') -> 'mars'
y.putprop('value', 99)

which is all well and good, but will probably be *much* slower than
an internal implementation in the interpreter itself, using strings.
Then we could do:

x = 'abc' -- or read a string/token
x.putprop('type', 'int')
x.getprop('type')

If we implemented uniquified strings with property lists, Python would
have _built-in_ support for general 'symbol tables'. Given the prevalence
of programs that use this sort of structure (languages, AI, database,....),
it's worth looking at. It might attract alot of old Lisp hackers, anyhow.

Again, this is a bit "off-in-left-field"; your mileage may vary...

Mark Lutz