Re: Unique string storage

Mark Lutz (lutz@KaPRE.COM)
Wed, 6 Apr 94 10:04:32 MDT

> Still, if the current approaches are "too slow" for people (they aren't
> for my applications -- but then I don't maintain any terabyte databases
> under Python either <grin?>), I suppose that's much the same as if the
> functionality weren't there at all.

See the 'Holmes' expert system I wrote in Python, in the 'demo2'
directory Guido's putting together. Speed does matter sometimes
(but granted, probably not very often in 'typical' python uses).


> > In my experience, Python almost *never* performs the way you think it
> > will, after performance tuning.
>
> You bet -- & Guido's been known to say much the same (remember xrange?).

Yes; I was one of the 'culprits' behind the idea (amrit@xvt.com actually
implemented it). xrange() still does better on space. [One of the first
things I do with a language is see how long a massive loop runs (say,
for 1..1000000); this hung a Sparc station using vanilla 'range()'...]
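To make the space point concrete, here is a minimal sketch. It assumes a
current Python 3 interpreter, where range() is lazy in the way xrange()
was and list(range(n)) plays the part of the old eager range(). The
variable names are only for illustration.

```python
import sys

N = 1000000

# A lazy range object stores only start, stop, and step.
lazy = range(N)

# Materializing the list allocates one pointer slot per element.
eager = list(range(N))

lazy_bytes = sys.getsizeof(lazy)
eager_bytes = sys.getsizeof(eager)  # container only, not the int objects

print("range object:", lazy_bytes, "bytes")
print("list of ints:", eager_bytes, "bytes")

# Both can drive the same loop; only the storage differs.
total = 0
for i in lazy:
    total += i
```

The exact byte counts vary by interpreter build, so none are quoted here;
the gap between the two is what matters.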


> Symbol table functionality is already there, so I think this overstates
> the benefit. If the question really is one of _fast_ symbol table
> functionality (faster than dicts, which aren't too shabby given their
> flexibility), I think Don's got a clearer/less-disruptive way to get
> there.

Granted; as I suggested, all we really need is a standard 'intern()'.
Right, it's already here, but it helps a lot if a feature is supported
'out-of-the-box', so new users can use it without knowing about classes,
dictionaries, name spaces, etc.

> > But wouldn't it be Good Thing if adding a minor extension to the
> > language could make Python useful as an AI language too?
>
> A) Despite what you & Don have said, no version of this sounds "minor" to
> me. It does seem a lot less work to add a new type than to track down,
> think about, and change all the places Python mucks with strings now,
> though.

Agreed.


> B) "AI language"? Mark, I even have trouble selling wonderful languages
> like _Fortran_ to that lisp-crazed community <wink>.

Well, I've at times been a part of that community (I used to dabble
in Prolog implementations), and I would suggest that Python can be an
excellent symbol-processing language. I've used it successfully for
a fairly-big expert system (again, see the Holmes demo).

Python's already got most of what you need. For example,
we've got Lisp and Prolog lists/cons-cells as tuple-trees:

(car, cdr) <<- cons(car, cdr), [car|cdr]

or via the Cons-cell linked list module amrit@xvt.com posted;
we've got Lisp's property lists as class attributes:

x = intern(<string>)
x.name <<- getprop('x, 'name)
x.type = 'int' <<- putprop('x, 'type, 'int)

and we've got the dynamic/interactive/incremental nature of languages
like Lisp and Prolog (you can construct and execute code, you can access
system data structures, you can test functions in isolation, ...). IMHO,
Python's a good balance between procedural and more exotic languages, and
so would appeal to AI types frustrated with other AI tools (it happens...).
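The two mappings above can be sketched in a few lines. This assumes
Python 3 syntax; the helper names (cons, car, cdr, to_python_list) and the
Symbol class with its intern table are hypothetical, written here only to
show the idea, and are not the module amrit@xvt.com posted.

```python
# Cons cells as 2-tuples: (car, cdr), with None as the empty list.
def cons(car, cdr):
    return (car, cdr)

def car(cell):
    return cell[0]

def cdr(cell):
    return cell[1]

def to_python_list(cell):
    """Walk a tuple-tree list and collect its elements."""
    items = []
    while cell is not None:
        items.append(car(cell))
        cell = cdr(cell)
    return items

# Property lists as attributes on a unique Symbol object.
class Symbol:
    _table = {}  # hypothetical intern table: name -> Symbol

    def __init__(self, name):
        self.name = name

def intern(name):
    """Return the one Symbol for this name, creating it if needed."""
    sym = Symbol._table.get(name)
    if sym is None:
        sym = Symbol(name)
        Symbol._table[name] = sym
    return sym

lst = cons(1, cons(2, cons(3, None)))
print(to_python_list(lst))      # [1, 2, 3]

x = intern("x")
x.type = "int"                  # putprop('x, 'type, 'int)
print(intern("x").type)         # getprop('x, 'type) gives: int
```

Because intern() always hands back the same object for a given name,
properties set through one reference are visible through every other.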

So, as an occasional "AI geek", I'm already sold on Python. But then
I'm probably not a typical Lisp zealot either (I prefer Prolog, and
actually <blush> worked on a Fortran compiler once :-). My push for
'symbols' is based on my belief that Python could appeal to other AI
folks, so long as symbol-processing is built-in, and fast enough to use.
And since strings were immutable anyhow... (ok, not again :-)


> 2) Profiling is a useless approach for guessing how much time is spent on
> "attribute access", cuz the code is spread out all over, and parts of
> it involve the hashing and dict-lookup code that are used for many
> purposes besides attribute access.

Good point; timings from a set of typical programs would be better.
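A rough sketch of what such a timing harness might look like, assuming
Python 3. The two workload functions are hypothetical stand-ins for
"typical programs"; real measurements would use actual applications.

```python
import time

def workload_attributes(n=100000):
    """Hypothetical 'typical program': heavy on attribute access."""
    class Point:
        pass
    p = Point()
    p.x = 0
    for _ in range(n):
        p.x = p.x + 1
    return p.x

def workload_dicts(n=100000):
    """Hypothetical 'typical program': heavy on dictionary access."""
    d = {"x": 0}
    for _ in range(n):
        d["x"] = d["x"] + 1
    return d["x"]

def best_time(func, repeat=3):
    """Smallest wall-clock time over several runs of func()."""
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return min(times)

results = {}
for func in (workload_attributes, workload_dicts):
    results[func.__name__] = best_time(func)

for name, seconds in results.items():
    print("%-22s %.4f s" % (name, seconds))
```

Running the same set before and after an interpreter change gives a
whole-program comparison instead of a guess from profile output.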

> > 5) It's probably not much slower than an internal implementation:
> > fetching class attributes does the same stuff, roughly.
>
> If only "roughly", then what are the differences? I see them as doing
> exactly the same thing (i.e., mapping a symbol to an arbitrary value).

Granted.

> > 1) Class attribute fetching is currently slow. but so is...
>
> Oh, "slow" compared to what <0.8 grin>? Note that Python _lets_ you get
> away with stuff like
>
>     for x, y in key_value_pairs:
>         Class_Name.__dict__[x] = y
>
> now, so the implementation is supporting an incredible amount of dynamic
> flexibility. Probably shut down the University of Virginia for a year if
> any part of it changes <grin>.

Right; Guido's got to walk a fine line between flexibility and
efficiency. It's probably too late to move towards the latter,
and I'm not sure doing so would be in the spirit of the language.

> > 2) It probably would break if Guido ever decides to do static
> > analysis of class attribute references to speed access (index
> > into a table, instead of hashing into a dictionary, like what
> > was done for local function variables).
> >
> > This last point is a can-of-worms I'd rather not open...
>
> I would! If you're going after speed, never stop half way <0.5 grin>.

Same point; it's probably too late to add full static analysis of class
attribute references to the language. But we might be able to just 'cache'
references to statically-known attributes (and use indexed access), and
fall back on the current dynamic look-up logic for others. That would
speed up "normal" class use, and slow down programs that create attributes
dynamically, though probably not by much (at least they'd still work).
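Here is a toy model of that hybrid scheme, written in Python itself and
assuming Python 3. It is only a sketch of the idea, not how the
interpreter works or would work: the class CachedAttrs, its KNOWN table
(standing in for the result of a static pass), and the method names are
all hypothetical.

```python
class CachedAttrs:
    """Toy object: known names use a slot index, others a dict."""

    # Hypothetical result of a static pass: attribute name -> slot index.
    KNOWN = {"x": 0, "y": 1}
    _MISSING = object()

    def __init__(self):
        self.slots = [self._MISSING] * len(self.KNOWN)  # indexed storage
        self.dynamic = {}                               # fallback storage

    def store(self, name, value):
        index = self.KNOWN.get(name)
        if index is not None:
            self.slots[index] = value
        else:
            self.dynamic[name] = value

    def fetch_indexed(self, index):
        """Fast path: what compiled code could emit for a known name."""
        value = self.slots[index]
        if value is self._MISSING:
            raise AttributeError(index)
        return value

    def fetch(self, name):
        """General path: try the slot table, then the dynamic dict."""
        index = self.KNOWN.get(name)
        if index is not None:
            return self.fetch_indexed(index)
        try:
            return self.dynamic[name]
        except KeyError:
            raise AttributeError(name) from None

obj = CachedAttrs()
obj.store("x", 10)             # statically known: lands in slot 0
obj.store("color", "red")      # created dynamically: lands in the dict
print(obj.fetch_indexed(0))    # 10
print(obj.fetch("color"))      # red
```

The dynamic path pays for one extra table probe before reaching the
dictionary, which is the small slowdown mentioned above.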

Mark Lutz