Re: Unique string storage

Jim Roskind (jar@infoseek.com)
Tue, 5 Apr 1994 00:01:10 +0800

> Date: Tue, 05 Apr 94 01:16:21 -0400
> From: Tim Peters <tim@ksr.com>
> Sender: python-list-request@cwi.nl
>
> I can't claim that my programs are typical, but after looking them over I
> didn't find any that I think uniquifying strings would speed up, and
> found many I'm sure would slow down. Jim mentioned short-lived temps
> during string construction as one source of special pain, & I'll add to
> that the billions & billions (well ...) of lines of Python that crunch
> over files.

I take back my first (posted) thought on this topic...

Well, I thought about it some more, and realized that you *could*
probably get away with "lazy uniquification," and not get burnt by the
temp creation. The idea would be to unique-ify a string only when
someone tried to do a compare, and then cache the result of your
efforts :-). When I started looking at it this way, I suddenly
realized what was going on, and I had what seemed like a good question
to ask:

Why do you want Python to do this for you?

The two reasons I came up with are: a) to increase the speed of string
compare operations; and b) to decrease the storage consumed by
duplicate strings.
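
The lazy scheme sketched above could look something like the following
in Python itself (a toy sketch only; `lazy_intern` and `fast_eq` are
made-up names, and a real implementation would live at the C level
inside the interpreter, not in user code):

```python
# Hypothetical sketch of "lazy uniquification": a string is interned
# only the first time it participates in a compare, and the canonical
# copy is cached so the work is paid at most once per distinct value.
_interned = {}  # maps string value -> canonical object

def lazy_intern(s):
    """Return the canonical object for s, caching it on first use."""
    return _interned.setdefault(s, s)

def fast_eq(a, b):
    """Compare via identity after lazily interning both operands."""
    return lazy_intern(a) is lazy_intern(b)
```

Note that short-lived temps built during string construction never pay
the interning cost under this scheme, since they are thrown away before
anyone compares them.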

With that answer in hand, I arrived at a very fundamental question
(which I like to ask whenever an optimization is suggested):

What is the provable benefit that you will derive from such an
optimization?

This gets back to the basic question of how to optimize: Only optimize
the sections where the time is being spent :-). Since you can almost
never guess where this place is, use a profiler :-o
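
In today's Python the standard library ships exactly such a profiler;
a minimal sketch of "measure before optimizing" (the workload function
`crunch_lines` here is made up purely for illustration):

```python
# Profile a string-heavy workload with the standard cProfile/pstats
# modules to see where the time actually goes before optimizing.
import cProfile
import io
import pstats

def crunch_lines(lines):
    # Toy workload: the kind of line-by-line string crunching that
    # dominates many real Python programs.
    seen = 0
    for line in lines:
        if line.strip() == "target":
            seen += 1
    return seen

def profile_it():
    data = ["target\n", "other\n"] * 10000
    buf = io.StringIO()
    profiler = cProfile.Profile()
    result = profiler.runcall(crunch_lines, data)
    # Summarize the hot spots, sorted by cumulative time.
    stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
    stats.print_stats(5)  # report only the top 5 entries
    return result, buf.getvalue()
```

If the report shows the time going into I/O or method dispatch rather
than string compares, uniquification was never going to help.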

Hence instead of having Tim scan his code to see that this activity
would not speed up his code, I think the burden of proof falls on the
proponent to show at least some applications where this change would
make a significant positive difference (gut feel doesn't count,
profile data does). The more I thought about it, the more I came to
expect that on most machine architectures, a string compare operation
really flies. Hence I'm a bit doubtful that you'll really save that
much by optimizing away the string compares. If you have an
application where this is *really* all you are doing, then I'd bet
that lex/flex is the sort of tool you're actually looking for, and
even the unique-ification stuff won't help you.
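
For reference, the compare-side payoff under discussion is exactly
what modern Python ends up offering as opt-in uniquification via
`sys.intern`: once two equal strings are interned, equality reduces to
a pointer check rather than a character-by-character scan.

```python
# sys.intern guarantees one canonical object per distinct string
# value, so interned strings can be compared by identity.
import sys

a = sys.intern("".join(["uniqu", "ify"]))  # constructed at run time
b = sys.intern("uniquify")                 # a compile-time literal

same = a is b  # identity compare: no character scan needed
```

Whether that pointer check beats a plain compare enough to matter is,
of course, precisely the question the profiler should answer.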

Then again, maybe you're after the space savings, hmmmmmmm.... I guess
I'm not on a DOS box today, so I'm less sympathetic ;-) If this is
what you're after, you'd also have to show a sample app that really
filled up memory with strings (for a good reason!). I'd then note
that what you're really after is a space-time trade-off
(unique-ification takes time, but will save memory), and you'd have a
harder time selling us a slower Python just so it could shoe-horn
itself into some minimal amount of RAM.

Jim

Jim Roskind
408-982-4469
jar@infoseek.com