Still, if the current approaches are "too slow" for people (they aren't
for my applications -- but then I don't maintain any terabyte databases
under Python either <grin?>), I suppose that's much the same as if the
functionality weren't there at all.
> [jim]
> ...
> Well, I thought about it some more, and realized that you *could*
> probably get away with "lazy uniquification," and not get burnt by the
> temp creation. The idea would be to only unique-ify when someone
> tried to do a compare, but then you would cache the results of your
> efforts :-).
Agreed this is possible, at the cost of giving up the object-identity
approach: it would bust Python's design big-time to go changing object
identity on the fly, so string equality-testing couldn't be done via "a
is b" under lazy uniquification (unless the meaning of "is" were changed
too, and, ... let's leave it at "yeech! phooey!").
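To make the yeech concrete, here's a toy sketch (all names invented, nothing like Python's actual implementation) of what lazy uniquification might look like, written in Python itself:

```python
# A toy sketch of "lazy uniquification": canonicalize a string only
# when somebody asks for a compare, and cache the canonical copy so
# later compares are cheap.
_intern_table = {}

def lazy_unique(s):
    # First sight of a value installs it as the canonical copy.
    return _intern_table.setdefault(s, s)

def symbol_eq(a, b):
    # Equality via canonical identity; the caching is the side effect.
    return lazy_unique(a) is lazy_unique(b)

a = "".join(["sp", "am"])   # two equal strings that are
b = "".join(["sp", "am"])   # distinct objects in CPython
# symbol_eq(a, b) is True, but plain "a is b" stays False: the
# original objects keep their identities.
```

Note that the canonical copies live happily in a side table, but the original objects keep their identities -- which is exactly why "a is b" still couldn't serve as string equality without changing what "is" means.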
> [mark]
> ... I've got a few responses from people claiming that this extension
> would absolutely slow down all Python programs, and so shouldn't be
> explored. I don't wish to be rude, but IMHO the burden-of-proof rests
> on *both* sides of the issue.
Exploring it is fine! The only way to tell for _sure_ is to implement it
and see what happens; & nobody seems to want this enough to take that
simple step <snort>.
> In my experience, Python almost *never* performs the way you think it
> will, after performance tuning.
You bet -- & Guido's been known to say much the same (remember xrange?).
> ... if this extension would be useful for all applications that require
> some sort of symbol table, then it would help open up Python as a tool
> for such applications, if internalizing it performed better than a
> class-based implementation.
Symbol table functionality is already there, so I think this overstates
the benefit. If the question really is one of _fast_ symbol table
functionality (faster than dicts, which aren't too shabby given their
flexibility), I think Don's got a clearer/less-disruptive way to get
there.
> ...
> I have no religious affiliation with Lisp... but my comparison was
> with Lisp 'atoms', not strings.
Right, putative Python "symbols" == Lisp "atoms" in what I was saying;
think we just disconnected there.
> ...
> But wouldn't it be Good Thing if adding a minor extension to the
> language could make Python useful as an AI language too?
A) Despite what you & Don have said, no version of this sounds "minor" to
me. It does seem a lot less work to add a new type than to track down,
think about, and change all the places Python mucks with strings now,
though.
B) "AI language"? Mark, I even have trouble selling wonderful languages
like _Fortran_ to that lisp-crazed community <wink>.
> [don]
> ... I think the way to make attribute access fast enough is to use
> symbols internally. I have not tried profiling Python yet, but I would
> bet that strcmp() is in the top 10%.
Given that apparently none of us has profiled Python, it's fair if I take
the other side, and bet that strcmp typically accounts for little of
Python's runtime:
When doing attribute access on an object with a __dict__, it generally
goes thru a hash table. strcmp isn't involved there until the hash code
hits, and we'll typically have a grand total of one successful strcmp, on
a darned short string ("count", "sort", "num", "val", "n", "append", ...
"number_of_pending_events_for_this_instance" just ain't common <wink>).
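Here's a toy open-addressing table (hypothetical, much simpler than CPython's real dict code) showing the shape of that: the stored hash is checked first, and a full string compare runs only on a hash hit.

```python
# Toy hash table: compare cheap integer hashes first; pay for a full
# string compare (the "strcmp") only when the hashes already match.
SIZE = 8
full_compares = 0

def insert(slots, name, value):
    j = hash(name) % SIZE
    while slots[j] is not None:
        j = (j + 1) % SIZE              # linear probe on collision
    slots[j] = (hash(name), name, value)

def lookup(slots, name):
    global full_compares
    h = hash(name)
    j = h % SIZE
    while slots[j] is not None:
        sh, sname, svalue = slots[j]
        if sh == h:                     # cheap integer compare first
            full_compares += 1          # only now pay for a "strcmp"
            if sname == name:
                return svalue
        j = (j + 1) % SIZE
    raise KeyError(name)

slots = [None] * SIZE
for i, attr in enumerate(["count", "sort", "append"]):
    insert(slots, attr, i)
result = lookup(slots, "append")    # one full compare, and it succeeds
```

Absent a full 64-bit hash collision (don't hold your breath), that lookup does exactly one full string compare, on a darned short string.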
Attribute access on a built-in object generally searches thru a short
linear list, doing strcmp until it finds a match. But _most_ strcmps are
unsuccessful, and indeed often return after comparing the first byte (the
compiler may even do that much inline, a neat optimization since a strcmp
usually _does_ fail, & early). Then one short strcmp that succeeds.
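A counting stand-in for C's strcmp makes the point (method names are hypothetical): the failing compares in a linear search each stop at the very first byte.

```python
# strcmp_like mimics C's strcmp, counting every byte it examines,
# including the NUL terminator a C string carries.
bytes_examined = 0

def strcmp_like(a, b):
    global bytes_examined
    for ca, cb in zip(a + "\0", b + "\0"):
        bytes_examined += 1
        if ca != cb:
            return -1 if ca < cb else 1
    return 0

methods = ["count", "sort", "append", "insert", "remove"]
found = None
for name in methods:                    # the linear list search
    if strcmp_like(name, "remove") == 0:
        found = name
        break
# Four misses at one byte each, then one 7-byte success ("remove"
# plus its terminator): 11 bytes examined in all.
```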
Many things do a mixture of these. E.g., a class attribute goes thru a
buried-in-the-code linear list of "special names" that usually fail on
the first byte:
    class_getattr(op, name)
        ...
        if (strcmp(name, "__dict__") == 0) { ...; return ...; }
        if (strcmp(name, "__bases__") == 0) { ...; return ...; }
        if (strcmp(name, "__name__") == 0) { ...; return ...; }
and failing all those goes on to look in the class's dict.
Instance attributes often go thru most of this stuff twice (once in its
own list of special names & its own dict, then once more for its base
class).
I take four morals from all this:
1) strcmp isn't an overwhelming part of it.
2) Profiling is a useless approach for guessing how much time is spent on
"attribute access", cuz the code is spread out all over, and parts of
it involve the hashing and dict-lookup code that are used for many
purposes besides attribute access.
3) Attribute-access code _is_ spread out now, and follows a mixture of
several approaches (dicts, linear lists in structs, linear lists in
code, deep searches). So it will be a Real Project to change it.
4) Attribute access is in fact expensive. But as Mark asks, who
knows how expensive _compared to_ everything else that's going on?
> ...
> For example, in 'obj.attr' attr could be interned during compilation.
Could and probably should! I think it's a major project, though.
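As it happens, interning is exactly the kind of thing that can be demonstrated with an explicit intern table today -- CPython later grew one, exposed as sys.intern. Interned equal strings are the same object, so the "compare" in a dict probe can succeed on a pointer check:

```python
import sys

# sys.intern returns the canonical copy of a string from CPython's
# intern table; equal interned strings are the very same object.
fresh = "".join(["at", "tr"])       # a new, un-interned string object
canonical = sys.intern(fresh)       # canonical copy from the table
# sys.intern("attr") hands back that same object, so comparing the
# two is an identity check, not a character-by-character walk.
```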
> ... [other good stuff] ...
> ... [and a good summary from mark again] ...
> ...
> 5) It's probably not much slower than an internal implementation:
> fetching class attributes does the same stuff, roughly.
If only "roughly", then what are the differences? I see them as doing
exactly the same thing (i.e., mapping a symbol to an arbitrary value).
> The only real drawbacks I can see:
>
> 1) Class attribute fetching is currently slow. but so is...
Oh, "slow" compared to what <0.8 grin>? Note that Python _lets_ you get
away with stuff like
    for x, y in key_value_pairs:
        Class_Name.__dict__[x] = y
now, so the implementation is supporting an incredible amount of dynamic
flexibility. Probably shut down the University of Virginia for a year if
any part of it changes <grin>.
> 2) It probably would break if Guido ever decides to do static
> analysis of class attribute references to speed access (index
> into a table, instead of hashing into a dictionary, like what
> was done for local function variables).
>
> This last point is a can-of-worms I'd rather not open...
I would! If you're going after speed, never stop half way <0.5 grin>.
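For the record, here's a hypothetical sketch (names invented) of the table-indexing idea, doing by hand what such a compiler would do: resolve each attribute name to a fixed index once, so every later access is a plain index with no hashing or string compares.

```python
# Hypothetical sketch: resolve attribute names to fixed slot indices
# once ("at compile time"), then fetch by plain list indexing.
class Point:
    _layout = {"x": 0, "y": 1}      # name -> slot index, fixed up front

    def __init__(self, x, y):
        self._slots = [x, y]

# A "compiler" would do this resolution once per attribute reference:
IDX_X = Point._layout["x"]

# ...after which every access is an index, not a hash-and-compare:
p = Point(3, 4)
x_value = p._slots[IDX_X]
```

That's roughly what Python already does for local function variables (names become array indices in the frame) -- and, as the quote warns, it only works if the set of attribute names can be pinned down statically, which the `Class_Name.__dict__[x] = y` trick above cheerfully defeats.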
fortune-favors-the-foolish-ly y'rs - tim
Tim Peters tim@ksr.com
not speaking for Kendall Square Research Corp