Re: Lazy lists for use with large dictionaries

Guido.van.Rossum@cwi.nl
Wed, 28 Sep 1994 14:49:21 +0100

> This discussion brings to mind another thought:
> maybe python should have a generalized interface to the
> *for* loop (analogous to the generalized indexing).
> Here I could write
>
> for key in Dict: blah
>
> instead of
>
> for key in Dict.keys():
>
> which constructs a list that I don't want or need.
> Using the internal structure of dictionaries python
> could take care of the details for me. I think this
> is what the original poster had in mind.

But why would you not want or need something that's not in the way?
Everybody just assumes that this is inefficient because it would be
inefficient in other languages...

> My personal favorite way to do this is to use a
> walker class associated with each container class,
> which knows the internal representation of each container
> and how to move from one element to the next -- in the
> case of dictionaries the "walking" could be done without
> creating *any* new objects. (I actually use walkers now using
> while loops a lot.) Walkers can also be used outside of
> loops for more general purposes.

I understand the principle, and am thinking of adding it to the
language a a replacement of the internals of the for loop. It would
make it possible e.g. to say "for line in file.lines(): ...", to loop
over dictionaries etc.

However I don't understand how you can do it without creating *any*
new objects -- surely two or more loops over the same sequence must be
possible, so an object holding a pointer to the current item must be
made. I also think dictionaries should know that one or more
iterators are active so they can trap modifications to their key set.

> Suitably generalized this could also allow the user to
> define iterations over any container class:
>
> for Node in Graph: blah
> for Leaf in Btree: blah
> for tuple in SQLquery: blah
>
> Of course this wouldn't be a good idea if it slows down
> the main interpretation loop much.

It wouldn't slow down the interpreter when not used, but I have a
feeling that it can easily make things a lot slower when used --
surely sequences implemented as user-defined classes are already a lot
slower than built-in lists because of the method call overhead, and
adding an iterator on top of that will only cause more calls.

I've a feeling that writing out a graph or tree traversal algorithm
more or less explicitly will be a lot faster than hiding the traversal
state in a user-defined class. (Also note that for general Graph
traversal, each active iterator will have to maintain a table of nodes
already traversed which will grow to the size of the graph, if you
want to allow traversing the same graph multiple times simultaneously
(as in for i in G: for j in G: compare i and j).

As I explained in a previous post for dictionaries, building such
tables on the fly will be a lot less efficient than building it ahead
of time, because on each iteration the state of the construction will
have to be retrieved, etc.

Efficiency in Python is a wonderful thing -- your intuition is usually
wrong.

--Guido van Rossum, CWI, Amsterdam <mailto:Guido.van.Rossum@cwi.nl>
<http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>