Re: Lazy lists for use with large dictionaries

Guido.van.Rossum@cwi.nl
Wed, 28 Sep 1994 10:55:16 +0100

> The obvious way (and the only one I'm currently aware of) is to have
> an integer variable ranging from 0 to len(dict.keys()), set by rewind() to
> 0 and incremented by next() but that has the effect that the
> whole list of keys is actually computed. I haven't looked into the code
> but I doubt that dict.keys() returns such a thing like a lazy list, i.e.
> the equivalent to the sequence object returned by xrange.
>
> Perhaps all this isn't a problem because the list code is fast enough and
> memory isn't an issue anymore ;), or is it?

Have you tried? dict.keys() makes one fast scan over the dictionary
(internally, the iteration you want *does* exist) copying the pointers
to the key objects into a pre-allocated list object of the right size.
The iteration time isn't lost (since you'll have to iterate anyway --
unless in the majority of cases your loop terminates very prematurely
(which I doubt since you're getting the keys in random order). Since
the keys aren't copied (only the pointers are) the only cost is 4000
to 40,000 bytes. For the list holding the pointers (assuming a
pointer fits in 4 bytes). That's only a small fraction of the memory
occupied by the dictionary itself (I think I once calculated that
unless your keys and values are minute or very heavily shared it's
less than 5%).

Does this make sense?

(This is actually the second time someone asks this -- maybe I should
add it to the FAQ.)

--Guido van Rossum, CWI, Amsterdam <mailto:Guido.van.Rossum@cwi.nl>
<http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>