Re: Some things I'd like

Guido.van.Rossum@cwi.nl
Tue, 16 Aug 1994 13:33:06 +0200

> However, I think it would really be nice if one could loop over elements
> of a mapping without having to construct a sequence first. One way to do
> it (with modification of Python's runtime system) would be to make "for"
> call the iteration object with a method __cursor__(), and then call what
> this method returned with __item__(n) for all numbers from zero upwards.
> The end could be signalled by an exception. Objects which already have a
> __item__ method (like sequences) would work by just returning themselves
> as __cursor__(), whereas other iterations could easily be programmed, by
> providing a suitable cursor which manages whatever internal state it may
> need to walk over the original object. Allocating everything as a list -
> for example a big database - isn't a very attractive solution, IMO.

Unfortunately the implementation of dictionaries makes this unsafe if
it is modified during the loop: if the modification happens to cause a
reorganization of the dictionary, the state kept for the __cursor__()
method (I presume this would be an index into the internal hash table)
would stop making sense.

I also did a little timing test -- creating the keys() list of a
dictionary with 100,000 items takes a negligeable amount of time
(less than 3%) compared to accessing all items.

The memory used up for the keys() list is also small compared to that
used for the hash table: assuming a hash table which is 2/3 full,
4-byte object pointers, a hash table for 100,000 elements uses 1.2
Mbyte, while the list is 0.4 Mbyte. The actual data will be a fair
multiple of this -- assuming the keys are 5-char strings, each string
object uses 24 bytes (*), or another 2.4 Mbyte (for the keys alone --
some applications might share most values so I won't count them).
This would make the space used up by the keys() list about 11% of the
space used by the dictionary. Not negligeable, but this is using
fairly minimal assumptions -- if you use the dictionary to store
100,000 different values that are 10-byte strings it is already
reduced to 5%.

(*) The object header for a string object looks like this on a typical
32-bit machine:

#bytes description
4 type pointer
4 reference count
4 string length (N)
N+1 string data (rounded up to multiple of 4)

For a 5 byte string this amounts to 20 bytes; add to this 4 bytes of
overhead for malloc() and we have 24 bytes.

Conclusion: don't worry about the cost of using keys() to iterate over
a dictionary unless you are on a memory-starved machine.

--Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>
<URL:http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>