Someone from XVT donated a range object implementation (thanks!) and I
put it in under a different name ("xrange"). Results: a for loop with
an empty body ("pass") over range(100000), which actually creates a
list and 100,000 integer objects, takes about 3.13 seconds on my
machine, while the same loop using xrange(100000) takes 3.07 seconds
(best of 3 runs in both cases). That is a speed-up of less than
2 percent!
This was on an SGI Indigo with a 50 MHz MIPS R4000 CPU.
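A rough re-creation of the experiment for a modern interpreter (hypothetical sketch: in Python 3 there is no xrange() and range() itself is lazy, so materializing the list up front stands in for the old eager range()):

```python
import timeit

N = 100_000

# Old-style range(): build the full list of N integer objects, then loop.
eager = timeit.timeit("for i in seq: pass",
                      setup=f"seq = list(range({N}))", number=10)

# Old-style xrange() (today's range()): produce one value per iteration.
lazy = timeit.timeit("for i in range(n): pass",
                     setup=f"n = {N}", number=10)

print(f"eager list: {eager:.3f}s  lazy range: {lazy:.3f}s")
```

The setup string is excluded from the measured time, so the eager variant's list construction is not counted here; the comparison is purely between iterating a prebuilt list and generating values on the fly, which is the part the cache-effect speculation below is about.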
On a Sun ELC, the results are even weirder: the loop using xrange()
takes 17.5 seconds, while using range() it takes only 12.8 seconds!!!
A possible explanation for this unexpected behavior might be
instruction cache size on the Sparc chip: creating 100,000 integer
objects in a tight loop (as range() does) may execute entirely out of
the cache and thus be much faster than creating one object on each
iteration through the for loop, where many other lines of C code are
hit in between...
Note that even though this creates 100,000 integer objects, it does
not make 100,000 calls to malloc() -- the allocation of integers is
already optimized (to save 25% on space). The measurements were
actually done using a version of the interpreter with some other
optimizations (such as keeping a cache of small integers) but this
shouldn't affect the measurements much.
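The small-integer cache mentioned above survives in modern CPython (it covers roughly -5 through 256); a quick illustration of the effect, using runtime parsing to dodge constant folding:

```python
# Small values come from CPython's preallocated integer cache, so no
# fresh allocation happens and both names refer to the same object.
x = int("100")
y = int("100")
print(x is y)        # True: one cached object

# Larger values are allocated anew each time (typically distinct
# objects, though this is an implementation detail, not a guarantee).
big1 = int("100000")
big2 = int("100000")
print(big1 is big2)
```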
My personal conclusion: it's not worth optimizing range() at all...
--Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>
"Meten is weten" ("To measure is to know")