Re: unevaluated sequences of initially indefinite length and 'filter'

Steven D. Majewski (sdm7g@elvis.med.virginia.edu)
Thu, 24 Feb 1994 12:31:50 -0500

On Feb 24, 12:47, Guido.van.Rossum@cwi.nl wrote:
>
> I dug out your mods for sequences of arbitrary length and they set me
> thinking.
>
> (1) Shouldn't map() and reduce() be treated similarly?

YES. I only ran into the problem on 'filter' (so far)
and I didn't have time to go looking for more trouble!
( map is a bit more awkward to do the way I did filter,
because it could have several lists that would need
updated lengths. )

>
> (2) A possibly even more efficient way of doing this, which might work
> for 'for' loops as well, would be to never ask for the length at all
> -- simply keep asking for the next element until it raises IndexError.
>

YES: This would be better for map ( see above ) and would
speed up for loops, taking all of those extra function calls
to len() out of the loop. ( Lack of redundant len() calls
and preallocation of output list is a performance edge that
'map' has over 'for' - at least in the usual known-length
sequence case.) Using IndexError from a (python-defined)
class to break out of a look was discussed once before when
this topic came up ( Re: 'for' and 'range' ), but I didn't
know how to do the equivalent in C. ( The len change was
a pretty obvious local hack, and didn't require understanding
how the rest of the interpreter hangs together. )

> This would in some cases require extra code (e.g. like what you
> discovered about needing to use additem) but sounds like the most
> general solution and even somewhat elegant...
>
[ ... ]
>
> And we would have introduced generators in the language without any
> major surgery!

Although, I thought the "lying-about-its-length" trick was a "fun-hack",
it *does* take a bit of extra thought to program it into each class that
needs it ( and since the interface is the same, but the implementation
is often NOT, inheritance doesn't always help here! ). And I was never
entirely happy with the semantics of the lie - in my first experiments
I used the more obvious notion that 'len()' FORCED the full evaluation
of the sequence, and so I tried to avoid it, until I discoveded that it
was a part of the 'for' implementation. This would make for,map,etc.
all work without the lie, and leave it up to the implementor whether
he wanted len() to mean CURRENT-LENGTH or ABSOLUTE-LENGTH.

- Steve Majewski (804-982-0831) <sdm7g@Virginia.EDU>
- UVA Department of Molecular Physiology and Biological Physics