Re: coroutines and continuations ( in Python? - long discussion )

Tim Peters (tim@ksr.com)
Tue, 03 May 94 00:47:10 -0400

Thanks S & G for the interesting discussion! Will stick to the easiest
ones in Guido's msg for now:

> ...
> 1) Restartable functions are really "generators" in disguise.

Yes, as I understand the word, Tim's resfuncs are one way to implement
generators. But I don't know precisely what "generators" means to
anyone on any given day, so used a different name, whose meaning I could
control.

> ...
> 3) Tim proposes a multi-stage interface: first you wrap a function,
> creating an object that stands for the sequence you have in mind
> (e.g. the integers from 5); then you invoke the object to give you a
> sequence generator; finally you call the generator repeatedly to give
> you the successive items.

Wasn't proposing that, just showed code I have that actually _does_ that.
In its original context, that scheme makes good sense. Fully agree that,
when such an elaborate scheme is needed, it's better to build it out of
simpler features (provided suitable ones exist to build it from <wink>).

> Although the cloning can occasionally be useful,

In context, it was essential. Objects had two roles, a passive one in
which they recorded time-invariant information about their purpose in
life, and an active role in which they applied that information to a
time-variant backtracking search. The objects appear in large graph
structures, so sharing objects was hard to stop (nor would I have wanted
to), and in their _active_ roles each "invocation" of an object needed
its own execution environment. No amount of static duplication could
achieve that, because an object's _active_ state could cause any number
of indirect (recursive) invocations of itself.

> I would prefer to tackle such a feature separately, and concentrate on
> generators right now.

That is the key problem! "Cloning" is easily implemented when needed.

> Using an object-oriented approach, maintaining the current state as
> data member(s) of self greatly simplifies the implementation, e.g.:
>
> class IntsFrom:
>     def __init__(self, n):
>         self.n = n
>     def next(self):
>         n = self.n
>         self.n = n + 1
>         return n

So long as we stick to trivial examples, it is indeed tempting to think
that's good enough <0.9 grin>. As a slightly non-trivial example, note
that this implementation of IntsFrom would not work correctly in the
runnable "pipe" example I gave: an instance can be shared just fine in
its _passive_ state (merely recording that someday it may be asked to
generate integers starting from n), but sharing the _active_ state
(actually generating integers) is disastrous.
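To make the failure concrete, here's a minimal runnable sketch (the two-consumer scenario and modern print syntax are mine, not from the thread) of what goes wrong when an _active_ IntsFrom is shared:

```python
# Two consumers share one IntsFrom instance.  Each believes it is reading
# "the integers from 5", but the shared self.n means the second consumer
# resumes where the first left off -- there is one active state per
# object, not one per use.

class IntsFrom:
    def __init__(self, n):
        self.n = n
    def next(self):
        n = self.n
        self.n = n + 1
        return n

shared = IntsFrom(5)
a = [shared.next() for _ in range(3)]  # first consumer: [5, 6, 7]
b = [shared.next() for _ in range(3)]  # second wanted [5, 6, 7] too,
print(a, b)                            # but gets [8, 9, 10]
```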

In simple examples, or simple uses, it's reasonable to tell a user "don't
do anything that will break the implementation" <wink>; in complex
settings that advice is just impossible to follow. I didn't wind up with
all the "_wrap" and "_work" machinery because I thought it was clear <grin>.

A more OO-like alternative would have been for "passive" objects to
create "active" objects on the fly, as needed, from a parallel or derived
hierarchy. Tried those too, & just didn't like 'em.

> ...
> 5) Likewise I am horrified by the idea that there may somehow be a
> possibility for a caller to make my function continue after it
> returns.

Ditto -- introducing "suspend" isn't _logically_ necessary, but the
design sucks big-time without it (the notion _was_ that, given "suspend",
a "return" would mark the resfunc as unresumable).

> ....
> 6) All this notwithstanding, I can sympathize with the difficulty of
> having to "save the program counter" in certain generators (not
> intsfrom()).

Forget intsfrom already <grin>. The real pain indeed comes in examples
like the pi.py one, with nested loops, or other non-trivial control flow.

> ...
> 7) Here's a proposal for a possible interface which I think can be
> implemented with not too many changes to the interpreter's internal
> machinery:
>
> There's a built-in Generator class (or type or whatever fits the
> bill) with get() and put() methods. To create a Generator instance,
> you pass the class a function and an argument list for the function.
> The function could be something like pi() above except that it is
> passed the generator instance as additional argument. We could
> adapt the example above like this:
>
> def pi(g):
>     ...
>     while 1:
>         ...
>         while ...:
> ==>         g.put(int(d))
>         ...
>
> The user would use it as follows:
>
> g = Generator(pi, ())  # pass pi an empty argument list
> while 1:
> ==>     d = g.get()
>     print int(d),
>
> Each call to g.get() resumes pi() until it calls g.put(); the
> argument to g.put() is returned by g.get(). When the function
> returns, get() raises an exception (we could use None but I would
> prefer not to disallow sequences containing None).

This is fine. Replace "Generator" with "resfunc", "put" with "suspend",
and "g=Generator(func, args); ... g.get()" with "g = resfunc(f, args); ...
g()", and it looks pretty familiar <grin>. Seriously, the functionality
is there, and the spelling is pleasant.

> 8) An implementation of this interface using threads is possible in
> today's Python (if you have threads), as follows:

Very pretty, Guido! It's enough to make me finally bring up the threads
package (busman's holiday ...).

> class Generator:
>     def __init__(self, func, args):
>         self.getlock = thread.allocate_lock()
>         self.putlock = thread.allocate_lock()
>         self.getlock.acquire()
>         self.putlock.acquire()
>         self.done = 0
>         thread.start_new_thread(self.start, (func, args))

If, before that, you add

    self.func = func
    self.args = args

then you can also have

    def clone(self):  # return fresh version of same generator
        return Generator(self.func, self.args)

>     def start(self, func, args):
>         # Set the wheels in proper motion
>         try:
>             self.putlock.acquire()  # Wait for first get()
>             apply(func, args + (self,))

Believe that line would be better as

    apply(func, (self,) + args)

I.e., the called function surely knows it needs a Generator argument, but
other arguments may be optional.

>         finally:
>             self.done = 1
>             self.getlock.release()
>     def get(self):
>         self.putlock.release()  # Resume producer thread
>         self.getlock.acquire()  # Wait for produced value
>         if self.done:
>             raise EOFError      # Sequence finished
>         return self.value
>     def put(self, value):
>         self.value = value
>         self.getlock.release()  # Resume consumer thread
>         self.putlock.acquire()  # Wait for next get() call
>
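For the record, the whole class runs today with only light transcription: threading.Lock in place of thread.allocate_lock, a direct call in place of apply. The daemon flag, the intsfrom producer, and the first_n consumer below are my additions for demonstration, not Guido's:

```python
import threading

class Generator:
    def __init__(self, func, args):
        self.getlock = threading.Lock()
        self.putlock = threading.Lock()
        self.getlock.acquire()
        self.putlock.acquire()
        self.done = 0
        # daemon=True so an abandoned producer thread at least won't
        # block interpreter exit (see point 9 below)
        threading.Thread(target=self.start, args=(func, args),
                         daemon=True).start()

    def start(self, func, args):
        try:
            self.putlock.acquire()      # wait for the first get()
            func(self, *args)           # was: apply(func, (self,) + args)
        finally:
            self.done = 1
            self.getlock.release()

    def get(self):
        self.putlock.release()          # resume producer thread
        self.getlock.acquire()          # wait for produced value
        if self.done:
            raise EOFError              # sequence finished
        return self.value

    def put(self, value):
        self.value = value
        self.getlock.release()          # resume consumer thread
        self.putlock.acquire()          # wait for next get() call

def intsfrom(g, n):                     # an endless producer, for demo
    while 1:
        g.put(n)
        n = n + 1

def first_n(g, n):                      # a consumer taking just n items
    return [g.get() for _ in range(n)]

print(first_n(Generator(intsfrom, (5,)), 4))  # [5, 6, 7, 8]
```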
> 9) This is probably not an ideal implementation (threads have more
> power and consequential overhead than necessary to implement
> coroutines),

The worst aspect is that if the caller doesn't run the generator to
completion (and it can't if the generator is modeling an infinite
sequence, and it merely won't if the caller is doing a search for a
sequence element that meets some condition -- both are common uses), the
spawned thread will hang forever at a putlock acquire; unless I'm
mistaken, the thread won't go away even if the generator is garbage-
collected. OS's are improving, but a few thousand stray threads still
give most of 'em indigestion <wink>.

Perhaps adding "self.die_early = 0" in __init__, introducing

    def __del__(self):
        self.die_early = 1
        self.putlock.release()  # let the spawned thread see it's time to die
        self.getlock.acquire()  # wait for it to see the flag before nuking me

adding

    if self.die_early:
        raise 'DieEarly'  # abort thread, to the "finally" block in "start"

at the end of "put" (the point of using a string literal on the raise is
to minimize the chance that something in func will catch it, although a
perverse func could still do "try: g.put(x); except: pass" -- hmm!)

and changing start's "try:" block to

    try:
        self.putlock.acquire()
        if not self.die_early:
            apply(func, (self,) + args)

would usually suffice.
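Here's that shutdown protocol as a runnable sketch, with two liberties taken on my part: an exception class in place of the string 'DieEarly', and an explicit close() in place of __del__ (the spawned thread itself holds a reference to the instance, so __del__ may never fire):

```python
import threading

class DieEarly(Exception):
    pass                                    # in place of the string 'DieEarly'

class Generator:
    def __init__(self, func, args):
        self.getlock = threading.Lock()
        self.putlock = threading.Lock()
        self.getlock.acquire()
        self.putlock.acquire()
        self.done = 0
        self.die_early = 0
        threading.Thread(target=self.start, args=(func, args),
                         daemon=True).start()

    def start(self, func, args):
        try:
            self.putlock.acquire()          # wait for first get()
            if not self.die_early:
                func(self, *args)
        except DieEarly:
            pass                            # consumer told us to quit
        finally:
            self.done = 1
            self.getlock.release()

    def get(self):
        self.putlock.release()
        self.getlock.acquire()
        if self.done:
            raise EOFError
        return self.value

    def put(self, value):
        self.value = value
        self.getlock.release()
        self.putlock.acquire()
        if self.die_early:
            raise DieEarly                  # unwind to start's finally block

    def close(self):                        # explicit, instead of __del__
        if not self.done:
            self.die_early = 1
            self.putlock.release()          # let producer see the flag
            self.getlock.acquire()          # wait until it has shut down

def intsfrom(g, n):                         # endless producer, for demo
    while 1:
        g.put(n)
        n = n + 1

g = Generator(intsfrom, (5,))
print(g.get(), g.get())                     # 5 6
g.close()                                   # producer thread exits
print(g.done)                               # 1
```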

> but as a proof of concept and as a tool to prototype the interface it
> is good enough.

Absolutely!

> The pi example is entirely compute bound (almost all time is spent in
> long integer arithmetic) so it doesn't matter so much here anyway.

My experience is that real-life generators usually do little work per
invocation; their real value is that they're an achingly _natural_ way to
approach some otherwise-daunting problems. The pi example isn't horribly
atypical, though. So, as usual, every conceivable use is vital to
someone <grin/sigh>.

> 10) I have also thought of an implementation in ceval.c. This would
> require an extension to eval_code() to execute "continuations".

I saw Steve's eyes light up all the way from Boston <wink>. I'll study
this one later -- outta time.

> ...
> 11) It should be possible that g.put() be called not just by the
> generating function (pi() in our example) but also by any function
> that it calls. ... [various complications and irregularities]

I don't have a clear use in mind for this capability -- do you? Not to
say I wouldn't stumble into one if it was there.

BTW, I looked up Scheme's continuation gimmicks, and confess my head's
still spinning too fast to think straight.

> 12) I forgot what I wanted to say here :-)

Ditto my response.

> 13) It is a matter of taste whether to pass the generator instance or
> its put() method to the generating function. I like to pass the
> generator instance, so that if in a future release generators get
> additional methods or data attributes, these may be used/inspected
> by the generating function.

Sorry, but you're dead wrong about this one: your preference is
objectively correct, so it's not a matter of taste <smile>.

OTOH, how to deal with "get()" is more troublesome. If a "consumer"
function is invoked via

    g = Generator(producer, args)
    consumer(g)

then it's likely to be littered with g.get(), and this stops you from
invoking it with (say)

    consumer(stdin.readline)

So I'd favor calling via consumer(g.get). The similarity between
generators and file input is well worth exploiting! For that reason I
liked overloading EOFError with "generator exhausted" too.
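A tiny sketch of that payoff (the consumer and the in-memory file are mine, for illustration): a consumer written against any zero-argument "next item" callable takes a bound g.get or a file's readline interchangeably:

```python
import io

def consume3(next_item):
    # All this consumer needs is a zero-argument callable producing
    # successive items -- a Generator's bound .get would do as well as
    # a file object's .readline.
    return [next_item() for _ in range(3)]

print(consume3(io.StringIO("a\nb\nc\n").readline))  # ['a\n', 'b\n', 'c\n']
```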

> 14) I'm sure Steve can use this interface to reach new heights of
> obfuscated Python -- how about generators returning generators...

Well, _recursive_ generators are natural as heck -- and pretty darn
obscure until the light clicks on <wink>. E.g.

    g = Generator(powerset, ([1,2,3],))
    while 1:
        next = g.get()
        print next,
        if next is None: break
    print

should print

    [], [1], [2], [1,2], [3], [1,3], [2,3], [1,2,3], None

    def powerset(g, set):
        if len(set) == 0:
            g.put( [] )
            g.put( None )
        first, rest = set[0], set[1:]
        h = Generator(powerset, (rest,))
        while 1:
            next = h.get()
            g.put( next )
            g.put( [first] + next )

Will leave it as a disgusting exercise for the reader why this works even
though nothing checks for a None return inside powerset's "while" loop.

steve's-turn<grin>-ly y'rs - tim the obscure

Tim Peters tim@ksr.com
not speaking for Kendall Square Research Corp