PEP: 521
Title: Managing global context via 'with' blocks in generators and coroutines
Version: c244d09c7874
Last-Modified: 2016-06-09 12:08:48 +0300 (Thu, 09 Jun 2016)
Author: Nathaniel J. Smith <njs at pobox.com>
Status: Deferred
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Apr-2015
Python-Version: 3.6
Post-History: 29-Apr-2015

Abstract

While we generally try to avoid global state when possible, there nonetheless exist a number of situations where it is agreed to be the best approach. In Python, the standard way of handling such cases is to store the global state in global or thread-local storage, and then use with blocks to limit modifications of this global state to a single dynamic scope. Examples where this pattern is used include the standard library's warnings.catch_warnings and decimal.localcontext, NumPy's numpy.errstate (which exposes the error-handling settings provided by the IEEE 754 floating point standard), and the handling of logging context or HTTP request context in many server application frameworks.

However, there is currently no ergonomic way to manage such local changes to global state when writing a generator or coroutine. For example, this code:

def f():
    with warnings.catch_warnings():
        for x in g():
            yield x

may or may not successfully catch warnings raised by g(), and may or may not inadvertently swallow warnings triggered elsewhere in the code. The context manager, which was intended to apply only to f and its callees, ends up having a dynamic scope that encompasses arbitrary and unpredictable parts of its callers. This problem becomes particularly acute when writing asynchronous code, where essentially all functions become coroutines.

Here, we propose to solve this problem by notifying context managers whenever execution is suspended or resumed within their scope, allowing them to restrict their effects appropriately.

Specification

Two new, optional, methods are added to the context manager protocol: __suspend__ and __resume__. If present, these methods will be called whenever a frame's execution is suspended or resumed from within the context of the with block.

More formally, consider the following code:

with EXPR as VAR:
    PARTIAL-BLOCK-1
    f((yield foo))
    PARTIAL-BLOCK-2

Currently this is equivalent to the following code copied from PEP 343:

mgr = (EXPR)
exit = type(mgr).__exit__  # Not calling it yet
value = type(mgr).__enter__(mgr)
exc = True
try:
    try:
        VAR = value  # Only if "as VAR" is present
        PARTIAL-BLOCK-1
        f((yield foo))
        PARTIAL-BLOCK-2
    except:
        exc = False
        if not exit(mgr, *sys.exc_info()):
            raise
finally:
    if exc:
        exit(mgr, None, None, None)

This PEP proposes to modify with block handling to instead become:

mgr = (EXPR)
exit = type(mgr).__exit__  # Not calling it yet
### --- NEW STUFF ---
if the_block_contains_yield_points:  # known statically at compile time
    suspend = getattr(type(mgr), "__suspend__", lambda mgr: None)
    resume = getattr(type(mgr), "__resume__", lambda mgr: None)
### --- END OF NEW STUFF ---
value = type(mgr).__enter__(mgr)
exc = True
try:
    try:
        VAR = value  # Only if "as VAR" is present
        PARTIAL-BLOCK-1
        ### --- NEW STUFF ---
        suspend(mgr)
        tmp = yield foo
        resume(mgr)
        f(tmp)
        ### --- END OF NEW STUFF ---
        PARTIAL-BLOCK-2
    except:
        exc = False
        if not exit(mgr, *sys.exc_info()):
            raise
finally:
    if exc:
        exit(mgr, None, None, None)

Analogous suspend/resume calls are also wrapped around the yield points embedded inside the yield from, await, async with, and async for constructs.
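
To see the protocol end to end, here is a minimal sketch of a context manager implementing the new hooks together with a generator that uses it; the class and its print calls are purely illustrative and not part of the specification:

class TracingCM:
    # A minimal context manager implementing the full extended protocol,
    # purely to show when each hook would run.  (Illustrative sketch only.)

    def __enter__(self):
        print("enter")
        return self

    def __exit__(self, *exc_info):
        print("exit")
        return False

    def __suspend__(self):
        print("suspend")   # called just before the frame yields

    def __resume__(self):
        print("resume")    # called just after the frame is resumed

def gen():
    with TracingCM():
        yield 1
        yield 2

# Driving the generator step by step under this proposal:
#   g = gen()
#   next(g)   ->  prints "enter", "suspend"; yields 1
#   next(g)   ->  prints "resume", "suspend"; yields 2
#   next(g)   ->  prints "resume", "exit"; raises StopIteration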

Nested blocks

Given this code:

def f():
    with OUTER:
        with INNER:
            yield VALUE

then we perform the following operations in the following sequence:

INNER.__suspend__()
OUTER.__suspend__()
yield VALUE
OUTER.__resume__()
INNER.__resume__()

Note that this ensures that the following is a valid refactoring:

def f():
    with OUTER:
        yield from g()

def g():
    with INNER:
        yield VALUE

Similarly, with statements with multiple context managers suspend from right to left, and resume from left to right.
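
For example, since with A, B: is equivalent to nesting with A: around with B:, this code:

def f():
    with A, B:
        yield VALUE

performs the following operations in the following sequence:

B.__suspend__()
A.__suspend__()
yield VALUE
A.__resume__()
B.__resume__()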

Other changes

__suspend__ and __resume__ methods are added to warnings.catch_warnings and decimal.localcontext.
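
The exact stdlib changes are not spelled out here, but to illustrate the intended shape, a catch_warnings-style manager might use the new hooks roughly as follows. This is an illustrative sketch only, not the actual stdlib patch; the class name and its attributes are invented for the example:

import warnings

class local_warning_filters:
    # Illustrative sketch only: a catch_warnings-style manager that confines
    # its effect to the frames it was entered in by swapping the global
    # filter list in and out around suspension points.

    def __enter__(self):
        self._outside = warnings.filters          # the caller's filters
        warnings.filters = self._inside = list(self._outside)
        return self

    def __exit__(self, *exc_info):
        warnings.filters = self._outside
        return False

    def __suspend__(self):
        # Our frame is yielding: restore the caller's filters so that
        # whatever runs while we are suspended is unaffected.
        self._inside = warnings.filters
        warnings.filters = self._outside

    def __resume__(self):
        # Our frame is running again: reinstate our local filters.
        self._outside = warnings.filters
        warnings.filters = self._inside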

Rationale

In the abstract, we gave an example of plausible but incorrect code:

def f():
    with warnings.catch_warnings():
        for x in g():
            yield x

To make this correct in current Python, we need to instead write something like:

def f():
    with warnings.catch_warnings():
        it = iter(g())
    while True:
        with warnings.catch_warnings():
            try:
                x = next(it)
            except StopIteration:
                break
        yield x

On the other hand, if this PEP is accepted then the original code will become correct as-is. Or if this isn't convincing, then here's another example of broken code; fixing it requires even greater gyrations, and these are left as an exercise for the reader:

def f2():
    with warnings.catch_warnings(record=True) as w:
        for x in g():
            yield x
    assert len(w) == 1
    assert "xyzzy" in w[0].message

And notice that this last example isn't artificial at all -- if you squint, it turns out to be exactly how you write a test that an asyncio-using coroutine g correctly raises a warning. Similar issues arise for pretty much any use of warnings.catch_warnings, decimal.localcontext, or numpy.errstate in asyncio-using code. So there's clearly a real problem to solve here, and the growing prominence of async code makes it increasingly urgent.
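
For concreteness, here is a sketch of such a test in PEP 492 syntax. The stand-in coroutine g and its "xyzzy" warning are taken from the example above; everything else is hypothetical test scaffolding:

import asyncio
import warnings

async def g():
    # Stand-in for the coroutine under test from the example above.
    warnings.warn("xyzzy")
    await asyncio.sleep(0)

async def test_g_warns():
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        await g()          # g may suspend here and hand control to other tasks
    assert len(w) == 1
    assert "xyzzy" in str(w[0].message)

asyncio.get_event_loop().run_until_complete(test_g_warns())

Today, warnings raised by any other task that happens to run while g is suspended are also captured (or swallowed) by this block; under this proposal its effect would be confined to these frames.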

Alternative approaches

The main alternative that has been proposed is to create some kind of "task-local storage", analogous to "thread-local storage" [1]. In essence, the idea would be that the event loop would take care to allocate a new "task namespace" for each task it schedules, and provide an API to at any given time fetch the namespace corresponding to the currently executing task. While there are many details to be worked out [2], the basic idea seems doable, and it is an especially natural way to handle the kind of global context that arises at the top-level of async application frameworks (e.g., setting up context objects in a web framework). But it also has a number of flaws:

  • It only solves the problem of managing global state for coroutines that yield back to an asynchronous event loop. But there actually isn't anything about this problem that's specific to asyncio -- as shown in the examples above, simple generators run into exactly the same issue.

  • It creates an unnecessary coupling between event loops and code that needs to manage global state. Obviously an async web framework needs to interact with some event loop API anyway, so it's not a big deal in that case. But it's weird that warnings or decimal or NumPy should have to call into an async library's API to access their internal state when they themselves involve no async code. Worse, since there are multiple event loop APIs in common use, it isn't clear how to choose which to integrate with. (This could be somewhat mitigated by CPython providing a standard API for creating and switching "task-local domains" that asyncio, Twisted, tornado, etc. could then work with.)

  • It's not at all clear that this can be made acceptably fast. NumPy has to check the floating point error settings on every single arithmetic operation. Checking a piece of data in thread-local storage is absurdly quick, because modern platforms have put massive resources into optimizing this case (e.g. dedicating a CPU register for this purpose); calling a method on an event loop to fetch a handle to a namespace and then doing lookup in that namespace is much slower, as the sketch following this list illustrates.

    More importantly, this extra cost would be paid on every access to the global data, even for programs which are not otherwise using an event loop at all. This PEP's proposal, by contrast, only affects code that actually mixes with blocks and yield statements, meaning that the users who experience the costs are the same users who also reap the benefits.
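
To make the cost comparison concrete, here is a rough sketch of the two access patterns; the _task_namespaces mapping and the "errstate" key are hypothetical stand-ins for whatever API an event loop might actually provide:

import asyncio
import threading

_tls = threading.local()
_tls.errstate = "warn"

def errstate_thread_local():
    # Thread-local access: essentially a couple of pointer chases,
    # heavily optimized by modern platforms.
    return _tls.errstate

_task_namespaces = {}   # hypothetical: maps each Task to its own namespace

def errstate_task_local(loop):
    # Task-local access: ask the event loop which task is running, then
    # do one or two dictionary lookups to reach the value.
    task = asyncio.Task.current_task(loop=loop)
    return _task_namespaces[task]["errstate"]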

On the other hand, such tight integration between task context and the event loop does potentially allow other features that are beyond the scope of the current proposal. For example, an event loop could note which task namespace was in effect when a task called call_soon, and arrange that the callback when run would have access to the same task namespace. Whether this is useful, or even well-defined in the case of cross-thread calls (what does it mean to have task-local storage accessed from two threads simultaneously?), is left as a puzzle for event loop implementors to ponder -- nothing in this proposal rules out such enhancements. It does seem, though, that such features would be useful primarily for state that already has a tight integration with the event loop -- while we might want a request id to be preserved across call_soon, most people would not expect:

with warnings.catch_warnings():
    loop.call_soon(f)

to result in f being run with warnings disabled, which would be the result if call_soon preserved global context in general.

Backwards compatibility

Because __suspend__ and __resume__ are optional and default to no-ops, all existing context managers continue to work exactly as before.

Speed-wise, this proposal adds additional overhead when entering a with block (where we must now check for the additional methods; failed attribute lookup in CPython is rather slow, since it involves allocating an AttributeError), and additional overhead at suspension points. Since the position of with blocks and suspension points is known statically, the compiler can straightforwardly optimize away this overhead in all cases except where one actually has a yield inside a with.

Interaction with PEP 492

PEP 492 added new asynchronous context managers, which are like regular context managers but instead of having regular methods __enter__ and __exit__ they have coroutine methods __aenter__ and __aexit__.

There are a few options for how to handle these:

  1. Add __asuspend__ and __aresume__ coroutine methods.

    One potential difficulty here is that this would add a complication to an already complicated part of the bytecode interpreter. Consider code like:

    async def f():
        async with MGR:
            await g()
    
    @types.coroutine
    def g():
         yield 1
    

    In 3.5, f gets desugared to something like:

    @types.coroutine
    def f():
        yield from MGR.__aenter__()
        try:
            yield from g()
        finally:
            yield from MGR.__aexit__()
    

    With the addition of __asuspend__ / __aresume__, the yield from would have to be replaced by something like:

    for SUBVALUE in g():
        yield from MGR.__asuspend__()
        yield SUBVALUE
        yield from MGR.__aresume__()
    

    Notice that we've had to introduce a new temporary SUBVALUE to hold the value yielded from g() while we yield from MGR.__asuspend__(). Where does this temporary go? Currently yield from is a single bytecode that doesn't modify the stack while looping. Also, the above code isn't even complete, because it skips over the issue of how to direct send/throw calls to the right place at the right time...

  2. Add plain __suspend__ and __resume__ methods.

  3. Leave async context managers alone for now until we have more experience with them.

It isn't entirely clear what use cases even exist in which an async context manager would need to set coroutine-local-state (= like thread-local-state, but for a coroutine stack instead of an OS thread), and couldn't do so via coordination with the coroutine runner. So this draft tentatively goes with option (3) and punts on this question until later.

References

[1] https://groups.google.com/forum/#!topic/python-tulip/zix5HQxtElg https://github.com/python/asyncio/issues/165
[2] For example, we would have to decide whether there is a single task-local namespace shared by all users (in which case we need a way for multiple third-party libraries to adjudicate access to this namespace), or else, if there are multiple task-local namespaces, then we need some mechanism for each library to arrange for their task-local namespaces to be created and destroyed at appropriate moments. The preliminary patch linked from the GitHub issue above doesn't seem to provide any mechanism for such lifecycle management.