From a3644e514a5ca619a0451ad7ecce5ee8cb76d928 Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Sun, 5 Jan 2025 20:40:48 +0000 Subject: [PATCH 1/4] gh-119786: added InternalDocs/generators.md --- InternalDocs/README.md | 2 +- InternalDocs/generators.md | 59 ++++++++++++++++++++++++++++++++++---- 2 files changed, 55 insertions(+), 6 deletions(-) diff --git a/InternalDocs/README.md b/InternalDocs/README.md index 794b4f3c6aad42..4502902307cd5c 100644 --- a/InternalDocs/README.md +++ b/InternalDocs/README.md @@ -25,7 +25,7 @@ Runtime Objects - [Code Objects](code_objects.md) -- [Generators (coming soon)](generators.md) +- [Generators](generators.md) - [Frames](frames.md) diff --git a/InternalDocs/generators.md b/InternalDocs/generators.md index afa8b8f4bb8040..82510bea8bc510 100644 --- a/InternalDocs/generators.md +++ b/InternalDocs/generators.md @@ -1,8 +1,57 @@ + +Generators and Coroutines +========================= + Generators -========== +---------- + +The implementation of generators in CPython consists of the builtin object type +`PyGenObject` and bytecode instructions that operate on instances of this type. + +A generator object executes in its own [`frame`](frames.md), like a function. +The difference is that a function returns only once, while a generator +"returns" to the caller every time it emits a new item with a +[`yield` expression](https://docs.python.org/dev/reference/expressions.html#yield-expressions). +This is implemented by the +[`YIELD_VALUE`](https://docs.python.org/dev/library/dis.html#opcode-YIELD_VALUE) +bytecode, which is similar to +[`RETURN_VALUE`](https://docs.python.org/dev/library/dis.html#opcode-RETURN_VALUE) +in the sense that it puts a value on the stack and returns execution to the +calling frame, but it also needs to perform additional work to leave the generator +frame in a state that allows it to be resumed. 
In particular, it updates the frame's +instruction pointer and stores the interpreter's exception state on the generator +object. When the generator it resumed, this exception state is copied back to the +interpreter state. + +The `frame` of a generator is embedded in the generator object struct (see +`_PyGenObject_HEAD` in [`pycore_genobject.h`](../Include/internal/pycore_genobject.h)). +This means that we can get the frame from the generator or the generator +from the frame (see `_PyGen_GetGeneratorFromFrame` in the same file). +Other fields of the generator struct include metadata (such as the name of +the generator function) and runtime state information (such as whether its +frame is executing, suspended, cleared, etc.). + +Chained Generators +------------------ + +A `yield from` expression creates a generator that efficiently yields the +sequence created by another generator. This is implemented with the +[`SEND` instruction](https://docs.python.org/dev/library/dis.html#opcode-SEND), +which pushes the value of its arg to the stack of the generator's frame, sets +the exception state on this frame, and resumes execution of the chained generator. + +Coroutines +---------- -Coming soon. +Coroutines are generators that use the value returned from a `yield` expression, +i.e., the argument that was passed to the `.send()` call that resumed it after +it yielded. This makes it possible for data to flow in both directions: from +the generator to the caller via the argument of the `yield` expression, and +from the caller to the generator via the send argument to the `send()` call. +A `yield from` expression passes the `send` argument to the chained generator, +so this data flow works along the chain (see `gen_send_ex2()` in +[`genobject.c`](../Objects/genobject.c)). - +Recall that a generator's `__next__` function simply calls `self.send(None)`, +so all this works the same in generators and coroutines, but only coroutines +use the value of the argument to `send`. 
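Reviewer illustration (not part of the patch): the two-way data flow described in the Coroutines section above can be observed from Python. This is a minimal sketch; the name `running_total` is hypothetical and only the documented `next()`, `send()` and `close()` behavior is relied on.

```python
def running_total():
    """Hypothetical generator that consumes values passed in via send()."""
    total = 0
    while True:
        # The yield expression emits `total` to the caller and evaluates
        # to whatever the caller passes to send() (None when next() is used).
        received = yield total
        if received is not None:
            total += received


gen = running_total()
first = next(gen)        # runs to the first yield, like gen.send(None)
after_5 = gen.send(5)    # resumes the generator with received == 5
after_next = next(gen)   # received is None, so the total is unchanged
after_7 = gen.send(7)    # resumes the generator with received == 7
gen.close()
```

Data flows out through the operand of `yield` and in through the argument of `send()`, which matches the description in the patch; a plain `next()` call simply delivers `None`.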
From dbee274decbf193d4db13f8c279182eb6cd843b7 Mon Sep 17 00:00:00 2001 From: Irit Katriel <1055913+iritkatriel@users.noreply.github.com> Date: Sun, 5 Jan 2025 21:20:13 +0000 Subject: [PATCH 2/4] Update generators.md Co-authored-by: Tomas R. --- InternalDocs/generators.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/InternalDocs/generators.md b/InternalDocs/generators.md index 82510bea8bc510..ad55186402f1e3 100644 --- a/InternalDocs/generators.md +++ b/InternalDocs/generators.md @@ -20,7 +20,7 @@ in the sense that it puts a value on the stack and returns execution to the calling frame, but it also needs to perform additional work to leave the generator frame in a state that allows it to be resumed. In particular, it updates the frame's instruction pointer and stores the interpreter's exception state on the generator -object. When the generator it resumed, this exception state is copied back to the +object. When the generator is resumed, this exception state is copied back to the interpreter state. The `frame` of a generator is embedded in the generator object struct (see From 628a3a4059dec430b0c3a36cc9a942409089e01b Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Wed, 15 Jan 2025 01:40:40 +0000 Subject: [PATCH 3/4] added stuff as suggested by Mark --- InternalDocs/generators.md | 60 +++++++++++++++++++++++++++++++++----- 1 file changed, 52 insertions(+), 8 deletions(-) diff --git a/InternalDocs/generators.md b/InternalDocs/generators.md index ad55186402f1e3..5a73623a2ecb17 100644 --- a/InternalDocs/generators.md +++ b/InternalDocs/generators.md @@ -5,12 +5,13 @@ Generators and Coroutines Generators ---------- -The implementation of generators in CPython consists of the builtin object type -`PyGenObject` and bytecode instructions that operate on instances of this type. +The implementation of generators in CPython consists of instances of `PyGenObject` +and bytecode instructions that operate on instances of this type. 
-A generator object executes in its own [`frame`](frames.md), like a function. -The difference is that a function returns only once, while a generator -"returns" to the caller every time it emits a new item with a +A generator object is invoked in a [`frame`](frames.md), like a function. +The difference is that a function returns to the calling frame only once, +while a generator "returns" to the caller every time it emits a new item +with a [`yield` expression](https://docs.python.org/dev/reference/expressions.html#yield-expressions). This is implemented by the [`YIELD_VALUE`](https://docs.python.org/dev/library/dis.html#opcode-YIELD_VALUE) @@ -23,14 +24,47 @@ instruction pointer and stores the interpreter's exception state on the generato object. When the generator is resumed, this exception state is copied back to the interpreter state. -The `frame` of a generator is embedded in the generator object struct (see -`_PyGenObject_HEAD` in [`pycore_genobject.h`](../Include/internal/pycore_genobject.h)). +The `frame` of a generator is embedded in the generator object struct as a +[`_PyInterpreterFrame`](frames.md) (see `_PyGenObject_HEAD` in +[`pycore_genobject.h`](../Include/internal/pycore_genobject.h)). This means that we can get the frame from the generator or the generator from the frame (see `_PyGen_GetGeneratorFromFrame` in the same file). Other fields of the generator struct include metadata (such as the name of the generator function) and runtime state information (such as whether its frame is executing, suspended, cleared, etc.). +Generator Object Creation and Destruction +----------------------------------------- + +The bytecode of a generator function begins with a +[`RETURN_GENERATOR`](https://docs.python.org/dev/library/dis.html#opcode-RETURN_GENERATOR) +instruction, which creates a generator object, along with its embedded frame. 
+The generator's frame is initialized as a copy of the frame in which +`RETURN_GENERATOR` is executing, but its `owner` field is overwritten to indicate +that it is owned by a generator. Finally, `RETURN_GENERATOR` pushes the new generator +object to the stack and returns to the caller of the generator function. When the +generator is next resumed by [`gen_send_ex2()`](../Objects/genobject.c), +`_PyEval_EvalFrame()` is called to continue executing the generator function, +in the frame that is embedded in the generator object. + +When a generator object is destructed in [`gen_dealloc`](../Objects/genobject.c), +its embedded `_PyInterpreterFrame` field may need to be preserved, if it is exposed +to Python as part of a [`PyFrameObject`](frames.md#frame-objects). This is detected +in [`_PyFrame_ClearExceptCode`](../Python/frame.c) by the fact that the interpreter +frame's `frame_obj` field is set, and the frame object it points to has refcount +greater than 1. If so, the `take_ownership()` function is called to create a new +copy of the interpreter frame and transfer ownership of it from the generator to +the frame object. + +Iteration +--------- + +The [`FOR_ITER`](https://docs.python.org/dev/library/dis.html#opcode-FOR_ITER) +instruction calls `__next__` on the iterator which is on the top of the stack, +and pushes the result to the stack. It has [`specializations`](adaptive.md) +for a few common iterator types, including `FOR_ITER_GEN`, for iterating over +a generator. + Chained Generators ------------------ @@ -38,7 +72,17 @@ A `yield from` expression creates a generator that efficiently yields the sequence created by another generator. This is implemented with the [`SEND` instruction](https://docs.python.org/dev/library/dis.html#opcode-SEND), which pushes the value of its arg to the stack of the generator's frame, sets -the exception state on this frame, and resumes execution of the chained generator. 
+the exception state on this frame, and resumes execution of the chained generator. +On return from `SEND`, the value at the top of the stack is sent back up +the generator chain with a `YIELD_VALUE`. This sequence of `SEND` followed by +`YIELD_VALUE` is repeated in a loop, until a `StopIteration` exception is +raised to indicate that the generator has no more values to emit. + +The [`CLEANUP_THROW`](https://docs.python.org/dev/library/dis.html#opcode-CLEANUP_THROW) +instruction is used to handle exceptions raised from the send-yield loop. +Exceptions of type `StopIteration` are handled: their `value` field holds the +value to be returned by the generator's `close()` function. Any other +exception is re-raised by `CLEANUP_THROW`. Coroutines ---------- From 9450013b673ff4a68ab779aedd34e1a924e4a0db Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Thu, 16 Jan 2025 13:15:12 +0000 Subject: [PATCH 4/4] apply suggestions from code review --- InternalDocs/generators.md | 33 +++++++++++++++++++-------------- 1 file changed, 19 insertions(+), 14 deletions(-) diff --git a/InternalDocs/generators.md b/InternalDocs/generators.md index 5a73623a2ecb17..608bd215aae65a 100644 --- a/InternalDocs/generators.md +++ b/InternalDocs/generators.md @@ -5,13 +5,15 @@ Generators and Coroutines Generators ---------- -The implementation of generators in CPython consists of instances of `PyGenObject` -and bytecode instructions that operate on instances of this type. - -A generator object is invoked in a [`frame`](frames.md), like a function. -The difference is that a function returns to the calling frame only once, -while a generator "returns" to the caller every time it emits a new item -with a +Generators in CPython are implemented with the struct `PyGenObject`. +They consist of a [`frame`](frames.md) and metadata about the generator's +execution state. + +A generator object resumes execution in its frame when its `send()` +method is called. 
This is analogous to a function executing in its own +frame when it is called, but a function returns to the calling frame only once, +while a generator "returns" execution to the caller's frame every time +it emits a new item with a [`yield` expression](https://docs.python.org/dev/reference/expressions.html#yield-expressions). This is implemented by the [`YIELD_VALUE`](https://docs.python.org/dev/library/dis.html#opcode-YIELD_VALUE) @@ -38,16 +40,17 @@ Generator Object Creation and Destruction The bytecode of a generator function begins with a [`RETURN_GENERATOR`](https://docs.python.org/dev/library/dis.html#opcode-RETURN_GENERATOR) -instruction, which creates a generator object, along with its embedded frame. +instruction, which creates a generator object, including its embedded frame. The generator's frame is initialized as a copy of the frame in which `RETURN_GENERATOR` is executing, but its `owner` field is overwritten to indicate that it is owned by a generator. Finally, `RETURN_GENERATOR` pushes the new generator -object to the stack and returns to the caller of the generator function. When the -generator is next resumed by [`gen_send_ex2()`](../Objects/genobject.c), -`_PyEval_EvalFrame()` is called to continue executing the generator function, -in the frame that is embedded in the generator object. +object to the stack and returns to the caller of the generator function (at +which time its frame is destroyed). When the generator is next resumed by +[`gen_send_ex2()`](../Objects/genobject.c), `_PyEval_EvalFrame()` is called +to continue executing the generator function, in the frame that is embedded in +the generator object. -When a generator object is destructed in [`gen_dealloc`](../Objects/genobject.c), +When a generator object is destroyed in [`gen_dealloc`](../Objects/genobject.c), its embedded `_PyInterpreterFrame` field may need to be preserved, if it is exposed to Python as part of a [`PyFrameObject`](frames.md#frame-objects). 
This is detected in [`_PyFrame_ClearExceptCode`](../Python/frame.c) by the fact that the interpreter @@ -63,7 +66,9 @@ The [`FOR_ITER`](https://docs.python.org/dev/library/dis.html#opcode-FOR_ITER) instruction calls `__next__` on the iterator which is on the top of the stack, and pushes the result to the stack. It has [`specializations`](adaptive.md) for a few common iterator types, including `FOR_ITER_GEN`, for iterating over -a generator. +a generator. `FOR_ITER_GEN` bypasses the call to `__next__`, and instead +directly pushes the generator stack and resumes its execution from the +instruction that follows the last yield. Chained Generators ------------------
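Reviewer illustration (not part of the patch): the behavior that the Chained Generators section documents can be seen at the Python level. This is a minimal sketch with hypothetical names `inner` and `outer`; it shows only the observable semantics of `yield from`, not the `SEND`/`YIELD_VALUE` bytecode itself.

```python
def inner():
    """Hypothetical inner generator: echoes what it receives, then returns."""
    received = yield "ready"
    received = yield f"got {received}"
    # The return value travels in the `value` field of StopIteration.
    return f"done after {received}"


def outer():
    """Hypothetical outer generator that delegates with `yield from`."""
    # Values sent to `outer` are forwarded to `inner`; when `inner` finishes,
    # its return value becomes the value of the `yield from` expression.
    result = yield from inner()
    yield f"inner returned: {result}"


chain = outer()
trace = [next(chain)]            # inner runs to its first yield
trace.append(chain.send("a"))    # forwarded through outer to inner
trace.append(chain.send("b"))    # inner returns, outer resumes and yields
```

Each `send()` on `outer` is passed down to `inner`, and each value `inner` yields is passed back up, which is the loop the patch describes; the final `StopIteration` from `inner` ends that loop and supplies `result`.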