Skip to content

Conversation

@mpage
Copy link
Contributor

@mpage mpage commented Apr 28, 2023

When profiling an async Python application it's useful to see both the
stack for the currently executing task as well as the chain of coroutines
that are transitively awaiting the task. Consider the following example,
where T represents a task, C represents a coroutine, and A '->' B
indicates A is awaiting B.

T0    +---> T1
|     |     |
C0    |     C2
|     |     |
v     |     v
C1    |     C3
|     |
+-----|

The async stack from C3 would be C3, C2, C1, C0. In contrast, the
synchronous call stack while C3 is executing is only C3, C2. It's
possible to reconstruct this view in most cases using what is
currently available in CPython, however it's difficult to do so
efficiently, and would be very challenging to do so, let alone
efficiently, in an out of process profiler that leverages eBPF.

This introduces a new field onto coroutines and async generators
that makes it easy to efficiently reconstruct the async call stack.
The field stores an owned reference, set by the interpreter, to
the coroutine or async generator that is awaiting the field's owner.
To reconstruct the chain of coroutines/async generators one only
needs to walk the new field backwards.

Intermediate awaitables (e.g. Task, _GatheringFuture) complicate
maintaining a complete chain of awaiters. A special method, __set_awaiter__
is introduced to simplify the process. Types can provide an implementation
of this method to forward the awaiter on child objects.

mpage added 4 commits April 28, 2023 10:35
When profiling an async Python application it's useful to see both the
stack for the currently executing task as well as the chain of coroutines
that are transitively awaiting the task. Consider the following example,
where T represents a task, C represents a coroutine, and A '->' B
indicates A is awaiting B.

    T0    +---> T1
    |     |     |
    C0    |     C2
    |     |     |
    v     |     v
    C1    |     C3
    |     |
    +-----|

The async stack from C3 would be C3, C2, C1, C0. In contrast, the
synchronous call stack while C3 is executing is only C3, C2. It's
possible to reconstruct this view in most cases using what is
currently available in CPython, however it's difficult to do so
efficiently, and would be very challenging to do so, let alone
efficiently, in an out of process profiler that leverages eBPF.

This introduces a new field onto coroutines and async generators
that makes it easy to efficiently reconstruct the async call stack.
The field stores an owned reference, set by the interpreter, to
the coroutine or async generator that is awaiting the field's owner.
To reconstruct the chain of coroutines/async generators one only
needs to walk the new field backwards.

Intermediate awaitables (e.g. `Task`, `_GatheringFuture`) complicate
maintaining a complete chain of awaiters. A special method, `__set_awaiter__`
is introduced to simplify the process. Types can provide an implementation
of this method to forward the awaiter on child objects.
@mpage mpage marked this pull request as ready for review April 28, 2023 20:33
Copy link
Member

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Do you have pyperformance benchmark results?

@mpage
Copy link
Contributor Author

mpage commented May 1, 2023

Nice! Do you have pyperformance benchmark results?

Nope, will run them.

@arhadthedev
Copy link
Member

This branch has conflicts that must be resolved
[...]
Python/generated_cases.c.h

I tried to regenerate Python/generated_cases.c.h by porting the make rule:

cpython/Makefile.pre.in

Lines 1490 to 1503 in a679c3d

regen-cases:
# Regenerate Python/generated_cases.c.h
# and Python/opcode_metadata.h
# from Python/bytecodes.c
# using Tools/cases_generator/generate_cases.py
PYTHONPATH=$(srcdir)/Tools/cases_generator \
$(PYTHON_FOR_REGEN) \
$(srcdir)/Tools/cases_generator/generate_cases.py \
--emit-line-directives \
-o $(srcdir)/Python/generated_cases.c.h.new \
-m $(srcdir)/Python/opcode_metadata.h.new \
$(srcdir)/Python/bytecodes.c
$(UPDATE_FILE) $(srcdir)/Python/generated_cases.c.h $(srcdir)/Python/generated_cases.c.h.new
$(UPDATE_FILE) $(srcdir)/Python/opcode_metadata.h $(srcdir)/Python/opcode_metadata.h.new

to the Windows cmd. However, I get an exception:

D:\Oleg\cpython>set PYTHONPATH=D:\Oleg\cpython

D:\Oleg\cpython>python Tools/cases_generator/generate_cases.py --emit-line-directives Python/generated_cases.c.h.new Python/opcode_metadata.h.new
Running Release|x64 interpreter...
Traceback (most recent call last):
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 1259, in <module>
    main()
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 1250, in main
    a.parse()  # Raises SyntaxError on failure
    ^^^^^^^^^
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 557, in parse
    self.parse_file(filename, instrs_idx)
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 568, in parse_file
    with open(filename) as file:
         ^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'Python/generated_cases.c.h.new'

D:\Oleg\cpython>python Tools/cases_generator/generate_cases.py --emit-line-directives Python/generated_cases.c.h Python/opcode_metadata.h
Running Release|x64 interpreter...
Traceback (most recent call last):
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 1259, in <module>
    main()
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 1250, in main
    a.parse()  # Raises SyntaxError on failure
    ^^^^^^^^^
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 557, in parse
    self.parse_file(filename, instrs_idx)
  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 583, in parse_file
    raise psr.make_syntax_error(
  File "Python/generated_cases.c.h", line 4658
    }
    ^
SyntaxError: Couldn't find '// BEGIN BYTECODES //' in Python/generated_cases.c.h

D:\Oleg\cpython>python
Running Release|x64 interpreter...
Python 3.12.0a5+ (tags/v3.12.0a5-186-gf300a1fa4c:f300a1fa4c, Apr 30 2023, 22:46:31) [MSC v.1929 64 bit (AMD64)] on win32

How should I call it correctly?

@carljm
Copy link
Member

carljm commented May 1, 2023

@arhadthedev it would be better to make a new separate issue for difficulties regenerating cases on Windows, rather than conflating it into this PR, which isn't really related.

@gvanrossum
Copy link
Member

FYI you should be able to run the script without any parameters and it will use the right files.

Also I thought it is run automatically by the Windows build.bat script.

@mpage
Copy link
Contributor Author

mpage commented May 1, 2023

PyPerformance results: https://gist.github.com/mpage/b8f712396755c3fee277f84352058608

Looks neutral except for unpack sequence (1.23x faster) and unpickle (1.30x slower). I'm surprised that this change would affect either of these benchmarks. Are they considered to be reliable?

@carljm
Copy link
Member

carljm commented May 1, 2023

Just chatted with @mpage and there are a few things we could do to streamline this more:

  1. The FRAME_OWNED_BY_GENERATOR pre-check shouldn't be necessary for correctness, we can probably just eliminate it?
  2. Since we are doing a runtime check here based on characteristics of the code object (is it an async def) it would make more sense to push that into the compiler. We could extract this into a dedicated SET_AWAITER opcode that the compiler emits only in async defs; then no runtime check is needed.
  3. Alternatively if we don't want it in a separate opcode, we could at least specialize away the check by having two versions of SEND_GEN, one where the caller is known to be an async def and one where it isn't.
  4. We could decide to have this feature enabled only by a runtime flag (since it's for improved debugging/profiling). In that case we probably wouldn't want it handled in the compiler, since then we'd need different sets of pycs for flag on/off.

My inclination is for #2 (do it in the compiler, new opcode.) Would welcome comments from other reviewers (e.g. @markshannon, @gvanrossum, @willingc).

@markshannon
Copy link
Member

This adds considerable overhead to the fast path of generators, generator expressions and creating coroutines.
It would be more efficient to link the tasks.
So instead of

T0    +---> T1
|     |     |
C0    |     C2
|     |     |
v     |     v
C1    |     C3
|     |
+-----|

We would have

T0  ---> T1
|        |
C0       C2
|        |
v        v
C1       C3

The original could then be reconstructed as needed.

@njsmith
Copy link
Contributor

njsmith commented May 3, 2023

I guess this is a similar question to Mark's... this seems like something that you could do much more simply directly in asyncio, without touching the core interpreter? When a task does await some_task, then it invokes asyncio.Task.__await__, which is arbitrary code that asyncio controls -- wouldn't it be simpler for that code to make a note about the relationship between the tasks? In some cases it might even let you chain together relationships/"causality" through regular Futures, not just tasks.

Also, I might be missing something, but I think the code here gives the wrong answer in cases where a task has multiple waiters, which is allowed by asyncio semantics:

async def wait_on(task):
    await task

async def main():
    sleep_task = asyncio.ensure_future(asyncio.sleep(10))
    task1 = asyncio.ensure_future(wait_on(sleep_task))
    task2 = asyncio.ensure_future(wait_on(sleep_task))

I think in this case the link from task1 -> sleep_task would get overwritten and lost?

@mpage
Copy link
Contributor Author

mpage commented May 8, 2023

This adds considerable overhead to the fast path of generators, generator expressions and creating coroutines. It would be more efficient to link the tasks. So instead of

The original could then be reconstructed as needed.

It's worth mentioning that the choice to maintain the async stacks on coroutines was to make it as simple as possible to reconstruct the stack from an eBPF probe. I think that linking tasks should be doable, it'll mean that we have to recreate the logic of _PyGen_yf in a probe, but that isn't terribly complicated.

Where were you thinking that we would update the links between tasks?

@mpage
Copy link
Contributor Author

mpage commented May 8, 2023

I guess this is a similar question to Mark's... this seems like something that you could do much more simply directly in asyncio, without touching the core interpreter? When a task does await some_task, then it invokes asyncio.Task.__await__, which is arbitrary code that asyncio controls -- wouldn't it be simpler for that code to make a note about the relationship between the tasks? In some cases it might even let you chain together relationships/"causality" through regular Futures, not just tasks.

It probably doesn't matter in practice, but since __await__ is an ordinary method there's nothing that guarantees that whatever calls __await__ will await the awaitable that is returned (wow that was a mouthful!).

Also, I might be missing something, but I think the code here gives the wrong answer in cases where a task has multiple waiters, which is allowed by asyncio semantics:

async def wait_on(task):
    await task

async def main():
    sleep_task = asyncio.ensure_future(asyncio.sleep(10))
    task1 = asyncio.ensure_future(wait_on(sleep_task))
    task2 = asyncio.ensure_future(wait_on(sleep_task))

I think in this case the link from task1 -> sleep_task would get overwritten and lost?

Yep, this is a limitation of the implementation, the last awaiter wins. This has worked out OK for us so far.

@gvanrossum
Copy link
Member

I'm not going to be able to give this enough attention any time soon. Would it be okay to consider this for 3.13 instead of 3.12? 3.12 feature freeze (beta 1) was supposed to be today, and while the RM has postponed that by two weeks, he also gave guidance that this is not meant as an encouragement for new features (the SC ruled on a few pending PEPs, allowing some, postponing others to 3.13).

@carljm
Copy link
Member

carljm commented May 9, 2023

I was already assuming this was under discussion for 3.13, not a candidate for 3.12.

@jbower-fb
Copy link
Contributor

I'm having another stab at this taking into account what was discussed here. For more details see my comment here: #91048 (comment).

@ambv
Copy link
Contributor

ambv commented Jan 22, 2025

Thanks for this, we went with gh-124640 instead.

@ambv ambv closed this Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants