non-awaitable object makes asyncio.wait hang if there are async subprocess tasks #105288

@keuin

Description

asyncio.wait accepts an Iterable[Awaitable[_T]] as its first parameter. However, it hangs forever if we pass it an object that is not awaitable while subprocesses wrapped in an asyncio.Task are running.

To sum up, the hang requires both of these conditions at once:

  1. There are tasks for subprocesses managed by asyncio.subprocess. It does not matter whether we pass them to asyncio.wait or not.
  2. The program calls asyncio.wait on an iterable that yields non-awaitable objects (such as integers).

Here is a litmus program that reproduces this problem:

import asyncio
from asyncio import subprocess


async def whoami():
    print('Finding out who am I...')
    proc = await subprocess.create_subprocess_exec(
        'whoami',
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    stdout, _ = await proc.communicate()
    print(f'I am {stdout}')


async def main():
    t1 = asyncio.create_task(asyncio.sleep(0))
    t2 = asyncio.create_task(whoami())
    # Both `[0, t1]` and `[0, t2]` can cause the problem
    await asyncio.wait([0, t2])


if __name__ == '__main__':
    asyncio.run(main())

On Linux, this program is expected to terminate immediately by raising an exception, because we are attempting to wait for 0, which is not an Awaitable[_T]. However, the program only prints:

Finding out who am I...

and hangs forever. Hitting Ctrl-C does terminate it, but the stack trace points to somewhere far away from the real cause:

Traceback (most recent call last):
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/tmp/cpython/bisect/cpython/litmus.py", line 19, in main
    await asyncio.wait([0, t2])
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/tasks.py", line 418, in wait
    return await _wait(fs, timeout, return_when, loop)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/tasks.py", line 522, in _wait
    f.add_done_callback(_on_completion)
    ^^^^^^^^^^^^^^^^^^^
AttributeError: 'int' object has no attribute 'add_done_callback'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/cpython/bisect/cpython/litmus.py", line 23, in <module>
    asyncio.run(main())
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/runners.py", line 189, in run
    with Runner(debug=debug) as runner:
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/runners.py", line 63, in __exit__
    self.close()
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/runners.py", line 71, in close
    _cancel_all_tasks(loop)
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/runners.py", line 201, in _cancel_all_tasks
    loop.run_until_complete(tasks.gather(*to_cancel, return_exceptions=True))
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/base_events.py", line 640, in run_until_complete
    self.run_forever()
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/base_events.py", line 607, in run_forever
    self._run_once()
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/base_events.py", line 1884, in _run_once
    event_list = self._selector.select(timeout)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/cpython/bisect/cpython/Lib/selectors.py", line 468, in select
    fd_event_list = self._selector.poll(timeout, max_ev)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
Task was destroyed but it is pending!
task: <Task cancelling name='Task-3' coro=<whoami() running at /tmp/cpython/bisect/cpython/litmus.py:7> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[gather.<locals>._done_callback() at /tmp/cpython/bisect/cpython/Lib/asyncio/tasks.py:754]>

Process finished with exit code 130 (interrupted by signal 2: SIGINT)

It does not matter whether we wait for [0, t1] or [0, t2]. As long as t2 (the subprocess task) has been created, the program hangs when awaiting that list.
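As a caller-side workaround, the invalid input can be rejected before it ever reaches asyncio.wait. The following is only a minimal sketch: wait_checked is a hypothetical helper name, not part of asyncio, and it assumes every item should already be a Future or Task (it rejects bare coroutines too).

```python
import asyncio


async def wait_checked(aws, **kwargs):
    """Hypothetical helper: validate the inputs, then delegate to asyncio.wait."""
    items = list(aws)
    for item in items:
        # asyncio.isfuture() is true for Future and Task objects only.
        if not asyncio.isfuture(item):
            raise TypeError(
                f"expected a Future or Task, got {type(item).__name__}")
    return await asyncio.wait(items, **kwargs)


async def demo():
    task = asyncio.create_task(asyncio.sleep(0))
    try:
        await wait_checked([0, task])
    except TypeError as exc:
        message = str(exc)
    await task  # let the valid task finish so nothing is left pending
    return message
```

With this check, the bad element fails fast with a TypeError in the caller's frame, instead of failing inside asyncio.wait after callbacks may already have been attached to other items.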

If we run it on earlier versions of Python (such as the latest 3.10 branch at 2c9b0f30), an exception is raised from inside the standard library and the program exits. The exact behavior seems to be undefined, though: while bisecting with git, I observed exceptions such as AttributeError being raised from several different locations. The correct output looks like:

/tmp/cpython/bisect/cpython/python /tmp/cpython/bisect/cpython/litmus.py 
Finding out who am I...
Traceback (most recent call last):
  File "/tmp/cpython/bisect/cpython/litmus.py", line 23, in <module>
    asyncio.run(main())
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/base_events.py", line 664, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/tmp/cpython/bisect/cpython/litmus.py", line 19, in main
    await asyncio.wait([0, t2])
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/tasks.py", line 426, in wait
    return await _wait(fs, timeout, return_when, loop)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/cpython/bisect/cpython/Lib/asyncio/tasks.py", line 530, in _wait
    f.add_done_callback(_on_completion)
    ^^^^^^^^^^^^^^^^^^^
AttributeError: 'int' object has no attribute 'add_done_callback'

Process finished with exit code 1

My test environment is a 12th-gen Intel Core machine running the latest Arch Linux with the zen kernel (6.3.5-zen1-1-zen). I reran the test on an up-to-date Debian 11 VM and got the same result, so I don't think it is related to the distro or kernel version.

Bisecting shows this issue was introduced in commit 7015e137: gh-88050: Fix asyncio subprocess to kill process cleanly when process is blocked (#32073), which attempted to fix a previous asyncio issue. After investigating the source code, I believe the commit has two major problems:

  1. It doesn't validate the object before accessing its pipe attribute. The _try_finish method already validates the same objects, so this access should be guarded in the same way. The missing check raises an unrelated exception and interrupts the clean-up in _process_exited. However, this piece of code was removed in a subsequent commit, so it is no longer relevant today.
  2. It doesn't wake up the waiters in self._exit_waiters after the subprocess has exited. This directly causes the program to hang forever.
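To illustrate the second problem, here is a toy model of the waiter pattern. It is not the CPython implementation or the proposed fix: ToyProcessTransport, process_exited, and the wake_waiters flag are hypothetical names used only for this sketch, and only the _exit_waiters attribute name is borrowed from the description above.

```python
import asyncio


class ToyProcessTransport:
    """Toy stand-in for a subprocess transport; not the CPython class."""

    def __init__(self, loop):
        self._loop = loop
        self._returncode = None
        self._exit_waiters = []

    async def wait(self):
        # Return at once if the exit code is already known.
        if self._returncode is not None:
            return self._returncode
        waiter = self._loop.create_future()
        self._exit_waiters.append(waiter)
        return await waiter

    def process_exited(self, returncode, wake_waiters=True):
        self._returncode = returncode
        if not wake_waiters:
            return  # models the bug: registered waiters stay pending
        for waiter in self._exit_waiters:
            if not waiter.done():
                waiter.set_result(returncode)
        self._exit_waiters.clear()


async def demo(wake_waiters):
    transport = ToyProcessTransport(asyncio.get_running_loop())
    waiting = asyncio.create_task(transport.wait())
    await asyncio.sleep(0)  # let wait() register its future
    transport.process_exited(0, wake_waiters=wake_waiters)
    try:
        return await asyncio.wait_for(waiting, timeout=0.1)
    except asyncio.TimeoutError:
        return "hung"
```

Calling asyncio.run(demo(True)) returns the exit code, while asyncio.run(demo(False)) only returns because of the timeout. Without that timeout, a task awaiting wait() would never finish, which matches the hang described above.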

I will post my fix and detailed explanation in a PR shortly. Please let me know if there are problems.
