Description
Context
This might be a known issue, as it seems to be mentioned in the readme:
Keep in mind, that this may lead to some problems or infinite locks, even if timeouts have been added.
We found a workaround for our project, but I still decided to open this ticket to possibly help resolve this issue or help other users work around it.
Steps to reproduce
Set your token and run the script.
from functools import partial
from threading import Thread

from nebius.sdk import SDK

TOKEN = ...
sdk = SDK(credentials=TOKEN)


def test(i):
    # Blocking SDK call issued from a worker thread.
    sdk.whoami(timeout=5).wait()
    print(f"Thread {i} done")


# Start ten threads that share the same SDK instance, then wait for all of them.
threads = [Thread(target=partial(test, i)) for i in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

Expected behavior
Each of the threads prints a message, the script exits.
Thread 2 done
Thread 1 done
Thread 0 done
Thread 7 done
Thread 6 done
Thread 4 done
Thread 8 done
Thread 9 done
Thread 3 done
Thread 5 done
Actual behavior
Some threads print a message, the script hangs and never exits.
Exception in callback PollerCompletionQueue._handle_events()()
handle: <Handle PollerCompletionQueue._handle_events()()>
Traceback (most recent call last):
File "/usr/lib64/python3.13/asyncio/events.py", line 89, in _run
self._context.run(self._callback, *self._args)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/completion_queue.pyx.pxi", line 147, in grpc._cython.cygrpc.PollerCompletionQueue._handle_events
BlockingIOError: [Errno 11] Resource temporarily unavailable
Thread 4 done
Exception in callback PollerCompletionQueue._handle_events()()
handle: <Handle PollerCompletionQueue._handle_events()()>
Traceback (most recent call last):
File "/usr/lib64/python3.13/asyncio/events.py", line 89, in _run
self._context.run(self._callback, *self._args)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/completion_queue.pyx.pxi", line 147, in grpc._cython.cygrpc.PollerCompletionQueue._handle_events
BlockingIOError: [Errno 11] Resource temporarily unavailable
Thread 6 done
Exception in callback PollerCompletionQueue._handle_events()()
handle: <Handle PollerCompletionQueue._handle_events()()>
Traceback (most recent call last):
File "/usr/lib64/python3.13/asyncio/events.py", line 89, in _run
self._context.run(self._callback, *self._args)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/completion_queue.pyx.pxi", line 147, in grpc._cython.cygrpc.PollerCompletionQueue._handle_events
BlockingIOError: [Errno 11] Resource temporarily unavailable
Thread 3 done
Exception in callback PollerCompletionQueue._handle_events()()
handle: <Handle PollerCompletionQueue._handle_events()()>
Traceback (most recent call last):
File "/usr/lib64/python3.13/asyncio/events.py", line 89, in _run
self._context.run(self._callback, *self._args)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/completion_queue.pyx.pxi", line 147, in grpc._cython.cygrpc.PollerCompletionQueue._handle_events
BlockingIOError: [Errno 11] Resource temporarily unavailable
Thread 1 done
Thread 7 done
Exception in callback PollerCompletionQueue._handle_events()()
handle: <Handle PollerCompletionQueue._handle_events()()>
Traceback (most recent call last):
File "/usr/lib64/python3.13/asyncio/events.py", line 89, in _run
self._context.run(self._callback, *self._args)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "src/python/grpcio/grpc/_cython/_cygrpc/aio/completion_queue.pyx.pxi", line 147, in grpc._cython.cygrpc.PollerCompletionQueue._handle_events
BlockingIOError: [Errno 11] Resource temporarily unavailable
Thread 9 done
Thread 8 done
Thread 0 done

You may need to run the script a few times to reproduce.
The error messages are likely caused by grpc/grpc#25364 and not related to deadlocks. We've seen deadlocks without these error messages too.
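Since the hang is intermittent, rerunning the script under a timeout can help to catch it. The following is only a sketch of such a harness: the helper name `first_hanging_attempt` and its default values are assumptions made for illustration and are not part of the SDK or of the original report.

```python
import subprocess
import sys


def first_hanging_attempt(cmd, attempts=20, timeout=30.0):
    """Run cmd up to `attempts` times.

    Returns the 1-based number of the first attempt that did not finish
    within `timeout` seconds, or None if every attempt finished in time.
    """
    for attempt in range(1, attempts + 1):
        try:
            # On timeout, subprocess.run kills the child process and
            # raises subprocess.TimeoutExpired.
            subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            return attempt
    return None
```

For example, if the reproduction script were saved as `repro.py` (a hypothetical file name), `first_hanging_attempt([sys.executable, "repro.py"])` would report which attempt hung, if any. The timeout should be comfortably longer than a normal run so that a slow but successful run is not counted as a hang.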
Workaround
Rewriting a big chunk of our project on the asynchronous stack is not feasible for us. Instead, we solved the problem by running an event loop dedicated to the SDK in a separate thread and submitting async SDK calls to that loop from our sync code running in other threads.
import asyncio
from functools import partial
from threading import Thread

from nebius.sdk import SDK

TOKEN = ...
sdk = SDK(credentials=TOKEN)

# One event loop dedicated to the SDK, running forever in a daemon thread.
loop = asyncio.new_event_loop()
Thread(target=loop.run_forever, daemon=True).start()


async def coroutine(awaitable):
    # run_coroutine_threadsafe() expects a coroutine, so wrap the awaitable.
    return await awaitable


def test(i):
    # Submit the async call to the dedicated loop and block until it finishes.
    asyncio.run_coroutine_threadsafe(coroutine(sdk.whoami(timeout=5)), loop).result()
    print(f"Thread {i} done")


threads = [Thread(target=partial(test, i)) for i in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

I'm not sure what the root cause of the deadlocks is, but maybe an approach similar to this workaround could be used by the SDK internally to provide a thread-safe synchronous API.
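To make the idea concrete without depending on the SDK, here is a minimal sketch of the same pattern wrapped in a small helper. The names `LoopThread`, `fake_call`, and `demo` are hypothetical, and `fake_call` is only a stand-in for an SDK request. This is not how the SDK is implemented; it is just one way such a bridge could look.

```python
import asyncio
import threading


class LoopThread:
    """Owns one event loop that runs in a dedicated daemon thread."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, awaitable, timeout=None):
        """Block the calling thread until `awaitable` completes on the loop."""

        async def _wrap():
            # run_coroutine_threadsafe() needs a coroutine object.
            return await awaitable

        future = asyncio.run_coroutine_threadsafe(_wrap(), self._loop)
        return future.result(timeout)

    def close(self):
        # Stop the loop from its own thread, then release its resources.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def fake_call(i):
    # Stand-in for an async SDK request (assumption for this sketch).
    await asyncio.sleep(0.01)
    return i * 2


def demo(n=10):
    """Call fake_call from n threads through a single shared loop."""
    bridge = LoopThread()
    results = [None] * n

    def worker(i):
        results[i] = bridge.run(fake_call(i), timeout=5)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    bridge.close()
    return results
```

All asynchronous work happens on one loop in one thread, so the caller threads only ever block on a `concurrent.futures.Future`, which is safe to wait on from any thread.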