PYTHON-5021 - Fix usages of getaddrinfo to be non-blocking #2059
Conversation
Typing failures fixed by #2060.
pymongo/asynchronous/auth.py
Outdated
    hostname, None, 0, 0, socket.IPPROTO_TCP, socket.AI_CANONNAME
)[0]
if not _IS_SYNC:
    loop = asyncio.get_event_loop()
Should this be get_running_loop?
Yes, "Because this function has rather complex behavior (especially when custom event loop policies are in use), using the get_running_loop() function is preferred to get_event_loop() in coroutines and callbacks.".
https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_event_loop
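For illustration, a minimal sketch of the pattern being discussed, using the arguments from the diff above (the function name here is made up):

```python
import asyncio
import socket


async def _resolve_canonical_name(hostname: str) -> str:
    # Inside a coroutine, get_running_loop() is preferred: it raises if no
    # loop is running instead of implicitly creating one.
    loop = asyncio.get_running_loop()
    # loop.getaddrinfo() takes family/type/proto/flags as keyword arguments,
    # unlike the positional form accepted by socket.getaddrinfo().
    results = await loop.getaddrinfo(
        hostname, None, proto=socket.IPPROTO_TCP, flags=socket.AI_CANONNAME
    )
    # Each result is (family, type, proto, canonname, sockaddr).
    return results[0][3]
```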
Good find -- we use asyncio.get_event_loop() elsewhere in the code; I'll open a separate ticket to change those uses as well.
Addressed for the rest of the codebase in #2063.
pymongo/asynchronous/auth.py
Outdated
if not _IS_SYNC:
    loop = asyncio.get_event_loop()
    af, socktype, proto, canonname, sockaddr = (
        await loop.getaddrinfo(
FYI:
Note: "Both getaddrinfo and getnameinfo internally utilize their synchronous versions through the loop's default thread pool executor. When this executor is saturated, these methods may experience delays, which higher-level networking libraries may report as increased timeouts. To mitigate this, consider using a custom executor for other user tasks, or setting a default executor with a larger number of workers."
https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.getaddrinfo
Which means our users will eventually run into this issue: python/cpython#112169
This is still better than blocking the loop, of course, but I wonder if we need to warn about this potential problem or if we should test it explicitly.
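For context, the mitigation the docs point at looks roughly like this at the application level (purely illustrative; the worker count is arbitrary and nothing here comes from this PR):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def main() -> None:
    loop = asyncio.get_running_loop()
    # Give the loop's default executor more headroom so DNS lookups
    # (and other to-thread work) are less likely to queue behind
    # long-running user tasks.
    loop.set_default_executor(ThreadPoolExecutor(max_workers=64))
    ...


asyncio.run(main())
```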
Would explicitly testing it provide an actionable solution? We could increase the default number of workers to help mitigate this, but warning users might only confuse them since this is an internal API.
Should we instead use run_in_executor and have our own executor? We use run_in_executor in _configured_socket as well.
Our current uses of run_in_executor also utilize the default thread pool executor. We could configure the executor to have a higher default number of workers, but we'd still hit the same issue depending on the system's resource limits.
My point is that users' code will default to the default executor, so we'd be contending with its resources. We'd essentially be taking Guido's advice and applying it to a library, so it doesn't interfere with a user's default setup.
Oh, you're saying we should use our own ThreadPoolExecutor instance to avoid competing with the default executor? Does that make a difference when the underlying OS threads are still shared between the executors?
It does if the OS thread limit is much higher than the default thread executor's thread limit.
Good point, I like that idea then.
I would also suggest making a utility function for getaddrinfo to avoid repeating this ugly block everywhere. ;)
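A rough sketch of how such a helper could combine the two suggestions (one wrapper plus a driver-owned executor); the executor and helper names here are hypothetical, and the PR's actual helper is shown just below:

```python
import asyncio
import socket
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Hypothetical driver-owned pool, separate from the loop's default executor.
_RESOLVER_EXECUTOR = ThreadPoolExecutor(thread_name_prefix="pymongo-resolver-")


async def _async_getaddrinfo(host, port, **kwargs):
    loop = asyncio.get_running_loop()
    # run_in_executor only forwards positional args, so bind kwargs with partial.
    return await loop.run_in_executor(
        _RESOLVER_EXECUTOR, partial(socket.getaddrinfo, host, port, **kwargs)
    )
```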
pymongo/asynchronous/helpers.py
Outdated
async def getaddrinfo(host, port, **kwargs):
    if not _IS_SYNC:
        loop = asyncio.get_running_loop()
        return await loop.getaddrinfo(  # type: ignore[assignment]
Shouldn't we be using run_in_executor here instead as well?
Good catch sorry, juggling too many changes at once 😅
pymongo/_asyncio_executor.py
Outdated
from concurrent.futures import ThreadPoolExecutor

_PYMONGO_EXECUTOR = ThreadPoolExecutor(thread_name_prefix="PYMONGO_EXECUTOR-")
I'm not sure I like this approach because we now have a thread pool that hangs around forever, even after all clients have been closed.
My other comment was more about adding guidance for potential errors, not about changing our implementation. Something like: "If your app runs into XXX error, it may mean your app's default loop executor is under-provisioned. Consider increasing the size of this thread pool or ..."
It could be difficult to distinguish when this issue occurs; having a separate thread pool for our internal use will help mitigate how common it is. An extra thread pool instance shouldn't be expensive to keep a reference to for the lifetime of the application.
Personally, I prefer we go with the loop.getaddrinfo approach because it avoids adding the complexity of managing our own thread pool. It's not really kosher to leave a thread pool open even if the threads are "idle". The limitation in loop.getaddrinfo is also an implementation detail that could be fixed at any point (even in a Python bugfix release).
I expect it will be clear when this issue occurs, because a timeout error caused by thread pool starvation looks different from a real DNS timeout error. It should be simple to add an example to our docs by:
- saturating the executor with long-running tasks
- then attempting to run a client command
- recording the error (see the sketch after this list)
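Something along these lines could serve as that docs example; this is only a sketch, the worker/task counts are arbitrary, and a bare loop.getaddrinfo() stands in for the client command:

```python
import asyncio
import socket
import time


async def main() -> None:
    loop = asyncio.get_running_loop()
    # Saturate the default executor (min(32, os.cpu_count() + 4) workers by
    # default) with long-running tasks.
    blockers = [loop.run_in_executor(None, time.sleep, 10) for _ in range(64)]
    # Any DNS lookup now queues behind the blockers instead of running immediately.
    start = time.monotonic()
    await loop.getaddrinfo("example.com", 443, proto=socket.IPPROTO_TCP)
    print(f"getaddrinfo took {time.monotonic() - start:.1f}s")
    await asyncio.gather(*blockers)


asyncio.run(main())
```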
I don't see much complexity in managing our own thread pool, but I totally understand the desire not to have an extra pool lying around. I'll revert to using loop.getaddrinfo() once I have a good example for our docs.
After investigating, I believe the docs are slightly misleading: what actually happens when the executor pool is fully saturated is that any loop.getaddrinfo() call blocks until a thread is freed up for use. There's no timeout mechanism inherent to the executor pool. We could add our own timeout to every loop.run_in_executor() call to prevent users from accidentally blocking the driver forever if they saturate the default executor permanently, but then we would cause timeouts to occur whenever the response is too slow.
If we don't add any timeouts to those calls, users will experience slowdowns whenever they perform a driver operation while the default executor pool is fully saturated. That's preferable to spurious timeouts, in my opinion, especially when the user's own code determines the frequency of the timeouts.
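For completeness, the timeout option described above (and argued against) would look something like this; the helper name and the default timeout value are made up:

```python
import asyncio


async def _getaddrinfo_with_timeout(host, port, timeout=20.0, **kwargs):
    loop = asyncio.get_running_loop()
    # Bounds the wait for a free executor thread, at the cost of spurious
    # timeouts whenever the default executor stays saturated for too long.
    return await asyncio.wait_for(loop.getaddrinfo(host, port, **kwargs), timeout)
```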
Thanks for investigating; I agree with that. The cpython ticket references anyio, so that could explain the difference.
pymongo/synchronous/helpers.py
Outdated
@@ -68,6 +70,24 @@ def inner(*args: Any, **kwargs: Any) -> Any:
    return cast(F, inner)


def getaddrinfo(
nit: let's rename this to _getaddrinfo.
LGTM!
No description provided.