PYTHON-5044 - Fix successive AsyncMongoClients on a single loop always ti… #2065

Merged · 6 commits · Jan 22, 2025
Changes from 4 commits
2 changes: 2 additions & 0 deletions pymongo/asynchronous/mongo_client.py
@@ -1559,6 +1559,8 @@ async def close(self) -> None:
         # Stop the periodic task thread and then send pending killCursor
         # requests before closing the topology.
         self._kill_cursors_executor.close()
+        if not _IS_SYNC:
+            await self._kill_cursors_executor.join()
         await self._process_kill_cursors()
         await self._topology.close()
         if self._encrypter:
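For readers skimming the diff: `close()` above only signals the periodic executor's task to stop, and the new `join()` call (guarded by `_IS_SYNC` because the sync build runs the executor on a thread rather than an asyncio task) waits for the task to actually exit before the pending killCursors are flushed. A toy sketch of that stop-then-join contract, with illustrative names only:

```python
import asyncio

class PeriodicExecutorLike:
    """Toy stop-then-join contract; not PyMongo's actual class."""

    def __init__(self, interval: float = 0.01) -> None:
        self._stopped = False
        self._interval = interval
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        while not self._stopped:
            await asyncio.sleep(self._interval)

    def close(self) -> None:
        # Signal only: the task notices the flag on its next iteration.
        self._stopped = True

    async def join(self, timeout: float | None = None) -> None:
        # Actually wait for the task to finish so nothing lingers on the loop.
        await asyncio.wait([self._task], timeout=timeout)
```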
4 changes: 4 additions & 0 deletions pymongo/asynchronous/monitor.py
@@ -191,6 +191,8 @@ def gc_safe_close(self) -> None:

     async def close(self) -> None:
         self.gc_safe_close()
+        if not _IS_SYNC:
+            await self._executor.join()
ShaneHarvey (Member), Jan 21, 2025:
I'm not sure we can call join here because there are cases where close() gets called by the monitor thread/task itself. Joining on yourself will cause the thread/task to hang.
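A minimal runnable sketch of that hazard (illustrative names, not the Monitor's real code): `asyncio.wait()` has no self-await detection, so a task that joins its own handle simply stalls until a timeout intervenes.

```python
import asyncio

async def main():
    task = None

    async def monitor_like():
        # Simulates close() + join() running on the executor's own task.
        done, _pending = await asyncio.wait([task], timeout=0.5)
        print(f"joined self? {bool(done)}")  # False: still pending

    task = asyncio.create_task(monitor_like())
    await task

asyncio.run(main())
```

The threading analogue fails fast (`threading.current_thread().join()` raises RuntimeError), but asyncio just hangs, so a timeout-less join here could wedge the monitor.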

Member:
How about we remove these changes and open a new issue to track improving the cleanup behavior? Then this PR can be focused on just the network_layer changes.

Contributor (author):
I'd rather not invest time in fixing the cleanup behavior for a test suite we're already working on refactoring. If we're fine with the tests throwing some warnings during the conversion to pytest I'd prefer to just let them throw.

ShaneHarvey (Member), Jan 21, 2025:
I don't understand your comment. Don't we eventually need client.close() to await all the background tasks? That's what I think we need a new ticket to track. This isn't really a test suite issue, it's something end users will encounter when closing clients too.

Contributor (author):
I was more referring to the work required to get the correct cleanup behavior functioning in our existing test suite. We already have quite a few workarounds for the async test suite to work within the current structure. I expect fixing this issue will only add onto that burden at the same time we're also refactoring the suite entirely.

Contributor (author):
I agree that the core issue of AsyncMongoClient.close() not awaiting all its background tasks needs to be addressed. I'm worried that in doing so with our current test suite, we'll be doing significant additional work that will be thrown out as part of the pytest refactor.
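For reference, the eventual fix being deferred here usually takes a shape like the following (a sketch under assumed names, not PyMongo's actual design): the client tracks its background tasks so `close()` can cancel and then await them.

```python
import asyncio

class ClientLike:
    """Sketch: track background tasks so close() can await them all."""

    def __init__(self) -> None:
        self._background_tasks: set[asyncio.Task] = set()

    def _spawn(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)
        return task

    async def close(self) -> None:
        # Must not be called from one of the tracked tasks (see the
        # self-join hazard discussed above).
        for task in self._background_tasks:
            task.cancel()
        # return_exceptions=True absorbs each task's CancelledError.
        await asyncio.gather(*self._background_tasks, return_exceptions=True)
```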

Member:
Oh I see. Yes, I agree with you on the unittest-specific changes.

Contributor (author):
I'll open a separate ticket for the AsyncMongoClient.close() changes, but we'll need to decide what to do if the unittest suite doesn't work with them.

         await self._rtt_monitor.close()
         # Increment the generation and maybe close the socket. If the executor
         # thread has the socket checked out, it will be closed when checked in.

@@ -458,6 +460,8 @@ def __init__(self, topology: Topology, topology_settings: TopologySettings, pool

     async def close(self) -> None:
         self.gc_safe_close()
+        if not _IS_SYNC:
+            await self._executor.join()
         # Increment the generation and maybe close the socket. If the executor
         # thread has the socket checked out, it will be closed when checked in.
         await self._pool.reset()
31 changes: 19 additions & 12 deletions pymongo/network_layer.py
@@ -267,18 +267,25 @@ async def async_receive_data(
         else:
             read_task = create_task(_async_receive(sock, length, loop))  # type: ignore[arg-type]
         tasks = [read_task, cancellation_task]
-        done, pending = await asyncio.wait(
-            tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
-        )
-        for task in pending:
-            task.cancel()
-        if pending:
-            await asyncio.wait(pending)
-        if len(done) == 0:
-            raise socket.timeout("timed out")
-        if read_task in done:
-            return read_task.result()
-        raise _OperationCancelled("operation cancelled")
+        try:
+            done, pending = await asyncio.wait(
+                tasks, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
+            )
+            for task in pending:
+                task.cancel()
+            if pending:
+                await asyncio.wait(pending)
+            if len(done) == 0:
+                raise socket.timeout("timed out")
+            if read_task in done:
+                return read_task.result()
+            raise _OperationCancelled("operation cancelled")
+        except asyncio.CancelledError:
+            for task in tasks:
+                task.cancel()
+            await asyncio.wait(tasks)
+            raise
+
     finally:
         sock.settimeout(sock_timeout)

11 changes: 3 additions & 8 deletions pymongo/periodic_executor.py
@@ -75,17 +75,12 @@ def close(self, dummy: Any = None) -> None:
         callback; see monitor.py.
         """
         self._stopped = True
+        if self._task is not None:
+            self._task.cancel()
Member:
I'm wondering how this relates to the issue described in DRIVERS-3076. Like, will calling cancel here change the user-visible events a Monitor emits on close()?

Member:
Ping.

Contributor (author):
When a task is cancelled, it should stop executing on the next iteration of the event loop. Since I believe the CancelledError is thrown from the next await call inside the cancelled task, it's possible that the Monitor emits events differently between cancellations.
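A runnable illustration of that delivery point (a hypothetical heartbeat, not the Monitor's real code): the CancelledError surfaces at whichever await the task is parked on, so any event that would have been emitted after that point is skipped.

```python
import asyncio

async def heartbeat():
    try:
        await asyncio.sleep(10)  # cancel() is delivered here
        print("emit success event")  # never reached
    except asyncio.CancelledError:
        print("cancelled mid-await; no failure event emitted")
        raise

async def main():
    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let heartbeat() reach its await
    task.cancel()
    await asyncio.wait([task])

asyncio.run(main())
```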

Member:
I see, if we use this approach then we won't emit the expected ServerHeartbeatFailedEvent on cancellation. Do we need this change in this PR anymore? Can we defer it?

Contributor (author):
Good catch. Now that we aren't awaiting background tasks on close in this PR, this change is unneeded.


     async def join(self, timeout: Optional[int] = None) -> None:
         if self._task is not None:
-            try:
-                await asyncio.wait_for(self._task, timeout=timeout)  # type: ignore[arg-type]
-            except asyncio.TimeoutError:
-                # Task timed out
-                pass
-            except asyncio.exceptions.CancelledError:
-                # Task was already finished, or not yet started.
-                raise
+            await asyncio.wait([self._task], timeout=timeout)  # type: ignore[arg-type]

     def wake(self) -> None:
         """Execute the target function soon."""
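The swap from `wait_for` to `wait` above changes the timeout semantics in a way worth noting: `wait_for` cancels the awaited task and raises on timeout, while `wait` leaves the task running and merely reports it as pending, which fits a best-effort join. A small demonstration:

```python
import asyncio

async def main():
    async def slow():
        await asyncio.sleep(1)
        return "done"

    task = asyncio.create_task(slow())
    _done, pending = await asyncio.wait([task], timeout=0.1)
    # wait() neither raises on timeout nor cancels the task:
    print(task.cancelled(), len(pending))  # False 1
    print(await task)  # an untimed join still finishes normally

asyncio.run(main())
```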
2 changes: 2 additions & 0 deletions pymongo/synchronous/mongo_client.py
@@ -1553,6 +1553,8 @@ def close(self) -> None:
         # Stop the periodic task thread and then send pending killCursor
         # requests before closing the topology.
         self._kill_cursors_executor.close()
+        if not _IS_SYNC:
+            self._kill_cursors_executor.join()
         self._process_kill_cursors()
         self._topology.close()
         if self._encrypter:
4 changes: 4 additions & 0 deletions pymongo/synchronous/monitor.py
@@ -191,6 +191,8 @@ def gc_safe_close(self) -> None:

     def close(self) -> None:
         self.gc_safe_close()
+        if not _IS_SYNC:
+            self._executor.join()
         self._rtt_monitor.close()
         # Increment the generation and maybe close the socket. If the executor
         # thread has the socket checked out, it will be closed when checked in.

@@ -458,6 +460,8 @@ def __init__(self, topology: Topology, topology_settings: TopologySettings, pool

     def close(self) -> None:
         self.gc_safe_close()
+        if not _IS_SYNC:
+            self._executor.join()
         # Increment the generation and maybe close the socket. If the executor
         # thread has the socket checked out, it will be closed when checked in.
         self._pool.reset()
12 changes: 0 additions & 12 deletions test/__init__.py
@@ -864,16 +864,6 @@ def max_message_size_bytes(self):
 client_context = ClientContext()


-def reset_client_context():
-    if _IS_SYNC:
-        # sync tests don't need to reset a client context
-        return
-    elif client_context.client is not None:
-        client_context.client.close()
-        client_context.client = None
-        client_context._init_client()
-
-
 class PyMongoTestCase(unittest.TestCase):
     def assertEqualCommand(self, expected, actual, msg=None):
         self.assertEqual(sanitize_cmd(expected), sanitize_cmd(actual), msg)

@@ -1136,8 +1126,6 @@ class IntegrationTest(PyMongoTestCase):

     @client_context.require_connection
     def setUp(self) -> None:
-        if not _IS_SYNC:
-            reset_client_context()
         if client_context.load_balancer and not getattr(self, "RUN_ON_LOAD_BALANCER", False):
             raise SkipTest("this test does not support load balancers")
         if client_context.serverless and not getattr(self, "RUN_ON_SERVERLESS", False):
12 changes: 0 additions & 12 deletions test/asynchronous/__init__.py
@@ -866,16 +866,6 @@ async def max_message_size_bytes(self):
 async_client_context = AsyncClientContext()


-async def reset_client_context():
-    if _IS_SYNC:
-        # sync tests don't need to reset a client context
-        return
-    elif async_client_context.client is not None:
-        await async_client_context.client.close()
-        async_client_context.client = None
-        await async_client_context._init_client()
-
-
 class AsyncPyMongoTestCase(unittest.IsolatedAsyncioTestCase):
     def assertEqualCommand(self, expected, actual, msg=None):
         self.assertEqual(sanitize_cmd(expected), sanitize_cmd(actual), msg)

@@ -1154,8 +1144,6 @@ class AsyncIntegrationTest(AsyncPyMongoTestCase):

     @async_client_context.require_connection
     async def asyncSetUp(self) -> None:
-        if not _IS_SYNC:
-            await reset_client_context()
         if async_client_context.load_balancer and not getattr(self, "RUN_ON_LOAD_BALANCER", False):
             raise SkipTest("this test does not support load balancers")
         if async_client_context.serverless and not getattr(self, "RUN_ON_SERVERLESS", False):
1 change: 0 additions & 1 deletion test/asynchronous/test_connections_survive_primary_stepdown_spec.py
@@ -22,7 +22,6 @@
 from test.asynchronous import (
     AsyncIntegrationTest,
     async_client_context,
-    reset_client_context,
     unittest,
 )
 from test.asynchronous.helpers import async_repl_set_step_down
5 changes: 3 additions & 2 deletions test/asynchronous/utils_spec_runner.py
@@ -647,10 +647,11 @@ async def setup_scenario(self, scenario_def):
     async def run_scenario(self, scenario_def, test):
         self.maybe_skip_scenario(test)

-        # Kill all sessions before and after each test to prevent an open
+        # Kill all sessions before (sync only) and after each test to prevent an open
         # transaction (from a test failure) from blocking collection/database
         # operations during test set up and tear down.
-        await self.kill_all_sessions()
+        if _IS_SYNC:
+            await self.kill_all_sessions()
         self.addAsyncCleanup(self.kill_all_sessions)
         await self.setup_scenario(scenario_def)
         database_name = self.get_scenario_db_name(scenario_def)
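For context, kill_all_sessions in these runners ultimately relies on the server's killAllSessions command; a hedged sketch of the idea (the helper's real body may differ):

```python
from pymongo import AsyncMongoClient

async def kill_all_sessions(client: AsyncMongoClient) -> None:
    # An empty array asks the server to kill all sessions for all users.
    await client.admin.command("killAllSessions", [])
```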
1 change: 0 additions & 1 deletion test/test_connections_survive_primary_stepdown_spec.py
@@ -22,7 +22,6 @@
from test import (
IntegrationTest,
client_context,
reset_client_context,
unittest,
)
from test.helpers import repl_set_step_down
Expand Down
5 changes: 3 additions & 2 deletions test/utils_spec_runner.py
@@ -647,10 +647,11 @@ def setup_scenario(self, scenario_def):
     def run_scenario(self, scenario_def, test):
         self.maybe_skip_scenario(test)

-        # Kill all sessions before and after each test to prevent an open
+        # Kill all sessions before (sync only) and after each test to prevent an open
         # transaction (from a test failure) from blocking collection/database
         # operations during test set up and tear down.
-        self.kill_all_sessions()
+        if _IS_SYNC:
+            self.kill_all_sessions()
         self.addCleanup(self.kill_all_sessions)
         self.setup_scenario(scenario_def)
         database_name = self.get_scenario_db_name(scenario_def)