INTPYTHON-574 Add support for connection pooling #290

Merged: 1 commit merged on May 12, 2025

timgraham
Collaborator

@timgraham commented Apr 18, 2025

@timgraham force-pushed the cache-mongoclient branch 2 times, most recently from 2037b16 to 8178344 (April 29, 2025 15:15)
@timgraham changed the title from "Improve performance by caching MongoClient" to "INTPYTHON-574 Add support for connection pooling" Apr 29, 2025
        # setdefault() ensures that multiple threads don't set this in
        # parallel.
        self._connection_pools.setdefault(self.alias, conn)
    return self._connection_pools[self.alias]
Member

Is it ever possible for this cache to become out of sync? Does the user ever have raw access to the MongoClient (where they could call client.close())?

Collaborator Author

Yes, the MongoClient is available at django.db.connection.connection, but I don't think it's our job to consider situations where the user does something unexpected like calling close().

@@ -176,7 +177,12 @@ def get_connection_params(self):

     @async_unsafe
     def get_new_connection(self, conn_params):
-        return MongoClient(**conn_params, driver=self._driver_info())
+        if self.alias not in self._connection_pools:
Member

@ShaneHarvey Apr 29, 2025

Does Django (or do Django apps) commonly use fork() or multiprocessing? If so, we should consider clearing this cache in the child process, perhaps using:

import os

if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=clear_client_cache)

See https://docs.python.org/3/library/os.html#os.register_at_fork.
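A minimal sketch of what that suggestion could look like, assuming a hypothetical module-level helper `clear_client_cache` and a stand-in `DatabaseWrapper` class that holds the class-level cache. This is not part of the merged change; the discussion that follows defers it until someone reports a fork-related problem.

```python
import os


class DatabaseWrapper:
    # Hypothetical stand-in: a class-level cache of clients keyed by
    # database alias, like the _connection_pools attribute in this PR.
    _connection_pools = {}


def clear_client_cache():
    # Runs in the child process right after fork(). Dropping the inherited
    # entries makes the child create its own client on next use instead of
    # reusing one whose sockets and locks were copied from the parent.
    DatabaseWrapper._connection_pools.clear()


# os.register_at_fork() is only available on POSIX systems (Python 3.7+).
if hasattr(os, "register_at_fork"):
    os.register_at_fork(after_in_child=clear_client_cache)
```

Only the child's copy of the cache is cleared; the parent process keeps its clients.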

Collaborator Author

It's not something I've had experience with. Google AI overview says:

Django, when deployed using WSGI servers like Gunicorn or uWSGI, operates in a prefork model. This means the server forks multiple worker processes to handle incoming requests concurrently. Each worker process runs independently, allowing Django to manage multiple requests simultaneously. However, directly using os.fork within a Django application is generally discouraged due to potential conflicts with Django's request handling and database connections. For managing background tasks or parallel processing, libraries like multiprocessing or task queues such as Celery are recommended instead of direct forking.

Member

Okay, sounds good to me. We can reevaluate if someone reports it as a problem.

Comment on lines 205 to 208
     def close(self):
-        super().close()
+        # MongoClient is a connection pool and, unlike database drivers that
+        # implement PEP 249, shouldn't be closed by connection.close().
+        pass
Collaborator Author

@timgraham May 1, 2025

Overriding close() to do nothing omits the call to validate_thread_sharing() in the superclass implementation. If it's important, we could add that here. There's currently a test failure without it.

Threading and connection sharing between threads aren't things I have much experience with, so I'm not too confident about any of this. Perhaps you can help educate me if any of it is obvious to you, Shane.
django/django@34e248e
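To make the trade-off concrete: Django's BaseDatabaseWrapper.close() runs validate_thread_sharing() before delegating to the backend-specific _close(), so overriding _close() rather than close() keeps the thread check while still leaving the client open. The sketch uses a hypothetical, heavily simplified stand-in for Django's base class; `SimplifiedBaseWrapper`, `MongoWrapper`, and the local `DatabaseError` are illustrative names, not Django or backend APIs.

```python
import threading


class DatabaseError(Exception):
    """Stand-in for django.db.DatabaseError."""


class SimplifiedBaseWrapper:
    """Hypothetical, heavily simplified stand-in for Django's
    BaseDatabaseWrapper: it keeps only the close() -> _close() flow."""

    def __init__(self, alias):
        self.alias = alias
        self.connection = object()  # placeholder for the driver connection
        self._thread_ident = threading.get_ident()

    def validate_thread_sharing(self):
        # Refuse to use a wrapper from a thread other than its creator.
        if self._thread_ident != threading.get_ident():
            raise DatabaseError(
                f"DatabaseWrapper {self.alias!r} was created in thread "
                f"{self._thread_ident} and cannot be used in thread "
                f"{threading.get_ident()}."
            )

    def close(self):
        # The thread check runs before any backend-specific cleanup.
        self.validate_thread_sharing()
        if self.connection is None:
            return
        try:
            self._close()
        finally:
            self.connection = None

    def _close(self):
        raise NotImplementedError


class MongoWrapper(SimplifiedBaseWrapper):
    def _close(self):
        # Leave the shared MongoClient open; close() above still performs
        # the validate_thread_sharing() check.
        pass
```

Calling close() on a `MongoWrapper` from the thread that created it clears its `connection` reference; calling it from another thread raises `DatabaseError` before `_close()` is reached.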

Member

I would skip those threaded tests since pymongo does not implement PEP 249. We intentionally want to always share the same client among all threads.

Collaborator Author

@timgraham May 7, 2025

I believe the client is already shared among threads as it's stored in the DatabaseWrapper._connection_pools class variable.

The validate_thread_sharing() logic ensures that a DatabaseWrapper instance is only used in the thread that created it. The use case for disabling this check is to allow in-memory SQLite database connections to be shared between multiple threads for Selenium tests (#2879). I'm doubtful that disabling this logic for MongoDB is appropriate, so unless you can say definitively, I'd suggest we leave it for now.
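The sharing described in this reply comes from storing the client on the class rather than on each instance. A minimal sketch, assuming a hypothetical `FakeClient` in place of pymongo.MongoClient, a simplified `get_new_connection()` that takes no parameters, and a thread-local registry loosely modeled on how django.db.connections hands each thread its own wrapper: every thread gets a distinct wrapper (the per-instance rule validate_thread_sharing() enforces), yet all wrappers return the same client object.

```python
import threading


class FakeClient:
    """Hypothetical placeholder for pymongo.MongoClient."""


class DatabaseWrapper:
    # Class attribute: one dict shared by every instance in every thread.
    _connection_pools = {}

    def __init__(self, alias):
        self.alias = alias

    def get_new_connection(self):
        if self.alias not in self._connection_pools:
            conn = FakeClient()
            # If two threads race past the membership check, setdefault()
            # keeps whichever client was stored first.
            self._connection_pools.setdefault(self.alias, conn)
        return self._connection_pools[self.alias]


# One wrapper per thread, loosely like django.db.connections.
_local = threading.local()


def get_wrapper(alias="default"):
    if not hasattr(_local, "wrapper"):
        _local.wrapper = DatabaseWrapper(alias)
    return _local.wrapper
```

Because the cache lives on the class, no extra sharing mechanism is needed for the client itself; only the lightweight wrapper objects are per-thread.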

Member

I don't know the answer to this.

@timgraham force-pushed the cache-mongoclient branch 2 times, most recently from bde56c5 to 98b937c (May 7, 2025 00:31)
@timgraham marked this pull request as ready for review May 8, 2025 12:40
@timgraham requested review from WaVEV and Jibola and removed request for WaVEV May 8, 2025 18:44
Comment on lines 201 to 204
    def _close(self):
        pass

Contributor

What function calls this?

Collaborator Author

Normally DatabaseWrapper.close(), but it's also called by some tests.

@timgraham force-pushed the cache-mongoclient branch from 98b937c to 2cd94a5 (May 12, 2025 16:09)
@timgraham force-pushed the cache-mongoclient branch from 2cd94a5 to a643d98 (May 12, 2025 16:13)
@timgraham merged commit a643d98 into mongodb:main May 12, 2025
15 checks passed
@timgraham deleted the cache-mongoclient branch May 19, 2025 14:00
4 participants