-
Notifications
You must be signed in to change notification settings - Fork 557
fix(django): Db query recording improvements. #4681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4681 +/- ##
==========================================
+ Coverage 85.02% 85.12% +0.10%
==========================================
Files 156 156
Lines 15856 15917 +61
Branches 2684 2699 +15
==========================================
+ Hits 13481 13550 +69
+ Misses 1595 1585 -10
- Partials 780 782 +2
|
I used this example benchmarking repo to see how these improvements do. Here are the results: v1 = current v1 vs v2 (v2 averaged across 3 runs)
Notes:
Bottom line: v2 is consistently faster. Throughput +8–14%, mean latency −8–12%, tail latency improves 3–14%. |
from django.db import connections | ||
|
||
for alias, db_config in settings.DATABASES.items(): | ||
if not db_config: # Skip empty default configs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens with these, caching-wise? Will they always go through the slow fallback?
if cached_config.get("db_name"): | ||
span.set_data(SPANDATA.DB_NAME, cached_config["db_name"]) | ||
if cached_config.get("host"): | ||
span.set_data(SPANDATA.SERVER_ADDRESS, cached_config["host"]) | ||
if cached_config.get("port"): | ||
span.set_data(SPANDATA.SERVER_PORT, str(cached_config["port"])) | ||
if cached_config.get("unix_socket"): | ||
span.set_data(SPANDATA.SERVER_SOCKET_ADDRESS, cached_config["unix_socket"]) | ||
if cached_config.get("vendor") and span._data.get(SPANDATA.DB_SYSTEM) is None: | ||
span.set_data(SPANDATA.DB_SYSTEM, cached_config["vendor"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move these to a separate function that just picks out the fields from a dictionary? Then we can use it here with cached_config
as well as below with connection_params
. Otherwise it's easy to miss that you have to update three places if you ever make a change to this
# Fallback to dynamic database metadata collection. | ||
# This is the edge case where db configuration is not in Django's `DATABASES` setting. | ||
try: | ||
# Fallback 1: Try db.get_connection_params() first (NO CONNECTION ACCESS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we also cache the results of Fallback 1 and 2? Otherwise we keep potentially hitting the database in these cases
return # Success - exit early to avoid connection access | ||
|
||
except (KeyError, ImproperlyConfigured, AttributeError): | ||
# Fallback 2: Last resort - direct connection access (CONNECTION POOL RISK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should definitely be cached IMO
if cached_config.get("vendor") and span._data.get(SPANDATA.DB_SYSTEM) is None: | ||
span.set_data(SPANDATA.DB_SYSTEM, cached_config["vendor"]) | ||
|
||
return # Success - exit early |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is a bit superfluous
return # Success - exit early | |
return |
logger.debug( | ||
"Cached db connection params retrieval failed for %s. Trying db.get_connection_params().", | ||
db_alias, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment re: logging and async code, would remove
if server_socket_address is not None: | ||
span.set_data(SPANDATA.SERVER_SOCKET_ADDRESS, server_socket_address) | ||
except Exception as e: | ||
logger.debug("Failed to get database connection params for %s: %s", db_alias, e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this log we could keep since it's just emitted if there was an error
assert mock.call("server.socket.address", None) not in span.set_data.call_args_list | ||
|
||
|
||
def test_set_db_data_with_cached_config_unix_socket(sentry_init): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does setting unix_socket
need a separate test case? This looks like the previous test case
"""Test _set_db_data falls back to db.get_connection_params() when cache misses.""" | ||
sentry_init(integrations=[DjangoIntegration()]) | ||
|
||
_cached_db_configs.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be good as a fixture after every test case to ensure clean state
@@ -1,12 +1,15 @@ | |||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are mocking A LOT in these tests and they're very unit-testy, any way we can have some full end-to-end tests?
Improve how we record db query connection parameters. This reads the database connection configuration on integration startup from the Django
settings.py
file and caches them in a thread safe manner.When db operations are performed we read the connection data from the cache with a fallback to an improved version to the old logic. We first try to call
db.get_connection_params()
and if this also fails access the connection directly.