feat(upsampling) - Support upsampled error count with performance optimizations #14
base: master
…(#94376)

Part of the Error Upsampling project: https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link

The events-stats API now checks whether all projects in the query are allowlisted for upsampling and, if so, converts the count query into a sum over `sample_weight` in Snuba. This is done by defining a new SnQL function, `upsampled_count()`.

I noticed that `eps()` and `epm()` are also used in this endpoint. I considered (and even started on) swapping those as well, since for correctness they should probably also use `sample_weight` rather than counting naively, but this caused some complications. Because they are only used by specific dashboard widgets and are not available in Discover, I decided to defer changing them until we find that it is needed.
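A minimal sketch of the behavior described above, in plain Python rather than SnQL: a hypothetical `should_upsample` gate requires every queried project to be allowlisted, and `upsampled_count` sums `sample_weight` instead of counting stored rows. The allowlist contents, project IDs, and event shape are illustrative assumptions, not Sentry's actual data structures.

```python
from collections.abc import Iterable, Mapping

# Hypothetical allowlist of project IDs opted in to error upsampling.
UPSAMPLING_ALLOWLIST: frozenset[int] = frozenset({1, 2, 3})


def should_upsample(
    project_ids: Iterable[int], allowlist: frozenset[int] = UPSAMPLING_ALLOWLIST
) -> bool:
    """Upsample only when every project in the query is allowlisted."""
    ids = list(project_ids)
    # An empty project list has nothing to upsample.
    return bool(ids) and all(pid in allowlist for pid in ids)


def naive_count(events: Iterable[Mapping[str, float]]) -> int:
    """Plain count(): one per stored (already sampled) event."""
    return sum(1 for _ in events)


def upsampled_count(events: Iterable[Mapping[str, float]]) -> int:
    """Sum of sample_weight, cast to an integer count."""
    return int(sum(event["sample_weight"] for event in events))
```

With stored events weighted 10, 10 and 1, the naive count is 3 while the upsampled count is 21. A query that mixes an allowlisted project with a non-allowlisted one falls back to the naive count.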
- Add 60-second cache for upsampling eligibility checks to improve performance
- Separate upsampling eligibility check from query transformation for better optimization
- Remove unnecessary null checks in `upsampled_count()` function per schema requirements
- Add cache invalidation utilities for configuration management

This improves performance during high-traffic periods by avoiding repeated expensive allowlist lookups while maintaining data consistency.
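The caching and invalidation changes listed above can be sketched as follows. This is a hypothetical in-process stand-in, not Sentry's cache API: `EligibilityCache`, `eligibility_cache_key`, and the injectable `clock` are illustrative names, and a production version would sit on a shared cache backend rather than a dict. Like the PR, it caches only the allowlist result keyed by organization and project set, leaving any per-request checks to run on every call.

```python
import time
from collections.abc import Callable, Iterable

ELIGIBILITY_TTL_SECONDS = 60  # matches the 60-second cache described in the PR


def eligibility_cache_key(organization_id: int, project_ids: Iterable[int]) -> str:
    # Sorting makes the key independent of project order; joining the IDs
    # (rather than hashing them) rules out collisions between project sets.
    ids = ",".join(str(pid) for pid in sorted(project_ids))
    return f"error_upsampling_eligible:{organization_id}:{ids}"


class EligibilityCache:
    """In-process TTL cache; a stand-in for a shared cache backend."""

    def __init__(
        self,
        ttl: float = ELIGIBILITY_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[bool, float]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], bool]) -> bool:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # fresh hit: skip the expensive allowlist lookup
        value = compute()
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        # Call after an allowlist change so the next check is recomputed.
        self._entries.pop(key, None)
```

Injecting the clock keeps expiry testable without sleeping. For a shared backend with key-length limits, a digest of the joined ID string would keep keys short.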
@claude review this PR
Claude finished @akshayutture-augment's task
Code Review - Error Upsampling with Performance Optimizations
I've completed my review of this PR. Overall, the implementation is solid with good test coverage, but I've identified several important issues that should be addressed.
✅ Strengths
🔴 Critical Issues
1. Race Condition in Caching Logic (src/sentry/api/helpers/error_upsampling.py:27-40)
The caching implementation has a critical race condition that can cause inconsistent query results:
# Line 27-40
cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"
# Check cache first for performance optimization
cached_result = cache.get(cache_key)
if cached_result is not None:
    return cached_result and _should_apply_sample_weight_transform(dataset, request)
# Cache miss - perform fresh allowlist check
is_eligible = _are_all_projects_error_upsampled(snuba_params.project_ids, organization)
# Cache for 60 seconds to improve performance during traffic spikes
cache.set(cache_key, is_eligible, 60)
return is_eligible and _should_apply_sample_weight_transform(dataset, request)
Problem: The function caches
If the dataset or request query changes between calls with the same cache key, you'll get inconsistent results.
Solution: Cache the full eligibility decision:
import hashlib  # at module level
# Hash the raw query so the key stays short and whitespace-free for memcached-style backends
query_digest = hashlib.sha1(request.GET.get("query", "").encode()).hexdigest()
cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}:{dataset.__name__}:{query_digest}"
cached_result = cache.get(cache_key)
if cached_result is not None:
    return cached_result
is_eligible = _are_all_projects_error_upsampled(snuba_params.project_ids, organization)
should_transform = is_eligible and _should_apply_sample_weight_transform(dataset, request)
cache.set(cache_key, should_transform, 60)
return should_transform
2. Potential Data Inconsistency (src/sentry/api/helpers/error_upsampling.py:43-64)
The comment on lines 50-53 acknowledges a serious issue but doesn't address it:
# NOTE: This function reads the allowlist configuration fresh each time,
# which means it can return different results between calls if the
# configuration changes during request processing. This is intentional
# to ensure we always have the latest configuration state.
Problem: Combined with the cache, this creates a window where:
Solution: Either:
3. Missing Null Safety (src/sentry/api/helpers/error_upsampling.py:87-100)
def transform_query_columns_for_error_upsampling(
    query_columns: Sequence[str],
) -> list[str]:
    # ...
    # Note: We rely on the database schema to ensure sample_weight exists for all
    # events in allowlisted projects, so no additional null checks are needed here.
Problem: The comment claims schema guarantees, but:
Impact: Queries will break with NULL errors if
Solution: Add defensive null handling in the SQL function definition in discover.py:1046-1050:
snql_aggregate=lambda args, alias: Function(
    "toInt64",
    [Function("sum", [Function("coalesce", [Column("sample_weight"), 1])])],
    alias,
),
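To illustrate what the suggested `coalesce` changes, the Python sketch below mimics how SQL aggregates treat NULL, which is modeled here as `None`. It assumes, as the suggestion does, that an event missing `sample_weight` should count as one unsampled event; `sum_ignoring_nulls` and `sum_with_default` are hypothetical helpers, not Snuba or ClickHouse functions.

```python
from collections.abc import Iterable
from typing import Optional


def sum_ignoring_nulls(weights: Iterable[Optional[float]]) -> float:
    """Like SQL sum(sample_weight): NULL rows contribute nothing.

    Unlike SQL, an all-NULL input returns 0 here rather than NULL.
    """
    return sum(w for w in weights if w is not None)


def sum_with_default(weights: Iterable[Optional[float]], default: float = 1.0) -> float:
    """Like sum(coalesce(sample_weight, 1)): NULL rows count as one event."""
    return sum(default if w is None else w for w in weights)
```

For weights `[10.0, None, 10.0]`, `sum_ignoring_nulls` returns 20.0 and `sum_with_default` returns 21.0: without the default, the weightless event disappears from the total instead of being counted once.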
Test 3
Replicated from ai-code-review-evaluation/sentry-greptile#3