
Conversation


@hussam789 hussam789 commented Oct 30, 2025

User description

PR #3


PR Type

Enhancement, Tests


Description

  • Implement error upsampling support in events-stats API with upsampled_count() function

  • Add 60-second caching for upsampling eligibility checks to optimize performance

  • Transform count() aggregations to use sum(sample_weight) for allowlisted projects

  • Add comprehensive test coverage for upsampling logic and edge cases

  • Extract sample_rate from error_sampling context in event factory
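The count() to upsampled_count() conversion can be sketched as a standalone helper. This is a simplified illustration, not the actual Sentry implementation; it only shows the case-insensitive, whitespace-tolerant matching that the unit tests described below exercise:

```python
def transform_query_columns_for_error_upsampling(query_columns):
    """Replace count() aggregations with upsampled_count(), matching
    case-insensitively and ignoring surrounding whitespace; all other
    columns pass through unchanged."""
    transformed = []
    for column in query_columns:
        if column.strip().lower() == "count()":
            transformed.append("upsampled_count()")
        else:
            transformed.append(column)
    return transformed
```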


Diagram Walkthrough

flowchart LR
  A["Events Stats API"] --> B["Check Upsampling Eligibility"]
  B --> C["Cache Result 60s"]
  C --> D["Transform Query Columns"]
  D --> E["Apply upsampled_count()"]
  E --> F["Execute Snuba Query"]
  F --> G["Return Results"]

File Walkthrough

Relevant files
Enhancement
organization_events_stats.py
Integrate upsampling eligibility checks and column transformation

src/sentry/api/endpoints/organization_events_stats.py

  • Import error upsampling helper functions for eligibility checks and
    query transformation
  • Add early upsampling eligibility check with caching for performance
    optimization
  • Apply query column transformation before executing top events and
    standard timeseries queries
  • Fix error message grammar in validation response
  • Pass transformed columns (final_columns) to all query execution paths
+33/-5   
error_upsampling.py
Create error upsampling helper with caching and transformations

src/sentry/api/helpers/error_upsampling.py

  • Create new helper module for error upsampling logic with caching
    support
  • Implement is_errors_query_for_error_upsampled_projects() with
    60-second cache for allowlist eligibility
  • Add _are_all_projects_error_upsampled() to validate all projects are
    in allowlist
  • Implement transform_query_columns_for_error_upsampling() to convert
    count() to upsampled_count()
  • Add _should_apply_sample_weight_transform() to determine dataset
    applicability
  • Provide invalidate_upsampling_cache() utility for configuration
    management
+140/-0 
discover.py
Add upsampled_count SnQL function for error aggregation   

src/sentry/search/events/datasets/discover.py

  • Add new upsampled_count SnQL function to function converter mapping
  • Implement aggregation using toInt64(sum(sample_weight)) for upsampled
    error counting
  • Set default result type to number for the new function
+12/-0   
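Conceptually, toInt64(sum(sample_weight)) works because each stored event carries a weight of 1/sample_rate, i.e. the number of original events it represents. A hypothetical plain-Python model of the aggregation (not the SnQL implementation itself):

```python
def upsampled_count(events):
    """Sum each event's sample_weight (assumed to be 1/sample_rate,
    i.e. how many real events the stored one stands in for) and
    truncate to an integer, mirroring toInt64(sum(sample_weight))."""
    return int(sum(event["sample_weight"] for event in events))

# Two events stored at a 10% client sample rate each carry weight 10,
# so they upsample to an estimated 20 original errors.
events = [{"sample_weight": 10.0}, {"sample_weight": 10.0}]
```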
factories.py
Extract sample_rate from error_sampling context in events

src/sentry/testutils/factories.py

  • Add MutableMapping to imports from collections.abc
  • Create _set_sample_rate_from_error_sampling() helper function to
    extract client_sample_rate from error_sampling context
  • Call helper in store_event() to populate sample_rate field from
    error_sampling context data
+20/-1   
Tests
test_error_upsampling.py
Add unit tests for error upsampling helper functions         

tests/sentry/api/helpers/test_error_upsampling.py

  • Create comprehensive test suite for error upsampling helper functions
  • Test _are_all_projects_error_upsampled() with various allowlist
    scenarios
  • Test transform_query_columns_for_error_upsampling() with case
    insensitivity and whitespace handling
  • Test _is_error_focused_query() for error vs transaction detection
  • Test _should_apply_sample_weight_transform() across different datasets
+101/-0 
test_organization_events_stats.py
Add integration tests for error upsampling in events-stats

tests/snuba/api/endpoints/test_organization_events_stats.py

  • Add OrganizationEventsStatsErrorUpsamplingTest test class with
    integration tests
  • Test upsampling with fully allowlisted projects showing upsampled
    counts
  • Test partial allowlist scenario where upsampling is not applied
  • Test that transaction events use regular count regardless of allowlist
  • Test behavior when no projects are allowlisted
+171/-0 
Configuration changes
pyproject.toml
Update mypy configuration for new modules                               

pyproject.toml

  • Add sentry.api.helpers.error_upsampling to mypy ignore list
  • Add tests.sentry.api.helpers.test_error_upsampling to mypy ignore list
+2/-0     
Miscellaneous
sentry-repo
Update subproject commit reference                                             

sentry-repo

  • Update subproject commit reference to latest version
+1/-0     

yuvmen and others added 2 commits July 25, 2025 09:48
…(#94376)

Part of the Error Upsampling project:
https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link

The events-stats API will now check whether all projects in the query are allowlisted for upsampling and, if so, convert the count query to a sum over `sample_weight` in Snuba. This is done by defining a new SnQL function, `upsampled_count()`.

I noticed that eps() and epm() are also in use in this endpoint. I considered (and even started work on) swapping them too, since for correctness they should probably also use `sample_weight` rather than counting naively. However, that caused some complications, and since they are only used by specific dashboard widgets and are not available in Discover, I decided to defer changing them until we find it is needed.
- Add 60-second cache for upsampling eligibility checks to improve performance
- Separate upsampling eligibility check from query transformation for better optimization
- Remove unnecessary null checks in upsampled_count() function per schema requirements
- Add cache invalidation utilities for configuration management

This improves performance during high-traffic periods by avoiding repeated
expensive allowlist lookups while maintaining data consistency.
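The real code uses Django's cache framework with a 60-second timeout; as a self-contained sketch of the same TTL behavior (class and method names here are illustrative, not from the PR):

```python
import time

class TTLCache:
    """Minimal time-to-live cache: entries expire after `ttl` seconds,
    after which get() behaves as a cache miss."""

    def __init__(self, ttl=60):
        self.ttl = ttl
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            # Entry has aged out; drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)
```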
@qodo-code-review

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified. No security vulnerabilities detected by AI analysis; human verification is advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No audit logs: The new upsampling eligibility and transformation logic performs configuration-based
decisions and caching without emitting audit logs for critical actions or changes, and the
diff does not show any added audit trail.

Referred Code
    cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"

    # Check cache first for performance optimization
    cached_result = cache.get(cache_key)
    if cached_result is not None:
        return cached_result and _should_apply_sample_weight_transform(dataset, request)

    # Cache miss - perform fresh allowlist check
    is_eligible = _are_all_projects_error_upsampled(snuba_params.project_ids, organization)

    # Cache for 60 seconds to improve performance during traffic spikes
    cache.set(cache_key, is_eligible, 60)

    return is_eligible and _should_apply_sample_weight_transform(dataset, request)


def _are_all_projects_error_upsampled(
    project_ids: Sequence[int], organization: Organization
) -> bool:
    """
    Check if ALL projects in the query are allowlisted for error upsampling.


 ... (clipped 28 lines)
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Cache key risks: Caching eligibility with a hash of sorted project IDs lacks collision handling and does
not validate empty/missing IDs, which may cause subtle misapplication of upsampling
without explicit error handling.

Referred Code
cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"

# Check cache first for performance optimization
cached_result = cache.get(cache_key)
if cached_result is not None:
    return cached_result and _should_apply_sample_weight_transform(dataset, request)

# Cache miss - perform fresh allowlist check
is_eligible = _are_all_projects_error_upsampled(snuba_params.project_ids, organization)

# Cache for 60 seconds to improve performance during traffic spikes
cache.set(cache_key, is_eligible, 60)

return is_eligible and _should_apply_sample_weight_transform(dataset, request)
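One way to address both concerns raised here (a hedged sketch, not part of the PR): derive the key from a content digest rather than Python's built-in hash(), and reject empty project lists explicitly.

```python
import hashlib

def upsampling_cache_key(org_id, project_ids):
    """Build a deterministic cache key for upsampling eligibility.
    A sha1 digest over the sorted IDs avoids relying on hash(), and
    empty project lists are rejected up front instead of being cached."""
    if not project_ids:
        raise ValueError("project_ids must not be empty")
    digest = hashlib.sha1(
        ",".join(str(pid) for pid in sorted(project_ids)).encode()
    ).hexdigest()
    return f"error_upsampling_eligible:{org_id}:{digest}"
```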
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Input assumptions: The transformation assumes presence of sample_weight and trusts request query to detect
error events without sanitization of inputs beyond simple string checks, which may require
validation against unsupported datasets or malformed inputs.

Referred Code
def _should_apply_sample_weight_transform(dataset: Any, request: Request) -> bool:
    """
    Determine if we should apply sample_weight transformations based on the dataset
    and query context. Only apply for error events since sample_weight doesn't exist
    for transactions.
    """
    from sentry.snuba import discover, errors

    # Always apply for the errors dataset
    if dataset == errors:
        return True

    from sentry.snuba import transactions

    # Never apply for the transactions dataset
    if dataset == transactions:
        return False

    # For the discover dataset, check if we're querying errors specifically
    if dataset == discover:
        result = _is_error_focused_query(request)


 ... (clipped 17 lines)
Compliance status legend:
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
General
Improve exception handling and code clarity

Replace broad try...except Exception blocks with safer dictionary access using
.get() and more specific exception handling for type conversion. This will
improve error handling and code robustness.

src/sentry/testutils/factories.py [344-357]

 def _set_sample_rate_from_error_sampling(normalized_data: MutableMapping[str, Any]) -> None:
     """Set 'sample_rate' on normalized_data if contexts.error_sampling.client_sample_rate is present and valid."""
-    client_sample_rate = None
-    try:
-        client_sample_rate = (
-            normalized_data.get("contexts", {}).get("error_sampling", {}).get("client_sample_rate")
-        )
-    except Exception:
-        pass
-    if client_sample_rate:
+    error_sampling = normalized_data.get("contexts", {}).get("error_sampling")
+    if not isinstance(error_sampling, Mapping):
+        return
+
+    client_sample_rate = error_sampling.get("client_sample_rate")
+    if client_sample_rate is not None:
         try:
             normalized_data["sample_rate"] = float(client_sample_rate)
-        except Exception:
+        except (ValueError, TypeError):
             pass
Suggestion importance[1-10]: 7


Why: The suggestion correctly points out the use of broad except Exception blocks and proposes replacing them with safer dictionary access and more specific exception handling (ValueError, TypeError), which is a significant improvement for code robustness and maintainability.

Medium
Refactor to avoid redundant logic

Refactor the code to eliminate redundant column transformation logic. Apply the
transformation once at the beginning based on the upsampling_enabled flag.

src/sentry/api/endpoints/organization_events_stats.py [226-305]

 upsampling_enabled = should_upsample
-final_columns = query_columns
+
+if upsampling_enabled:
+    final_columns = transform_query_columns_for_error_upsampling(query_columns)
+else:
+    final_columns = query_columns
 
 if top_events > 0:
-    # Apply upsampling transformation just before query execution
-    # This late transformation ensures we use the most current schema assumptions
-    if upsampling_enabled:
-        final_columns = transform_query_columns_for_error_upsampling(query_columns)
-        
     if use_rpc:
         return scoped_dataset.run_top_events_timeseries_query(
             params=snuba_params,
             query_string=query,
             y_axes=final_columns,
             ...
         )
     return scoped_dataset.top_events_timeseries(
         timeseries_columns=final_columns,
         ...
     )
 
 if use_rpc:
-    # Apply upsampling transformation just before RPC query execution
-    if upsampling_enabled:
-        final_columns = transform_query_columns_for_error_upsampling(query_columns)
-        
     return scoped_dataset.run_timeseries_query(
         params=snuba_params,
         query_string=query,
         y_axes=final_columns,
         ...
     )
 
-# Apply upsampling transformation just before standard query execution
-if upsampling_enabled:
-    final_columns = transform_query_columns_for_error_upsampling(query_columns)
-
 return scoped_dataset.timeseries_query(
     selected_columns=final_columns,
     ...
 )

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies repeated logic and proposes a valid refactoring to apply the column transformation once, which improves code readability and maintainability by reducing redundancy.

Low
