```python
        literal(page_token.last_seen_timestamp),
    ),
    f.greater(
        f.reinterpretAsUInt128(f.reverse(f.unhex(column("item_id")))),
```
took this from get trace pagination
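For context, a minimal sketch of what this expression computes (my reading of it, assuming `item_id` is a 32-character hex string): `unhex` yields the raw 16 bytes, and reversing them before the little-endian `reinterpretAsUInt128` is equivalent to reading the original bytes big-endian, so numeric comparisons line up with the lexicographic order of the hex IDs.

```python
def item_id_to_uint128(item_id_hex: str) -> int:
    # Mirrors reinterpretAsUInt128(reverse(unhex(item_id))) in ClickHouse:
    # reinterpretAsUInt128 reads bytes little-endian, so reversing first
    # interprets the original bytes big-endian, and the resulting integer
    # order matches the lexicographic order of the hex string.
    raw = bytes.fromhex(item_id_hex)
    return int.from_bytes(raw[::-1], byteorder="little")
```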
Pull request overview
This PR implements a GDPR export endpoint for EAP (Event Analytics Platform) and fixes an issue where the query pipeline was transforming columns in the ORDER BY clause, which was defeating ClickHouse optimizations and making pagination ineffective.
Key Changes:
- Added new `EndpointExportTraceItemsRPC` endpoint with pagination support for exporting trace items
- Introduced `skip_transform_order_by` flag in query settings to preserve ORDER BY clause column names for optimal ClickHouse performance
- Refactored common array processing functions (`process_arrays`, `transform_array_value`) and the `BUCKET_COUNT` constant to shared utilities
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| uv.lock | Updated sentry-protos dependency from 0.4.8 to 0.4.9 to support new export endpoint protobuf definitions |
| pyproject.toml | Updated sentry-protos version requirement to match lock file |
| tests/web/rpc/v1/test_utils.py | Added BASE_TIME constant definition and timezone import for test consistency |
| tests/web/rpc/v1/test_endpoint_export_trace_items.py | Added comprehensive tests for the new export endpoint including pagination and order by transformation validation |
| snuba/web/rpc/v1/endpoint_export_trace_items.py | Implemented new GDPR export endpoint with cursor-based pagination and proper attribute handling |
| snuba/web/rpc/common/common.py | Extracted shared array processing utilities and BUCKET_COUNT constant for reuse across endpoints |
| snuba/web/rpc/v1/endpoint_trace_item_details.py | Updated to use shared BUCKET_COUNT constant from common module |
| snuba/web/rpc/v1/endpoint_get_trace.py | Refactored to use shared process_arrays function from common module |
| snuba/query/query_settings.py | Added skip_transform_order_by setting to control ORDER BY clause transformation |
| snuba/query/__init__.py | Added skip_transform_order_by parameter to transform_expressions method |
| snuba/query/processors/physical/type_converters.py | Updated to respect skip_transform_order_by setting during query processing |
| snuba/pipeline/stages/query_processing.py | Modified storage processing stage to pass skip_transform_order_by setting through transformation pipeline |
```python
    assert response.page_token.end_pagination == False
else:
    assert response.page_token.end_pagination == True
```
Use `is False` instead of `== False` for boolean comparisons. This is a Python best practice that avoids potential issues with truthy/falsy values and is more explicit about comparing with the boolean singleton.
```diff
-    assert response.page_token.end_pagination == False
-else:
-    assert response.page_token.end_pagination == True
+    assert response.page_token.end_pagination is False
+else:
+    assert response.page_token.end_pagination is True
```
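To illustrate why the distinction matters: `==` applies Python's numeric coercion, so values like `0` compare equal to `False`, while `is` checks identity against the boolean singleton. A quick sketch (the helper name is just for illustration):

```python
def is_strictly_false(value) -> bool:
    # Identity check: only the False singleton itself passes.
    return value is False

# 0 == False under numeric coercion, but 0 is not the False singleton.
assert 0 == False  # noqa: E712
assert not is_strictly_false(0)
assert is_strictly_false(False)
```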
```python
ts = row.pop("timestamp")
client_sample_rate = float(1.0 / row.pop("sampling_weight", 1.0))
server_sample_rate = float(1.0 / row.pop("sampling_weight", 1.0))
sampling_factor = row.pop("sampling_factor", 1.0)  # noqa: F841
```
Variable `sampling_factor` is not used.
```diff
-sampling_factor = row.pop("sampling_factor", 1.0)  # noqa: F841
+row.pop("sampling_factor", None)
```
```python
integers = row.pop("attributes_int", {}) or {}
floats = row.pop("attributes_float", {}) or {}

breakpoint()
```
```python
client_sample_rate = float(1.0 / row.pop("client_sample_rate", 1.0))
server_sample_rate = float(1.0 / row.pop("server_sample_rate", 1.0))
```
```python
item_id = row.pop("id")
item_type = row.pop("item_type")
ts = row.pop("timestamp")
client_sample_rate = float(1.0 / row.pop("client_sample_rate", 1.0))
```
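A side note on the `1.0 / row.pop(...)` pattern above: a zero or missing weight would raise `ZeroDivisionError`. A defensive variant (a hypothetical helper, not part of this PR) might guard for it:

```python
def weight_to_rate(sampling_weight: float) -> float:
    # A weight of N means the stored row represents N original events,
    # so the effective sample rate is 1/N. Fall back to 1.0 for a
    # missing or zero weight rather than raising ZeroDivisionError.
    return 1.0 / sampling_weight if sampling_weight else 1.0
```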
```python
        *[column(f"attributes_float_{n}") for n in range(BUCKET_COUNT)],
        alias="attributes_float",
    ),
),
SelectedExpression("attributes_int", column("attributes_int", alias="attributes_int")),
SelectedExpression("attributes_bool", column("attributes_bool", alias="attributes_bool")),
SelectedExpression(
    "attributes_array",
    FunctionCall("attributes_array", "toJSONString", (column("attributes_array"),)),
),
]
```
Product team reports pagination to be slow, and we think making the ORDER BY clause of the query match the sort key of the ClickHouse table will help (consequently we change the pagination to match the ORDER BY). This is related to the pre-existing bug uncovered as part of #7586.

Rollout plan (we will be measuring the speedup using our internal DD metrics):
- monitor error count via DD: https://app.datadoghq.com/s/FH6-Y3/vet-mu7-eq2
- monitor sentry issues
- if things go wrong, click revert

Co-authored-by: Rachel Chen <rachelchen@PL6VFX9HP4.local>
Co-authored-by: Rachel Chen <rachelchen@MacBookPro.attlocal.net>
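The idea of making pagination match the ORDER BY can be sketched as keyset pagination over a compound sort key such as `(timestamp, item_id)` (a simplified illustration with made-up field names, not the actual endpoint code): resume strictly after the last row of the previous page using tuple comparison, which is the predicate shape ClickHouse can serve efficiently from its sort key instead of scanning past an offset.

```python
def next_page(rows, last_ts, last_id, page_size):
    # Keyset pagination: rows are already sorted by (timestamp, item_id),
    # and the filter uses the same tuple order, so the cursor and the
    # sort agree and no offset scan is needed.
    after = [r for r in rows if (r["timestamp"], r["item_id"]) > (last_ts, last_id)]
    return after[:page_size]

rows = [
    {"timestamp": 1, "item_id": "a"},
    {"timestamp": 1, "item_id": "b"},
    {"timestamp": 2, "item_id": "a"},
]
```

Resuming after the cursor `(1, "a")` yields the remaining two rows; resuming after `(2, "a")` yields an empty page, which is how the end of pagination is detected.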
https://linear.app/getsentry/issue/EAP-320/data-export-endpoint
While working on this PR, we also uncovered that the query pipeline transforms the columns in the ORDER BY clause. This is unintended, at least for EAP, because it defeats the ClickHouse sort-key optimizations and makes pagination ineffective. My fix was to pass in a flag that tells the query pipeline whether or not to transform the ORDER BY clause. This is the safest, simplest, and fastest solution that is also clean (the query pipeline is used by other Snuba code that I'm unfamiliar with).
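The flag-based fix described above can be sketched roughly like this (hypothetical simplified names; the real setting is `skip_transform_order_by` in `snuba/query/query_settings.py`, threaded through `transform_expressions`):

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Settings:
    # When True, the pipeline leaves ORDER BY expressions untouched so
    # they keep matching the ClickHouse table's sort key.
    skip_transform_order_by: bool = False


def transform_query(
    select: List[str],
    order_by: List[str],
    transform: Callable[[str], str],
    settings: Settings,
) -> Tuple[List[str], List[str]]:
    # SELECT expressions are always transformed.
    new_select = [transform(e) for e in select]
    if settings.skip_transform_order_by:
        new_order_by = list(order_by)  # preserve raw column names
    else:
        new_order_by = [transform(e) for e in order_by]
    return new_select, new_order_by
```

The design choice here is deliberately conservative: rather than teaching every processor about sort keys, a single opt-in flag gates the one transformation that hurts, so other consumers of the pipeline see no behavior change.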