feat(eap): gdpr export endpoint by xurui-c · Pull Request #7586 · getsentry/snuba

xurui-c · 2025-12-10T20:07:07Z

https://linear.app/getsentry/issue/EAP-320/data-export-endpoint

While working on this PR, we also found that the query pipeline transforms the columns in the ORDER BY clause. This is unintended, at least for EAP, because it defeats the ClickHouse sort-key optimizations and makes pagination ineffective. My fix is to pass in a flag that tells the query pipeline whether to apply the transformation to the ORDER BY clause. This is the safest, simplest, and fastest solution that is still clean (the query pipeline is used by other parts of Snuba that I'm unfamiliar with).

Will fix the get trace endpoint in a follow-up PR, in the interest of keeping this PR small.
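
A minimal sketch of the flag idea, for illustration only. `ToyQuery` and `wrap_timestamp` are hypothetical stand-ins and not Snuba types; only the names `transform_expressions` and `skip_transform_order_by` come from this PR, and the real method signature may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToyQuery:
    """Hypothetical stand-in for a query: plain column names only."""

    select: tuple[str, ...]
    order_by: tuple[str, ...]


def transform_expressions(
    query: ToyQuery,
    func: Callable[[str], str],
    skip_transform_order_by: bool = False,
) -> ToyQuery:
    """Apply func to every column, optionally leaving ORDER BY untouched."""
    new_select = tuple(func(col) for col in query.select)
    if skip_transform_order_by:
        # Keep the raw column names so they can still match the table sort key.
        new_order_by = query.order_by
    else:
        new_order_by = tuple(func(col) for col in query.order_by)
    return ToyQuery(select=new_select, order_by=new_order_by)


def wrap_timestamp(col: str) -> str:
    # Illustrative transformation only: wraps one column in a function call.
    return f"toDateTime({col})" if col == "timestamp" else col


query = ToyQuery(select=("timestamp", "item_id"), order_by=("timestamp", "item_id"))
transformed = transform_expressions(query, wrap_timestamp)
preserved = transform_expressions(query, wrap_timestamp, skip_transform_order_by=True)
```

With the flag set, the SELECT columns are still transformed while the ORDER BY columns keep their original names, which is the behaviour the description asks for.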

snuba/web/rpc/common/common.py

snuba/web/rpc/v1/endpoint_export_trace_items.py

xurui-c · 2025-12-11T21:19:56Z

snuba/web/rpc/v1/endpoint_export_trace_items.py

+                        literal(page_token.last_seen_timestamp),
+                    ),
+                    f.greater(
+                        f.reinterpretAsUInt128(f.reverse(f.unhex(column("item_id")))),


took this from get trace pagination
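
A rough Python model of what the quoted expression computes, plus the cursor comparison it appears to feed. This assumes that `item_id` is a hex string and that `reinterpretAsUInt128` reads its input bytes as little-endian; the full filter is not visible in the excerpt, so the (timestamp, item_id) tuple comparison below is an assumption about its shape. The function names are hypothetical.

```python
def item_id_sort_value(item_id_hex: str) -> int:
    """Mirror reinterpretAsUInt128(reverse(unhex(item_id))) in plain Python."""
    raw = bytes.fromhex(item_id_hex)  # unhex: hex string -> bytes
    reversed_raw = raw[::-1]  # reverse: flip the byte order
    # Assumed little-endian read, as reinterpretAsUInt128 is understood to do.
    return int.from_bytes(reversed_raw, "little")


def is_after_cursor(
    row_ts: int,
    row_item_id_hex: str,
    last_seen_ts: int,
    last_seen_item_id_hex: str,
) -> bool:
    """Keyset pagination check: is the row strictly after the last-seen row?"""
    row_key = (row_ts, item_id_sort_value(row_item_id_hex))
    cursor_key = (last_seen_ts, item_id_sort_value(last_seen_item_id_hex))
    return row_key > cursor_key
```

Under these assumptions, reversing the bytes and then reading them little-endian is the same as reading the original hex string as one big-endian integer, so item IDs compare in their natural numeric order.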

snuba/web/rpc/v1/endpoint_export_trace_items.py

snuba/web/rpc/v1/endpoint_get_trace.py

tests/web/rpc/v1/test_endpoint_get_trace.py

snuba/web/rpc/v1/endpoint_export_trace_items.py

snuba/web/db_query.py

tests/web/rpc/v1/test_endpoint_export_trace_items.py

almost done get rid of some stuff clean item id item id page token done? pagination cleanup cleanup pagination strings remove comment index optimize idk smth up with item_id fixed cleanup c

snuba/web/rpc/v1/endpoint_export_trace_items.py

Copilot

Pull request overview

This PR implements a GDPR export endpoint for EAP (Event Analytics Platform) and fixes an issue where the query pipeline was transforming columns in the ORDER BY clause, which was defeating ClickHouse optimizations and making pagination ineffective.

Key Changes:

Added new EndpointExportTraceItems RPC endpoint with pagination support for exporting trace items
Introduced skip_transform_order_by flag in query settings to preserve ORDER BY clause column names for optimal ClickHouse performance
Refactored common array processing functions (process_arrays, transform_array_value) and BUCKET_COUNT constant to shared utilities

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
uv.lock	Updated sentry-protos dependency from 0.4.8 to 0.4.9 to support new export endpoint protobuf definitions
pyproject.toml	Updated sentry-protos version requirement to match lock file
tests/web/rpc/v1/test_utils.py	Added BASE_TIME constant definition and timezone import for test consistency
tests/web/rpc/v1/test_endpoint_export_trace_items.py	Added comprehensive tests for the new export endpoint including pagination and order by transformation validation
snuba/web/rpc/v1/endpoint_export_trace_items.py	Implemented new GDPR export endpoint with cursor-based pagination and proper attribute handling
snuba/web/rpc/common/common.py	Extracted shared array processing utilities and BUCKET_COUNT constant for reuse across endpoints
snuba/web/rpc/v1/endpoint_trace_item_details.py	Updated to use shared BUCKET_COUNT constant from common module
snuba/web/rpc/v1/endpoint_get_trace.py	Refactored to use shared process_arrays function from common module
snuba/query/query_settings.py	Added skip_transform_order_by setting to control ORDER BY clause transformation
snuba/query/__init__.py	Added skip_transform_order_by parameter to transform_expressions method
snuba/query/processors/physical/type_converters.py	Updated to respect skip_transform_order_by setting during query processing
snuba/pipeline/stages/query_processing.py	Modified storage processing stage to pass skip_transform_order_by setting through transformation pipeline

snuba/web/rpc/v1/endpoint_export_trace_items.py

tests/web/rpc/v1/test_utils.py

snuba/web/rpc/v1/endpoint_export_trace_items.py

Copilot · 2025-12-29T20:35:10Z

tests/web/rpc/v1/test_endpoint_export_trace_items.py

+                assert response.page_token.end_pagination == False
+            else:
+                assert response.page_token.end_pagination == True


Use is False instead of == False for boolean comparisons. This is a Python best practice that avoids potential issues with truthy/falsy values and is more explicit about comparing with the boolean singleton.

Suggested change

-                assert response.page_token.end_pagination == False
-            else:
-                assert response.page_token.end_pagination == True
+                assert response.page_token.end_pagination is False
+            else:
+                assert response.page_token.end_pagination is True
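
A small illustration of the difference the comment describes. The helper name is hypothetical, and the example uses plain Python values rather than a real page token.

```python
def compare_with_false(value: object) -> tuple[bool, bool]:
    """Return (equality check, identity check) against False."""
    equal_to_false = value == False  # noqa: E712 - shown deliberately
    identical_to_false = value is False
    return equal_to_false, identical_to_false


real_bool = compare_with_false(False)
falsy_int = compare_with_false(0)
```

A real `False` passes both checks, while the integer `0` passes only the equality check, so `is False` is the stricter assertion in a test.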

tests/web/rpc/v1/test_endpoint_export_trace_items.py

snuba/web/rpc/common/common.py

snuba/web/rpc/v1/endpoint_export_trace_items.py

Copilot · 2025-12-29T20:35:11Z

snuba/web/rpc/v1/endpoint_export_trace_items.py

+        ts = row.pop("timestamp")
+        client_sample_rate = float(1.0 / row.pop("sampling_weight", 1.0))
+        server_sample_rate = float(1.0 / row.pop("sampling_weight", 1.0))
+        sampling_factor = row.pop("sampling_factor", 1.0)  # noqa: F841


Variable sampling_factor is not used.

Suggested change

-        sampling_factor = row.pop("sampling_factor", 1.0)  # noqa: F841
+        row.pop("sampling_factor", None)

snuba/web/rpc/v1/endpoint_export_trace_items.py

+        integers = row.pop("attributes_int", {}) or {}
+        floats = row.pop("attributes_float", {}) or {}
+
+        breakpoint()


snuba/web/rpc/v1/endpoint_export_trace_items.py

+        client_sample_rate = float(1.0 / row.pop("client_sample_rate", 1.0))
+        server_sample_rate = float(1.0 / row.pop("server_sample_rate", 1.0))
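
For context on why the column names matter here: the earlier revision quoted above read `sampling_weight` for both rates. A short sketch with made-up row values shows what `dict.pop` does when the same key is popped twice versus two distinct keys.

```python
# Made-up row values, for illustration only.
row_same_key = {"sampling_weight": 4.0}
first = float(1.0 / row_same_key.pop("sampling_weight", 1.0))
# The key was removed by the first pop, so this falls back to the default 1.0.
second = float(1.0 / row_same_key.pop("sampling_weight", 1.0))

row_distinct_keys = {"client_sample_rate": 4.0, "server_sample_rate": 2.0}
client = float(1.0 / row_distinct_keys.pop("client_sample_rate", 1.0))
server = float(1.0 / row_distinct_keys.pop("server_sample_rate", 1.0))
```

Popping the same key twice silently yields the default on the second call, whereas distinct keys give each rate its own column value.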


xurui-c · 2026-01-05T19:33:54Z

snuba/web/rpc/v1/endpoint_export_trace_items.py

+        item_id = row.pop("id")
+        item_type = row.pop("item_type")
+        ts = row.pop("timestamp")
+        client_sample_rate = float(1.0 / row.pop("client_sample_rate", 1.0))


change this

snuba/web/rpc/v1/endpoint_export_trace_items.py

+                *[column(f"attributes_float_{n}") for n in range(BUCKET_COUNT)],
+                alias="attributes_float",
+            ),
+        ),
+        SelectedExpression("attributes_int", column("attributes_int", alias="attributes_int")),
+        SelectedExpression("attributes_bool", column("attributes_bool", alias="attributes_bool")),
+        SelectedExpression(
+            "attributes_array",
+            FunctionCall("attributes_array", "toJSONString", (column("attributes_array"),)),
+        ),
+    ]
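
The fragment above passes one column per bucket into a single expression aliased `attributes_float`. The combining function is not visible in this excerpt, so the sketch below simply assumes the buckets are concatenated into one map. `BUCKET_COUNT = 4` is a placeholder value, not the real constant, and `merge_attribute_buckets` is a hypothetical helper.

```python
BUCKET_COUNT = 4  # placeholder; the real value lives in the shared common module


def merge_attribute_buckets(
    row: dict, prefix: str, bucket_count: int = BUCKET_COUNT
) -> dict:
    """Combine per-bucket attribute maps such as attributes_float_0..N-1."""
    merged: dict = {}
    for n in range(bucket_count):
        # A missing or empty bucket contributes nothing.
        merged.update(row.get(f"{prefix}_{n}") or {})
    return merged


example_row = {
    "attributes_float_0": {"duration": 1.5},
    "attributes_float_2": {"score": 0.25},
}
merged_floats = merge_attribute_buckets(example_row, "attributes_float")
```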


Product team reports pagination to be slow, and we think making the order by clause of the query match the sort key of the CH table will help (consequently we change the pagination to match the order by). This is related to the pre-existing bug uncovered as part of #7586.

We will be measuring the speed up by using our internal DD metrics.

rollout plan:
- monitor error count via DD: https://app.datadoghq.com/s/FH6-Y3/vet-mu7-eq2
- monitor sentry issues
- if things go wrong, click revert

---------

Co-authored-by: Rachel Chen <rachelchen@PL6VFX9HP4.local>
Co-authored-by: Rachel Chen <rachelchen@MacBookPro.attlocal.net>

xurui-c commented Dec 10, 2025

snuba/web/rpc/common/common.py

xurui-c commented Dec 10, 2025

snuba/web/rpc/v1/endpoint_export_trace_items.py (outdated)

xurui-c commented Dec 10, 2025

snuba/web/rpc/v1/endpoint_export_trace_items.py (outdated)

xurui-c changed the title from "Rachel/gdpr" to "feat(eap): gdpr export endpoint" Dec 10, 2025

xurui-c commented Dec 11, 2025

snuba/web/rpc/v1/endpoint_export_trace_items.py

phacops reviewed Dec 18, 2025

c

cf92fb6

xurui-c force-pushed the rachel/gdpr branch from f409b03 to cf92fb6 Compare December 18, 2025 22:18

import

697dbac

xurui-c force-pushed the rachel/gdpr branch from 4132e0c to 697dbac Compare December 18, 2025 22:27

Rachel Chen added 4 commits December 18, 2025 14:59

no transformation

412ce3f

fixed order by

fff3a26

orderby flag

a869dcc

flag works

0b2ba6f

xurui-c marked this pull request as ready for review December 22, 2025 21:28

xurui-c requested review from a team as code owners December 22, 2025 21:28

order by test

6756cc5

Copilot AI review requested due to automatic review settings December 29, 2025 20:31

Copilot started reviewing on behalf of xurui-c December 29, 2025 20:31

sentry bot reviewed Dec 29, 2025

snuba/web/rpc/v1/endpoint_export_trace_items.py (outdated)

Copilot AI reviewed Dec 29, 2025

sample rate columns

03ed92c

sentry bot reviewed Dec 29, 2025

get rid of breakpoint

dd786ae

xurui-c commented Jan 5, 2026

phacops approved these changes Jan 5, 2026

Rachel Chen added 2 commits January 5, 2026 11:37

fix

2686b29

fix

a4b9d41

sentry bot reviewed Jan 5, 2026

Rachel Chen added 2 commits January 5, 2026 11:53

fix

972c184

error message

7dc5d8e

xurui-c merged commit 9f6c68b into master Jan 5, 2026
34 checks passed

xurui-c deleted the rachel/gdpr branch January 5, 2026 21:39

xurui-c mentioned this pull request Jan 5, 2026

fix(eap): speed up GetTrace endpoint pagination #7614

Merged

github-actions bot mentioned this pull request Jan 9, 2026

ci(release): Switch from action-prepare-release to Craft #7631

Merged
