-
-
Notifications
You must be signed in to change notification settings - Fork 60
fix(gdpr): paginate with project ID #7650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a GDPR pagination issue by adding project_id as part of the pagination token, ensuring correct cursor-based pagination across multiple projects. The pagination query previously only considered item_type, timestamp, trace_id, and item_id, but since the ORDER BY clause includes project_id (line 299), omitting it from the pagination cursor could result in incorrect pagination behavior when trace items span multiple projects.
Changes:
- Added
project_idas the first field in theExportTraceItemsPageTokenclass - Updated pagination filter logic to include
project_idcomparisons in the proper hierarchical order - Modified test fixtures to use multiple project IDs (
i % 3 + 1) to properly test multi-project pagination
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| snuba/web/rpc/v1/endpoint_export_trace_items.py | Added last_seen_project_id parameter throughout the pagination logic, including token class, protobuf serialization/deserialization, query filter construction, and result processing |
| tests/web/rpc/v1/test_endpoint_export_trace_items.py | Added project_id assignments to test data using modulo pattern to distribute items across projects 1, 2, and 3 |
| tests/web/rpc/v1/test_endpoint_get_trace.py | Added project_id assignments to test data using modulo pattern to distribute items across projects 1, 2, and 3 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| or_cond( | ||
| # (project_id = page_token.last_seen_project_id AND item_type = page_token.last_seen_item_type AND timestamp = page_token.last_seen_timestamp AND trace_id > page_token.last_seen_trace_id) | ||
| and_cond( | ||
| f.equals( | ||
| column("project_id"), literal(page_token.last_seen_project_id) | ||
| ), | ||
| f.equals( | ||
| column("item_type"), literal(page_token.last_seen_item_type) | ||
| ), | ||
| f.equals( | ||
| column("timestamp"), literal(page_token.last_seen_timestamp) | ||
| ), | ||
| f.greater( | ||
| column("trace_id"), literal(page_token.last_seen_trace_id) | ||
| ), | ||
| ), | ||
| # (project_id = page_token.last_seen_project_id AND item_type = page_token.last_seen_item_type AND timestamp = page_token.last_seen_timestamp AND trace_id = page_token.last_seen_trace_id AND item_id > page_token.last_seen_item_id) | ||
| and_cond( | ||
| f.equals( | ||
| column("project_id"), literal(page_token.last_seen_project_id) | ||
| ), | ||
| f.equals( | ||
| column("item_type"), literal(page_token.last_seen_item_type) | ||
| ), | ||
| f.equals( | ||
| column("timestamp"), literal(page_token.last_seen_timestamp) | ||
| ), | ||
| f.equals( | ||
| column("trace_id"), literal(page_token.last_seen_trace_id) | ||
| ), | ||
| f.greater( | ||
| f.reinterpretAsUInt128(f.reverse(f.unhex(column("item_id")))), | ||
| literal(page_token.last_seen_item_id), | ||
| ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I understand why you need to OR together these and conditions for pagination. can you explain please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually, the next row will satisfy one of these "first-difference" positions that are OR'd together (aka greater than the token). This is because the next row will either be:
- the start of the next project ID
- the start of the next item type within the same project ID
- the start of the next timestamp within the same project and item type
- the start of the next trace within the same project, item, and timestamp
- the start of the next item id within the same project, item, timestamp, and trace
thanks to the sort key of the table
Forgot to include project ID in pagination. Added it & modified the test to reflect multiple projects