
Conversation

@everettbu (Contributor) commented Jul 29, 2025

Test 2

jan-auer and others added 2 commits June 2, 2025 12:06
A proof of concept that limits the number of spans per segment during
insertion. Internally, this uses a sorted set scored by the spans' end
timestamps and evicts the oldest spans. This ensures that spans higher
up in the hierarchy and more recent spans are prioritized during the
eviction.
…loyments

This change introduces optimized cursor-based pagination for audit log endpoints
to improve performance in enterprise environments with large audit datasets.

Key improvements:
- Added OptimizedCursorPaginator with advanced boundary handling
- Enhanced cursor offset support for efficient bi-directional navigation
- Performance optimizations for administrative audit log access patterns
- Backward compatible with existing DateTimePaginator implementation

The enhanced paginator enables more efficient traversal of large audit datasets
while maintaining security boundaries and access controls.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@greptile-apps bot left a comment

Greptile Summary

This PR implements a comprehensive performance optimization for the spans buffer system by transitioning from unbounded memory growth to a controlled eviction strategy. The core change shifts from using Redis sets to sorted sets, where spans are ordered by their end_timestamp_precise values, enabling time-based eviction of older spans when buffer limits are exceeded.

Key Changes:

  • Buffer Storage Migration: Converted from Redis SADD/SSCAN operations to ZADD/ZSCAN operations in buffer.py, using timestamps as sorting scores
  • Eviction Logic: Added Lua script logic in add-buffer.lua that automatically removes oldest spans when segments exceed 1,000 spans using zpopmin
  • Data Structure Enhancement: Extended the Span dataclass with end_timestamp_precise field throughout the processing pipeline (factory.py, buffer.py)
  • Pagination Optimizations: Introduced OptimizedCursorPaginator with negative offset support for efficient reverse pagination in audit logs
  • Test Updates: Updated all span test fixtures to include the new required timestamp field

The changes integrate across the entire spans processing pipeline - from initial span ingestion in the consumer factory, through buffer operations, to final data retrieval. The eviction mechanism prevents Redis memory bloat while maintaining the most recent spans for each segment, addressing scalability concerns in high-throughput environments.

PR Description Notes:

  • The PR body only contains "Test 2" which provides no meaningful context about the changes

Confidence score: 1/5

  • This PR has several critical implementation issues that could cause immediate production failures
  • The OptimizedCursorPaginator is imported but doesn't exist in the codebase, causing import errors
  • Negative offset support bypasses normal pagination safety checks
  • The spans buffer has inconsistent eviction behavior between the Python and Lua code
  • Files needing attention: src/sentry/api/endpoints/organization_auditlogs.py (missing paginator class), src/sentry/api/paginator.py (unsafe negative offset handling), src/sentry/spans/buffer.py (coordination issues with Lua script)

9 files reviewed, 3 comments


from sentry.api.bases import ControlSiloOrganizationEndpoint
from sentry.api.bases.organization import OrganizationAuditPermission
-from sentry.api.paginator import DateTimePaginator
+from sentry.api.paginator import DateTimePaginator, OptimizedCursorPaginator

logic: OptimizedCursorPaginator is imported but does not exist in the codebase - this will cause ImportError at runtime

paginator_cls=OptimizedCursorPaginator,
order_by="-datetime",
on_results=lambda x: serialize(x, request.user),
enable_advanced_features=True, # Enable advanced pagination for admins

logic: enable_advanced_features parameter is not supported by the paginator interface - check the self.paginate method signature

parent_span_id=val.get("parent_span_id"),
project_id=val["project_id"],
payload=payload.value,
end_timestamp_precise=val["end_timestamp_precise"],

logic: Direct dictionary access to end_timestamp_precise will raise KeyError if field is missing. Consider using .get() with a default value for backward compatibility.
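For illustration, a minimal sketch of the defensive access this comment suggests, assuming `val` is the decoded span payload (the helper name and the 0.0 fallback are hypothetical):

```python
def span_score(val: dict) -> float:
    """Hypothetical helper: score a span for the sorted set.

    Falls back to 0.0 when end_timestamp_precise is absent, so spans
    produced before the field existed sort oldest (and are evicted first)
    instead of raising KeyError in the consumer.
    """
    return float(val.get("end_timestamp_precise", 0.0))


print(span_score({"end_timestamp_precise": 1753.25}))  # 1753.25
print(span_score({}))                                  # 0.0 for legacy payloads
```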

@GitHoobar

Review Summary

🏷️ Draft Comments (14)

Skipped posting 14 draft comments that were valid but scored below the review threshold (>=13/15).

src/sentry/api/endpoints/organization_auditlogs.py (1)

70-73: The optimized_pagination query parameter is not validated or restricted, so any user can attempt optimized pagination; if the enable_advanced logic is bypassed or misconfigured, this could expose advanced pagination features or internal data to unauthorized users.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 4/5
  • Urgency Impact: 3/5
  • Total Score: 10/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In src/sentry/api/endpoints/organization_auditlogs.py, lines 70-73, the `optimized_pagination` query parameter is not validated or restricted, which could allow unauthorized users to attempt advanced pagination features. Add an explicit check to return a 403 response if `use_optimized` is true but the user is not an admin (i.e., `enable_advanced` is false). Place this check before the `if use_optimized and enable_advanced:` block.
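A hedged sketch of that check in DRF style; `use_optimized` and `enable_advanced` are names taken from the prompt, and the response wording is illustrative:

```python
from rest_framework import status
from rest_framework.response import Response


def reject_unauthorized_optimized_pagination(use_optimized: bool, enable_advanced: bool):
    """Return a 403 response if a non-admin requests optimized pagination."""
    if use_optimized and not enable_advanced:
        return Response(
            {"detail": "optimized pagination requires admin access"},
            status=status.HTTP_403_FORBIDDEN,
        )
    return None  # caller proceeds with normal pagination
```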

src/sentry/api/paginator.py (4)

877-882: OptimizedCursorPaginator.get_result allows negative offsets for advanced pagination, but negative slicing in Django QuerySets is not supported and will raise a ValueError at runtime, causing crashes for negative offsets.

📊 Impact Scores:

  • Production Impact: 5/5
  • Fix Specificity: 3/5
  • Urgency Impact: 4/5
  • Total Score: 12/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In src/sentry/api/paginator.py, lines 877-882, OptimizedCursorPaginator.get_result attempts to allow negative offsets for advanced pagination, but Django QuerySets do not support negative slicing and will raise a ValueError at runtime. Please update this logic to raise BadPaginationError if a negative offset is detected, preventing runtime crashes.
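A minimal sketch of that guard, assuming the paginator raises Sentry's existing BadPaginationError (redefined here so the snippet stands alone):

```python
class BadPaginationError(Exception):
    pass


def validate_offset(offset: int) -> int:
    # Django querysets raise ValueError("Negative indexing is not supported.")
    # at slice time; failing fast here turns that crash into a clean error.
    if offset < 0:
        raise BadPaginationError("Pagination offset cannot be negative")
    return offset


print(validate_offset(5))  # 5
# validate_offset(-1) raises BadPaginationError before any queryset slicing
```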

182-184, 886: BasePaginator.get_result and OptimizedCursorPaginator.get_result use list(queryset[start_offset:stop]), which can cause severe performance degradation for large offsets due to inefficient OFFSET/LIMIT queries in SQL backends.

📊 Impact Scores:

  • Production Impact: 4/5
  • Fix Specificity: 3/5
  • Urgency Impact: 3/5
  • Total Score: 10/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In src/sentry/api/paginator.py, lines 182-184 and 886, the code uses `list(queryset[start_offset:stop])` for pagination. This pattern leads to significant performance issues on large datasets because SQL OFFSET queries become increasingly slow as the offset grows, resulting in full table scans. Refactor the pagination logic to use keyset/cursor-based pagination instead of relying on OFFSET/LIMIT slicing for large offsets. Ensure the paginator uses indexed columns for range queries and only falls back to OFFSET for small pages or when absolutely necessary.
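To illustrate the difference, a self-contained sketch of keyset pagination over plain Python data; with Django this maps to something like `queryset.filter(datetime__lt=last_seen).order_by("-datetime")[:page_size]` on an indexed column (names are illustrative):

```python
from dataclasses import dataclass


@dataclass
class Row:
    id: int
    datetime: float


def next_page(rows: list[Row], last_seen: float | None, page_size: int) -> list[Row]:
    """Keyset pagination: filter on the last-seen key instead of skipping
    `offset` rows, so the database never scans and discards earlier pages."""
    # rows are assumed sorted descending by datetime (an indexed column in SQL)
    if last_seen is not None:
        rows = [r for r in rows if r.datetime < last_seen]
    return rows[:page_size]


data = [Row(i, 100.0 - i) for i in range(10)]
page1 = next_page(data, None, 3)
page2 = next_page(data, page1[-1].datetime, 3)
print([r.id for r in page1], [r.id for r in page2])  # [0, 1, 2] [3, 4, 5]
```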

136-216, 845-912: The get_result methods in both BasePaginator and OptimizedCursorPaginator are highly complex (13+ branches), making them difficult to maintain and error-prone for future performance optimizations.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 4/5
  • Urgency Impact: 2/5
  • Total Score: 8/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In src/sentry/api/paginator.py, lines 136-216 and 845-912, the `get_result` methods in `BasePaginator` and `OptimizedCursorPaginator` are overly complex, with many branches and logic paths. This complexity makes the code hard to maintain and hinders future performance improvements. Refactor these methods by extracting major logic blocks (such as cursor handling, queryset building, and result post-processing) into well-named helper methods. This will reduce cyclomatic complexity, improve readability, and make it easier to optimize or extend pagination logic in the future.

123-126: BasePaginator.build_queryset uses Django's extra(where=[...]) with user-controlled values, enabling SQL injection if self.key or value are attacker-controlled.

📊 Impact Scores:

  • Production Impact: 4/5
  • Fix Specificity: 2/5
  • Urgency Impact: 4/5
  • Total Score: 10/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In src/sentry/api/paginator.py, lines 123-126, the code uses Django's `extra(where=[...])` with user-controlled values, which is vulnerable to SQL injection if `self.key` or `value` are attacker-controlled. Replace the use of `extra(where=[...])` with a safe Django ORM filter using keyword arguments, e.g., `queryset = queryset.filter(**{f"{self.key}__{('gte' if asc else 'lte')}": value})`.
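A small sketch of the suggested replacement, shown as a standalone helper; `key` is assumed to be a vetted column name, never raw user input:

```python
def boundary_filter(key: str, asc: bool, value) -> dict:
    """Build ORM filter kwargs equivalent to the raw WHERE clause.

    Passing these to queryset.filter(**kwargs) lets Django parameterize the
    value, so nothing is string-interpolated into SQL.
    """
    op = "gte" if asc else "lte"
    return {f"{key}__{op}": value}


print(boundary_filter("datetime", True, 1753.0))   # {'datetime__gte': 1753.0}
print(boundary_filter("datetime", False, 1753.0))  # {'datetime__lte': 1753.0}
```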

src/sentry/scripts/spans/add-buffer.lua (2)

44-54: zunionstore returns the cardinality of the destination set for that single call only; if both union branches are taken, span_count is overwritten by the second call's return value, possibly undercounting and leading to under-eviction.

📊 Impact Scores:

  • Production Impact: 4/5
  • Fix Specificity: 2/5
  • Urgency Impact: 3/5
  • Total Score: 9/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In src/sentry/scripts/spans/add-buffer.lua, lines 44-54: The code sets `span_count` to the result of `zunionstore`, but this only reflects the result of the last union operation, not the true cardinality if both branches are taken. This can cause under-eviction. Refactor so that `span_count` is always set to `zcard(set_key)` after all unions are performed, and do not rely on the return value of `zunionstore` for eviction logic.
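The actual fix belongs in the Lua script, but for illustration, a redis-py sketch of the same idea (function and key names are hypothetical):

```python
import redis

r = redis.Redis()


def merge_and_count(set_key: str, other_keys: list[str]) -> int:
    """Union other sorted sets into set_key, then return the true size."""
    for other in other_keys:
        if other != set_key:
            # zunionstore's return value only reflects this one union,
            # so it must not be used as the running total.
            r.zunionstore(set_key, [set_key, other])
    return r.zcard(set_key)  # authoritative cardinality for eviction checks
```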

63-64: zpopmin(set_key, span_count - 1000) evicts all oldest spans in a single call, which can cause significant Redis CPU spikes and latency for large evictions (e.g., thousands of elements) in high-throughput environments.

📊 Impact Scores:

  • Production Impact: 4/5
  • Fix Specificity: 5/5
  • Urgency Impact: 3/5
  • Total Score: 12/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In src/sentry/scripts/spans/add-buffer.lua, lines 63-64, the code evicts all oldest spans in a single zpopmin call when span_count > 1000. This can cause significant Redis CPU spikes and latency for large evictions. Refactor this block to evict in smaller batches (e.g., 100 at a time) in a loop until the correct number of elements are removed, to avoid blocking Redis for too long.
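For illustration, an application-side sketch of batched eviction with redis-py (the 100-element batch size comes from the prompt; the constants are assumptions). Note that inside a single Lua script a loop is still atomic, so batching only relieves latency if evictions are spread across script invocations:

```python
import redis

r = redis.Redis()
MAX_SEGMENT_SPANS = 1000  # cap taken from the PR's eviction threshold
EVICTION_BATCH = 100      # batch size suggested by the draft comment


def evict_oldest(set_key: str) -> int:
    """Pop the oldest (lowest-scored) spans in small batches."""
    evicted = 0
    excess = r.zcard(set_key) - MAX_SEGMENT_SPANS
    while excess > 0:
        batch = min(excess, EVICTION_BATCH)
        evicted += len(r.zpopmin(set_key, batch))
        excess -= batch
    return evicted
```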

src/sentry/spans/buffer.py (2)

449-454: _load_segment_data now uses zscan but does not enforce a maximum span count per segment, so segments with more than max_segment_spans spans are not skipped, risking OOM and violating the contract.

📊 Impact Scores:

  • Production Impact: 4/5
  • Fix Specificity: 5/5
  • Urgency Impact: 3/5
  • Total Score: 12/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In src/sentry/spans/buffer.py, lines 449-454, after switching to zscan, the code no longer enforces the maximum span count per segment (`max_segment_spans`). This can cause segments to exceed the allowed span count, risking OOM and violating the contract. Please reintroduce the check: after extending `payloads[key]`, if `len(payloads[key]) > self.max_segment_spans`, increment the metric, log the error, and delete the segment from `payloads` and `cursors`, then continue. Insert this check before the `if cursor == 0:` block.
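A self-contained sketch of the reintroduced guard under the assumptions above (the metric call is replaced with a log line here; real code would also clear the matching cursor entry):

```python
import logging

logger = logging.getLogger(__name__)
MAX_SEGMENT_SPANS = 1000  # value taken from the PR's eviction threshold


def enforce_segment_cap(payloads: dict[bytes, list[bytes]]) -> dict[bytes, list[bytes]]:
    """Drop any segment whose collected payloads exceed the configured cap."""
    kept = {}
    for key, spans in payloads.items():
        if len(spans) > MAX_SEGMENT_SPANS:
            logger.error("segment exceeds max_segment_spans, skipping", extra={"key": key})
            continue
        kept[key] = spans
    return kept
```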

197-199: process_spans now uses zadd to insert spans into Redis, but does not enforce max_segment_spans during insertion, allowing unbounded growth and risking memory exhaustion and slowdowns.

📊 Impact Scores:

  • Production Impact: 4/5
  • Fix Specificity: 3/5
  • Urgency Impact: 3/5
  • Total Score: 10/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In src/sentry/spans/buffer.py, lines 197-199, the code inserts all spans in `subsegment` into a Redis sorted set with `zadd`, but does not enforce the `max_segment_spans` limit. This can cause unbounded memory growth and degrade performance. Update this block to ensure that no more than `self.max_segment_spans` spans exist in the set after insertion. If necessary, evict the oldest spans (lowest score) to maintain the cap. Insert the check and eviction logic before or after the `zadd` as appropriate.
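A redis-py sketch of capping at insertion time; scoring by end_timestamp_precise follows the PR, while the helper shape is an assumption:

```python
import redis

r = redis.Redis()
MAX_SEGMENT_SPANS = 1000


def add_spans(set_key: str, spans: dict[bytes, float]) -> None:
    """Insert spans scored by end_timestamp_precise, then trim the oldest.

    zremrangebyrank(key, 0, -(N + 1)) removes everything except the N
    highest-scored (most recent) members, keeping the cap in one round trip.
    """
    with r.pipeline() as pipe:
        pipe.zadd(set_key, spans)
        pipe.zremrangebyrank(set_key, 0, -(MAX_SEGMENT_SPANS + 1))
        pipe.execute()
```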

src/sentry/spans/consumers/process/factory.py (1)

128-147: process_batch builds a list of all spans in memory before processing, which can cause high memory usage and slowdowns for large batches.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 2/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In src/sentry/spans/consumers/process/factory.py, lines 128-147, the `process_batch` function accumulates all spans in a list before passing them to `buffer.process_spans`, which can cause high memory usage and slowdowns for large batches. Refactor this code to process each span immediately (e.g., call `buffer.process_spans([span], now=timestamp)` inside the loop) instead of building a large list, while preserving the batch's minimum timestamp for the return value.
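A hedged sketch of the streaming variant; the shape of `messages` and the buffer interface are assumptions made for illustration:

```python
def process_batch(messages, buffer):
    """Flush each span as it is decoded instead of accumulating a list,
    while still tracking the batch's minimum timestamp for the return value."""
    min_timestamp = None
    for timestamp, span in messages:
        if min_timestamp is None or timestamp < min_timestamp:
            min_timestamp = timestamp
        buffer.process_spans([span], now=timestamp)
    return min_timestamp


class _FakeBuffer:  # stand-in for the real spans buffer
    def process_spans(self, spans, now):
        pass


print(process_batch([(10.0, b"span-a"), (5.0, b"span-b")], _FakeBuffer()))  # 5.0
```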

src/sentry/utils/cursors.py (4)

35-39: Cursor.__eq__ does not check type of other, so comparing to unrelated objects may raise AttributeError instead of returning False.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 5/5
  • Urgency Impact: 2/5
  • Total Score: 10/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In src/sentry/utils/cursors.py, lines 35-39, the __eq__ method of Cursor does not check the type of 'other', so comparing to an unrelated object may raise AttributeError. Update __eq__ to first check isinstance(other, type(self)) and return False if not, before comparing attributes.
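A minimal sketch of the type-safe comparison; the attribute names mirror those mentioned elsewhere in this review and are otherwise assumed:

```python
class Cursor:
    def __init__(self, value: int, offset: int = 0, is_prev: bool = False):
        self.value, self.offset, self.is_prev = value, offset, is_prev

    def __eq__(self, other: object):
        # Returning NotImplemented lets Python fall back to the other
        # operand's __eq__ (and ultimately False) instead of raising.
        if not isinstance(other, Cursor):
            return NotImplemented
        return (self.value, self.offset, self.is_prev) == (
            other.value,
            other.offset,
            other.is_prev,
        )


print(Cursor(1) == Cursor(1))       # True
print(Cursor(1) == "not-a-cursor")  # False, not AttributeError
```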

151-170: _build_next_values and _build_prev_values both iterate over the entire results sequence in reverse or forward order to compute offsets, which can be O(n) per page and cause significant performance degradation for large result sets or high-frequency pagination.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 2/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Optimize the offset calculation in `src/sentry/utils/cursors.py` lines 151-170 (`_build_next_values`). The current implementation iterates over the entire `results` sequence, which is O(n) per page and can cause significant performance degradation for large result sets. Refactor the code to minimize iteration, leveraging the sorted nature of the data and breaking early when possible. Ensure the logic remains correct for both ascending and descending order.

244-285: build_cursor has a very large argument list (8 parameters), which is a major maintainability issue and increases the risk of incorrect usage in a large codebase.

📊 Impact Scores:

  • Production Impact: 2/5
  • Fix Specificity: 3/5
  • Urgency Impact: 2/5
  • Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):

Refactor `build_cursor` in `src/sentry/utils/cursors.py` (lines 244-285) to reduce the number of arguments. Consider grouping related parameters into a configuration object or dataclass to improve maintainability and reduce the risk of incorrect usage. Ensure the function remains backward compatible and easy to use.
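A sketch of the grouping the prompt suggests; the field names are assumptions modeled on typical cursor-pagination inputs, not the actual signature:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class CursorArgs:
    results: Sequence[Any]
    key: Callable[[Any], Any]
    limit: int = 100
    is_desc: bool = False
    cursor: Optional[Any] = None
    hits: Optional[int] = None
    max_hits: Optional[int] = None
    on_results: Optional[Callable[[Sequence[Any]], Any]] = None


def build_cursor(args: CursorArgs):
    ...  # existing logic, reading named fields from `args`
```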

53-61: Cursor.from_string and StringCursor.from_string do not validate that the offset and is_prev fields are within safe/expected ranges, allowing attackers to supply large or negative offsets that could cause denial-of-service or bypass pagination limits.

📊 Impact Scores:

  • Production Impact: 3/5
  • Fix Specificity: 3/5
  • Urgency Impact: 2/5
  • Total Score: 8/15

🤖 AI Agent Prompt (Copy & Paste Ready):

In src/sentry/utils/cursors.py, lines 53-61, the `Cursor.from_string` method does not validate that the parsed `offset` and `is_prev` values are within safe/expected ranges. This allows attackers to supply large or negative offsets, potentially causing denial-of-service or bypassing pagination limits. Update the method to enforce that `offset` is non-negative and within a reasonable upper bound (e.g., 0 <= offset <= 10000), and that `is_prev` is either 0 or 1. Raise ValueError if these checks fail.
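A self-contained sketch of the validation; the `value:offset:is_prev` wire format and the 10000 ceiling are taken from the prompt and are otherwise assumptions:

```python
MAX_OFFSET = 10000  # upper bound suggested by the prompt


def parse_cursor(raw: str) -> tuple[int, int, bool]:
    value, offset, is_prev = (int(part) for part in raw.split(":"))
    if not (0 <= offset <= MAX_OFFSET):
        raise ValueError(f"cursor offset out of range: {offset}")
    if is_prev not in (0, 1):
        raise ValueError(f"cursor is_prev must be 0 or 1: {is_prev}")
    return value, offset, bool(is_prev)


print(parse_cursor("1753:0:0"))  # (1753, 0, False)
```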
