Conversation
Pull request overview
This PR adds lightweight instrumentation to the gettrace endpoint to identify performance bottlenecks in the request processing pipeline. The instrumentation is focused on the _process_results function, which is hypothesized to be a bottleneck, while also adding spans to other key functions for high-level visibility.
Changes:
- Added @with_span(op="function") decorators to _build_query, _process_results, and _query_item_group for Sentry APM tracing
- Added detailed timing measurements within _process_results to track time spent adding and sorting attributes per row (see the sketch below)
- Wrapped the response assembly logic in _execute with a span for visibility
- Added span data to record rows processed and cumulative time spent on attribute processing operations
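To make the changes above concrete, here is a minimal sketch of the shape of the instrumentation. It uses sentry_sdk.start_span directly rather than the project's with_span decorator, and build_attributes / build_item are hypothetical stand-ins for the real attribute-building and GetTraceResponse.Item construction steps, so treat it as illustrative rather than the actual diff.

```python
import time
from operator import attrgetter
from types import SimpleNamespace

import sentry_sdk


def build_attributes(row):
    # Hypothetical stand-in for the per-row "add attributes" step.
    return {
        name: SimpleNamespace(key=SimpleNamespace(name=name), value=value)
        for name, value in row.items()
    }


def build_item(sorted_attributes):
    # Hypothetical stand-in for GetTraceResponse.Item construction.
    return {"attributes": sorted_attributes}


def _process_results(rows):
    with sentry_sdk.start_span(op="function", description="_process_results") as span:
        add_seconds = 0.0
        sort_seconds = 0.0
        items = []

        for row in rows:
            t0 = time.perf_counter()
            attributes = build_attributes(row)
            t1 = time.perf_counter()
            items.append(
                build_item(sorted(attributes.values(), key=attrgetter("key.name")))
            )
            # t2 also covers item construction, which is what one review
            # comment below is about.
            t2 = time.perf_counter()

            add_seconds += t1 - t0
            sort_seconds += t2 - t1

        # Aggregated timings are attached as span data rather than per-row spans.
        span.set_data("rows_processed", len(rows))
        span.set_data("add_attributes_seconds", add_seconds)
        span.set_data("sort_attributes_seconds", sort_seconds)
        return items
```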
@@ -527,8 +541,23 @@ def add_attribute(key: str, value: Any) -> None:
        key=attrgetter("key.name"),
    ),
)

# End timing sorting
t2 = time.perf_counter()
The timing measurement for "sort_attributes_seconds" includes not only the sorting operation but also the construction of the GetTraceResponse.Item object. If the goal is to specifically measure the sorting performance, consider moving t2 = time.perf_counter() immediately after the sorted() operation completes, before the GetTraceResponse.Item construction. This would provide more accurate timing for the sorting operation alone versus the entire item construction process.
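A toy illustration of that suggestion, using plain-Python stand-ins instead of the real attribute and GetTraceResponse.Item types so the snippet is self-contained; the only point is where t2 is captured:

```python
import time
from operator import attrgetter
from types import SimpleNamespace

attributes = {
    "b": SimpleNamespace(key=SimpleNamespace(name="b"), value=2),
    "a": SimpleNamespace(key=SimpleNamespace(name="a"), value=1),
}

t1 = time.perf_counter()
sorted_attributes = sorted(attributes.values(), key=attrgetter("key.name"))
t2 = time.perf_counter()  # end of the sort timing, before item construction
item = {"attributes": sorted_attributes}  # stand-in for GetTraceResponse.Item(...)
sort_attributes_seconds = t2 - t1
print(sort_attributes_seconds, [a.key.name for a in item["attributes"]])
```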
Protobuf message construction is negligible overhead
)

# End timing sorting
t2 = time.perf_counter()
Why don't we have 2 spans here instead of manually getting timings for this? That's what spans are for.
We do the attribute adding and sorting for every row, so if we had 2 spans here, we'd have 2 spans for every row, which is a lot of spans and would bloat the trace waterfall. We also don't care about the adding and sorting time for individual rows; we care about the aggregated times across all rows in the returned data.
A span represents a contiguous operation, but the loop interleaves attribute addition and attribute sorting across all the rows.
I could create two spans after the loop completes and manually set timestamps to reflect the aggregated durations, but that felt hacky.
Then separate the two operations into two different loops. It's important to be able to visualize where the time is spent in the trace waterfall, so we need spans for this. Attributes with some timing data don't fully get us there, and people will still have to ask where the time is spent.
Plus, if sorting ends up being a problem, it's easier to remove too.
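A rough sketch of what that restructuring could look like, assuming two passes over the rows, each wrapped in its own span via sentry_sdk.start_span; parse_row and build_item here are hypothetical helpers, not functions from this PR:

```python
import sentry_sdk


def parse_row(row):
    # Hypothetical stand-in for the "add attributes" pass: build the
    # attribute dict for one row.
    return dict(row)


def build_item(attributes):
    # Hypothetical stand-in for the "sort attributes + build item" pass.
    return {"attributes": sorted(attributes.items())}


def process_results(rows):
    # First pass: attribute building, visible as its own span.
    with sentry_sdk.start_span(op="function", description="add_attributes"):
        parsed_rows = [parse_row(row) for row in rows]

    # Second pass: sorting and item construction, visible as a separate span.
    with sentry_sdk.start_span(op="function", description="sort_attributes"):
        return [build_item(attributes) for attributes in parsed_rows]


print(process_results([{"b": 2, "a": 1}]))
```

Two sibling spans under _process_results would then show directly in the waterfall how the total time splits between the two passes.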
743ee28 to e1313eb
This reverts commit e1313eb.
385d9ba to aac6a31
        value=attribute_value,
    )
# First pass: parse rows and build attribute dicts
parsed_rows: list[tuple[str, Timestamp, dict[str, GetTraceResponse.Item.Attribute]]] = []
alternatively, we can just create the GetTraceResponse.Item with unsorted attributes, and then sort them later. I decided against this approach because Protobuf's RepeatedCompositeContainer.sort() internally copies elements to a Python list, sorts it, clears the container, and re-adds them all. So each attribute gets inserted into the protobuf container twice — once during construction (unsorted), and once during the sort (after clear + re-add).
Is that extra insertion cost going to be dwarfed by the sorting overhead itself? Maybe. Best to keep things the same as before (a single loop).
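For concreteness, here is a sketch of the single-loop, sort-before-construction ordering being defended here, with plain-Python stand-ins for the protobuf classes. The rejected alternative would construct the item first and then call sort(key=...) on its repeated attributes field, which copies, clears, and re-inserts every element.

```python
from operator import attrgetter
from types import SimpleNamespace


def make_attribute(name, value):
    # Stand-in for GetTraceResponse.Item.Attribute.
    return SimpleNamespace(key=SimpleNamespace(name=name), value=value)


def make_item(item_id, attributes):
    # Stand-in for GetTraceResponse.Item; the real constructor copies the
    # attributes into a repeated protobuf field exactly once.
    return {"id": item_id, "attributes": list(attributes)}


def process(rows):
    items = []
    for item_id, raw in rows:
        attrs = {name: make_attribute(name, value) for name, value in raw.items()}
        # Sort once, before construction, instead of sorting the repeated
        # field after the item is built.
        items.append(
            make_item(item_id, sorted(attrs.values(), key=attrgetter("key.name")))
        )
    return items


print(process([("row-1", {"b": 2, "a": 1})]))
```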
aac6a31 to 3a7151d
Approach:
- Goal: get a high-level signal on where the slowdown is (e.g., validate whether post-processing is the issue), not detailed diagnostics (e.g., the longest time it took to process a row, or how many attributes that row had).
- Plan to iterate across multiple PRs to progressively narrow down the bottleneck.
- Avoid instrumenting every part of the endpoint to prevent trace waterfall bloat.
- Hypothesis: _process_results is the likely bottleneck, so instrumentation is focused there.
- Instrumentation is intentionally lightweight until we have stronger evidence.
- Some instrumentation may be removed as we rule out potential bottlenecks.
execute
├── _query_item_group ×N (one per item)
│ ├── _build_query
│ ├── run_query
│ └── _process_results
│ ├── add_attributes
│ └── sort_attributes
└── assemble_response
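For reference, a with_span decorator of the kind this hierarchy relies on is typically a thin wrapper around sentry_sdk.start_span; the sketch below shows the general pattern only, not this project's implementation.

```python
import functools

import sentry_sdk


def with_span(op="function"):
    # Each decorated call opens a child span of whatever span is currently
    # active, which is what produces the nesting shown above.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with sentry_sdk.start_span(op=op, description=func.__name__):
                return func(*args, **kwargs)

        return wrapper

    return decorator


@with_span(op="function")
def _build_query():
    return {}  # placeholder body
```

Because _build_query, run_query, and _process_results are called from inside _query_item_group, their spans nest under its span automatically.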