instrument gettrace endpoint more#7741

Merged
xurui-c merged 5 commits into master from rachel/instrument on Feb 17, 2026

Conversation

Member

xurui-c commented Feb 17, 2026

Approach:

  • Goal: get a high-level signal on where the slowdown is (e.g., validate whether post-processing is the issue), not detailed diagnostics (e.g., the longest time it took to process a single row, or how many attributes such a row has).

  • Plan to iterate across multiple PRs to progressively narrow down the bottleneck.

  • Avoid instrumenting every part of the endpoint to prevent trace waterfall bloat.

  • Hypothesis: _process_results is the likely bottleneck, so instrumentation is focused there.

  • Instrumentation is intentionally lightweight until we have stronger evidence.

  • Some instrumentation may be removed as we rule out potential bottlenecks.

    execute
    ├── _query_item_group ×N (one per item)
    │   ├── _build_query
    │   ├── run_query
    │   └── _process_results
    │       ├── add_attributes
    │       └── sort_attributes
    └── assemble_response
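
For orientation, here is a rough sketch of how that span layout could be wired up. This is not the PR's actual code: the with_span helper below is a local stand-in (the PR uses an existing decorator whose import path isn't shown in this thread), and the function bodies are placeholders.

```python
import functools
from typing import Any, Callable

import sentry_sdk


def with_span(op: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Stand-in for the decorator the PR applies; wraps a call in a Sentry span."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            with sentry_sdk.start_span(op=op, description=func.__name__):
                return func(*args, **kwargs)

        return wrapper

    return decorator


@with_span(op="function")
def _build_query(item: dict) -> str:
    return "SELECT 1"  # placeholder query


@with_span(op="function")
def _process_results(rows: list[dict]) -> list[dict]:
    return rows  # placeholder: attribute adding/sorting happens here


@with_span(op="function")
def _query_item_group(item: dict) -> list[dict]:
    query = _build_query(item)
    rows = [{"query": query}]  # placeholder for run_query
    return _process_results(rows)


def execute(items: list[dict]) -> list[list[dict]]:
    results = [_query_item_group(item) for item in items]
    with sentry_sdk.start_span(op="function", description="assemble_response"):
        return results  # placeholder for assemble_response
```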

@xurui-c xurui-c marked this pull request as ready for review February 17, 2026 16:36
@xurui-c xurui-c requested a review from a team as a code owner February 17, 2026 16:36
Copilot AI review requested due to automatic review settings February 17, 2026 16:36

Copilot AI left a comment

Pull request overview

This PR adds lightweight instrumentation to the gettrace endpoint to identify performance bottlenecks in the request processing pipeline. The instrumentation is focused on the _process_results function, which is hypothesized to be a bottleneck, while also adding spans to other key functions for high-level visibility.

Changes:

  • Added @with_span(op="function") decorators to _build_query, _process_results, and _query_item_group functions for Sentry APM tracing
  • Added detailed timing measurements within _process_results to track time spent adding attributes and sorting attributes per row
  • Wrapped the response assembly logic in _execute with a span for visibility
  • Added span data to record rows processed and cumulative time spent on attribute processing operations
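
As a rough illustration of the aggregated-timing approach described above (not the PR's actual implementation; the row shape, variable names, and span-data keys here are assumptions):

```python
import time

import sentry_sdk


def _process_results(rows: list[dict]) -> list[list[tuple[str, object]]]:
    span = sentry_sdk.get_current_span()
    add_attributes_seconds = 0.0
    sort_attributes_seconds = 0.0
    processed = []

    for row in rows:
        t0 = time.perf_counter()
        attributes = dict(row)                      # placeholder for the add_attribute calls
        t1 = time.perf_counter()
        add_attributes_seconds += t1 - t0

        sorted_attrs = sorted(attributes.items())   # placeholder for the key.name sort
        t2 = time.perf_counter()
        sort_attributes_seconds += t2 - t1

        processed.append(sorted_attrs)

    if span is not None:
        # Record cumulative figures across all rows so the trace stays compact.
        span.set_data("rows_processed", len(rows))
        span.set_data("add_attributes_seconds", add_attributes_seconds)
        span.set_data("sort_attributes_seconds", sort_attributes_seconds)

    return processed
```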


Comment on lines 536 to 546
@@ -527,8 +541,23 @@ def add_attribute(key: str, value: Any) -> None:
key=attrgetter("key.name"),
),
)

# End timing sorting
t2 = time.perf_counter()
Copilot AI Feb 17, 2026

The timing measurement for "sort_attributes_seconds" includes not only the sorting operation but also the construction of the GetTraceResponse.Item object. If the goal is to specifically measure the sorting performance, consider moving t2 = time.perf_counter() immediately after the sorted() operation completes, before the GetTraceResponse.Item construction. This would provide more accurate timing for the sorting operation alone versus the entire item construction process.

Member Author

Protobuf message construction is negligible overhead

)

# End timing sorting
t2 = time.perf_counter()
Contributor

Why don't we have 2 spans here instead of manually getting timings for this? That's what spans are for.

Member Author

xurui-c commented Feb 17, 2026

We do the attribute adding and sorting for every row, so if we had 2 spans here, we'd have 2 spans for every row, which is a lot of spans and would bloat the trace waterfall. We also don't care about the adding and sorting time for individual rows; we care about the aggregated times across all rows in the returned data.

A span represents a contiguous operation, but the loop interleaves attribute addition and attribute sorting across all the rows.

I could create two spans after the loop completes and manually set timestamps to reflect the aggregated durations, but that felt hacky.

Contributor

Then separate the 2 operations into 2 different loops. It's important to be able to visualize where the time is spent in the trace waterfall, so we need spans for this. Having span attributes with some timing data doesn't fully get us there, and people will still have to ask where the time is spent.

Plus, if sorting ends up being a problem, easier to remove too.
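
A sketch of what's being proposed, with illustrative data shapes (again, not the actual endpoint code): each phase gets its own loop and its own span, so the split shows up directly in the waterfall.

```python
import sentry_sdk


def _process_results(rows: list[dict]) -> list[list[tuple[str, object]]]:
    with sentry_sdk.start_span(op="function", description="add_attributes"):
        # First loop: build the attribute dict for every row.
        attribute_dicts = [dict(row) for row in rows]

    with sentry_sdk.start_span(op="function", description="sort_attributes"):
        # Second loop: sort each row's attributes; easy to delete later if
        # sorting turns out to be the problem.
        return [sorted(attrs.items()) for attrs in attribute_dicts]
```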

Member Author

valid reasons 👍

This reverts commit e1313eb.
value=attribute_value,
)
# First pass: parse rows and build attribute dicts
parsed_rows: list[tuple[str, Timestamp, dict[str, GetTraceResponse.Item.Attribute]]] = []
Member Author

xurui-c commented Feb 17, 2026

alternatively, we can just create the GetTraceResponse.Item with unsorted attributes, and then sort them later. I decided against this approach because Protobuf's RepeatedCompositeContainer.sort() internally copies elements to a Python list, sorts it, clears the container, and re-adds them all. So each attribute gets inserted into the protobuf container twice — once during construction (unsorted), and once during the sort (after clear + re-add).

Is this double insertion going to be dwarfed by the sorting overhead anyway? Maybe. Best to keep things the same as before (a single loop).
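
For context, the rejected alternative (build the message with unsorted attributes, then sort the repeated field in place) would look roughly like this. The snippet uses descriptor_pb2 as a stand-in message because GetTraceResponse.Item's exact fields aren't shown in this thread, and the double-insertion behavior is the characterization given above, not something the snippet verifies.

```python
from operator import attrgetter

from google.protobuf import descriptor_pb2

# Stand-in for constructing an Item with unsorted attributes.
proto = descriptor_pb2.FileDescriptorProto()
for name in ("zeta", "alpha", "mid"):
    proto.message_type.add(name=name)  # first insertion, unsorted

# Sorting the repeated composite field afterwards: per the comment above, this
# copies the elements out, sorts them, clears the field, and re-adds them.
proto.message_type.sort(key=attrgetter("name"))
print([m.name for m in proto.message_type])  # ['alpha', 'mid', 'zeta']
```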

@xurui-c xurui-c requested a review from phacops February 17, 2026 21:46
@xurui-c xurui-c merged commit 72eae32 into master Feb 17, 2026
34 checks passed
@xurui-c xurui-c deleted the rachel/instrument branch February 17, 2026 22:50