
ref(explorer): update issue tags and timeseries by time range#106316

Closed
aliu39 wants to merge 12 commits intomasterfrom
aliu/tags-overview

Conversation

@aliu39
Member

@aliu39 aliu39 commented Jan 14, 2026

Updates the issue tag distribution and timeseries to respect the optional start/end params passed by the agent. It achieves this by hitting the events-facets endpoint for topValues and running an aggregate table query for the total event count. The response is reshaped and passed to the same autofix tags overview util. If start/end isn't passed, we still run the original tagstore query, as it's more performant.

The number of facets (tags) returned is limited to 1,000, which matches the tagstore limit.

TODO: update the tool code or description?
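The reshaping step described above could look roughly like this (a minimal sketch; `reshape_facets_for_overview` is a hypothetical helper, and the field names mirror an events-facets style response rather than the actual endpoint schema):

```python
def reshape_facets_for_overview(facets, total_count):
    """Convert an events-facets style response plus a total event count into
    the {key: {"top_values", "count"}} shape a tags-overview util expects.

    `facets` is assumed to look like:
        [{"key": "browser", "topValues": [{"value": "Chrome", "count": 70}, ...]}, ...]
    """
    tag_keys = {}
    for facet in facets:
        top_values = [
            {"value": tv["value"], "count": tv["count"]}
            for tv in facet.get("topValues", [])
        ]
        # Use the time-range total as the key count; this leaves room for an
        # "other" bucket when the top values don't cover every event.
        tag_keys[facet["key"]] = {"top_values": top_values, "count": total_count}
    return tag_keys
```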

@aliu39 aliu39 requested a review from a team as a code owner January 14, 2026 20:58
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jan 14, 2026
)
)

return get_all_tags_overview(group, tag_keys=tag_keys)
Contributor

does this replicate the "other" value that the tagstore endpoint does? we should make sure the output formats are the same in general

Member Author

yeah, "other" is just the difference between TagKey.count and sum(topValues.count). By using the total event count of the time range, I'm leaving room for "other" (a possible overestimate if not all events contain the key), but if I used sum(topValues.count) there would never be an "other".
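A sketch of that arithmetic (not the actual tagstore code): the "other" bucket is whatever the chosen denominator leaves uncovered by the top values, so the choice of denominator decides whether "other" can ever be non-zero:

```python
def other_bucket(denominator_count, top_value_counts):
    """Events not represented by any top value; clamped at zero because the
    denominator may overestimate (not every event carries the tag key)."""
    return max(0, denominator_count - sum(top_value_counts))

# With the time-range total event count as the denominator, "other" can be
# non-zero: other_bucket(100, [70, 20]) == 10.
# With sum(topValues.count) as the denominator, "other" is always zero:
# other_bucket(90, [70, 20]) == 0.
```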

Member Author

@roaga the current autofix tag percentages use sum(count) for value in top_values as the denominator. The overall key count isn't used for anything except a display in the Seer-formatted output. In most cases this sum is close enough to the key's count, so we can use it as an approximation or just omit it. Does that make sense?

Another approximation would be the total event count for the time range

), f"Expected total count {expected_total}, got {total_count}"


class TestGetGroupTagsOverview(APITestCase, SnubaTestCase):
Contributor

would like to see tests for cases where there are many values and we would get "other", for cases with "(empty)" values, and also generally verifying that both approaches return the same thing when the time range is the same

@aliu39 aliu39 requested a review from Zylphrex January 15, 2026 23:30
cursor[bot] (comment marked as outdated)

Comment on lines +467 to +471
def get_retention_boundary(organization: Organization, has_timezone: bool) -> datetime:
"""Get the minimum datetime within retention, based on current time."""
retention_days = quotas.backend.get_event_retention(organization=organization) or 90
now = datetime.now(UTC) if has_timezone else datetime.now(UTC).replace(tzinfo=None)
return now - timedelta(days=retention_days)
Member

The event retention actually changes based on the event type now. Is this specifically for errors?

Member Author

from the get_event_retention function, it looks like this is the org-level policy, not for any specific event type. By event type do you mean the data category, or errors vs. performance?

        Returns the retention for events in the given organization in days.
        Returns ``None`` if events are to be stored indefinitely.

        :param organization: The organization model.
        :param category: Return the retention policy for this data category.
                         If this is not given, return the org-level policy.

start_dt = datetime.fromisoformat(start) if start else None
end_dt = datetime.fromisoformat(end) if end else None
start_dt, end_dt = get_group_date_range(group, organization, start_dt, end_dt)
if start_dt >= end_dt:
Member

Is this possible in practice? If so, when does this happen? Does it make sense to just swap the two and add some buffer to force it into a valid range?

Logging a warning is fine too, just trying to understand why this exists.

Member Author

it happens when the passed start/end are both outside the retention range, or when first seen and last seen are both out of range. I think I'd rather have the util raise in these cases; updating..
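One way the util could raise here (a sketch; the exception type, function name, and message are illustrative, not the actual change):

```python
from datetime import datetime


def validate_date_range(start_dt: datetime, end_dt: datetime) -> None:
    """Raise instead of silently proceeding when retention clamping has
    collapsed the range, e.g. both endpoints fell outside retention."""
    if start_dt >= end_dt:
        raise ValueError(
            f"Invalid date range after retention clamping: {start_dt} >= {end_dt}"
        )
```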

selected_period, selected_delta, interval = p, d, i
break
stats_period = stats_period or "90d"
selected_period = selected_period or "90d"
Member

Defaulting to 90d seems very expensive. Is this necessary?

Member Author

this line is mainly for the type checker and shouldn't happen in practice with retention clamping. selected_period is the minimum period (24h, 7d, 14d, 30d, 90d) that's >= end_dt - start_dt, so it would only be 90d if we passed a really long date range
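That selection logic can be sketched as picking the smallest candidate period that covers the requested delta (candidate values taken from the comment above; `select_period` is illustrative, and the fallback mirrors the `or "90d"` line):

```python
from datetime import timedelta

# Candidate stats periods, smallest first.
PERIODS = [
    ("24h", timedelta(hours=24)),
    ("7d", timedelta(days=7)),
    ("14d", timedelta(days=14)),
    ("30d", timedelta(days=30)),
    ("90d", timedelta(days=90)),
]


def select_period(delta):
    """Smallest stats period that is >= the requested range; with retention
    clamped to 90 days, the fallback should be unreachable in practice."""
    for period, period_delta in PERIODS:
        if period_delta >= delta:
            return period
    return "90d"  # type-checker fallback for over-long ranges
```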

Comment on lines +959 to +971
# Aggregate query for total events with the tag key in the time range.
# TODO: assess performance, parallelize or find an alternative
# As an estimate, could make a single query and no has filter
count_result = execute_table_query(
org_id=organization.id,
dataset=dataset,
fields=["count()"],
query=f"issue:{group.qualified_short_id} has:{key}",
project_ids=[group.project_id],
start=start,
end=end,
per_page=1,
)
Member

This is a very expensive way to query tag frequencies as you'll be reading the tags column (which can be huge) repeatedly. Looks like the max is 100? We should consider other ways of doing this.
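One cheaper alternative, along the lines of the TODO in the snippet above ("could make a single query and no has filter"): run a single count query for the whole time range and approximate each key's count from data already in hand (a sketch; `approximate_key_counts` is hypothetical, and it trades per-key accuracy for a single scan):

```python
def approximate_key_counts(facets, total_event_count):
    """Approximate each key's event count without per-key `has:<key>` count
    queries: sum(topValues.count) is a lower bound, and the shared time-range
    total is an upper bound; take the lower bound, capped at the total."""
    counts = {}
    for facet in facets:
        top_sum = sum(tv["count"] for tv in facet.get("topValues", []))
        counts[facet["key"]] = min(top_sum, total_event_count)
    return counts
```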

end=end,
per_page=1,
)
total_count = (count_result or {}).get("data", [{}])[0].get("count()", 0)
Contributor

Bug: The code may raise an IndexError when execute_table_query returns a response with an empty data list, as [0] is accessed on an empty list.
Severity: MEDIUM

Suggested Fix

Modify the logic to safely handle an empty data list. First, retrieve the list, then check if it's non-empty before accessing its first element. For example: data = (count_result or {}).get("data") followed by total_count = data[0].get("count()", 0) if data else 0.
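That fix as a standalone helper (a sketch of the safe access pattern; `extract_total_count` is illustrative, and `count_result` mirrors the shape returned by the table query):

```python
def extract_total_count(count_result):
    """Safely pull count() out of a table-query result; tolerates a None
    result, a missing "data" key, and an empty data list."""
    data = (count_result or {}).get("data") or []
    return data[0].get("count()", 0) if data else 0
```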

Location: src/sentry/seer/explorer/tools.py#L972

Potential issue: The line `total_count = (count_result or {}).get("data",
[{}])[0].get("count()", 0)` can raise an `IndexError`. This occurs when the
`execute_table_query` function returns a result where the "data" key contains an empty
list, such as `{"data": []}`. This can happen when a query for a tag key finds no
matching events. The code attempts to access the first element of this empty list,
causing the error. While the exception is caught and results in a graceful degradation
where `tags_overview` becomes `None`, it is still an unexpected runtime error that
impacts reliability.


Contributor

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


end=end,
per_page=1,
)
total_count = (count_result or {}).get("data", [{}])[0].get("count()", 0)
Contributor

Empty data array causes IndexError crash

Low Severity

The expression (count_result or {}).get("data", [{}])[0] will raise IndexError if count_result contains {"data": []} (empty list). The default [{}] is only used when the "data" key is missing, not when it exists but is empty. If execute_table_query ever returns an empty data array, this line crashes.


@aliu39 aliu39 marked this pull request as draft January 23, 2026 00:03
@aliu39 aliu39 closed this Jan 26, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Feb 11, 2026