ref(explorer): update issue tags and timeseries by time range#106316
Conversation
```python
return get_all_tags_overview(group, tag_keys=tag_keys)
```
Does this replicate the "other" value that the tagstore endpoint produces? We should make sure the output formats are the same in general.
Yeah, "other" is just the difference between `TagKey.count` and `sum(topValues.count)`. By using the total event count of the time range, I'm leaving room for "other" (a possible overestimate if not all events contain the key); if I used `sum(topValues.count)` instead, there would never be an "other".
@roaga the current autofix tag percentages use `sum(count)` over `top_values` as the denominator. The overall key count isn't used for anything except a display in Seer-formatted output. In most cases this sum is close enough to the key's count, so we can use it as an approximation or just omit it; does that make sense?
Another approximation would be the total event count for the time range.
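As a sketch of the approximation being discussed (the function name and dict field names are illustrative, not the actual response shape), the "other" bucket falls out of the total count minus the top values:

```python
def compute_other_count(total_count: int, top_values: list[dict]) -> int:
    """Estimate the "other" bucket as the total event count for the time
    range minus the sum of the top values' counts. May overestimate when
    not every event carries the tag key; using sum(topValues.count) as
    the denominator instead would always make "other" zero."""
    top_sum = sum(v.get("count", 0) for v in top_values)
    return max(total_count - top_sum, 0)

# 100 total events, top values cover 80 of them
print(compute_other_count(100, [{"count": 50}, {"count": 30}]))  # 20
```

The `max(..., 0)` clamp guards the case where the top values sum past the total, which can happen when the two numbers come from different queries.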
```python
), f"Expected total count {expected_total}, got {total_count}"

class TestGetGroupTagsOverview(APITestCase, SnubaTestCase):
```
Would like to see tests for cases where there are many values and we would get "other", and for cases with "(empty)" values; also generally verifying that both approaches return the same thing when the time range is the same.
```python
def get_retention_boundary(organization: Organization, has_timezone: bool) -> datetime:
    """Get the minimum datetime within retention, based on current time."""
    retention_days = quotas.backend.get_event_retention(organization=organization) or 90
    now = datetime.now(UTC) if has_timezone else datetime.now(UTC).replace(tzinfo=None)
    return now - timedelta(days=retention_days)
```
The event retention actually changes based on the event type now. Is this specifically for errors?
From the `get_event_retention` function, it looks like this is the org-level policy, not for any specific event type. By event type do you mean the data category, or errors vs. performance?
```
Returns the retention for events in the given organization in days.
Returns ``None`` if events are to be stored indefinitely.

:param organization: The organization model.
:param category: Return the retention policy for this data category.
    If this is not given, return the org-level policy.
```
src/sentry/seer/explorer/tools.py (Outdated)
```python
start_dt = datetime.fromisoformat(start) if start else None
end_dt = datetime.fromisoformat(end) if end else None
start_dt, end_dt = get_group_date_range(group, organization, start_dt, end_dt)
if start_dt >= end_dt:
```
Is this possible in practice? If so, when does it happen? Does it make sense to just swap the two and add some buffer to force them into a valid range?
Logging a warning is fine too; just trying to understand why this exists.
It happens when the passed start/end are both out of the retention range, or when the first and last seen are both out of range. I think I'd rather have the util raise in these cases; updating...
```python
        selected_period, selected_delta, interval = p, d, i
        break
stats_period = stats_period or "90d"
selected_period = selected_period or "90d"
```
Defaulting to 90d seems very expensive. Is this necessary?
This line is mainly for the type checker and shouldn't happen in practice with retention clamping. `selected_period` is the minimum period (24h, 7d, 14d, 30d, 90d) that is >= `end_dt - start_dt`, so it would only be 90d if we pass a really long date range.
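The selection logic described above reads roughly like this (the candidate list comes from the comment; interval values are omitted for brevity, and names are illustrative):

```python
from datetime import datetime, timedelta

# Candidate periods, smallest first.
PERIODS = [
    ("24h", timedelta(hours=24)),
    ("7d", timedelta(days=7)),
    ("14d", timedelta(days=14)),
    ("30d", timedelta(days=30)),
    ("90d", timedelta(days=90)),
]

def select_stats_period(start_dt: datetime, end_dt: datetime) -> str:
    """Pick the smallest predefined period covering end_dt - start_dt.
    The trailing "90d" fallback exists mainly for the type checker and
    for ranges longer than 90 days, which retention clamping should
    rule out in practice."""
    span = end_dt - start_dt
    for period, delta in PERIODS:
        if delta >= span:
            return period
    return "90d"
```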
```python
# Aggregate query for total events with the tag key in the time range.
# TODO: assess performance, parallelize or find an alternative
# As an estimate, could make a single query and no has filter
count_result = execute_table_query(
    org_id=organization.id,
    dataset=dataset,
    fields=["count()"],
    query=f"issue:{group.qualified_short_id} has:{key}",
    project_ids=[group.project_id],
    start=start,
    end=end,
    per_page=1,
)
```
This is a very expensive way to query tag frequencies, as you'll be reading the tags column (which can be huge) repeatedly. Looks like the max is 100? We should consider other ways of doing this.
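One cheaper alternative, hinted at by the code's own TODO ("could make a single query and no has filter"): run one issue-wide total-count query and approximate each key's count from its top values, capped by that total. Names here are illustrative, not the actual implementation:

```python
def estimate_key_counts(
    total_events: int, top_values_by_key: dict[str, list[dict]]
) -> dict[str, int]:
    """Approximate each tag key's event count as the sum of its top
    values' counts, capped by a single issue-wide total, instead of
    issuing one has:<key> count query per key (up to ~100 scans of the
    tags column)."""
    return {
        key: min(sum(v.get("count", 0) for v in values), total_events)
        for key, values in top_values_by_key.items()
    }
```

This trades accuracy for a single aggregate query: keys whose long tail isn't in the top values get undercounted, but the denominator matches what the autofix percentages already use.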
```python
    end=end,
    per_page=1,
)
total_count = (count_result or {}).get("data", [{}])[0].get("count()", 0)
```
Bug: the code may raise an `IndexError` when `execute_table_query` returns a response with an empty data list, as `[0]` is accessed on an empty list.
Severity: MEDIUM
Suggested fix: retrieve the list first, then check that it is non-empty before accessing its first element, e.g. `data = (count_result or {}).get("data")` followed by `total_count = data[0].get("count()", 0) if data else 0`.
Location: src/sentry/seer/explorer/tools.py#L972
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
    end=end,
    per_page=1,
)
total_count = (count_result or {}).get("data", [{}])[0].get("count()", 0)
```
Empty data array causes IndexError crash (low severity). The expression `(count_result or {}).get("data", [{}])[0]` will raise `IndexError` if `count_result` contains `{"data": []}`: the default `[{}]` is only used when the `"data"` key is missing, not when it exists but is empty. If `execute_table_query` ever returns an empty data array, this line crashes.
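The suggested guard, as a small sketch (the function name is illustrative; the logic is the fix both bots propose):

```python
def extract_total_count(count_result) -> int:
    """Read count() from a table-query result, treating a missing
    result, a missing "data" key, or an empty data list as zero
    instead of letting [0] raise IndexError on an empty list."""
    data = (count_result or {}).get("data") or []
    return data[0].get("count()", 0) if data else 0

print(extract_total_count({"data": []}))               # 0
print(extract_total_count(None))                       # 0
print(extract_total_count({"data": [{"count()": 42}]}))  # 42
```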
Updates the issue tag distribution and timeseries to respect the optional start/end params passed by the agent. It does this by hitting the events-facets endpoint for topValues and running an aggregate table query for the total event count; the response is reshaped and passed to the same autofix tags-overview util. If start/end isn't passed, we still do the original tagstore query, as it's more performant.
The number of facets (tags) returned is limited to 1000, which matches tagstore.
todo: update tool code or desc?