Draft
Changes from 1 commit
src/sentry/tasks/delete_pending_groups.py: 35 changes (22 additions, 13 deletions)
@@ -6,6 +6,7 @@

 from sentry.api.helpers.group_index.delete import schedule_group_deletion_tasks
 from sentry.models.group import Group, GroupStatus
+from sentry.models.grouphistory import GroupHistory, GroupHistoryStatus
 from sentry.silo.base import SiloMode
 from sentry.tasks.base import instrumented_task
 from sentry.taskworker.namespaces import deletion_tasks
@@ -34,29 +35,37 @@ def delete_pending_groups() -> None:
     and schedules deletion tasks for them. Groups are batched by project to ensure
     efficient deletion processing.

-    Only processes groups with last_seen between 6 hours and 90 days ago to avoid
-    processing very recent groups (safety window) or groups past retention period.
+    Only processes groups where status was changed between 6 hours and 90 days ago
+    to avoid processing very recent groups (safety window) or groups past retention period.
     """
     statuses_to_delete = [GroupStatus.PENDING_DELETION, GroupStatus.DELETION_IN_PROGRESS]

     # Just using status to take advantage of the status DB index
-    groups = Group.objects.filter(status__in=statuses_to_delete).values_list(
-        "id", "project_id", "last_seen"
-    )[:BATCH_LIMIT]
-
-    if not groups:
-        logger.info("delete_pending_groups.no_groups_found")
-        return
+    groups = list(
+        Group.objects.filter(status__in=statuses_to_delete).values_list(
+            "id", "project_id", "last_seen"
+        )
+    )
Contributor

Bug: Missing batch limit causes memory issues

The BATCH_LIMIT slice was removed from the query, so every group with a deletion status is loaded into memory instead of only the first 1000. This turns a bounded, batched query into an unbounded one that could load thousands or millions of groups, potentially exhausting server memory and severely degrading performance.

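The reviewer's point can be illustrated without Django. In Django, slicing a queryset (`qs[:BATCH_LIMIT]`) is translated into an SQL `LIMIT`, whereas `list(qs)` evaluates every matching row. The sketch below is framework-free: `CountingCursor` is a hypothetical stand-in for a lazy database cursor, and `BATCH_LIMIT = 1000` is assumed from the value cited in the comment. It shows that `itertools.islice` pulls at most `limit` rows from a lazy source, while `list()` drains all of it. Applied to the diff, keeping the bound would look roughly like wrapping the sliced queryset, `list(Group.objects.filter(...).values_list(...)[:BATCH_LIMIT])`.

```python
from itertools import islice
from typing import Iterable, Iterator

# Assumed value, taken from the review comment ("the first 1000").
BATCH_LIMIT = 1000


class CountingCursor:
    """Hypothetical stand-in for a lazy DB cursor yielding (group_id, project_id) rows.

    It records how many rows were actually produced, i.e. "loaded into memory".
    """

    def __init__(self, total_rows: int) -> None:
        self.total_rows = total_rows
        self.rows_produced = 0

    def __iter__(self) -> Iterator[tuple[int, int]]:
        for group_id in range(self.total_rows):
            self.rows_produced += 1
            yield (group_id, group_id % 5)


def fetch_bounded(rows: Iterable[tuple[int, int]], limit: int = BATCH_LIMIT) -> list[tuple[int, int]]:
    # islice stops requesting rows once `limit` have been taken,
    # analogous to keeping the [:BATCH_LIMIT] slice on the queryset.
    return list(islice(rows, limit))


def fetch_unbounded(rows: Iterable[tuple[int, int]]) -> list[tuple[int, int]]:
    # list() drains the whole source, analogous to list(queryset) without a slice.
    return list(rows)


if __name__ == "__main__":
    bounded_cursor = CountingCursor(total_rows=50_000)
    batch = fetch_bounded(bounded_cursor)
    print(len(batch), bounded_cursor.rows_produced)  # 1000 1000

    unbounded_cursor = CountingCursor(total_rows=50_000)
    everything = fetch_unbounded(unbounded_cursor)
    print(len(everything), unbounded_cursor.rows_produced)  # 50000 50000
```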


     # Process groups between 6 hours and 90 days old
     now = timezone.now()
     min_last_seen = now - timedelta(days=MAX_LAST_SEEN_DAYS)
-    max_last_seen = now - timedelta(hours=MIN_LAST_SEEN_HOURS)
+    status_change_threshold = now - timedelta(hours=MIN_LAST_SEEN_HOURS)

     # Group by project_id to ensure all groups in a batch belong to the same project
     groups_by_project: dict[int, list[int]] = defaultdict(list)
     for group_id, project_id, last_seen in groups:
-        if last_seen >= min_last_seen and last_seen <= max_last_seen:
-            groups_by_project[project_id].append(group_id)
+        if last_seen >= min_last_seen:
Contributor

Bug: Still filtering by last_seen contradicts PR intent

The code still filters groups by last_seen >= min_last_seen, which excludes groups with old last_seen values. This contradicts the PR description, which states "Groups requested for deletion can have any last_seen value." A group deleted recently but with a very old last_seen would be incorrectly excluded from processing.

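One way to honor the stated intent is to select on when the status changed and ignore `last_seen` entirely. The sketch below assumes a hypothetical per-group `status_changed_at` timestamp (the diff instead tries to derive this from `GroupHistory.date_added`) and hard-codes the 6-hour and 90-day bounds from the docstring as `MIN_AGE` and `MAX_AGE`, which are illustrative names rather than the module's constants.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Illustrative bounds from the docstring: 6 hours (safety window) and 90 days (retention).
MIN_AGE = timedelta(hours=6)
MAX_AGE = timedelta(days=90)


@dataclass(frozen=True)
class PendingGroup:
    id: int
    project_id: int
    last_seen: datetime
    status_changed_at: datetime  # hypothetical: when the group entered a deletion status


def groups_by_project_for_deletion(
    groups: list[PendingGroup], now: datetime
) -> dict[int, list[int]]:
    newest_allowed = now - MIN_AGE  # status changed at least 6 hours ago
    oldest_allowed = now - MAX_AGE  # status changed no more than 90 days ago
    by_project: dict[int, list[int]] = defaultdict(list)
    for group in groups:
        # last_seen is deliberately not consulted: a group can be deleted
        # long after its last event.
        if oldest_allowed <= group.status_changed_at <= newest_allowed:
            by_project[group.project_id].append(group.id)
    return dict(by_project)


if __name__ == "__main__":
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    groups = [
        # Very old last_seen, deleted yesterday: eligible.
        PendingGroup(1, 10, now - timedelta(days=400), now - timedelta(days=1)),
        # Deleted one hour ago: still inside the safety window.
        PendingGroup(2, 10, now - timedelta(days=2), now - timedelta(hours=1)),
        # Deleted 120 days ago: past retention.
        PendingGroup(3, 20, now - timedelta(days=130), now - timedelta(days=120)),
    ]
    print(groups_by_project_for_deletion(groups, now))  # {10: [1]}
```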

+            group_history = GroupHistory.objects.filter(
+                group_id=group_id,
+                status__in=[GroupHistoryStatus.DELETED],
+                date_added__lte=status_change_threshold,
+            ).first()
Contributor

Bug: Filtering by nonexistent GroupHistory records

The code filters for GroupHistoryStatus.DELETED records, but no GroupHistory entry with this status appears to be created when groups are marked as PENDING_DELETION or DELETION_IN_PROGRESS. The issue_deleted signal only records analytics events, not GroupHistory. This means the filter will never match any records, preventing any groups from being processed for deletion.


+            if group_history and group_history.date_added <= status_change_threshold:
+                groups_by_project[project_id].append(group_id)
Contributor

Bug: N+1 query problem in deletion loop

The loop executes a separate database query for GroupHistory for each group that passes the last_seen check. If thousands of groups meet the criteria, this creates thousands of individual queries instead of using a bulk query or join, significantly degrading performance and increasing database load.

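The usual remedy for an N+1 pattern like this is to fetch the qualifying group IDs in a single query and then test membership in memory. In Django terms that would be one `GroupHistory.objects.filter(group_id__in=..., ...)` combined with `values_list("group_id", flat=True)`. The runnable sketch below stays framework-free: `HistoryTable` is a hypothetical in-memory stand-in for `GroupHistory` that counts how many queries are issued, and `DELETED` is a placeholder for `GroupHistoryStatus.DELETED`. Very large ID sets would likely need to be chunked before building the `IN` clause; that is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

DELETED = "deleted"  # hypothetical placeholder for GroupHistoryStatus.DELETED


@dataclass
class HistoryTable:
    """Hypothetical in-memory stand-in for GroupHistory that counts queries."""

    rows: list[tuple[int, str, datetime]]  # (group_id, status, date_added)
    queries: int = 0

    def group_ids_with_status(
        self, group_ids: set[int], status: str, added_before: datetime
    ) -> set[int]:
        # One "query" for the whole candidate set, like a filter on group_id__in.
        self.queries += 1
        return {
            gid
            for gid, row_status, date_added in self.rows
            if gid in group_ids and row_status == status and date_added <= added_before
        }


def bucket_groups(
    candidates: list[tuple[int, int]], history: HistoryTable, threshold: datetime
) -> dict[int, list[int]]:
    """candidates: (group_id, project_id) pairs that passed the earlier filters."""
    eligible = history.group_ids_with_status(
        {group_id for group_id, _ in candidates}, DELETED, threshold
    )
    by_project: dict[int, list[int]] = defaultdict(list)
    for group_id, project_id in candidates:
        if group_id in eligible:  # set membership: no per-group query
            by_project[project_id].append(group_id)
    return dict(by_project)


if __name__ == "__main__":
    threshold = datetime(2025, 1, 1, tzinfo=timezone.utc) - timedelta(hours=6)
    history = HistoryTable(rows=[
        (1, DELETED, threshold - timedelta(days=1)),
        (2, DELETED, threshold + timedelta(hours=1)),  # too recent
        (3, "resolved", threshold - timedelta(days=3)),  # not a deletion record
    ])
    candidates = [(1, 10), (2, 10), (3, 20), (4, 20)]
    print(bucket_groups(candidates, history, threshold), history.queries)  # {10: [1]} 1
```

However many candidates there are, the lookup costs one query instead of one per group.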


     if not groups_by_project:
         logger.info("delete_pending_groups.no_groups_in_limbo_found")
         return

     total_groups = sum(len(group_ids) for group_ids in groups_by_project.values())
     total_tasks = 0
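The docstring says groups are batched by project, and the hunk ends where `total_groups` and `total_tasks` are initialized; the rest of the function is collapsed in this view. A minimal sketch of such per-project batching, assuming a hypothetical chunk size `GROUPS_PER_TASK` and a hypothetical helper `chunk_by_project` (neither the real value nor the dispatch code appears in the diff): each project's IDs are split into chunks so that no chunk mixes projects, and the number of chunks gives the task count.

```python
from typing import Iterator

GROUPS_PER_TASK = 100  # hypothetical chunk size; not taken from the diff


def chunk_by_project(
    groups_by_project: dict[int, list[int]], chunk_size: int = GROUPS_PER_TASK
) -> Iterator[tuple[int, list[int]]]:
    """Yield (project_id, group_ids) chunks; each chunk belongs to exactly one project."""
    for project_id, group_ids in groups_by_project.items():
        for start in range(0, len(group_ids), chunk_size):
            yield project_id, group_ids[start : start + chunk_size]


if __name__ == "__main__":
    groups_by_project = {10: list(range(250)), 20: [7, 8]}
    chunks = list(chunk_by_project(groups_by_project))
    total_groups = sum(len(ids) for ids in groups_by_project.values())
    total_tasks = len(chunks)
    print(total_groups, total_tasks)  # 252 4
```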