ref(crons): Reorganize incident creation / issue occurrence logic #8

everettbu · 2025-07-29T00:32:09Z

Test 8

…0528) Since we'll be doing more with issue occurrences split out the concept of incidents into it's own logic module, as well as incident_occurrence into it's own module Part of GH-80527

greptile-apps

Greptile Summary

This PR implements a significant refactoring of the monitor incident creation and issue occurrence logic within Sentry's cron monitoring system. The changes extract complex functionality from the monolithic mark_failed.py file into two new dedicated modules:

incidents.py - Contains the try_incident_threshold function that centralizes incident creation logic, handling different monitor status states and failure thresholds
incident_occurrence.py - Houses functions for creating issue occurrences in Kafka, generating human-readable failure messages, and building monitor environment context

The refactoring also introduces a new SimpleCheckIn TypedDict in types.py to optimize database queries by using lightweight dictionary objects instead of full Django model instances when processing check-ins.

The main mark_failed function now delegates incident threshold handling to the new try_incident_threshold function, creating a cleaner separation of concerns. This modular approach improves code organization by giving each module a single, focused responsibility: mark_failed.py handles monitor state updates, incidents.py manages incident threshold logic, and incident_occurrence.py handles issue creation and messaging.

This refactoring maintains identical functionality while making the codebase more maintainable, testable, and following the single responsibility principle more closely. The changes also include performance optimizations through the use of .values() queries with the new SimpleCheckIn type.

Confidence score: 3/5

• This refactoring appears safe but has some potential concurrency and database consistency concerns that need attention
• The score reflects database transaction safety issues in incidents.py and a potential bug in incident_occurrence.py, plus the lack of comprehensive testing for concurrent scenarios
• Files needing more attention: src/sentry/monitors/logic/incidents.py for database race conditions and src/sentry/monitors/logic/incident_occurrence.py for the context building bug

_{4 files reviewed, 2 comments}

_{Edit Code Review Bot Settings | Greptile}

greptile-apps · 2025-07-29T17:19:27Z

src/sentry/monitors/logic/incident_occurrence.py

+
+def get_failure_reason(failed_checkins: Sequence[SimpleCheckIn]):
+    """
+    Builds a humam readible string from a list of failed check-ins.


syntax: 'humam' is misspelled, should be 'human'

Suggested change

Builds a humam readible string from a list of failed check-ins.

Builds a human readible string from a list of failed check-ins.

greptile-apps · 2025-07-29T17:19:28Z

src/sentry/monitors/logic/incident_occurrence.py

+        "id": str(monitor_environment.monitor.guid),
+        "slug": str(monitor_environment.monitor.slug),
+        "name": monitor_environment.monitor.name,
+        "config": monitor_environment.monitor.config,


logic: Using monitor_environment.monitor.config instead of the modified config variable defeats the purpose of the schedule_type transformation on line 162

Suggested change

"config": monitor_environment.monitor.config,

"config": config,

GitHoobar · 2025-12-24T09:58:23Z

Review Summary

🏷️ Draft Comments (6)

Skipped posting 6 draft comments that were valid but scored below your review threshold (>=13/15). Feel free to update them here.
src/sentry/monitors/logic/incident_occurrence.py (2)
168-168: get_monitor_environment_context returns the original monitor.config dict, not the copied/modified config, so changes to schedule_type are ignored and the returned context may be inconsistent.

📊 Impact Scores:

Production Impact: 4/5

Fix Specificity: 5/5

Urgency Impact: 3/5

Total Score: 12/15

🤖 AI Agent Prompt (Copy & Paste Ready):
In src/sentry/monitors/logic/incident_occurrence.py, line 168, the function `get_monitor_environment_context` returns the original `monitor.config` instead of the local `config` variable that may have been modified (e.g., with a humanized `schedule_type`). Change line 168 to return `"config": config,` so that the returned context includes the correct, possibly modified configuration.
139-143: get_failure_reason uses checkin["status"] in HUMAN_FAILURE_STATUS_MAP.keys() in a loop, causing O(n*m) time for large failed_checkins; this can be O(n) by using in on the dict directly.

📊 Impact Scores:

Production Impact: 2/5

Fix Specificity: 5/5

Urgency Impact: 2/5

Total Score: 9/15

🤖 AI Agent Prompt (Copy & Paste Ready):
In src/sentry/monitors/logic/incident_occurrence.py, lines 139-143, the code uses `checkin["status"] in HUMAN_FAILURE_STATUS_MAP.keys()` inside a loop, which is inefficient for large datasets. Change this to `checkin["status"] in HUMAN_FAILURE_STATUS_MAP` to improve performance by reducing time complexity from O(n*m) to O(n).
src/sentry/monitors/logic/incidents.py (1)
40-48: MonitorCheckIn.objects.filter(...).values(...).order_by(...).values(...) fetches all matching check-ins, then slices and reverses in Python, causing unnecessary DB load and memory use for large datasets.

📊 Impact Scores:

Production Impact: 3/5

Fix Specificity: 5/5

Urgency Impact: 2/5

Total Score: 10/15

🤖 AI Agent Prompt (Copy & Paste Ready):
Optimize database query and memory usage in src/sentry/monitors/logic/incidents.py lines 40-48. The current code fetches all matching check-ins, then slices and reverses in Python, which is inefficient for large datasets. Refactor so that only the required number of check-ins are fetched from the database, and then reverse the result in Python. Replace the current logic with a query that slices before converting to a list and reversing.
src/sentry/monitors/logic/mark_failed.py (2)
76-79: mark_failed returns the result of try_incident_threshold, but if try_incident_threshold raises an exception (e.g., due to DB or logic error), the monitor environment will have already been updated, potentially leaving the system in an inconsistent state.

📊 Impact Scores:

Production Impact: 3/5

Fix Specificity: 5/5

Urgency Impact: 3/5

Total Score: 11/15

🤖 AI Agent Prompt (Copy & Paste Ready):
In src/sentry/monitors/logic/mark_failed.py, lines 76-79, the function `mark_failed` calls `try_incident_threshold` after updating the monitor environment. If `try_incident_threshold` raises an exception, the monitor environment may be left in an inconsistent state. Please wrap the call to `try_incident_threshold` in a try/except block, log the exception, and return False if an error occurs. Insert the following code:

    try:
        return try_incident_threshold(failed_checkin, failure_issue_threshold, received)
    except Exception as e:
        logger.exception("Failed to process incident threshold after marking monitor as failed")
        return False

This should replace the current direct return statement.
52-76: mark_failed performs two sequential DB updates (update, then refresh_from_db) for every failed check-in, which can cause significant DB load and contention under high check-in volume.

📊 Impact Scores:

Production Impact: 3/5

Fix Specificity: 2/5

Urgency Impact: 2/5

Total Score: 7/15

🤖 AI Agent Prompt (Copy & Paste Ready):
Optimize DB update pattern in src/sentry/monitors/logic/mark_failed.py lines 52-76. The current code performs a filter().update() followed by refresh_from_db(), causing two DB roundtrips and potential contention under high load. Refactor to use select_for_update() in a transaction, update the fields directly, and save, reducing DB load and ensuring atomicity. Replace the current update/refresh logic with a single transaction block as shown.
src/sentry/monitors/types.py (1)
83-85: processing_key property may return a string with 'None' as the environment if environment is missing, leading to incorrect grouping of check-ins.

📊 Impact Scores:

Production Impact: 3/5

Fix Specificity: 5/5

Urgency Impact: 3/5

Total Score: 11/15

🤖 AI Agent Prompt (Copy & Paste Ready):
In src/sentry/monitors/types.py, lines 83-85, the `processing_key` property may return a string with 'None' as the environment if the 'environment' key is missing from the payload, which can cause incorrect grouping of check-ins. Update the code so that if 'environment' is missing or None, an empty string is used instead. Only change these lines.

ref(crons): Reorganize incident creation / issue occurrence logic (#8…

c3b32a7

…0528) Since we'll be doing more with issue occurrences split out the concept of incidents into it's own logic module, as well as incident_occurrence into it's own module Part of GH-80527

greptile-apps bot reviewed Jul 29, 2025

View reviewed changes

This comment was marked as duplicate.

Sign in to view

zaibkhan mentioned this pull request Sep 9, 2025

ref(crons): Reorganize incident creation / issue occurrence logic codoki-ai/sentry-codoki#10

Open

This was referenced Nov 14, 2025

ref(crons): Reorganize incident creation / issue occurrence logic ai-code-review-evaluations/greptile-sentry#8

Open

ref(crons): Reorganize incident creation / issue occurrence logic ai-code-review-evaluations/claude_code-sentry#9

Open

This was referenced Dec 31, 2025

ref(crons): Reorganize incident creation / issue occurrence logic AI-Code-Review-Evals/claude_code-sentry#8

Draft

ref(crons): Reorganize incident creation / issue occurrence logic AI-Code-Review-Evals/greptile-sentry#14

Open

AncD8972 mentioned this pull request Jan 7, 2026

[Review] ref(crons): Reorganize incident creation / issue occurrence logic zju-sxf-code-review/sentry#8

Open

wangyue6761 mentioned this pull request Jan 7, 2026

ref(crons): Reorganize incident creation / issue occurrence logic wangyue6761/sentry-greptile#8

Open

glittering-universe mentioned this pull request Jan 9, 2026

ref(crons): Reorganize incident creation / issue occurrence logic glittering-universe/sentry-costrict#8

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ref(crons): Reorganize incident creation / issue occurrence logic #8

ref(crons): Reorganize incident creation / issue occurrence logic #8

Uh oh!

everettbu commented Jul 29, 2025 •

edited

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Jul 29, 2025

Uh oh!

greptile-apps bot Jul 29, 2025

Uh oh!

This comment was marked as duplicate.

Uh oh!

GitHoobar commented Dec 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	Builds a humam readible string from a list of failed check-ins.
	Builds a human readible string from a list of failed check-ins.

	"config": monitor_environment.monitor.config,
	"config": config,

ref(crons): Reorganize incident creation / issue occurrence logic #8

Are you sure you want to change the base?

ref(crons): Reorganize incident creation / issue occurrence logic #8

Uh oh!

Conversation

everettbu commented Jul 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Summary

Confidence score: 3/5

Uh oh!

greptile-apps bot Jul 29, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Jul 29, 2025

Choose a reason for hiding this comment

Uh oh!

This comment was marked as duplicate.

Uh oh!

GitHoobar commented Dec 24, 2025

Review Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

everettbu commented Jul 29, 2025 •

edited

Loading