
JIRA <-> IMPACT sync for Status and Priority#209

Merged
juanjogarciatorres43 merged 21 commits into main from feat/impact_jira_status_sync
Feb 5, 2026

Conversation

@juanjogarciatorres43
Collaborator

@juanjogarciatorres43 juanjogarciatorres43 commented Jan 23, 2026

@juanjogarciatorres43 juanjogarciatorres43 marked this pull request as ready for review January 23, 2026 08:11
@juanjogarciatorres43 juanjogarciatorres43 changed the title DRAFT: JIRA <-> IMPACT sync for Status and Priority JIRA <-> IMPACT sync for Status and Priority Jan 23, 2026
@codecov-commenter

codecov-commenter commented Jan 23, 2026

Codecov Report

❌ Patch coverage is 55.86207% with 128 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.93%. Comparing base (00c08a8) to head (2cf8887).
⚠️ Report is 29 commits behind head on main.

Files with missing lines Patch % Lines
src/firefighter/raid/serializers.py 50.00% 55 Missing and 16 partials ⚠️
src/firefighter/raid/signals/incident_updated.py 58.97% 41 Missing and 7 partials ⚠️
src/firefighter/raid/utils.py 30.00% 5 Missing and 2 partials ⚠️
src/firefighter/raid/client.py 33.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #209      +/-   ##
==========================================
- Coverage   67.08%   66.93%   -0.16%     
==========================================
  Files         212      212              
  Lines       10029    10319     +290     
  Branches     1099     1158      +59     
==========================================
+ Hits         6728     6907     +179     
- Misses       3030     3100      +70     
- Partials      271      312      +41     
Flag Coverage Δ
unittests 66.93% <55.86%> (-0.16%) ⬇️


Collaborator

@NicolasLafitteMM NicolasLafitteMM left a comment


Code Review Summary

Thank you for this comprehensive PR! The bidirectional Jira sync is a great feature with excellent documentation and test coverage. However, I've identified a few critical issues that should be addressed before merging.

Critical Issues (Must Fix)

  1. Excessive WARNING logging - Normal operations logged as warnings will pollute production logs
  2. Missing null check - Potential AttributeError when accessing incident.id in loop prevention
  3. Error handling in MITIGATING transition - First transition failure prevents second attempt, leaving Jira in inconsistent state

Medium Priority Issues

  1. Code duplication - _normalize_cache_value() duplicated in two files
  2. Silent error handling - Returning True on failures masks problems
  3. N+1 query potential - JiraTicket lookup inside loop

I'll add inline comments on specific lines with detailed recommendations.

@NicolasLafitteMM
Collaborator

Code Review - Based on Actual Branch Code

I've now checked out the branch and reviewed the actual code. Here are my findings:

✅ What's Excellent

  1. Outstanding documentation - jira-integration.md is exceptionally detailed
  2. Good test coverage - New test file covers main scenarios
  3. Loop prevention mechanism - Cache-based approach is solid
  4. P4+ improvement - Relaxing key event requirements for lower priorities makes sense
  5. Proper null checks - Code handles null incidents correctly (e.g., serializers.py:556-558)
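For readers following along, the loop-prevention mechanism can be sketched as below. Function names and the cache-key format are assumptions reconstructed from snippets quoted later in this review; the real code uses Django's cache backend, not an in-process dict.

```python
# Simplified model of the cache-based loop prevention. Names and the key
# format are assumptions, not the exact implementation.
import time

_cache: dict[str, float] = {}  # key -> expiry timestamp (monotonic)

def _normalize_cache_value(value) -> str:
    if value is None:
        return ""
    if isinstance(value, str):
        return value.strip().lower()
    try:
        return str(int(value))
    except (TypeError, ValueError):
        return str(value).strip().lower()

def _key(incident_id, field, value) -> str:
    return f"sync:impact_to_jira:{incident_id}:{field}:{_normalize_cache_value(value)}"

def mark_impact_to_jira(incident_id, field, value, timeout: int = 30) -> None:
    # Called just before Impact pushes a change to Jira.
    _cache[_key(incident_id, field, value)] = time.monotonic() + timeout

def is_echo_from_impact(incident_id, field, value) -> bool:
    # Called by the Jira webhook handler: a matching fresh key means the
    # incoming change is an echo of our own push and must not sync back.
    return _cache.get(_key(incident_id, field, value), 0.0) > time.monotonic()

mark_impact_to_jira(42, "status", "Closed")
assert is_echo_from_impact(42, "status", " closed ")         # echo, skip
assert not is_echo_from_impact(42, "status", "in progress")  # real change
```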

🔴 Critical Issues (Must Fix)

I'll post these as separate comments with detailed recommendations.

@NicolasLafitteMM
Collaborator

🔴 Critical #1: Excessive WARNING logging

File: src/firefighter/raid/signals/incident_updated.py:66-72

Every normal incident update is logged as WARNING, which will severely pollute production logs:

logger.warning(
    "incident_updated handler invoked for incident #%s with status %s; updated_fields=%s event_type=%s",
    getattr(incident, "id", "unknown"),
    incident_update.status,
    updated_fields,
    incident_update.event_type,
)

Impact: This handler is called for EVERY incident status update. In production, this will generate thousands of WARNING logs daily for normal operations.

Fix:

logger.debug(  # or logger.info at most
    "incident_updated handler invoked for incident #%s with status %s; updated_fields=%s event_type=%s",
    getattr(incident, "id", "unknown"),
    incident_update.status,
    updated_fields,
    incident_update.event_type,
)

Principle: WARNING should be reserved for actual problems, not normal operations. This is especially important for monitoring and alerting systems.
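To illustrate the principle with the standard library alone (logger name and messages here are hypothetical): production handlers typically filter at WARNING, so per-update chatter logged at DEBUG disappears while real failures still surface.

```python
# Sketch: a handler filtering at WARNING drops routine debug chatter but
# keeps genuine problems. Logger name and messages are hypothetical.
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.WARNING)  # typical production threshold

logger = logging.getLogger("firefighter.raid.demo")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False

# Normal operation: belongs at DEBUG, filtered out by the handler.
logger.debug("incident_updated handler invoked for incident #%s", 42)
# Actual problem: warrants WARNING, reaches the handler.
logger.warning("Jira transition failed for incident #%s", 42)

handler.flush()
output = stream.getvalue()
assert "handler invoked" not in output
assert "transition failed" in output
```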

@NicolasLafitteMM
Collaborator

🔴 Critical #2: Error handling in MITIGATING double transition

File: src/firefighter/raid/signals/incident_updated.py:98-125

When transitioning to MITIGATING status, if the first transition ("Pending resolution") fails, the return on line 119 prevents attempting the second transition ("in progress"), leaving Jira in an inconsistent state:

if incident_update.status == IncidentStatus.MITIGATING:
    for step in ("Pending resolution", "in progress"):
        try:
            client.transition_issue_auto(
                incident.jira_ticket.id, step, RAID_JIRA_WORKFLOW_NAME
            )
        except Exception:
            logger.exception(...)
            return  # ❌ Gives up completely - second transition never attempted

Problem Scenarios:

  1. First transition fails → Jira stuck at "Pending resolution"
  2. First succeeds but second fails → Jira stuck at "Pending resolution" instead of "in progress"
  3. Both fail → Same result, but we can't distinguish the cases

Recommendation:
Don't return immediately. Either:

  • Option A: Continue to second transition and let it fail/succeed independently
  • Option B: Check current Jira status before each transition
  • Option C: Collect errors and handle appropriately
if incident_update.status == IncidentStatus.MITIGATING:
    errors = []
    for step in ("Pending resolution", "in progress"):
        try:
            logger.debug(...)
            client.transition_issue_auto(
                incident.jira_ticket.id, step, RAID_JIRA_WORKFLOW_NAME
            )
        except Exception as e:
            logger.exception(...)
            errors.append((step, e))
            # Don't return - continue to next step
    
    if not errors:
        logger.info("Successfully transitioned through both steps")
    elif len(errors) == 2:
        logger.error("Both transitions failed")
    else:
        logger.warning("Partial transition completed")
    return
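To make Option C concrete, here is a self-contained sketch with a stubbed client. The stub and the helper name are hypothetical; the step names match the snippet above.

```python
# Self-contained sketch of Option C: collect per-step errors instead of
# returning on the first failure. StubJiraClient and
# transition_to_mitigating are hypothetical illustration names.
class StubJiraClient:
    def __init__(self, failing_steps):
        self.failing = set(failing_steps)
        self.applied = []

    def transition_issue_auto(self, issue_id, step, workflow):
        if step in self.failing:
            raise RuntimeError(f"transition to {step!r} failed")
        self.applied.append(step)

def transition_to_mitigating(client, issue_id, workflow="RAID"):
    errors = []
    for step in ("Pending resolution", "in progress"):
        try:
            client.transition_issue_auto(issue_id, step, workflow)
        except Exception as exc:
            errors.append((step, exc))  # keep going: try the next step
    return errors

# First transition fails, but the second is still attempted:
client = StubJiraClient(failing_steps={"Pending resolution"})
errors = transition_to_mitigating(client, "JIRA-1")
assert [step for step, _ in errors] == ["Pending resolution"]
assert client.applied == ["in progress"]
```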

@NicolasLafitteMM
Collaborator

⚠️ Medium #3: Code duplication - _normalize_cache_value()

Files:

  • src/firefighter/raid/serializers.py:58-67
  • src/firefighter/raid/signals/incident_updated.py:29-39

The function is duplicated in both files with nearly identical implementation (minor comment difference).

Recommendation:
Extract to raid/utils.py or raid/cache_utils.py:

# raid/cache_utils.py
from typing import Any

def normalize_cache_value(value: Any) -> str:
    """Normalize cache values for loop-prevention keys.
    
    Ensures consistent cache key generation across Impact→Jira and Jira→Impact syncs.
    """
    if value is None:
        return ""
    if isinstance(value, str):
        return value.strip().lower()
    try:
        return str(int(value))
    except (TypeError, ValueError):
        return str(value).strip().lower()

Then import in both files:

from firefighter.raid.cache_utils import normalize_cache_value

Benefits:

  • DRY principle
  • Single source of truth for normalization logic
  • Easier to test and maintain
  • Ensures consistency if logic needs to change
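As a quick illustration of why one shared implementation matters: Jira webhooks report priority as a string while Impact stores an int, and status casing/whitespace differ between the two systems, yet both directions must derive the same loop-prevention key. The sample values below are hypothetical.

```python
# Copy of the proposed shared helper, exercised with hypothetical values
# from both sync directions.
from typing import Any

def normalize_cache_value(value: Any) -> str:
    if value is None:
        return ""
    if isinstance(value, str):
        return value.strip().lower()
    try:
        return str(int(value))
    except (TypeError, ValueError):
        return str(value).strip().lower()

# Jira sends priority as "4", Impact stores 4 -> same key material:
assert normalize_cache_value("4") == normalize_cache_value(4) == "4"
# Casing and whitespace differences also collapse to one key:
assert normalize_cache_value(" Closed ") == normalize_cache_value("closed")
assert normalize_cache_value(None) == ""
```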

@NicolasLafitteMM
Collaborator

⚠️ Medium #4: Silent error handling in postmortem validation

File: src/firefighter/raid/serializers.py:468-489

Returning True after exceptions suggests success, but the sync was actually skipped:

if incident.needs_postmortem and impact_status == IncidentStatus.CLOSED:
    if not hasattr(incident, "jira_postmortem_for"):
        logger.warning(...)
        return True  # ❌ Misleading - nothing was synced
    
    try:
        is_ready, current_status = jira_postmortem_service.is_postmortem_ready(...)
        if not is_ready:
            logger.warning(...)
            return True  # ❌ Misleading - sync was skipped
    except Exception:
        logger.exception(...)
        return True  # ❌ Misleading - error occurred

Problem:

  • Monitoring systems can't detect these failures
  • Debugging is harder when issues are masked as successes
  • Creates silent failures in production

Recommendations:

Option A (Preferred): Return False to indicate failure:

if not hasattr(incident, "jira_postmortem_for"):
    logger.warning(...)
    return False  # ✅ Clearly indicates sync didn't happen

try:
    is_ready, current_status = jira_postmortem_service.is_postmortem_ready(...)
    if not is_ready:
        logger.warning(...)
        return False  # ✅ Indicates sync was intentionally skipped
except Exception:
    logger.exception(...)
    return False  # ✅ Indicates failure

Option B: At minimum, add clear comments explaining why True is acceptable:

# Returning True because skipping sync is acceptable behavior,
# not an error condition - the webhook was processed successfully
return True

@NicolasLafitteMM
Collaborator

💡 Suggestion #5: Potential N+1 query in webhook processing

File: src/firefighter/raid/serializers.py:328-332, 449, 514

The _get_incident_from_jira_ticket() method is called inside the loop for each change item:

def create(self, validated_data: dict[str, Any]) -> bool:
    changes = validated_data.get("changelog", {}).get("items") or []
    
    for change_item in changes:  # ← Loop over changes
        self._sync_jira_fields_to_incident(...)  # ← Calls _get_incident_from_jira_ticket

def _handle_status_update(...):
    incident = self._get_incident_from_jira_ticket(jira_ticket_key)  # ← DB query

def _handle_priority_update(...):
    incident = self._get_incident_from_jira_ticket(jira_ticket_key)  # ← DB query

Impact:

  • If a webhook has multiple changes (e.g., status + priority), the same JiraTicket is fetched multiple times
  • Minor performance issue, but can accumulate at scale

Optimization:
Fetch once at the beginning of create():

def create(self, validated_data: dict[str, Any]) -> bool:
    changes = validated_data.get("changelog", {}).get("items") or []
    if not changes:
        return True
    
    jira_ticket_key = validated_data["issue"].get("key")
    # Fetch JiraTicket + Incident once, outside the loop
    incident = self._get_incident_from_jira_ticket(jira_ticket_key)
    if incident is None:
        return True  # No incident linked, skip all changes
    
    for change_item in changes:
        self._sync_jira_fields_to_incident(
            validated_data, incident, change_item  # Pass pre-fetched incident
        )
    return True

Note: This is a minor optimization. In practice, webhooks rarely have multiple relevant changes, so the impact is small. Consider this a "nice to have" rather than critical.

@NicolasLafitteMM
Collaborator

💡 Suggestion #6: Cache timeout configuration

File: src/firefighter/raid/signals/incident_updated.py:42-48

The 30-second cache timeout is hardcoded:

def _set_impact_to_jira_cache(
    incident_id: Any, field: str, value: Any, timeout: int = 30
) -> None:

Concern:
If Jira webhooks are delayed by more than 30 seconds (network issues, Jira load, etc.), the loop prevention cache will have expired, potentially causing loops.

Recommendation:
Make it configurable via Django settings:

from django.conf import settings

# Default to 60s for safer margin
JIRA_SYNC_CACHE_TIMEOUT = getattr(settings, 'JIRA_SYNC_CACHE_TIMEOUT', 60)

def _set_impact_to_jira_cache(
    incident_id: Any, 
    field: str, 
    value: Any, 
    timeout: int = JIRA_SYNC_CACHE_TIMEOUT
) -> None:
    cache_key = (
        f"sync:impact_to_jira:{incident_id}:{field}:{_normalize_cache_value(value)}"
    )
    cache.set(cache_key, value=True, timeout=timeout)

Benefits:

  • Ops team can adjust based on observed webhook latency
  • Easy to increase if loops are detected
  • No code changes needed to tune in production

Alternative: Just increase the default to 60 seconds if configurability isn't needed.

@NicolasLafitteMM
Collaborator

💡 Code Quality #7: Magic strings for Jira statuses

File: src/firefighter/raid/signals/incident_updated.py:99, 140, etc.

Jira status strings are hardcoded throughout:

for step in ("Pending resolution", "in progress"):  # Line 99
    ...

target_jira_status = "Closed"  # Line 140

Recommendation:
Extract as module-level constants near the status map:

# Module-level constants
JIRA_STATUS_INCOMING = "Incoming"
JIRA_STATUS_PENDING_RESOLUTION = "Pending resolution"
JIRA_STATUS_IN_PROGRESS = "in progress"
JIRA_STATUS_REPORTER_VALIDATION = "Reporter validation"
JIRA_STATUS_CLOSED = "Closed"

IMPACT_TO_JIRA_STATUS_MAP: dict[IncidentStatus, str] = {
    IncidentStatus.OPEN: JIRA_STATUS_INCOMING,
    IncidentStatus.INVESTIGATING: JIRA_STATUS_IN_PROGRESS,
    IncidentStatus.MITIGATING: JIRA_STATUS_IN_PROGRESS,
    IncidentStatus.MITIGATED: JIRA_STATUS_REPORTER_VALIDATION,
    IncidentStatus.POST_MORTEM: JIRA_STATUS_REPORTER_VALIDATION,
    IncidentStatus.CLOSED: JIRA_STATUS_CLOSED,
}

# Then use:
for step in (JIRA_STATUS_PENDING_RESOLUTION, JIRA_STATUS_IN_PROGRESS):
    ...

target_jira_status = JIRA_STATUS_CLOSED

Benefits:

  • Single source of truth
  • IDE autocomplete and type checking
  • Easier refactoring if Jira status names change
  • Reduces typo risks
  • More maintainable

@NicolasLafitteMM
Collaborator

❓ Questions for the Author

I have a few important questions:

1. Empty string in closure_reason constraint

File: src/firefighter/incidents/models/incident.py:291-292

check=models.Q(closure_reason__in=[*ClosureReason.values, ""])
    | models.Q(closure_reason__isnull=True),

Question: Why allow both empty string ("") AND NULL?

This creates ambiguity:

  • NULL = no closure reason
  • "" = empty closure reason (what does this mean?)
  • Valid enum value = actual reason

Is there a semantic difference between NULL and empty string? If not, consider removing the empty string option for a clearer data model.
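To make the ambiguity concrete, a plain-Python sketch of the three states the current constraint permits (the enum values here are hypothetical, not the project's actual ClosureReason values):

```python
# Hypothetical closure reasons, standing in for ClosureReason.values.
CLOSURE_REASONS = {"resolved", "duplicate", "false_positive"}

def describe(closure_reason):
    # Mirrors the constraint: NULL, "", and valid enum values all pass,
    # which leaves two distinct "no reason" representations.
    if closure_reason is None:
        return "no closure reason"
    if closure_reason == "":
        return "empty closure reason (meaning unclear)"
    if closure_reason in CLOSURE_REASONS:
        return f"closed as {closure_reason}"
    raise ValueError("violates closure_reason_valid constraint")

# Three distinct states the current constraint permits:
assert describe(None) == "no closure reason"
assert describe("") == "empty closure reason (meaning unclear)"
assert describe("resolved") == "closed as resolved"
```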


2. P4+ closure without key events

File: src/firefighter/incidents/models/incident.py:418-428

# Require key events only for P1-P3; P4+ can close without them.
require_milestones = not (self.priority and self.priority.value >= 4)

Questions:

  • Was this behavior change discussed with stakeholders/product?
  • Is it documented in user-facing documentation?
  • What's the rationale for allowing P4+ to skip key events?

This is a significant behavior change that affects how incidents are tracked.


3. Database Migrations

Are there Django migrations for:

  • The modified closure_reason_valid constraint?
  • Any other schema changes?

4. Deployment & Rollout Strategy

  • Feature flag: Can this bidirectional sync be enabled/disabled without deployment?
  • Gradual rollout: Will this be enabled in staging first, then production?
  • Rollback plan: What's the plan if issues arise in production?
  • Monitoring: Are there plans for metrics/alerts on:
    • Sync success/failure rates
    • Loop prevention cache hits
    • Sync latency
    • Jira API errors

5. Testing in Production

Have you considered:

  • Load testing with multiple concurrent updates?
  • Webhook delay testing (what if Jira is slow)?
  • Error scenarios (Jira API down, timeout, rate limits)?

Would appreciate your thoughts on these points! 🙏

@NicolasLafitteMM
Collaborator

🎯 Final Verdict & Summary

Great work on this PR! The bidirectional sync is a valuable feature with excellent documentation.


✅ Strengths

  1. Exceptional documentation - jira-integration.md is thorough and detailed
  2. Good test coverage - Main scenarios are tested
  3. Smart loop prevention - Cache-based mechanism is well thought out
  4. Proper null checks - Code handles edge cases correctly
  5. P4+ improvement - Relaxing key events makes sense for lower priorities
  6. CI passing - All checks green ✅

🔴 Blocking Issues (2)

Must be fixed before merge:

  1. Excessive WARNING logging (line 66) - Will pollute production logs
  2. Error handling in MITIGATING transition (line 119) - Can leave Jira in inconsistent state

⚠️ Non-Blocking Issues (5)

Can be addressed in follow-up PRs:

  1. Code duplication - _normalize_cache_value() in two files
  2. Silent error handling - Returning True on failures
  3. N+1 query potential - Minor performance issue
  4. Cache timeout - Consider making configurable
  5. Magic strings - Extract Jira status constants

❓ Questions (5)

Need clarification from author - see previous comment.


📊 Recommendation

Status: ⚠️ Request Changes

Next Steps:

  1. Fix the 2 blocking issues (#1 and #2)
  2. Answer the questions when you have time
  3. Consider the non-blocking issues for follow-up PRs
  4. Re-request review once fixes are pushed

Note: Issues #3-7 are NOT blocking - you can address them later if preferred. The important fixes are #1 and #2.


Let me know if you need clarification on any of these comments! Happy to discuss approaches. 👍

@juanjogarciatorres43
Collaborator Author

juanjogarciatorres43 commented Feb 4, 2026

Replying to ⚠️ Medium #4: Silent error handling in postmortem validation (src/firefighter/raid/serializers.py:468-489):

In this case, return True does not mean the sync happened; it means the signal was handled in the expected way. A clarifying comment has been added to the code.

@juanjogarciatorres43
Collaborator Author

Replying to 💡 Suggestion #6: Cache timeout configuration (src/firefighter/raid/signals/incident_updated.py:42-48):

Added config.

@juanjogarciatorres43 juanjogarciatorres43 merged commit 8ceafa1 into main Feb 5, 2026
7 checks passed
@juanjogarciatorres43 juanjogarciatorres43 deleted the feat/impact_jira_status_sync branch February 5, 2026 08:41