
@drazisil-codecov drazisil-codecov commented Jan 7, 2026

Summary

Improves Sentry visibility when worker tasks crash, making it easier to debug production issues.

Changes

Signal Handlers (celery_config.py)

  • worker_process_init - Initialize Sentry in each forked worker process
  • worker_shutting_down - Flush pending Sentry events before shutdown (SIGTERM)
  • task_prerun - Set task context (args/kwargs) for all Sentry events

Failure/Timeout Capture (tasks/base.py)

  • on_failure hook - Capture exceptions to Sentry with failure_type=task_failure tag
  • on_timeout hook - Capture hard timeouts to Sentry with immediate flush (before SIGKILL)

Cleanup

  • Remove redundant comments in exception handlers

Why

When worker tasks crash (especially hard timeouts leading to SIGKILL), Sentry trace data wasn't being sent, making it difficult to debug production issues. These changes ensure:

  1. Sentry is properly initialized in forked workers
  2. Events are flushed before graceful shutdown
  3. Hard timeouts get captured with immediate flush before SIGKILL
  4. Task context (args, kwargs, retries, attempts) is available in Sentry events

Testing

  • All 55 existing tests pass
  • Code paths are exercised by test_sample_task_hard_timeout and test_sample_task_failure

Note

Improves observability of worker task failures and timeouts via Sentry.

  • Add Celery signal handlers in celery_config.py: re-initialize Sentry per forked worker (worker_process_init), flush events on shutdown (worker_shutting_down), and set task context (task_prerun) with safe stringified args/kwargs
  • In BaseCodecovRequest.on_timeout, capture hard timeouts to Sentry with failure_type=hard_timeout and immediate flush, plus metrics increments
  • Minor cleanup: remove redundant comments in exception handlers; wire Sentry helpers (initialize_sentry, is_sentry_enabled) and sentry_sdk usage

Written by Cursor Bugbot for commit 1adefc4.

- Add signal handlers for Sentry initialization in forked workers
- Flush Sentry events on graceful worker shutdown (SIGTERM)
- Set task context (args/kwargs) at task start for debugging
- Capture task failures to Sentry with failure_type tag
- Capture hard timeouts to Sentry with immediate flush before SIGKILL
- Clean up redundant comments in exception handlers

This ensures crash data is sent to Sentry even when workers die
unexpectedly, making it easier to debug production issues.

codecov-notifications bot commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 68.96552% with 9 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| apps/worker/celery_config.py | 63.15% | 7 Missing ⚠️ |
| apps/worker/tasks/base.py | 80.00% | 2 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (68.96%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.



sentry bot commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 68.96552% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.89%. Comparing base (0360937) to head (1adefc4).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| apps/worker/celery_config.py | 63.15% | 7 Missing ⚠️ |
| apps/worker/tasks/base.py | 80.00% | 2 Missing ⚠️ |
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main     #636      +/-   ##
==========================================
- Coverage   93.90%   93.89%   -0.02%     
==========================================
  Files        1286     1286              
  Lines       46804    46833      +29     
  Branches     1517     1517              
==========================================
+ Hits        43953    43973      +20     
- Misses       2542     2551       +9     
  Partials      309      309              
```
| Flag | Coverage Δ |
|---|---|
| workerintegration | 59.14% <44.82%> (-0.03%) ⬇️ |
| workerunit | 91.26% <68.96%> (-0.05%) ⬇️ |

Flags with carried forward coverage won't be shown.


Address Bugbot review: _capture_failure_to_sentry could throw and
block super().on_failure(), metrics, and flow logging.

Changes:
- Move Sentry capture AFTER super().on_failure() (consistent with on_timeout)
- Wrap Sentry capture in try-except for resilience
- Add safe_str() helper to handle objects with broken __str__
- Wrap _capture_hard_timeout_to_sentry in on_timeout with try-except
- Add _safe_str helper to celery_config.py for signal handler
- Ensures metrics and flow logging are not blocked by Sentry failures
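The `safe_str` idea above could be sketched as follows; the helper's exact name, truncation length, and fallback behavior in the PR are not shown here, so treat these as assumptions:

```python
def safe_str(value, max_len=1000):
    """Stringify a value for Sentry context without letting a broken
    __str__ raise out of a signal handler or failure hook."""
    try:
        return str(value)[:max_len]
    except Exception:
        # Fall back to the base object repr, which cannot be overridden
        # by user classes and therefore cannot raise.
        return object.__repr__(value)


class BrokenStr:
    """An object whose __str__ raises, as can happen with lazy proxies."""

    def __str__(self):
        raise RuntimeError("broken __str__")
```

Here `safe_str(BrokenStr())` falls back to a default repr such as `<__main__.BrokenStr object at 0x...>` instead of propagating the error into the Sentry capture path.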
…y events

CeleryIntegration already captures task failures automatically. We now only
enrich the context with task args/kwargs/retries without calling
capture_exception, avoiding duplicate events per Sentry issue #433.
The _enrich_sentry_context_for_failure method was setting Sentry context
in on_failure, but CeleryIntegration captures exceptions when they're
raised (before on_failure is called), so the enrichment was ineffective.

Changes:
- Remove _enrich_sentry_context_for_failure method and its call
- Add 'retries' to set_sentry_task_context at task_prerun where it
  will be captured with exception events
- Update docstring to clarify why context is set at prerun

This addresses the review feedback that Sentry context enrichment
occurs after the exception is already captured.
