Skip to content

Comments

fix: catch BaseException in cleanup loops to survive asyncio.CancelledError#12

Merged
bkrabach merged 1 commit intomainfrom
fix/cleanup-cancellation-resilience
Feb 10, 2026
Merged

fix: catch BaseException in cleanup loops to survive asyncio.CancelledError#12
bkrabach merged 1 commit intomainfrom
fix/cleanup-cancellation-resilience

Conversation

@bkrabach
Copy link
Collaborator

Problem

asyncio.CancelledError is a BaseException subclass (since Python 3.9), so except Exception in kernel loop-based exception handling sites lets it escape, aborting remaining cleanup functions, hook handlers, and contribution collection. This causes resource leaks (open connections, file handles) and skipped lifecycle observers.

Inspired by #11 from @ramparte, who correctly identified this class of bug. This PR expands the fix to all affected sites with context-appropriate exception handling patterns, and adds comprehensive tests.

Approach

Different patterns per site semantics — one size does not fit all:

Site Pattern Rationale
collect_contributions except CancelledErrorbreak Active work: honor cancellation, return partial results
Cleanup loops (coordinator.cleanup) except BaseException → continue, re-raise fatal after Cleanup MUST complete; re-raise KeyboardInterrupt/SystemExit after
session.cleanup try/finally Guarantee loader.cleanup() (sys.path restoration) always runs
session.execute except BaseException → status tracking → re-raise Bookkeeping before propagation
hooks.emit / emit_and_collect except CancelledError → log, continue All handlers should observe events (especially cleanup events like session:end)
cancellation.trigger_callbacks except CancelledError → continue, re-raise fatal after All callbacks must run; fatal exceptions honored after

Files Changed

  • amplifier_core/coordinator.pycollect_contributions() + cleanup()
  • amplifier_core/session.py_safe_exception_str() + execute() + cleanup()
  • amplifier_core/hooks.pyemit() + emit_and_collect()
  • amplifier_core/cancellation.pytrigger_callbacks()
  • tests/test_cancellation_resilience.py — 10 new tests covering all sites

Testing

  • All 10 new cancellation resilience tests pass
  • All 194 existing tests pass (1 pre-existing failure in test_event_taxonomy.py unrelated to this change)
  • python_check (ruff + pyright) clean on all changed files

Closes

Supersedes #11

…dError

asyncio.CancelledError is a BaseException subclass (Python 3.9+), so
`except Exception` in kernel loop sites lets it escape, aborting
remaining cleanup functions, hook handlers, and contribution collection.

Different patterns per site semantics:
- collect_contributions: catch CancelledError → break (partial results)
- cleanup loops: catch BaseException → continue, re-raise fatal after
- session.cleanup: try/finally ensures loader.cleanup always runs
- session.execute: catch BaseException for status tracking (re-raises)
- hooks emit/emit_and_collect: catch CancelledError → log, continue
- cancellation.trigger_callbacks: catch CancelledError → continue

Also fixes identical vulnerability in hooks.py (emit, emit_and_collect)
and cancellation.py (trigger_callbacks) beyond the original PR scope.

Includes 10 new tests for CancelledError resilience across all sites.

Inspired-by: ramparte (PR #11)

Co-Authored-By: ramparte <ramparte@users.noreply.github.com>
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
@bkrabach bkrabach merged commit d143126 into main Feb 10, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant