Skip to content

Comments

fix: catch BaseException in cleanup loops to survive asyncio.CancelledError#11

Open
ramparte wants to merge 1 commit intomicrosoft:mainfrom
ramparte:fix/cleanup-cancellation-resilience
Open

fix: catch BaseException in cleanup loops to survive asyncio.CancelledError#11
ramparte wants to merge 1 commit intomicrosoft:mainfrom
ramparte:fix/cleanup-cancellation-resilience

Conversation

@ramparte
Copy link
Contributor

Problem

Users intermittently see "Unclosed client session" warnings from aiohttp on session exit:

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7653d156ec50>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7653d1404390>

Root Cause

Since Python 3.9, asyncio.CancelledError inherits from BaseException, not Exception.

The coordinator's cleanup loop (coordinator.py:345-357) catches only except Exception:

async def cleanup(self):
    for cleanup_fn in reversed(self._cleanup_functions):
        try:
            # ... call cleanup_fn ...
        except Exception as e:        # ← misses CancelledError
            logger.error(f"Error during cleanup: {e}")

When task cancellation arrives during shutdown (common with Ctrl+C or normal session teardown), CancelledError escapes the try/except, aborts the for-loop entirely, and all remaining cleanup functions are skipped. Modules like tool-web and hooks-notify that create aiohttp.ClientSession objects never get their cleanup called, producing the warnings.

Note: the individual module cleanup functions are actually defensively coded (they use asyncio.shield() and catch CancelledError internally). The bug is that the coordinator never reaches them because an earlier cleanup function's CancelledError breaks out of the loop.

The same pattern exists in session.py where except Exception in execute() means CancelledError never reaches the cancellation status tracking code (lines 437-448), and where coordinator.cleanup() failure would also prevent loader.cleanup() from running.

Fix

File Change Why
coordinator.py cleanup() except Exceptionexcept BaseException Core fix: cleanup loop survives CancelledError
coordinator.py collect_contributions() Same Consistency: same loop pattern
session.py cleanup() Wrap coordinator.cleanup() in try/except BaseException Ensures loader.cleanup() always runs
session.py execute() except Exceptionexcept BaseException CancelledError reaches cancellation tracking
session.py _safe_exception_str() Widen param type to BaseException Type correctness for callers

Testing

  • All 184 existing tests pass (1 pre-existing failure on main in test_event_taxonomy.py::test_events_follow_naming_convention — unrelated event naming convention test)
  • Pyright and ruff clean on both changed files
  • The fix is minimal and mechanical — just widening exception catch clauses

🤖 Generated with Amplifier

…dError

asyncio.CancelledError inherits from BaseException (not Exception) since
Python 3.9. The coordinator cleanup loop and session cleanup used
`except Exception`, which misses CancelledError. When cancellation arrives
during shutdown, it escapes the try/except, aborts the for-loop, and all
remaining module cleanup functions are skipped — causing "Unclosed client
session" warnings from aiohttp in modules like tool-web and hooks-notify.

Changes:
- coordinator.cleanup(): except Exception → except BaseException
- coordinator.collect_contributions(): same fix for consistency
- session.cleanup(): wrap coordinator.cleanup() so loader.cleanup() always runs
- session.execute(): except Exception → except BaseException so CancelledError
  reaches the cancellation status/event tracking code
- session._safe_exception_str(): widen type to BaseException to match callers

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes intermittent "Unclosed client session" warnings from aiohttp by ensuring cleanup functions execute even when asyncio.CancelledError is raised. Since Python 3.8+, CancelledError inherits from BaseException rather than Exception, so cleanup loops catching only Exception would abort when cancellation occurs.

Changes:

  • Modified exception handling in cleanup and contribution collection loops to catch BaseException instead of Exception
  • Wrapped coordinator cleanup in session.cleanup() with exception handling to ensure loader cleanup always runs
  • Updated _safe_exception_str() type annotation to accept BaseException for compatibility with new usage

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
amplifier_core/coordinator.py Changed exception handlers in cleanup() and collect_contributions() from Exception to BaseException to survive asyncio.CancelledError
amplifier_core/session.py Changed exception handler in execute() to catch BaseException, wrapped coordinator cleanup with exception handling, and updated _safe_exception_str() type annotation
Comments suppressed due to low confidence (1)

amplifier_core/session.py:454

  • Catching BaseException without re-raising critical exceptions like KeyboardInterrupt and SystemExit can prevent proper program termination. While this change allows cancellation tracking code to execute when asyncio.CancelledError is raised, it also catches critical exceptions that should propagate immediately.

Since the code already re-raises the exception after logging, consider explicitly re-raising critical exceptions before the cancellation check:

except BaseException as e:
    if isinstance(e, (KeyboardInterrupt, SystemExit)):
        raise
    # Check if this was a cancellation-related exception
    if self.coordinator.cancellation.is_cancelled:
        ...
        except BaseException as e:
            # Check if this was a cancellation-related exception
            if self.coordinator.cancellation.is_cancelled:
                self.status.status = "cancelled"
                from .events import CANCEL_COMPLETED

                await self.coordinator.hooks.emit(
                    CANCEL_COMPLETED,
                    {
                        "was_immediate": self.coordinator.cancellation.is_immediate,
                        "error": _safe_exception_str(e),
                    },
                )
                logger.info(f"Execution cancelled: {_safe_exception_str(e)}")
                raise
            else:
                self.status.status = "failed"
                self.status.last_error = {"message": _safe_exception_str(e)}
                logger.error(f"Execution failed: {_safe_exception_str(e)}")
                raise

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if inspect.iscoroutine(result):
await result
except Exception as e:
except BaseException as e:
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching BaseException without re-raising critical exceptions like KeyboardInterrupt and SystemExit can prevent proper program termination. While this allows cleanup to survive asyncio.CancelledError, it also suppresses other critical exceptions that should propagate immediately.

Consider explicitly re-raising these critical exceptions:

except BaseException as e:
    if isinstance(e, (KeyboardInterrupt, SystemExit)):
        raise
    logger.error(f"Error during cleanup: {e}")

This ensures cleanup survives CancelledError while still allowing the user to forcefully terminate the program with Ctrl+C.

Suggested change
except BaseException as e:
except BaseException as e:
if isinstance(e, (KeyboardInterrupt, SystemExit)):
raise

Copilot uses AI. Check for mistakes.
Comment on lines +338 to 341
except BaseException as e:
logger.warning(
f"Contributor '{contributor['name']}' on channel '{channel}' failed: {e}"
)
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching BaseException without re-raising critical exceptions like KeyboardInterrupt and SystemExit can prevent proper program termination. While this allows the contribution collection to survive asyncio.CancelledError, it also suppresses other critical exceptions.

Consider explicitly re-raising these critical exceptions:

except BaseException as e:
    if isinstance(e, (KeyboardInterrupt, SystemExit)):
        raise
    logger.warning(f"Contributor '{contributor['name']}' on channel '{channel}' failed: {e}")

Copilot uses AI. Check for mistakes.
Comment on lines +461 to +462
logger.error(f"Error during coordinator cleanup: {e}")
# Clean up sys.path modifications - must always run
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching BaseException and logging without re-raising means critical exceptions like KeyboardInterrupt and SystemExit will be suppressed during coordinator cleanup. This prevents proper program termination and means loader.cleanup() might run when it shouldn't.

Consider explicitly re-raising these critical exceptions:

try:
    await self.coordinator.cleanup()
except BaseException as e:
    if isinstance(e, (KeyboardInterrupt, SystemExit)):
        raise
    logger.error(f"Error during coordinator cleanup: {e}")
Suggested change
logger.error(f"Error during coordinator cleanup: {e}")
# Clean up sys.path modifications - must always run
# Re-raise critical exceptions to allow proper termination
if isinstance(e, (KeyboardInterrupt, SystemExit)):
raise
logger.error(f"Error during coordinator cleanup: {_safe_exception_str(e)}")
# Clean up sys.path modifications - must always run for non-critical errors

Copilot uses AI. Check for mistakes.
bkrabach added a commit that referenced this pull request Feb 10, 2026
…dError

asyncio.CancelledError is a BaseException subclass (Python 3.9+), so
`except Exception` in kernel loop sites lets it escape, aborting
remaining cleanup functions, hook handlers, and contribution collection.

Different patterns per site semantics:
- collect_contributions: catch CancelledError → break (partial results)
- cleanup loops: catch BaseException → continue, re-raise fatal after
- session.cleanup: try/finally ensures loader.cleanup always runs
- session.execute: catch BaseException for status tracking (re-raises)
- hooks emit/emit_and_collect: catch CancelledError → log, continue
- cancellation.trigger_callbacks: catch CancelledError → continue

Also fixes identical vulnerability in hooks.py (emit, emit_and_collect)
and cancellation.py (trigger_callbacks) beyond the original PR scope.

Includes 10 new tests for CancelledError resilience across all sites.

Inspired-by: ramparte (PR #11)

Co-Authored-By: ramparte <ramparte@users.noreply.github.com>
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
bkrabach added a commit that referenced this pull request Feb 10, 2026
…dError (#12)

asyncio.CancelledError is a BaseException subclass (Python 3.9+), so
`except Exception` in kernel loop sites lets it escape, aborting
remaining cleanup functions, hook handlers, and contribution collection.

Different patterns per site semantics:
- collect_contributions: catch CancelledError → break (partial results)
- cleanup loops: catch BaseException → continue, re-raise fatal after
- session.cleanup: try/finally ensures loader.cleanup always runs
- session.execute: catch BaseException for status tracking (re-raises)
- hooks emit/emit_and_collect: catch CancelledError → log, continue
- cancellation.trigger_callbacks: catch CancelledError → continue

Includes 10 new tests for CancelledError resilience across all sites.

Inspired-by: ramparte (PR #11)
Co-Authored-By: ramparte <ramparte@users.noreply.github.com>
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant