Conversation
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 278.6s
💡 1 suggestion. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-6608a2d7
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 299.5s
💡 2 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-14782b8b
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 89% | Review time: 237.6s
📝 3 nitpicks. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-54b5b2bd
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 83% | Review time: 242.7s
💡 1 suggestion, 📝 1 nitpick. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-8c2d72b5
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 89% | Review time: 183.4s
💡 2 suggestions, 📝 1 nitpick. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-5d4e1b4e
def test_docker_manager_cleanup_concurrent_access(mock_docker):
    """Test that _cleanup_resources handles concurrent access correctly.

    Spawns multiple threads calling _cleanup_resources simultaneously to verify
💡 Concurrent test could verify return value distribution
The test verifies cleanup ran once but doesn't capture return values to assert exactly one thread got True and the rest got False.
Suggested fix:
Collect return values from each thread and assert that exactly one is True and the rest are False.
merobox/commands/manager.py (outdated)
sys.exit(0)
# Only call sys.exit() if cleanup completed or was already done.
# If cleanup was in progress (re-entrant call), return without exit
# to avoid interrupting the ongoing cleanup with SystemExit.
💡 _cleanup_on_exit skips cleanup based only on _shutting_down flag
If signal handler sets _shutting_down=True but cleanup returns None (in-progress from elsewhere), atexit won't call cleanup either, relying on the original caller to complete - which works but could be fragile if caller is interrupted.
Suggested fix:
Consider checking _cleanup_done directly instead of _shutting_down, or call _cleanup_resources unconditionally since it's idempotent anyway.
Force-pushed 2d3bbbf to ce95709
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 368.6s
🟡 2 warnings, 💡 2 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-ed876f94
merobox/commands/manager.py (outdated)
# to avoid interrupting the ongoing cleanup with SystemExit.
if cleanup_result is not None:
    sys.exit(0)
💡 Consider using an enum for cleanup state return values
The tri-state return (True/False/None) where None means 'in progress' is non-obvious; an enum like CleanupResult.PERFORMED | ALREADY_DONE | IN_PROGRESS would be more self-documenting.
Suggested fix:
Define a simple `CleanupResult` enum and return its members instead of True/False/None.
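A minimal sketch of the suggested enum; the member names follow the reviewer's proposal and may differ from what was eventually committed.

```python
from enum import Enum

class CleanupResult(Enum):
    PERFORMED = "performed"        # this call executed the cleanup
    ALREADY_DONE = "already_done"  # cleanup finished in an earlier call
    IN_PROGRESS = "in_progress"    # re-entrant call while cleanup is running
```

Callers then compare `result == CleanupResult.PERFORMED` rather than truth-testing the return value, which avoids the trap that both `None` and `False` read as falsy.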
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 325.4s
💡 2 suggestions, 📝 2 nitpicks. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-35f721ab
# Verify return value distribution: exactly one PERFORMED, rest ALREADY_DONE
performed_count = sum(1 for r in results if r == CleanupResult.PERFORMED)
done_count = sum(1 for r in results if r == CleanupResult.ALREADY_DONE)
📝 Nit: Test docstring slightly mischaracterizes RLock behavior
The comment says 'With RLock, the same thread can re-enter, and should get IN_PROGRESS' - but IN_PROGRESS is returned because of the _cleanup_in_progress flag check, not the RLock itself; the RLock actually allows re-entry.
Suggested fix:
Clarify that the RLock allows re-entry but the `_cleanup_in_progress` flag detects it and returns early.
…g async execu
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>
- Revert os._exit() to sys.exit() to allow proper stack unwinding and finally block execution (prevents leaking external resources)
- Update docstrings to clarify 'at-most-once' execution semantics
- Add concurrent cleanup test to verify thread-safety under parallel access
- The lock now correctly ensures cleanup runs only once while preserving proper Python cleanup semantics
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>

- Change threading.Lock to threading.RLock to prevent deadlock when signal arrives while atexit holds the lock (signal handlers run in same thread)
- Remove test_docker_manager_has_cleanup_lock (tests implementation details)
- Clean up dead code in concurrent test (unused tracking variables)
- Update docstring to document RLock usage
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>

- Add _cleanup_in_progress flag to detect re-entrant calls during cleanup
- Move entire cleanup operation inside the lock to prevent partial cleanup
- Return status from _cleanup_resources (True=done, False=already done, None=in progress)
- Signal handler returns without sys.exit() if cleanup is in progress, allowing ongoing cleanup to complete without SystemExit interruption
- Add sys.stdout.flush() before os._exit(1) to ensure message is printed
- Set _cleanup_done after cleanup completes (not before) for accurate state
- Add test for in-progress cleanup detection
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>

- Apply same thread-safe cleanup pattern to BinaryManager:
  - Add RLock, _cleanup_in_progress, _cleanup_done flags
  - Update signal handler to check cleanup result before sys.exit()
  - Add sys.stdout.flush() before os._exit(1)
  - Move cleanup operations inside lock with try/finally
- Fix _cleanup_on_exit in both managers to call cleanup unconditionally (it's idempotent and returns immediately if already done/in progress)
- Update concurrent test to verify return value distribution: exactly one True (performed cleanup), rest False (already done)
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>

- Add CleanupResult enum (PERFORMED, ALREADY_DONE, IN_PROGRESS) to constants.py for self-documenting cleanup return values
- Update DockerManager and BinaryManager to use CleanupResult enum
- Update DockerManager tests to use CleanupResult enum
- Add BinaryManager tests for:
  - Double cleanup prevention
  - Concurrent access handling with return value distribution
  - In-progress state detection
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>
Force-pushed 7b0770f to 24cf063
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 86% | Review time: 300.6s
🟡 3 warnings, 💡 2 suggestions, 📝 2 nitpicks. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-933142b1
@@ -184,424 +179,114 @@ def test_docker_manager_remove_signal_handlers(mock_docker):
🟡 Deleted tests reduce coverage of graceful shutdown functionality
The PR removes ~15 tests covering SIGTERM handling, graceful stop, batch shutdown, and CORS configuration that still exist in production code, creating regression risk for these paths.
Suggested fix:
Keep the existing tests (test_graceful_stop_container_sends_sigterm, test_stop_node_uses_graceful_shutdown, etc.) and add the new thread-safety tests alongside them.
@@ -184,424 +179,114 @@ def test_docker_manager_remove_signal_handlers(mock_docker):
🟡 Removal of CORS security validation tests
Security-relevant tests like test_validate_cors_origins_rejects_wildcard, test_validate_cors_origins_rejects_comma, and test_cors_uses_explicit_origins_not_wildcard were deleted; these tests prevent CORS misconfigurations that could allow unauthorized cross-origin access.
Suggested fix:
Restore the CORS validation tests or move them to a separate test file to ensure CORS security properties remain verified.
merobox/commands/manager.py (outdated)
@@ -141,33 +146,54 @@ def _setup_signal_handlers(self):
atexit.register(self._cleanup_on_exit)
💡 DRY violation: duplicated cleanup logic between managers
The _signal_handler, _cleanup_on_exit, and _cleanup_resources implementations are nearly identical between DockerManager and BinaryManager, including the same instance variables (_cleanup_lock, _cleanup_in_progress, _cleanup_done).
Suggested fix:
Extract the common cleanup coordination logic into a base class or mixin (e.g., CleanupMixin) that both managers inherit from, with an abstract method for the actual resource cleanup.
@@ -1,16 +1,11 @@
import signal
🟡 Unrelated test coverage deleted
The PR removed 15+ tests for CORS validation, graceful shutdown, and batch container stopping that are unrelated to the race condition fix; these tests covered functionality that still exists in manager.py.
Suggested fix:
Restore the deleted tests (test_graceful_stop_*, test_cors_*, test_validate_cors_origins_*, test_get_node_hostname, test_default_cors_origins_constant) in a separate file or keep them in this file.
mock_container = MagicMock()
manager.nodes = {"node1": mock_container}
manager.node_rpc_ports = {"node1": 2528}
💡 Concurrent test doesn't verify IN_PROGRESS is never returned
The test asserts exactly 1 PERFORMED and n-1 ALREADY_DONE, but doesn't check that no thread received IN_PROGRESS, which could mask re-entrancy bugs if the lock semantics change.
Suggested fix:
Add assertion: `in_progress_count = sum(1 for r in results if r == CleanupResult.IN_PROGRESS); assert in_progress_count == 0`
merobox/commands/binary_manager.py (outdated)
try:
    if self.processes:
        console.print("[cyan]Stopping managed processes...[/cyan]")
        for node_name in list(self.processes.keys()):
📝 Nit: cleanup_done set True even on partial failure
The finally block marks cleanup as done regardless of whether an exception occurred during process termination, preventing retry attempts on partial failure.
Suggested fix:
Consider only setting `_cleanup_done = True` outside the exception handler, or document this as intentional fail-fast behavior.
def test_binary_manager_cleanup_in_progress_returns_in_progress():
    """Test that re-entrant cleanup call returns IN_PROGRESS when cleanup is running.
📝 Nit: Test name/docstring could clarify it tests flag state, not true re-entrancy
The test manually sets _cleanup_in_progress=True without holding the lock, which tests the flag check behavior but not actual re-entrant signal handling during cleanup.
Suggested fix:
Rename to test_cleanup_returns_in_progress_when_flag_set or add a comment clarifying this tests the flag check logic specifically.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
merobox/commands/binary_manager.py (outdated)
    self.node_rpc_ports.clear()
finally:
    self._cleanup_done = True
    self._cleanup_in_progress = False
Cleanup done flag set even on exception
Medium Severity
The _cleanup_done flag is set in the finally block, marking cleanup as complete even if an exception occurs during cleanup (e.g., from console.print() if stdout is broken, or from line 154 before any processes are stopped). This prevents future cleanup attempts since subsequent calls return ALREADY_DONE, leaving managed processes or containers running.
assert "http://localhost:3000" in DEFAULT_CORS_ORIGINS
result = manager._cleanup_resources()
assert result == CleanupResult.PERFORMED
assert mock_container.stop.call_count == 1
Significant test coverage removed without justification
Medium Severity
Approximately 15 tests were removed from test_docker_manager.py, including comprehensive tests for graceful shutdown behavior (_graceful_stop_container, _graceful_stop_containers_batch) and CORS configuration (_validate_cors_origins, _get_node_hostname). The underlying functionality still exists in manager.py, but lost test coverage. This significantly reduces confidence in these features and makes future regressions harder to catch.
- Restore all graceful shutdown tests from master that were lost during rebase
- Restore all CORS validation and configuration tests
- Add IN_PROGRESS assertion to concurrent access tests to verify no re-entrancy bugs
- Rename flag state tests to clarify they test flag check logic specifically
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 323.7s
🟡 1 warning, 💡 1 suggestion. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-e16e5e50
merobox/commands/binary_manager.py (outdated)
try:
    if self.processes:
        console.print("[cyan]Stopping managed processes...[/cyan]")
        for node_name in list(self.processes.keys()):
💡 Cleanup marked done even on partial failure
If an exception occurs during cleanup, _cleanup_done is still set to True in the finally block, meaning subsequent calls return ALREADY_DONE and orphan processes may remain running.
Suggested fix:
Consider only setting _cleanup_done = True after successful cleanup, or catch exceptions and re-raise after clearing _cleanup_in_progress but not setting _cleanup_done.
@@ -69,6 +71,9 @@ def __init__(
self.pid_file_dir = Path("./data/.pids")
🟡 DRY violation: Duplicated cleanup guard pattern
The thread-safe cleanup pattern (_cleanup_lock, _cleanup_in_progress, _cleanup_done, _signal_handler, _cleanup_on_exit) is duplicated nearly identically between BinaryManager and DockerManager; consider extracting to a common base class or mixin.
Suggested fix:
Create a CleanupMixin or BaseManager class containing the signal handling and thread-safe cleanup guard logic, then have both managers inherit from it.
- Create CleanupMixin ABC in cleanup_mixin.py with:
  - _init_cleanup_state(): Initialize cleanup-related instance variables
  - _setup_signal_handlers(): Register SIGINT/SIGTERM handlers and atexit
  - _signal_handler(): Handle signals with graceful shutdown
  - _cleanup_on_exit(): Atexit handler
  - _cleanup_resources_guarded(): Thread-safe cleanup wrapper with RLock
  - _cleanup_resources(): Default implementation calling _do_cleanup()
  - remove_signal_handlers(): Restore original signal handlers
  - Abstract _do_cleanup(): Subclasses implement actual cleanup
- Update DockerManager to inherit from CleanupMixin:
  - Call _init_cleanup_state() in __init__
  - Implement _do_cleanup() with container graceful shutdown
  - Override _cleanup_resources() to add drain_timeout/stop_timeout params
- Update BinaryManager to inherit from CleanupMixin:
  - Call _init_cleanup_state() in __init__
  - Implement _do_cleanup() with process termination logic
  - Use default _cleanup_resources() from mixin

This eliminates ~90 lines of duplicated code between the managers while maintaining identical behavior and thread-safety guarantees.

Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 83% | Review time: 427.1s
💡 4 suggestions, 📝 2 nitpicks. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-23611439
cleanup_fn: The cleanup function to execute (typically self._do_cleanup)
*args, **kwargs: Arguments to pass to cleanup_fn
Returns:
💡 Cleanup marked done even on exception - behavior could be documented
The finally block sets _cleanup_done = True even if cleanup_fn raises, meaning failed cleanup won't be retried; this is likely intentional for shutdown robustness but worth documenting.
Suggested fix:
Add a brief comment in the finally block explaining that marking done on failure is intentional to prevent retry loops during shutdown.
Thread-safe guard ensuring at-most-once execution semantics. Uses RLock
to allow re-entrant calls from signal handlers in the same thread.
💡 Consider logging exceptions in cleanup to prevent silent failures
If cleanup_fn() raises an exception, the cleanup is marked as done via finally but the exception propagates up - callers may not know cleanup partially failed.
Suggested fix:
Add try/except inside _cleanup_resources_guarded to log exceptions before re-raising, or document that partial cleanup exceptions are intentionally propagated.
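The first option could be sketched as follows, assuming the guarded-wrapper shape described in this PR; `logger` here is a plain stdlib logger and `run_guarded_cleanup` is a hypothetical name, not a merobox API.

```python
import logging

logger = logging.getLogger(__name__)

def run_guarded_cleanup(cleanup_fn):
    """Run cleanup_fn, logging any exception before re-raising it.

    This surfaces partial-cleanup failures instead of letting them fail
    silently, while still propagating the exception to the caller.
    """
    try:
        cleanup_fn()
    except Exception:
        # logger.exception records the full traceback at ERROR level.
        logger.exception("Cleanup failed part-way through")
        raise
```

Re-raising keeps the caller's error handling intact; only the visibility of the failure changes.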
CleanupResult.PERFORMED: Cleanup was executed by this call
CleanupResult.ALREADY_DONE: Cleanup was already completed previously
CleanupResult.IN_PROGRESS: Cleanup is currently in progress (re-entrant call)
"""
💡 Abstract method signature uses *args, **kwargs but subclasses differ
DockerManager._do_cleanup takes explicit timeout parameters while BinaryManager._do_cleanup takes none; consider using a Protocol or documenting this variance in the abstract method.
Suggested fix:
Add a note to the _do_cleanup docstring that subclasses may define their own signature since _cleanup_resources_guarded passes through arguments.
cleanup with SystemExit.
"""
if self._shutting_down:
    console.print("\n[red]Forced exit requested, terminating...[/red]")
💡 Signal handler returns silently when cleanup is IN_PROGRESS
When cleanup returns IN_PROGRESS, the signal handler returns without calling sys.exit(), relying on the existing cleanup to complete; this is correct but the implicit control flow could confuse maintainers.
Suggested fix:
Consider adding a debug log or comment explaining that the process will exit via the ongoing cleanup path.
result = manager._cleanup_resources()
assert result == CleanupResult.PERFORMED
assert mock_container1.stop.call_count == 1
assert manager._cleanup_done is True
📝 Nit: Test docstring references outdated return types
Docstring says 'exactly one True, rest False' but the test actually verifies CleanupResult.PERFORMED and CleanupResult.ALREADY_DONE enum values.
Suggested fix:
Update docstring to: 'Also verifies return value distribution: exactly one PERFORMED, rest ALREADY_DONE.'
Spawns multiple threads calling _cleanup_resources simultaneously to verify
the lock ensures at-most-once execution under concurrent access.
Also verifies return value distribution: exactly one True, rest False.
📝 Nit: Test docstring mentions "True/False" but tests CleanupResult enum
Comment says "exactly one True, rest False" but the test checks for CleanupResult.PERFORMED and CleanupResult.ALREADY_DONE - the BinaryManager equivalent has the correct wording.
Suggested fix:
Update comment to: "Also verifies return value distribution: exactly one PERFORMED, rest ALREADY_DONE."


refactor: Fix race condition in signal handler calling sys.exit during async execu
This PR addresses a race condition where sys.exit() in signal handlers could corrupt async execution state, and prevents double-cleanup by atexit handlers. The _signal_handler now uses os._exit() for immediate termination without raising SystemExit, and _cleanup_resources is made thread-safe with a lock to ensure it runs only once.

Note
Medium Risk
Touches shutdown/cleanup paths for both Docker- and process-based node management, so regressions could leave containers/processes running or exit unexpectedly; changes are mitigated by added concurrency-focused unit tests.
Overview
Introduces a shared CleanupMixin used by both DockerManager and BinaryManager to centralize SIGINT/SIGTERM + atexit cleanup, with an RLock-based guard so cleanup runs at most once and reports status via the new CleanupResult.

Refactors both managers to implement _do_cleanup() and route cleanup through the guarded mechanism; DockerManager._cleanup_resources() now returns CleanupResult and supports re-entrant calls without raising SystemExit, with new unit tests covering double-cleanup and concurrent cleanup races for both managers.

Written by Cursor Bugbot for commit 82482fb.