Conversation
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 89% | Review time: 233.0s
🟡 5 warnings, 💡 1 suggestion, 📝 3 nitpicks. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-c989f828
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 93% | Review time: 219.2s
🟡 6 warnings, 💡 2 suggestions, 📝 1 nitpick. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-0449a885
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 96% | Review time: 205.5s
🔴 1 critical, 🟡 3 warnings, 💡 3 suggestions, 📝 1 nitpick. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-a7753912
merobox/commands/nuke.py
Outdated
binary_manager = BinaryManager(
    require_binary=False, enable_signal_handlers=False
)
pid_dir = Path("./data/.pids")
💡 Hardcoded PID directory path may diverge from actual data path
The path ./data/.pids is hardcoded while find_calimero_data_dirs uses Path("data") - if the data directory logic ever changes, these could get out of sync.
Suggested fix:
Consider extracting the base data path as a constant or parameter shared between functions.
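A minimal sketch of that suggestion, assuming a shared module-level constant (the names `CALIMERO_DATA_ROOT` and the helper body here are hypothetical, not the module's actual code):

```python
from pathlib import Path

# Hypothetical shared constants: both the data-dir scan and the PID lookup
# derive from one root, so the two paths cannot silently drift apart.
CALIMERO_DATA_ROOT = Path("data")
PID_DIR = CALIMERO_DATA_ROOT / ".pids"


def find_calimero_data_dirs(root: Path = CALIMERO_DATA_ROOT) -> list:
    """List node data directories under the shared root, skipping .pids."""
    if not root.is_dir():
        return []
    return [p for p in root.iterdir() if p.is_dir() and p.name != ".pids"]
```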
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
    docker_check_failed = True
except Exception as e:
    warnings.append(f"Docker check failed unexpectedly: {e}")
    docker_check_failed = True
DockerManager exits process, bypassing fail-safe detection
High Severity
When Docker is unavailable, DockerManager.__init__ internally catches the connection error and calls sys.exit(1), which raises SystemExit. Since SystemExit inherits from BaseException (not Exception), neither the except docker.errors.DockerException nor the except Exception handler in _detect_running_nodes will catch it. The program terminates immediately instead of setting docker_check_failed = True. This completely defeats the fail-safe mechanism — nuke --stale will crash when Docker isn't running, even with --force, instead of gracefully degrading as intended.
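The failure mode is easy to reproduce in isolation. This sketch uses a stand-in for `DockerManager.__init__` to show that `except Exception` never fires for `SystemExit`, while an explicit handler restores the fallback:

```python
def fragile_init():
    # Stand-in for DockerManager.__init__ calling sys.exit(1) when the
    # Docker daemon is unreachable.
    raise SystemExit(1)


def detect_nodes_broken():
    try:
        fragile_init()
    except Exception:
        return "docker_check_failed"  # never reached: SystemExit is a BaseException
    return "ok"


def detect_nodes_fixed():
    try:
        fragile_init()
    except (Exception, SystemExit):
        return "docker_check_failed"  # graceful degradation instead of exiting
    return "ok"
```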
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 95% | Review time: 252.3s
🟡 3 warnings, 💡 4 suggestions. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-79cbfc39
    return False

# Check for expected Calimero node subdirectories or files
# A valid Calimero data directory typically contains a subdirectory
🟡 TOCTOU race condition between node detection and deletion
The time gap between get_running_node_names() and nuke_all_data_dirs() allows a node to start after detection but before deletion, potentially causing data loss for a newly-started node.
Suggested fix:
Consider re-checking that directories are still stale immediately before deletion, or add a brief lock/signal mechanism; the documented --dry-run advice is helpful but doesn't prevent the race in automated workflows.
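One way to narrow (not eliminate) that window, sketched with a hypothetical helper that re-queries running nodes immediately before deletion:

```python
import os


def filter_still_stale(candidates, get_running_node_names):
    """Drop any candidate whose node started since the first detection pass.

    get_running_node_names is re-invoked here, right before deletion, so the
    race window shrinks to the gap between this call and the actual rmtree.
    """
    running = get_running_node_names()
    return [d for d in candidates if os.path.basename(d) not in running]
```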
def _is_valid_calimero_data_dir(data_dir: str) -> bool:
    """
    Validate that a directory appears to be legitimate Calimero node data.
🟡 Empty directory validation may race with node startup
Returning True for empty directories means a directory from a node that just started (created dir but hasn't written files yet) could be incorrectly marked as valid for deletion.
Suggested fix:
Consider checking the directory's mtime and skipping directories created very recently (e.g., within the last few seconds) to reduce this race window.
for data_dir in data_dirs:
    node_name = os.path.basename(data_dir)
    try:
        container = manager.client.containers.get(node_name)
🟡 Silent swallowing of APIError may hide operational failures
Catching docker.errors.APIError with just pass could mask real issues like permission denied or network failures during container stop operations.
Suggested fix:
Log a debug-level message or add to warnings list when APIError occurs, similar to how warnings are collected in `_detect_running_nodes`.
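A hedged sketch of that change, using a stand-in exception class since `docker.errors` may not be importable here (`stop_node_container` is a hypothetical helper name):

```python
class APIError(Exception):
    """Stand-in for docker.errors.APIError."""


def stop_node_container(get_container, node_name, warnings):
    """Best-effort stop that records failures instead of swallowing them."""
    try:
        get_container(node_name).stop()
    except APIError as e:
        # Previously a bare `pass`; now the failure is surfaced to the caller.
        warnings.append(f"Could not stop container {node_name}: {e}")
```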
Returns:
    DetectionResult containing nodes found and any warnings/errors.
💡 DockerManager created but not explicitly closed
When manager is None, a new DockerManager is created but never closed; if DockerManager holds socket connections, this could leak file descriptors over repeated calls.
Suggested fix:
Consider using a context manager pattern or explicitly closing the manager when created locally, or ensure DockerManager handles cleanup via `__del__`.
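A sketch of the context-manager option, assuming the manager exposes a `close()` method (the helper name and the factory indirection are hypothetical):

```python
from contextlib import contextmanager


@contextmanager
def managed_docker(manager, factory):
    """Yield the caller's manager untouched, or create one and close it after.

    `factory` stands in for the DockerManager constructor; close() is only
    called on managers this helper created itself.
    """
    if manager is not None:
        yield manager
        return
    local = factory()
    try:
        yield local
    finally:
        close = getattr(local, "close", None)
        if callable(close):
            close()
```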
assert len(results) == 1
assert results[0]["status"] == "not_found"
💡 Missing test for partial detection failure scenario
Tests cover complete failure and individual mechanism success, but no test verifies that stale_only mode continues correctly when only one detection method fails (partial failure).
Suggested fix:
Add a test case where Docker succeeds but binary detection fails (or vice versa) to ensure stale cleanup proceeds with warnings.
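The missing scenario can be exercised with a self-contained simulation of the detection flow (`DetectionResult` and `detect` below are simplified stand-ins for the module's real types, not its actual code):

```python
from dataclasses import dataclass, field


@dataclass
class DetectionResult:
    nodes: set
    docker_failed: bool
    binary_failed: bool
    warnings: list = field(default_factory=list)


def detect(docker_probe, binary_probe):
    """Combine both probes; a partial failure degrades with a warning."""
    nodes, warnings = set(), []
    docker_failed = binary_failed = False
    try:
        nodes |= docker_probe()
    except Exception as e:
        docker_failed = True
        warnings.append(f"Docker check failed: {e}")
    try:
        nodes |= binary_probe()
    except Exception as e:
        binary_failed = True
        warnings.append(f"Binary check failed: {e}")
    return DetectionResult(nodes, docker_failed, binary_failed, warnings)


def test_partial_failure_still_detects():
    def failing_docker():
        raise RuntimeError("daemon down")

    result = detect(failing_docker, lambda: {"node1"})
    assert result.docker_failed and not result.binary_failed
    assert result.nodes == {"node1"}
    assert len(result.warnings) == 1
```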
nodes=running_nodes,
docker_failed=docker_check_failed,
binary_failed=binary_check_failed,
warnings=warnings,
💡 Weak directory validation accepts directories with common subdirectory names
Accepting any directory containing a logs/ subdirectory as valid Calimero data is permissive and could match unrelated directories if the data path is misconfigured or contains unexpected content.
Suggested fix:
Consider requiring a more specific marker file (e.g., a `.calimero` marker or config file) rather than relying solely on generic subdirectory names.
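A minimal sketch of the marker-file approach; the `.calimero` file name is the reviewer's example, not an existing convention in the codebase:

```python
import tempfile
from pathlib import Path

MARKER_FILE = ".calimero"  # hypothetical explicit marker


def is_valid_calimero_data_dir(path) -> bool:
    """Require an explicit marker file instead of generic names like logs/."""
    return (Path(path) / MARKER_FILE).is_file()
```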
auth_container = manager.client.containers.get("auth")
if not silent:
    console.print("[yellow]Stopping auth service...[/yellow]")
auth_container.stop(timeout=NUKE_STOP_TIMEOUT)
💡 Repeated try/except pattern for Docker operations violates DRY
The same try/except pattern (NotFound + APIError) is repeated for auth, proxy, and volume operations; extracting a helper would reduce duplication.
Suggested fix:
Consider a small helper like `_safe_remove_container(client, name, silent)` and `_safe_remove_volume(client, name, silent)` that encapsulates this pattern.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    docker_check_failed = True
except Exception as e:
    warnings.append(f"Docker check failed unexpectedly: {e}")
    docker_check_failed = True
SystemExit from DockerManager escapes detection error handling
High Severity
DockerManager.__init__ calls sys.exit(1) when Docker is unavailable, raising SystemExit which inherits from BaseException, not Exception. The except Exception handler in _detect_running_nodes won't catch it, so the entire process terminates instead of gracefully setting docker_check_failed = True. This defeats the fail-safe mechanism — when Docker is down and no pre-existing manager is passed, stale cleanup silently exits rather than falling back to binary-only detection or raising NodeDetectionError.
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 240.6s
🟡 2 warnings, 💡 4 suggestions, 📝 1 nitpick. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-a72685dd
    warnings.append(f"Failed to check binary process {node_name}: {e}")
except Exception as e:
    warnings.append(f"Binary process check failed: {e}")
    binary_check_failed = True
🟡 Uncaught exceptions from stat() and iterdir() can crash stale cleanup
If dir_path.iterdir() or dir_path.stat() fail due to permission errors or race conditions (directory deleted mid-check), the exception propagates and aborts the entire stale cleanup operation.
Suggested fix:
Wrap in try/except and return False on OSError/PermissionError to skip problematic directories gracefully.
    "[yellow]⚠️ Binary detection failed - binary processes may not be detected[/yellow]"
)

# If both detection mechanisms failed and fail_safe is enabled, raise an error
💡 Exception message couples library code to CLI flag
The NodeDetectionError message references --force, a CLI-specific flag; library-level exceptions should remain UI-agnostic.
Suggested fix:
Move the `--force` suggestion to the CLI layer where the exception is caught, keeping the exception message generic.
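A sketch of the suggested split, with simplified stand-ins for both layers (`cli_nuke_stale` and the boolean parameters are hypothetical): the library raises a generic error, and only the CLI wrapper mentions the flag.

```python
class NodeDetectionError(RuntimeError):
    """Library-level error: describes the failure, not the CLI remedy."""


def get_running_node_names(docker_failed=True, binary_failed=True, fail_safe=True):
    # Simplified stand-in for the real detection function.
    if fail_safe and docker_failed and binary_failed:
        raise NodeDetectionError("all node detection mechanisms failed")
    return set()


def cli_nuke_stale():
    """The CLI layer owns the flag hint, keeping the exception UI-agnostic."""
    try:
        return get_running_node_names()
    except NodeDetectionError as e:
        return f"Error: {e}. Re-run with --force to override."
```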
# Accept old empty directories, but skip very recent ones to reduce
# deletion risk during node startup races.
if not any(dir_path.iterdir()):
    dir_age_seconds = time.time() - dir_path.stat().st_mtime
🟡 Potential TOCTOU race in empty directory check
The check not any(dir_path.iterdir()) followed by dir_path.stat() can race if a file is created between the two calls, causing incorrect age calculation or validation.
Suggested fix:
Consider calling `stat()` first to capture mtime, then check `iterdir()`, or use a single atomic operation if possible.
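Both this comment and the earlier one about uncaught `OSError` point at the same few lines; a combined sketch (the helper name is hypothetical, the constant name comes from the diff) captures mtime first and treats a vanished directory as not deletable:

```python
import time
from pathlib import Path

EMPTY_DATA_DIR_MIN_AGE_SECONDS = 10  # assumed threshold value


def is_stale_empty_dir(dir_path: Path) -> bool:
    """Stat first so a late write cannot skew the age; never raise."""
    try:
        mtime = dir_path.stat().st_mtime  # capture mtime before listing
        if any(dir_path.iterdir()):
            return False  # not empty: validated by other checks
        return time.time() - mtime >= EMPTY_DATA_DIR_MIN_AGE_SECONDS
    except OSError:
        return False  # deleted or unreadable mid-check: skip gracefully
```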
# Accept if we find the expected node subdirectory structure
if expected_subdir.is_dir():
    return True
💡 Hardcoded silent=True ignores caller's silent parameter
get_running_node_names(fail_safe=fail_safe, silent=True) always passes silent=True but the function has its own silent parameter that goes unused for this call.
Suggested fix:
Change to `get_running_node_names(fail_safe=fail_safe, silent=silent)` to respect caller's preference.
        console.print("[green]✓ Traefik proxy stopped and removed[/green]")
except Exception:
    pass
console.print("[yellow]No stale data directories found.[/yellow]")
💡 Double detection when using precomputed_data_dirs with stale_only
When CLI provides precomputed_data_dirs, _filter_still_stale_dirs still runs full node detection again. For the CLI flow this is intentional safety, but the API contract could be clearer that precomputed dirs still get re-verified.
Suggested fix:
Consider documenting this behavior in the docstring or adding a `skip_recheck` parameter for trusted callers.
auth_volume = manager.client.volumes.get("calimero_auth_data")
if not silent:
    console.print("[yellow]Removing auth data volume...[/yellow]")
auth_volume.remove()
💡 execute_nuke has grown to 7 parameters
The function signature has accumulated many optional parameters making it harder to reason about; consider grouping related flags into a config object or dataclass.
Suggested fix:
Create a `NukeOptions` dataclass with fields `prefix`, `verbose`, `silent`, `stale_only`, `force`, `precomputed_data_dirs` and pass a single options object.
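A sketch of that dataclass; `NukeOptions` is the reviewer's proposed name and the defaults shown are assumptions, not the function's current signature:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NukeOptions:
    """Groups execute_nuke's accumulated flags into one object."""
    prefix: str = "calimero"
    verbose: bool = False
    silent: bool = False
    stale_only: bool = False
    force: bool = False
    precomputed_data_dirs: Optional[list] = None


def execute_nuke(options: NukeOptions) -> dict:
    # The body would dispatch on `options` instead of seven parameters.
    return {"stale_only": options.stale_only, "force": options.force}
```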
    filters={"label": "calimero.node=true", "status": "running"}
)
for container in containers:
    running_nodes.add(container.name)
📝 Nit: Catching bare Exception in detection code
Catching Exception is intentionally broad here for robustness, but adding a brief inline comment explaining this choice would help future readers.
Suggested fix:
Add a comment like `# Broad catch: detection must not crash the cleanup flow`.
Co-authored-by: Sandi Fatic <chefsale@users.noreply.github.com>
🤖 AI Code Reviewer
Reviewed by 3 agents | Quality score: 100% | Review time: 316.3s
🟡 1 warning, 💡 5 suggestions, 📝 1 nitpick. See inline comments.
🤖 Generated by AI Code Reviewer | Review ID: review-79b0c48b
return DetectionResult(
    nodes=running_nodes,
    docker_failed=docker_check_failed,
🟡 TOCTOU race can cause unhandled FileNotFoundError
If a directory is deleted between is_dir() check at line 221 and iterdir()/stat() calls here, an unhandled FileNotFoundError will be raised, causing the entire stale cleanup to fail rather than gracefully skipping the deleted directory.
Suggested fix:
Wrap the `iterdir()` and `stat()` calls in a try/except block that catches FileNotFoundError and returns False.
binary_nodes_stopped = 0
# Stop running Docker containers (if manager is DockerManager)
docker_nodes_stopped = 0
if manager and hasattr(manager, "client"):
💡 Exception handling may miss non-API Docker errors
Catching only docker.errors.NotFound and docker.errors.APIError will let network timeouts or connection errors propagate and crash the cleanup; the original code used a broad except Exception which was safer for a best-effort cleanup operation.
Suggested fix:
Consider adding a fallback `except Exception as e` with a warning to handle unexpected Docker errors gracefully.
)


def execute_nuke(
💡 precomputed_data_dirs bypasses validation when called programmatically
When execute_nuke is called directly with precomputed_data_dirs, it bypasses _is_valid_calimero_data_dir validation; while the CLI flow is safe, callers of this internal API must ensure paths are pre-validated.
Suggested fix:
Consider adding a docstring note that `precomputed_data_dirs` must be pre-validated, or optionally re-validate paths when this parameter is used.
    pass

# Check binary processes via PID files
# Note: If PID directory doesn't exist, we treat it as "no binary nodes"
💡 Symlink validation not re-checked before deletion (TOCTOU)
The symlink check in _is_valid_calimero_data_dir is not re-verified immediately before shutil.rmtree; on platforms without fd-based functions, rmtree may be susceptible to symlink attacks in the window between validation and deletion.
Suggested fix:
Consider re-checking `Path(data_dir).is_symlink()` in `nuke_all_data_dirs` immediately before calling `shutil.rmtree`, or use `shutil.rmtree` with `dir_fd` parameter on supported platforms.
if not any(dir_path.iterdir()):
    dir_age_seconds = time.time() - dir_path.stat().st_mtime
    return dir_age_seconds >= EMPTY_DATA_DIR_MIN_AGE_SECONDS
📝 Nit: Docstring suggests --dry-run mitigates TOCTOU but it doesn't
The docstring advises using --dry-run to preview deletions for TOCTOU safety, but dry-run doesn't mitigate the race; the actual mitigation is _filter_still_stale_dirs re-check which should be mentioned instead.
Suggested fix:
Update docstring to explain that the code re-checks running nodes immediately before deletion to reduce TOCTOU risk
    return False

# Check for expected Calimero node subdirectories or files
# A valid Calimero data directory typically contains a subdirectory
💡 Redundant node detection when precomputed_data_dirs is used
The CLI flow calls find_stale_data_dirs() which detects running nodes, then passes precomputed_data_dirs to execute_nuke() which calls _filter_still_stale_dirs() detecting nodes again; this is intentional for safety but results in two full Docker/binary detection cycles.
Suggested fix:
Consider caching the running nodes set between detection calls if the time window is small, or document why double-detection is acceptable.
@@ -118,125 +396,215 @@ def nuke_all_data_dirs(data_dirs: list, dry_run: bool = False) -> dict:
    return results
💡 Function has too many parameters (7)
The execute_nuke function accepts 7 parameters which makes it harder to maintain and test; consider grouping related flags into a configuration object.
Suggested fix:
Create a `NukeOptions` dataclass containing `verbose`, `silent`, `stale_only`, `force` and pass that instead
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    warnings.append(f"Failed to check binary process {node_name}: {e}")
except Exception as e:
    warnings.append(f"Binary process check failed: {e}")
    binary_check_failed = True
Fail-safe bypassed in Docker-only environments
Medium Severity
The fail_safe mechanism in get_running_node_names only raises NodeDetectionError when complete_failure is true (both docker_failed AND binary_failed). However, when the PID directory doesn't exist, binary detection is skipped and binary_failed stays False. In Docker-only environments (where the PID directory is never created), complete_failure can never be True — even if Docker detection fails. This means fail_safe=True provides no protection: Docker fails, binary is "skipped-not-failed," all directories appear stale, and running container data gets deleted with only a partial-failure warning.


Adds an option to the nuke command to clean up stale data directories that do not correspond to any running node.

Note
Medium Risk
Touches destructive cleanup logic and adds new heuristics/fail-safe checks around what gets deleted; while guarded by detection/validation, edge cases in node detection or directory validation could still lead to unexpected retention or deletion.

Overview
Adds a --stale mode to merobox nuke (and a stale_only path in execute_nuke) to delete only orphaned data/* node directories that don't match any currently running node.

Introduces safer running-node detection across Docker containers and binary PID-tracked processes, including a fail-safe that aborts stale cleanup if both detection methods fail (override via --force), plus extra validation to avoid deleting non-Calimero or very recently-created empty directories, and a final re-check to avoid TOCTOU races before deletion.

Refactors nuke internals into _stop_running_services and _cleanup_auth_services, and adds comprehensive unit tests covering stale discovery, detection failures/overrides, and deletion behavior.

Written by Cursor Bugbot for commit f3c9045. This will update automatically on new commits.