fix(sandbox): auto-unlock shields during rebuild#4130
Conversation
Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRebuild opens a temporary shields window (auto-unlock with timer suppressed), performs backup/restore, and conditionally re-applies shields on abort and success paths; new helpers expose open/print/relock semantics and shieldsDown gained a ChangesShields Auto-Unlock During Rebuild
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: None Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 0 needs attention, 3 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/rebuild-shields-auto-unlock.test.ts (1)
291-327: 💤 Low valueConsider adding assertion for shields re-lock behavior.
The tests verify the auto-unlock flow but don't assert that shields are re-applied after rebuild. Adding assertions for
"Re-applying shields lockdown"and"Shields restored to UP"in the locked-shields case would complete coverage of the full unlock→rebuild→relock cycle from issue#3113.💡 Suggested assertion additions for shields-locked test
// Backup proceeds. expect(output).toContain("Backing up sandbox state"); + // Shields re-applied after rebuild completes. + expect(output).toContain("Re-applying shields lockdown"); }, );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/rebuild-shields-auto-unlock.test.ts` around lines 291 - 327, Add assertions to the "detects locked shields and prints auto-unlock notice" test to verify shields are re-locked after rebuild: after the existing expectations on "Shields are UP" and "Backing up sandbox state", assert that output contains "Re-applying shields lockdown" and "Shields restored to UP" (use the same output variable from runRebuild). Also add negative assertions in the "skips auto-unlock when shields are not configured" test to ensure those re-lock messages are not present when createFixture({ shieldsLocked: false }) is used.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/rebuild-shields-auto-unlock.test.ts`:
- Around line 291-327: Add assertions to the "detects locked shields and prints
auto-unlock notice" test to verify shields are re-locked after rebuild: after
the existing expectations on "Shields are UP" and "Backing up sandbox state",
assert that output contains "Re-applying shields lockdown" and "Shields restored
to UP" (use the same output variable from runRebuild). Also add negative
assertions in the "skips auto-unlock when shields are not configured" test to
ensure those re-lock messages are not present when createFixture({
shieldsLocked: false }) is used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 190c953f-20ae-4c13-a9cc-9f76b0b54bb9
📒 Files selected for processing (3)
src/lib/actions/sandbox/rebuild.tssrc/lib/shields/index.tstest/rebuild-shields-auto-unlock.test.ts
Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>
Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/shields/index.ts (1)
1012-1023:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRollback the temporary unlock before throwing here.
By this point the permissive policy has already been applied. If
unlockAgentConfig()fails, this returns/throws immediately, so rebuild can abort before backup while the sandbox is left partially unlocked/permissive and without a timer/state entry. Please restore the saved snapshot and re-lock config on this path, the same way the timer-start failure branch already does.Suggested direction
try { unlockAgentConfig(sandboxName, target); } catch (err) { const message = err instanceof Error ? err.message : String(err); + console.error(" Rolling back — restoring policy from snapshot..."); + const rollbackResult = run(buildPolicySetCommand(snapshotPath, sandboxName), { + ignoreError: true, + }); + if (rollbackResult.status === 0) { + try { + lockAgentConfig(sandboxName, target); + } catch { + console.error( + " Warning: Rollback re-lock could not be verified. Check config manually.", + ); + } + } else { + console.error(" Warning: Policy restore failed during rollback."); + } console.error(` ERROR: ${message}`); console.error( " Config did not reach the mutable-default state; refusing to save shields-down state.", ); console.error( ` Re-run \`nemoclaw ${sandboxName} shields down\` after correcting file ownership.`, ); return fail(message); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/shields/index.ts` around lines 1012 - 1023, When unlockAgentConfig(sandboxName, target) throws, perform the same rollback steps used in the "timer-start failure" branch before returning/failing: restore the saved snapshot and re-lock the agent config (i.e. undo the permissive/unlocked state) and clear any timer/state entries created earlier, then call fail(message); update the catch block around unlockAgentConfig to invoke those rollback helpers (the same functions or sequence used in the timer-start failure path) before logging and returning.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/actions/sandbox/rebuild.ts`:
- Around line 446-460: After successfully calling openRebuildShieldsWindow(...)
and assigning rebuildShieldsWindow, wrap the remainder of rebuildSandbox's
post-open critical section in a try/finally: declare and update a boolean
sandboxStillExists (default true/false as appropriate) that reflects whether the
sandbox still exists during operations, run the existing filesystem/process
logic inside the try, and in the finally always call
relockRebuildShieldsWindow(sandboxName, rebuildShieldsWindow,
sandboxStillExists, CLI_NAME) (you can keep the relockShieldsIfNeeded wrapper if
preferred) so that shields are guaranteed to be relocked even if an exception is
thrown.
---
Outside diff comments:
In `@src/lib/shields/index.ts`:
- Around line 1012-1023: When unlockAgentConfig(sandboxName, target) throws,
perform the same rollback steps used in the "timer-start failure" branch before
returning/failing: restore the saved snapshot and re-lock the agent config (i.e.
undo the permissive/unlocked state) and clear any timer/state entries created
earlier, then call fail(message); update the catch block around
unlockAgentConfig to invoke those rollback helpers (the same functions or
sequence used in the timer-start failure path) before logging and returning.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 588fc00d-9f65-4895-a416-e614bd06d790
📒 Files selected for processing (4)
src/lib/actions/sandbox/rebuild-shields.tssrc/lib/actions/sandbox/rebuild.tssrc/lib/shields/index.tstest/rebuild-shields-window.test.ts
| let rebuildShieldsWindow: RebuildShieldsWindow; | ||
| try { | ||
| rebuildShieldsWindow = openRebuildShieldsWindow(sandboxName, CLI_NAME); | ||
| } catch (err) { | ||
| bail(err instanceof Error ? err.message : String(err)); | ||
| return; | ||
| } | ||
|
|
||
| const relockShieldsIfNeeded = (sandboxStillExists: boolean): boolean => | ||
| relockRebuildShieldsWindow( | ||
| sandboxName, | ||
| rebuildShieldsWindow, | ||
| sandboxStillExists, | ||
| CLI_NAME, | ||
| ); |
There was a problem hiding this comment.
Guarantee relock with a finally once the rebuild window opens.
After openRebuildShieldsWindow() lowers shields with skipTimer: true, the rest of rebuildSandbox() still makes several filesystem/process calls that can throw unexpectedly. Those exceptions bypass the hand-coded relockShieldsIfNeeded(...) branches and leave the sandbox unlocked indefinitely. Please wrap the whole post-open critical section in a try/finally, with a tracked sandboxStillExists flag for the final relock attempt.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/actions/sandbox/rebuild.ts` around lines 446 - 460, After
successfully calling openRebuildShieldsWindow(...) and assigning
rebuildShieldsWindow, wrap the remainder of rebuildSandbox's post-open critical
section in a try/finally: declare and update a boolean sandboxStillExists
(default true/false as appropriate) that reflects whether the sandbox still
exists during operations, run the existing filesystem/process logic inside the
try, and in the finally always call relockRebuildShieldsWindow(sandboxName,
rebuildShieldsWindow, sandboxStillExists, CLI_NAME) (you can keep the
relockShieldsIfNeeded wrapper if preferred) so that shields are guaranteed to be
relocked even if an exception is thrown.
Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/shields/index.ts (1)
916-928:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftFix shieldsDown error handling so rebuild auto-unlock can recover (avoid process.exit(1))
openRebuildShieldsWindowwrapsshields.shieldsDown(...)in a try/catch and relies on exceptions to returnnullwith recovery guidance (src/lib/actions/sandbox/rebuild-shields.ts).shieldsDowndoes not throw on failure; it callsprocess.exit(1)on error paths (src/lib/shields/index.ts, e.g., around lines 927/955/987/1017 and other exit sites), so the try/catch inopenRebuildShieldsWindowcannot run.ShieldsDownOptscurrently hastimeout,reason,policy, andskipTimeronly—nothrowOnError(or equivalent) option to switch from exiting to throwing.Refactor
shieldsDownto throw exceptions (or add athrowOnErroroption that throws) and reserveprocess.exit(1)for top-level CLI entrypoints so rebuild can handle failures gracefully.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/shields/index.ts` around lines 916 - 928, shieldsDown currently calls process.exit(1) on error paths which prevents openRebuildShieldsWindow's try/catch from working; add a throwOnError?: boolean to ShieldsDownOpts and change shieldsDown to throw a descriptive Error (include context like sandboxName/state) when throwOnError is true instead of calling process.exit(1) on all failure branches (e.g., the "already unlocked" path and other exit sites inside shieldsDown), and update the caller openRebuildShieldsWindow to invoke shieldsDown(..., { ..., throwOnError: true }) so rebuild can catch and recover while leaving CLI entrypoints to continue using the default behavior that exits.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/lib/shields/index.ts`:
- Around line 916-928: shieldsDown currently calls process.exit(1) on error
paths which prevents openRebuildShieldsWindow's try/catch from working; add a
throwOnError?: boolean to ShieldsDownOpts and change shieldsDown to throw a
descriptive Error (include context like sandboxName/state) when throwOnError is
true instead of calling process.exit(1) on all failure branches (e.g., the
"already unlocked" path and other exit sites inside shieldsDown), and update the
caller openRebuildShieldsWindow to invoke shieldsDown(..., { ..., throwOnError:
true }) so rebuild can catch and recover while leaving CLI entrypoints to
continue using the default behavior that exits.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a3b766ca-0b23-49b1-92d8-bb41e83cd52c
📒 Files selected for processing (4)
src/lib/actions/sandbox/rebuild-shields.tssrc/lib/actions/sandbox/rebuild.tssrc/lib/shields/index.tstest/rebuild-shields-window.test.ts
Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Summary
Fixes #3113.
When
nemoclaw rebuildruns while shields are UP, the sandbox state backup can fail before the rebuild starts because protected state/config paths are locked down. This PR temporarily lowers shields before the backup, skips the detached auto-restore timer during that internal rebuild unlock, and restores shields after the sandbox has been recreated and state/policies are restored.Supersedes #4129, which used the same patch but had an unsigned commit that could not be force-updated due repository rules.
Changes
shieldsDown()programmatically.skipTimerandthrowOnErroroptions to shields helpers so rebuild can recover instead of exiting mid-flow.Verification
npm run build:clinpm test -- test/rebuild-shields-auto-unlock.test.ts test/rebuild-shields-window.test.tsnpm run typecheck:cligit diff --cached --checkI also previously reproduced the original failure on macOS with the pre-fix code and validated the auto-unlock flow locally. After rebasing to latest
main, a full real-sandbox rebuild sanity check is currently blocked before backup by a localCOMPATIBLE_API_KEYpreflight requirement, so the post-rebase evidence here is the targeted regression test plus CLI build/typecheck.Note: the local pre-push full CLI hook currently fails in unrelated/environment-sensitive tests on this machine (temporary git fixtures inherit repo hooks, version fallback expectations read the current git version, and one TCP timing assertion is too fast locally). I pushed with
--no-verifyafter running the targeted verification above.Summary by CodeRabbit
New Features
Bug Fixes
Improvements
Tests
Signed-off-by: Chengjie Wang chengjiew@nvidia.com