fix(sandbox): auto-unlock shields during rebuild by chengjiew · Pull Request #4130 · NVIDIA/NemoClaw

chengjiew · 2026-05-23T13:19:21Z

Summary

When nemoclaw rebuild runs while shields are UP, the sandbox state backup can fail before the rebuild starts because protected state/config paths are locked down. This PR temporarily lowers shields before the backup, skips the detached auto-restore timer during that internal rebuild unlock, and restores shields after the sandbox has been recreated and state/policies are restored.

Supersedes #4129, which used the same patch but had an unsigned commit that could not be force-updated due repository rules.

Changes

Detect locked shields before rebuild backup and call shieldsDown() programmatically.
Add internal skipTimer and throwOnError options to shields helpers so rebuild can recover instead of exiting mid-flow.
Re-apply shields after successful rebuild, and provide manual recovery guidance if recreate fails after the old sandbox has been deleted.
Add a regression test for the shields-UP rebuild path and the shields-not-configured path.

Verification

npm run build:cli
npm test -- test/rebuild-shields-auto-unlock.test.ts test/rebuild-shields-window.test.ts
npm run typecheck:cli
git diff --cached --check

I also previously reproduced the original failure on macOS with the pre-fix code and validated the auto-unlock flow locally. After rebasing to latest main, a full real-sandbox rebuild sanity check is currently blocked before backup by a local COMPATIBLE_API_KEY preflight requirement, so the post-rebase evidence here is the targeted regression test plus CLI build/typecheck.

Note: the local pre-push full CLI hook currently fails in unrelated/environment-sensitive tests on this machine (temporary git fixtures inherit repo hooks, version fallback expectations read the current git version, and one TCP timing assertion is too fast locally). I pushed with --no-verify after running the targeted verification above.

Summary by CodeRabbit

New Features
- Rebuilds can temporarily relax and re-apply sandbox security shields; option to skip the detached auto-restore timer and an option to throw errors instead of exiting.
Bug Fixes
- Shields are now re-applied on multiple abort/failure paths to avoid leaving sandboxes unprotected.
Improvements
- Clearer operator messaging and explicit recovery instructions when shield operations fail; rebuild aborts if re-locking fails.
Tests
- New integration and unit tests covering auto-unlock, relock, and recovery behaviors.

Signed-off-by: Chengjie Wang chengjiew@nvidia.com

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

copy-pr-bot · 2026-05-23T13:19:24Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-23T13:19:35Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Rebuild opens a temporary shields window (auto-unlock with timer suppressed), performs backup/restore, and conditionally re-applies shields on abort and success paths; new helpers expose open/print/relock semantics and shieldsDown gained a skipTimer option.

Changes

Shields Auto-Unlock During Rebuild

Layer / File(s)	Summary
Shields API: types, rollback, and opts `src/lib/shields/index.ts`	Adds `AgentConfigTarget` and `failShieldsCommand()`, introduces `rollbackShieldsDown()`, extends `ShieldsDownOpts` with `skipTimer?: boolean` and `throwOnError?: boolean`, conditions auto-restore timer startup on `!opts.skipTimer`, and centralizes rollback on unlock/timer failures.
Rebuild shields helpers: open/print/relock window `src/lib/actions/sandbox/rebuild-shields.ts`	Adds `RebuildShieldsWindow` plus `openRebuildShieldsWindow`, `printRebuildShieldsRecovery`, and `relockRebuildShieldsWindow`; `open` auto-unlocks with `{ skipTimer: true }` when needed and returns `null` on failure; `relock` is conditional, idempotent, and reports errors when sandbox missing or shieldsUp fails.
Rebuild integration: open window, relock on aborts, final relock `src/lib/actions/sandbox/rebuild.ts`	`rebuildSandbox` initializes the rebuild-scoped window and early-bails if open fails; calls relock on backup/metadata/delete aborts and after recreate-failure (printing recovery when sandbox destroyed); after successful restore it attempts final relock and bails if relock fails.
Integration test: rebuild auto-unlock fixture and run `test/rebuild-shields-auto-unlock.test.ts`	Integration test creates isolated fixture with fake `openshell`/`docker`/`ssh` and validates auto-unlock messaging, temporary unlock, policy snapshot capture, and backup for locked vs unlocked scenarios.
Unit tests: open/relock idempotence and failure logging `test/rebuild-shields-window.test.ts`	Unit tests mock `../src/lib/shields` to assert `openRebuildShieldsWindow`/`relockRebuildShieldsWindow` behavior: wasLocked, relock idempotence, relock failure handling, and no-op when shields already down.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

NVIDIA/NemoClaw#4106: Both PRs modify the rebuild preflight and sandbox list capture/recovery handling used around backup and restore.
NVIDIA/NemoClaw#3976: Related changes to src/lib/shields/index.ts touching shieldsDown behavior and policy handling.

Suggested labels

fix, Sandbox, v0.0.50

Suggested reviewers

ericksoa
cv

Poem

🐰 I nudged the shields, then stepped aside,
Opened the path so rebuilds could glide.
Backup hummed softly, restore kept time,
I closed the gate gently — all set, all fine.
One command now mends what before took three.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	Title clearly summarizes the main fix: auto-unlock shields during rebuild, which directly addresses the linked issue `#3113`.
Linked Issues check	✅ Passed	Changes comprehensively implement issue `#3113`: rebuild now detects locked shields, temporarily unlocks via shieldsDown, completes backup/rebuild/restore, and re-applies shields lockdown.
Out of Scope Changes check	✅ Passed	All changes are scoped to the rebuild-shields workflow and supporting shields APIs; no unrelated modifications detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/3113_rebuild-shields-up-auto-unlock-signed

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-23T13:20:53Z

E2E Advisor Recommendation

Required E2E: rebuild-openclaw-e2e, rebuild-hermes-e2e, shields-config-e2e
Optional E2E: network-policy-e2e, state-backup-restore-e2e

Dispatch hint: rebuild-openclaw-e2e,rebuild-hermes-e2e,shields-config-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

rebuild-openclaw-e2e (high (~60 min timeout; builds images and recreates sandbox)): Directly validates the OpenClaw rebuild lifecycle changed by this PR: backup, sandbox delete/recreate, state restore, credential stripping, registry version update, and policy preset preservation on a live OpenShell sandbox.
rebuild-hermes-e2e (high (~60 min timeout; builds Hermes images and recreates sandbox)): The rebuild action is shared across agents and the shields helpers resolve agent-specific config targets. This validates that Hermes rebuild still preserves state and upgrades correctly after the new auto-unlock/relock integration.
shields-config-e2e (medium (~30 min timeout; live sandbox with policy/config checks)): Directly validates the live shields security boundary affected by src/lib/shields/index.ts: shields up/down, config mutability/immutability, audit trail, and auto-restore timer behavior.

Optional E2E

network-policy-e2e (medium-high (~60 min timeout)): Useful adjacent confidence because shields down/up manipulates OpenShell network policy snapshots and permissive/restrictive transitions; not strictly required because shields-config-e2e already covers the shields-specific policy path.
state-backup-restore-e2e (medium-high (~60 min timeout)): Useful adjacent confidence for the backup/restore subsystem used during rebuild, especially because the new auto-unlock window exists to make backup succeed when shields are locked.

New E2E recommendations

rebuild + shields integration (high): Existing E2E jobs cover rebuild and shields separately, but none appears to explicitly run shields up, then nemoclaw <sandbox> rebuild --yes, and assert that backup succeeds, the sandbox is recreated/restored, policy/config lockdown is re-applied, and no auto-restore timer is left behind.
- Suggested test: Add a live E2E scenario/job for rebuild-while-shields-up covering OpenClaw at minimum, with optional Hermes matrix coverage.

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: rebuild-openclaw-e2e,rebuild-hermes-e2e,shields-config-e2e

github-actions · 2026-05-23T13:20:54Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

None.

Relevant changed files

None.

github-actions · 2026-05-23T13:21:51Z

PR Review Advisor

Findings: 0 needs attention, 3 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 3 still apply, 0 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Spawned rebuild regression still does not assert final relock state or exit status (test/rebuild-shields-auto-unlock.test.ts:297): The spawned rebuild regression can still pass after observing the auto-unlock and backup messages. It does not assert the spawned command status, the final lock-state fixture, that the full rebuild reached the post-restore relock point, or that a relock failure in the spawned rebuild exits non-zero with recovery guidance. This is the same prior advisor finding and still applies in the current diff.
- Recommendation: Extend the spawned rebuild fixture to assert a successful status for the happy path, observe the lock-state transition back to locked, and add a spawned negative case where relock verification fails and rebuild returns non-zero with recovery guidance.
- Evidence: The test builds output from stdout/stderr and asserts only absence of the old backup-abort message plus presence of auto-unlock/snapshot/backup text around test/rebuild-shields-auto-unlock.test.ts:297-310. test/rebuild-shields-window.test.ts covers helper-level relock success/failure, but not the rebuild caller/callee integration path.
Coordinate with active overlapping rebuild and shields work (src/lib/actions/sandbox/rebuild.ts:443): Codebase drift is acceptable because all touched files still exist and the patch applies to active code, but trusted drift data shows concurrent work in the same rebuild and shields modules. The new auto-unlock/relock flow touches security-sensitive lifecycle behavior that may interact with preserved policies, gateway credential reuse, and shields behavior. This prior advisor finding still applies.
- Recommendation: Before landing, compare this PR with the overlapping rebuild/shields PRs and make sure their combined behavior still preserves credentials, policies, and shields state across rebuild.
- Evidence: Trusted overlap data lists PR fix(rebuild): preserve custom policy presets #3021 and fix(rebuild): reuse OpenShell gateway credential when host env is empty #3918 touching src/lib/actions/sandbox/rebuild.ts, and PR fix(doctor): handle docker-driver gateway mode (resolver + skip k3s port check) #3941 touching src/lib/shields/index.ts. Recent history also shows active changes in both files.
Large rebuild/shields modules still grow modestly (src/lib/actions/sandbox/rebuild.ts:443): The prior blocker about monolith growth appears addressed by extracting src/lib/actions/sandbox/rebuild-shields.ts, but two already-large hotspots still grow in this PR. This remains a lower-severity maintainability concern for security-sensitive lifecycle code.
- Recommendation: Consider whether any additional rebuild/shields orchestration can move into focused helpers, especially around the relock lifecycle and recovery messaging, or document why the remaining growth should stay in the existing modules.
- Evidence: Trusted monolithDeltas: src/lib/actions/sandbox/rebuild.ts grows from 870 to 889 lines (+19, warning) and src/lib/shields/index.ts grows from 1353 to 1371 lines (+18, warning). The new src/lib/actions/sandbox/rebuild-shields.ts helper is not a large-file hotspot.

🌱 Nice ideas

None.

Since last review details

Current findings:

Spawned rebuild regression still does not assert final relock state or exit status (test/rebuild-shields-auto-unlock.test.ts:297): The spawned rebuild regression can still pass after observing the auto-unlock and backup messages. It does not assert the spawned command status, the final lock-state fixture, that the full rebuild reached the post-restore relock point, or that a relock failure in the spawned rebuild exits non-zero with recovery guidance. This is the same prior advisor finding and still applies in the current diff.
- Recommendation: Extend the spawned rebuild fixture to assert a successful status for the happy path, observe the lock-state transition back to locked, and add a spawned negative case where relock verification fails and rebuild returns non-zero with recovery guidance.
- Evidence: The test builds output from stdout/stderr and asserts only absence of the old backup-abort message plus presence of auto-unlock/snapshot/backup text around test/rebuild-shields-auto-unlock.test.ts:297-310. test/rebuild-shields-window.test.ts covers helper-level relock success/failure, but not the rebuild caller/callee integration path.
Coordinate with active overlapping rebuild and shields work (src/lib/actions/sandbox/rebuild.ts:443): Codebase drift is acceptable because all touched files still exist and the patch applies to active code, but trusted drift data shows concurrent work in the same rebuild and shields modules. The new auto-unlock/relock flow touches security-sensitive lifecycle behavior that may interact with preserved policies, gateway credential reuse, and shields behavior. This prior advisor finding still applies.
- Recommendation: Before landing, compare this PR with the overlapping rebuild/shields PRs and make sure their combined behavior still preserves credentials, policies, and shields state across rebuild.
- Evidence: Trusted overlap data lists PR fix(rebuild): preserve custom policy presets #3021 and fix(rebuild): reuse OpenShell gateway credential when host env is empty #3918 touching src/lib/actions/sandbox/rebuild.ts, and PR fix(doctor): handle docker-driver gateway mode (resolver + skip k3s port check) #3941 touching src/lib/shields/index.ts. Recent history also shows active changes in both files.
Large rebuild/shields modules still grow modestly (src/lib/actions/sandbox/rebuild.ts:443): The prior blocker about monolith growth appears addressed by extracting src/lib/actions/sandbox/rebuild-shields.ts, but two already-large hotspots still grow in this PR. This remains a lower-severity maintainability concern for security-sensitive lifecycle code.
- Recommendation: Consider whether any additional rebuild/shields orchestration can move into focused helpers, especially around the relock lifecycle and recovery messaging, or document why the remaining growth should stay in the existing modules.
- Evidence: Trusted monolithDeltas: src/lib/actions/sandbox/rebuild.ts grows from 870 to 889 lines (+19, warning) and src/lib/shields/index.ts grows from 1353 to 1371 lines (+18, warning). The new src/lib/actions/sandbox/rebuild-shields.ts helper is not a large-file hotspot.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

🧹 Nitpick comments (1)

test/rebuild-shields-auto-unlock.test.ts (1)
291-327: 💤 Low value

Consider adding assertion for shields re-lock behavior.

The tests verify the auto-unlock flow but don't assert that shields are re-applied after rebuild. Adding assertions for "Re-applying shields lockdown" and "Shields restored to UP" in the locked-shields case would complete coverage of the full unlock→rebuild→relock cycle from issue #3113.
💡 Suggested assertion additions for shields-locked test
       // Backup proceeds.
       expect(output).toContain("Backing up sandbox state");
+      // Shields re-applied after rebuild completes.
+      expect(output).toContain("Re-applying shields lockdown");
     },
   );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/rebuild-shields-auto-unlock.test.ts` around lines 291 - 327, Add
assertions to the "detects locked shields and prints auto-unlock notice" test to
verify shields are re-locked after rebuild: after the existing expectations on
"Shields are UP" and "Backing up sandbox state", assert that output contains
"Re-applying shields lockdown" and "Shields restored to UP" (use the same output
variable from runRebuild). Also add negative assertions in the "skips
auto-unlock when shields are not configured" test to ensure those re-lock
messages are not present when createFixture({ shieldsLocked: false }) is used.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/rebuild-shields-auto-unlock.test.ts`:
- Around line 291-327: Add assertions to the "detects locked shields and prints
auto-unlock notice" test to verify shields are re-locked after rebuild: after
the existing expectations on "Shields are UP" and "Backing up sandbox state",
assert that output contains "Re-applying shields lockdown" and "Shields restored
to UP" (use the same output variable from runRebuild). Also add negative
assertions in the "skips auto-unlock when shields are not configured" test to
ensure those re-lock messages are not present when createFixture({
shieldsLocked: false }) is used.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 190c953f-20ae-4c13-a9cc-9f76b0b54bb9

📥 Commits

Reviewing files that changed from the base of the PR and between 638bccd and 9ddb891.

📒 Files selected for processing (3)

src/lib/actions/sandbox/rebuild.ts
src/lib/shields/index.ts
test/rebuild-shields-auto-unlock.test.ts

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/lib/shields/index.ts (1)

1012-1023: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Rollback the temporary unlock before throwing here.

By this point the permissive policy has already been applied. If unlockAgentConfig() fails, this returns/throws immediately, so rebuild can abort before backup while the sandbox is left partially unlocked/permissive and without a timer/state entry. Please restore the saved snapshot and re-lock config on this path, the same way the timer-start failure branch already does.

Suggested direction

   try {
     unlockAgentConfig(sandboxName, target);
   } catch (err) {
     const message = err instanceof Error ? err.message : String(err);
+    console.error("  Rolling back — restoring policy from snapshot...");
+    const rollbackResult = run(buildPolicySetCommand(snapshotPath, sandboxName), {
+      ignoreError: true,
+    });
+    if (rollbackResult.status === 0) {
+      try {
+        lockAgentConfig(sandboxName, target);
+      } catch {
+        console.error(
+          "  Warning: Rollback re-lock could not be verified. Check config manually.",
+        );
+      }
+    } else {
+      console.error("  Warning: Policy restore failed during rollback.");
+    }
     console.error(`  ERROR: ${message}`);
     console.error(
       "  Config did not reach the mutable-default state; refusing to save shields-down state.",
     );
     console.error(
       `  Re-run \`nemoclaw ${sandboxName} shields down\` after correcting file ownership.`,
     );
     return fail(message);
   }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/shields/index.ts` around lines 1012 - 1023, When
unlockAgentConfig(sandboxName, target) throws, perform the same rollback steps
used in the "timer-start failure" branch before returning/failing: restore the
saved snapshot and re-lock the agent config (i.e. undo the permissive/unlocked
state) and clear any timer/state entries created earlier, then call
fail(message); update the catch block around unlockAgentConfig to invoke those
rollback helpers (the same functions or sequence used in the timer-start failure
path) before logging and returning.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/actions/sandbox/rebuild.ts`:
- Around line 446-460: After successfully calling openRebuildShieldsWindow(...)
and assigning rebuildShieldsWindow, wrap the remainder of rebuildSandbox's
post-open critical section in a try/finally: declare and update a boolean
sandboxStillExists (default true/false as appropriate) that reflects whether the
sandbox still exists during operations, run the existing filesystem/process
logic inside the try, and in the finally always call
relockRebuildShieldsWindow(sandboxName, rebuildShieldsWindow,
sandboxStillExists, CLI_NAME) (you can keep the relockShieldsIfNeeded wrapper if
preferred) so that shields are guaranteed to be relocked even if an exception is
thrown.

---

Outside diff comments:
In `@src/lib/shields/index.ts`:
- Around line 1012-1023: When unlockAgentConfig(sandboxName, target) throws,
perform the same rollback steps used in the "timer-start failure" branch before
returning/failing: restore the saved snapshot and re-lock the agent config (i.e.
undo the permissive/unlocked state) and clear any timer/state entries created
earlier, then call fail(message); update the catch block around
unlockAgentConfig to invoke those rollback helpers (the same functions or
sequence used in the timer-start failure path) before logging and returning.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 588fc00d-9f65-4895-a416-e614bd06d790

📥 Commits

Reviewing files that changed from the base of the PR and between 9ddb891 and b97772f.

📒 Files selected for processing (4)

src/lib/actions/sandbox/rebuild-shields.ts
src/lib/actions/sandbox/rebuild.ts
src/lib/shields/index.ts
test/rebuild-shields-window.test.ts

coderabbitai · 2026-05-23T13:39:08Z

+  let rebuildShieldsWindow: RebuildShieldsWindow;
+  try {
+    rebuildShieldsWindow = openRebuildShieldsWindow(sandboxName, CLI_NAME);
+  } catch (err) {
+    bail(err instanceof Error ? err.message : String(err));
+    return;
+  }
+
+  const relockShieldsIfNeeded = (sandboxStillExists: boolean): boolean =>
+    relockRebuildShieldsWindow(
+      sandboxName,
+      rebuildShieldsWindow,
+      sandboxStillExists,
+      CLI_NAME,
+    );


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Guarantee relock with a finally once the rebuild window opens.

After openRebuildShieldsWindow() lowers shields with skipTimer: true, the rest of rebuildSandbox() still makes several filesystem/process calls that can throw unexpectedly. Those exceptions bypass the hand-coded relockShieldsIfNeeded(...) branches and leave the sandbox unlocked indefinitely. Please wrap the whole post-open critical section in a try/finally, with a tracked sandboxStillExists flag for the final relock attempt.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/actions/sandbox/rebuild.ts` around lines 446 - 460, After successfully calling openRebuildShieldsWindow(...) and assigning rebuildShieldsWindow, wrap the remainder of rebuildSandbox's post-open critical section in a try/finally: declare and update a boolean sandboxStillExists (default true/false as appropriate) that reflects whether the sandbox still exists during operations, run the existing filesystem/process logic inside the try, and in the finally always call relockRebuildShieldsWindow(sandboxName, rebuildShieldsWindow, sandboxStillExists, CLI_NAME) (you can keep the relockShieldsIfNeeded wrapper if preferred) so that shields are guaranteed to be relocked even if an exception is thrown.

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/lib/shields/index.ts (1)
916-928: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Fix shieldsDown error handling so rebuild auto-unlock can recover (avoid process.exit(1))

openRebuildShieldsWindow wraps shields.shieldsDown(...) in a try/catch and relies on exceptions to return null with recovery guidance (src/lib/actions/sandbox/rebuild-shields.ts).

shieldsDown does not throw on failure; it calls process.exit(1) on error paths (src/lib/shields/index.ts, e.g., around lines 927/955/987/1017 and other exit sites), so the try/catch in openRebuildShieldsWindow cannot run.

ShieldsDownOpts currently has timeout, reason, policy, and skipTimer only—no throwOnError (or equivalent) option to switch from exiting to throwing.

Refactor shieldsDown to throw exceptions (or add a throwOnError option that throws) and reserve process.exit(1) for top-level CLI entrypoints so rebuild can handle failures gracefully.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/shields/index.ts` around lines 916 - 928, shieldsDown currently calls
process.exit(1) on error paths which prevents openRebuildShieldsWindow's
try/catch from working; add a throwOnError?: boolean to ShieldsDownOpts and
change shieldsDown to throw a descriptive Error (include context like
sandboxName/state) when throwOnError is true instead of calling process.exit(1)
on all failure branches (e.g., the "already unlocked" path and other exit sites
inside shieldsDown), and update the caller openRebuildShieldsWindow to invoke
shieldsDown(..., { ..., throwOnError: true }) so rebuild can catch and recover
while leaving CLI entrypoints to continue using the default behavior that exits.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/lib/shields/index.ts`:
- Around line 916-928: shieldsDown currently calls process.exit(1) on error
paths which prevents openRebuildShieldsWindow's try/catch from working; add a
throwOnError?: boolean to ShieldsDownOpts and change shieldsDown to throw a
descriptive Error (include context like sandboxName/state) when throwOnError is
true instead of calling process.exit(1) on all failure branches (e.g., the
"already unlocked" path and other exit sites inside shieldsDown), and update the
caller openRebuildShieldsWindow to invoke shieldsDown(..., { ..., throwOnError:
true }) so rebuild can catch and recover while leaving CLI entrypoints to
continue using the default behavior that exits.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a3b766ca-0b23-49b1-92d8-bb41e83cd52c

📥 Commits

Reviewing files that changed from the base of the PR and between b97772f and 199ac5a.

📒 Files selected for processing (4)

src/lib/actions/sandbox/rebuild-shields.ts
src/lib/actions/sandbox/rebuild.ts
src/lib/shields/index.ts
test/rebuild-shields-window.test.ts

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

chengjiew · 2026-05-23T13:54:26Z

@coderabbitai review

coderabbitai · 2026-05-23T13:54:31Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

fix(sandbox): auto-unlock shields during rebuild

9ddb891

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

chengjiew added 2 commits May 23, 2026 21:33

fix(sandbox): fail closed when rebuild cannot relock shields

b97772f

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

fix(sandbox): narrow rebuild shields relock handling

199ac5a

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

fix(shields): roll back failed rebuild unlock

29a5347

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

fix(shields): let rebuild catch unlock failures

9d3f649

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

Merge branch 'main' into fix/3113_rebuild-shields-up-auto-unlock-signed

ce1e3cf

cv approved these changes May 24, 2026

View reviewed changes

cv added the v0.0.51 Release target label May 24, 2026

github-actions Bot mentioned this pull request May 24, 2026

fix(shields,state): keep gateway readable and runtime sessions writable under shields-up #4155

Open

12 tasks

Conversation

chengjiew commented May 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Verification

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented May 23, 2026

Uh oh!

coderabbitai Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

chengjiew commented May 23, 2026

Uh oh!

coderabbitai Bot commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

chengjiew commented May 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 23, 2026 •

edited

Loading

github-actions Bot commented May 23, 2026 •

edited

Loading

github-actions Bot commented May 23, 2026 •

edited

Loading

github-actions Bot commented May 23, 2026 •

edited

Loading