
fix(snapshot): use registered imageTag for VM-driver auto-create#4058

Open
cr7258 wants to merge 2 commits into NVIDIA:main from cr7258:snapshot-restore


Conversation


@cr7258 cr7258 commented May 22, 2026

Summary

This PR fixes nemoclaw <src> snapshot restore --to <new> aborting on macOS Apple Silicon (VM driver) with "Cannot auto-create '<new>': could not resolve '<src>' pod image." The cause is that the fast path in resolveSrcPodImage() was still hard-coded to openshellDriver === "docker", a symmetric gap left by #3784, which only fixed probeGatewayRunning(). This PR routes both Docker and VM driver sandboxes through the same usesGatewayMetadataProbe() helper and gates the auto-create DNS-proxy step on the "kubernetes" driver to match onboard.ts.

Related Issue

Fixes #4071

Changes

  • resolveSrcPodImage() fast path now trusts the registered imageTag for both docker and vm drivers via usesGatewayMetadataProbe(), so VM-driver auto-create no longer falls into the kubectl-via-docker probe against a container that does not exist.
  • autoCreateSandboxFromSource() DNS-proxy step is now gated on openshellDriver === "kubernetes", matching the onboard logic (src/lib/onboard.ts). Previously, vm-driver sandboxes invoked setup-dns-proxy.sh even though they do not use the kubectl-based DNS proxy.
  • Added a regression test in test/snapshot-gateway-guard.test.ts covering snapshot restore --to <new> on a VM-driver sandbox: docker exec emits a sentinel that the test asserts never appears, and the registered imageTag is asserted to show up in the auto-create output.
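The fast-path change in the first bullet can be sketched as follows. This is a simplified illustration, not the code in src/lib/actions/sandbox/snapshot.ts: the SandboxEntry shape, the probeViaKubectl callback, and the body of usesGatewayMetadataProbe() are assumptions made for the example.

```typescript
// Driver values named in this PR.
type OpenshellDriver = "docker" | "vm" | "kubernetes";

// Hypothetical, simplified registry entry for the source sandbox.
interface SandboxEntry {
  openshellDriver: OpenshellDriver;
  imageTag?: string;
}

// Assumed behavior: docker and vm sandboxes are described by gateway metadata,
// so there is no legacy openshell-cluster-* container to exec into.
function usesGatewayMetadataProbe(driver: OpenshellDriver): boolean {
  return driver === "docker" || driver === "vm";
}

// Fast path as of the first commit. `probeViaKubectl` is a hypothetical stand-in
// for the legacy `docker exec openshell-cluster-* kubectl ...` lookup.
function resolveSrcPodImage(
  entry: SandboxEntry,
  probeViaKubectl: () => string | null,
): string | null {
  // Before the fix the fast path only accepted openshellDriver === "docker",
  // so vm-driver sandboxes skipped it and hit the failing probe.
  if (usesGatewayMetadataProbe(entry.openshellDriver) && entry.imageTag) {
    return entry.imageTag;
  }
  // Reached for kubernetes, and for docker/vm entries without an imageTag.
  return probeViaKubectl();
}
```

With this first-commit logic, a VM-driver entry that lacks an imageTag still falls through to the kubectl probe; that remaining case is what the later review comment and second commit address.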

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Test the PR locally on a macOS Apple Silicon device:

nemoclaw my-assistant snapshot create --name verify-fix
  Creating snapshot of 'my-assistant' (--name verify-fix)...
  ✓ Snapshot v3 name=verify-fix created (13 directories, 0 files)
    /Users/sevenc/.nemoclaw/rebuild-backups/my-assistant/2026-05-22T07-28-20-816Z

nemoclaw my-assistant snapshot restore verify-fix --to clone-1
  'clone-1' does not exist. Creating from 'my-assistant' image (openshell/sandbox-from:1779432198)...
  Creating sandbox in gateway...
  Waiting for sandbox to become ready...
  Sandbox reported Ready before create stream exited; continuing.
  ✓ Sandbox 'clone-1' created

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: sevenc <sevenc@nvidia.com>

Summary by CodeRabbit

  • Bug Fixes

    • Improved snapshot restore functionality to properly resolve images for different driver configurations
    • DNS proxy setup during sandbox auto-creation now runs only for the kubernetes driver
  • Tests

    • Added a regression test for snapshot restore --to <new> on a VM-driver sandbox, verifying that image resolution uses the registered imageTag instead of the docker exec probe


Restoring a snapshot from a VM-driver sandbox into a new destination
(`snapshot restore --to <new>`) aborted with:

    Cannot auto-create '<new>': could not resolve '<src>' pod image.

PR NVIDIA#3784 taught `probeGatewayRunning()` to treat `openshellDriver: "vm"`
like `"docker"`, but `resolveSrcPodImage()`'s fast path was still
hard-coded to `=== "docker"`. On macOS Apple Silicon (vm driver) the
fast path was skipped and we fell into the legacy
`docker exec openshell-cluster-* kubectl ...` probe — a container that
does not exist on VM-driver hosts — so the resolver always returned
null.

Route both Docker and VM driver sandboxes through the same
`usesGatewayMetadataProbe()` helper so the registered imageTag is
trusted. While here, gate the auto-create DNS-proxy step on the
`"kubernetes"` driver, matching `onboard.ts` — the VM driver does not
use the kubectl-based DNS proxy.

Signed-off-by: sevenc <sevenc@nvidia.com>
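The DNS-proxy gating described in this commit message reduces to a driver check before the script runs. A minimal sketch, assuming a hypothetical needsDnsProxySetup() predicate and a runDnsProxyScript callback that stands in for invoking setup-dns-proxy.sh; neither name comes from the PR.

```typescript
// Driver values named in this PR.
type OpenshellDriver = "docker" | "vm" | "kubernetes";

// Hypothetical predicate: per this PR, onboard.ts and auto-create both run the
// kubectl-based DNS proxy only for the kubernetes driver.
function needsDnsProxySetup(driver: OpenshellDriver): boolean {
  return driver === "kubernetes";
}

// Hypothetical auto-create step. Returns whether the DNS-proxy script ran.
function autoCreateDnsStep(
  srcDriver: OpenshellDriver,
  runDnsProxyScript: () => void,
): boolean {
  if (!needsDnsProxySetup(srcDriver)) {
    // docker and vm sandboxes skip the kubectl-based DNS proxy entirely.
    return false;
  }
  runDnsProxyScript();
  return true;
}
```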
@cr7258 cr7258 self-assigned this May 22, 2026

coderabbitai Bot commented May 22, 2026

📝 Walkthrough


Prefers the sandbox registry's registered imageTag for drivers using gateway metadata probes (docker/vm), restricts DNS proxy execution to the kubernetes source driver, and adds a VM-driver regression test ensuring the restore fast path uses the registered imageTag.

Changes

Snapshot image resolution and proxy setup

| Layer / File(s) | Summary |
|---|---|
| Image resolution and DNS proxy gating (src/lib/actions/sandbox/snapshot.ts) | resolveSrcPodImage now treats the sandbox registry imageTag as authoritative for drivers that use gateway metadata probing and limits the kubectl-in-gateway fallback to the kubernetes driver. DNS proxy auto-create runs only when srcDriver === "kubernetes". |
| VM-driver restore test with imageTag verification (test/snapshot-gateway-guard.test.ts) | Adds a makeVmRestoreToEnv test helper (sets the sandbox registry imageTag, stubs openshell/ssh/docker) and a regression test that restores a snapshot with --to clone-1, asserting that the registered imageTag appears in the output and that the forbidden docker exec probe output is absent. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • cv

Poem

🐰 A tiny hop through gateway mist,
ImageTag gleams where probes might twist,
Kubernetes hums when DNS must play,
VM restores glide the registry way,
No docker shout to steal the day.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'fix(snapshot): use registered imageTag for VM-driver auto-create' directly addresses the main change: fixing VM-driver auto-create to use the registered imageTag instead of falling back to kubectl probes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/actions/sandbox/snapshot.ts`:
- Around lines 94-96: For gateway-metadata drivers, the current branch returns
registeredImage only when it is set; when the openshell driver is "vm" or
"docker" and no imageTag is registered, control falls through to the kubectl
probe. Modify the logic in the block that checks
usesGatewayMetadataProbe(registeredDriver) && registeredImage so that if
registeredDriver is "vm" or "docker" and the registered imageTag is missing, you
return null (fail fast) instead of falling through; otherwise keep the existing
return. Reference the symbols usesGatewayMetadataProbe, registeredDriver,
registeredImage and the openshell driver values "vm"/"docker" when updating the
condition.

In `@test/snapshot-gateway-guard.test.ts`:
- Around line 220-224: The test currently inspects runCli output but doesn't
assert process exit status; update the assertion after calling runCli (the const
r = runCli(...) invocation) to explicitly verify the command succeeded by
checking the exit code (e.g., assert/expect r.code is 0 or truthy success value
used in your runner). Add this single assertion (using the same test framework
style as other tests, e.g., expect(r.code).toBe(0) or equivalent) immediately
after creating r and before or after the existing output assertions so failures
with partial output fail the test.
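A framework-independent sketch of the assertions the second comment asks for. The CliResult shape, the checkVmRestoreResult name, and the sentinel string are hypothetical; the real runCli helper in test/snapshot-gateway-guard.test.ts may expose the exit status under a different field.

```typescript
// Hypothetical shape of what a runCli-style helper returns.
interface CliResult {
  code: number | null;
  stdout: string;
  stderr: string;
}

// Returns failure messages; an empty array means the restore looked correct.
// Checking the exit status means a run that crashed after printing the
// imageTag still fails.
function checkVmRestoreResult(
  r: CliResult,
  imageTag: string,
  sentinel: string,
): string[] {
  const failures: string[] = [];
  if (r.code !== 0) {
    failures.push(`expected exit code 0, got ${r.code}`);
  }
  const output = r.stdout + r.stderr;
  if (!output.includes(imageTag)) {
    failures.push(`registered imageTag ${imageTag} missing from output`);
  }
  if (output.includes(sentinel)) {
    failures.push("docker exec probe ran: sentinel found in output");
  }
  return failures;
}
```

In an expect-style test this could be used as, for example, expect(checkVmRestoreResult(r, tag, sentinel)).toEqual([]), so every failed condition shows up in a single assertion message.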
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1fbad4c5-6d0b-450c-b57c-e2cf0bb00a2f

📥 Commits

Reviewing files that changed from the base of the PR and between ef84117 and 0c70665.

📒 Files selected for processing (2)
  • src/lib/actions/sandbox/snapshot.ts
  • test/snapshot-gateway-guard.test.ts

…without imageTag

Per PR review: Docker- and VM-driver sandboxes never have the legacy
`openshell-cluster-nemoclaw` container, so falling through to the
`docker exec ... kubectl ...` probe when their imageTag happens to be
missing only wastes a guaranteed-failing docker call (and pollutes
output with a misleading docker error) before returning the same null.
Return imageTag (or null) directly for both drivers; keep the kubectl
probe path strictly for the legacy "kubernetes" driver.

Signed-off-by: sevenc <sevenc@nvidia.com>
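After this commit, the resolver no longer falls through for Docker- and VM-driver sandboxes. A minimal sketch of the resulting control flow, assuming a hypothetical SandboxEntry shape and a probeViaKubectl callback standing in for the legacy docker exec ... kubectl ... lookup; this illustrates the described behavior and is not the actual implementation.

```typescript
// Driver values named in this PR.
type OpenshellDriver = "docker" | "vm" | "kubernetes";

// Hypothetical, simplified registry entry for the source sandbox.
interface SandboxEntry {
  openshellDriver: OpenshellDriver;
  imageTag?: string;
}

// Assumed behavior of the helper named in this PR.
function usesGatewayMetadataProbe(driver: OpenshellDriver): boolean {
  return driver === "docker" || driver === "vm";
}

// Resolution after the second commit: docker/vm never reach the kubectl probe.
function resolveSrcPodImage(
  entry: SandboxEntry,
  probeViaKubectl: () => string | null,
): string | null {
  if (usesGatewayMetadataProbe(entry.openshellDriver)) {
    // Fail fast: a missing (or empty) imageTag yields null instead of a
    // guaranteed-failing docker exec against a container that does not exist.
    return entry.imageTag || null;
  }
  // Only the legacy "kubernetes" driver reaches the kubectl probe.
  return probeViaKubectl();
}
```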
@wscurran wscurran added the fix, Platform: macOS, and Sandbox labels May 22, 2026

Labels

  • fix
  • Platform: macOS (Support for macOS)
  • Sandbox (Use this label to identify issues related to the NemoClaw isolated environment based on OpenShell.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[macOS][Sandbox] nemoclaw snapshot restore fails on Apple Silicon

2 participants