fix(snapshot): use registered imageTag for VM-driver auto-create#4058
fix(snapshot): use registered imageTag for VM-driver auto-create#4058cr7258 wants to merge 2 commits into
Conversation
Restoring a snapshot from a VM-driver sandbox into a new destination
(`snapshot restore --to <new>`) aborted with:
Cannot auto-create '<new>': could not resolve '<src>' pod image.
PR NVIDIA#3784 taught `probeGatewayRunning()` to treat `openshellDriver: "vm"`
like `"docker"`, but `resolveSrcPodImage()`'s fast path was still
hard-coded to `=== "docker"`. On macOS Apple Silicon (vm driver) the
fast path was skipped and we fell into the legacy
`docker exec openshell-cluster-* kubectl ...` probe — a container that
does not exist on VM-driver hosts — so the resolver always returned
null.
Route both Docker and VM driver sandboxes through the same
`usesGatewayMetadataProbe()` helper so the registered imageTag is
trusted. While here, gate the auto-create DNS-proxy step on the
`"kubernetes"` driver, matching `onboard.ts` — the VM driver does not
use the kubectl-based DNS proxy.
Signed-off-by: sevenc <sevenc@nvidia.com>
📝 WalkthroughWalkthroughPrefers the sandbox registry's registered imageTag for drivers using gateway metadata probes (docker/vm), restricts DNS proxy execution to the kubernetes source driver, and adds a VM-driver regression test ensuring the restore fast path uses the registered imageTag. ChangesSnapshot image resolution and proxy setup
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/actions/sandbox/snapshot.ts`:
- Around line 94-96: The current branch returns registeredImage for gateway
metadata probes even when the openshell driver is "vm" or "docker" and the image
lacks an imageTag, causing it to fall through to the kubectl probe; modify the
logic in the block that checks usesGatewayMetadataProbe(registeredDriver) &&
registeredImage so that if registeredDriver is "vm" or "docker" and
registeredImage.imageTag is missing, you return null (fail fast) instead of
registeredImage, otherwise keep the existing return; reference the symbols
usesGatewayMetadataProbe, registeredDriver, registeredImage and the openshell
driver values "vm"/"docker" when updating the condition.
In `@test/snapshot-gateway-guard.test.ts`:
- Around line 220-224: The test currently inspects runCli output but doesn't
assert process exit status; update the assertion after calling runCli (the const
r = runCli(...) invocation) to explicitly verify the command succeeded by
checking the exit code (e.g., assert/expect r.code is 0 or truthy success value
used in your runner). Add this single assertion (using the same test framework
style as other tests, e.g., expect(r.code).toBe(0) or equivalent) immediately
after creating r and before or after the existing output assertions so failures
with partial output fail the test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1fbad4c5-6d0b-450c-b57c-e2cf0bb00a2f
📒 Files selected for processing (2)
src/lib/actions/sandbox/snapshot.tstest/snapshot-gateway-guard.test.ts
…without imageTag Per PR review: Docker- and VM-driver sandboxes never have the legacy `openshell-cluster-nemoclaw` container, so falling through to the `docker exec ... kubectl ...` probe when their imageTag happens to be missing only wastes a guaranteed-failing docker call (and pollutes output with a misleading docker error) before returning the same null. Return imageTag (or null) directly for both drivers; keep the kubectl probe path strictly for the legacy "kubernetes" driver. Signed-off-by: sevenc <sevenc@nvidia.com>
|
✨ |
Summary
This PR fixes
nemoclaw <src> snapshot restore --to <new>aborting on macOS Apple Silicon (VM driver) withCannot auto-create '<new>': could not resolve '<src>' pod image., becauseresolveSrcPodImage()'s fast path was still hard-coded toopenshellDriver === "docker"— a symmetric gap left by #3784 which only fixedprobeGatewayRunning(). Route both Docker and VM driver sandboxes through the sameusesGatewayMetadataProbe()helper and gate the auto-create DNS-proxy step on the"kubernetes"driver to matchonboard.ts.Related Issue
Fixes #4071
Changes
resolveSrcPodImage()fast path now trusts the registeredimageTagfor bothdockerandvmdrivers viausesGatewayMetadataProbe(), so VM-driver auto-create no longer falls into the kubectl-via-docker probe against a container that does not exist.autoCreateSandboxFromSource()DNS-proxy step gated onopenshellDriver === "kubernetes", matching the onboard logic (src/lib/onboard.ts). Previously vm-driver sandboxes invokedsetup-dns-proxy.sheven though they don't use the kubectl-based DNS proxy.test/snapshot-gateway-guard.test.tscoveringsnapshot restore --to <new>on a VM-driver sandbox: dockerexecemits a sentinel that the test asserts never appears, and the registeredimageTagis asserted to show up in the auto-create output.Type of Change
Test the PR locally on a MacOS Apple Silicon device:
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)Signed-off-by: sevenc sevenc@nvidia.com
Summary by CodeRabbit
Bug Fixes
Tests