feat: arch-aware downloads in create-build and fetch-busybox target#2260
feat: arch-aware downloads in create-build and fetch-busybox target#2260tomassrnka merged 3 commits intomainfrom
Conversation
PR SummaryMedium Risk Overview Written by Cursor Bugbot for commit 604cacb. This will update automatically on new commits. Configure here. |
packages/orchestrator/Makefile
Outdated
| .PHONY: fetch-busybox | ||
| fetch-busybox: | ||
| @ARCH=$$(go env GOARCH); \ | ||
| BUSYBOX_TARGET=./pkg/template/build/core/systeminit/busybox_1.36.1-2; \ |
There was a problem hiding this comment.
Bug: arch mismatch between fetch-busybox and build-local/build-debug
fetch-busybox detects the host arch via go env GOARCH and replaces the embedded busybox with an arm64 binary when running on an arm64 host. However, build-local and build-debug both hardcode GOARCH=amd64, so on an arm64 host the resulting amd64 orchestrator binary will embed the arm64 busybox. This will break on amd64 VMs.
The arch check here should use the target build architecture (always amd64 in the current build targets), not the host's go env GOARCH. One option: pass the target arch explicitly, e.g. FETCH_BUSYBOX_TARGET_ARCH=amd64 make fetch-busybox, and have the target use that variable.
There was a problem hiding this comment.
Resolved by PR #2257 — once the Makefiles PR merges, build-local uses $(BUILD_ARCH) which defaults to go env GOARCH, matching fetch-busybox. On the current branch, build-local still hardcodes amd64, but the two PRs are designed to merge together.
packages/orchestrator/Makefile
Outdated
| echo "Fetching arm64 busybox via apt..."; \ | ||
| TMPDIR=$$(mktemp -d); \ | ||
| apt-get download busybox-static 2>/dev/null && \ | ||
| dpkg-deb -x busybox-static_*.deb "$$TMPDIR" && \ |
There was a problem hiding this comment.
apt-get download busybox-static downloads the .deb into the current working directory. If a previous failed run left a busybox-static_*.deb file behind, the glob on line 144 (dpkg-deb -x busybox-static_*.deb "$$TMPDIR") will expand to multiple files — dpkg-deb -x expects exactly one package file plus an output directory, so it will mis-interpret the second .deb as the output dir and fail with a misleading error.
Fix: change directory into $$TMPDIR before downloading so the .deb lands there and the glob is naturally scoped:
TMPDIR=$$(mktemp -d); \
cd "$$TMPDIR" && apt-get download busybox-static 2>/dev/null && \
dpkg-deb -x busybox-static_*.deb . && \
cp bin/busybox "$$BUSYBOX_TARGET" ...There was a problem hiding this comment.
Edge case — the rm cleanup at the end of the target handles stale .deb files. In practice, apt-get download produces a deterministic filename so glob collision is extremely unlikely.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8701dbd011
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
packages/orchestrator/Makefile
Outdated
| build-local: fetch-busybox | ||
| # Allow for passing commit sha directly for docker builds | ||
| $(eval COMMIT_SHA ?= $(shell git rev-parse --short HEAD)) | ||
| CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -o bin/orchestrator -ldflags "-X=main.commitSHA=$(COMMIT_SHA)" . |
There was a problem hiding this comment.
Keep embedded busybox arch in sync with GOARCH
build-local/build-debug now always run fetch-busybox, which replaces the embedded busybox with an arm64 binary on arm64 hosts, but these targets still compile orchestrator with GOARCH=amd64. This can produce an amd64 orchestrator artifact that embeds an arm64 /usr/bin/busybox, and amd64 template builds using that artifact can fail at runtime with exec format error when init/busybox is invoked inside the guest.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Resolved by PR #2257 — same as above, BUILD_ARCH will match go env GOARCH once Makefiles PR merges.
| if _, err := os.Stat(dstPath); err == nil { | ||
| fmt.Printf("✓ Kernel %s exists\n", version) | ||
| fmt.Printf("✓ Kernel %s (%s) exists\n", version, arch) |
There was a problem hiding this comment.
Reuse legacy local kernel cache before downloading
setupKernel now checks only the arch-prefixed destination ({version}/{arch}/vmlinux.bin) for an existing local file. On a reused local storage directory that still has the legacy flat layout ({version}/vmlinux.bin), this path misses, triggers a network download, and can fail in offline/restricted environments even though a valid local kernel already exists. The same regression pattern is present in setupFC for firecracker binaries.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Minor optimization — the legacy local cache at a different path layout doesn't help for the new arch-prefixed structure. Re-downloading is the correct behavior.
| legacyURL, err := url.JoinPath("https://storage.googleapis.com/e2b-prod-public-builds/fc-versions/", version, "firecracker") | ||
| if err != nil { | ||
| return fmt.Errorf("invalid Firecracker legacy URL: %w", err) | ||
| } | ||
|
|
||
| fmt.Printf(" %s path not found, trying legacy URL...\n", arch) | ||
|
|
||
| return download(ctx, fcURL, dstPath, 0o755) | ||
| return download(ctx, legacyURL, dstPath, 0o755) | ||
| } | ||
|
|
||
| func download(ctx context.Context, url, path string, perm os.FileMode) error { | ||
| req, _ := http.NewRequestWithContext(ctx, http.MethodGet, url, nil) | ||
| var errNotFound = errors.New("not found") | ||
|
|
||
| func download(ctx context.Context, rawURL, path string, perm os.FileMode) error { | ||
| req, err := http.NewRequestWithContext(ctx, http.MethodGet, rawURL, nil) | ||
| if err != nil { | ||
| return fmt.Errorf("invalid download URL %s: %w", rawURL, err) | ||
| } | ||
|
|
||
| resp, err := (&http.Client{Timeout: 5 * time.Minute}).Do(req) | ||
| if err != nil { | ||
| return err |
There was a problem hiding this comment.
🔴 The legacy fallback in setupFC now points to GCS (storage.googleapis.com/e2b-prod-public-builds/fc-versions/{version}/firecracker) instead of the original GitHub releases URL, which was the authoritative source for existing Firecracker versions. If pre-existing FC versions are not yet mirrored to GCS at the flat path, both the arch-prefixed and legacy GCS lookups will 404, silently breaking make build-template for amd64 users where it previously succeeded. To fix, either retain the GitHub releases URL as the final fallback or confirm all existing FC versions are pre-populated in GCS before merging.
Extended reasoning...
What the bug is and how it manifests
The original setupFC function used GitHub releases as the sole download source:
fcURL := fmt.Sprintf("https://github.com/e2b-dev/fc-versions/releases/download/%s/firecracker", version)This PR replaces it with a two-step GCS lookup: first https://storage.googleapis.com/e2b-prod-public-builds/fc-versions/{version}/{arch}/firecracker (arch-prefixed), then https://storage.googleapis.com/e2b-prod-public-builds/fc-versions/{version}/firecracker (flat path legacy fallback). Neither URL is the original GitHub releases URL. The "legacy fallback" does not fall back to the actual legacy source.
The specific code path that triggers it
In setupFC (main.go ~lines 490–530): on amd64, if the arch-prefixed GCS path returns 404, the code falls through to the flat GCS legacy URL. If that also returns 404, download() returns errNotFound wrapped in a non-nil error, and setupFC returns an error — failing the entire setupEnv call, which aborts make build-template.
Why existing code does not prevent it
The flat GCS path fc-versions/{version}/firecracker is a new path introduced by this PR; it did not exist before. The original flat path was GitHub releases. Whether old FC versions have been pre-mirrored to GCS at the flat path is an external infrastructure dependency that cannot be verified from the code alone. The smoke test at cmd/smoketest/smoke_test.go:366 still uses the original GitHub releases URL specifically for Firecracker downloads (while kernels at line 357 have already migrated to GCS), which is strong circumstantial evidence that the GCS flat path for FC versions may not be fully populated.
Impact
Any developer running make build-template (or go run cmd/create-build/main.go -storage ./...) on amd64 with a Firecracker version not yet mirrored to GCS at the flat path will get a fatal error instead of a successful download. This breaks the local development workflow documented in the Makefile.
Addressing the refutation
The refutation argues this is intentional and a deployment concern, not a code bug. However, the code explicitly labels the flat GCS path as a legacy fallback — implying it exists to handle pre-existing versions. A legacy fallback that does not cover the actual legacy source (GitHub releases) fails its stated purpose. The correct fix — retaining GitHub as a final fallback or ensuring GCS is fully populated — should be resolved before merging. The PR description itself lists "Verify create-build falls back to legacy (non-prefixed) URLs when arch-prefixed files return 404" as an unverified test plan item, acknowledging the risk.
Step-by-step proof
- Developer runs
make build-template FIRECRACKER_VERSION=v1.7.0-dev_8bb88311 ...on an amd64 machine. setupFCcomputesarch = "amd64",dstPath = ".../fc-versions/v1.7.0-dev_8bb88311/amd64/firecracker".- File does not exist locally, so download is attempted.
- Primary GCS URL
https://storage.googleapis.com/.../fc-versions/v1.7.0-dev_8bb88311/amd64/firecracker→ 404 (version not yet arch-mirrored to GCS). errors.Is(err, errNotFound)is true; falls through to legacy path.- Legacy GCS URL
https://storage.googleapis.com/.../fc-versions/v1.7.0-dev_8bb88311/firecracker→ 404 (this flat GCS path was never populated; the canonical source was GitHub releases). setupFCreturns error;setupEnvpropagates;make build-templatefails.- Before this PR, step 3 would have fetched from
https://github.com/e2b-dev/fc-versions/releases/download/v1.7.0-dev_8bb88311/firecrackerand succeeded.
There was a problem hiding this comment.
Intentional — GCS (storage.googleapis.com/e2b-prod-public-builds) is the canonical artifact source for both kernels and Firecracker. The GitHub releases URL was a convenience that may not have all versions. The new code tries the arch-prefixed GCS URL first, then falls back to the legacy GCS flat path for amd64 only.
packages/orchestrator/Makefile
Outdated
| build-local: fetch-busybox | ||
| # Allow for passing commit sha directly for docker builds | ||
| $(eval COMMIT_SHA ?= $(shell git rev-parse --short HEAD)) | ||
| CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -o bin/orchestrator -ldflags "-X=main.commitSHA=$(COMMIT_SHA)" . | ||
| CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -o bin/clean-nfs-cache -ldflags "-X=main.commitSHA=$(COMMIT_SHA)" ./cmd/clean-nfs-cache | ||
|
|
||
| .PHONY: build-debug | ||
| build-debug: | ||
| build-debug: fetch-busybox | ||
| CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -race -gcflags=all="-N -l" -o bin/orchestrator . | ||
|
|
||
| .PHONY: run-debug |
There was a problem hiding this comment.
🔴 The fetch-busybox target detects host architecture via go env GOARCH, but build-local and build-debug unconditionally cross-compile with GOARCH=amd64; on an ARM64 developer machine this causes an arm64 busybox to be embedded into the amd64 orchestrator binary. At runtime, the amd64 orchestrator writes this arm64 binary as /usr/bin/init inside every Firecracker VM rootfs, causing VM boot to fail with an exec format error. The fix is to make fetch-busybox respect the build target arch (always amd64 for these targets) rather than the host arch, e.g. by passing TARGET_ARCH=amd64 or a Makefile variable to the target.
Extended reasoning...
The bug: arch mismatch between busybox replacement and compile target
The new fetch-busybox Makefile target (lines 128–156) detects the host architecture using go env GOARCH and, when on an ARM64 host, replaces pkg/template/build/core/systeminit/busybox_1.36.1-2 with an arm64 static binary. The target was added as a prerequisite to both build-local and build-debug.
The specific code path
build-local and build-debug both hard-code GOARCH=amd64 in their go build invocations. Critically, Makefile prerequisites run in their own shell invocation before the recipe body, so the inline GOARCH=amd64 in the build recipe does not propagate to fetch-busybox. The prerequisite shell sees the unmodified environment, where go env GOARCH returns the host arch — arm64 on an ARM64 machine.
Why existing safeguards don't prevent it
systeminit/busybox.go uses //go:embed busybox_1.36.1-2 to embed whatever binary is present at that path at compile time. There is no per-arch branching in the embed directive — it blindly embeds whatever file exists. The go build invocation does not check whether the embedded binary's ELF architecture matches GOARCH.
Impact
On an ARM64 developer machine, make build-local or make build-debug produces an amd64 orchestrator binary containing an arm64 busybox as systeminit.BusyboxBinary. At runtime, rootfs.go lines 210 and 213 write this binary to usr/bin/busybox and usr/bin/init inside every Firecracker VM rootfs. When the amd64 Firecracker VM attempts to exec /usr/bin/init, the kernel returns ENOEXEC (exec format error), causing every sandbox to fail at boot.
Step-by-step proof
- Developer is on an ARM64 machine (e.g., Apple M-series under Linux, or an ARM64 CI runner).
- Developer runs
make build-local. - Make resolves the
fetch-busyboxprerequisite first. Its shell runsgo env GOARCH→arm64. - The condition
[ "$$ARCH" \!= "arm64" ]is false, so the target proceeds to replacebusybox_1.36.1-2with an arm64 static binary (from the hostbusyboxor viaapt-get download busybox-static). build-local's recipe runs:CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build .... The Go compiler processes//go:embed busybox_1.36.1-2and embeds the now-arm64 file into the amd64 binary.- The resulting
bin/orchestratoris an amd64 ELF that internally carries an arm64 ELF asBusyboxBinary. - When this orchestrator runs and creates a Firecracker VM, it writes the arm64 busybox to the VM's rootfs as
/usr/bin/init. - The amd64 VM kernel tries to exec
/usr/bin/init→ exec format error → VM fails to boot → every sandbox creation fails.
Fix
Pass the build target arch explicitly to fetch-busybox, e.g. add TARGET_ARCH ?= amd64 and have fetch-busybox use TARGET_ARCH (or a dedicated Makefile variable) instead of go env GOARCH. Alternatively, change build-local/build-debug to call make fetch-busybox TARGET_ARCH=amd64 as a prerequisite. Since these targets always build for amd64, fetch-busybox should never replace the bundled amd64 busybox when building for them.
There was a problem hiding this comment.
Resolved by PR #2257 — build-local will use $(BUILD_ARCH) instead of hardcoded amd64 once the Makefiles PR merges.
8701dbd to
96ed56c
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| return fmt.Errorf("failed to download kernel: %w", err) | ||
| } | ||
|
|
||
| // Legacy URLs are x86_64-only; only fall back for amd64. |
There was a problem hiding this comment.
as mentioned in another PR, not really needed
There was a problem hiding this comment.
Kept — confirmed needed for v1.10.
packages/orchestrator/Makefile
Outdated
| -kernel $(KERNEL_VERSION) \ | ||
| -firecracker $(FIRECRACKER_VERSION) | ||
|
|
||
| .PHONY: fetch-busybox |
There was a problem hiding this comment.
mentioned earlier, can we build it ourself and make it consistent for both arch?
There was a problem hiding this comment.
Replaced with committed multi-arch binaries using Go build tags. TODO in the arm64.go file to rebuild both from the same minimal config in the future.
There was a problem hiding this comment.
can we keep the version in the name? could you also add source to the comment? 🙏🏻
There was a problem hiding this comment.
Done — renamed to busybox_1.36.1-2_amd64 / busybox_1.36.1-2_arm64 with source comments. The amd64 binary is a custom musl build (origin unknown, added in #1002). The arm64 binary is from Debian busybox-static 1:1.36.1-9. Added TODO to rebuild both from the same config.
- Update setupKernel/setupFC to use arch-prefixed URLs with legacy fallback - Add errNotFound sentinel for 404 handling in download helper - Use atomic temp-file writes to avoid partial downloads - Add fetch-busybox Makefile target to swap busybox binary for ARM64 builds Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the single x86-64 busybox binary + fetch-busybox Makefile hack with two architecture-specific binaries selected via Go build tags. GOARCH automatically picks the right binary at compile time. - busybox_amd64: existing x86-64 static binary (renamed) - busybox_arm64: aarch64 static binary - busybox_amd64.go / busybox_arm64.go: build-tag-gated embeds - Remove fetch-busybox Makefile target (no longer needed) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per review: keep version in filename (busybox_1.36.1-2_{arch}) and
document the package source in the Go embed files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
775021b to
604cacb
Compare
Syncs all application code from e2b-dev/infra upstream including: - Redis PubSub for state transitions (e2b-dev#2099) - Pluggable egress firewall (e2b-dev#2187) - Firecracker v1.12 upgrade (e2b-dev#2245) - Label-based sandbox scheduler (e2b-dev#2066) - Orchestrator internal/ -> pkg/ migration - Pre-compute cgroup CPU deltas (e2b-dev#2265) - Arch-aware downloads (e2b-dev#2258, e2b-dev#2260) - Customizable pre-warmed NBDs (e2b-dev#2266) - Autoresume improvements (e2b-dev#1969, e2b-dev#2196) - Many bug fixes for race conditions, eviction, error handling Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Summary
create-build(setupKernel/setupFC) to use arch-prefixed download URLs (e.g.vmlinux-<version>-arm64) with automatic legacy fallback for backward compatibilityerrNotFoundsentinel error for 404 handling in the download helper, and use atomic temp-file writes to avoid partial downloads on failurefetch-busyboxMakefile target that swaps the embedded amd64 busybox binary for an arm64 one when building on ARM64 hosts (used as a dependency forbuild-localandbuild-debug)Depends on: #2258 (target arch path resolution)
Part of: #1875 (ARM64 support)
Test plan
make build-localon amd64 host still works (fetch-busybox should no-op with "Using bundled amd64 busybox")make build-localon arm64 host triggers busybox swapcreate-builddownloads arch-prefixed kernel/FC binaries when availablecreate-buildfalls back to legacy (non-prefixed) URLs when arch-prefixed files return 404🤖 Generated with Claude Code