feat: add initial ARM64 (aarch64) architecture support #1875
tomassrnka wants to merge 11 commits into main from …
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 589a0596cb
Force-pushed from 730d2d7 to 06f483f.
Force-pushed from 8e7806e to 20baf1c.
```go
// On ARM64, gopsutil doesn't populate Family/Model from /proc/cpuinfo.
// Provide fallback values so callers don't get an error.
if (family == "" || model == "") && runtime.GOARCH == "arm64" {
	if family == "" {
		family = "arm64"
	}
	if model == "" {
		model = "0"
	}
} else if family == "" || model == "" {
```
Let's make it cleaner:

```diff
-// On ARM64, gopsutil doesn't populate Family/Model from /proc/cpuinfo.
-// Provide fallback values so callers don't get an error.
-if (family == "" || model == "") && runtime.GOARCH == "arm64" {
-	if family == "" {
-		family = "arm64"
-	}
-	if model == "" {
-		model = "0"
-	}
-} else if family == "" || model == "" {
+// On ARM64, gopsutil doesn't populate Family/Model from /proc/cpuinfo.
+// Provide fallback values so callers don't get an error.
+if runtime.GOARCH == "arm64" {
+	if family == "" {
+		family = "arm64"
+	}
+	if model == "" {
+		model = "0"
+	}
+}
+if family == "" || model == "" {
```
Force-pushed from 04f0ec3 to 45e635f.
@cursoragent bugbot run

Unable to authenticate your request. Please make sure to connect your GitHub account to Cursor.

@cursoragent bugbot run

@claude review this draft PR
packages/orchestrator/Makefile (outdated)

```makefile
.PHONY: fetch-busybox
fetch-busybox:
	@ARCH=$$(dpkg --print-architecture 2>/dev/null || echo "amd64"); \
```
The fetch-busybox target uses dpkg --print-architecture to detect the arch, but TargetArch() uses the TARGET_ARCH env var. If a user sets TARGET_ARCH=arm64 on an amd64 host, this target will still fetch the amd64 busybox (since dpkg returns amd64) while the build targets arm64, causing a binary mismatch.
```go
if err := download(ctx, archURL, dstPath, 0o644); err == nil {
	return nil
} else if !errors.Is(err, errNotFound) {
	return fmt.Errorf("failed to download arm64 kernel: %w", err)
```
For arm64, if the arch-specific download (line 424) fails with a non-404 error (network issue, permissions, etc.), the code returns immediately without trying the legacy fallback. This differs from the Firecracker logic below, which falls through to the legacy path on any 404. Consider making the fallback behavior consistent.
```diff
 	hugePages bool,
 ) error {
-	smt := true
+	smt := runtime.GOARCH != "arm64"
```
SMT is disabled based on runtime.GOARCH (the GOARCH the binary was built for), not TARGET_ARCH. If TARGET_ARCH=arm64 is set on an amd64 host without also building the binary for arm64, it will incorrectly enable SMT for ARM64 VMs. The check should use the actual target arch or defer this to runtime detection.
```diff
 echo "Making configuration immutable"
-$BUSYBOX chattr +i /etc/resolv.conf
+$BUSYBOX chattr +i /etc/resolv.conf 2>/dev/null || true
```
Silencing chattr failures could mask real issues beyond busybox compatibility. If chattr fails on a full-featured system due to filesystem type, permissions, or corrupted inodes, the script will continue silently. Consider checking if chattr exists first, or logging the failure reason.
@claude review this draft PR
Force-pushed from d1c01e1 to 0b5d74f.
Re: CI workflow deduplication (@jakubno's comment about a matrix inside …): We'll handle this differently. The ARM64 tests need a self-hosted runner with specific setup (KVM, hugepages, unprivileged UFFD, NBD modules), which doesn't fit cleanly into a matrix with the standard amd64 runner. Keeping them as a separate workflow (…
- Add TargetArch() utility for runtime architecture resolution with TARGET_ARCH env var override and alias normalization (x86_64↔amd64, aarch64↔arm64)
- Add BUILD_ARCH/BUILD_PLATFORM variables to api, client-proxy, envd, and orchestrator Makefiles (defaults to host GOARCH)
- Add fetch-busybox target for ARM64 busybox binary swap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- config.go: prefer arch-prefixed paths ({version}/{arch}/binary) with
legacy flat path fallback for existing production nodes
- create-build: download from GCS with {version}/{arch}/ layout, legacy
fallback only for amd64 and only on 404
- OCI: use TargetArch() for container image platform selection
- Tests for arch-prefixed vs legacy path precedence
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- client.go: disable SMT on ARM64 (no hyperthreading, Firecracker rejects SMT=true on ARM)
- script_builder.go: disable seccomp on ARM64 (upstream FC aarch64 filter missing userfaultfd syscall)
- userfaultfd.go: skip UFFD write-protection flag on ARM64 (kernel doesn't support it; KVM dirty log used instead for diff tracking)
- machineinfo: ARM64 fallback for CPU Family/Model when gopsutil doesn't populate them
- smoketest: use runtime.GOARCH instead of hardcoded amd64

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- pr-tests-arm64.yml: cross-compile all packages, run unit tests on self-hosted ARM64 runner with KVM, hugepages, and NBD
- setup-arm64-runner.sh: configure self-hosted runner for ARM64 tests
- pull-request.yml: invoke ARM64 test workflow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ARM64's weaker memory model reliably triggers races that x86 papers over:

- clean-nfs-cache: fd use-after-close between Scanner and Statter goroutines; pass directory path string instead of *os.File
- nbd/path_direct: loop variable capture in goroutine closure
- envd conversion_test: shared connect.Response across parallel subtests; use RunAndReturn to create fresh response per call
- errorcollector_test: plain bool and ctx variable reuse in concurrent test; use atomic.Bool and distinct context variables
- db/testutils: goose v3.26 SetDialect() races on package globals when parallel tests run migrations; serialize with sync.Mutex
- uffd/page_mmap: graceful skip on hugepage ENOMEM in CI
- async_wp_test: skip UFFD write-protection test on ARM64 (unsupported)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document architecture naming conventions, SMT behavior, cross-arch deployment via TARGET_ARCH, and path resolution strategy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from c5495a4 to 2ffb97e.
Remove the ARM64 guard that skipped UFFD WP: ARM64 kernels 6.10+ support it (merged upstream in Linux 6.10). The orchestrator requires kernel 6.10+ on ARM64 hosts.

- userfaultfd.go: remove runtime.GOARCH != "arm64" guard from WP flag
- async_wp_test.go: remove ARM64 test skip

CI runner (ubuntu-24.04-arm) runs kernel 6.14, which supports WP.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
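Since the guard is replaced by a minimum-kernel requirement, a startup check along these lines could make the 6.10+ requirement explicit on ARM64 hosts. This is a sketch under assumptions: the helper names are hypothetical, the release string is read from `/proc/sys/kernel/osrelease`, and only the leading `major.minor` is parsed.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseKernelVersion extracts major and minor from a release string
// such as "6.14.0-1007-aws" or "6.9-rc1".
func parseKernelVersion(release string) (major, minor int, err error) {
	parts := strings.SplitN(strings.TrimSpace(release), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("parse major in %q: %w", release, err)
	}
	// Strip any non-digit suffix (e.g. "-rc1") that follows the minor number.
	minorStr := parts[1]
	if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorStr = minorStr[:i]
	}
	if minor, err = strconv.Atoi(minorStr); err != nil {
		return 0, 0, fmt.Errorf("parse minor in %q: %w", release, err)
	}
	return major, minor, nil
}

// supportsUffdWP reports whether the kernel is at least 6.10, the minimum
// stated above for UFFD write-protection on ARM64.
func supportsUffdWP(release string) (bool, error) {
	major, minor, err := parseKernelVersion(release)
	if err != nil {
		return false, err
	}
	return major > 6 || (major == 6 && minor >= 10), nil
}

func main() {
	raw, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release:", err)
		return
	}
	ok, err := supportsUffdWP(string(raw))
	fmt.Println(ok, err)
}
```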
PR Summary
Medium Risk
Overview
Written by Cursor Bugbot for commit c9c2f76.
HTTP handler goroutines concurrently access shared slices and maps without synchronization:

- receivedParts map: switch to sync.Map (matches pattern in later tests)
- partSizes slice: add sync.Mutex around append
- retryTimes slice: add sync.Mutex around append

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Can we also run the integration tests? 🙏🏻
- Remove unused StracePfx field/template var from script_builder.go
- Revert unrelated package-lock.json version drift from branch divergence

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ARM64 Support: PR Breakdown
The …
Merge order
Cleanup applied before split
Closing this PR; the work is tracked by the individual PRs above.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Seccomp completely disabled on ARM64 Firecracker sandboxes
- Added SeccompFilterPath() to check for custom seccomp filter with userfaultfd syscall; uses --seccomp-filter when available, only falls back to --no-seccomp if custom filter is missing.
- ✅ Fixed: fetch-busybox downloads wrong architecture in cross-compilation
- Changed apt-get download to explicitly request arm64 architecture and added post-download architecture validation to ensure the binary is actually aarch64.
Or push these changes by commenting:
@cursor push dc30338e96
Preview (dc30338e96)
diff --git a/packages/orchestrator/Makefile b/packages/orchestrator/Makefile
--- a/packages/orchestrator/Makefile
+++ b/packages/orchestrator/Makefile
@@ -146,12 +146,17 @@
elif command -v apt-get >/dev/null 2>&1 && command -v dpkg-deb >/dev/null 2>&1; then \
echo "Fetching arm64 busybox via apt..."; \
TMPDIR=$$(mktemp -d); \
- apt-get download busybox-static 2>/dev/null && \
- dpkg-deb -x busybox-static_*.deb "$$TMPDIR" && \
+ apt-get download busybox-static:arm64 2>/dev/null && \
+ dpkg-deb -x busybox-static_*_arm64.deb "$$TMPDIR" && \
+ if ! file "$$TMPDIR/bin/busybox" 2>/dev/null | grep -q 'aarch64\|ARM aarch64'; then \
+ rm -rf "$$TMPDIR" busybox-static_*.deb; \
+ echo "⚠ Downloaded busybox is not arm64 architecture"; \
+ exit 1; \
+ fi && \
cp "$$TMPDIR/bin/busybox" "$$BUSYBOX_TARGET" && \
rm -rf "$$TMPDIR" busybox-static_*.deb && \
echo "✓ Replaced embedded busybox with arm64 binary (from busybox-static package)" || \
- { rm -rf "$$TMPDIR" busybox-static_*.deb; echo "⚠ apt-get download failed"; exit 1; }; \
+ { rm -rf "$$TMPDIR" busybox-static_*.deb; echo "⚠ apt-get download failed (arm64 architecture may not be configured)"; exit 1; }; \
else \
echo "⚠ ARM64 busybox required but no method available to fetch it."; \
echo " Options:"; \
diff --git a/packages/orchestrator/pkg/sandbox/fc/config.go b/packages/orchestrator/pkg/sandbox/fc/config.go
--- a/packages/orchestrator/pkg/sandbox/fc/config.go
+++ b/packages/orchestrator/pkg/sandbox/fc/config.go
@@ -13,6 +13,13 @@
FirecrackerBinaryName = "firecracker"
+ // SeccompFilterName is the name of the custom seccomp filter BPF file.
+ // On aarch64, the default Firecracker seccomp filter does not include the
+ // userfaultfd syscall (nr 282), which is required for UFFD-based snapshot
+ // restore. A custom filter that adds userfaultfd can be placed at:
+ // {FirecrackerVersionsDir}/{version}/[{arch}/]seccomp-filter.bpf
+ SeccompFilterName = "seccomp-filter.bpf"
+
envsDisk = "/mnt/disks/fc-envs/v1"
buildDirName = "builds"
@@ -55,6 +62,25 @@
return filepath.Join(config.FirecrackerVersionsDir, t.FirecrackerVersion, FirecrackerBinaryName)
}
+// SeccompFilterPath returns the path to a custom seccomp filter BPF file if it exists.
+// Returns empty string if no custom filter is found. The custom filter should include
+// the userfaultfd syscall for UFFD-based snapshot restore on aarch64.
+func (t Config) SeccompFilterPath(config cfg.BuilderConfig) string {
+ // Check arch-prefixed path first ({version}/{arch}/seccomp-filter.bpf)
+ archPath := filepath.Join(config.FirecrackerVersionsDir, t.FirecrackerVersion, utils.TargetArch(), SeccompFilterName)
+ if _, err := os.Stat(archPath); err == nil {
+ return archPath
+ }
+
+ // Fall back to legacy flat path ({version}/seccomp-filter.bpf)
+ flatPath := filepath.Join(config.FirecrackerVersionsDir, t.FirecrackerVersion, SeccompFilterName)
+ if _, err := os.Stat(flatPath); err == nil {
+ return flatPath
+ }
+
+ return ""
+}
+
type RootfsPaths struct {
TemplateVersion uint64
TemplateID string
diff --git a/packages/orchestrator/pkg/sandbox/fc/config_test.go b/packages/orchestrator/pkg/sandbox/fc/config_test.go
--- a/packages/orchestrator/pkg/sandbox/fc/config_test.go
+++ b/packages/orchestrator/pkg/sandbox/fc/config_test.go
@@ -111,3 +111,69 @@
// Should prefer the arch-prefixed path
assert.Equal(t, filepath.Join(dir, "vmlinux-6.1.102", arch, "vmlinux.bin"), result)
}
+
+func TestSeccompFilterPath_ArchPrefixed(t *testing.T) {
+ t.Parallel()
+ dir := t.TempDir()
+ arch := utils.TargetArch()
+
+ // Create the arch-prefixed seccomp filter
+ archDir := filepath.Join(dir, "v1.12.0", arch)
+ require.NoError(t, os.MkdirAll(archDir, 0o755))
+ require.NoError(t, os.WriteFile(filepath.Join(archDir, "seccomp-filter.bpf"), []byte("bpf"), 0o644))
+
+ config := cfg.BuilderConfig{FirecrackerVersionsDir: dir}
+ fc := Config{FirecrackerVersion: "v1.12.0"}
+
+ result := fc.SeccompFilterPath(config)
+
+ assert.Equal(t, filepath.Join(dir, "v1.12.0", arch, "seccomp-filter.bpf"), result)
+}
+
+func TestSeccompFilterPath_LegacyFallback(t *testing.T) {
+ t.Parallel()
+ dir := t.TempDir()
+
+ // Only create the legacy flat seccomp filter
+ require.NoError(t, os.MkdirAll(filepath.Join(dir, "v1.12.0"), 0o755))
+ require.NoError(t, os.WriteFile(filepath.Join(dir, "v1.12.0", "seccomp-filter.bpf"), []byte("bpf"), 0o644))
+
+ config := cfg.BuilderConfig{FirecrackerVersionsDir: dir}
+ fc := Config{FirecrackerVersion: "v1.12.0"}
+
+ result := fc.SeccompFilterPath(config)
+
+ assert.Equal(t, filepath.Join(dir, "v1.12.0", "seccomp-filter.bpf"), result)
+}
+
+func TestSeccompFilterPath_NoneExists(t *testing.T) {
+ t.Parallel()
+ dir := t.TempDir()
+
+ // No seccomp filter — should return empty string
+ config := cfg.BuilderConfig{FirecrackerVersionsDir: dir}
+ fc := Config{FirecrackerVersion: "v1.12.0"}
+
+ result := fc.SeccompFilterPath(config)
+
+ assert.Equal(t, "", result)
+}
+
+func TestSeccompFilterPath_PrefersArchOverLegacy(t *testing.T) {
+ t.Parallel()
+ dir := t.TempDir()
+ arch := utils.TargetArch()
+
+ // Create BOTH arch-prefixed and legacy flat seccomp filters
+ require.NoError(t, os.MkdirAll(filepath.Join(dir, "v1.12.0", arch), 0o755))
+ require.NoError(t, os.WriteFile(filepath.Join(dir, "v1.12.0", arch, "seccomp-filter.bpf"), []byte("arch-bpf"), 0o644))
+ require.NoError(t, os.WriteFile(filepath.Join(dir, "v1.12.0", "seccomp-filter.bpf"), []byte("legacy-bpf"), 0o644))
+
+ config := cfg.BuilderConfig{FirecrackerVersionsDir: dir}
+ fc := Config{FirecrackerVersion: "v1.12.0"}
+
+ result := fc.SeccompFilterPath(config)
+
+ // Should prefer the arch-prefixed path
+ assert.Equal(t, filepath.Join(dir, "v1.12.0", arch, "seccomp-filter.bpf"), result)
+}
diff --git a/packages/orchestrator/pkg/sandbox/fc/script_builder.go b/packages/orchestrator/pkg/sandbox/fc/script_builder.go
--- a/packages/orchestrator/pkg/sandbox/fc/script_builder.go
+++ b/packages/orchestrator/pkg/sandbox/fc/script_builder.go
@@ -87,13 +87,20 @@
rootfsPaths RootfsPaths,
namespaceID string,
) startScriptArgs {
- // On ARM64, disable seccomp to allow userfaultfd syscall for snapshot restore.
- // The upstream Firecracker seccomp filter for aarch64 does not include the
- // userfaultfd syscall (nr 282), causing snapshot loading to fail with
- // "Failed to UFFD object: System error".
+ // On ARM64, we need to handle seccomp specially because the upstream Firecracker
+ // seccomp filter for aarch64 does not include the userfaultfd syscall (nr 282),
+ // which is required for UFFD-based snapshot restore.
+ //
+ // If a custom seccomp filter is available (seccomp-filter.bpf), use it via
+ // --seccomp-filter. This custom filter should be the default aarch64 filter
+ // with userfaultfd added. If no custom filter exists, fall back to --no-seccomp.
var extraArgs string
if runtime.GOARCH == "arm64" {
- extraArgs = " --no-seccomp"
+ if filterPath := versions.SeccompFilterPath(sb.builderConfig); filterPath != "" {
+ extraArgs = " --seccomp-filter " + filterPath
+ } else {
+ extraArgs = " --no-seccomp"
+ }
}
return startScriptArgs{
```go
var extraArgs string
if runtime.GOARCH == "arm64" {
	extraArgs = " --no-seccomp"
}
```
Seccomp completely disabled on ARM64 Firecracker sandboxes
High Severity
On ARM64, --no-seccomp is appended to the Firecracker command, completely disabling the seccomp sandbox rather than just allowing the missing userfaultfd syscall. This removes an entire security boundary for all sandbox processes on ARM64 hosts. Firecracker supports custom seccomp filters via --seccomp-filter which could allowlist only the needed userfaultfd syscall (nr 282) while keeping all other restrictions intact.
```makefile
cp "$$TMPDIR/bin/busybox" "$$BUSYBOX_TARGET" && \
rm -rf "$$TMPDIR" busybox-static_*.deb && \
echo "✓ Replaced embedded busybox with arm64 binary (from busybox-static package)" || \
{ rm -rf "$$TMPDIR" busybox-static_*.deb; echo "⚠ apt-get download failed"; exit 1; }; \
```
fetch-busybox downloads wrong architecture in cross-compilation
Low Severity
When BUILD_ARCH=arm64 is set on an amd64 host (cross-compilation), the apt-get download busybox-static fallback downloads the host architecture package (amd64), not arm64. The script then silently embeds the wrong-architecture busybox binary into the build, since there's no post-download architecture validation via file in this code path (unlike the host-busybox check in the elif above).



Summary
Adds ARM64/aarch64 architecture support to the E2B infrastructure, enabling builds and sandbox execution on Apple Silicon and other ARM64 hosts (via Lima VM + nested KVM).
Changes by commit:
- … `GOARCH=amd64` and `--platform linux/amd64` with `$(shell go env GOARCH)` across all 4 service Makefiles
- … `runtime.GOARCH` for OCI image platform, add ARM64 fallback for CPU detection (gopsutil doesn't populate Family/Model on ARM)
- … `chattr` calls non-fatal (`|| true`) for busybox versions that lack it
- … `arm64/` subdirectory first, falls back to generic), `E2B_BASE_IMAGE` env var for base image override

Related PRs:
Test plan
- `make fetch-busybox` on ARM64 host to swap busybox binary
- … `create-build` on ARM64
- … `uname -m` in sandbox returns `aarch64`

🤖 Generated with Claude Code
Note
High Risk
High risk because it changes core sandbox/Firecracker startup and snapshot handling (including disabling seccomp on ARM64) and alters kernel/Firecracker path resolution and OCI platform selection, which can impact both security posture and runtime stability across architectures.
Overview
Adds initial ARM64 support end-to-end by introducing an ARM64 PR workflow (cross-compile + native arm runners) and runner setup script, making service build/publish Makefiles architecture-aware, and teaching the orchestrator to resolve Firecracker/kernel artifacts and OCI pulls by `TARGET_ARCH` with legacy fallbacks. It also adjusts runtime behavior for ARM64 (disable SMT, tweak UFFD write-protect usage, pass `--no-seccomp` for Firecracker on ARM64), hardens/deflakes several concurrency and hugepage-related tests, and updates template/rootfs provisioning to better handle clock-skewed APT, missing `chattr`, static network setup, and ext4 repair retries.

Written by Cursor Bugbot for commit 5491da7.