docs: ARM64 architecture support in orchestrator README#2254
docs: ARM64 architecture support in orchestrator README#2254tomassrnka wants to merge 4 commits intomainfrom
Conversation
PR SummaryLow Risk Overview Written by Cursor Bugbot for commit 01c9167. This will update automatically on new commits. Configure here. |
|
|
||
| ### ARM64-specific behavior | ||
|
|
||
| - **SMT** is disabled (ARM processors don't support simultaneous multi-threading) |
There was a problem hiding this comment.
The code doesn't actually disable SMT — fc/client.go:329 sets smt := true unconditionally with no ARM64 branch. Either the behavior isn't implemented yet or this claim is incorrect.
There was a problem hiding this comment.
Not applicable — this is a docs-only PR, split from the arm64-support branch. The code implementing these features (TARGET_ARCH, SMT, fetch-busybox, OCI platform) is in separate PRs (#2258, #2259, #2260, #2257). Bots are comparing against this branch alone; the docs will be accurate once all PRs merge.
| ### ARM64-specific behavior | ||
|
|
||
| - **SMT** is disabled (ARM processors don't support simultaneous multi-threading) | ||
| - **CPU detection** uses fallback values since `gopsutil` doesn't populate Family/Model on ARM64 |
There was a problem hiding this comment.
There are no fallback values in the current code. machineinfo/main.go:25-26 returns an error when Family or Model is empty — it does not fall back to any default values. This will cause the orchestrator to fail on ARM64 with gopsutil if this behavior isn't implemented yet.
There was a problem hiding this comment.
Not applicable — this is a docs-only PR, split from the arm64-support branch. The code implementing these features (TARGET_ARCH, SMT, fetch-busybox, OCI platform) is in separate PRs (#2258, #2259, #2260, #2257). Bots are comparing against this branch alone; the docs will be accurate once all PRs merge.
|
|
||
| - **SMT** is disabled (ARM processors don't support simultaneous multi-threading) | ||
| - **CPU detection** uses fallback values since `gopsutil` doesn't populate Family/Model on ARM64 | ||
| - **OCI platform** is set to the target architecture instead of hardcoded `amd64` |
There was a problem hiding this comment.
oci/oci.go:59-62 still hardcodes Architecture: "amd64" in DefaultPlatform, and both GetPublicImage and GetImage use this without any target-arch override. The documented behavior doesn't match the code.
There was a problem hiding this comment.
Not applicable — this is a docs-only PR, split from the arm64-support branch. The code implementing these features (TARGET_ARCH, SMT, fetch-busybox, OCI platform) is in separate PRs (#2258, #2259, #2260, #2257). Bots are comparing against this branch alone; the docs will be accurate once all PRs merge.
packages/orchestrator/README.md
Outdated
| - **SMT** is disabled (ARM processors don't support simultaneous multi-threading) | ||
| - **CPU detection** uses fallback values since `gopsutil` doesn't populate Family/Model on ARM64 | ||
| - **OCI platform** is set to the target architecture instead of hardcoded `amd64` | ||
| - **Busybox binary** must be swapped before building: `make fetch-busybox` |
There was a problem hiding this comment.
make fetch-busybox does not exist in any Makefile in the orchestrator package. Documenting a non-existent command will confuse developers trying to build for ARM64.
There was a problem hiding this comment.
Not applicable — this is a docs-only PR, split from the arm64-support branch. The code implementing these features (TARGET_ARCH, SMT, fetch-busybox, OCI platform) is in separate PRs (#2258, #2259, #2260, #2257). Bots are comparing against this branch alone; the docs will be accurate once all PRs merge.
|
|
||
| ### Cross-architecture deployment | ||
|
|
||
| `TARGET_ARCH` is a **runtime** environment variable that overrides the architecture used for path resolution and OCI image pulls. When unset, defaults to the host architecture (`runtime.GOARCH`). |
There was a problem hiding this comment.
TARGET_ARCH is not read anywhere in the Go source code — no os.Getenv("TARGET_ARCH") call exists. The env var section and the environment variables table entry (line 282) document behavior that isn't implemented yet, which could mislead users.
There was a problem hiding this comment.
Not applicable — this is a docs-only PR, split from the arm64-support branch. The code implementing these features (TARGET_ARCH, SMT, fetch-busybox, OCI platform) is in separate PRs (#2258, #2259, #2260, #2257). Bots are comparing against this branch alone; the docs will be accurate once all PRs merge.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7f46dd522b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| `TARGET_ARCH` is a **runtime** environment variable that overrides the architecture used for path resolution and OCI image pulls. When unset, defaults to the host architecture (`runtime.GOARCH`). | ||
|
|
||
| ```bash | ||
| # Run orchestrator targeting amd64 paths from an arm64 host | ||
| TARGET_ARCH=amd64 ./bin/orchestrator |
There was a problem hiding this comment.
Remove unsupported TARGET_ARCH override instructions
The new section documents TARGET_ARCH as a runtime override for binary path resolution and OCI pulls, but the implementation does not read this env var (pkg/cfg/model.go has no TARGET_ARCH field), still builds Firecracker/kernel paths without an architecture segment (pkg/sandbox/fc/config.go), and keeps OCI pulls pinned to amd64 via DefaultPlatform (pkg/template/build/core/oci/oci.go). Users following this guidance on mixed-arch hosts will expect overrides to work, but runtime behavior will not match the docs and can lead to missing binaries or wrong image architecture pulls.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Not applicable — this is a docs-only PR, split from the arm64-support branch. The code implementing these features (TARGET_ARCH, SMT, fetch-busybox, OCI platform) is in separate PRs (#2258, #2259, #2260, #2257). Bots are comparing against this branch alone; the docs will be accurate once all PRs merge.
packages/orchestrator/README.md
Outdated
| - **SMT** is disabled (ARM processors don't support simultaneous multi-threading) | ||
| - **CPU detection** uses fallback values since `gopsutil` doesn't populate Family/Model on ARM64 | ||
| - **OCI platform** is set to the target architecture instead of hardcoded `amd64` | ||
| - **Busybox binary** must be swapped before building: `make fetch-busybox` |
There was a problem hiding this comment.
Replace invalid busybox build command
The README now instructs make fetch-busybox, but there is no such Makefile target or script in this repo (the only occurrence is this new README line). Following this step will fail immediately with “No rule to make target”, so ARM64 setup instructions are not executable as written.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Not applicable — this is a docs-only PR, split from the arm64-support branch. The code implementing these features (TARGET_ARCH, SMT, fetch-busybox, OCI platform) is in separate PRs (#2258, #2259, #2260, #2257). Bots are comparing against this branch alone; the docs will be accurate once all PRs merge.
packages/orchestrator/README.md
Outdated
| - OCI image platform for container pulls | ||
|
|
||
| It does **not** affect: | ||
| - Makefile compilation — use `GOARCH` directly for cross-compilation: `GOARCH=amd64 make build-local` |
There was a problem hiding this comment.
🔴 The README's cross-compilation example (GOARCH=amd64 make build-local) implies that varying GOARCH controls the target architecture, but the build-local Makefile target hardcodes GOARCH=amd64 inline in the recipe (lines 27-28), so GOARCH=arm64 make build-local silently produces an amd64 binary. Operators trying to cross-compile for ARM64 using this documented approach will produce the wrong binary without any error.
Extended reasoning...
What the bug is: The README's new ARM64 section states: "It does not affect: Makefile compilation — use GOARCH directly for cross-compilation: GOARCH=amd64 make build-local". This implies that an operator can vary GOARCH (e.g., set it to arm64) to compile for a different target architecture using the build-local Make target.
The specific code path: In packages/orchestrator/Makefile, lines 27-28, the build-local recipe reads:
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -o bin/orchestrator ...
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -o bin/clean-nfs-cache ...
GOARCH=amd64 is set inline in the shell command within the recipe itself, not as a Makefile variable.
Why existing code doesn't prevent it: In GNU Make, environment variables passed on the command line (e.g., GOARCH=arm64 make build-local) override Makefile variable assignments (:= or =), but they do NOT override inline shell-level variable assignments inside recipe commands. Since GOARCH=amd64 is a shell prefix on the go build command inside the recipe, it unconditionally overrides any caller-supplied GOARCH for that process. There is no way for the user to override this without modifying the Makefile itself.
The impact: A developer or operator following the README documentation and running GOARCH=arm64 make build-local to produce an ARM64 binary will instead silently get an amd64 binary. This could lead to deploying the wrong architecture binary to an ARM64 host, resulting in a crash or exec format error at runtime — a potentially confusing failure since no error occurs at build time.
Step-by-step proof:
- README says: use
GOARCH=amd64 make build-localfor cross-compilation (implying varying GOARCH works) - Operator runs:
GOARCH=arm64 make build-localtargeting an ARM64 host - Make invokes the recipe:
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build ... - The inline
GOARCH=amd64in the recipe overrides the caller'sGOARCH=arm64 - Result: an amd64 binary in
bin/orchestrator, with no warning or error - Deploying this to an ARM64 host results in
exec format error
How to fix it: Either (a) replace the hardcoded GOARCH=amd64 in the recipe with a Makefile variable that defaults to amd64 but can be overridden (GOARCH ?= amd64 as a Makefile variable, then reference it in the recipe as GOARCH=$(GOARCH)), or (b) update the README to accurately state that build-local always compiles for amd64 and provide a separate build-local-arm64 target or explicit instructions for ARM64 cross-compilation.
There was a problem hiding this comment.
Not applicable — this is a docs-only PR, split from the arm64-support branch. The code implementing these features (TARGET_ARCH, SMT, fetch-busybox, OCI platform) is in separate PRs (#2258, #2259, #2260, #2257). Bots are comparing against this branch alone; the docs will be accurate once all PRs merge.
packages/orchestrator/README.md
Outdated
| - **SMT** is disabled (ARM processors don't support simultaneous multi-threading) | ||
| - **CPU detection** uses fallback values since `gopsutil` doesn't populate Family/Model on ARM64 | ||
| - **OCI platform** is set to the target architecture instead of hardcoded `amd64` | ||
| - **Busybox binary** must be swapped before building: `make fetch-busybox` |
There was a problem hiding this comment.
🔴 The README documents make fetch-busybox as a required ARM64 setup step, but this make target does not exist anywhere in the repository. Running it will fail with "No rule to make target 'fetch-busybox'", blocking ARM64 developers who follow these instructions. The busybox binary is already committed to the repo at packages/orchestrator/pkg/template/build/core/systeminit/busybox_1.36.1-2 and embedded at compile time — the correct fix is to document the actual manual replacement steps instead.
Extended reasoning...
Bug: Non-existent make fetch-busybox target
What the bug is:
The newly added ARM64 section in the README states: "Busybox binary must be swapped before building: make fetch-busybox". This make target does not exist in the orchestrator Makefile or anywhere else in the repository.
The specific code path:
The documented command appears on line 254 of packages/orchestrator/README.md in the ARM64-specific behavior section. The orchestrator Makefile (packages/orchestrator/Makefile) contains targets: init, generate, build, build-local, build-debug, run-debug, run-local, various upload/* and build-and-upload/* targets, test, test-docker, build-template, migrate, and lint. There is no fetch-busybox or any busybox-related target in any Makefile in the repo.
Why existing code doesn't prevent it:
The busybox binary is handled entirely differently from what the docs suggest. The binary at packages/orchestrator/pkg/template/build/core/systeminit/busybox_1.36.1-2 is committed directly to the repository and embedded at compile time via a //go:embed busybox_1.36.1-2 directive in busybox.go. There is no automated fetch mechanism — the replacement for ARM64 must be done manually by swapping the binary file. The make target referenced in the docs was never created.
Impact:
Any ARM64 developer following the README setup instructions will run make fetch-busybox and receive a hard failure: make: *** No rule to make target 'fetch-busybox'. Stop. This leaves them with no path forward from the documented instructions and unable to complete the ARM64 build setup.
How to fix:
Remove the make fetch-busybox reference and replace it with accurate instructions. Since the busybox binary is embedded at compile time from the committed file, the correct guidance would be something like: manually replace packages/orchestrator/pkg/template/build/core/systeminit/busybox_1.36.1-2 with an ARM64 build of busybox before compiling.
Step-by-step proof:
- An ARM64 developer reads the new ARM64 section in README.md and sees the bullet: "Busybox binary must be swapped before building:
make fetch-busybox" - They
cd packages/orchestratorand runmake fetch-busybox - Make searches the Makefile for target
fetch-busybox— it is not defined anywhere - Make exits with:
make: *** No rule to make target 'fetch-busybox'. Stop. - The developer cannot proceed with ARM64 setup using the documented workflow
- The actual binary is at
packages/orchestrator/pkg/template/build/core/systeminit/busybox_1.36.1-2(an x86-64 ELF), embedded via//go:embedin Go source — no Makefile target exists or has ever existed to automate swapping it
There was a problem hiding this comment.
Not applicable — this is a docs-only PR, split from the arm64-support branch. The code implementing these features (TARGET_ARCH, SMT, fetch-busybox, OCI platform) is in separate PRs (#2258, #2259, #2260, #2257). Bots are comparing against this branch alone; the docs will be accurate once all PRs merge.
| `TARGET_ARCH` is a **runtime** environment variable that overrides the architecture used for path resolution and OCI image pulls. When unset, defaults to the host architecture (`runtime.GOARCH`). | ||
|
|
||
| ```bash | ||
| # Run orchestrator targeting amd64 paths from an arm64 host | ||
| TARGET_ARCH=amd64 ./bin/orchestrator | ||
|
|
||
| # Or in .env file (read at runtime) | ||
| echo "TARGET_ARCH=amd64" >> .env.local | ||
| ``` | ||
|
|
||
| `TARGET_ARCH` affects: | ||
| - Firecracker and kernel binary path resolution (`{version}/{arch}/...`) | ||
| - OCI image platform for container pulls | ||
|
|
||
| It does **not** affect: | ||
| - Makefile compilation — use `GOARCH` directly for cross-compilation: `GOARCH=amd64 make build-local` | ||
| - Hardware-dependent runtime behavior (SMT detection, CPU info) which always uses the actual host architecture | ||
|
|
||
| --- |
There was a problem hiding this comment.
🔴 The README documents TARGET_ARCH as a functional runtime environment variable for cross-architecture deployment, but it is never read anywhere in the Go codebase — setting it will be silently ignored. Operators following this documentation to run TARGET_ARCH=amd64 ./bin/orchestrator will find the variable has no effect, as the orchestrator always uses runtime.GOARCH (the host architecture). Either implement the variable in the orchestrator config and path resolution code, or remove the cross-architecture deployment section until the feature is actually built.
Extended reasoning...
What the bug is: The PR adds documentation for a TARGET_ARCH environment variable described as overriding the architecture used for Firecracker/kernel path resolution and OCI image pulls at runtime. This documentation is detailed and authoritative-sounding, with concrete usage examples. However, TARGET_ARCH does not exist in the Go codebase at all.
The specific code path: The packages/orchestrator/pkg/cfg/model.go Config struct reads environment variables for HOST_ENVD_PATH, HOST_KERNELS_DIR, FIRECRACKER_VERSIONS_DIR, SANDBOX_DIR, and others — but there is no TARGET_ARCH or TargetArch field anywhere. A grep across all .go files for TARGET_ARCH, TargetArch, and targetArch returns zero results.
Why existing code doesn't prevent it: There is no parsing, validation, or logging of unrecognized environment variables. The orchestrator will start normally with TARGET_ARCH=amd64 set, give no error, and use runtime.GOARCH (the actual host architecture) throughout — meaning an arm64 host will resolve arm64 binary paths regardless of what TARGET_ARCH is set to.
Impact: An operator following the documented instructions to run TARGET_ARCH=amd64 ./bin/orchestrator from an arm64 host expecting amd64 sandbox behavior will instead get arm64 path resolution. If amd64 Firecracker/kernel binaries exist at the arm64 path this will fail with a file-not-found error; if arm64 binaries happen to exist the process will succeed silently but use the wrong architecture. The feature being advertised — cross-architecture deployment — is entirely non-functional.
Step-by-step proof: (1) On an arm64 host, set TARGET_ARCH=amd64 per the README example. (2) Launch ./bin/orchestrator. (3) The orchestrator calls runtime.GOARCH (returns "arm64") to construct paths like fc-versions/v1.12.1_717921c/arm64/firecracker. (4) TARGET_ARCH was never read — it remains unset in any config struct. (5) amd64 paths are never resolved.
How to fix it: Add a TargetArch field to the Config struct in packages/orchestrator/pkg/cfg/model.go populated via os.Getenv("TARGET_ARCH") (defaulting to runtime.GOARCH), then thread it through all arch-sensitive code paths (Firecracker/kernel path construction and OCI platform selection). Alternatively, remove the TARGET_ARCH documentation section until the implementation is ready.
Note on duplication: A refutation argues this duplicates bug_002 targeting the same lines. Both bugs describe the same root cause, so they should be consolidated — but the bug itself is confirmed real and should be fixed before merging.
There was a problem hiding this comment.
Not applicable — this is a docs-only PR, split from the arm64-support branch. The code implementing these features (TARGET_ARCH, SMT, fetch-busybox, OCI platform) is in separate PRs (#2258, #2259, #2260, #2257). Bots are comparing against this branch alone; the docs will be accurate once all PRs merge.
packages/orchestrator/README.md
Outdated
| fc-versions/v1.12.1_717921c/amd64/firecracker | ||
| fc-versions/v1.12.1_717921c/arm64/firecracker |
There was a problem hiding this comment.
🟡 The binary path examples in the new ARM64 section use Firecracker version v1.12.1_717921c, but the create-build flag documentation (line 40) and the entire codebase consistently use v1.12.1_a41d3fb as the default. An operator following this README would be given an incorrect version string when trying to locate actual GCS bucket paths.
Extended reasoning...
What the bug is: The new ARM64 section (lines 239–240) shows example binary paths using Firecracker version v1.12.1_717921c, while every other reference to the Firecracker version in this repository uses v1.12.1_a41d3fb. The two strings share the same base version (v1.12.1) but have different commit hash suffixes (717921c vs a41d3fb), making them distinct build artifacts.
The specific inconsistency: Within the same README file, line 40 documents the -firecracker flag default as v1.12.1_a41d3fb. Lines 239–240 then show path examples using v1.12.1_717921c. The codebase anchors on v1.12.1_a41d3fb: it is defined as DefaultFirecackerV1_12Version in packages/shared/pkg/featureflags/flags.go, used in .github/actions/build-sandbox-template/action.yml, and referenced in tests/integration/seed.go. The string v1.12.1_717921c appears nowhere else in the repository.
Why existing documentation doesn't prevent it: The path examples are illustrative (they show directory structure, not runnable commands), so there is no automated check that would catch a stale version string in a code block. It was introduced as a new section and the author used a different version hash than the established default.
Step-by-step proof of confusion: An operator reads the README to set up ARM64 support. They see the create-build default is v1.12.1_a41d3fb (line 40). They then scroll to the ARM64 section and see paths referencing v1.12.1_717921c. They attempt to verify or download fc-versions/v1.12.1_717921c/arm64/firecracker from the GCS bucket — this path does not exist. The correct path is fc-versions/v1.12.1_a41d3fb/arm64/firecracker. The operator is now confused about which version is actually deployed and which path layout is correct.
Impact: Operators following this section to locate GCS bucket contents or configure FIRECRACKER_VERSIONS_DIR locally will use the wrong version string. While the examples are not copy-paste commands, the version string is the most critical variable in the path and readers naturally assume it is accurate.
How to fix: Replace both occurrences of v1.12.1_717921c in lines 239–240 with v1.12.1_a41d3fb to match the documented default and codebase constant.
There was a problem hiding this comment.
Fixed — corrected to v1.12.1_a41d3fb to match codebase default.
There was a problem hiding this comment.
the default is
DefaultFirecackerV1_12Version = "v1.12.1_210cbac"
There was a problem hiding this comment.
Fixed — updated to v1.12.1_210cbac.
There was a problem hiding this comment.
Fixed — updated to v1.12.1_210cbac.
There was a problem hiding this comment.
Fixed — updated to v1.12.1_210cbac and kernel to vmlinux-6.1.158.
Document architecture naming conventions, ARM64-specific behavior, cross-architecture deployment, and TARGET_ARCH environment variable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use v1.12.1_a41d3fb (matching codebase default) instead of v1.12.1_717921c. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update FC version examples to v1.12.1_210cbac (current default) - Update busybox note: binaries committed with build tags, no fetch - Fix cross-compilation example: BUILD_ARCH instead of GOARCH Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7b0320c to
bdde198
Compare
jakubno
left a comment
There was a problem hiding this comment.
lgtm, but let's merge once we have the other PRs merged
packages/orchestrator/README.md
Outdated
| @@ -37,7 +37,7 @@ Flags: | |||
| - `-storage <path>` - Local path or `gs://bucket` (enables local mode with auto-download of kernel/FC) | |||
| - `-sandbox-dir <path>` - Override `SANDBOX_DIR` (the rootfs path baked into the snapshot) | |||
| - `-kernel <version>` - Kernel version (default: `vmlinux-6.1.102`) | |||
There was a problem hiding this comment.
| - `-kernel <version>` - Kernel version (default: `vmlinux-6.1.102`) | |
| - `-kernel <version>` - Kernel version (default: `vmlinux-6.1.158`) |
There was a problem hiding this comment.
Fixed — updated to vmlinux-6.1.158.
packages/orchestrator/README.md
Outdated
| kernels/vmlinux-6.1.102/amd64/vmlinux.bin | ||
| kernels/vmlinux-6.1.102/arm64/vmlinux.bin |
There was a problem hiding this comment.
| kernels/vmlinux-6.1.102/amd64/vmlinux.bin | |
| kernels/vmlinux-6.1.102/arm64/vmlinux.bin | |
| kernels/vmlinux-6.1.158/amd64/vmlinux.bin | |
| kernels/vmlinux-6.1.158/arm64/vmlinux.bin |
There was a problem hiding this comment.
Fixed — updated to vmlinux-6.1.158.
Update default kernel version references to match current default on main (DefaultKernelVersion = vmlinux-6.1.158). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Addressing review: updated FC version to |
Summary
TARGET_ARCHenvironment variableTest plan
🤖 Generated with Claude Code