Issue Description
opencode crashes immediately with Aborted (core dumped) on systems using 64KB page sizes (aarch64+64k). This affects modern ARM systems, including NVIDIA Grace Hopper nodes and other HPC ARM clusters.
Steps to Reproduce
- Try to run opencode on a system with an aarch64+64k kernel (64KB page size)
- Command: opencode --version
- Result: Aborted (core dumped)
System Information
Affected System:
- Kernel: Linux jpbl-s03-02 5.14.0-611.16.1.el9_7.aarch64+64k
- Page size: 65536 (64KB)
- CPU: ARM Neoverse-V2 (NVIDIA Grace architecture)
- Architecture: aarch64
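To check whether another machine matches the affected configuration, the same fields can be collected with a few standard-library calls. This is a minimal sketch; the helper name `describe_page_size` is made up for illustration and is not part of opencode.

```python
import os
import platform


def describe_page_size() -> dict:
    """Collect the kernel release, architecture, and page size of this host."""
    # SC_PAGE_SIZE is the page size in bytes as reported by the running kernel.
    page_size = os.sysconf("SC_PAGE_SIZE")
    return {
        "kernel": platform.release(),
        "machine": platform.machine(),
        "page_size": page_size,
        "page_size_kib": page_size // 1024,
    }


if __name__ == "__main__":
    for key, value in describe_page_size().items():
        print(f"{key}: {value}")
```

On the affected system the page_size field would be expected to read 65536, matching the value listed above.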
strace output shows the crash:
mmap(0x3840a000000, 1073741824, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x3840a000000
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=3669902, si_uid=8351} ---
+++ killed by SIGABRT (core dumped) +++
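The mmap call above returns successfully (a 1 GiB anonymous mapping), and the process aborts immediately afterwards. This is consistent with the binary assuming 4KB page alignment and failing on a 64KB-page system.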
Root Cause
The opencode binary appears to be compiled assuming 4KB page granularity, which is the default on most systems. However, modern ARM HPC systems (like NVIDIA Grace CPUs) use 64KB pages (aarch64+64k) for better TLB efficiency on memory-intensive workloads.
When the memory allocator in the binary (unconfirmed; possibly Rust's allocator or jemalloc) performs operations based on a 4KB page assumption, those operations can fail or abort on systems where the kernel uses a larger page size. On this system the result is the SIGABRT shown above.
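The arithmetic behind this hypothesis can be sketched as follows. The helpers `align_up` and `is_aligned` and the request size are illustrative assumptions, not code taken from opencode or its allocator; they only show why a value rounded to a 4KB boundary is generally not valid where the kernel expects 64KB alignment.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def is_aligned(value: int, alignment: int) -> bool:
    """Return True if value is a multiple of alignment (a power of two)."""
    return (value & (alignment - 1)) == 0


ASSUMED_PAGE = 4 * 1024   # page size assumed at build time (the hypothesis)
ACTUAL_PAGE = 64 * 1024   # page size on the affected aarch64+64k kernel

request = 10_000  # arbitrary illustrative size in bytes
rounded = align_up(request, ASSUMED_PAGE)

print(rounded)                            # 12288
print(is_aligned(rounded, ASSUMED_PAGE))  # True
print(is_aligned(rounded, ACTUAL_PAGE))   # False
print(align_up(request, ACTUAL_PAGE))     # 65536
```

A size or offset of 12288 bytes is page-aligned on a 4KB kernel but falls in the middle of a page on a 64KB kernel, so page-granular calls that depend on it (for example unmapping or changing protection on part of a region) can be rejected there.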
Why This Matters
This issue is critical for NVIDIA Grace Hopper nodes and all new ARM HPC systems:
- NVIDIA Grace Hopper: All Grace and Grace Hopper systems from NVIDIA use aarch64+64k kernels. This means opencode cannot run on any of these systems.
- Modern ARM HPC: Many ARM-based supercomputers (like the Jülich JUPITER cluster where this was discovered) are adopting 64KB pages for:
  - Better TLB efficiency on large-memory workloads
  - Reduced overhead for massive datasets
  - Improved performance for AI/ML workloads
- Future-proofing: As more ARM HPC hardware comes online, this class of architecture will become common in scientific computing and AI research.
- Binary compatibility: Even if a distribution provides standard aarch64 versions, HPC clusters often use custom kernels with 64K pages for performance.
Expected Behavior
opencode should run on aarch64+64k systems without crashing.
Possible Solutions
- Multi-arch builds: Provide a separate linux-aarch64-64k build alongside the standard linux-aarch64 build
- Runtime detection: The application could detect page size at startup and adapt accordingly
- Source builds: Ensure the build process can be done on 64K page systems successfully
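As a rough illustration of how the first two options could work together, an installer or launcher could pick the build from the detected page size. This is only a sketch: the artifact names are the ones proposed in this issue and do not exist yet, and `select_build` is a hypothetical helper, not part of any opencode installer.

```python
import os
import platform

# Hypothetical artifact names, taken from the proposal in this issue.
BUILD_4K = "linux-aarch64"
BUILD_64K = "linux-aarch64-64k"


def select_build(machine: str, page_size: int) -> str:
    """Pick a build name for an aarch64 host from its kernel page size."""
    if machine not in ("aarch64", "arm64"):
        raise ValueError(f"this sketch only covers aarch64, got {machine!r}")
    if page_size == 4 * 1024:
        return BUILD_4K
    if page_size == 64 * 1024:
        return BUILD_64K
    # Other page sizes (for example 16KB) are outside the scope of this sketch.
    raise ValueError(f"no build defined for page size {page_size}")


if __name__ == "__main__":
    try:
        # Query the running kernel instead of assuming a page size.
        print(select_build(platform.machine(), os.sysconf("SC_PAGE_SIZE")))
    except ValueError as exc:
        print(f"cannot select a build: {exc}")
```

The point of the design is that the page size is read from the kernel at install or start time, never inferred from the architecture name alone, since both 4KB and 64KB kernels report aarch64.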
Affected Platforms
- NVIDIA Grace CPU systems (all current and future)
- NVIDIA Grace Hopper (Grace + H100) systems
- ARM Neoverse-V2 based HPC clusters with 64KB page kernels
- Any system running an aarch64+64k Linux kernel
Additional Notes
- The application loads all system libraries correctly (glibc 2.34, no missing dependencies)
- Standard Linux tools run fine on this system
- This is purely a page-size compatibility issue