Add metal-std-version field and kernels-test-utils package #331

Draft
robtaylor wants to merge 3 commits into huggingface:main from robtaylor:metal-std-version

@robtaylor

Summary

  • metal-std-version: Allow per-kernel Metal standard version configuration via build.toml, following the pattern of cuda-flags, hip-flags, and sycl-flags. The default remains metal4.0. Kernels needing broader macOS compatibility can set metal-std-version = "metal3.1" (macOS 14+) or "metal3.2" (macOS 15+). AIR versions are forward-compatible, so metal3.1 kernels run on Metal 4 hardware.

  • kernels-test-utils: Shared Python test utilities package that consolidates duplicated device detection, tolerance tables, and allclose helpers across all kernel repos. Auto-injected into all kernel dev/test shells via the default pythonCheckInputs in genKernelFlakeOutputs. Downstream repos (rotary-embedding, fused-rms-norm, dequant-int4, dequant-gguf) already use it.

Changes

metal-std-version

  • Add metal_std_version field to Kernel::Metal in config structs (v2, v3, mod)
  • Pass field through Jinja template context to generated CMake
  • Accept METAL_STD_VERSION in metal_kernel_component() and propagate to compile_metal_shaders() via parent scope
  • Default to metal4.0 in compile-metal.cmake when not specified
  • Set metal-std-version = "metal3.1" in relu-metal-cpp example
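
The default handling described above can be sketched in a few lines of Python. This is only an illustration of the intended behaviour, not the build2cmake or CMake code: the function names, the `MIN_MACOS` table, and the shape of the parsed kernel table are assumptions. The macOS minimums are the ones quoted in this description; the requirement for metal4.0 is not stated here, so it is left unset.

```python
from typing import Optional

# Default stated in this PR; applied when build.toml omits the field.
DEFAULT_METAL_STD = "metal4.0"

# Minimum macOS major versions as quoted in the PR description.
# The metal4.0 requirement is not given there, so it stays None.
MIN_MACOS = {"metal3.1": 14, "metal3.2": 15, "metal4.0": None}


def resolve_metal_std_version(kernel_config: dict) -> str:
    """Return the configured Metal standard, falling back to the default."""
    return kernel_config.get("metal-std-version", DEFAULT_METAL_STD)


def min_macos(version: str) -> Optional[int]:
    """Look up the minimum macOS major version, if this sketch knows it."""
    return MIN_MACOS.get(version)


# Hypothetical kernel tables, as they might look after parsing build.toml.
relu = {"backend": "metal", "metal-std-version": "metal3.1"}
unset = {"backend": "metal"}  # field omitted, so the default applies

print(resolve_metal_std_version(relu), min_macos(resolve_metal_std_version(relu)))
print(resolve_metal_std_version(unset))
```

Keeping the fallback in one place mirrors the change list: the field is optional in the config, and only the final compile step needs to know the default.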

kernels-test-utils

  • New kernels-test-utils/ package: device.py, tolerances.py, allclose.py
  • Nix derivation at nix/pkgs/python-modules/kernels-test-utils/
  • Registered in nix/overlay.nix
  • Default pythonCheckInputs in flake.nix auto-injects into all downstream shells
  • Updated template test to use kernels_test_utils imports
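
For readers unfamiliar with the kind of helper being consolidated, the following is a rough sketch of device detection with a CPU fallback. It is not the contents of device.py: the return type (plain strings), the probing order, and the behaviour when PyTorch is missing are all assumptions made for illustration.

```python
def get_available_devices() -> list:
    """List usable device names in preference order, always ending with "cpu"."""
    try:
        import torch  # optional here so the sketch also runs without PyTorch
    except ImportError:
        return ["cpu"]

    devices = []
    if torch.cuda.is_available():
        devices.append("cuda")
    # torch.backends.mps only exists on builds with Metal support compiled in.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        devices.append("mps")
    devices.append("cpu")
    return devices


def get_device() -> str:
    """Return the most preferred available device."""
    return get_available_devices()[0]


print(get_device())
```

Sharing one such function across kernel repos means a change in probing order, or support for a new backend, only has to be made once.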

Test plan

  • nix build .#relu-metal-cpp passes with metal-std-version = "metal3.1"
  • nix develop ./builder/examples/relu-metal-cpp followed by python -c "from kernels_test_utils import get_device; print(get_device())" works
  • Downstream repos (rotary-embedding 217/217, fused-rms-norm 74/74, dequant-int4 8/8, dequant-gguf 12/12) all pass with updated flake.lock pointing to these changes
  • CI (Rust build2cmake tests, nix builds)

Co-developed-by: Claude Code v2.1.58 (claude-opus-4-6)

Allow per-kernel Metal standard version configuration via build.toml,
following the pattern of cuda-flags, hip-flags, and sycl-flags.

The default remains metal4.0 (upstream's current value). Kernels that
need broader macOS compatibility can set metal-std-version = "metal3.1"
(macOS 14+) or "metal3.2" (macOS 15+). AIR versions are forward-
compatible, so metal3.1 kernels run on Metal 4 hardware.

Changes:
- Add metal_std_version field to Kernel::Metal in config structs (v2, v3, mod)
- Pass field through Jinja template context to generated CMake
- Accept METAL_STD_VERSION in metal_kernel_component() and propagate
  to compile_metal_shaders() via parent scope
- Default to metal4.0 in compile-metal.cmake when not specified
- Set metal-std-version = "metal3.1" in relu-metal-cpp example for
  broad macOS 14+ compatibility

Co-developed-by: Claude Code v2.1.58 (claude-opus-4-6)

Create a shared test utilities package that consolidates duplicated
device detection, tolerance tables, and allclose helpers across all
kernel repos. The package is automatically available in all kernel
dev/test shells via the default pythonCheckInputs.

Modules:
- device: get_device(), get_available_devices(), skip_if_no_gpu()
- tolerances: DEFAULT_TOLERANCES dict, get_tolerances(dtype)
- allclose: fp8_allclose() with MPS float64 workaround
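
As a rough picture of what a per-dtype tolerance table buys, here is a dependency-free sketch. The tolerance values, the use of dtype names as string keys, and the `allclose` helper are hypothetical; the real modules presumably work on torch dtypes and tensors, and the MPS float64 workaround mentioned above is not modelled here.

```python
# Hypothetical tolerances keyed by dtype name; the real table may differ.
DEFAULT_TOLERANCES = {
    "float32": {"rtol": 1e-5, "atol": 1e-5},
    "float16": {"rtol": 1e-3, "atol": 1e-3},
    "bfloat16": {"rtol": 1e-2, "atol": 1e-2},
}


def get_tolerances(dtype: str) -> dict:
    """Return the rtol/atol pair registered for a dtype name."""
    return DEFAULT_TOLERANCES[dtype]


def allclose(actual, expected, dtype: str) -> bool:
    """Element-wise check using |a - e| <= atol + rtol * |e|."""
    tol = get_tolerances(dtype)
    return all(
        abs(a - e) <= tol["atol"] + tol["rtol"] * abs(e)
        for a, e in zip(actual, expected)
    )


# The same error passes at half precision but fails at single precision.
print(allclose([1.0005, 2.0], [1.0, 2.0], "float16"))
print(allclose([1.0005, 2.0], [1.0, 2.0], "float32"))
```

A single shared table keeps the pass/fail threshold for a given dtype consistent across kernel repos instead of being re-tuned in each test suite.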

Wired into nix overlay and set as default pythonCheckInputs in
genKernelFlakeOutputs so downstream repos get it automatically.
Updated template test to use kernels_test_utils imports.

Co-developed-by: Claude Code v2.1.58 (claude-opus-4-6)