Fix Metal MPS encoder lifecycle and broaden macOS compatibility by robtaylor · Pull Request #308 · huggingface/kernels

robtaylor · 2026-03-04T13:33:41Z

Summary

Fix MPS encoder lifecycle: Template and examples create compute encoders directly via [commandBuffer computeCommandEncoder], bypassing PyTorch's MPS stream encoder management (kernel coalescing). This causes a fatal crash when any kernel is called twice in sequence: A command encoder is already encoding to this command buffer. Fixed by using stream->commandEncoder() from PyTorch's MPSStream API.
Lower Metal standard to metal3.1 (macOS 14+): All current kernel features (bfloat16_t, simd_sum, simd_shuffle, threadgroup_barrier) are available in Metal 3.1. Previous metal4.0 required macOS 26.
Multi-strategy Metal toolchain detection: Support macOS 14/15 (metal bundled in Xcode) and macOS 26 (separate Metal toolchain cryptex). Falls back to scanning /Applications/Xcode*.app when xcrun/xcode-select are unavailable in the Nix sandbox.
macOS CI matrix: Build-test on macos-14, macos-15, and macos-26-xlarge (arm64 with GPU). macOS 14 is marked best-effort (continue-on-error) due to MPS OOM on 16GB runners.
Update Metal docs: Document macOS 15+ as supported baseline, macOS 14 as best-effort.

Files changed

build2cmake/src/templates/metal/compile-metal.cmake — metal3.1 default, multi-strategy toolchain detection
builder/lib/torch-extension/arch.nix — toolchain fallback for macOS 14/15, clear SDKROOT in xcrunHost
template/__KERNEL_NAME_NORMALIZED___metal/__KERNEL_NAME_NORMALIZED__.mm — use MPSStream encoder API
builder/examples/relu/relu_metal/relu.mm — same fix
builder/examples/extra-data/relu_metal/relu.mm — same fix
.github/workflows/build_kernel_macos.yaml — macOS version matrix, sandbox=relaxed, macos-14 best-effort
docs/source/builder/metal.md — macOS version support table, updated requirements

Test plan

Build-tested on macos-14 (Xcode 15), macos-15 (Xcode 16), macos-26 (Xcode 26) via CI matrix
GPU tests pass on macos-15 and macos-26 (relu + relu-metal-cpp)
macos-14 builds succeed, test_relu_layer OOM (known, MPS memory on 16GB runner)
Verified fused-rms-norm kernel: 74/74 tests pass (was crashing on sequential calls)
Verified rotary-embedding kernel: 217/217 tests pass
Crash is 100% reproducible before fix, 0% after fix

Fixes #307

robtaylor · 2026-03-04T14:44:44Z

Hi! A couple of questions about Metal CI:

The build_kernel_macos.yaml workflow currently only builds but doesn't run tests (the comment mentions "Also run tests once we have a macOS runner"). Would it be possible to enable a macos-26-xlarge runner for this repo to get Metal GPU test coverage? The larger runners have actual Metal GPU access.
If that's not feasible, I'm planning to set up a Mac Mini M2 as a self-hosted runner for Metal kernel testing. Would that be welcome as a contribution?

For context, we've been developing Metal implementations of rotary-embedding (217/217 tests pass) and fused-rms-norm (74/74 tests pass), and discovered a crash-on-sequential-calls bug in the MPS encoder pattern used by the template and examples (this PR fixes it). Having CI that actually runs Metal kernels would catch these kinds of issues.

robtaylor · 2026-03-04T19:05:47Z

FYI: We've set up a CI mirror at ChipFlow/kernels to validate these changes on real GPU hardware (GitHub-hosted macos-*-xlarge ARM64 runners with Metal GPU access).

Latest CI run (all green): https://github.com/ChipFlow/kernels/actions/runs/22684099194

The stacked branch at ChipFlow/kernels#2 includes this PR's patches plus additional fixes (relu-metal-cpp example). Results:

Runner	relu build	relu test	relu-metal-cpp build	relu-metal-cpp test
macos-26-xlarge	✅	✅	✅	✅
macos-15-xlarge	✅	✅	✅	✅
macos-14-xlarge	✅	⚠️ OOM (best-effort)	-	-

macOS 14 is marked best-effort via continue-on-error — builds succeed but test_relu_layer OOMs on runners with 16GB unified memory.

The upstream kernel-builder doesn't yet have multi-strategy Metal toolchain detection for macOS 14/15. Use our ChipFlow fork's metal-stack branch which includes the fix (PR huggingface/kernels#308). Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)

danieldk · 2026-03-05T16:47:10Z

Nice work! One question, isn't Metal 4 required for the new Neural Accelerators of M5 GPUs?

robtaylor · 2026-03-05T21:55:23Z

Thanks @danieldk!

Good question. Metal 4.0 (-std=metal4.0 → air64_v28) is indeed needed for M5 Neural Engine / ANE features, but our current kernels only target the GPU compute pipeline — they don't use Neural Accelerator APIs.

We default to metal3.1 (air64_v26) for broad compatibility:

metal3.1 → macOS 14+ (M1 and later)
metal3.2 → macOS 15+
metal4.0 → macOS 26+ only

Individual kernels can override via the metal_std_version field in build.toml if they need Metal 4 features. So this is a safe default that doesn't prevent anyone from targeting Metal 4 when needed.

When ANE support becomes relevant (e.g., for matrix multiplication offload on M5), we'd add a metal4.0 kernel variant alongside the existing GPU compute path.

danieldk · 2026-03-09T10:46:56Z

Even though they are not wired up yet, we already have Neural Accelerator code in https://github.com/huggingface/kernels-community/tree/830d15e09d865f5ef43b3873fd98f82442f7095c/mlx-quantization-metal-kernels/quantization_mlx . Also, I think it is probably not a good idea to downgrade to Metal < 4 by default, since we have shipped with Metal 4 as the default in releases, so I think if we add back Metal < 4 support it should be opt-in from build.toml.

Another thing I'm a bit worried about is the UX. For CUDA/XPU/ROCm versions, we encode the required library version in the build variant. We should probably have a similar mechanism for macOS to ensure that a version is downloaded that is compatible with the system's Metal (e.g. to avoid loading a Metal 4 kernel on a system that does not support it).

That said, I am not really sure it is worth it, Apple pushes people pretty hard to move to new releases, so I am not really sure if it is worth complicating beyond the current policy (where we support the previous macOS version for a few months).

What do you think @drbh ?

robtaylor · 2026-03-09T14:47:52Z

it's a fair point and worth considering. I'll definitely take an action to check on the UX - My understanding is that linking Metal 3.1 should work for all Metal > 3.1 (indeed, that's [being tested in CI now]( https://github.com/ChipFlow/kernels/actions/runs/22684099194/job/65762236187) ). But yes, it'll get a bit complicated if we're also landing Neural Accelerator code. I'll note that telemetry data shows there's still a sizable number of folk still on macos 14 still (!) though this PR doesn't work on macos 14 atm due to the MPS encoder coalescing behavior that that #316 relies on wasn't fully stabilized until macOS 15. https://telemetrydeck.com/survey/apple/macOS/versions/ Any chance we can get macos-xlarge enabled for this repo? Being able to run CI properly will help with our overall confidence!

…

On Mon, 9 Mar 2026 at 10:47, Daniël de Kok ***@***.***> wrote: *danieldk* left a comment (huggingface/kernels#308) <#308 (comment)> Even though they are not wired up yet, we already have Neural Accelerator code in https://github.com/huggingface/kernels-community/tree/830d15e09d865f5ef43b3873fd98f82442f7095c/mlx-quantization-metal-kernels/quantization_mlx . Also, I think it is probably not a good idea to downgrade to Metal < 4 by default, since we have shipped with Metal 4 as the default in releases, so I think if we add back Metal < 4 support it should be opt-in from build.toml. Another thing I'm a bit worried about is the UX. For CUDA/XPU/ROCm versions, we encode the required library version in the build variant. We should probably have a similar mechanism for macOS to ensure that a version is downloaded that is compatible with the system's Metal. That said, I am not really sure it is worth it, Apple pushes people pretty hard to move to new releases, so I am not really sure if it is worth complicating beyond the current policy (where we support the previous macOS version for a few months). What do you think @drbh <https://github.com/drbh> ? — Reply to this email directly, view it on GitHub <#308 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAB4O4HDPTAET6PI6LWWHOD4P2OLNAVCNFSM6AAAAACWGVO5RCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHM2DAMRSHA2DKMRQGA> . You are receiving this because you authored the thread.Message ID: ***@***.***>

Use stream->commandEncoder() instead of creating encoders directly via [cmdBuf computeCommandEncoder] to properly integrate with PyTorch's MPS stream encoder lifecycle management (kernel coalescing). Direct encoder creation bypasses the stream's internal _commandEncoder state and crashes on sequential kernel dispatches. Lower the default Metal standard from metal3.2 (macOS 15+) to metal3.1 (macOS 14+) since all current kernel features (bfloat16_t, simd_sum, simd_shuffle, threadgroup_barrier) are available in Metal 3.1. Add multi-strategy Metal toolchain detection for macOS 14+: - Separate Metal toolchain component (macOS 26+ cryptex mount) - xcrun/xcode-select based detection - Direct /Applications/Xcode*.app filesystem scan fallback Also clear SDKROOT in xcrunHost to prevent Nix-set SDK paths from interfering with system xcrun. Fixes: huggingface#307 Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)

Test Metal kernel builds across multiple macOS versions to verify compatibility with the metal3.1 standard (macOS 14+). Use sandbox=relaxed for Nix to support __noChroot builds that access the host Metal toolchain. The separate Metal toolchain download is only needed on macOS 26+. Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)

macOS 14 builds succeed but MPS tests may OOM on runners with limited unified memory. Use continue-on-error so macos-14 failures don't block the workflow. Update Metal docs to reflect macOS 15+ as the supported baseline with macOS 14 best-effort. Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)

danieldk · 2026-03-10T09:59:04Z

I'll note that telemetry data shows there's still a sizable number of folk still on macos 14 still (!)

Unless I'm looking at the stats wrong, it only seems to be around 3%?

Any chance we can get macos-xlarge enabled for this repo?

I think xlarge should work with our subscription.

More in general, we discussed supporting macOS multiple versions in our weekly sync yesterday. It is certainly something we would like to support, especially to accommodate the few months where the majority of users has not moved from macOS N to N+1 yet. The last cycle we did this by holding off the N + 1 requirement for some months, but that is not ideal, because it does not allow specific kernels to use features from N + 1.

However, since this requires that we change the macOS build variant format to encode the Metal version, which is a large, impactful change, we think it makes most sense to time this with the macOS 27 release and not change this mid-cycle.

robtaylor · 2026-03-10T17:24:10Z

Unless I'm looking at the stats wrong, it only seems to be around 3%?

You're right about 14.x — I went back and checked the raw data and realised I was looking at numbers from early 2025 when it was 14-16%. It's ~2.8% now.

However, macOS 15.x is still at 32.3% as of late February. And 15.x users are notably stickier than 14.x was — 14.x was already down to ~8% at the same point after its successor launched. At current decay rate, 15.x won't reach ~3% until around April 2027. So the multi-version story is really about macOS 15 vs 26, not 14.

I think xlarge should work with our subscription.

That would be great! Happy to help set up the workflow if useful.

we think it makes most sense to time this with the macOS 27 release and not change this mid-cycle.

Agreed, that's the right timing. By macOS 27 launch, 15.x will likely still be at 10-15%, so the build variant format change would land exactly when it matters. The metal-std-version field gives kernel authors the escape hatch in the meantime.

robtaylor marked this pull request as draft March 4, 2026 13:39

robtaylor marked this pull request as ready for review March 4, 2026 14:48

robtaylor marked this pull request as draft March 4, 2026 14:50

robtaylor force-pushed the fix-metal-mps-lifecycle branch 10 times, most recently from ee0c112 to 33406e1 Compare March 4, 2026 16:54

robtaylor changed the title ~~Fix Metal standard version and MPS encoder lifecycle~~ Fix Metal MPS encoder lifecycle and broaden macOS compatibility Mar 4, 2026

robtaylor marked this pull request as ready for review March 4, 2026 16:55

robtaylor mentioned this pull request Mar 4, 2026

Fix Metal MPS encoder lifecycle and broaden macOS compatibility ChipFlow/kernels#1

Closed

robtaylor force-pushed the fix-metal-mps-lifecycle branch from 33406e1 to 3ae5790 Compare March 4, 2026 17:06

robtaylor mentioned this pull request Mar 4, 2026

Metal: stacked patches for MPS lifecycle, CI, and relu-metal-cpp fix ChipFlow/kernels#2

Open

robtaylor mentioned this pull request Mar 9, 2026

Fix MPS encoder lifecycle in relu-metal-cpp example #316

Draft

2 tasks

robtaylor added 3 commits March 9, 2026 14:50

robtaylor force-pushed the fix-metal-mps-lifecycle branch from 0913ee4 to c810460 Compare March 9, 2026 14:51

robtaylor mentioned this pull request Mar 9, 2026

[Platform] Add MPS (Apple Metal) platform support for macOS vllm-project/vllm#36523

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Metal MPS encoder lifecycle and broaden macOS compatibility#308

Fix Metal MPS encoder lifecycle and broaden macOS compatibility#308
robtaylor wants to merge 3 commits intohuggingface:mainfrom
robtaylor:fix-metal-mps-lifecycle

robtaylor commented Mar 4, 2026 •

edited

Loading

Uh oh!

robtaylor commented Mar 4, 2026 •

edited

Loading

Uh oh!

robtaylor commented Mar 4, 2026

Uh oh!

danieldk commented Mar 5, 2026

Uh oh!

robtaylor commented Mar 5, 2026

Uh oh!

danieldk commented Mar 9, 2026 •

edited

Loading

Uh oh!

robtaylor commented Mar 9, 2026 via email •

edited

Loading

Uh oh!

danieldk commented Mar 10, 2026

Uh oh!

robtaylor commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

robtaylor commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Files changed

Test plan

Uh oh!

robtaylor commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

robtaylor commented Mar 4, 2026

Uh oh!

danieldk commented Mar 5, 2026

Uh oh!

robtaylor commented Mar 5, 2026

Uh oh!

danieldk commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

robtaylor commented Mar 9, 2026 via email • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

danieldk commented Mar 10, 2026

Uh oh!

robtaylor commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

robtaylor commented Mar 4, 2026 •

edited

Loading

robtaylor commented Mar 4, 2026 •

edited

Loading

danieldk commented Mar 9, 2026 •

edited

Loading

robtaylor commented Mar 9, 2026 via email •

edited

Loading