[Bug] Metal backend fails with GGML_OP_DIAG_MASK_INF - op not implemented for Metal

### Git commit

- stable-diffusion.cpp version: `master-387-e4c50f1`
- ggml submodule version: `55bc9320a4aae82af18e23eefd5de319a755d7b9` (Nov 24, 2025)



### Operating System & Version

macOS 15.7

### GGML backends

Metal

### Command-line arguments used

`sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1`

### Steps to reproduce

1. Build stable-diffusion.cpp with Metal enabled:

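```sh
git clone --branch master-390-edf2cb3 --depth=1 --recursive --shallow-submodules https://github.com/leejet/stable-diffusion.cpp.git
# git clone creates a directory named after the repository, including ".cpp"
cd stable-diffusion.cpp
mkdir build
cd build
cmake .. -DSD_METAL=ON
cmake --build . --config Release
```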

2. Run image generation:

```sh
% ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1
[INFO ] ggml_extend.hpp:69   - ggml_metal_library_init: using embedded metal library
[INFO ] ggml_extend.hpp:69   - ggml_metal_library_init: loaded in 0.012 sec
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU name:   Apple M1
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU family: MTLGPUFamilyApple7  (1007)
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: simdgroup reduction   = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: simdgroup matrix mul. = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: has unified memory    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: has bfloat            = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: use residency sets    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: use shared buffers    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: recommendedMaxWorkingSetSize  = 11453.25 MB
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: allocating
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: found device: Apple M1
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: picking default device: Apple M1
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use bfloat         = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use fusion         = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use concurrency    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use graph optimize = true
[INFO ] stable-diffusion.cpp:227  - loading model from 'sd_xl_turbo_1.0.q8_0.gguf'
[INFO ] model.cpp:382  - load sd_xl_turbo_1.0.q8_0.gguf using gguf format
[INFO ] stable-diffusion.cpp:318  - Version: SDXL
[INFO ] stable-diffusion.cpp:346  - Weight type stat:                      f16: 150  |    q8_0: 2491
[INFO ] stable-diffusion.cpp:347  - Conditioner weight type stat:         q8_0: 713
[INFO ] stable-diffusion.cpp:348  - Diffusion model weight type stat:      f16: 74   |    q8_0: 1606
[INFO ] stable-diffusion.cpp:349  - VAE weight type stat:                  f16: 76   |    q8_0: 172
[WARN ] stable-diffusion.cpp:591  - No VAE specified with --vae or --force-sdxl-vae-conv-scale flag set, using Conv2D scale 0.031
  |==================================================| 2641/2641 - 461.39it/s
[INFO ] model.cpp:1594 - loading tensors completed, taking 5.73s (process: 0.00s, read: 5.44s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.08s)
[INFO ] stable-diffusion.cpp:782  - total params memory size = 3855.08MB (VRAM 3855.08MB, RAM 0.00MB): text_encoders 835.45MB(VRAM), diffusion_model 2925.17MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:896  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:3169 - sampling using Euler A method
[INFO ] denoiser.hpp:364  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3282 - TXT2IMG
[INFO ] stable-diffusion.cpp:1167 - apply at runtime
[ERROR] ggml_extend.hpp:75   - ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'
~/projects/tmp/stable-diffusion.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:201: unsupported op
Don't know how to attach.  Try "help target".
No stack.
The program is not being run.
zsh: abort      ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1
```

With debug output added:

`METAL UNSUPPORTED OP: DIAG_MASK_INF (op=44)`

### What you expected to happen

Image generation completes on the Metal backend without an error.

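### What actually happened

The process aborts with `ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'` before sampling starts.

### Description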

When stable-diffusion.cpp is built with the Metal backend enabled (`-DSD_METAL=ON`) on macOS, image generation aborts with an "unsupported op" error. The operation `GGML_OP_DIAG_MASK_INF` (op=44) is defined in ggml but has no Metal implementation.

### Root Cause

`GGML_OP_DIAG_MASK_INF` is defined in ggml (`include/ggml.h`) but is not implemented in the Metal backend. In `src/ggml-metal/ggml-metal-device.m`, the `ggml_metal_device_supports_op()` function has no case for this op, so it falls through to `default: return false;`.

This can be verified in [`src/ggml-metal/ggml-metal-device.m`](https://github.com/ggml-org/ggml/blob/master/src/ggml-metal/ggml-metal-device.m): there is no `case GGML_OP_DIAG_MASK_INF:` label in the `supports_op` switch statement.
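The source check described above can be scripted. The sketch below is a hypothetical helper (`metal_has_op_case` is not part of ggml or stable-diffusion.cpp). It assumes case labels are spelled `case GGML_OP_<NAME>:` and it searches the whole file rather than only the body of `ggml_metal_device_supports_op()`, so a matching label in some other switch would also be reported as `listed`.

```sh
# Hypothetical helper: report whether a source file contains a
# `case <OP>:` label for the given ggml op.
# Assumption: labels are written as `case GGML_OP_<NAME>:`. The whole file is
# searched, not just the supports_op switch.
metal_has_op_case() {
    file="$1"  # e.g. ggml/src/ggml-metal/ggml-metal-device.m
    op="$2"    # e.g. GGML_OP_DIAG_MASK_INF

    # Fail loudly instead of reporting "missing" for a wrong path.
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 2; }

    # Require a colon right after the op name so GGML_OP_DIAG does not
    # match GGML_OP_DIAG_MASK_INF.
    if grep -Eq "case[[:space:]]+${op}[[:space:]]*:" "$file"; then
        echo "listed"
    else
        echo "missing"
    fi
}
```

Run it from the root of the stable-diffusion.cpp checkout, where ggml is the `ggml/` submodule, for example `metal_has_op_case ggml/src/ggml-metal/ggml-metal-device.m GGML_OP_DIAG_MASK_INF`. A grep is only a quick source-level check; what `ggml_metal_device_supports_op()` returns at runtime is the authoritative answer.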

### Impact

This prevents using Metal acceleration with SD models that use attention masking (like SDXL Turbo).

### Workaround

Disable Metal backend when building:

`cmake .. -DSD_METAL=OFF`

This falls back to CPU execution, which works correctly but is significantly slower.

### Suggested Fix

This issue should likely be reported upstream to https://github.com/ggml-org/ggml so that a Metal kernel for `GGML_OP_DIAG_MASK_INF` can be added. Alternatively, stable-diffusion.cpp could:

1. Check whether the model requires unsupported ops before attempting Metal execution
2. Fall back gracefully to CPU for unsupported operations
3. Document which models and features are compatible with the Metal backend
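Until an in-process fallback like item 2 exists, a coarser, process-level fallback can be scripted on the user side. The sketch below is a hypothetical wrapper, not part of stable-diffusion.cpp: `run_sd_with_fallback`, `SD_METAL_BIN`, `SD_CPU_BIN`, and the `build-cpu` directory are assumptions. It presumes two separate builds exist (one with `-DSD_METAL=ON`, one with `-DSD_METAL=OFF`) and reruns the same arguments on the CPU build only when the Metal run exits non-zero and its output contains `unsupported op`.

```sh
# Hypothetical wrapper: process-level stand-in for "fall back to CPU".
# Both binary paths are placeholders for two separate builds.
SD_METAL_BIN="${SD_METAL_BIN:-./build/bin/sd}"      # assumed -DSD_METAL=ON build
SD_CPU_BIN="${SD_CPU_BIN:-./build-cpu/bin/sd}"      # assumed -DSD_METAL=OFF build

run_sd_with_fallback() {
    log=$(mktemp)
    # Run the Metal build, capturing stdout and stderr so the error can be inspected.
    "$SD_METAL_BIN" "$@" >"$log" 2>&1
    rc=$?
    cat "$log"
    if [ "$rc" -ne 0 ] && grep -q "unsupported op" "$log"; then
        rm -f "$log"
        echo "Metal backend reported an unsupported op; retrying on CPU" >&2
        # Same arguments, CPU-only binary.
        "$SD_CPU_BIN" "$@"
        return $?
    fi
    rm -f "$log"
    return "$rc"
}
```

Usage: `run_sd_with_fallback -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1`. Because the Metal run's output is captured, its log and progress bar only appear after that run ends, and a fallback repeats model loading. The variable is named `rc` rather than `status` because `status` is read-only in zsh.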

---
Note: This is ultimately a ggml issue rather than stable-diffusion.cpp specifically. You may want to file this upstream at <https://github.com/ggml-org/ggml/issues> as well.

### Logs / error messages / stack trace

```
[ERROR] ggml_extend.hpp:75   - ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'
~/projects/tmp/stable-diffusion.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:201: unsupported op
Don't know how to attach.  Try "help target".
No stack.
The program is not being run.
zsh: abort      ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1
```

With debug output added:

`METAL UNSUPPORTED OP: DIAG_MASK_INF (op=44)`

### Additional context / environment details

- OS: macOS 15.7 (Apple Silicon M1)
- stable-diffusion.cpp version: `master-387-e4c50f1`
- ggml submodule version: `55bc9320a4aae82af18e23eefd5de319a755d7b9` (Nov 24, 2025)
- Model: `sd_xl_turbo_1.0.q8_0.gguf` (SDXL Turbo)

[Bug] Metal backend fails with GGML_OP_DIAG_MASK_INF - op not implemented for Metal #1040
