[Bug] Metal backend fails with GGML_OP_DIAG_MASK_INF - op not implemented for Metal #1040

@shakfu

Git commit

  • stable-diffusion.cpp version: master-387-e4c50f1
  • ggml submodule version: 55bc9320a4aae82af18e23eefd5de319a755d7b9 (Nov 24, 2025)

Operating System & Version

macOS 15.7

GGML backends

Metal

Command-line arguments used

sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1

Steps to reproduce

  1. Build stable-diffusion.cpp with Metal enabled:
git clone --branch master-390-edf2cb3 --depth=1 --recursive --shallow-submodules https://github.com/leejet/stable-diffusion.cpp.git
cd stable-diffusion.cpp
mkdir build
cd build
cmake .. -DSD_METAL=ON
cmake --build . --config Release
  2. Run image generation:
% ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1
[INFO ] ggml_extend.hpp:69   - ggml_metal_library_init: using embedded metal library
[INFO ] ggml_extend.hpp:69   - ggml_metal_library_init: loaded in 0.012 sec
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU name:   Apple M1
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU family: MTLGPUFamilyApple7  (1007)
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: simdgroup reduction   = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: simdgroup matrix mul. = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: has unified memory    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: has bfloat            = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: use residency sets    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: use shared buffers    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_device_init: recommendedMaxWorkingSetSize  = 11453.25 MB
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: allocating
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: found device: Apple M1
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: picking default device: Apple M1
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use bfloat         = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use fusion         = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use concurrency    = true
[INFO ] ggml_extend.hpp:69   - ggml_metal_init: use graph optimize = true
[INFO ] stable-diffusion.cpp:227  - loading model from 'sd_xl_turbo_1.0.q8_0.gguf'
[INFO ] model.cpp:382  - load sd_xl_turbo_1.0.q8_0.gguf using gguf format
[INFO ] stable-diffusion.cpp:318  - Version: SDXL
[INFO ] stable-diffusion.cpp:346  - Weight type stat:                      f16: 150  |    q8_0: 2491
[INFO ] stable-diffusion.cpp:347  - Conditioner weight type stat:         q8_0: 713
[INFO ] stable-diffusion.cpp:348  - Diffusion model weight type stat:      f16: 74   |    q8_0: 1606
[INFO ] stable-diffusion.cpp:349  - VAE weight type stat:                  f16: 76   |    q8_0: 172
[WARN ] stable-diffusion.cpp:591  - No VAE specified with --vae or --force-sdxl-vae-conv-scale flag set, using Conv2D scale 0.031
  |==================================================| 2641/2641 - 461.39it/s
[INFO ] model.cpp:1594 - loading tensors completed, taking 5.73s (process: 0.00s, read: 5.44s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.08s)
[INFO ] stable-diffusion.cpp:782  - total params memory size = 3855.08MB (VRAM 3855.08MB, RAM 0.00MB): text_encoders 835.45MB(VRAM), diffusion_model 2925.17MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:896  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:3169 - sampling using Euler A method
[INFO ] denoiser.hpp:364  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3282 - TXT2IMG
[INFO ] stable-diffusion.cpp:1167 - apply at runtime
[ERROR] ggml_extend.hpp:75   - ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'
~/projects/tmp/stable-diffusion.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:201: unsupported op
Don't know how to attach.  Try "help target".
No stack.
The program is not being run.
zsh: abort      ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1

With debug output added:

METAL UNSUPPORTED OP: DIAG_MASK_INF (op=44)

What you expected to happen

Image generation should complete without an error.

What actually happened

Description

When building stable-diffusion.cpp with Metal backend enabled (-DSD_METAL=ON) on macOS, image generation fails with an "unsupported op" error. The operation GGML_OP_DIAG_MASK_INF (op=44) is defined in ggml but not implemented in the Metal backend.

Root Cause

GGML_OP_DIAG_MASK_INF is defined in ggml (include/ggml.h) but is not implemented in the Metal backend. In src/ggml-metal/ggml-metal-device.m, the ggml_metal_device_supports_op() function has no case for this op, so it falls through to default: return false.

This can be verified by checking https://github.com/ggml-org/ggml/blob/master/src/ggml-metal/ggml-metal-device.m: there is no case GGML_OP_DIAG_MASK_INF: in the supports_op switch statement.
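
To make the fall-through concrete, here is a minimal toy model of such a capability switch. It is only a sketch: the `Op` enum and `toy_metal_supports_op` are hypothetical stand-ins invented for illustration, not the actual ggml enum or the real body of ggml_metal_device_supports_op().

```cpp
// Hypothetical stand-ins for a handful of ggml op ids (assumption: the real
// enum lives in include/ggml.h and is much larger).
enum class Op { Add, MulMat, SoftMax, DiagMaskInf };

// Toy capability check: every op the backend implements gets an explicit
// case; anything without one falls through to `default` and is reported as
// unsupported.
bool toy_metal_supports_op(Op op) {
    switch (op) {
        case Op::Add:
        case Op::MulMat:
        case Op::SoftMax:
            return true;
        default:
            return false;  // Op::DiagMaskInf ends up here
    }
}
```

With this shape, adding support means both adding a case here and providing a kernel for the encode step; a missing case alone is enough to make the op unsupported.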

Impact

This prevents using Metal acceleration with SD models that use attention masking (like SDXL Turbo).
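
For context, the following is a small CPU-side sketch of what a causal "diag mask inf" computes, assuming the usual semantics: in row r, every column c with c > n_past + r is set to negative infinity so that a later softmax gives it zero weight. The function name and signature are my own for illustration and are not ggml's API.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Reference sketch (assumed semantics, not ggml's implementation):
// `scores` is a row-major rows x cols matrix of attention scores.
// In row r, every column c with c > n_past + r is overwritten with -inf;
// all other entries are left untouched.
void diag_mask_inf(std::vector<float>& scores,
                   std::size_t rows,
                   std::size_t cols,
                   std::size_t n_past) {
    assert(scores.size() == rows * cols);
    const float neg_inf = -std::numeric_limits<float>::infinity();
    for (std::size_t r = 0; r < rows; ++r) {
        // First masked column in this row is n_past + r + 1.
        for (std::size_t c = n_past + r + 1; c < cols; ++c) {
            scores[r * cols + c] = neg_inf;
        }
    }
}
```

On a 3x3 matrix with n_past = 0 this masks the strict upper triangle and keeps the diagonal and everything below it.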

Workaround

Disable Metal backend when building:

cmake .. -DSD_METAL=OFF

This falls back to CPU execution, which works correctly but is significantly slower.

Suggested Fix

This issue should probably be reported upstream to https://github.com/ggml-org/ggml so that a Metal kernel for GGML_OP_DIAG_MASK_INF can be added. Alternatively, stable-diffusion.cpp could:

  1. Check if the model requires unsupported ops before attempting Metal execution
  2. Fall back gracefully to CPU for unsupported operations
  3. Document which models/features are compatible with Metal backend
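
Options 1 and 2 could look roughly like the sketch below. Everything in it is hypothetical and simplified for illustration: `Node`, `Backend`, `accelerated_supports`, `find_unsupported`, and `plan_backends` are made-up names, and the capability table is hard-coded. As far as I know, the real query in ggml is ggml_backend_supports_op(), so an actual implementation would ask the backend instead of comparing strings.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified graph node that only carries its op name.
struct Node {
    std::string op;
};

enum class Backend { Accelerated, Cpu };

// Hypothetical capability table for the accelerated backend
// (assumption: only DIAG_MASK_INF is missing).
bool accelerated_supports(const std::string& op) {
    return op != "DIAG_MASK_INF";
}

// Option 1: scan the graph up front and collect ops the accelerated
// backend cannot run, so the caller can report them before starting.
std::vector<std::string> find_unsupported(const std::vector<Node>& graph) {
    std::vector<std::string> missing;
    for (const Node& n : graph) {
        if (!accelerated_supports(n.op)) {
            missing.push_back(n.op);
        }
    }
    return missing;
}

// Option 2: assign each node to a backend, falling back to the CPU only
// for the nodes the accelerated backend cannot run.
std::vector<Backend> plan_backends(const std::vector<Node>& graph) {
    std::vector<Backend> plan;
    plan.reserve(graph.size());
    for (const Node& n : graph) {
        plan.push_back(accelerated_supports(n.op) ? Backend::Accelerated
                                                  : Backend::Cpu);
    }
    return plan;
}
```

For a graph of MUL_MAT, DIAG_MASK_INF, SOFT_MAX, the first function reports only DIAG_MASK_INF and the second keeps the other two nodes on the accelerated backend. A per-node fallback costs extra copies between backends, but it would avoid aborting the whole run.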

Note: This is ultimately a ggml issue rather than a stable-diffusion.cpp one, so it may also be worth filing upstream at https://github.com/ggml-org/ggml/issues.

Logs / error messages / stack trace

[ERROR] ggml_extend.hpp:75 - ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'
~/projects/tmp/stable-diffusion.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:201: unsupported op
Don't know how to attach. Try "help target".
No stack.
The program is not being run.
zsh: abort ./sd -m sd_xl_turbo_1.0.q8_0.gguf -p "blue house" --steps 1


With debug output added:

METAL UNSUPPORTED OP: DIAG_MASK_INF (op=44)

Additional context / environment details

  • OS: macOS (Apple Silicon M1)
  • stable-diffusion.cpp version: master-387-e4c50f1
  • ggml submodule version: 55bc9320a4aae82af18e23eefd5de319a755d7b9 (Nov 24, 2025)
  • Model: sd_xl_turbo_1.0.q8_0.gguf (SDXL Turbo)

Labels: bug (Something isn't working)
