Summary
On Intel Iris Xe (Raptor Lake-P) with the Mesa anv Vulkan driver, the generic (non-flash) attention path in the Vulkan backend produces structured noise instead of correct output. Both models tested (SD 1.5 and Wan 2.2 TI2V-5B Turbo) produce characteristic horizontal stripes or color bands when --diffusion-fa is omitted. Adding --diffusion-fa to the otherwise identical config yields correct, prompt-accurate output.
This is the same symptom as discussion #1243, where Green-Sky wrote "Using --diffusion-fa with ROCm is absolutely necessary to get viable, non-scrambled output." It appears Mesa anv on Intel Iris Xe has the same issue.
May be related to #748 (Vulkan blank image) and #1031 (ZImage + Vulkan blank image), which could both be downstream symptoms of a broken generic Vulkan attention kernel.
Environment
- GPU: Intel Iris Xe Graphics (Raptor Lake-P), PCI 8086:a7a0
- Driver: Mesa anv 25.2.8, Vulkan 1.4.318
- OS: Ubuntu 24.04.1 LTS, kernel 6.8.0-49-generic
- CPU: Intel Core i9-13900HK
- RAM: 62 GiB
- sd-cli build: master-540-f16a110-3-g6e5fa00+ (branch wan2.2_5B_flf2v, commit 6e5fa00c4f0b), also reproduced on master-585-44cca3d
- Build flags: -DSD_VULKAN=ON -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=ON
vulkaninfo --summary | grep -E 'deviceName|driverVersion|apiVersion' | head -6
apiVersion = 1.4.318
driverVersion = 25.2.8
deviceName = Intel(R) Iris(R) Xe Graphics (RPL-P)
Minimal reproducer — SD 1.5
Identical config, single flag difference.
Broken (no --diffusion-fa):
./sd-cli \
--model v1-5-pruned-emaonly-fp16.safetensors \
--mode img_gen \
--prompt "A cinematic shot of a lighthouse at dusk, warm amber light, ocean waves" \
--negative-prompt "blurry, low quality, deformed" \
--height 512 --width 512 --steps 20 \
--cfg-scale 7.0 --seed 42 \
--sampling-method euler_a \
--output broken.png
Output: structured horizontal stripes in teal/pink, no prompt content. Reproducible across seeds, steps, resolutions, samplers.
Working (add --diffusion-fa):
./sd-cli \
--model v1-5-pruned-emaonly-fp16.safetensors \
--mode img_gen \
--prompt "A cinematic shot of a lighthouse at dusk, warm amber light, ocean waves" \
--negative-prompt "blurry, low quality, deformed" \
--height 512 --width 512 --steps 20 \
--cfg-scale 7.0 --seed 42 \
--sampling-method euler_a \
--diffusion-fa \
--output working.png
Output: cinematic lighthouse at dusk, prompt-accurate, clean composition. 321 s total on this hardware.
Same model: Comfy-Org/stable-diffusion-v1-5-archive/v1-5-pruned-emaonly-fp16.safetensors (2.0 GB fp16).
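The two runs above can be driven from one script so that the flag is the only difference. This is a minimal sketch that reuses only the arguments from the reproducer. The names SD_CLI, MODEL, DRY_RUN and run_variant are assumptions introduced here for illustration and are not part of sd-cli.

```shell
#!/bin/sh
# A/B helper: identical arguments, only --diffusion-fa differs.
# SD_CLI and MODEL are assumed defaults; override them for your setup.
SD_CLI="${SD_CLI:-./sd-cli}"
MODEL="${MODEL:-v1-5-pruned-emaonly-fp16.safetensors}"

# run_variant broken|working
# Runs sd-cli, or prints one argument per line when DRY_RUN is set.
run_variant() {
  variant="$1"
  # Arguments shared by both runs, copied from the reproducer.
  set -- \
    --model "$MODEL" \
    --mode img_gen \
    --prompt "A cinematic shot of a lighthouse at dusk, warm amber light, ocean waves" \
    --negative-prompt "blurry, low quality, deformed" \
    --height 512 --width 512 --steps 20 \
    --cfg-scale 7.0 --seed 42 \
    --sampling-method euler_a
  # The single difference between the two runs.
  if [ "$variant" = "working" ]; then
    set -- "$@" --diffusion-fa
  fi
  set -- "$@" --output "$variant.png"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$@"
  else
    "$SD_CLI" "$@"
  fi
}
```

Sourcing the file and calling run_variant broken followed by run_variant working writes broken.png and working.png. Setting DRY_RUN=1 first prints the argument lists so they can be diffed before anything is executed.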
Also reproduced with Wan 2.2 TI2V-5B Turbo
Using Kijai/WanVideo_comfy/Wan22-Turbo/Wan2_2-TI2V-5B-Turbo_fp16.safetensors + QuantStack/Wan2.2-TI2V-5B-GGUF/VAE/Wan2.2_VAE.safetensors + city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q5_K_M.gguf:
- Without --diffusion-fa: noise
- With --diffusion-fa AND --flow-shift 8.0: coherent output (smallest tested = 320×256 × 5 frames, --scheduler simple, --cfg-scale 1.0)
(--flow-shift 8.0 appears to be a separate Wan-2.2-specific fix — the auto default seems to mis-detect for TI2V-5B. Not the focus of this bug report, but noting in case it's a related upstream concern.)
Observations
- None of the loader paths (native gguf_init_from_file_ptr, GGUFReader fallback, safetensors) affects the outcome; the bug is downstream of loading.
- Sampler (euler_a, euler) and scheduler (discrete, simple) don't affect the outcome.
- Model version (SD 1.5, Wan 2.2) doesn't affect the outcome.
- The single determining factor is --diffusion-fa. Without it: noise. With it: clean output.
- The noise pattern is deterministic for a given seed and visually distinctive (horizontal teal/pink stripes for SD / base Wan, horizontal color bands for Turbo Wan).
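One way to check the determinism claim is to run the broken command twice with the same seed, writing to two different files, and compare them byte for byte. The sketch below uses only the standard cmp utility. The helper name same_output and the file names run1.png and run2.png are hypothetical. Byte equality is a stricter test than visual similarity, so a "different" result does not by itself contradict the observation.

```shell
#!/bin/sh
# same_output FILE_A FILE_B
# Prints "identical" if the two files match byte for byte, else "different".
same_output() {
  if cmp -s "$1" "$2"; then
    echo "identical"
  else
    echo "different"
  fi
}
```

After two runs of the broken command with --seed 42 and --output run1.png / run2.png, call same_output run1.png run2.png.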
Suggested fix directions
Not sure whether the bug is in ggml's Vulkan attention kernel, in the Vulkan shader codegen, or in sd.cpp's pre-attention tensor layout for the non-FA path. Starting points for investigation:
- Compare the non-FA vs FA attention kernel on Vulkan against the CUDA reference — is there a known numerical discrepancy?
- Is there an assumption about tensor layout / stride that holds on NVIDIA but breaks on Mesa anv?
- If the non-FA Vulkan attention path is deprecated or known-broken, it might be worth making --diffusion-fa the Vulkan default (or warning loudly when attention is invoked without it).
Happy to run more targeted diagnostics on this hardware if it'd help narrow it down.
Workaround
Until fixed, on Intel Iris Xe Vulkan: always pass --diffusion-fa. It's the only known-working configuration for correct attention output on this backend.
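To avoid forgetting the flag, a small shell wrapper can add it automatically. This is a sketch only: sd_fa and SD_CLI are hypothetical names introduced here, and the wrapper relies on nothing beyond the --diffusion-fa flag already shown in this report.

```shell
#!/bin/sh
# sd_fa: hypothetical wrapper, not part of sd-cli.
# Appends --diffusion-fa unless the caller already passed it, then runs sd-cli.
SD_CLI="${SD_CLI:-./sd-cli}"

sd_fa() {
  for arg in "$@"; do
    if [ "$arg" = "--diffusion-fa" ]; then
      # Flag already present: pass the arguments through unchanged.
      "$SD_CLI" "$@"
      return
    fi
  done
  "$SD_CLI" "$@" --diffusion-fa
}
```

With this sourced in a shell profile, sd_fa can be called with the same arguments as sd-cli, and the workaround is applied either way.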