
[BUG] ZImage + Vulkan creates a blank image #1031

@olivbrau

Description


Hello everyone,
I'm using the latest sd release with the Vulkan backend.
I tried it with an old command using Stable Diffusion 1.4, and it works well.
But with ZImage, I get a blank image.
Does anybody have an idea how to fix it?
Thanks in advance!
Olivier

Here is the command:
sd.exe --diffusion-model ..\ZImage\z_image_turbo-Q3_K.gguf --vae "..\Flux.1 Q4 F16\ae.safetensors" --llm ..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 1.0 -v --offload-to-cpu -H 512 -W 512 -t 20 --steps 10 -s 123456

And here is the output:

`C:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\sd VULKAN 2025-12-01>sd.exe --diffusion-model ..\ZImage\z_image_turbo-Q3_K.gguf --vae "..\Flux.1 Q4 F16\ae.safetensors" --llm ..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 1.0 -v --offload-to-cpu -H 512 -W 512 -t 20 --steps 10 -s 123456 --vae-on-cpu
Option:
n_threads: 20
mode: img_gen
model_path:
wtype: unspecified
clip_l_path:
clip_g_path:
clip_vision_path:
t5xxl_path:
llm_path: ..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf
llm_vision_path:
diffusion_model_path: ..\ZImage\z_image_turbo-Q3_K.gguf
high_noise_diffusion_model_path:
vae_path: ..\Flux.1 Q4 F16\ae.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
photo_maker_path:
pm_id_images_dir:
pm_id_embed_path:
pm_style_strength: 20.00
output_path: output.png
init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths:
control_video_path:
auto_resize_ref_image: true
increase_ref_index: false
offload_params_to_cpu: true
clip_on_cpu: false
control_net_cpu: false
vae_on_cpu: true
diffusion flash attention: false
diffusion Conv2d direct: false
vae_conv_direct: false
control_strength: 0.90
prompt: A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic
negative_prompt:
clip_skip: -1
width: 512
height: 512
sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 10, eta: 0.00, shifted_timestep: 0)
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
moe_boundary: 0.875
prediction: default
lora_apply_mode: auto
flow_shift: inf
strength(img2img): 0.75
rng: cuda
sampler rng: NONE
seed: 123456
batch_count: 1
vae_tiling: false
force_sdxl_vae_conv_scale: false
upscale_repeats: 1
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
video_frames: 1
easycache: disabled (threshold=0.200, start=0.15, end=0.95)
vace_strength: 1.00
fps: 16
preview_mode: none (denoised)
preview_interval: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:167 - Using Vulkan backend
[DEBUG] ggml_extend.hpp:66 - ggml_vulkan: Found 2 Vulkan devices:
[DEBUG] ggml_extend.hpp:66 - ggml_vulkan: 0 = Intel(R) Iris(R) Xe Graphics (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
[DEBUG] ggml_extend.hpp:66 - ggml_vulkan: 1 = NVIDIA RTX A1000 6GB Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
[INFO ] stable-diffusion.cpp:234 - loading diffusion model from '..\ZImage\z_image_turbo-Q3_K.gguf'
[INFO ] model.cpp:378 - load ..\ZImage\z_image_turbo-Q3_K.gguf using gguf format
[DEBUG] model.cpp:420 - init from '..\ZImage\z_image_turbo-Q3_K.gguf'
[INFO ] stable-diffusion.cpp:281 - loading llm from '..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf'
[INFO ] model.cpp:378 - load ..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf using gguf format
[DEBUG] model.cpp:420 - init from '..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf'
[INFO ] stable-diffusion.cpp:295 - loading vae from '..\Flux.1 Q4 F16\ae.safetensors'
[INFO ] model.cpp:381 - load ..\Flux.1 Q4 F16\ae.safetensors using safetensors format
[DEBUG] model.cpp:511 - init from '..\Flux.1 Q4 F16\ae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:318 - Version: Z-Image
[INFO ] stable-diffusion.cpp:346 - Weight type stat: f32: 640 | q8_0: 22 | q3_K: 324 | q4_K: 104 | q5_K: 4 | q6_K: 1
[INFO ] stable-diffusion.cpp:347 - Conditioner weight type stat: f32: 145 | q3_K: 144 | q4_K: 104 | q5_K: 4 | q6_K: 1
[INFO ] stable-diffusion.cpp:348 - Diffusion model weight type stat: f32: 251 | q8_0: 22 | q3_K: 180
[INFO ] stable-diffusion.cpp:349 - VAE weight type stat: f32: 244
[DEBUG] stable-diffusion.cpp:351 - ggml tensor size = 400 bytes
[DEBUG] llm.hpp:285 - merges size 151387
[DEBUG] llm.hpp:317 - vocab size: 151665
[DEBUG] ggml_extend.hpp:1877 - qwen3 params backend buffer size = 3153.25 MB(RAM) (398 tensors)
[DEBUG] ggml_extend.hpp:1877 - z_image params backend buffer size = 2997.90 MB(RAM) (453 tensors)
[INFO ] stable-diffusion.cpp:555 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1877 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:683 - loading weights
[DEBUG] model.cpp:1359 - using 20 threads for model loading
[DEBUG] model.cpp:1381 - loading tensors from ..\ZImage\z_image_turbo-Q3_K.gguf
|====================> | 453/1095 - 556.51it/s
[DEBUG] model.cpp:1381 - loading tensors from ..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf
|======================================> | 851/1095 - 418.39it/s
[DEBUG] model.cpp:1381 - loading tensors from ..\Flux.1 Q4 F16\ae.safetensors
|==================================================| 1095/1095 - 486.67it/s
[INFO ] model.cpp:1590 - loading tensors completed, taking 2.25s (process: 0.00s, read: 1.19s, memcpy: 0.00s, convert: 0.03s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:782 - total params memory size = 6245.72MB (VRAM 6151.15MB, RAM 94.57MB): text_encoders 3153.25MB(VRAM), diffusion_model 2997.90MB(VRAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:883 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:908 - finished loaded file
[DEBUG] stable-diffusion.cpp:3138 - generate_image 512x512
[INFO ] stable-diffusion.cpp:3169 - sampling using Euler method
[INFO ] denoiser.hpp:364 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3282 - TXT2IMG
[INFO ] stable-diffusion.cpp:1167 - apply at runtime
[DEBUG] conditioner.hpp:1701 - parse '<|im_start|>user
A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic<|im_end|>
<|im_start|>assistant
' to [['<|im_start|>user
', 1], ['A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic', 1], ['<|im_end|>
<|im_start|>assistant
', 1], ]
[DEBUG] llm.hpp:259 - split prompt "<|im_start|>user
" to tokens ["<|im_start|>", "user", "Ċ", ]
[DEBUG] llm.hpp:259 - split prompt "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" to tokens ["A", "Ġcinematic", ",", "Ġmelanch", "olic", "Ġphotograph", "Ġof", "Ġa", "Ġsolitary", "Ġhood", "ed", "Ġfigure", "Ġwalking", "Ġthrough", "Ġa", "Ġsprawling", ",", "Ġrain", "-s", "lick", "ed", "Ġmet", "ropolis", "Ġat", "Ġnight", ".", "ĠThe", "Ġcity", "Ġlights", "Ġare", "Ġa", "Ġchaotic", "Ġblur", "Ġof", "Ġneon", "Ġorange", "Ġand", "Ġcool", "Ġblue", ",", "Ġreflecting", "Ġon", "Ġthe", "Ġwet", "Ġasphalt", ".", "ĠThe", "Ġscene", "Ġev", "okes", "Ġa", "Ġsense", "Ġof", "Ġbeing", "Ġa", "Ġsingle", "Ġcomponent", "Ġin", "Ġa", "Ġvast", "Ġmachine", ".", "ĠSuper", "im", "posed", "Ġover", "Ġthe", "Ġimage", "Ġin", "Ġa", "Ġsleek", ",", "Ġmodern", ",", "Ġslightly", "Ġglitch", "ed", "Ġfont", "Ġis", "Ġthe", "Ġphilosophical", "Ġquote", ":", "Ġ'", "THE", "ĠCITY", "ĠIS", "ĠA", "ĠC", "IR", "CU", "IT", "ĠBOARD", ",", "ĠAND", "ĠI", "ĠAM", "ĠA", "ĠBRO", "KEN", "ĠTRANS", "IST", "OR", ".'", "Ġ--", "Ġmo", "ody", ",", "Ġatmospheric", ",", "Ġprofound", ",", "Ġdark", "Ġacademic", ]
[DEBUG] llm.hpp:259 - split prompt "<|im_end|>
<|im_start|>assistant
" to tokens ["<|im_end|>", "Ċ", "<|im_start|>", "assistant", "Ċ", ]
[INFO ] ggml_extend.hpp:1791 - qwen3 offload params (3153.25 MB, 398 tensors) to runtime backend (Vulkan1), taking 2.45s
[DEBUG] ggml_extend.hpp:1691 - qwen3 compute buffer size: 13.34 MB(VRAM)
[DEBUG] conditioner.hpp:1896 - computing condition graph completed, taking 3508 ms
[INFO ] stable-diffusion.cpp:2917 - get_learned_condition completed, taking 3557 ms
[INFO ] stable-diffusion.cpp:3028 - generating image: 1/1 - seed 123456
[INFO ] ggml_extend.hpp:1791 - z_image offload params (2997.90 MB, 453 tensors) to runtime backend (Vulkan1), taking 1.13s
[DEBUG] ggml_extend.hpp:1691 - z_image compute buffer size: 255.60 MB(VRAM)
|==================================================| 10/10 - 6.12s/it
[INFO ] stable-diffusion.cpp:3069 - sampling completed, taking 61.45s
[INFO ] stable-diffusion.cpp:3077 - generating 1 latent images completed, taking 61.93s
[INFO ] stable-diffusion.cpp:3080 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1691 - vae compute buffer size: 1664.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:2286 - computing vae decode graph completed, taking 11.13s
[INFO ] stable-diffusion.cpp:3090 - latent 1 decoded, taking 11.13s
[INFO ] stable-diffusion.cpp:3094 - decode_first_stage completed, taking 11.13s
[INFO ] stable-diffusion.cpp:3390 - generate_image completed in 76.66s
save result PNG image to 'output.png' (success)

C:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\sd VULKAN 2025-12-01>REM --vae-conv-direct --diffusion-conv-direct

C:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\sd VULKAN 2025-12-01>REM --diffusion-fa

C:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\sd VULKAN 2025-12-01>PAUSE`
