SDXL on Snapdragon X Elite Adreno - Blank Image

Hello, thanks for this repository, it's extremely useful.
I am trying to run SDXL on a Snapdragon X Elite, on the Adreno GPU. The full diffusion process finishes in under 10 seconds, but the resulting image is blank (grey). I tried --type f16, but that did not help. Then I tried --clip-on-cpu, which produced a correct image but was extremely slow (50 sec/iter). Any ideas on how to fix it?
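
For reference, the two variants mentioned above would look roughly like this. This is a sketch that assumes each flag was simply appended to the command shown in the log below; the exact invocations were not captured, so treat the argument order as an assumption.

```powershell
# Variant 1 (assumed invocation): force f16 weights. The image was still blank.
./bin/sd -m ..\weights\sd_xl_base_1.0.safetensors --vae ..\weights\sdxl.vae.safetensors -H 1024 -W 1024 -p "a lovely cat" -v --vae-tiling --type f16

# Variant 2 (assumed invocation): run the CLIP text encoders on the CPU.
# This gave a valid image, but at roughly 50 sec/iter.
./bin/sd -m ..\weights\sd_xl_base_1.0.safetensors --vae ..\weights\sdxl.vae.safetensors -H 1024 -W 1024 -p "a lovely cat" -v --vae-tiling --clip-on-cpu
```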


---------------------------------------------------------------------------------------------------------------------------------------------
PS C:\Users\sborse\dev\llm\stable-diffusion.cpp\build> ./bin/sd -m ..\weights\sd_xl_base_1.0.safetensors --vae ..\weights\sdxl.vae.safetensors -H 1024 -W 1024 -p "a lovely cat" -v --vae-tiling
Option:
    n_threads:                         6
    mode:                              img_gen
    model_path:                        ..\weights\sd_xl_base_1.0.safetensors
    wtype:                             unspecified
    clip_l_path:
    clip_g_path:
    clip_vision_path:
    t5xxl_path:
    diffusion_model_path:
    high_noise_diffusion_model_path:
    vae_path:                          ..\weights\sdxl.vae.safetensors
    taesd_path:
    esrgan_path:
    control_net_path:
    embedding_dir:
    photo_maker_path:
    pm_id_images_dir:
    pm_id_embed_path:
    pm_style_strength:                 20.00
    output_path:                       output.png
    init_image_path:
    end_image_path:
    mask_image_path:
    control_image_path:
    ref_images_paths:
    control_video_path:
    increase_ref_index:                false
    offload_params_to_cpu:             false
    clip_on_cpu:                       false
    control_net_cpu:                   false
    vae_on_cpu:                        false
    diffusion flash attention:         false
    diffusion Conv2d direct:           false
    vae_conv_direct:                   false
    control_strength:                  0.90
    prompt:                            a lovely cat
    negative_prompt:
    clip_skip:                         -1
    width:                             1024
    height:                            1024
    sample_params:                     (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: 20, eta: 0.00, shifted_timestep: 0)
    high_noise_sample_params:          (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
    moe_boundary:                      0.875
    flow_shift:                        inf
    strength(img2img):                 0.75
    rng:                               cuda
    seed:                              42
    batch_count:                       1
    vae_tiling:                        true
    upscale_repeats:                   1
    chroma_use_dit_mask:               true
    chroma_use_t5_mask:                false
    chroma_t5_mask_pad:                1
    video_frames:                      1
    vace_strength:                     1.00
    fps:                               16
System Info:
    SSE3 = 0
    AVX = 0
    AVX2 = 0
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 0
    NEON = 1
    ARM_FMA = 1
    F16C = 0
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:161  - Using OpenCL backend
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: selected platform: 'QUALCOMM Snapdragon(TM)'
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   -
ggml_opencl: device: 'Qualcomm(R) Adreno(TM) X1-85 GPU (OpenCL 3.0 Qualcomm(R) Adreno(TM) X1-85 GPU)'
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: 832.0 Compiler DX.18.12.00
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: vector subgroup broadcast support: true
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: device FP16 support: true
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: mem base addr align: 128
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: max mem alloc size: 2048 MB
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: device max workgroup size: 1024
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: SVM coarse grain buffer support: false
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: SVM fine grain buffer support: false
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: SVM fine grain system support: false
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: SVM atomics support: false
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: loading OpenCL kernels
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   - .
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74   -
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65   - ggml_opencl: default device: 'Qualcomm(R) Adreno(TM) X1-85 GPU (OpenCL 3.0 Qualcomm(R) Adreno(TM) X1-85 GPU)'
[INFO ] stable-diffusion.cpp:201  - loading model from '..\weights\sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:1044 - load ..\weights\sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:1151 - init from '..\weights\sd_xl_base_1.0.safetensors', prefix = ''
[INFO ] stable-diffusion.cpp:255  - loading vae from '..\weights\sdxl.vae.safetensors'
[INFO ] model.cpp:1044 - load ..\weights\sdxl.vae.safetensors using safetensors format
[DEBUG] model.cpp:1151 - init from '..\weights\sdxl.vae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:267  - Version: SDXL
[INFO ] stable-diffusion.cpp:298  - Weight type:                 f16
[INFO ] stable-diffusion.cpp:299  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:300  - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:301  - VAE weight type:             f16
[DEBUG] stable-diffusion.cpp:303  - ggml tensor size = 400 bytes
[DEBUG] stable-diffusion.cpp\clip.hpp:171  - vocab size: 49408
[DEBUG] stable-diffusion.cpp\clip.hpp:182  - trigger word img already in vocab
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - clip params backend buffer size =  235.06 MB(VRAM) (196 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - clip params backend buffer size =  1329.29 MB(VRAM) (517 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - unet params backend buffer size =  4900.07 MB(VRAM) (1680 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - vae params backend buffer size =  94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:565  - loading weights
[DEBUG] model.cpp:1961 - using 6 threads for model loading
[DEBUG] model.cpp:2044 - loading tensors from ..\weights\sd_xl_base_1.0.safetensors
  |=============================================>    | 2393/2641 - 627.92it/s
[DEBUG] model.cpp:2044 - loading tensors from ..\weights\sdxl.vae.safetensors
  |==================================================| 2641/2641 - 656.15it/s
[INFO ] model.cpp:2288 - loading tensors completed, taking 4.04s (process: 0.01s, read: 3.52s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.35s)
[INFO ] stable-diffusion.cpp:661  - total params memory size = 6558.89MB (VRAM 6558.89MB, RAM 0.00MB): text_encoders 1564.36MB(VRAM), diffusion_model 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:714  - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:725  - finished loaded file
[DEBUG] stable-diffusion.cpp:2262 - generate_image 1024x1024
[INFO ] stable-diffusion.cpp:2383 - TXT2IMG
[INFO ] stable-diffusion.cpp:874  - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:894  - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:895  - prompt after extract and remove lora: "a lovely cat"
[DEBUG] stable-diffusion.cpp\conditioner.hpp:345  - parse 'a lovely cat' to [['a lovely cat', 1], ]
[DEBUG] stable-diffusion.cpp\clip.hpp:311  - token length: 77
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] stable-diffusion.cpp\conditioner.hpp:479  - computing condition graph completed, taking 3724 ms
[DEBUG] stable-diffusion.cpp\conditioner.hpp:345  - parse '' to [['', 1], ]
[DEBUG] stable-diffusion.cpp\clip.hpp:311  - token length: 77
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] stable-diffusion.cpp\conditioner.hpp:479  - computing condition graph completed, taking 74 ms
[INFO ] stable-diffusion.cpp:2049 - get_learned_condition completed, taking 3802 ms
[INFO ] stable-diffusion.cpp:2072 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:2121 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - unet compute buffer size: 830.86 MB(VRAM)
  |==================================================| 20/20 - 3.61it/s
[INFO ] stable-diffusion.cpp:2158 - sampling completed, taking 5.54s
[INFO ] stable-diffusion.cpp:2166 - generating 1 latent images completed, taking 6.14s
[INFO ] stable-diffusion.cpp:2169 - decoding 1 latents
[DEBUG] stable-diffusion.cpp:1521 - VAE Tile size: 32x32
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:817  - num tiles : 7, 7
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:818  - optimal overlap : 0.500000, 0.500000 (targeting 0.500000)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:851  - tile work buffer size: 0.77 MB
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:864  - processing 49 tiles
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - vae compute buffer size: 416.02 MB(VRAM)
  |==================================================| 49/49 - 76.92it/s
[DEBUG] stable-diffusion.cpp:1547 - computing vae decode graph completed, taking 0.56s
[INFO ] stable-diffusion.cpp:2179 - latent 1 decoded, taking 0.56s
[INFO ] stable-diffusion.cpp:2183 - decode_first_stage completed, taking 0.56s
[INFO ] stable-diffusion.cpp:2475 - generate_image completed in 10.51s
save result PNG image to 'output.png'
----------------------------------------------------------------------------------------------------------------------------

<img width="1024" height="1024" alt="Image" src="https://github.com/user-attachments/assets/337f7297-d9e7-4679-a7f9-756d4bf8ebd6" />

SDXL on Snapdragon X Elite Adreno - Blank Image #876