SDXL on Snapdragon X Elite Adreno - Blank Image #876

@sborse3

Hello, thanks for this repository, it's extremely useful.
I am trying to run SDXL on a Snapdragon X Elite, on the Adreno GPU. The full diffusion process finishes in under 10 seconds, but the resulting image is blank (grey). I tried --type f16, but that did not help. Then I tried --clip-on-cpu, which worked but was extremely slow (50 sec/iter). Any ideas on how to fix it?
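To compare runs (for example with and without --clip-on-cpu) without inspecting each image by eye, a "blank" output can be detected numerically. The following is a minimal sketch, not part of stable-diffusion.cpp: the function name `looks_blank` and the `tolerance` threshold are hypothetical and chosen only for illustration.

```python
from statistics import pstdev


def looks_blank(pixels, tolerance=2.0):
    """Return True if 8-bit channel values are nearly uniform.

    `pixels` is any sequence of ints in 0..255, e.g. the raw bytes of a
    decoded image. `tolerance` is a hypothetical threshold on the
    population standard deviation, picked for illustration only.
    """
    values = list(pixels)
    if not values:
        raise ValueError("no pixel data")
    return pstdev(values) <= tolerance


# A flat mid-grey buffer has zero spread and is reported as blank.
grey = bytes([128]) * (64 * 64 * 3)
# A repeating 0..255 ramp has a large spread and is not.
gradient = bytes(i % 256 for i in range(64 * 64 * 3))

print(looks_blank(grey), looks_blank(gradient))
```

With an image library such as Pillow installed (an assumption), the generated file could be checked with `looks_blank(Image.open("output.png").convert("RGB").tobytes())`.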


PS C:\Users\sborse\dev\llm\stable-diffusion.cpp\build> ./bin/sd -m ..\weights\sd_xl_base_1.0.safetensors --vae ..\weights\sdxl.vae.safetensors -H 1024 -W 1024 -p "a lovely cat" -v --vae-tiling
Option:
n_threads: 6
mode: img_gen
model_path: ..\weights\sd_xl_base_1.0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
clip_vision_path:
t5xxl_path:
diffusion_model_path:
high_noise_diffusion_model_path:
vae_path: ..\weights\sdxl.vae.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
photo_maker_path:
pm_id_images_dir:
pm_id_embed_path:
pm_style_strength: 20.00
output_path: output.png
init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths:
control_video_path:
increase_ref_index: false
offload_params_to_cpu: false
clip_on_cpu: false
control_net_cpu: false
vae_on_cpu: false
diffusion flash attention: false
diffusion Conv2d direct: false
vae_conv_direct: false
control_strength: 0.90
prompt: a lovely cat
negative_prompt:
clip_skip: -1
width: 1024
height: 1024
sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: 20, eta: 0.00, shifted_timestep: 0)
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
moe_boundary: 0.875
flow_shift: inf
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: true
upscale_repeats: 1
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
video_frames: 1
vace_strength: 1.00
fps: 16
System Info:
SSE3 = 0
AVX = 0
AVX2 = 0
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 0
NEON = 1
ARM_FMA = 1
F16C = 0
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:161 - Using OpenCL backend
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: selected platform: 'QUALCOMM Snapdragon(TM)'
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: device: 'Qualcomm(R) Adreno(TM) X1-85 GPU (OpenCL 3.0 Qualcomm(R) Adreno(TM) X1-85 GPU)'
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: 832.0 Compiler DX.18.12.00
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: vector subgroup broadcast support: true
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: device FP16 support: true
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: mem base addr align: 128
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: max mem alloc size: 2048 MB
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: device max workgroup size: 1024
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: SVM coarse grain buffer support: false
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: SVM fine grain buffer support: false
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: SVM fine grain system support: false
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: SVM atomics support: false
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: loading OpenCL kernels
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74 - .
[... 68 more identical lines omitted ...]
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74 -
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: default device: 'Qualcomm(R) Adreno(TM) X1-85 GPU (OpenCL 3.0 Qualcomm(R) Adreno(TM) X1-85 GPU)'
[INFO ] stable-diffusion.cpp:201 - loading model from '..\weights\sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:1044 - load ..\weights\sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:1151 - init from '..\weights\sd_xl_base_1.0.safetensors', prefix = ''
[INFO ] stable-diffusion.cpp:255 - loading vae from '..\weights\sdxl.vae.safetensors'
[INFO ] model.cpp:1044 - load ..\weights\sdxl.vae.safetensors using safetensors format
[DEBUG] model.cpp:1151 - init from '..\weights\sdxl.vae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:267 - Version: SDXL
[INFO ] stable-diffusion.cpp:298 - Weight type: f16
[INFO ] stable-diffusion.cpp:299 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:300 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:301 - VAE weight type: f16
[DEBUG] stable-diffusion.cpp:303 - ggml tensor size = 400 bytes
[DEBUG] stable-diffusion.cpp\clip.hpp:171 - vocab size: 49408
[DEBUG] stable-diffusion.cpp\clip.hpp:182 - trigger word img already in vocab
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - clip params backend buffer size = 235.06 MB(VRAM) (196 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - clip params backend buffer size = 1329.29 MB(VRAM) (517 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - unet params backend buffer size = 4900.07 MB(VRAM) (1680 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:565 - loading weights
[DEBUG] model.cpp:1961 - using 6 threads for model loading
[DEBUG] model.cpp:2044 - loading tensors from ..\weights\sd_xl_base_1.0.safetensors
|=============================================> | 2393/2641 - 627.92it/s
[DEBUG] model.cpp:2044 - loading tensors from ..\weights\sdxl.vae.safetensors
|==================================================| 2641/2641 - 656.15it/s
[INFO ] model.cpp:2288 - loading tensors completed, taking 4.04s (process: 0.01s, read: 3.52s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.35s)
[INFO ] stable-diffusion.cpp:661 - total params memory size = 6558.89MB (VRAM 6558.89MB, RAM 0.00MB): text_encoders 1564.36MB(VRAM), diffusion_model 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:714 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:725 - finished loaded file
[DEBUG] stable-diffusion.cpp:2262 - generate_image 1024x1024
[INFO ] stable-diffusion.cpp:2383 - TXT2IMG
[INFO ] stable-diffusion.cpp:874 - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:894 - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:895 - prompt after extract and remove lora: "a lovely cat"
[DEBUG] stable-diffusion.cpp\conditioner.hpp:345 - parse 'a lovely cat' to [['a lovely cat', 1], ]
[DEBUG] stable-diffusion.cpp\clip.hpp:311 - token length: 77
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] stable-diffusion.cpp\conditioner.hpp:479 - computing condition graph completed, taking 3724 ms
[DEBUG] stable-diffusion.cpp\conditioner.hpp:345 - parse '' to [['', 1], ]
[DEBUG] stable-diffusion.cpp\clip.hpp:311 - token length: 77
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] stable-diffusion.cpp\conditioner.hpp:479 - computing condition graph completed, taking 74 ms
[INFO ] stable-diffusion.cpp:2049 - get_learned_condition completed, taking 3802 ms
[INFO ] stable-diffusion.cpp:2072 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:2121 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - unet compute buffer size: 830.86 MB(VRAM)
|==================================================| 20/20 - 3.61it/s
[INFO ] stable-diffusion.cpp:2158 - sampling completed, taking 5.54s
[INFO ] stable-diffusion.cpp:2166 - generating 1 latent images completed, taking 6.14s
[INFO ] stable-diffusion.cpp:2169 - decoding 1 latents
[DEBUG] stable-diffusion.cpp:1521 - VAE Tile size: 32x32
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:817 - num tiles : 7, 7
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:818 - optimal overlap : 0.500000, 0.500000 (targeting 0.500000)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:851 - tile work buffer size: 0.77 MB
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:864 - processing 49 tiles
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - vae compute buffer size: 416.02 MB(VRAM)
|==================================================| 49/49 - 76.92it/s
[DEBUG] stable-diffusion.cpp:1547 - computing vae decode graph completed, taking 0.56s
[INFO ] stable-diffusion.cpp:2179 - latent 1 decoded, taking 0.56s
[INFO ] stable-diffusion.cpp:2183 - decode_first_stage completed, taking 0.56s
[INFO ] stable-diffusion.cpp:2475 - generate_image completed in 10.51s
save result PNG image to 'output.png'
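One detail that can be read directly off the log above: the device reports a max mem alloc size of 2048 MB, while the unet params backend buffer is 4900.07 MB. Whether this is related to the blank output is not established here. The backend may split that buffer into smaller allocations internally, and the fact that --clip-on-cpu changes the result points at the text encoder path, which this comparison does not explain. As a rough sketch, the comparison can be automated for other logs. The regular expressions and the helper name `oversized_buffers` are assumptions based only on the line formats shown in this log.

```python
import re

# Lines copied from the log above.
LOG_EXCERPT = r"""
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: max mem alloc size: 2048 MB
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - clip params backend buffer size = 235.06 MB(VRAM) (196 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - clip params backend buffer size = 1329.29 MB(VRAM) (517 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - unet params backend buffer size = 4900.07 MB(VRAM) (1680 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
"""

# Patterns are assumptions derived from the log format shown in this issue.
MAX_ALLOC_RE = re.compile(r"max mem alloc size:\s*([\d.]+)\s*MB")
BUFFER_RE = re.compile(r"(\w+) params backend buffer size\s*=\s*([\d.]+)\s*MB")


def oversized_buffers(log_text):
    """Return (limit_mb, [(name, size_mb), ...]) for buffers above the limit."""
    match = MAX_ALLOC_RE.search(log_text)
    if match is None:
        raise ValueError("max mem alloc size not found in log")
    limit = float(match.group(1))
    buffers = [(name, float(size)) for name, size in BUFFER_RE.findall(log_text)]
    return limit, [(name, size) for name, size in buffers if size > limit]


limit_mb, too_big = oversized_buffers(LOG_EXCERPT)
print(f"max mem alloc size: {limit_mb:.0f} MB")
for name, size in too_big:
    print(f"{name} params buffer: {size:.2f} MB (above the reported limit)")
```

For this log, only the unet buffer exceeds the reported limit; both clip buffers and the vae buffer are below it.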

Image
