Hello, and thanks for this repository; it's extremely useful.
I am trying to run SDXL on a Snapdragon X Elite, using the Adreno GPU. The full diffusion process finishes in under 10 seconds, but the resulting image is blank (grey). I tried --type f16, but that did not help. I then tried --clip-on-cpu, and it worked, but it was extremely slow (50 sec/iter). Any ideas on how to fix this?
PS C:\Users\sborse\dev\llm\stable-diffusion.cpp\build> ./bin/sd -m ..\weights\sd_xl_base_1.0.safetensors --vae ..\weights\sdxl.vae.safetensors -H 1024 -W 1024 -p "a lovely cat" -v --vae-tiling
Option:
n_threads: 6
mode: img_gen
model_path: ..\weights\sd_xl_base_1.0.safetensors
wtype: unspecified
clip_l_path:
clip_g_path:
clip_vision_path:
t5xxl_path:
diffusion_model_path:
high_noise_diffusion_model_path:
vae_path: ..\weights\sdxl.vae.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
photo_maker_path:
pm_id_images_dir:
pm_id_embed_path:
pm_style_strength: 20.00
output_path: output.png
init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths:
control_video_path:
increase_ref_index: false
offload_params_to_cpu: false
clip_on_cpu: false
control_net_cpu: false
vae_on_cpu: false
diffusion flash attention: false
diffusion Conv2d direct: false
vae_conv_direct: false
control_strength: 0.90
prompt: a lovely cat
negative_prompt:
clip_skip: -1
width: 1024
height: 1024
sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: 20, eta: 0.00, shifted_timestep: 0)
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
moe_boundary: 0.875
flow_shift: inf
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: true
upscale_repeats: 1
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
video_frames: 1
vace_strength: 1.00
fps: 16
System Info:
SSE3 = 0
AVX = 0
AVX2 = 0
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 0
NEON = 1
ARM_FMA = 1
F16C = 0
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:161 - Using OpenCL backend
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: selected platform: 'QUALCOMM Snapdragon(TM)'
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 -
ggml_opencl: device: 'Qualcomm(R) Adreno(TM) X1-85 GPU (OpenCL 3.0 Qualcomm(R) Adreno(TM) X1-85 GPU)'
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: 832.0 Compiler DX.18.12.00
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: vector subgroup broadcast support: true
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: device FP16 support: true
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: mem base addr align: 128
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: max mem alloc size: 2048 MB
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: device max workgroup size: 1024
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: SVM coarse grain buffer support: false
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: SVM fine grain buffer support: false
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: SVM fine grain system support: false
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: SVM atomics support: false
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: loading OpenCL kernels
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74 - .
[... the line above is printed 69 times in total while the OpenCL kernels load ...]
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:74 -
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:65 - ggml_opencl: default device: 'Qualcomm(R) Adreno(TM) X1-85 GPU (OpenCL 3.0 Qualcomm(R) Adreno(TM) X1-85 GPU)'
[INFO ] stable-diffusion.cpp:201 - loading model from '..\weights\sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:1044 - load ..\weights\sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:1151 - init from '..\weights\sd_xl_base_1.0.safetensors', prefix = ''
[INFO ] stable-diffusion.cpp:255 - loading vae from '..\weights\sdxl.vae.safetensors'
[INFO ] model.cpp:1044 - load ..\weights\sdxl.vae.safetensors using safetensors format
[DEBUG] model.cpp:1151 - init from '..\weights\sdxl.vae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:267 - Version: SDXL
[INFO ] stable-diffusion.cpp:298 - Weight type: f16
[INFO ] stable-diffusion.cpp:299 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:300 - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:301 - VAE weight type: f16
[DEBUG] stable-diffusion.cpp:303 - ggml tensor size = 400 bytes
[DEBUG] stable-diffusion.cpp\clip.hpp:171 - vocab size: 49408
[DEBUG] stable-diffusion.cpp\clip.hpp:182 - trigger word img already in vocab
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - clip params backend buffer size = 235.06 MB(VRAM) (196 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - clip params backend buffer size = 1329.29 MB(VRAM) (517 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - unet params backend buffer size = 4900.07 MB(VRAM) (1680 tensors)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1729 - vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:565 - loading weights
[DEBUG] model.cpp:1961 - using 6 threads for model loading
[DEBUG] model.cpp:2044 - loading tensors from ..\weights\sd_xl_base_1.0.safetensors
|=============================================> | 2393/2641 - 627.92it/s
[DEBUG] model.cpp:2044 - loading tensors from ..\weights\sdxl.vae.safetensors
|==================================================| 2641/2641 - 656.15it/s
[INFO ] model.cpp:2288 - loading tensors completed, taking 4.04s (process: 0.01s, read: 3.52s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.35s)
[INFO ] stable-diffusion.cpp:661 - total params memory size = 6558.89MB (VRAM 6558.89MB, RAM 0.00MB): text_encoders 1564.36MB(VRAM), diffusion_model 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:714 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:725 - finished loaded file
[DEBUG] stable-diffusion.cpp:2262 - generate_image 1024x1024
[INFO ] stable-diffusion.cpp:2383 - TXT2IMG
[INFO ] stable-diffusion.cpp:874 - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:894 - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:895 - prompt after extract and remove lora: "a lovely cat"
[DEBUG] stable-diffusion.cpp\conditioner.hpp:345 - parse 'a lovely cat' to [['a lovely cat', 1], ]
[DEBUG] stable-diffusion.cpp\clip.hpp:311 - token length: 77
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] stable-diffusion.cpp\conditioner.hpp:479 - computing condition graph completed, taking 3724 ms
[DEBUG] stable-diffusion.cpp\conditioner.hpp:345 - parse '' to [['', 1], ]
[DEBUG] stable-diffusion.cpp\clip.hpp:311 - token length: 77
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] stable-diffusion.cpp\conditioner.hpp:479 - computing condition graph completed, taking 74 ms
[INFO ] stable-diffusion.cpp:2049 - get_learned_condition completed, taking 3802 ms
[INFO ] stable-diffusion.cpp:2072 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:2121 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - unet compute buffer size: 830.86 MB(VRAM)
|==================================================| 20/20 - 3.61it/s
[INFO ] stable-diffusion.cpp:2158 - sampling completed, taking 5.54s
[INFO ] stable-diffusion.cpp:2166 - generating 1 latent images completed, taking 6.14s
[INFO ] stable-diffusion.cpp:2169 - decoding 1 latents
[DEBUG] stable-diffusion.cpp:1521 - VAE Tile size: 32x32
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:817 - num tiles : 7, 7
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:818 - optimal overlap : 0.500000, 0.500000 (targeting 0.500000)
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:851 - tile work buffer size: 0.77 MB
[INFO ] stable-diffusion.cpp\ggml_extend.hpp:864 - processing 49 tiles
[DEBUG] stable-diffusion.cpp\ggml_extend.hpp:1553 - vae compute buffer size: 416.02 MB(VRAM)
|==================================================| 49/49 - 76.92it/s
[DEBUG] stable-diffusion.cpp:1547 - computing vae decode graph completed, taking 0.56s
[INFO ] stable-diffusion.cpp:2179 - latent 1 decoded, taking 0.56s
[INFO ] stable-diffusion.cpp:2183 - decode_first_stage completed, taking 0.56s
[INFO ] stable-diffusion.cpp:2475 - generate_image completed in 10.51s
save result PNG image to 'output.png'
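
One thing that might be worth checking in the log above (only a guess, not a confirmed cause): the OpenCL driver reports a max mem alloc size of 2048 MB, while the unet params backend buffer is 4900.07 MB. Whether the backend allocates that buffer in a single call is not visible in the log, so this is an assumption. A small Python sketch, with the relevant lines copied from the log and a hypothetical helper name `buffers_over_limit`, flags any params buffer larger than the reported limit:

```python
import re

# Lines copied from the log above (log prefixes trimmed).
LOG = """\
ggml_opencl: max mem alloc size: 2048 MB
clip params backend buffer size = 235.06 MB(VRAM) (196 tensors)
clip params backend buffer size = 1329.29 MB(VRAM) (517 tensors)
unet params backend buffer size = 4900.07 MB(VRAM) (1680 tensors)
vae params backend buffer size = 94.47 MB(VRAM) (140 tensors)
"""


def buffers_over_limit(log_text):
    """Return (name, size_mb) for each params buffer larger than the
    max alloc size reported by the OpenCL driver in the same log.

    Hypothetical helper for illustration; it only compares numbers found
    in the log text and does not query the device.
    """
    limit_match = re.search(r"max mem alloc size:\s*([\d.]+)\s*MB", log_text)
    if limit_match is None:
        raise ValueError("no 'max mem alloc size' line found in log")
    limit_mb = float(limit_match.group(1))

    buffers = re.findall(
        r"(\w+) params backend buffer size\s*=\s*([\d.]+)\s*MB", log_text
    )
    return [(name, float(size)) for name, size in buffers if float(size) > limit_mb]


if __name__ == "__main__":
    for name, size_mb in buffers_over_limit(LOG):
        print(f"{name}: {size_mb:.2f} MB is larger than the reported max alloc size")
```

With the numbers from this run, only the unet buffer is flagged. That does not prove it is the reason for the grey image, but it is a data point that could be compared against a run with --clip-on-cpu.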
