Description
Applying a LoRA to a Q2_K-quantized flux1-schnell model crashes with a ggml abort on the ROCm (HIP) backend: "ggml_cuda_get_rows_switch_src0_type: unsupported src0 type: q2_K". Full command, log, and backtrace below.
$ ./bin/sd --diffusion-model ../models/unet/flux/flux1-schnell-Q2_K.gguf \
--vae ../models/vae/FluxVAE.safetensors \
--clip_l ../models/clip/clip_l.safetensors \
--t5xxl ../models/clip/t5xxl_fp8_e4m3fn.safetensors \
--lora-model-dir ../models/loras/flux \
--prompt "Cute cat <lora:FLUXTASTIC_V3:0.9>" \
--cfg-scale 1.0 \
--sampling-method euler \
-v --steps 4 --width 1024 --height 1024 --seed -1 --vae-tiling --output "./output/SD_cpp_$(date +%Y%m%d_%H%M%S).png"
Option:
n_threads: 12
mode: img_gen
model_path:
wtype: unspecified
clip_l_path: ../models/clip/clip_l.safetensors
clip_g_path:
t5xxl_path: ../models/clip/t5xxl_fp8_e4m3fn.safetensors
diffusion_model_path: ../models/unet/flux/flux1-schnell-Q2_K.gguf
vae_path: ../models/vae/FluxVAE.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
stacked_id_embed_dir:
input_id_images_path:
style ratio: 20.00
normalize input image : false
output_path: ./output/SD_cpp_20250904_013051.png
init_img:
mask_img:
control_image:
ref_images_paths:
clip on cpu: false
controlnet cpu: false
vae decoder on cpu:false
diffusion flash attention:false
diffusion Conv2d direct:false
vae Conv2d direct:false
strength(control): 0.90
prompt: Cute cat <lora:FLUXTASTIC_V3:0.9>
negative_prompt:
min_cfg: 1.00
cfg_scale: 1.00
img_cfg_scale: 1.00
slg_scale: 0.00
guidance: 3.50
eta: 0.00
clip_skip: -1
width: 1024
height: 1024
sample_method: euler
schedule: default
sample_steps: 4
strength(img2img): 0.75
rng: cuda
seed: 263928822
batch_count: 1
vae_tiling: true
upscale_repeats: 1
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:136 - Using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7800 XT, gfx1101 (0x1101), VMM: no, Wave Size: 32
[INFO ] stable-diffusion.cpp:199 - loading diffusion model from '../models/unet/flux/flux1-schnell-Q2_K.gguf'
[INFO ] model.cpp:1010 - load ../models/unet/flux/flux1-schnell-Q2_K.gguf using gguf format
[DEBUG] model.cpp:1027 - init from '../models/unet/flux/flux1-schnell-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:208 - loading clip_l from '../models/clip/clip_l.safetensors'
[INFO ] model.cpp:1013 - load ../models/clip/clip_l.safetensors using safetensors format
[DEBUG] model.cpp:1088 - init from '../models/clip/clip_l.safetensors'
[INFO ] stable-diffusion.cpp:224 - loading t5xxl from '../models/clip/t5xxl_fp8_e4m3fn.safetensors'
[INFO ] model.cpp:1013 - load ../models/clip/t5xxl_fp8_e4m3fn.safetensors using safetensors format
[DEBUG] model.cpp:1088 - init from '../models/clip/t5xxl_fp8_e4m3fn.safetensors'
[INFO ] stable-diffusion.cpp:231 - loading vae from '../models/vae/FluxVAE.safetensors'
[INFO ] model.cpp:1013 - load ../models/vae/FluxVAE.safetensors using safetensors format
[DEBUG] model.cpp:1088 - init from '../models/vae/FluxVAE.safetensors'
[INFO ] stable-diffusion.cpp:243 - Version: Flux
[INFO ] stable-diffusion.cpp:277 - Weight type: q2_K
[INFO ] stable-diffusion.cpp:278 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:279 - Diffusion model weight type: q2_K
[INFO ] stable-diffusion.cpp:280 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:282 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:323 - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:326 - CLIP: Using CPU backend
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[INFO ] flux.hpp:1094 - Flux blocks: 19 double, 38 single
[INFO ] flux.hpp:1098 - Flux guidance is disabled (Schnell mode)
[DEBUG] ggml_extend.hpp:1241 - clip params backend buffer size = 307.44 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1241 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1241 - flux params backend buffer size = 3824.47 MB(VRAM) (776 tensors)
[DEBUG] ggml_extend.hpp:1241 - vae params backend buffer size = 94.57 MB(VRAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:475 - loading weights
[DEBUG] model.cpp:1891 - loading tensors from ../models/unet/flux/flux1-schnell-Q2_K.gguf
|==================================================| 1435/1435 - 1926.17it/s
[DEBUG] model.cpp:1891 - loading tensors from ../models/clip/clip_l.safetensors
|==================================================| 1435/1435 - 24741.38it/s
[DEBUG] model.cpp:1891 - loading tensors from ../models/clip/t5xxl_fp8_e4m3fn.safetensors
|==================================================| 1435/1435 - 81.13it/s
[DEBUG] model.cpp:1891 - loading tensors from ../models/vae/FluxVAE.safetensors
|==================================================| 1435/1435 - 28700.00it/s
[INFO ] stable-diffusion.cpp:574 - total params memory size = 13310.25MB (VRAM 3919.04MB, RAM 9391.21MB): clip 9391.21MB(RAM), unet 3824.47MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:578 - loading model from '' completed, taking 18.56s
[INFO ] stable-diffusion.cpp:604 - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:664 - finished loaded file
[DEBUG] stable-diffusion.cpp:1903 - generate_image 1024x1024
[INFO ] stable-diffusion.cpp:2033 - TXT2IMG
[DEBUG] stable-diffusion.cpp:1569 - lora FLUXTASTIC_V3:0.90
[DEBUG] stable-diffusion.cpp:1573 - prompt after extract and remove lora: "Cute cat "
[WARN ] stable-diffusion.cpp:736 - In quantized models when applying LoRA, the images have poor quality.
[INFO ] stable-diffusion.cpp:754 - Attempting to apply 1 LoRAs
[INFO ] model.cpp:1013 - load ../models/loras/flux/FLUXTASTIC_V3.safetensors using safetensors format
[DEBUG] model.cpp:1088 - init from '../models/loras/flux/FLUXTASTIC_V3.safetensors'
[INFO ] lora.hpp:117 - loading LoRA from '../models/loras/flux/FLUXTASTIC_V3.safetensors'
[DEBUG] model.cpp:1891 - loading tensors from ../models/loras/flux/FLUXTASTIC_V3.safetensors
|==================================================| 988/988 - 0.00it/s
[DEBUG] ggml_extend.hpp:1241 - lora params backend buffer size = 655.50 MB(VRAM) (988 tensors)
[DEBUG] model.cpp:1891 - loading tensors from ../models/loras/flux/FLUXTASTIC_V3.safetensors
|==================================================| 988/988 - 10400.00it/s
[DEBUG] lora.hpp:160 - lora type: ".lora_A"/".lora_B"
[DEBUG] lora.hpp:162 - finished loaded lora
[DEBUG] lora.hpp:832 - (988 / 988) LoRA tensors applied successfully
[DEBUG] ggml_extend.hpp:1192 - lora compute buffer size: 927.00 MB(VRAM)
[DEBUG] lora.hpp:832 - (988 / 988) LoRA tensors applied successfully
/home/markus/code/stable-diffusion.cpp/ggml/src/ggml-cuda/getrows.cu:201: ggml_cuda_get_rows_switch_src0_type: unsupported src0 type: q2_K
[New LWP 88030]
[New LWP 87975]
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0x00007f3bb229f042 in ?? () from /usr/lib/libc.so.6
#0 0x00007f3bb229f042 in ?? () from /usr/lib/libc.so.6
#1 0x00007f3bb22931ac in ?? () from /usr/lib/libc.so.6
#2 0x00007f3bb22931f4 in ?? () from /usr/lib/libc.so.6
#3 0x00007f3bb2303dcf in wait4 () from /usr/lib/libc.so.6
#4 0x000055f7298c71c7 in ggml_print_backtrace ()
#5 0x000055f7293b6419 in ggml_abort ()
#6 0x000055f7298953e6 in get_rows_cuda(void const*, ggml_type, int const*, void*, ggml_type, long, unsigned long, unsigned long, unsigned long, long, long, long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, ihipStream_t*) ()
#7 0x000055f72989557f in ggml_cuda_op_get_rows(ggml_backend_cuda_context&, ggml_tensor*) ()
#8 0x000055f72961693a in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) ()
#9 0x000055f7298de567 in ggml_backend_graph_compute ()
#10 0x000055f7294868ca in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) ()
#11 0x000055f7294654ba in LoraModel::apply(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ggml_tensor*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ggml_tensor*> > >, SDVersion, int) ()
#12 0x000055f72950cb39 in StableDiffusionGGML::apply_lora(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, float) ()
#13 0x000055f729465362 in StableDiffusionGGML::apply_loras(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, float, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, float> > > const&) ()
#14 0x000055f72943fdeb in generate_image_internal(sd_ctx_t*, ggml_context*, ggml_tensor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, sd_guidance_params_t, float, int, int, sample_method_t, std::vector<float, std::allocator<float> > const&, long, int, sd_image_t const*, float, float, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, ggml_tensor*, ggml_tensor*) ()
#15 0x000055f729446897 in generate_image ()
#16 0x000055f7293cdaa0 in main ()
[Inferior 1 (process 87974) detached]