
Cannot use LoRA on Flux, is it supported? #786

@hartmark

Description

$ ./bin/sd --diffusion-model ../models/unet/flux/flux1-schnell-Q2_K.gguf \
  --vae ../models/vae/FluxVAE.safetensors \
  --clip_l ../models/clip/clip_l.safetensors \
  --t5xxl ../models/clip/t5xxl_fp8_e4m3fn.safetensors \
  --lora-model-dir ../models/loras/flux \
  --prompt "Cute cat <lora:FLUXTASTIC_V3:0.9>" \
  --cfg-scale 1.0 \
  --sampling-method euler \
  -v --steps 4 --width 1024 --height 1024 --seed -1 --vae-tiling --output "./output/SD_cpp_$(date +%Y%m%d_%H%M%S).png"

Option: 
    n_threads:         12
    mode:              img_gen
    model_path:        
    wtype:             unspecified
    clip_l_path:       ../models/clip/clip_l.safetensors
    clip_g_path:       
    t5xxl_path:        ../models/clip/t5xxl_fp8_e4m3fn.safetensors
    diffusion_model_path:   ../models/unet/flux/flux1-schnell-Q2_K.gguf
    vae_path:          ../models/vae/FluxVAE.safetensors
    taesd_path:        
    esrgan_path:       
    control_net_path:   
    embedding_dir:   
    stacked_id_embed_dir:   
    input_id_images_path:   
    style ratio:       20.00
    normalize input image :  false
    output_path:       ./output/SD_cpp_20250904_013051.png
    init_img:          
    mask_img:          
    control_image:     
    ref_images_paths:
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:false
    diffusion flash attention:false
    diffusion Conv2d direct:false
    vae Conv2d direct:false
    strength(control): 0.90
    prompt:            Cute cat <lora:FLUXTASTIC_V3:0.9>
    negative_prompt:   
    min_cfg:           1.00
    cfg_scale:         1.00
    img_cfg_scale:     1.00
    slg_scale:         0.00
    guidance:          3.50
    eta:               0.00
    clip_skip:         -1
    width:             1024
    height:            1024
    sample_method:     euler
    schedule:          default
    sample_steps:      4
    strength(img2img): 0.75
    rng:               cuda
    seed:              263928822
    batch_count:       1
    vae_tiling:        true
    upscale_repeats:   1
    chroma_use_dit_mask:   true
    chroma_use_t5_mask:    false
    chroma_t5_mask_pad:    1
System Info: 
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:136  - Using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7800 XT, gfx1101 (0x1101), VMM: no, Wave Size: 32
[INFO ] stable-diffusion.cpp:199  - loading diffusion model from '../models/unet/flux/flux1-schnell-Q2_K.gguf'
[INFO ] model.cpp:1010 - load ../models/unet/flux/flux1-schnell-Q2_K.gguf using gguf format
[DEBUG] model.cpp:1027 - init from '../models/unet/flux/flux1-schnell-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:208  - loading clip_l from '../models/clip/clip_l.safetensors'
[INFO ] model.cpp:1013 - load ../models/clip/clip_l.safetensors using safetensors format
[DEBUG] model.cpp:1088 - init from '../models/clip/clip_l.safetensors'
[INFO ] stable-diffusion.cpp:224  - loading t5xxl from '../models/clip/t5xxl_fp8_e4m3fn.safetensors'
[INFO ] model.cpp:1013 - load ../models/clip/t5xxl_fp8_e4m3fn.safetensors using safetensors format
[DEBUG] model.cpp:1088 - init from '../models/clip/t5xxl_fp8_e4m3fn.safetensors'
[INFO ] stable-diffusion.cpp:231  - loading vae from '../models/vae/FluxVAE.safetensors'
[INFO ] model.cpp:1013 - load ../models/vae/FluxVAE.safetensors using safetensors format
[DEBUG] model.cpp:1088 - init from '../models/vae/FluxVAE.safetensors'
[INFO ] stable-diffusion.cpp:243  - Version: Flux 
[INFO ] stable-diffusion.cpp:277  - Weight type:                 q2_K
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: q2_K
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             f32
[DEBUG] stable-diffusion.cpp:282  - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:323  - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:326  - CLIP: Using CPU backend
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  -  trigger word img already in vocab
[INFO ] flux.hpp:1094 - Flux blocks: 19 double, 38 single
[INFO ] flux.hpp:1098 - Flux guidance is disabled (Schnell mode)
[DEBUG] ggml_extend.hpp:1241 - clip params backend buffer size =  307.44 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1241 - t5 params backend buffer size =  9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1241 - flux params backend buffer size =  3824.47 MB(VRAM) (776 tensors)
[DEBUG] ggml_extend.hpp:1241 - vae params backend buffer size =  94.57 MB(VRAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:475  - loading weights
[DEBUG] model.cpp:1891 - loading tensors from ../models/unet/flux/flux1-schnell-Q2_K.gguf
  |==================================================| 1435/1435 - 1926.17it/s
[DEBUG] model.cpp:1891 - loading tensors from ../models/clip/clip_l.safetensors
  |==================================================| 1435/1435 - 24741.38it/s
[DEBUG] model.cpp:1891 - loading tensors from ../models/clip/t5xxl_fp8_e4m3fn.safetensors
  |==================================================| 1435/1435 - 81.13it/s
[DEBUG] model.cpp:1891 - loading tensors from ../models/vae/FluxVAE.safetensors
  |==================================================| 1435/1435 - 28700.00it/s
[INFO ] stable-diffusion.cpp:574  - total params memory size = 13310.25MB (VRAM 3919.04MB, RAM 9391.21MB): clip 9391.21MB(RAM), unet 3824.47MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:578  - loading model from '' completed, taking 18.56s
[INFO ] stable-diffusion.cpp:604  - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:664  - finished loaded file
[DEBUG] stable-diffusion.cpp:1903 - generate_image 1024x1024
[INFO ] stable-diffusion.cpp:2033 - TXT2IMG
[DEBUG] stable-diffusion.cpp:1569 - lora FLUXTASTIC_V3:0.90
[DEBUG] stable-diffusion.cpp:1573 - prompt after extract and remove lora: "Cute cat "
[WARN ] stable-diffusion.cpp:736  - In quantized models when applying LoRA, the images have poor quality.
[INFO ] stable-diffusion.cpp:754  - Attempting to apply 1 LoRAs
[INFO ] model.cpp:1013 - load ../models/loras/flux/FLUXTASTIC_V3.safetensors using safetensors format
[DEBUG] model.cpp:1088 - init from '../models/loras/flux/FLUXTASTIC_V3.safetensors'
[INFO ] lora.hpp:117  - loading LoRA from '../models/loras/flux/FLUXTASTIC_V3.safetensors'
[DEBUG] model.cpp:1891 - loading tensors from ../models/loras/flux/FLUXTASTIC_V3.safetensors
  |==================================================| 988/988 - 0.00it/s
[DEBUG] ggml_extend.hpp:1241 - lora params backend buffer size =  655.50 MB(VRAM) (988 tensors)
[DEBUG] model.cpp:1891 - loading tensors from ../models/loras/flux/FLUXTASTIC_V3.safetensors
  |==================================================| 988/988 - 10400.00it/s
[DEBUG] lora.hpp:160  - lora type: ".lora_A"/".lora_B"
[DEBUG] lora.hpp:162  - finished loaded lora
[DEBUG] lora.hpp:832  - (988 / 988) LoRA tensors applied successfully
[DEBUG] ggml_extend.hpp:1192 - lora compute buffer size: 927.00 MB(VRAM)
[DEBUG] lora.hpp:832  - (988 / 988) LoRA tensors applied successfully
/home/markus/code/stable-diffusion.cpp/ggml/src/ggml-cuda/getrows.cu:201: ggml_cuda_get_rows_switch_src0_type: unsupported src0 type: q2_K

[New LWP 88030]
[New LWP 87975]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0x00007f3bb229f042 in ?? () from /usr/lib/libc.so.6
#0  0x00007f3bb229f042 in ?? () from /usr/lib/libc.so.6
#1  0x00007f3bb22931ac in ?? () from /usr/lib/libc.so.6
#2  0x00007f3bb22931f4 in ?? () from /usr/lib/libc.so.6
#3  0x00007f3bb2303dcf in wait4 () from /usr/lib/libc.so.6
#4  0x000055f7298c71c7 in ggml_print_backtrace ()
#5  0x000055f7293b6419 in ggml_abort ()
#6  0x000055f7298953e6 in get_rows_cuda(void const*, ggml_type, int const*, void*, ggml_type, long, unsigned long, unsigned long, unsigned long, long, long, long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, ihipStream_t*) ()
#7  0x000055f72989557f in ggml_cuda_op_get_rows(ggml_backend_cuda_context&, ggml_tensor*) ()
#8  0x000055f72961693a in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) ()
#9  0x000055f7298de567 in ggml_backend_graph_compute ()
#10 0x000055f7294868ca in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) ()
#11 0x000055f7294654ba in LoraModel::apply(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ggml_tensor*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ggml_tensor*> > >, SDVersion, int) ()
#12 0x000055f72950cb39 in StableDiffusionGGML::apply_lora(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, float) ()
#13 0x000055f729465362 in StableDiffusionGGML::apply_loras(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, float, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, float> > > const&) ()
#14 0x000055f72943fdeb in generate_image_internal(sd_ctx_t*, ggml_context*, ggml_tensor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, sd_guidance_params_t, float, int, int, sample_method_t, std::vector<float, std::allocator<float> > const&, long, int, sd_image_t const*, float, float, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, ggml_tensor*, ggml_tensor*) ()
#15 0x000055f729446897 in generate_image ()
#16 0x000055f7293cdaa0 in main ()
[Inferior 1 (process 87974) detached]
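For context: the abort message at the end of the log ("unsupported src0 type: q2_K" in ggml_cuda_get_rows_switch_src0_type) and the backtrace frames through LoraModel::apply indicate the crash happens while the LoRA-apply graph runs a get_rows operation on the q2_K-quantized Flux weights, which the CUDA/ROCm get_rows kernel does not handle. A possible workaround, sketched below and untested here, is to run the same invocation with a less aggressive quant of the diffusion model so the LoRA-apply graph never touches a q2_K tensor. The Q8_0 file name is illustrative, not taken from the log.

```shell
# Sketch of a possible workaround: same command as above, but with a
# higher-precision quant of the Flux model (file name is hypothetical).
# The LoRA-apply step then operates on a tensor type the CUDA/ROCm
# get_rows kernel supports, avoiding the q2_K abort.
./bin/sd --diffusion-model ../models/unet/flux/flux1-schnell-Q8_0.gguf \
  --vae ../models/vae/FluxVAE.safetensors \
  --clip_l ../models/clip/clip_l.safetensors \
  --t5xxl ../models/clip/t5xxl_fp8_e4m3fn.safetensors \
  --lora-model-dir ../models/loras/flux \
  --prompt "Cute cat <lora:FLUXTASTIC_V3:0.9>" \
  --cfg-scale 1.0 \
  --sampling-method euler \
  -v --steps 4 --width 1024 --height 1024 --seed -1 --vae-tiling \
  --output "./output/SD_cpp_$(date +%Y%m%d_%H%M%S).png"
```

All flags are the ones already used in the report; only the model file differs. Whether q2_K LoRA application is meant to be supported at all is the open question of this issue.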
