
[Bug] Black Output images on FLUX kontext when reference image is "big resolution" #894

@pedroCabrera

Description


Git commit

40a6a87

Operating System & Version

Windows 11

GGML backends

CUDA

Command-line arguments used

sd.exe -r "C:/Users/pedro/Downloads/flux1-dev-q8_0 (1)_SD.png" --diffusion-model "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf" --vae "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors" --clip_l "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors" --t5xxl "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors" -p "change 'flux.cpp' to 'kontext.cpp'" --cfg-scale 1.0 --sampling-method euler -v -W 1920 -H 1088 --diffusion-fa --vae-tiling --steps 15 --offload-to-cpu

Steps to reproduce

Hi, I'm experimenting with FLUX Kontext and I found an interesting bug: while I can generate "big" images (say 1920*1088, which is supposed to be the maximum allowed by Flux), that only works if my reference image is not that big.

Let me explain the different situations:
output -> 1920*1088 + image ref -> 960*544 -> OK (I get an image, though the result itself is not good)

Image

output -> 1920*1088 + image ref -> 1920*1088 -> BLACK OUTPUT

Image

output -> 960*544 + image ref -> 960*544 -> OK

Image

output -> 960*544 + image ref -> 1920*1088 -> BLACK OUTPUT

Image

The reference images are:
SD
Image
HD
Image
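For reference, a rough sequence-length count for the four cases above. This is a sketch under the usual Flux assumptions (the VAE downscales by 8, the DiT patchifies latents 2x2, and Kontext appends the reference image's tokens to the sequence), not measured from the actual code:

```python
# Rough DiT sequence-length math for the four cases above, assuming the
# standard Flux constants: VAE downscale factor 8, DiT patch size 2,
# and Kontext concatenating reference-image tokens onto the sequence.
def img_tokens(w, h, vae_factor=8, patch=2):
    """Number of DiT tokens for a w x h image."""
    return (w // (vae_factor * patch)) * (h // (vae_factor * patch))

cases = [
    ((1920, 1088), (960, 544)),    # OK (degraded result)
    ((1920, 1088), (1920, 1088)),  # BLACK OUTPUT
    ((960, 544),   (960, 544)),    # OK
    ((960, 544),   (1920, 1088)),  # BLACK OUTPUT
]
for out_res, ref_res in cases:
    total = img_tokens(*out_res) + img_tokens(*ref_res)
    print(out_res, "+ ref", ref_res, "->", total, "tokens")
```

Interestingly, under these assumptions both black-output cases are exactly the ones where the reference image alone contributes 8160 tokens, while cases 1 and 4 have the same 10200-token total but only case 4 fails. So it looks like something size-dependent on the reference-image path, not total sequence length.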

What you expected to happen

If generating images at 1920*1088 is supported (which it is), it should be possible to feed in reference images of the same resolution.

What actually happened

Black outputs whenever the reference image is large (1920*1088).

Logs / error messages / stack trace

sd.exe -r "C:/Users/pedro/Downloads/flux1-dev-q8_0 (1)_HD.png" --diffusion-model "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf" --vae "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors" --clip_l "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors" --t5xxl "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors" -p "change 'flux.cpp' to 'kontext.cpp'" --cfg-scale 1.0 --sampling-method euler -v -W 1920 -H 1088 --diffusion-fa --vae-tiling --steps 15 --offload-to-cpu
Option:
n_threads: 12
mode: img_gen
model_path:
wtype: unspecified
clip_l_path: F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors
clip_g_path:
clip_vision_path:
t5xxl_path: F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors
qwen2vl_path:
qwen2vl_vision_path:
diffusion_model_path: F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf
high_noise_diffusion_model_path:
vae_path: F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
photo_maker_path:
pm_id_images_dir:
pm_id_embed_path:
pm_style_strength: 20.00
output_path: output.png
init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths:
C:/Users/pedro/Downloads/flux1-dev-q8_0 (1)_HD.png
control_video_path:
increase_ref_index: false
offload_params_to_cpu: true
clip_on_cpu: false
control_net_cpu: false
vae_on_cpu: false
diffusion flash attention: true
diffusion Conv2d direct: false
vae_conv_direct: false
control_strength: 0.90
prompt: change 'flux.cpp' to 'kontext.cpp'
negative_prompt:
clip_skip: -1
width: 1920
height: 1088
sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: euler, sample_steps: 15, eta: 0.00, shifted_timestep: 0)
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
moe_boundary: 0.875
prediction: default
flow_shift: inf
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: true
force_sdxl_vae_conv_scale: false
upscale_repeats: 1
upscale_tile: 128
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
video_frames: 1
vace_strength: 1.00
fps: 16
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:147 - Using CUDA backend
[INFO ] ggml_extend.hpp:69 - ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[INFO ] ggml_extend.hpp:69 - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[INFO ] ggml_extend.hpp:69 - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:69 - Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:211 - loading diffusion model from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf'
[INFO ] model.cpp:1098 - load F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf using gguf format
[DEBUG] model.cpp:1115 - init from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf'
[INFO ] stable-diffusion.cpp:227 - loading clip_l from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors'
[INFO ] model.cpp:1101 - load F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors using safetensors format
[DEBUG] model.cpp:1208 - init from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors', prefix = 'text_encoders.clip_l.transformer.'
[INFO ] stable-diffusion.cpp:251 - loading t5xxl from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors'
[INFO ] model.cpp:1101 - load F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors using safetensors format
[DEBUG] model.cpp:1208 - init from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors', prefix = 'text_encoders.t5xxl.transformer.'
[INFO ] stable-diffusion.cpp:272 - loading vae from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors'
[INFO ] model.cpp:1101 - load F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors using safetensors format
[DEBUG] model.cpp:1208 - init from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:293 - Version: Flux
[INFO ] stable-diffusion.cpp:324 - Weight type: q4_0
[INFO ] stable-diffusion.cpp:325 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:326 - Diffusion model weight type: q4_0
[INFO ] stable-diffusion.cpp:327 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:329 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:356 - Using flash attention in the diffusion model
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[INFO ] flux.hpp:916 - Flux blocks: 19 double, 38 single
[DEBUG] ggml_extend.hpp:1758 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1758 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1758 - flux params backend buffer size = 6482.39 MB(RAM) (780 tensors)
[DEBUG] ggml_extend.hpp:1758 - vae params backend buffer size = 160.00 MB(RAM) (244 tensors)
[DEBUG] stable-diffusion.cpp:604 - loading weights
[DEBUG] model.cpp:2031 - using 12 threads for model loading
[DEBUG] model.cpp:2114 - loading tensors from F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf
|===========================> | 780/1439 - 636.73it/s
[DEBUG] model.cpp:2114 - loading tensors from F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors
|=================================> | 976/1439 - 683.47it/s
[DEBUG] model.cpp:2114 - loading tensors from F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors
|=========================================> | 1195/1439 - 390.52it/s
[DEBUG] model.cpp:2114 - loading tensors from F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors
|==================================================| 1439/1439 - 440.74it/s
[INFO ] model.cpp:2358 - loading tensors completed, taking 3.27s (process: 0.01s, read: 2.69s, memcpy: 0.00s, convert: 0.01s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:702 - total params memory size = 15961.23MB (VRAM 15961.23MB, RAM 0.00MB): text_encoders 9318.83MB(VRAM), diffusion_model 6482.39MB(VRAM), vae 160.00MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:769 - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:811 - finished loaded file
[DEBUG] stable-diffusion.cpp:2481 - generate_image 1920x1088
[INFO ] stable-diffusion.cpp:2608 - TXT2IMG
[INFO ] stable-diffusion.cpp:2632 - EDIT mode
[DEBUG] stable-diffusion.cpp:1526 - VAE Tile size: 41x41
[DEBUG] ggml_extend.hpp:832 - num tiles : 10, 5
[DEBUG] ggml_extend.hpp:833 - optimal overlap : 0.460705, 0.420732 (targeting 0.500000)
[DEBUG] ggml_extend.hpp:866 - tile work buffer size: 1.44 MB
[INFO ] ggml_extend.hpp:879 - processing 50 tiles
[INFO ] ggml_extend.hpp:1682 - vae offload params (160.00 MB, 244 tensors) to runtime backend (CUDA0), taking 0.10s
[DEBUG] ggml_extend.hpp:1582 - vae compute buffer size: 348.22 MB(VRAM)
|==================================================| 50/50 - 62.50it/s
[DEBUG] stable-diffusion.cpp:1550 - computing vae encode graph completed, taking 1.07s
[INFO ] stable-diffusion.cpp:2680 - encode_first_stage completed, taking 1.36s
[INFO ] stable-diffusion.cpp:960 - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:980 - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:981 - prompt after extract and remove lora: "change 'flux.cpp' to 'kontext.cpp'"
[DEBUG] conditioner.hpp:1039 - parse 'change 'flux.cpp' to 'kontext.cpp'' to [['change 'flux.cpp' to 'kontext.cpp'', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:402 - token length: 256
[INFO ] ggml_extend.hpp:1682 - clip offload params (235.06 MB, 196 tensors) to runtime backend (CUDA0), taking 0.04s
[DEBUG] clip.hpp:741 - identity projection
[DEBUG] ggml_extend.hpp:1582 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] clip.hpp:741 - identity projection
[INFO ] ggml_extend.hpp:1682 - t5 offload params (9083.77 MB, 219 tensors) to runtime backend (CUDA0), taking 0.90s
[DEBUG] ggml_extend.hpp:1582 - t5 compute buffer size: 68.25 MB(VRAM)
[DEBUG] conditioner.hpp:1158 - computing condition graph completed, taking 1014 ms
[INFO ] stable-diffusion.cpp:2219 - get_learned_condition completed, taking 1017 ms
[INFO ] stable-diffusion.cpp:2244 - sampling using Euler method
[INFO ] stable-diffusion.cpp:2338 - generating image: 1/1 - seed 42
[INFO ] ggml_extend.hpp:1682 - flux offload params (6482.39 MB, 780 tensors) to runtime backend (CUDA0), taking 0.65s
[DEBUG] ggml_extend.hpp:1582 - flux compute buffer size: 5232.50 MB(VRAM)
|==================================================| 15/15 - 5.07s/it
[INFO ] stable-diffusion.cpp:2375 - sampling completed, taking 76.38s
[INFO ] stable-diffusion.cpp:2383 - generating 1 latent images completed, taking 76.86s
[INFO ] stable-diffusion.cpp:2386 - decoding 1 latents
[DEBUG] stable-diffusion.cpp:1651 - VAE Tile size: 32x32
[DEBUG] ggml_extend.hpp:832 - num tiles : 14, 7
[DEBUG] ggml_extend.hpp:833 - optimal overlap : 0.500000, 0.458333 (targeting 0.500000)
[DEBUG] ggml_extend.hpp:866 - tile work buffer size: 0.81 MB
[INFO ] ggml_extend.hpp:879 - processing 98 tiles
[INFO ] ggml_extend.hpp:1682 - vae offload params (160.00 MB, 244 tensors) to runtime backend (CUDA0), taking 0.05s
[DEBUG] ggml_extend.hpp:1582 - vae compute buffer size: 416.06 MB(VRAM)
|==================================================| 98/98 - 50.00it/s
[DEBUG] stable-diffusion.cpp:1677 - computing vae decode graph completed, taking 2.13s
[INFO ] stable-diffusion.cpp:2396 - latent 1 decoded, taking 2.13s
[INFO ] stable-diffusion.cpp:2400 - decode_first_stage completed, taking 2.13s
[INFO ] stable-diffusion.cpp:2714 - generate_image completed in 81.39s
save result PNG image to 'output.png'
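As a sanity check on the log above: the tile counts and "optimal overlap" values that stable-diffusion.cpp prints are consistent with a simple per-axis scheme where the tile count is derived from the target overlap and the stride is then redistributed so the tiles span the axis exactly. This is my reconstruction of the numbers, not the actual implementation:

```python
import math

def tiling(size, tile, target_overlap=0.5):
    """Tile count and achieved overlap for one axis.

    Reconstruction of the figures in the log above (my guess at the
    scheme, not the real stable-diffusion.cpp code): choose n from the
    target overlap, then spread the stride so tiles cover `size` exactly.
    """
    stride = tile * (1.0 - target_overlap)
    n = math.floor((size - tile) / stride) + 1
    actual = 1.0 - (size - tile) / ((n - 1) * tile)  # redistributed stride
    return n, actual

# 1920x1088 pixels -> 240x136 latent (VAE factor 8)
print(tiling(240, 41))  # encode, matches log: 10 tiles, overlap ~0.4607
print(tiling(136, 41))  # encode, matches log: 5 tiles,  overlap ~0.4207
print(tiling(240, 32))  # decode, matches log: 14 tiles, overlap 0.5
print(tiling(136, 32))  # decode, matches log: 7 tiles,  overlap ~0.4583
```

Since the tiling math checks out and the VAE encode/decode passes complete without errors, the black frames presumably originate in the sampling step itself (the diffusion model, possibly the flash-attention path enabled by --diffusion-fa), not in the tiled VAE.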

Additional context / environment details

RTX 4090

Metadata

Labels: bug (Something isn't working)