[Bug] Separate tiles / channels #1455

@WhyNotHugo

Description

Git commit

$ git rev-parse HEAD
c97702e1057c2fe13a7074cd9069cb9dd6edc1bf

Operating System & Version

Alpine Linux Edge

GGML backends

Vulkan

Command-line arguments used

./build/bin/sd-cli \
	--diffusion-model ../flux1-dev-q3_k.gguf \
	--vae ../ae.safetensors \
	--clip_l ../clip_l.safetensors \
	--t5xxl ../t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'lol'" \
	--cfg-scale 1.0 \
	--sampling-method euler -v \
	--clip-on-cpu

Steps to reproduce

  1. Download the exact files listed in the docs: https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/flux.md#download-weights
  2. Generate an image with the above command

What you expected to happen

The image is generated correctly.

What actually happened

The output image is wrong: the color channels appear to be rendered separately, as distinct tiles within the frame, instead of being combined into a single picture.

[image attachment]

I'm sure there's a specific term to describe this. I don't know it.
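One possible reading of the artifact (my guess, not confirmed by the maintainers): the pixel buffer is written in planar order (all R values, then all G, then all B) while the PNG writer expects interleaved RGB, which makes the three channels show up as separate tiles. A minimal sketch of the difference between the two layouts, with a hypothetical `interleave` helper:

```python
def interleave(planar: bytes, width: int, height: int) -> bytes:
    """Convert a planar RGB buffer (RRR...GGG...BBB) to an
    interleaved one (RGBRGB...). Purely illustrative; this is
    not code from stable-diffusion.cpp."""
    n = width * height
    r, g, b = planar[:n], planar[n:2 * n], planar[2 * n:3 * n]
    out = bytearray(3 * n)
    for i in range(n):
        out[3 * i] = r[i]      # red sample for pixel i
        out[3 * i + 1] = g[i]  # green sample for pixel i
        out[3 * i + 2] = b[i]  # blue sample for pixel i
    return bytes(out)
```

If a writer that assumes interleaved data is handed a planar buffer, each third of the image comes out as one channel's plane, which would match the "separate tiles / channels" look.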

Logs / error messages / stack trace

[DEBUG] main.cpp:549  - version: stable-diffusion.cpp version master-586-c97702e, commit c97702e
[DEBUG] main.cpp:550  - System Info: 
    SSE3 = 1 |     AVX = 1 |     AVX2 = 1 |     AVX512 = 0 |     AVX512_VBMI = 0 |     AVX512_VNNI = 0 |     FMA = 1 |     NEON = 0 |     ARM_FMA = 0 |     F16C = 1 |     FP16_VA = 0 |     WASM_SIMD = 0 |     VSX = 0 | 
[DEBUG] main.cpp:551  - SDCliParams {
  mode: img_gen,
  output_path: "output.png",
  image_path: "",
  metadata_format: "text",
  verbose: true,
  color: false,
  canny_preprocess: false,
  convert_name: false,
  preview_method: none,
  preview_interval: 1,
  preview_path: "preview.png",
  preview_fps: 16,
  taesd_preview: false,
  preview_noisy: false,
  metadata_raw: false,
  metadata_brief: false,
  metadata_all: false
}
[DEBUG] main.cpp:552  - SDContextParams {
  n_threads: 12,
  model_path: "",
  clip_l_path: "../clip_l.safetensors",
  clip_g_path: "",
  clip_vision_path: "",
  t5xxl_path: "../t5xxl_fp16.safetensors",
  llm_path: "",
  llm_vision_path: "",
  diffusion_model_path: "../flux1-dev-q3_k.gguf",
  high_noise_diffusion_model_path: "",
  vae_path: "../ae.safetensors",
  taesd_path: "",
  esrgan_path: "",
  control_net_path: "",
  embedding_dir: "",
  embeddings: {
  }
  wtype: NONE,
  tensor_type_rules: "",
  lora_model_dir: ".",
  hires_upscalers_dir: "",
  photo_maker_path: "",
  rng_type: cuda,
  sampler_rng_type: NONE,
  offload_params_to_cpu: false,
  enable_mmap: false,
  control_net_cpu: false,
  clip_on_cpu: true,
  vae_on_cpu: false,
  flash_attn: false,
  diffusion_flash_attn: false,
  diffusion_conv_direct: false,
  vae_conv_direct: false,
  circular: false,
  circular_x: false,
  circular_y: false,
  chroma_use_dit_mask: true,
  qwen_image_zero_cond_t: false,
  chroma_use_t5_mask: false,
  chroma_t5_mask_pad: 1,
  prediction: NONE,
  lora_apply_mode: auto,
  force_sdxl_vae_conv_scale: false
}
[DEBUG] main.cpp:553  - SDGenerationParams {
  loras: "{
  }",
  high_noise_loras: "{
  }",
  prompt: "a lovely cat holding a sign says 'lol'",
  negative_prompt: "",
  clip_skip: -1,
  width: -1,
  height: -1,
  batch_count: 1,
  init_image_path: "",
  end_image_path: "",
  mask_image_path: "",
  control_image_path: "",
  ref_image_paths: [],
  control_video_path: "",
  auto_resize_ref_image: true,
  increase_ref_index: false,
  pm_id_images_dir: "",
  pm_id_embed_path: "",
  pm_style_strength: 20,
  skip_layers: [7, 8, 9],
  sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 0, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: euler, sample_steps: 20, eta: inf, shifted_timestep: 0, flow_shift: inf),
  high_noise_skip_layers: [7, 8, 9],
  high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 0, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: inf, shifted_timestep: 0, flow_shift: inf),
  custom_sigmas: [],
  cache_mode: "",
  cache_option: "",
  cache: disabled (threshold=inf, start=0.15, end=0.95),
  moe_boundary: 0.875,
  video_frames: 1,
  fps: 16,
  vace_strength: 1,
  strength: 0.75,
  control_strength: 0.9,
  seed: 42,
  upscale_repeats: 1,
  upscale_tile_size: 128,
  hires: { enabled: false, upscaler: "Latent (nearest)", model_path: "", scale: 2, target_width: 0, target_height: 0, steps: 0, denoising_strength: 0.7, upscale_tile_size: 128 },
  vae_tiling_params: { 0, 0, 0, 0.5, 0, 0 },
}
[DEBUG] stable-diffusion.cpp:184  - Using Vulkan backend
[DEBUG] ggml_extend.hpp:78   - ggml_vulkan: Found 1 Vulkan devices:
[DEBUG] ggml_extend.hpp:78   - ggml_vulkan: 0 = AMD Radeon RX 7900 XT (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
[INFO ] stable-diffusion.cpp:205  - Vulkan: Using device 0
[INFO ] stable-diffusion.cpp:270  - loading diffusion model from '../flux1-dev-q3_k.gguf'
[INFO ] model.cpp:229  - load ../flux1-dev-q3_k.gguf using gguf format
[DEBUG] model.cpp:278  - init from '../flux1-dev-q3_k.gguf'
[INFO ] stable-diffusion.cpp:286  - loading clip_l from '../clip_l.safetensors'
[INFO ] model.cpp:232  - load ../clip_l.safetensors using safetensors format
[DEBUG] model.cpp:307  - init from '../clip_l.safetensors', prefix = 'text_encoders.clip_l.transformer.'
[INFO ] stable-diffusion.cpp:310  - loading t5xxl from '../t5xxl_fp16.safetensors'
[INFO ] model.cpp:232  - load ../t5xxl_fp16.safetensors using safetensors format
[DEBUG] model.cpp:307  - init from '../t5xxl_fp16.safetensors', prefix = 'text_encoders.t5xxl.transformer.'
[INFO ] stable-diffusion.cpp:331  - loading vae from '../ae.safetensors'
[INFO ] model.cpp:232  - load ../ae.safetensors using safetensors format
[DEBUG] model.cpp:307  - init from '../ae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:356  - Version: Flux 
[INFO ] stable-diffusion.cpp:384  - Weight type stat:                      f32: 720  |     f16: 415  |    q3_K: 304  
[INFO ] stable-diffusion.cpp:385  - Conditioner weight type stat:          f16: 415  
[INFO ] stable-diffusion.cpp:386  - Diffusion model weight type stat:      f32: 476  |    q3_K: 304  
[INFO ] stable-diffusion.cpp:387  - VAE weight type stat:                  f32: 244  
[DEBUG] stable-diffusion.cpp:389  - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:434  - CLIP: Using CPU backend
[DEBUG] clip_tokenizer.cpp:65   - vocab size: 49408
[INFO ] flux.hpp:1283 - flux: depth = 19, depth_single_blocks = 38, guidance_embed = true, context_in_dim = 4096, hidden_size = 3072, num_heads = 24
[DEBUG] ggml_extend.hpp:2046 - clip params backend buffer size =  235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:2046 - t5 params backend buffer size =  9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:2046 - flux params backend buffer size =  5105.72 MB(VRAM) (780 tensors)
[INFO ] stable-diffusion.cpp:682  - using VAE for encoding / decoding
[INFO ] auto_encoder_kl.hpp:517  - vae decoder: ch = 128
[DEBUG] ggml_extend.hpp:2046 - vae params backend buffer size =  94.57 MB(VRAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:806  - loading weights
[DEBUG] model.cpp:755  - using 12 threads for model loading
[DEBUG] model.cpp:777  - loading tensors from ../flux1-dev-q3_k.gguf
  |===========================>                      | 780/1439 - 8.15GB/s
[DEBUG] model.cpp:777  - loading tensors from ../clip_l.safetensors
  |=================================>                | 976/1439 - 6.41GB/s
[DEBUG] model.cpp:777  - loading tensors from ../t5xxl_fp16.safetensors
  |=========================================>        | 1195/1439 - 11.13GB/s
[DEBUG] model.cpp:777  - loading tensors from ../ae.safetensors
  |==================================================| 1439/1439 - 9.74GB/s
[INFO ] model.cpp:1006 - loading tensors completed, taking 1.47s (process: 0.00s, read: 0.55s, memcpy: 0.00s, convert: 0.01s, copy_to_backend: 0.40s)
[DEBUG] stable-diffusion.cpp:846  - finished loaded file
[INFO ] stable-diffusion.cpp:898  - total params memory size = 14519.13MB (VRAM 5200.30MB, RAM 9318.83MB): text_encoders 9318.83MB(RAM), diffusion_model 5105.72MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:987  - running in Flux FLOW mode
[INFO ] stable-diffusion.cpp:3320 - generate_image 512x512
[INFO ] denoiser.hpp:499  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:2835 - sampling using Euler method
[DEBUG] conditioner.hpp:1157 - parse 'a lovely cat holding a sign says 'lol'' to [['a lovely cat holding a sign says 'lol'', 1], ]
[DEBUG] bpe_tokenizer.cpp:183  - split prompt "a lovely cat holding a sign says 'lol'" to tokens ["a</w>", "lovely</w>", "cat</w>", "holding</w>", "a</w>", "sign</w>", "says</w>", "'</w>", "lol</w>", "'</w>", ]
[DEBUG] t5_unigram_tokenizer.cpp:336  - split prompt "a lovely cat holding a sign says 'lol'" to tokens ["▁", "a", "▁lovely", "▁cat", "▁holding", "▁", "a", "▁sign", "▁says", "▁", "'", "l", "o", "l", "'", ]
[DEBUG] clip.hpp:318  - identity projection
[DEBUG] ggml_extend.hpp:1859 - clip compute buffer size: 1.42 MB(RAM)
[DEBUG] clip.hpp:318  - identity projection
[DEBUG] ggml_extend.hpp:1859 - t5 compute buffer size: 68.25 MB(RAM)
[DEBUG] conditioner.hpp:1272 - computing condition graph completed, taking 6017 ms
[INFO ] stable-diffusion.cpp:3189 - get_learned_condition completed, taking 6.02s
[INFO ] stable-diffusion.cpp:3354 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1859 - flux compute buffer size: 341.50 MB(VRAM)
  |==================================================| 20/20 - 1.30it/s
[INFO ] stable-diffusion.cpp:3385 - sampling completed, taking 15.37s
[INFO ] stable-diffusion.cpp:3403 - generating 1 latent images completed, taking 15.38s
[INFO ] stable-diffusion.cpp:3213 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1859 - vae compute buffer size: 1984.25 MB(VRAM)
[DEBUG] vae.hpp:206  - computing vae decode graph completed, taking 2.58s
[INFO ] stable-diffusion.cpp:3229 - latent 1 decoded, taking 2.58s
[INFO ] stable-diffusion.cpp:3233 - decode_first_stage completed, taking 2.58s
[INFO ] stable-diffusion.cpp:3540 - generate_image completed in 24.17s
[INFO ] main.cpp:440  - save result image 0 to 'output.png' (success)
[INFO ] main.cpp:489  - 1/1 images saved

Additional context / environment details

I tried other models and got the exact same result. I don't think the models are the issue, because the output always looks like the example above.

Metadata

Labels: bug (Something isn't working)