
Conversation

@wbruna (Contributor) commented Oct 5, 2025

For #851. Allow the model loading logic to tolerate missing layers, which is enough to run the 12B Pruning variant:

https://huggingface.co/OPPOer/Qwen-Image-Pruning

Tested with the Q4_K_M quant from https://huggingface.co/wsbagnsv1/Qwen-Image-Pruning-GGUF :

[output image: teste_1759693079]
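
A minimal sketch of the approach, with hypothetical names and simplified name handling (not necessarily how this PR implements it): instead of treating every expected-but-missing tensor as a fatal error, the loader can classify missing transformer-block tensors as pruned and skip those blocks, while still failing hard on any other missing tensor.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: given the tensor names the full architecture expects
// and the names actually present in the file, return the indices of the
// blocks that were pruned away. A missing tensor that does not belong to a
// transformer block remains a hard error.
std::set<int> find_pruned_blocks(const std::vector<std::string> & expected,
                                 const std::set<std::string> & present) {
    std::set<int> pruned;
    const std::string prefix = "transformer_blocks.";
    for (const auto & name : expected) {
        if (present.count(name)) {
            continue;
        }
        // e.g. "model.diffusion_model.transformer_blocks.40.attn.add_k_proj.bias"
        auto pos = name.find(prefix);
        if (pos == std::string::npos) {
            throw std::runtime_error("tensor '" + name + "' not in model file");
        }
        // std::stoi stops at the first non-digit, so "40.attn..." parses as 40.
        pruned.insert(std::stoi(name.substr(pos + prefix.size())));
    }
    return pruned; // the caller skips building these blocks entirely
}
```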

@wbruna (Contributor, Author) commented Oct 5, 2025

Quality seems a little worse than the Lightning model, with ~30% less peak VRAM usage, and similar speed gains.

wbruna added a commit to wbruna/llama.cpp that referenced this pull request Oct 6, 2025
wbruna added a commit to wbruna/llama.cpp that referenced this pull request Oct 9, 2025
wbruna added a commit to wbruna/llama.cpp that referenced this pull request Oct 9, 2025
wbruna added a commit to wbruna/llama.cpp that referenced this pull request Oct 10, 2025
LostRuins pushed a commit to LostRuins/koboldcpp that referenced this pull request Oct 10, 2025
@wbruna (Contributor, Author) commented Oct 10, 2025

@leejet, looks like a123e25 from the qwen_image_edit branch is enough to support the '13b' pruned model. Thanks!

The '12b' variant still doesn't work, though, maybe because it has non-contiguous layers. I guess they're keeping the non-pruned layers under the same numbers they have in the original model.
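
If that's the case, the block indices present in the file form a sparse set rather than a 0..N-1 range. A rough sketch (hypothetical helper, not code from this PR) of how a loader could recover that set from the tensor names:

```cpp
#include <set>
#include <string>
#include <vector>

// Collect the transformer-block indices that actually occur in the file,
// e.g. {0, 1, 3, 7, ...} for a pruned model that kept the original numbering.
std::set<int> block_indices(const std::vector<std::string> & tensor_names) {
    std::set<int> idx;
    const std::string prefix = "transformer_blocks.";
    for (const auto & name : tensor_names) {
        auto pos = name.find(prefix);
        if (pos != std::string::npos) {
            idx.insert(std::stoi(name.substr(pos + prefix.size())));
        }
    }
    return idx;
}
```

Building the graph over this set, instead of over a plain block count, would cover both a renumbered layout and a gappy one.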

wbruna marked this pull request as draft, October 10, 2025 15:43
@LostRuins (Contributor) commented
@wbruna I think it might be a problem with the GGUF quant, not the model. Look at transformer block 18 in the GGUF https://huggingface.co/wsbagnsv1/Qwen-Image-Pruning-GGUF/tree/main?show_file_info=Qwen-Image-Pruning-12b-Q4_0.gguf

versus the original https://huggingface.co/OPPOer/Qwen-Image-Pruning/tree/main/Qwen-Image-12B/transformer?show_file_info=Qwen-Image-12B%2Ftransformer%2Fdiffusion_pytorch_model-00002-of-00003.safetensors

[two screenshots comparing the transformer block 18 tensors in the GGUF quant and in the original safetensors]

I've reported it to the GGUF quant author, but he doesn't seem to get what I mean: https://huggingface.co/wsbagnsv1/Qwen-Image-Pruning-GGUF/discussions/1
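
For anyone who wants to check independently, the block-18 tensor names can be dumped straight from the GGUF file. A small sketch using ggml's gguf API (assuming a recent ggml with the standalone gguf.h header; error handling kept minimal):

```cpp
#include <cstdio>
#include <cstring>
#include "gguf.h"

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }
    struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) {
        fprintf(stderr, "failed to open %s\n", argv[1]);
        return 1;
    }
    // Print every tensor belonging to transformer block 18.
    for (int64_t i = 0; i < gguf_get_n_tensors(ctx); ++i) {
        const char * name = gguf_get_tensor_name(ctx, i);
        if (strstr(name, "transformer_blocks.18.") != nullptr) {
            printf("%s\n", name);
        }
    }
    gguf_free(ctx);
    return 0;
}
```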

wbruna mentioned this pull request, Oct 12, 2025
@wbruna (Contributor, Author) commented Oct 12, 2025

> @leejet, looks like a123e25 from the qwen_image_edit branch is enough to support the '13b' pruned model. Thanks!

It's not working anymore after d21d1aa (I didn't check exactly which revision broke it). I'll retest and update this PR.

@LostRuins (Contributor) commented

What do you mean by "not working anymore"? Is it still generating an image? It seems to work for me.

wbruna force-pushed the qwen_image_pruning branch from b9d7b2b to 9bc2e3c, October 12, 2025 10:34
wbruna marked this pull request as ready for review, October 12, 2025 10:35
@wbruna (Contributor, Author) commented Oct 12, 2025

> What do you mean by "not working anymore"? Is it still generating an image? It seems to work for me.

This PR still works; I was referring to my previous comment, which mentioned that a123e25 (from the qwen_image branch itself) made this PR unnecessary.

@leejet (Owner) commented Oct 12, 2025

Support for a dynamic number of Qwen image transformer blocks is available in the qwen_image_edit branch.
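
(A hedged sketch of the idea, not necessarily the actual implementation: the block count can be derived at load time from the highest block index seen in the tensor names, which works as long as the indices are contiguous from 0.)

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Derive the number of transformer blocks from the tensor names instead of
// hard-coding it; assumes block indices are contiguous starting at 0.
int count_blocks(const std::vector<std::string> & tensor_names) {
    int max_idx = -1;
    const std::string prefix = "transformer_blocks.";
    for (const auto & name : tensor_names) {
        auto pos = name.find(prefix);
        if (pos != std::string::npos) {
            max_idx = std::max(max_idx, std::stoi(name.substr(pos + prefix.size())));
        }
    }
    return max_idx + 1;
}
```

A model with gaps in the numbering would still defeat a plain count, which matches the 12b behavior discussed above.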

@wbruna (Contributor, Author) commented Oct 12, 2025

> Support for a dynamic number of Qwen image transformer blocks is available in the qwen_image_edit branch.

I tested it with the same model I used to test this PR (the 13b; they renamed it afterwards):

[DEBUG] model.cpp:2088 - loading tensors from /opt/sdif/models/SD/Qwen-Image-Pruning-Q4_K_M.gguf
  |===================================>              | 1293/1825 - 45.33it/s
[DEBUG] model.cpp:2088 - loading tensors from /opt/llm/Qwen2.5-VL-7B-Instruct-IQ4_XS.gguf
  |============================================>     | 1631/1825 - 43.55it/s
[DEBUG] model.cpp:2088 - loading tensors from /opt/sdif/models/VAE/Qwen_Image-VAE.safetensors
  |============================================>     | 1634/1825 - 43.63it/s
[INFO ] model.cpp:2358 - unknown tensor 'first_stage_model.conv1.bias | bf16 | 1 [32, 1, 1, 1, 1]' in model file
[INFO ] model.cpp:2358 - unknown tensor 'first_stage_model.conv1.weight | bf16 | 4 [1, 1, 1, 1024, 1]' in model file
  |==================================================| 1825/1825 - 48.21it/s
[INFO ] model.cpp:2326 - loading tensors completed, taking 37.90s (process: 0.05s, read: 24.94s, memcpy: 0.00s, convert: 0.47s, copy_to_backend: 12.08s)
[ERROR] model.cpp:2399 - tensor 'model.diffusion_model.transformer_blocks.40.attn.add_k_proj.bias' not in model file
[ERROR] model.cpp:2399 - tensor 'model.diffusion_model.transformer_blocks.40.attn.add_k_proj.weight' not in model file
[ERROR] model.cpp:2399 - tensor 'model.diffusion_model.transformer_blocks.40.attn.add_q_proj.bias' not in model file
[ERROR] model.cpp:2399 - tensor 'model.diffusion_model.transformer_blocks.40.attn.add_q_proj.weight' not in model file
[ERROR] model.cpp:2399 - tensor 'model.diffusion_model.transformer_blocks.40.attn.add_v_proj.bias' not in model file
(...)

@leejet (Owner) commented Oct 12, 2025

Please use the latest code from PR #877.

@wbruna (Contributor, Author) commented Oct 12, 2025

Oh, I see: I forgot that rev was on the qwen_image_edit branch 🤦

@LostRuins, btw: if you're already syncing with that one, feel free to drop my changes; they shouldn't be necessary.

wbruna marked this pull request as draft, October 12, 2025 11:09
leejet deleted the branch leejet:qwen_image, October 12, 2025 16:23
leejet closed this, Oct 12, 2025
@leejet (Owner) commented Oct 12, 2025

I noticed that this PR was automatically closed after the qwen_image branch was deleted.
If the related changes are still relevant, please feel free to reopen it for further review.

@wbruna (Contributor, Author) commented Oct 12, 2025

> I noticed that this PR was automatically closed after the qwen_image branch was deleted. If the related changes are still relevant, please feel free to reopen it for further review.

Thanks. I was keeping this open just to investigate the 12b variant further, but I could just open another PR for that instead.
