
Conversation

@wbruna

@wbruna wbruna commented Sep 9, 2025

This updates the embedded sd.cpp code to master-306-2abe945 (originally master-c648001 plus a few pending sd.cpp fixes; the target then moved through master-52a97b3, master-0ebe6fe, and master-301-fd693ac as upstream progressed).

For anyone wondering, it does not bring in any new functionality; I'll be happy if everything is still working as before 🙂 The diff is massive, and at several points I had to basically rewrite the needed changes.

I did the sync through the following branches:

The pending PRs below got merged upstream:

For sdtype_adapter.cpp, instead of mirroring each new supported parameter, I decided to clean everything up, keeping only the parameters that can be changed through the Koboldcpp API. The others are either fixed in the code (like batch_count) or left at the defaults set by sd.cpp. Most changes are unrelated to the sd.cpp update, so I could submit them separately if you wish.
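
To make the intent concrete, here's a minimal sketch of that approach; the struct and function names below are illustrative placeholders, not the actual sdtype_adapter.cpp or sd.cpp identifiers:

#include <string>

// Hypothetical sketch only: kcpp_api_request / gen_params / make_gen_params are
// placeholder names, not the real adapter or sd.cpp types.
struct kcpp_api_request {          // fields the Koboldcpp API can actually change
    std::string prompt;
    int steps = 20;
    float cfg_scale = 7.0f;
};

struct gen_params {                // stand-in for sd.cpp's generation parameters
    std::string prompt;
    int steps = 20;
    float cfg_scale = 7.0f;
    int batch_count = 1;
};

static gen_params make_gen_params(const kcpp_api_request &req) {
    gen_params p;                  // anything not set here keeps its sd.cpp default
    p.prompt      = req.prompt;    // only API-exposed fields are copied over
    p.steps       = req.steps;
    p.cfg_scale   = req.cfg_scale;
    p.batch_count = 1;             // fixed in code, never exposed through the API
    return p;
}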

For the other changes, it may be easier to compare the previous diff chunks (added in 6cf0a96df1a8) against the new ones by diffing between the branches above.

The PR is ready in the I'm-not-aware-of-anything-missing sense, but I've only tested it lightly, so I'll leave it as draft for now. I'll be reviewing and testing it for a few days. Feel free to bug me with anything you notice.

@LostRuins
Owner

LostRuins commented Sep 13, 2025

I'll be away for about a week, will try to look at this after. Once you've synced and tested, you can unmark it as draft first.

@LostRuins
Owner

Hmm seems like your previous PR conflicts with this?

@wbruna
Author

wbruna commented Sep 13, 2025

Yeah, I didn't touch the stb headers with this one, so we'll have at least a small conflict on main. I'll fix that one shortly.

Looks like leejet/stable-diffusion.cpp#484 is about to be included upstream, too! I should at least grab the current PR branch.

Edit: all four fixes got merged 🎉 I'll probably just reconstruct the branches.

@wbruna wbruna force-pushed the kcpp_update_sdcpp_2509 branch from 1e596c5 to c99ac4d on September 13, 2025 11:39
@wbruna wbruna force-pushed the kcpp_update_sdcpp_2509 branch from c99ac4d to 9c61e8d on September 14, 2025 12:45
@wbruna wbruna changed the title from "update stable-diffusion.cpp to master-c648001 (+fixes)" to "update stable-diffusion.cpp to master-52a97b3" on Sep 14, 2025
@wbruna
Author

wbruna commented Sep 14, 2025

Removed local changes due to merged upstream PRs:

  • do not force CLIP to F32 (automatic now, since Koboldcpp doesn't enable textual inversions)
  • VAE tiling fixes
  • do not force VAE to F32 on SDXL
  • NULL handling for sd_img_gen_params_t::input_id_images_path
  • model loading progress display throttling (now using upstream version)

Also removed:

And included:

  • forcing T5 to run on CPU (upstream now relies on the clip-on-cpu flag)
    The flag isn't enough for us, because we may not know we need T5 until after the model is loaded (it could be a single 'fat' model file)

@wbruna wbruna force-pushed the kcpp_update_sdcpp_2509 branch from 9c61e8d to 1fdea05 on September 14, 2025 15:04
@wbruna wbruna changed the title from "update stable-diffusion.cpp to master-52a97b3" to "update stable-diffusion.cpp to master-0ebe6fe" on Sep 15, 2025
@wbruna
Author

wbruna commented Sep 15, 2025

master-0ebe6fe changed the Photomaker interface to receive a list of images instead of an image directory, which was exactly what our kcpp_img_gen_params_t did; so we don't need that diff anymore 🎉 I've removed the extra parameter and related processing code from the image generation functions. The only remaining Photomaker-related diff is the code that disables it for non-SDXL models.

// Local (KoboldCpp-side) change: if the conditioner weights are not F32,
// keep the text encoder on the CPU backend; quantized T5 can crash or
// produce black images on some GPU backends (see discussion below).
if (conditioner_wtype != GGML_TYPE_F32) {
    LOG_INFO("CLIP: Forcing CPU backend for T5");
    clip_on_cpu = true;
}
Author

@wbruna wbruna Sep 18, 2025


Not really sure about this. For instance, t5xxl_q8_0.gguf makes black images on ROCm, but works fine on Vulkan. Forcing it to CPU in this case adds around 45s to generation time... OTOH, a Q4_K quant crashes on both.

@wbruna wbruna changed the title from "update stable-diffusion.cpp to master-0ebe6fe" to "update stable-diffusion.cpp to master-301-fd693ac" on Sep 18, 2025
@LostRuins
Owner

I'm back from my trip, so I'll take a look at this over the next few days. Kobold is overdue for a release though, so it'll probably have to wait for the next one.

@wbruna
Author

wbruna commented Sep 21, 2025

Welcome back!

Yeah, the changes are extensive, and there is at least one problematic upstream issue (844) still unsolved. But take a look at the three commits before the sd.cpp updates; they may be simple enough to include right now (I can put them in a separate PR if that helps).

@LostRuins
Owner

I can put them in a separate PR if that helps

Yes please, that would be good

@wbruna wbruna force-pushed the kcpp_update_sdcpp_2509 branch from a48487c to 0bdc8e9 on September 21, 2025 10:09
@wbruna
Author

wbruna commented Sep 21, 2025

Image generation is kinda broken with the ggml version on the 1.99 release 😕 I'll open an upstream issue shortly.

Edit: leejet/stable-diffusion.cpp#847

@LostRuins
Owner

Thanks, I will temporarily revert ggml-org#16056 for a stable patch while this is investigated upstream.

Btw your other PR #1746 caused a merge conflict

@wbruna wbruna force-pushed the kcpp_update_sdcpp_2509 branch from 0bdc8e9 to 0eaf498 on September 21, 2025 15:37
@LostRuins
Owner

@wbruna can you see if prompt adherence works fine in 1.99.1? I reverted the commit you listed.

@wbruna
Author

wbruna commented Sep 21, 2025

@LostRuins , 1.99.1 seems to be working fine 👍

0cc4m's PR (16151) seems to work, too, but I need to run more tests. Edit: test results are good, so I'll go back to the sdtype_adapter refactoring.

@LostRuins
Owner

merged 16151 for 1.99.2

@henk717
Collaborator

henk717 commented Sep 22, 2025

Will this be updated to support the new Qwen model once that PR lands? That's the main thing our users are waiting for.

@wbruna wbruna force-pushed the kcpp_update_sdcpp_2509 branch from 0eaf498 to ff08284 on September 22, 2025 21:33
@wbruna
Author

wbruna commented Sep 22, 2025

Will this be updated to support the new Qwen model once that PR lands? That's the main thing our users are waiting for.

Since there are a lot of changes, I'd be more comfortable if we gave the update a bit of testing before adding support for specific new models on top. Of course, if Qwen-Image doesn't need any other special parameters or treatment, it may be just a matter of merging in the new revisions.

@wbruna wbruna force-pushed the kcpp_update_sdcpp_2509 branch from ff08284 to 8374423 on September 22, 2025 21:59
@wbruna wbruna mentioned this pull request Sep 22, 2025
@wbruna
Author

wbruna commented Sep 22, 2025

@LostRuins , I moved the sdtype_adapter pre-update refactoring to #1753. I'll then focus this PR on the parts which need to be changed for the sd.cpp update.

Things we'll need to consider at some point:

  • we're still waiting for a fix to the SDXL black output after PR 55c2e05 (leejet/stable-diffusion.cpp#838). It doesn't actually block this update, though, since we could apply a small diff to force a single thread for model loading
  • the keep_clip_on_cpu parameter. It'd be really nice if we could run T5 on the GPU when possible, but that depends on the backend and the model quants. So the sensible approach would be yet another config parameter... maybe a multi-state one like the conv2d setting: CLIP on CPU, VAE on CPU, both, or neither? (a rough sketch of such an option follows this list)
  • leejet used flow-shift in the Qwen Image example, and that'd be another loading-time parameter
  • possible new parameters for image generation: scheduler (the new smoothstep scheduler really helps with Chroma generation, and I've always missed the ability to change it for specific gens), timestep-shift (the Nitro few-step models would be really nice for in-chat image generation), guidance, img-cfg-scale...
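
For illustration, the multi-state option in the second bullet could look roughly like the sketch below; the enum, function, and flag names are made up for the example (keep_clip_on_cpu is the only name taken from the sd.cpp parameter discussed above), so treat it as a shape rather than an implementation:

// Hypothetical sketch, not code from this PR: a single offload-mode setting,
// analogous to the existing conv2d option, mapped onto two boolean flags.
enum class SdCpuOffload {
    None,        // text encoders and VAE both stay on the GPU backend
    ClipOnCpu,   // keep CLIP/T5 on the CPU (safest for quantized T5)
    VaeOnCpu,    // keep the VAE on the CPU
    Both         // keep both on the CPU
};

static void apply_cpu_offload(SdCpuOffload mode, bool &keep_clip_on_cpu, bool &keep_vae_on_cpu) {
    keep_clip_on_cpu = (mode == SdCpuOffload::ClipOnCpu || mode == SdCpuOffload::Both);
    keep_vae_on_cpu  = (mode == SdCpuOffload::VaeOnCpu  || mode == SdCpuOffload::Both);
}

That would keep the UI to a single dropdown while still covering the combinations above.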

Marking the PR as 'ready' since I've been testing it for several days.

@wbruna wbruna marked this pull request as ready for review September 22, 2025 22:40
@wbruna wbruna force-pushed the kcpp_update_sdcpp_2509 branch from 8374423 to 0eb04d9 on September 24, 2025 14:36
@wbruna wbruna changed the title from "update stable-diffusion.cpp to master-301-fd693ac" to "update stable-diffusion.cpp to master-306-2abe945" on Sep 24, 2025
@wbruna
Author

wbruna commented Sep 24, 2025

Updated to master-306-2abe945, which includes the fixes for leejet/stable-diffusion.cpp#838 and leejet/stable-diffusion.cpp#837. Just a minor API change, and I don't think there is any major bug left, so it should be ready to go.

Oh, and I forgot about another nice load-time config parameter: offload_params_to_cpu. We need a bigger config screen 😅

@LostRuins
Owner

Cool, is this ready for review? Still kinda busy, but I'm hoping to get Qwen Image and WAN into the next release.

@wbruna
Author

wbruna commented Sep 25, 2025

It's ready for review. I still need to do a full round of testing to catch anything I've missed, but all the code should already be in place.

As I mentioned before, the diff here is hard to follow, especially for the first updates. But the branches koboldcpp_sd_base and koboldcpp_sd_changes are up to date, so:

The koboldcpp_sd_changes branch builds and runs, so it can be used for testing too.

@wbruna
Author

wbruna commented Sep 25, 2025

Tested: SD1.5, SDXL, LoRAs, SD3.5M, img2img, inpaint, Chroma, Flux Schnell, Photomaker, Kontext, TAESD (fixed a bug).

@LostRuins
Owner

Remind me again - have we always been running the T5 on CPU previously? And what backend issues are there with using it on GPU?

@LostRuins
Owner

Alright, did some basic testing and everything seems mostly fine. Will merge shortly.

@LostRuins LostRuins merged commit 42087c3 into LostRuins:concedo_experimental Sep 27, 2025
@LostRuins
Owner

Merged. I'll download and check out WAN sometime, but for now please report any issues found

@wbruna
Author

wbruna commented Sep 27, 2025

Remind me again - have we always been running the T5 on CPU previously? And what backend issues are there with using it on GPU?

Yes; that behavior was changed as part of the Wan changes (hard to see on the web because of the size of the diff):

leejet/stable-diffusion.cpp@cb1d975#diff-a8e875e8f7684a2f9e35212d88ee9574001952eb2c049385e73d5e24fd6e2a83L323


But it's not always needed: it depends on the T5 quant and the backend. For me, on Vulkan it works for legacy quants and crashes with k-quants (that may have changed in the latest versions); ROCm either crashes or makes all-black images. So we should probably add a loading-time control to turn on keep_clip_on_cpu...
