support for customizing LoRA multipliers through the sdapi by wbruna · Pull Request #1982 · LostRuins/koboldcpp

wbruna · 2026-02-19T10:54:51Z

~~This is still just an idea!~~

Since we just got support for multiple LoRAs, we could include LoRA customization on the API side, by:

- internally allowing the weights to be changed at generation time
- showing the preloaded LoRAs under /sdapi/v1/loras
- accepting weight changes for the preloaded LoRAs through the lora field at /sdapi/v1/txt2img and /sdapi/v1/img2img

I recently implemented support on my Python client script for the mainline sd-server implementation, so I have a reasonable idea about how complicated that would be. I'm also aware that the sd.cpp C API would have to be adapted to allow changing LoRA weights without reloading the models.

Do you think this would be worth implementing?

LostRuins · 2026-02-19T11:36:07Z

Does it have any implications on memory use or runtime file loading?

wbruna · 2026-02-19T13:50:40Z

For at_runtime LoRA mode, I believe it wouldn't change at all.

For immediately LoRA mode, it could mean higher memory usage: currently, the code could be unloading the weights right after applying, since they wouldn't be needed anymore (need to check the code to be sure). And to change the weights, we need them back in memory, either reloading from disk or keeping them around in RAM. Generation latency would also increase a bit, because we'd need to reapply the LoRAs (but only when the weight is changed).

henk717 · 2026-02-26T19:19:59Z

Personally, I have seen this request a few times. There is demand for it. If it's a bit slower during a switch, that is better than not having it at all. Just make sure nothing changes if it's not used.

wbruna · 2026-02-27T02:14:58Z

Got a first somewhat-working version.

I've included code for the <lora:name:weight> syntax on the prompt, to make testing easier. The API code is implemented, but I didn't test it yet.

As suspected, immediately LoRA mode discards the weights as soon as they are applied (lora->free_params_buffer() in apply_loras_immediately). So we need to either remove that call (as I've done for now), or restrict changing LoRA weights to the at_runtime mode. One way to do that without an extra command-line flag could be:

- allow --sdloramult to receive a list of multipliers
- LoRAs with multiplier != 0 would have fixed weights, as they are now
- LoRAs with multiplier 0 would be allowed runtime multiplier changes (through the sdapi and/or prompt). We could also add a parameter to gendefaults to still be able to set a default non-zero multiplier for them
- the presence of any customizable LoRA would force at_runtime mode. This way, we could keep the free_params_buffer call as-is, so setups with no customizable LoRAs would keep working as they are now, with no extra memory usage. It may even be possible to force at_runtime only for the customizable LoRAs.

What do you think?

wbruna · 2026-02-27T02:17:00Z

By the way, it's also possible to support the <lora:name:weight> syntax as a UI feature, parsing it and converting it to the sdapi parameter in the stable-ui. I'm not exactly looking forward to implementing it that way, but it would make sense from a compatibility POV, since it'd allow LoRA loading from other sdapi servers too.
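To make the parsing step concrete, here is a minimal Python sketch of pulling `<lora:name:weight>` tags out of a prompt. It is not the code from this PR: the function name, the exact tag grammar (names may contain spaces but not `:`, `<` or `>`) and the returned shape are assumptions made for illustration.

```python
import re

# Assumed grammar: <lora:NAME:NUMBER>, where NAME has no ':', '<' or '>'.
LORA_TAG = re.compile(r"<lora:([^:<>]+):([+-]?(?:\d+\.?\d*|\.\d+))>")


def extract_lora_tags(prompt):
    """Return (prompt_without_tags, {lora_name: multiplier}).

    Hypothetical helper: the real implementation may differ.
    """
    multipliers = {}

    def _collect(match):
        # A later tag with the same name overwrites an earlier one.
        multipliers[match.group(1).strip()] = float(match.group(2))
        return ""  # remove the tag from the prompt text

    cleaned = LORA_TAG.sub(_collect, prompt)
    # Collapse the whitespace left behind by the removed tags.
    cleaned = " ".join(cleaned.split())
    return cleaned, multipliers


if __name__ == "__main__":
    text, loras = extract_lora_tags(
        "a castle <lora:pixel_lora:0.6> at dusk <lora:color_lora:1>"
    )
    print(text)
    print(loras)
```

The resulting dictionary could then be converted into whatever form the sdapi `lora` field expects, so the same prompt would work against any server that accepts that field.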

wbruna · 2026-03-01T03:22:41Z

Should be ready enough for reviewing.

As described before:

- sdloramult now receives a list of multipliers, one per LoRA
- by default, the first LoRA has multiplier 1.0, and extra LoRAs 0.0 (no strong opinions about this, it was simply the easiest behavior to implement)
- if all multipliers are non-zero, the LoRAs are loaded as before, with no changes to VRAM usage or inference time
- if any LoRA is specified with multiplier 0, all LoRAs will be loaded in at_runtime mode
- the LoRAs with multiplier 0 are advertised on the sdapi/v1/loras endpoint, and their multipliers can be changed both by the lora sdapi request field and the <lora:name:value> prompt syntax.
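The loading rule in the list above can be sketched in a few lines of Python. This only illustrates the decision, not the PR's code: the function name and the `"unchanged"` label (meaning "load exactly as before this PR") are made up for the example.

```python
def plan_lora_loading(multipliers):
    """Decide how to load LoRAs from their startup multipliers.

    multipliers: one float per LoRA, in load order.
    Returns (mode, adjustable_indices), where mode is "at_runtime" if any
    LoRA has multiplier 0 and "unchanged" otherwise. Both the return shape
    and the "unchanged" label are assumptions of this sketch.
    """
    # LoRAs loaded at multiplier 0 are the ones advertised as adjustable.
    adjustable = [i for i, m in enumerate(multipliers) if m == 0.0]
    mode = "at_runtime" if adjustable else "unchanged"
    return mode, adjustable


if __name__ == "__main__":
    print(plan_lora_loading([0.6, 0.6]))  # no zero multiplier: nothing changes
    print(plan_lora_loading([1.0, 0.0]))  # second LoRA is adjustable
```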

wbruna · 2026-03-01T12:42:56Z

Cleaned up the code, and reorganized the commits. Tested with Klein 9b and SDXL. Probably needs some polishing on the launcher and config side, once we decide the zero-multiplier approach is OK.

I'll leave this aside a bit, to focus on master-509-4cdfff5 🙂

Riztard · 2026-03-01T17:11:23Z

Is it intended that both LoRA weights show as 0 here, even though the value is 1.0 in the launcher?

I thought you said the first LoRA is 1 by default.

It does work, though, if I add the LoRAs and change their weights with these:
<lora:Yoo Ah-yeongilluLora:1><lora:S1 Dramatic Lighting Illustrious_V2:0.5>

I thought it would reload the LoRA and the model whenever the value changed, but it seems to be real dynamic LoRA (well, only for the weight).

Riztard · 2026-03-02T09:16:18Z

not showing absolute path?

LostRuins · 2026-03-02T13:27:33Z

The default behavior right now (before this PR) is that when one multiplier is provided (which is the current status quo of the launcher), all LoRAs are initialized at the same strength, which I think should stay the default. E.g. `--loramult 0.6 --lora pixel_lora.gguf color_lora.gguf` currently loads both LoRAs at 0.6.

Then the API override should replace it with a new value temporarily for that request (only adjustable for those LoRAs loaded at mult 0).
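A possible reading of that per-request override, as a Python sketch. The function and the dictionary shapes are hypothetical; the only rules taken from the discussion are that overrides are temporary and apply only to LoRAs loaded at multiplier 0.

```python
def effective_multipliers(startup, overrides):
    """Compute the multipliers to use for a single request.

    startup:   {lora_name: multiplier given at load time}
    overrides: {lora_name: multiplier requested through the API or prompt}
    Hypothetical helper; ignoring rejected overrides silently is an
    assumption of this sketch.
    """
    effective = dict(startup)  # copy, so the startup values are never mutated
    for name, value in overrides.items():
        # Only LoRAs loaded at multiplier 0 may be changed per request.
        if startup.get(name) == 0.0:
            effective[name] = float(value)
    return effective


if __name__ == "__main__":
    startup = {"pixel_lora": 0.6, "color_lora": 0.0}
    print(effective_multipliers(startup, {"pixel_lora": 1.0, "color_lora": 0.8}))
    print(startup)  # unchanged, so the next request starts from the defaults
```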

Also, I think `inputs.lora_apply_mode = 0 #auto for now`, currently in koboldcpp.py, allows the LoRAs to work automatically? I do not recall having to adjust the LoRA apply mode beyond this.

wbruna · 2026-03-02T15:13:00Z

> not showing absolute path?

Intentionally omitted, since it could be considered sensitive information. Usually, we'd have a root directory for all the LoRA files, then we could show subpaths under it. But all LoRAs now are specified by full path, so we can't know which part could be shown.

(@LostRuins, a lora-model-dir option would make this easy to do, and avoid lots of long paths both on the command line and in the UI, at the cost of a new UI field. I could implement it, if that's OK with you.)
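To illustrate the idea, a small Python sketch of deriving a non-sensitive display name: a subpath when a root directory is known, and only the bare file name otherwise. The function, its `root_dir` parameter and the choice to strip the extension are assumptions for the example, not what the endpoint necessarily returns.

```python
from pathlib import PurePosixPath


def public_lora_name(full_path, root_dir=None):
    """Return a display name that does not leak the absolute path.

    Hypothetical helper: root_dir stands in for the proposed
    lora-model-dir setting, which does not exist yet.
    """
    path = PurePosixPath(full_path)
    if root_dir is not None:
        try:
            # Show the subpath under the root, without the file extension.
            return str(path.relative_to(PurePosixPath(root_dir)).with_suffix(""))
        except ValueError:
            pass  # not under the root: fall back to the bare name
    return path.stem


if __name__ == "__main__":
    print(public_lora_name("/models/loras/styles/pixel_lora.gguf", "/models/loras"))
    print(public_lora_name("/home/user/private/pixel_lora.gguf"))
```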

wbruna · 2026-03-02T15:13:23Z

> The default behavior right now (before this PR) is that when one multiplier is provided (which is the current status quo of the launcher), all LoRAs are initialized at the same strength, which I think should stay the default. E.g. `--loramult 0.6 --lora pixel_lora.gguf color_lora.gguf` currently loads both LoRAs at 0.6.
>
> Then the API override should replace it with a new value temporarily for that request (only adjustable for those LoRAs loaded at mult 0).

Alright, I'll adjust it later (and fix the bug @Riztard mentioned).

> Also, I think `inputs.lora_apply_mode = 0 #auto for now`, currently in koboldcpp.py, allows the LoRAs to work automatically? I do not recall having to adjust the LoRA apply mode beyond this.

auto defaults to immediately, and switches to at_runtime only for quantized models; which is fine. The problem is: in immediately mode, the LoRA weights are discarded from memory as soon as they are applied; so, to change the multipliers for new generations, we'd need to reload the weights from disk. I could change that behavior, but it's tricky because just keeping the weights around would penalize the non-changeable-multiplier case. immediately can also be less accurate for multiplier changes, since precision errors would be cumulative.

at_runtime already keeps the LoRA objects around, so keeping a reference for them on the cache is enough to avoid the I/O. In principle, we could have a mix of fixed immediately and changeable at_runtime LoRAs; but sd.cpp currently doesn't track that property per-LoRA, so we'd need a more extensive and delicate code change.
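The cumulative-precision point can be shown with a toy scalar model in Python. This is not sd.cpp code: it assumes a single half-precision weight, models immediately mode as repeatedly adjusting the merged value in place, and models at_runtime mode as recomputing from an untouched base weight each time. All names are made up for the sketch.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def merged_update(weight, delta, old_mult, new_mult):
    """immediately-style: adjust an already merged weight, rounding each time."""
    return to_fp16(weight + (new_mult - old_mult) * delta)


def runtime_value(base, delta, mult):
    """at_runtime-style: the base weight is untouched, so rounding happens once."""
    return to_fp16(base + mult * delta)


def drift_after(base, delta, schedule):
    """Absolute difference between both styles after a series of multiplier changes."""
    merged, current = to_fp16(base), 0.0
    for mult in schedule:
        merged = merged_update(merged, delta, current, mult)
        current = mult
    return abs(merged - runtime_value(base, delta, current))


if __name__ == "__main__":
    # Any non-zero result here comes purely from accumulated rounding.
    print(drift_after(0.1234, 0.0071, [0.3, 0.9, 0.45, 1.0]))
```

Each in-place update rounds the merged weight again, so differences can build up over many multiplier changes, while the at_runtime-style value is rounded only once per use.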

wbruna · 2026-03-04T02:41:34Z

Rebased on top of #2006 to get a fix for zero-multiplier LoRAs getting stuck, and to be able to test both PRs at the same time; but I'll keep the branches separate.

Also restored the behavior when a single multiplier is specified. Now:

- no multipliers: all LoRAs have multiplier 1
- single multiplier: all LoRAs have that same multiplier
- more than one multiplier: extend multiplier list with zeroes
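The three cases above can be written as a short Python sketch. The function name is hypothetical, and dropping surplus multipliers when more are given than there are LoRAs is an assumption, since the thread does not cover that case.

```python
def expand_multipliers(multipliers, lora_count):
    """Expand the user-supplied multipliers to one value per LoRA.

    Hypothetical helper mirroring the three cases described above.
    """
    if not multipliers:
        # No multipliers: every LoRA gets 1.
        return [1.0] * lora_count
    if len(multipliers) == 1:
        # Single multiplier: shared by all LoRAs (the pre-PR behavior).
        return [float(multipliers[0])] * lora_count
    # More than one: keep what was given and pad with zeroes.
    # Truncating surplus values is an assumption of this sketch.
    padded = [float(m) for m in multipliers[:lora_count]]
    padded += [0.0] * (lora_count - len(padded))
    return padded


if __name__ == "__main__":
    print(expand_multipliers([], 2))
    print(expand_multipliers([0.6], 2))
    print(expand_multipliers([1.0, 0.5], 3))
```

With this rule, a zero only appears when the user passes more than one multiplier, so single-multiplier setups keep their current loading path.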

Also fix typo in the function name.

The `sdloramult` flag now accepts a list of multipliers, one for each LoRA. If all multipliers are non-zero, LoRAs load as before, with no extra VRAM usage or performance impact. If any LoRA has a multiplier of 0, we switch to `at_runtime` mode; the zero-multiplier LoRAs are then listed at the `sdapi/v1/loras` endpoint, and their multipliers can be changed via the `lora` sdapi field. All LoRAs are still preloaded on startup, and cached to avoid file reloads. A single multiplier (1.0 by default) is applied to all LoRAs, to stay compatible with the previous behavior.

wbruna force-pushed the kcpp_sdapi_loras branch from b0735b5 to 1ddd1a8 Compare February 27, 2026 00:37

wbruna force-pushed the kcpp_sdapi_loras branch from 8d4bc54 to f013f51 Compare February 28, 2026 16:11

wbruna changed the title ~~[WIP] support for customizing LoRA weights through the sdapi~~ support for customizing LoRA weights through the sdapi Mar 1, 2026

wbruna marked this pull request as ready for review March 1, 2026 02:43

wbruna force-pushed the kcpp_sdapi_loras branch from 730030d to b37b9dd Compare March 1, 2026 12:30

wbruna changed the title ~~support for customizing LoRA weights through the sdapi~~ support for customizing LoRA multipliers through the sdapi Mar 1, 2026

LostRuins added the enhancement New feature or request label Mar 2, 2026

wbruna force-pushed the kcpp_sdapi_loras branch from b37b9dd to 2978a85 Compare March 4, 2026 02:30

wbruna mentioned this pull request Mar 4, 2026

sd: sync to master-520-d950627 #2006

Open

LostRuins force-pushed the concedo_experimental branch from ca2cced to 54cf43a Compare March 4, 2026 03:00

wbruna force-pushed the kcpp_sdapi_loras branch from 2978a85 to f318e99 Compare March 4, 2026 10:10

wbruna added 5 commits March 6, 2026 10:27

sd: sync to master-509-4cdfff5

03927e6

sd: Anima support

9c4d0ab

sd: sync to master-514-5792c66

ffdb601

sd: additional workaround for Anima .safetensors model

993cdb1

sd: sync to master-517-ba35dd7

2402bfe

wbruna force-pushed the kcpp_sdapi_loras branch from f318e99 to e59abca Compare March 6, 2026 13:33

wbruna added 4 commits March 6, 2026 20:12

sd: sync to master-520-d950627

add239d

fix corner case in sd_oai_transform_params

f3ec5bd

Also fix typo in the function name.

add support for the <lora:name:multiplier> prompt syntax

8115263

wbruna force-pushed the kcpp_sdapi_loras branch from e59abca to 8115263 Compare March 6, 2026 23:17

wbruna mentioned this pull request Mar 7, 2026

add support for cache modes to accelerate image generation #2021

Draft

4 participants
