Support q8_0 quantization for image model loading #1692

wbruna · 2025-08-17T18:03:01Z

q4_0 may degrade quality significantly, especially for smaller models like SD 1.5 and SDXL. q8_0 provides a middle-ground, giving half the memory savings of q4_0 but with less loading time and quality loss.

~~Depends on #1678 for the combobox auxiliary function.~~

LostRuins · 2025-08-20T11:28:55Z

Please unmark from draft when ready for review

wbruna · 2025-08-20T12:07:46Z

Also tested other quants. q6_K could be nice to have: loading time seems more or less the same as q4_0, and it makes a lot of difference for SDXL. With the current code, it's just a matter of adding the types and saving values to the lists; but I'm not sure if it wouldn't clutter the interface.

LostRuins · 2025-08-20T12:32:42Z

It would definitely be kind of clutter. For power users who want a specific quant, it's far better for them to do the pre-quantization themselves and then load the GGUF. This option is more for those who want to keep safetensors and have a convenient option.

Additionally, k-quants present some challenges in that they have a block size requirement (128) that's larger than a classic quant (32) and can fail on some architectures.

So if you had to put 3 quants, I'd rather they be q4_0, q5_1 and q8_0. I think 3 quant types are the max we should have for this dynamic quant option.

wbruna · 2025-08-20T15:20:54Z

It would definitely be kind of clutter. For power users who want a specific quant, it's far better for them to do the pre-quantization themselves and then load the GGUF. This option is more for those who want to keep safetensors and have a convenient option.

I agree; I do that myself, especially to be able to apply different quants to different parts of the model. It's just q8_0 that's really convenient to have.

Additionally, k-quants present some challenges in that they have a block size requirement (128) that's larger than a classic quant (32) and can fail on some architectures.

Ah, that's unfortunate.

So if you had to put 3 quants, I'd rather they be q4_0, q5_1 and q8_0. I think 3 quant types are the max we should have for this dynamic quant option.

I don't know if it's just my CPU showing its age, but the _1 conversions seem too slow to be used on-demand:

quant	loading time
original	20s
q8_1	crashes
q8_0	94s
q6_K	860s
q5_1	2280s
q5_0	817s
q4_1	2017s
q4_0	826s

So maybe it would be better to leave it with just q8_0 and q4_0 for now, and see if other users have a need for an additional option.

wbruna · 2025-08-20T16:16:09Z

Just squashed the help and cleanup changes, and rebased. Should be good to go, unless you prefer adding that extra quant.

LostRuins · 2025-08-21T02:09:27Z

Just q8_0 should be fine. I'll review later

LostRuins · 2025-08-21T11:12:56Z

Okay took a look through, and while I do want the q8_0 quant added, I think we should follow the same approach as the existing --quantkv flag for consistency and backwards compatibility.

For reference, take a look at the current current --quantkv flag.

So we do the same thing, 0=no quant, 1=q8, 2=q4, same as we already have in quantkv.

Also for quantkv I used a slider but I agree a dropdown is probably better for sdquant, to save space, since the SD panel is nearly full.

For --quantkv it mandates 1 parameter, but since --sdquant didn't enforce parameters in the past, we could make it optional and default to q4 if used without a parameter to prevent breaking old scripts.

Thus, how about this?

sdparsergrouplora.add_argument("--sdquant",  metavar=('[quantization level 0/1/2]'), help="If specified, loads the model quantized to save memory. 0=off, 1=q8, 2=q4", type=int, choices=[0,1,2], nargs="?", const=2, default=0)

Unspecified = no quant
--sdquant without params = q4
--sdquant [level] = selected level where 0=off,1=q8, 2=q4

Also, we need to handle correctly migrating past kcpps save files. This can be done inside convert_invalid_args, but I think I can handle that part so don't worry about it.

wbruna · 2025-08-21T12:05:34Z

Yeah, the optional parameter seems sensible. I'll give it a try.

I also see a benefit in keeping the parameter different from a quantization name: being able to change its inner workings without changing its purpose - in the same way we don't care, say, how exactly a -3 level is different than -6 on gzip, we just know one is faster, the other saves more space. For instance, a 'light' compression level could mean 'all q8_0 except the VAE at f16' at first, and later be switched to 'all q8_0' under the hood, if we find out that works better.

But since the '1' in current .kcpps files means 'q4' right now, it may be challenging to switch it to mean 'q8' without some other change to differentiate new and old files. I only managed to keep its meaning in the new code by never writing that value back to the new files.

LostRuins · 2025-08-21T15:14:08Z

no, True in kcpps means q4.
1 shouldn't exist. The convert_invalid_args should be able to tell the difference.

Didn't test this but it might work

def strict_equal(a, b):
    return type(a) is type(b) and a == b

print(strict_equal(1, True))   # False
print(strict_equal(1, 1))      # True
print(strict_equal(True, True))# True

dont really need a fully function but just an example

q4_0 may degrade quality significantly, especially for smaller models like SD 1.5 and SDXL. q8_0 provides a middle-ground, giving half the memory savings of q4_0 but loading faster and with less quality loss.

wbruna · 2025-08-21T22:29:57Z

Done; please take a look.

It seemed a bit unhelpful to add a "0=off,1=q8, 2=q4" to the tooltip, so I abused the dropdown strings to add that information. But I'm sure there's a less hackish way to do that...

LostRuins · 2025-08-22T14:16:49Z

Alright seems good enough to merge, i'll probably resize that dropdown a bit as its really janky when scrolled horizontally now (see image).

I'll also edit this to be a little more defensive, so it doesnt break completely should there be some unknown value in future.

 sd_quant_types = {0: -1, 1: 8, 2: 2} # enum sd_type_t
    inputs.quant = sd_quant_types[args.sdquant]

LostRuins · 2025-08-22T15:24:03Z

minor adjustments for sdquant: allow backend to do the translation for the type more defensively, adjust the UI dropdown for clarity.

80dabbb

wbruna · 2025-08-22T16:08:51Z

Looks good to me. IMHO, it'd be a little less confusing if the values in the interface matched the values on the command line, but no strong feelings about it :-)

wbruna force-pushed the koboldcpp_quant_sd branch from 48fe5b4 to bc6a440 Compare August 19, 2025 09:47

wbruna mentioned this pull request Aug 19, 2025

Add flash attention and conv2d direct controls for image generation #1678

Merged

wbruna force-pushed the koboldcpp_quant_sd branch from bc6a440 to 431851b Compare August 19, 2025 21:55

LostRuins added the enhancement New feature or request label Aug 20, 2025

wbruna force-pushed the koboldcpp_quant_sd branch from 431851b to 3be29ab Compare August 20, 2025 11:08

wbruna marked this pull request as ready for review August 20, 2025 11:51

wbruna force-pushed the koboldcpp_quant_sd branch from 2a4fd75 to c0e4404 Compare August 20, 2025 16:15

wbruna added 3 commits August 21, 2025 19:17

Support q8_0 quantization for image model loading

58d3eb2

q4_0 may degrade quality significantly, especially for smaller models like SD 1.5 and SDXL. q8_0 provides a middle-ground, giving half the memory savings of q4_0 but loading faster and with less quality loss.

Accept --sdquant with no parameters

1cf3a19

Use numerical values for the sdquant option

9450531

wbruna force-pushed the koboldcpp_quant_sd branch from c0e4404 to 9450531 Compare August 21, 2025 22:22

LostRuins approved these changes Aug 22, 2025

View reviewed changes

LostRuins merged commit 2f8b0ec into LostRuins:concedo_experimental Aug 22, 2025

wbruna deleted the koboldcpp_quant_sd branch August 22, 2025 16:19

Support q8_0 quantization for image model loading #1692

Support q8_0 quantization for image model loading #1692

Uh oh!

Conversation

wbruna commented Aug 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LostRuins commented Aug 20, 2025

Uh oh!

wbruna commented Aug 20, 2025

Uh oh!

LostRuins commented Aug 20, 2025

Uh oh!

wbruna commented Aug 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wbruna commented Aug 20, 2025

Uh oh!

LostRuins commented Aug 21, 2025

Uh oh!

LostRuins commented Aug 21, 2025

Uh oh!

wbruna commented Aug 21, 2025

Uh oh!

LostRuins commented Aug 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wbruna commented Aug 21, 2025

Uh oh!

LostRuins commented Aug 22, 2025

Uh oh!

LostRuins commented Aug 22, 2025

Uh oh!

wbruna commented Aug 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

wbruna commented Aug 17, 2025 •

edited

Loading

wbruna commented Aug 20, 2025 •

edited

Loading

LostRuins commented Aug 21, 2025 •

edited

Loading