This PR adds `smse` option to GPTQ to improve accuracy. TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>
@mhs4670go

    type=str,
    default=None,
    help="Whether and how to use mse in gptq (none/mse/smse/)",
How about using choices instead?
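The suggested change could look like the following minimal sketch (the `--gptq_mse` flag name and values come from the diff above; the rest of the parser setup is illustrative). With `choices`, invalid values fail at parse time instead of deep inside GPTQ:

```python
import argparse

# Hypothetical sketch of the suggestion: constrain --gptq_mse to a fixed
# set of values so argparse rejects anything else up front.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--gptq_mse",
    type=str,
    choices=["mse", "smse"],
    default=None,
    help="Whether and how to use mse in gptq",
)

args = parser.parse_args(["--gptq_mse", "smse"])
print(args.gptq_mse)  # smse
```

Passing an unlisted value (e.g. `--gptq_mse bogus`) then exits with a usage error automatically.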
    sens = None
    if args.gptq_mse is not None and (
        args.gptq_mse == "smse" or args.gptq_mse == "smse_for_gptq"
`smse_for_gptq` seems like a duplicate option name. Is it necessary to have?
Sorry. I'll remove it.
IMHO, there's no explanation of the smse feature. What is smse, why does sensitivity come in, etc.? Could you add some documentation for this, in README.md or some other place you think is good?
ok.
    outputs = logits.squeeze()
    targets = targets.squeeze()

    b_indices = [outputs.shape[0] - 1]  # priority to the last token
Just out of curiosity: b_indices is always a list of size one. Is the for loop below necessary?
Currently - no. I'll remove it. Thank you!
    return dataloader


    class SensitivityCalibrator:
Empirical Fisher Information
For the reviewers, Empirical Fisher Information is a practical way to estimate how important each model parameter is.
Intuitively, the idea is simple:
If changing a weight causes the model's output to change a lot, that weight is important.
If changing a weight barely affects the output, that weight is less important.
A common way to estimate this is to look at the squared gradients of the loss. Large gradients mean the model is sensitive to that weight, so it should be treated more carefully (e.g., during quantization).
However, Fisher Information is defined with respect to samples drawn from the model's own probability distribution. In practice, instead of sampling from the full distribution (which is expensive), many implementations simply use the model's own prediction as a pseudo label:
target ≈ argmax(logits)
So the procedure becomes:
- Run the model to obtain logits.
- Use the predicted token (argmax) as a pseudo target.
- Compute the loss and gradients.
- Accumulate squared gradients to estimate parameter sensitivity.
The resulting values serve as a weight importance / sensitivity estimate, which can then be used to guide quantization.
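The steps above can be sketched as a minimal toy example (this is illustrative, not the PR's `SensitivityCalibrator`; the model and variable names are assumptions):

```python
import torch
import torch.nn.functional as F

# Empirical-Fisher-style sensitivity estimate: accumulate squared gradients
# of a pseudo-label loss over a small calibration set.
torch.manual_seed(0)
model = torch.nn.Linear(8, 4)
sensitivity = {n: torch.zeros_like(p) for n, p in model.named_parameters()}

calib_inputs = [torch.randn(3, 8) for _ in range(4)]
for x in calib_inputs:
    logits = model(x)
    # use the model's own prediction as a pseudo target: target ~ argmax(logits)
    pseudo_target = logits.argmax(dim=-1)
    loss = F.cross_entropy(logits, pseudo_target)
    model.zero_grad()
    loss.backward()
    for n, p in model.named_parameters():
        sensitivity[n] += p.grad.detach() ** 2  # accumulate grad^2

# larger accumulated values = weights the output is more sensitive to
print(sensitivity["weight"].shape)  # torch.Size([4, 8])
```

The resulting per-weight values can then be normalized or averaged over samples before being used to guide quantization.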
    # update second order information as current weights gradients are ready
    for name in modules_to_process:
        cur_module = modules_to_process[name]
        cur_grad = copy.deepcopy(cur_module.weight.grad.detach())  # type: ignore[union-attr]
Is this deepcopy necessary? Or, how about using clone() instead?
There was a problem hiding this comment.
Previously I had some issues with clone(). But currently clone() seems to work just fine. Thank you!
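For context, the difference under discussion can be shown in a small sketch (assuming the gradient is a plain detached tensor, as in the diff above): `clone()` already yields an independent copy of the data, so `deepcopy` is only needed for arbitrary Python object graphs.

```python
import copy
import torch

torch.manual_seed(0)
w = torch.randn(3, 3, requires_grad=True)
(w ** 2).sum().backward()  # grad = 2 * w

g_clone = w.grad.detach().clone()
g_deep = copy.deepcopy(w.grad.detach())

# both copies are value-equal and do not alias the gradient's storage
assert torch.equal(g_clone, g_deep)
w.grad.zero_()
assert not torch.equal(g_clone, w.grad)  # the clone kept the old values
```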
I'm not sure, but the current structure looks like this.

    # pass 1: sensitivity
    for inp in calib_inputs:
        forward
        backward
        accumulate grad^2

    # pass 2: GPTQ
    for inp in calib_inputs:
        forward
        accumulate GPTQ stats

You don't have to revisit right now, but it would be better to do the same thing with only one pass later.
Additionally, classes or APIs for smse could be moved to gptq/utils.py instead.
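A hypothetical sketch of the single-pass idea: one loop over the calibration data does forward + backward, accumulating both grad^2 (sensitivity) and GPTQ-style input statistics via a forward hook. All names here (`layer`, `sens`, `hessian`) are illustrative, not the PR's API:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
layer = torch.nn.Linear(8, 4)
sens = torch.zeros_like(layer.weight)
hessian = torch.zeros(8, 8)  # GPTQ-style second-order input statistic X^T X

# capture the layer input during the forward pass
inputs = []
hook = layer.register_forward_hook(lambda m, i, o: inputs.append(i[0].detach()))

for x in [torch.randn(3, 8) for _ in range(4)]:
    logits = layer(x)
    loss = F.cross_entropy(logits, logits.argmax(dim=-1))  # pseudo-label loss
    layer.zero_grad()
    loss.backward()
    sens += layer.weight.grad.detach() ** 2  # sensitivity accumulation
    xi = inputs.pop()
    hessian += xi.T @ xi                     # GPTQ statistics accumulation

hook.remove()
```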
> I'm not sure, but the current structure looks like this.
@mhs4670go
Yes. Right now they are similar (just iterating through the dataset); I tried to introduce minimal changes to GPTQ. But merging them into a single pass is a good idea. There is also the possibility of using external sensitivity, to try something different, maybe calibrating sensitivity on another dataset. That's why, IMHO, merging them can be done later. Or should I remove external sensitivity and merge the sensitivity pass with the inference pass?
> Additionally, classes or APIs for smse could be moved to gptq/utils.py instead.
I'll do it.
Thank you!
@mhs4670go
Moreover, sensitivities can be used elsewhere (e.g., in an MPQ solution).
> That's why, IMHO, merging them can be done later.
As you said, I think you can do it later. Please feel free to do the work later.
The `mse` parameter of `GPTQConfig` is supposed to tune the quantizer used in GPTQ. There are two options:
1. `mse` - vanilla `mse`. Produces quantization parameters for the GPTQ quantizer (`min`/`max`) which minimize the mean squared error of quantization: $MSE\_MIN\_MAX\_FOR\_W = \arg\min_{min, max} \|W - Q_{min, max}(W)\|^2$.
2. `smse` - sensitivity-based `mse`. Uses the sensitivity of some global feature (e.g. float-model logits) to parameter changes in order to minimize the global effect of quantization: $SMSE\_MIN\_MAX\_FOR\_W = \arg\min_{min, max} \|(W - Q_{min, max}(W))^2 \cdot Sensitivity(W)\|$. So we try to keep "important" parameters unchanged, while quantizing "unimportant" parameters more aggressively.
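An illustrative sketch of the smse objective above (not the PR's implementation; the grid-search strategy, helper names, and bit-width are assumptions): scan candidate clipping ranges and keep the one minimizing the sensitivity-weighted quantization error.

```python
import torch

def quantize(w, lo, hi, n_bits=4):
    # uniform affine quantization of w into the [lo, hi] range
    scale = (hi - lo) / (2 ** n_bits - 1)
    q = ((w.clamp(lo, hi) - lo) / scale).round()
    return q * scale + lo

def smse_min_max(w, sensitivity, n_grid=20):
    # search over shrunken versions of the full range, minimizing
    # sum((W - Q(W))^2 * Sensitivity(W))
    best, best_err = (w.min(), w.max()), float("inf")
    for i in range(1, n_grid + 1):
        shrink = i / n_grid
        lo, hi = w.min() * shrink, w.max() * shrink
        err = (((w - quantize(w, lo, hi)) ** 2) * sensitivity).sum()
        if err < best_err:
            best, best_err = (lo, hi), err
    return best

torch.manual_seed(0)
w = torch.randn(16)
sens = torch.rand(16)   # per-weight sensitivity (e.g. accumulated grad^2)
lo, hi = smse_min_max(w, sens)
```

With `sensitivity` set to all ones this degenerates to the vanilla `mse` search.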
The md-formatted words don't show properly.
Ahhh. That's sad. Thank you!
This PR adds `smse` option to GPTQ to improve accuracy.

TLLama1B
benchmarks
Original:
mse:
smse:
logs
mse
smse
LLama3.2-1B
benchmarks
Original:
mse:
smse:
logs
mse
smse
LLama3.2-3B
benchmarks
Original:
mse:
smse:
logs
mse
smse
Note for reviewers:
Although `smse` provides the best PPL on wikipedia, it does not provide the best performance on benchmarks. (Seems like `smse` can overfit on wikipedia in exchange for some other tasks.)
For an increasing number of samples we get: although `ppl` increased slightly, we got no performance drop on any of the benchmarks.
logs
smse_256
smse_512
So, in my personal and humble opinion, given a refined, balanced dataset (not wikitext), `smse` would be able to considerably improve model performance.