feat (quant/mx): Added midmax scale rounding option to MX types#1409

Merged
nickfraser merged 12 commits into Xilinx:dev from nickfraser:feat/midmax
Dec 2, 2025
Conversation


@nickfraser nickfraser commented Nov 6, 2025

Adds a "midmax" rounding mode for the shared scale in MX datatypes, and plugs MidMax scaling into the LLM example. The standard mode (as referenced in the OCP MX datatype spec) is "floor" and computes:

$$ po2\_shared\_scale = \lfloor \log_2 (\| x \|_\infty) \rfloor $$

Midmax replaces this floor operation with a special rounding mode that reduces the rounding error in the maximum value in $x$, at the cost of potentially increasing the rounding error in the smallest values in $x$.
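To illustrate the idea, here is a minimal sketch of the two scale-rounding modes. The `floor` rule follows the OCP MX spec formula above; the `midmax` variant is a hypothetical reconstruction (not the actual Brevitas implementation), which picks between the floor scale and the next power of two based on which leaves the block maximum with the smaller rounding error. The integer element grid and `elem_max` value are simplifying assumptions for illustration only:

```python
import math

def floor_po2_scale(absmax: float) -> float:
    # Standard OCP MX rule: shared scale is 2**floor(log2(max|x|)).
    return 2.0 ** math.floor(math.log2(absmax))

def midmax_po2_scale(absmax: float, elem_max: int = 6) -> float:
    # Hypothetical "midmax"-style rule (a sketch, NOT the Brevitas
    # implementation): try the floor scale and the next power of two,
    # and keep whichever leaves the block maximum with the smaller
    # rounding error on a simple integer element grid capped at
    # elem_max (e.g. 6, the largest magnitude of FP4 e2m1).
    lo = floor_po2_scale(absmax)

    def max_error(scale: float) -> float:
        q = min(round(absmax / scale), elem_max)
        return abs(absmax - q * scale)

    return min((lo, 2.0 * lo), key=max_error)
```

Either way, the scale stays a power of two, so only the rounding error of the block maximum (and, indirectly, of the smallest elements) changes between the two modes.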

Rerunning the experiments from the Post-Training Model Expansion paper, we get the following results:

| model | spinquant | expansion_step | scale_round_func | float_ppl | quant_ppl | ARC-C | ARC-E | HS | WG | PIQA | all_acc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| meta-llama/Llama-3.2-1B | False | 0 | floor | 8.938 | 11.694 | 0.289 | 0.597 | 0.418 | 0.559 | 0.693 | 0.511 |
| meta-llama/Llama-3.2-1B | False | 0 | midmax | 8.938 | 11.574 | 0.289 | 0.610 | 0.425 | 0.556 | 0.712 | 0.519 |
| meta-llama/Llama-3.2-1B | False | 7 | floor | 8.938 | 11.452 | 0.283 | 0.611 | 0.426 | 0.575 | 0.701 | 0.519 |
| meta-llama/Llama-3.2-1B | False | 7 | midmax | 8.938 | 11.241 | 0.272 | 0.619 | 0.430 | 0.574 | 0.701 | 0.519 |
| meta-llama/Llama-3.2-1B | True | 0 | floor | 8.938 | 11.518 | 0.305 | 0.628 | 0.422 | 0.569 | 0.709 | 0.527 |
| meta-llama/Llama-3.2-1B | True | 0 | midmax | 8.938 | 11.552 | 0.293 | 0.595 | 0.424 | 0.562 | 0.707 | 0.516 |
| meta-llama/Llama-3.2-1B | True | 7 | floor | 8.938 | 11.359 | 0.298 | 0.590 | 0.433 | 0.556 | 0.697 | 0.515 |
| meta-llama/Llama-3.2-1B | True | 7 | midmax | 8.938 | 11.294 | 0.303 | 0.606 | 0.430 | 0.559 | 0.699 | 0.519 |

Note that MidMax has not been thoroughly tested beyond the OCP MX v1 spec datatypes, and should be tested further before being applied to other types.

Also, while adding this feature, I took the opportunity to factor some code duplicated between MXWeightMixin and MXActMixin into a parent class (MXMixin).

@nickfraser nickfraser marked this pull request as draft November 6, 2025 12:56
@nickfraser nickfraser added the do not merge This should not be merged just yet label Nov 6, 2025
@nickfraser nickfraser self-assigned this Nov 6, 2025
@pablomlago pablomlago self-requested a review November 6, 2025 15:56
@nickfraser nickfraser removed the do not merge This should not be merged just yet label Nov 10, 2025
@nickfraser nickfraser marked this pull request as ready for review November 10, 2025 14:21
@nickfraser nickfraser changed the title feat (ex/llm): Added midmax rounding to LLM example feat (quant/mx): Added midmax scale rounding option to MX types Nov 19, 2025
@nickfraser nickfraser requested a review from Giuseppe5 November 21, 2025 15:03
Collaborator Author

@nickfraser nickfraser left a comment

1 comment, otherwise ready for review.

@nickfraser nickfraser requested review from Giuseppe5 and removed request for Giuseppe5 and pablomlago December 2, 2025 12:31
Collaborator

@Giuseppe5 Giuseppe5 left a comment

One small change, then it can be merged.

@nickfraser nickfraser merged commit 587494a into Xilinx:dev Dec 2, 2025
29 checks passed
@nickfraser nickfraser deleted the feat/midmax branch December 2, 2025 16:42