[OV] Optimized compression to MXFP4 data type #3550

nikita-savelyevv · 2025-06-18T15:23:28Z

Changes

Added optimized compression to MXFP4 data type for OpenVINO backend.

| Model | Memory Before (MiB) | Memory After (MiB) | Time Before (sec) | Time After (sec) |
|---|---|---|---|---|
| llama-3.2-1b bf16 | 2778.55 | 2548.37 (-8.29%) | 66.33 | 24.95 (-62.40%) |
| llama-3.2-1b fp16 | 3434.61 | 2963.03 (-13.73%) | 62.12 | 24.85 (-59.98%) |
| llama-3.2-1b fp32 | 2041.79 | 1576.43 (-22.81%) | 62.72 | 25.85 (-58.77%) |
| phi4-mini bf16 | 6384.81 | 5725.66 (-10.33%) | 197.36 | 66.22 (-66.44%) |
| phi4-mini fp16 | 8863.53 | 8375.75 (-5.51%) | 195.85 | 66.93 (-65.82%) |
| phi4-mini fp32 | 4406.82 | 3897.91 (-11.54%) | 195.25 | 68.83 (-64.76%) |
| llama-3.1-8b bf16 | 7297.72 | 6096.06 (-16.46%) | 413.86 | 135.25 (-67.32%) |
| llama-3.1-8b fp16 | 7946.93 | 8311.89 (+4.58%) | 413.64 | 136.05 (-67.11%) |
| llama-3.1-8b fp32 | 7310.48 | 5043.89 (-30.03%) | 411.45 | 140.66 (-65.81%) |

Reason for changes

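Improving user experience: in the measurements above, MXFP4 compression time drops by roughly 59-67%, and peak memory decreases in all but one configuration.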

Related tickets

164717

Tests

Extended optimized compression tests.

ljaljushkin · 2025-10-13T13:41:18Z

src/nncf/quantization/algorithms/weight_compression/weight_lowering.py

+    if mode == CompressWeightsMode.MXFP4:
+        # If in-between two quantiles, round to the nearest even quantile.
+        shifted_indexes = fns.clip(indexes + 1, 0, quantiles.size - 1)
+        dist_left = fns.abs(norm_weight - quantiles[indexes])
+        dist_right = fns.abs(norm_weight - quantiles[shifted_indexes])
+        choose_right = (dist_right < dist_left) | ((dist_left == dist_right) & ((shifted_indexes + 1) % 2 == 0))
+        indexes = fns.where(choose_right, shifted_indexes, indexes)


What about adding a small unit test for this rounding?
Maybe also add some corner cases: -0.0, 0.0, more than max, less than min.

@pytest.mark.parametrize(
    "center_idx,expected_idx",
    [
        (0, 1),
        (1, 1),
        (2, 3),
        (6, 7),
        (7, 7),
        (-1, -2),
    ],
)
def test_exact_quantile_center_values(center_idx, expected_idx):
    """Test that values exactly at quantile centers round to nearest even index."""
    center_val = CENTER_OF_MXFP4_QUANTILES[center_idx]
    expected_q = MXFP4_QUANTILES[expected_idx]
    norm_weight = Tensor(np.array([center_val], dtype=np.float32))
    result = _calculate_float_quantized_weight(norm_weight, CompressWeightsMode.MXFP4)
    # Verify the result matches the expected quantile
    assert result.data[0] == expected_q, (
        f"Expected {expected_q}, got {result.data[0]} for center value {center_val}"
    )

Added. Also, tests/openvino/optimized_functions/test_compression_functions.py::test_quantization_alignment fails without the highlighted if logic.
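For readers following this thread, here is a minimal NumPy sketch of the tie-breaking idea discussed above. It is not the NNCF implementation: the `FP4_MAGNITUDES` grid and the `round_to_fp4_nearest_even` helper are hypothetical names, and the sketch works on magnitudes and restores the sign afterwards rather than indexing a signed 16-entry quantile array. It assumes the standard f4e2m1 magnitudes, where an even grid index corresponds to a zero mantissa bit.

```python
import numpy as np

# Assumption: magnitudes representable in f4e2m1 (2 exponent bits, 1 mantissa bit).
# The position in this array equals the 3-bit magnitude code, so an even index
# means the mantissa bit is 0.
FP4_MAGNITUDES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)


def round_to_fp4_nearest_even(x):
    """Round values to the nearest f4e2m1 level, breaking exact ties toward an even code."""
    x = np.asarray(x, dtype=np.float32)
    last = FP4_MAGNITUDES.size - 1
    # Saturate out-of-range inputs to the largest representable magnitude.
    mag = np.clip(np.abs(x), 0.0, FP4_MAGNITUDES[-1])
    # Index of the largest grid magnitude that does not exceed mag.
    left = np.clip(np.searchsorted(FP4_MAGNITUDES, mag, side="right") - 1, 0, last)
    right = np.clip(left + 1, 0, last)
    dist_left = np.abs(mag - FP4_MAGNITUDES[left])
    dist_right = np.abs(mag - FP4_MAGNITUDES[right])
    # Take the right neighbour if it is strictly closer, or on a tie if its code is even.
    choose_right = (dist_right < dist_left) | ((dist_left == dist_right) & (right % 2 == 0))
    idx = np.where(choose_right, right, left)
    # Restore the sign; copysign keeps -0.0 distinct from 0.0.
    return np.copysign(FP4_MAGNITUDES[idx], x)


midpoints = [0.25, 0.75, 1.25, 2.5, 3.5, 5.0]
print(round_to_fp4_nearest_even(midpoints))
```

The corner cases mentioned in the review (signed zeros and values beyond the representable range) can be exercised with the same helper, for example `round_to_fp4_nearest_even([-0.0, 0.0, 100.0, -100.0])`.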

ljaljushkin · 2025-10-13T13:43:06Z

src/nncf/openvino/optimized_functions/functions.py

    For NF4 quantization quantizes the weights to 16 levels on [-1, 1] interval.
-    TODO(nikita-savelyevv): add support for MXFP4 and MXFP8_E4M3 once ticket 164851 is resolved
+    For MXFP4 quantization quantizes the weights to 16 levels on [-6, 6] interval.


Suggested change

For NF4 quantization quantizes the weights to 16 levels on [-1, 1] interval.

TODO(nikita-savelyevv): add support for MXFP4 and MXFP8_E4M3 once ticket 164851 is resolved

For MXFP4 quantization quantizes the weights to 16 levels on [-6, 6] interval.

NF4 format uses 16 levels in [-1, 1] range, while MXFP4 uses 16 levels in [-6, 6].
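To make the [-6, 6] range concrete, the sketch below decodes all 16 four-bit codes under the usual f4e2m1 layout (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit, no infinities or NaNs). The `decode_f4e2m1` helper is a hypothetical illustration and is not part of NNCF or OpenVINO.

```python
def decode_f4e2m1(code: int) -> float:
    """Decode a 4-bit f4e2m1 code (assumed layout: sign | exponent[2] | mantissa[1])."""
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal range: only 0.0 and 0.5 are representable.
        magnitude = 0.5 * mantissa
    else:
        # Normal range with an exponent bias of 1.
        magnitude = (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)
    return sign * magnitude


levels = [decode_f4e2m1(code) for code in range(16)]
print(sorted(levels))
print(min(levels), max(levels))
```

Note that the 16 codes include both +0.0 and -0.0, so they cover 15 numerically distinct values.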

ljaljushkin · 2025-10-13T13:46:07Z

tests/cross_fw/test_templates/template_test_nncf_tensor.py

+        assert isinstance(res_nncf, Tensor)
+        if (
+            self.backend() != TensorBackend.tf
+        ):  # native Tensorflow operaors do not guarantee to return a tensor on an initial device.


Suggested change

): # native Tensorflow operaors do not guarantee to return a tensor on an initial device.

): # native Tensorflow operators do not guarantee to return a tensor on an initial device.

WIP

a42f1eb

nikita-savelyevv changed the title ~~Optimized openvino compression to f4e2m1 data type~~ Optimized openvino weights compression to f4e2m1 data type Jun 18, 2025

github-actions bot added the NNCF OpenVINO and NNCF PTQ labels Jun 18, 2025

nikita-savelyevv added 3 commits June 23, 2025 13:30

Merge branch 'develop' into ns/ov-f4e2m1-support

17a3aec

Add round to nearest logic for numpy case

b2e090c

Merge branch 'develop' into ns/ov-f4e2m1-support

66c0366

nikita-savelyevv force-pushed the ns/ov-f4e2m1-support branch from 8a30046 to 66c0366 Compare July 29, 2025 08:08

github-actions bot removed the NNCF PTQ label Jul 29, 2025

Tweaks

a345984

nikita-savelyevv changed the title ~~Optimized openvino weights compression to f4e2m1 data type~~ [OV] Optimized compression to f4e2m1 data type Jul 29, 2025

Temporarily install OV nightly

6e3ba6e

nikita-savelyevv commented Aug 21, 2025

src/nncf/version.py (outdated, resolved)

nikita-savelyevv and others added 10 commits August 21, 2025 10:51

Update src/nncf/version.py

7555794

Merge branch 'develop' into ns/ov-f4e2m1-support

999d54f

[OpenVINO][WC] E5M2 and E4M3 FP8 weights compression support

83770e1

MXFP4/MXFP8_E4M3

8054217

Expand wc docs with a table

3d944de

Codebook is removed from wc docs

0c48792

Type

ac2f05c

Apply suggestions from code review

1e23ecf

Co-authored-by: Lyalyushkin Nikolay <[email protected]>

Typos/pre-commit

33aae33

Fix adjust group size

e4d47ab

nikita-savelyevv added the Code Freeze label Oct 8, 2025

daniil-lyakhov and others added 6 commits October 8, 2025 13:38

Revert "Fix adjust group size"

2aaec38

This reverts commit e4d47ab.

Fail for MX with adjust fallback mode

ab6aa74

Update src/nncf/quantization/algorithms/weight_compression/weight_low…

a25b5c3

…ering.py Co-authored-by: andreyanufr <[email protected]>

Merge branch 'develop' into ns/ov-f4e2m1-support

a64b30c

Merge branch 'dl/FP8' into ns/ov-f4e2m1-support

83d09fc

Merge branch 'develop' into ns/ov-f4e2m1-support

c227bac

nikita-savelyevv added 4 commits October 8, 2025 15:30

Revert nightly installation

c026573

Post-merge fixes

a000f87

Post-merge fixes part 2

831bf25

Increase test weight channel size

344b94b

nikita-savelyevv changed the title ~~[OV] Optimized compression to f4e2m1 data type~~ [OV] Optimized compression to MXFP4 data type Oct 8, 2025

nikita-savelyevv marked this pull request as ready for review October 9, 2025 08:45

nikita-savelyevv requested a review from a team as a code owner October 9, 2025 08:45

nikita-savelyevv requested review from andreyanufr and ljaljushkin October 9, 2025 08:46

ljaljushkin requested changes Oct 13, 2025

Address suggested changes

089631a
