[OV] Optimized compression to MXFP4 data type #3550
Co-authored-by: Lyalyushkin Nikolay <[email protected]>
This reverts commit e4d47ab.
…ering.py Co-authored-by: andreyanufr <[email protected]>
if mode == CompressWeightsMode.MXFP4:
    # If in-between two quantiles, round to the nearest even quantile.
    shifted_indexes = fns.clip(indexes + 1, 0, quantiles.size - 1)
    dist_left = fns.abs(norm_weight - quantiles[indexes])
    dist_right = fns.abs(norm_weight - quantiles[shifted_indexes])
    choose_right = (dist_right < dist_left) | ((dist_left == dist_right) & ((shifted_indexes + 1) % 2 == 0))
    indexes = fns.where(choose_right, shifted_indexes, indexes)
What about adding a small unit test for this rounding?
Maybe also add some corner cases: -0.0, 0.0, more than max, less than min.
@pytest.mark.parametrize(
    "center_idx,expected_idx",
    [
        (0, 1),
        (1, 1),
        (2, 3),
        (6, 7),
        (7, 7),
        (-1, -2),
    ],
)
def test_exact_quantile_center_values(center_idx, expected_idx):
    """Test that values exactly at quantile centers round to nearest even index."""
    center_val = CENTER_OF_MXFP4_QUANTILES[center_idx]
    expected_q = MXFP4_QUANTILES[expected_idx]
    norm_weight = Tensor(np.array([center_val], dtype=np.float32))
    result = _calculate_float_quantized_weight(norm_weight, CompressWeightsMode.MXFP4)
    # Verify the result matches expected quantile
    assert result.data[0] == expected_q, (
        f"Expected {expected_q}, got {result.data[0]} for center value {center_val}"
    )
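For reference, the tie-breaking behaviour and the corner cases mentioned above (-0.0, 0.0, above max, below min) can be reproduced outside NNCF. The sketch below is not the NNCF implementation: it assumes the standard FP4 E2M1 value grid, handles the sign separately, saturates at the largest magnitude, and breaks ties toward the value whose mantissa bit is 0. The names round_to_e2m1 and E2M1_MAGNITUDES are hypothetical and exist only for this illustration.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1, in ascending order.
# The position in this list equals the 3-bit magnitude code, so an even
# position means the mantissa bit is 0.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def round_to_e2m1(value: float) -> float:
    """Round to the nearest E2M1 value, ties to even mantissa, saturating at +/-6.

    Illustrative helper only; NaN inputs are not handled.
    """
    # Saturate: anything beyond the largest magnitude maps to it.
    magnitude = min(abs(value), E2M1_MAGNITUDES[-1])
    # Index of the largest grid magnitude that does not exceed `magnitude`.
    left = max(i for i, m in enumerate(E2M1_MAGNITUDES) if m <= magnitude)
    right = min(left + 1, len(E2M1_MAGNITUDES) - 1)
    dist_left = magnitude - E2M1_MAGNITUDES[left]
    dist_right = E2M1_MAGNITUDES[right] - magnitude
    # Take the right neighbour if it is strictly closer, or on an exact tie
    # when the right neighbour is the one with mantissa bit 0.
    choose_right = dist_right < dist_left or (dist_right == dist_left and right % 2 == 0)
    chosen = E2M1_MAGNITUDES[right if choose_right else left]
    # Restore the sign, which also preserves the distinction between -0.0 and 0.0.
    return math.copysign(chosen, value)
```

With this helper, 5.0 (halfway between 4 and 6) rounds to 4.0, -3.5 rounds to -4.0, and inputs outside [-6, 6] saturate to the nearest end of the range. If MXFP4_QUANTILES is the ascending 16-value table containing both -0.0 and 0.0 (an assumption, not confirmed by the snippet), these values line up with the expected indices in the parametrization.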
Added. Also, tests/openvino/optimized_functions/test_compression_functions.py::test_quantization_alignment fails without the highlighted if logic.
For NF4 quantization quantizes the weights to 16 levels on [-1, 1] interval.
TODO(nikita-savelyevv): add support for MXFP4 and MXFP8_E4M3 once ticket 164851 is resolved
For MXFP4 quantization quantizes the weights to 16 levels on [-6, 6] interval.
For NF4 quantization quantizes the weights to 16 levels on [-1, 1] interval.
TODO(nikita-savelyevv): add support for MXFP4 and MXFP8_E4M3 once ticket 164851 is resolved
For MXFP4 quantization quantizes the weights to 16 levels on [-6, 6] interval.
NF4 format uses 16 levels in [-1, 1] range, while MXFP4 uses 16 levels in [-6, 6].
Done
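As background for the [-6, 6] range in the docstring: MXFP4 elements use the FP4 E2M1 encoding (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1). The sketch below decodes all 16 codes to show where the levels come from. The function name decode_e2m1 is hypothetical, and the per-block shared scale that MXFP4 also carries is deliberately left out.

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code (1 sign, 2 exponent, 1 mantissa bit, exponent bias 1)."""
    if not 0 <= code <= 0b1111:
        raise ValueError("code must fit in 4 bits")
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal: no implicit leading one, so the only magnitudes are 0 and 0.5.
        magnitude = 0.5 * mantissa
    else:
        # Normal: implicit leading one, half-step mantissa, unbiased exponent.
        magnitude = (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)
    return sign * magnitude


# Distinct numeric values, ascending. -0.0 and 0.0 compare equal, so the
# 16 codes collapse to 15 distinct numbers here.
levels = sorted({decode_e2m1(code) for code in range(16)})
```

The largest magnitude comes from code 0b0111 (exponent 3, mantissa 1), which decodes to 1.5 * 4 = 6, hence the [-6, 6] interval. The 16 levels in the docstring count both signed zeros.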
assert isinstance(res_nncf, Tensor)
if (
    self.backend() != TensorBackend.tf
):  # native Tensorflow operaors do not guarantee to return a tensor on an initial device.
): # native Tensorflow operaors do not guarantee to return a tensor on an initial device.
): # native Tensorflow operators do not guarantee to return a tensor on an initial device.
Done
Changes
Added optimized compression to the MXFP4 data type for the OpenVINO backend.
Reason for changes
Improving user experience.
Related tickets
164717
Tests
Extended optimized compression tests.