LUT-based compressed data type #3496
Conversation
nikita-savelyevv
left a comment
LGTM!
ranks = [advanced_parameters.lora_adapter_rank, advanced_parameters.lora_correction_params.adapter_rank]

if advanced_parameters.codebook_params.codebook is not None:
    codebook = Tensor(advanced_parameters.codebook_params.codebook).as_numpy_tensor().data
Why is Tensor(advanced_parameters.codebook_params.codebook).as_numpy_tensor().data needed here?
To make sure that
if (codebook[:-1] >= codebook[1:]).any():
works correctly for all data types.
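The sortedness check discussed above can be sketched as follows. This is an illustrative example, not the NNCF implementation: the function name validate_codebook is hypothetical, and the point is that normalizing the user-supplied codebook to a plain numpy array first makes the elementwise comparison behave the same for lists, tuples, and framework tensors.

```python
import numpy as np

def validate_codebook(codebook):
    # Normalize to a plain numpy array so slicing and comparison
    # work uniformly regardless of the original container type.
    codebook = np.asarray(codebook)
    # Adjacent-pair comparison: any non-increasing pair means the
    # codebook is not strictly sorted.
    if (codebook[:-1] >= codebook[1:]).any():
        raise ValueError("Codebook values must be strictly increasing")
    return codebook

validate_codebook([-1.0, 0.0, 0.5, 1.0])      # strictly increasing: OK
try:
    validate_codebook([0.0, 0.5, 0.25, 1.0])  # out of order: raises
except ValueError as e:
    print(e)
```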
return WeightCompressionConfig(
    mode=self._mode,
    group_size=self._group_size,
    codebook_values=get_cb4_quantiles()
    if self._mode == CompressWeightsMode.CB4_F8E4M3
    else Tensor(self._advanced_parameters.codebook_params.codebook),
)
Suggested change: extract the conditional into a variable for readability.

codebook_values = get_cb4_quantiles() if self._mode == CompressWeightsMode.CB4_F8E4M3 else Tensor(self._advanced_parameters.codebook_params.codebook)
return WeightCompressionConfig(
    mode=self._mode,
    group_size=self._group_size,
    codebook_values=codebook_values,
)
Done.
scale = fns.max(fns.abs(weight), axis=reduction_axes, keepdims=True)
if config.mode in [CompressWeightsMode.E2M1, CompressWeightsMode.CODEBOOK, CompressWeightsMode.CB4_F8E4M3]:
    max_val = 6.0 if config.mode == CompressWeightsMode.E2M1 else max(np.abs(config.get_numpy_codebook()))
Use nncf.Tensor in common code
Done.
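The scale computation in the diff above can be sketched in plain numpy. This is a simplified illustration (the function name compute_scale is an assumption, and nncf's Tensor/fns abstractions are replaced with numpy calls): the per-group absolute maximum of the weights is divided by the largest representable codebook magnitude, so that after scaling the weights fall inside the codebook's range.

```python
import numpy as np

def compute_scale(weight, codebook, reduction_axes=(-1,)):
    # Per-group absolute maximum of the weights.
    w_max = np.max(np.abs(weight), axis=reduction_axes, keepdims=True)
    # Largest representable magnitude in the codebook
    # (for E2M1 this is the constant 6.0).
    max_val = np.max(np.abs(codebook))
    # Scale maps the weight range onto the codebook range.
    return w_max / max_val

w = np.array([[0.5, -2.0, 1.5], [3.0, -0.25, 0.75]])
cb = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
scale = compute_scale(w, cb)  # -> [[2.0], [3.0]]
```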
if center_of_quantiles is None:
    quantiles = np.array(quantiles)
Please don't combine operations with backend specific types and nncf.Tensor in common code.
Done.
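For context on the center_of_quantiles fallback discussed above, here is an illustrative sketch (not the NNCF implementation; the function name quantize_to_codebook is hypothetical) of how decision boundaries between adjacent codebook values can be derived when no centers are supplied: take the midpoints between adjacent quantiles, then assign each value to the nearest codebook entry with a sorted search.

```python
import numpy as np

def quantize_to_codebook(values, quantiles):
    quantiles = np.asarray(quantiles)
    # Midpoints between adjacent codebook values act as
    # decision boundaries for nearest-entry assignment.
    centers = (quantiles[:-1] + quantiles[1:]) / 2.0
    # searchsorted returns, for each value, the index of the
    # codebook entry whose cell the value falls into.
    indices = np.searchsorted(centers, values)
    return indices, quantiles[indices]

vals = np.array([-0.9, -0.1, 0.3, 0.8])
cb = np.array([-1.0, 0.0, 0.5, 1.0])
idx, deq = quantize_to_codebook(vals, cb)
# idx -> [0, 1, 2, 3]; deq -> [-1.0, 0.0, 0.5, 1.0]
```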
2) Changed the custom codebook in the codebook example to a smaller one.
Changes
Implementation of compression to fixed codebook (LUT) values.
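The general idea behind LUT (codebook) weight compression can be sketched end to end. This is a minimal, simplified illustration under assumed conventions (per-row scaling, nearest-value lookup; the function names compress_with_lut and decompress are hypothetical, not NNCF API): each weight is scaled into the codebook's range and replaced by the index of its nearest codebook entry, so only small integer indices plus a scale per row need to be stored.

```python
import numpy as np

def compress_with_lut(weight, codebook):
    codebook = np.sort(np.asarray(codebook))
    # Per-row scale so normalized weights land inside the codebook range.
    scale = np.max(np.abs(weight), axis=-1, keepdims=True) / np.max(np.abs(codebook))
    normalized = weight / scale
    # Index of the nearest codebook entry for every normalized weight.
    indices = np.abs(normalized[..., None] - codebook).argmin(axis=-1)
    return indices.astype(np.uint8), scale

def decompress(indices, scale, codebook):
    # Look the values back up in the table and undo the scaling.
    codebook = np.sort(np.asarray(codebook))
    return codebook[indices] * scale

w = np.random.default_rng(0).standard_normal((4, 8)).astype(np.float32)
cb = [-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0]
idx, s = compress_with_lut(w, cb)
w_hat = decompress(idx, s, cb)  # lossy reconstruction of w
```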
Reason for changes
CVS-167084
Related tickets
CVS-167084
Tests
tests/openvino/native/quantization/test_weights_compression.py
https://github.com/openvinotoolkit/nncf/actions/runs/16024264575