[release_v2180] Release notes #3629
Conversation
ReleaseNotes.md (outdated diff)
- ...
- Features:
  - Introduced the `group_size_fallback_mode` advanced weight compression parameter, which specifies how to handle nodes that do not support the default group size value. It defaults to `GroupSizeFallbackMode.IGNORE`, which skips nodes that cannot be compressed with the given group size (a usage sketch follows this excerpt).
  - Added support for external quantizers in the `quantize_pt2e` API, including [XNNPACKQuantizer](https://docs.pytorch.org/executorch/stable/backends-xnnpack.html#quantization) and [CoreMLQuantizer](https://docs.pytorch.org/executorch/stable/backends-coreml.html#quantization).
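A minimal sketch of how the new fallback mode might be wired is below. It assumes the parameter is exposed through `nncf.AdvancedCompressionParameters` and that `GroupSizeFallbackMode` is importable from the top-level `nncf` namespace; both locations are assumptions, not confirmed by this PR.

```python
# Hedged sketch: INT4 weight compression with the new fallback mode.
# The parameter placement and import paths are assumptions.
import nncf

compressed_model = nncf.compress_weights(
    model,  # an ov.Model or torch.nn.Module obtained elsewhere
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=128,
    advanced_parameters=nncf.AdvancedCompressionParameters(
        # IGNORE (the default) skips nodes whose weights are not
        # compatible with the requested group size instead of failing.
        group_size_fallback_mode=nncf.GroupSizeFallbackMode.IGNORE,
    ),
)
```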
- Fixes:
  - ...
- Improvements:
  - Support for weight compression for models with the Rotary Positional Embedding block.
- ...
- Improvements:
  - Support for weight compression for models with the Rotary Positional Embedding block.
  - Support for weight compression for models with stateful self-attention blocks.
- General:
  - ...
- Features:
  - (PyTorch) Enhanced initialization for "QAT with absorbable LoRA" using advanced compression methods (AWQ + Scale Estimation). This replaces the previous basic data-free compression approach, enabling QAT to start from a more accurate model baseline and achieve [superior final accuracy](https://github.com/openvinotoolkit/nncf/pull/3577) (a sketch of the data-aware initialization follows this excerpt).
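A rough sketch of what the improved initialization could look like in user code follows, based on the flow of the existing NNCF QAT-with-LoRA example: weights are first compressed data-aware (AWQ + Scale Estimation) into the absorbable-LoRA format before the tuning loop starts. The `CompressionFormat.FQ_LORA` flag and the dataset wiring are assumptions drawn from that example, not definitions from this PR.

```python
# Hedged sketch: data-aware initialization for QAT with absorbable LoRA.
import nncf

# calibration_samples and transform_fn are user-provided placeholders.
calibration = nncf.Dataset(calibration_samples, transform_fn)

model = nncf.compress_weights(
    model,                       # torch.nn.Module
    mode=nncf.CompressWeightsMode.INT4_ASYM,
    group_size=64,
    dataset=calibration,
    awq=True,                    # activation-aware weight scaling
    scale_estimation=True,       # data-driven per-group scale refinement
    compression_format=nncf.CompressionFormat.FQ_LORA,  # assumed: absorbable LoRA adapters
)
# ... distillation-based QAT loop over `model` follows ...
```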
- Fixes:
  - ...
- Improvements:
  - (PyTorch) Streamlined "QAT with absorbable LoRA" by removing checkpoint selection based on a validation set. This significantly reduces overall tuning time and peak allocated memory. While [the results on Wikitext](/examples/llm_compression/torch/distillation_qat_with_lora/README.md#results-on-wikitext) are slightly worse, the tuning pipeline becomes faster and more efficient (e.g., reduced from 32 minutes to 25 minutes for SmolLM-1.7B).
- Features:
  - Introduced the `group_size_fallback_mode` advanced weight compression parameter, which specifies how to handle nodes that do not support the default group size value. It defaults to `GroupSizeFallbackMode.IGNORE`, which skips nodes that cannot be compressed with the given group size.
  - Added support for external quantizers in the `quantize_pt2e` API, including [XNNPACKQuantizer](https://docs.pytorch.org/executorch/stable/backends-xnnpack.html#quantization) and [CoreMLQuantizer](https://docs.pytorch.org/executorch/stable/backends-coreml.html#quantization).
  - (ONNX) Added support for data-aware weight compression in the ONNX backend, including the AWQ and Scale Estimation algorithms. Provided an [example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/onnx/tiny_llama_scale_estimation) demonstrating the data-aware weight compression pipeline using the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` model in ONNX format (a minimal sketch follows this excerpt).
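A minimal sketch of the ONNX data-aware pipeline, modeled on the linked TinyLlama example; the transform function, file paths, and the save call are illustrative placeholders rather than the example's exact code.

```python
# Hedged sketch: data-aware INT4 weight compression of an ONNX model.
import onnx
import nncf

onnx_model = onnx.load("model.onnx")

# transform_fn must map a raw calibration sample to the model's input
# dict; its exact shape depends on the model and tokenizer (placeholder).
calibration = nncf.Dataset(calibration_samples, transform_fn)

compressed = nncf.compress_weights(
    onnx_model,
    mode=nncf.CompressWeightsMode.INT4_ASYM,
    group_size=128,
    dataset=calibration,
    awq=True,
    scale_estimation=True,
)
onnx.save(compressed, "model_int4.onnx")
```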
ReleaseNotes.md (outdated diff)
- Introduced the `group_size_fallback_mode` advanced weight compression parameter, which specifies how to handle nodes that do not support the default group size value. It defaults to `GroupSizeFallbackMode.IGNORE`, which skips nodes that cannot be compressed with the given group size.
- Added support for external quantizers in the `quantize_pt2e` API, including [XNNPACKQuantizer](https://docs.pytorch.org/executorch/stable/backends-xnnpack.html#quantization) and [CoreMLQuantizer](https://docs.pytorch.org/executorch/stable/backends-coreml.html#quantization).
- (ONNX) Added support for data-aware weight compression in the ONNX backend, including the AWQ and Scale Estimation algorithms. Provided an [example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/onnx/tiny_llama_scale_estimation) demonstrating the data-aware weight compression pipeline using the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` model in ONNX format.
- (OpenVINO) Introduced the new compression data types CB4_F8E4M3 and CODEBOOK. CB4_F8E4M3 is a fixed codebook of 16 fp8 values based on the NF4 data type values; CODEBOOK is an arbitrary user-selectable codebook for experimenting with different data types. Both data types are used for weight compression, and the AWQ and Scale Estimation algorithms are supported for them (a hedged sketch follows this excerpt).
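A hedged sketch of the two codebook modes is below. The enum member names follow the note above; how a custom codebook is supplied is an assumption (shown here via the advanced parameters), so treat that keyword as illustrative.

```python
# Hedged sketch: codebook-based weight compression (OpenVINO backend).
import numpy as np
import nncf

# Fixed 16-value fp8 codebook derived from NF4 values:
model_cb4 = nncf.compress_weights(
    ov_model,  # an ov.Model obtained elsewhere
    mode=nncf.CompressWeightsMode.CB4_F8E4M3,
    group_size=64,
)

# Arbitrary user-selectable codebook (values purely illustrative):
custom_codebook = np.array([-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0], dtype=np.float32)
model_cb = nncf.compress_weights(
    ov_model,
    mode=nncf.CompressWeightsMode.CODEBOOK,
    advanced_parameters=nncf.AdvancedCompressionParameters(
        codebook=custom_codebook,  # assumed keyword for supplying the codebook
    ),
)
```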
- (TorchFX) Added support for external quantizers in the `quantize_pt2e` API, including [XNNPACKQuantizer](https://docs.pytorch.org/executorch/stable/backends-xnnpack.html#quantization) and [CoreMLQuantizer](https://docs.pytorch.org/executorch/stable/backends-coreml.html#quantization) (a usage sketch follows this excerpt).
- (ONNX) Added support for data-aware weight compression in the ONNX backend, including the AWQ and Scale Estimation algorithms. Provided an [example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/onnx/tiny_llama_scale_estimation) demonstrating the data-aware weight compression pipeline using the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` model in ONNX format.
- (OpenVINO) Introduced the new compression data types CB4_F8E4M3 and CODEBOOK. CB4_F8E4M3 is a fixed codebook of 16 fp8 values based on the NF4 data type values; CODEBOOK is an arbitrary user-selectable codebook for experimenting with different data types. Both data types are used for weight compression, and the AWQ and Scale Estimation algorithms are supported for them.
- (OpenVINO) Added support for compressing FP8 (f8e4m3 and f8e5m2) weights to 4-bit data types, which is particularly beneficial for models like DeepSeek-R1.
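For the `quantize_pt2e` entry, a minimal sketch with `XNNPACKQuantizer` follows. The NNCF entry point is assumed to live under `nncf.experimental.torch.fx`, and the quantizer import path has moved between torch and executorch releases, so both paths should be checked against the installed versions.

```python
# Hedged sketch: quantize_pt2e with an external (XNNPACK) quantizer.
import torch
import nncf
from nncf.experimental.torch.fx import quantize_pt2e  # assumed location
from torch.ao.quantization.quantizer.xnnpack_quantizer import (  # path varies by release
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = MyModel().eval()                        # placeholder torch.nn.Module
example_inputs = (torch.randn(1, 3, 224, 224),)
exported = torch.export.export_for_training(model, example_inputs).module()

quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())

# calibration_samples and transform_fn are user-provided placeholders.
calibration = nncf.Dataset(calibration_samples, transform_fn)
quantized = quantize_pt2e(exported, quantizer, calibration_dataset=calibration)
```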
ReleaseNotes.md (outdated diff)
- Deprecations/Removals:
  - Removed examples that used `create_compressed_model`
@AlexanderDokuchaev, "Removed examples that used `create_compressed_model`" => "Removed examples that used the `create_compressed_model` API."
@AlexanderDokuchaev, as all updates have been added now, please remove the empty chapters.
Merged 1e6b694 into openvinotoolkit:release_v2180
### Reason for changes

Upcoming release

### Related tickets

172462

---------

Co-authored-by: Nikita Savelyev <[email protected]>
Co-authored-by: Daniil Lyakhov <[email protected]>
Co-authored-by: Liubov Talamanova <[email protected]>
Co-authored-by: Lyalyushkin Nikolay <[email protected]>
Co-authored-by: Andrey Churkin <[email protected]>
Co-authored-by: andreyanufr <[email protected]>
Co-authored-by: Alexander Suslov <[email protected]>
### Changes

Bump OV version to 2025.3. Update docs. Cherry-pick from release branch:

- #3637
- #3634
- #3633
- #3629

### Reason

Changes from release branch

### Related tickets

172462

### Tests

https://github.com/openvinotoolkit/nncf/actions/runs/17545330049
https://github.com/openvinotoolkit/nncf/actions/runs/17545486898

---------

Co-authored-by: Nikita Savelyev <[email protected]>
Co-authored-by: Daniil Lyakhov <[email protected]>
Co-authored-by: Liubov Talamanova <[email protected]>
Co-authored-by: Lyalyushkin Nikolay <[email protected]>
Co-authored-by: Andrey Churkin <[email protected]>
Co-authored-by: andreyanufr <[email protected]>
Co-authored-by: Alexander Suslov <[email protected]>