Conversation

@AlexanderDokuchaev

Reason for changes

Upcoming release

Related tickets

172462

@AlexanderDokuchaev AlexanderDokuchaev requested a review from a team as a code owner August 20, 2025 12:49
@AlexanderDokuchaev AlexanderDokuchaev changed the base branch from develop to release_v2180 August 20, 2025 12:49
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Aug 20, 2025
ReleaseNotes.md Outdated
- ...
- Features:
- Introduced the `group_size_fallback_mode` advanced weight compression parameter, which specifies how to handle nodes that do not support the default group size value. By default, it is set to `GroupSizeFallbackMode.IGNORE`, which skips nodes that cannot be compressed with the given group size.
- (TorchFX) Added support for external quantizers in the `quantize_pt2e` API, including [XNNPACKQuantizer](https://docs.pytorch.org/executorch/stable/backends-xnnpack.html#quantization) and [CoreMLQuantizer](https://docs.pytorch.org/executorch/stable/backends-coreml.html#quantization).
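The `IGNORE` fallback behavior can be illustrated with a small sketch (conceptual only, not NNCF's implementation; the layer names and channel counts below are made up): layers whose input-channel count is not divisible by the requested group size are simply left uncompressed.

```python
# Conceptual sketch of an IGNORE-style group-size fallback (hypothetical
# layer names/shapes, not NNCF code): group-wise quantization splits each
# weight row into groups of `group_size` channels, so a layer is only
# compressible when its channel count divides evenly.

def select_compressible(layers, group_size):
    """Split (name, in_channels) pairs into compressible vs. skipped."""
    compressible, skipped = [], []
    for name, in_channels in layers:
        if in_channels % group_size == 0:
            compressible.append(name)
        else:
            skipped.append(name)  # IGNORE: leave this layer uncompressed
    return compressible, skipped

layers = [("attn.qkv", 4096), ("mlp.up", 11008), ("head", 50257)]
compressible, skipped = select_compressible(layers, group_size=128)
print(compressible)  # ['attn.qkv', 'mlp.up']
print(skipped)       # ['head']
```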
- Fixes:
- ...
- Improvements:
- Support of weight compression for models with the Rotary Positional Embedding block.
- Support of weight compression for models with stateful self-attention blocks.
- General:
- ...
- Features:
- (PyTorch) Enhanced initialization for "QAT with absorbable LoRA" using advanced compression methods (AWQ + Scale Estimation). This improvement replaces the previous basic data-free compression approach, enabling QAT to start with a more accurate model baseline and achieve [superior final accuracy](https://github.com/openvinotoolkit/nncf/pull/3577).
- Fixes:
- ...
- Improvements:
- (PyTorch) Streamlined "QAT with absorbable LoRA" by removing checkpoint selection based on the validation set. This change significantly reduces overall tuning time and peak allocated memory. While [the results on Wikitext](/examples/llm_compression/torch/distillation_qat_with_lora/README.md#results-on-wikitext) are slightly worse, the tuning pipeline is faster and more efficient (e.g. reduced from 32 to 25 minutes for SmolLM-1.7B).
- (ONNX) Added support for data-aware weight compression in the ONNX backend, including the AWQ and Scale Estimation algorithms. Provided an [example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/onnx/tiny_llama_scale_estimation) demonstrating the data-aware weight compression pipeline using the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` model in ONNX format.
@andrey-churkin andrey-churkin self-requested a review September 2, 2025 08:05
ReleaseNotes.md Outdated
- (OpenVINO) Introduced new compression data types CB4_F8E4M3 and CODEBOOK. CB4_F8E4M3 is a fixed codebook of 16 FP8 values based on the NF4 data type values. CODEBOOK is an arbitrary, user-selectable codebook that can be used to experiment with different data types. Both data types are used for weight compression, and the AWQ and Scale Estimation algorithms are supported for them.
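Conceptually, codebook compression stores one small table of representable values and replaces each weight with a 4-bit index into it. A minimal sketch of the idea (the codebook values below are illustrative, not the actual CB4_F8E4M3 table, and this is not NNCF's implementation):

```python
# Toy codebook quantization: a 16-entry codebook means each weight can be
# stored as a 4-bit index. Values here are made up for illustration.
CODEBOOK = [-1.0, -0.7, -0.5, -0.35, -0.25, -0.15, -0.08, 0.0,
            0.08, 0.15, 0.25, 0.35, 0.5, 0.7, 0.85, 1.0]

def quantize(weights, codebook):
    """Map each weight to the index of its nearest codebook value."""
    return [min(range(len(codebook)), key=lambda i: abs(codebook[i] - w))
            for w in weights]

def dequantize(indices, codebook):
    """Recover the (lossy) weight values from codebook indices."""
    return [codebook[i] for i in indices]

idx = quantize([0.9, -0.32, 0.01], CODEBOOK)
print(dequantize(idx, CODEBOOK))  # [0.85, -0.35, 0.0]
```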
- (OpenVINO) Added support for compressing FP8 (f8e4m3 and f8e5m2) weights to 4-bit data types, which is particularly beneficial for models like DeepSeek-R1.
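At a high level, re-quantizing already low-precision weights into a 4-bit integer format works like any symmetric integer quantization: derive a scale from the maximum magnitude, then round each value into the signed 4-bit range. A toy sketch under those assumptions (not NNCF's implementation; input values are arbitrary):

```python
# Conceptual symmetric 4-bit quantization sketch (illustrative only).
# Signed 4-bit symmetric range is [-7, 7] with a per-tensor scale.

def quantize_int4_sym(weights):
    """Quantize floats into the signed 4-bit range [-7, 7] plus a scale."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from 4-bit values and the scale."""
    return [v * scale for v in q]

q, scale = quantize_int4_sym([3.5, -7.0, 1.75])
print(q, scale)  # [4, -7, 2] 1.0
```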

ReleaseNotes.md Outdated

Deprecations/Removals:

- Removed examples that used the `create_compressed_model` API
@AlexanderDokuchaev , "Removed examples that used `create_compressed_model`" => "Removed examples that used `create_compressed_model` API".

@MaximProshin

@AlexanderDokuchaev , as all updates have been added now, please remove empty chapters.

@AlexanderDokuchaev AlexanderDokuchaev merged commit 1e6b694 into openvinotoolkit:release_v2180 Sep 4, 2025
9 checks passed
AlexanderDokuchaev added a commit to AlexanderDokuchaev/nncf that referenced this pull request Sep 8, 2025
### Reason for changes

Upcoming release

### Related tickets

172462

---------

Co-authored-by: Nikita Savelyev <[email protected]>
Co-authored-by: Daniil Lyakhov <[email protected]>
Co-authored-by: Liubov Talamanova <[email protected]>
Co-authored-by: Lyalyushkin Nikolay <[email protected]>
Co-authored-by: Andrey Churkin <[email protected]>
Co-authored-by: andreyanufr <[email protected]>
Co-authored-by: Alexander Suslov <[email protected]>
AlexanderDokuchaev added a commit that referenced this pull request Sep 8, 2025
### Changes

Bump OV version to 2025.3
Update docs

Cherry-pick from release branch: 

- #3637 
- #3634
- #3633
- #3629

### Reason

Changes from release branch 

### Related tickets

172462

### Tests 

https://github.com/openvinotoolkit/nncf/actions/runs/17545330049
https://github.com/openvinotoolkit/nncf/actions/runs/17545486898

---------

Co-authored-by: Nikita Savelyev <[email protected]>
Co-authored-by: Daniil Lyakhov <[email protected]>
Co-authored-by: Liubov Talamanova <[email protected]>
Co-authored-by: Lyalyushkin Nikolay <[email protected]>
Co-authored-by: Andrey Churkin <[email protected]>
Co-authored-by: andreyanufr <[email protected]>
Co-authored-by: Alexander Suslov <[email protected]>