Conversation

@AlexanderDokuchaev

Reason for changes

Upcoming release

Related tickets

172462

@AlexanderDokuchaev AlexanderDokuchaev requested a review from a team as a code owner August 20, 2025 12:49
@AlexanderDokuchaev AlexanderDokuchaev changed the base branch from develop to release_v2180 August 20, 2025 12:49
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Aug 20, 2025
ReleaseNotes.md Outdated
- ...
- Features:
- Introduced the `group_size_fallback_mode` advanced weight compression parameter, which specifies how to handle nodes that do not support the default group size value. By default, it is set to `GroupSizeFallbackMode.IGNORE`, which skips nodes that cannot be compressed with the given group size.
- (TorchFX) Added support for external quantizers in the `quantize_pt2e` API, including [XNNPACKQuantizer](https://docs.pytorch.org/executorch/stable/backends-xnnpack.html#quantization) and [CoreMLQuantizer](https://docs.pytorch.org/executorch/stable/backends-coreml.html#quantization).
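The `IGNORE` fallback behavior can be illustrated with a small sketch (conceptual only, not NNCF's implementation; the layer names and channel counts below are made up): layers whose input-channel count is not divisible by the requested group size are simply left uncompressed.

```python
# Conceptual sketch of an IGNORE-style group-size fallback (hypothetical
# layer names/shapes, not NNCF code): group-wise quantization splits each
# weight row into groups of `group_size` channels, so a layer is only
# compressible when its channel count divides evenly.

def select_compressible(layers, group_size):
    """Split (name, in_channels) pairs into compressible vs. skipped."""
    compressible, skipped = [], []
    for name, in_channels in layers:
        if in_channels % group_size == 0:
            compressible.append(name)
        else:
            skipped.append(name)  # IGNORE: leave this layer uncompressed
    return compressible, skipped

layers = [("attn.qkv", 4096), ("mlp.up", 11008), ("head", 50257)]
compressible, skipped = select_compressible(layers, group_size=128)
print(compressible)  # ['attn.qkv', 'mlp.up']
print(skipped)       # ['head']
```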
- Fixes:
- ...
- Improvements:
- Support of weight compression for models with the Rotary Positional Embedding block.
- Support of weight compression for models with stateful self-attention blocks.
- General:
- ...
- Features:
- (PyTorch) Enhanced initialization for "QAT with absorbable LoRA" using advanced compression methods (AWQ + Scale Estimation). This improvement replaces the previous basic data-free compression approach, enabling QAT to start with a more accurate model baseline and achieve [superior final accuracy](https://github.com/openvinotoolkit/nncf/pull/3577).
- Fixes:
- ...
- Improvements:
- (PyTorch) Streamlined "QAT with absorbable LoRA" by removing checkpoint selection based on the validation set. This change significantly reduces overall tuning time and peak allocated memory. While [the results on Wikitext](/examples/llm_compression/torch/distillation_qat_with_lora/README.md#results-on-wikitext) are slightly worse, the tuning pipeline is faster and more efficient (e.g. reduced from 32 to 25 minutes for SmolLM-1.7B).
- (ONNX) Added support for data-aware weight compression in the ONNX backend, including the AWQ and Scale Estimation algorithms. Provided an [example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/onnx/tiny_llama_scale_estimation) demonstrating the data-aware weight compression pipeline using the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` model in ONNX format.
@andrey-churkin andrey-churkin self-requested a review September 2, 2025 08:05
ReleaseNotes.md Outdated
- (OpenVINO) Introduced new compression data types CB4_F8E4M3 and CODEBOOK. CB4_F8E4M3 is a fixed codebook of 16 FP8 values based on the NF4 data type values. CODEBOOK is an arbitrary, user-selectable codebook that can be used to experiment with different data types. Both data types are used for weight compression, and the AWQ and Scale Estimation algorithms are supported for them.
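Conceptually, codebook compression stores one small table of representable values and replaces each weight with a 4-bit index into it. A minimal sketch of the idea (the codebook values below are illustrative, not the actual CB4_F8E4M3 table, and this is not NNCF's implementation):

```python
# Toy codebook quantization: a 16-entry codebook means each weight can be
# stored as a 4-bit index. Values here are made up for illustration.
CODEBOOK = [-1.0, -0.7, -0.5, -0.35, -0.25, -0.15, -0.08, 0.0,
            0.08, 0.15, 0.25, 0.35, 0.5, 0.7, 0.85, 1.0]

def quantize(weights, codebook):
    """Map each weight to the index of its nearest codebook value."""
    return [min(range(len(codebook)), key=lambda i: abs(codebook[i] - w))
            for w in weights]

def dequantize(indices, codebook):
    """Recover the (lossy) weight values from codebook indices."""
    return [codebook[i] for i in indices]

idx = quantize([0.9, -0.32, 0.01], CODEBOOK)
print(dequantize(idx, CODEBOOK))  # [0.85, -0.35, 0.0]
```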
- (OpenVINO) Added support for compressing FP8 (f8e4m3 and f8e5m2) weights to 4-bit data types, which is particularly beneficial for models like DeepSeek-R1.
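At a high level, re-quantizing already low-precision weights into a 4-bit integer format works like any symmetric integer quantization: derive a scale from the maximum magnitude, then round each value into the signed 4-bit range. A toy sketch under those assumptions (not NNCF's implementation; input values are arbitrary):

```python
# Conceptual symmetric 4-bit quantization sketch (illustrative only).
# Signed 4-bit symmetric range is [-7, 7] with a per-tensor scale.

def quantize_int4_sym(weights):
    """Quantize floats into the signed 4-bit range [-7, 7] plus a scale."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from 4-bit values and the scale."""
    return [v * scale for v in q]

q, scale = quantize_int4_sym([3.5, -7.0, 1.75])
print(q, scale)  # [4, -7, 2] 1.0
```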

ReleaseNotes.md Outdated

Deprecations/Removals:

- Removed examples that used the `create_compressed_model` API
@AlexanderDokuchaev , "Removed examples that used `create_compressed_model`" => "Removed examples that used `create_compressed_model` API".

@MaximProshin

@AlexanderDokuchaev , as all updates have been added now, please remove empty chapters.

@AlexanderDokuchaev AlexanderDokuchaev merged commit 1e6b694 into openvinotoolkit:release_v2180 Sep 4, 2025
9 checks passed
AlexanderDokuchaev added a commit to AlexanderDokuchaev/nncf that referenced this pull request Sep 8, 2025
### Reason for changes

Upcoming release

### Related tickets

172462

---------

Co-authored-by: Nikita Savelyev <[email protected]>
Co-authored-by: Daniil Lyakhov <[email protected]>
Co-authored-by: Liubov Talamanova <[email protected]>
Co-authored-by: Lyalyushkin Nikolay <[email protected]>
Co-authored-by: Andrey Churkin <[email protected]>
Co-authored-by: andreyanufr <[email protected]>
Co-authored-by: Alexander Suslov <[email protected]>
AlexanderDokuchaev added a commit that referenced this pull request Sep 8, 2025
### Changes

Bump OV version to 2025.3
Update docs

Cherry-pick from release branch: 

- #3637 
- #3634
- #3633
- #3629

### Reason

Changes from release branch 

### Related tickets

172462

### Tests 

https://github.com/openvinotoolkit/nncf/actions/runs/17545330049
https://github.com/openvinotoolkit/nncf/actions/runs/17545486898

---------

Co-authored-by: Nikita Savelyev <[email protected]>
Co-authored-by: Daniil Lyakhov <[email protected]>
Co-authored-by: Liubov Talamanova <[email protected]>
Co-authored-by: Lyalyushkin Nikolay <[email protected]>
Co-authored-by: Andrey Churkin <[email protected]>
Co-authored-by: andreyanufr <[email protected]>
Co-authored-by: Alexander Suslov <[email protected]>