
Commit 74caa6b

Authored by AlexanderDokuchaev, Nikita Savelyev, daniil-lyakhov, l-bat, and ljaljushkin
Post release 2.18.0 actions (#3654)
### Changes

- Bump OV version to 2025.3
- Update docs
- Cherry-pick from release branch:
  - #3637
  - #3634
  - #3633
  - #3629

### Reason

Changes from release branch

### Related tickets

172462

### Tests

- https://github.com/openvinotoolkit/nncf/actions/runs/17545330049
- https://github.com/openvinotoolkit/nncf/actions/runs/17545486898

---------

Co-authored-by: Nikita Savelyev <nikita.savelyev@intel.com>
Co-authored-by: Daniil Lyakhov <daniil.lyakhov@intel.com>
Co-authored-by: Liubov Talamanova <liubov.talamanova@intel.com>
Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>
Co-authored-by: Andrey Churkin <andrey.churkin@intel.com>
Co-authored-by: andreyanufr <andrey.anufriev@intel.com>
Co-authored-by: Alexander Suslov <alexander.suslov@intel.com>
1 parent eeae24e commit 74caa6b

File tree

29 files changed: +84 −35 lines changed


ReleaseNotes.md

Lines changed: 39 additions & 0 deletions

```diff
@@ -1,5 +1,44 @@
 # Release Notes
 
+## New in Release 2.18.0
+
+Post-training Quantization:
+
+- Features:
+  - (OpenVINO) Introduced new compression data types CB4_F8E4M3 and CODEBOOK (see the sketches after this diff). CB4_F8E4M3 is a fixed codebook of 16 fp8 values based on the NF4 data type; CODEBOOK is an arbitrary, user-selectable codebook for experimenting with different data types. Both data types are used for weight compression, and the AWQ and scale estimation algorithms are supported for them.
+  - (OpenVINO) Added support for compressing FP8 (f8e4m3 and f8e5m2) weights to 4-bit data types, which is particularly beneficial for models like DeepSeek-R1.
+  - Added the `group_size_fallback_mode` parameter for advanced weight compression (see the sketch after this diff). It controls how nodes that do not support the default group size are handled. By default (`IGNORE`), such nodes are skipped. With `ERROR`, an exception is raised if the channel size is not divisible by the group size, while `ADJUST` attempts to modify the group size so it becomes valid.
+  - (TorchFX) Added support for external quantizers in the `quantize_pt2e` API, including [XNNPACKQuantizer](https://docs.pytorch.org/executorch/stable/backends-xnnpack.html#quantization) and [CoreMLQuantizer](https://docs.pytorch.org/executorch/stable/backends-coreml.html#quantization). Users can now quantize their models in [ExecuTorch](https://github.com/pytorch/executorch) for the XNNPACK and CoreML backends via the NNCF `quantize_pt2e` API (see the sketch after this diff), employing the SmoothQuant and bias correction algorithms and a wide range of statistic collectors.
+  - (ONNX) Added support for data-aware weight compression in the ONNX backend, including the AWQ and Scale Estimation algorithms (see the sketch after this diff). Provided an [example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/onnx/tiny_llama_scale_estimation) demonstrating the data-aware weight compression pipeline using the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` model in ONNX format.
+- Improvements:
+  - Support for weight compression of models with the Rotary Positional Embedding block.
+  - Support for weight compression of models with stateful self-attention blocks.
+- Tutorials:
+  - [Post-Training Optimization of Qwen-Agent](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-agent-mcp/llm-agent-mcp.ipynb)
+  - [Post-Training Optimization of FLUX.1 Kontext Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/flux.1-kontext/flux.1-kontext.ipynb)
+  - [Post-Training Optimization of Qwen3 Embedding Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/qwen3-embedding/qwen3-embedding.ipynb)
+  - [Post-Training Optimization of GLM-4.1V-9B-Thinking Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/glm4.1-v-thinking/glm4.1-v-thinking.ipynb)
+
+Compression-aware training:
+
+- Features:
+  - (PyTorch) Enhanced initialization for "QAT with absorbable LoRA" using advanced compression methods (AWQ + Scale Estimation). This improvement replaces the previous basic data-free compression approach, enabling QAT to start from a more accurate model baseline and achieve [superior final accuracy](https://github.com/openvinotoolkit/nncf/pull/3577).
+- Improvements:
+  - (PyTorch) Streamlined "QAT with absorbable LoRA" by removing checkpoint selection based on the validation set. This change significantly reduces overall tuning time and maximum allocated memory. While [the results on Wikitext](/examples/llm_compression/torch/distillation_qat_with_lora/README.md#results-on-wikitext) are slightly worse, it provides a more efficient and faster tuning pipeline (e.g., reduced from 32 to 25 minutes for SmolLM-1.7B).
+- Tutorials:
+  - (TorchFX) Added an [example](examples/llm_compression/torch_fx/tiny_llama/README.md) of compressing TinyLlama-1.1B.
+  - Updated the [example](examples/llm_compression/onnx/tiny_llama/main.py) to match the NPU implementation.
+  - Implemented fast evaluation and improved output in the [example](examples/llm_compression/torch/downstream_qat_with_nls/README.md).
+
+Deprecations/Removals:
+
+- Removed examples that used the `create_compressed_model` API.
+
+Requirements:
+
+- Updated PyTorch (2.8.0) and Torchvision (0.23.0) versions.
+- Require `setuptools>=77` to build the package.
+
 ## New in Release 2.17.0
 
 Post-training Quantization:
```
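
For the new codebook data types above, here is a minimal sketch of what the OpenVINO weight compression call might look like. The enum member name `nncf.CompressWeightsMode.CB4_F8E4M3` and the model path are assumptions based on the notes, not confirmed API:

```python
import nncf
import openvino as ov

# Hypothetical OpenVINO IR path; replace with a real model.
model = ov.Core().read_model("model.xml")

# Compress weights with the fixed 16-value fp8 codebook derived from NF4
# (enum member name assumed from the release notes above).
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.CB4_F8E4M3,
    group_size=128,
)
ov.save_model(compressed, "model_cb4.xml")
```

Per the notes, AWQ and scale estimation can be combined with these modes; the `CODEBOOK` variant additionally takes a user-defined set of values, whose exact parameter is not shown here.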
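The `group_size_fallback_mode` behavior can be sketched the same way, under the assumption that the parameter is accepted through `nncf.AdvancedCompressionParameters` (both the parameter's location and the accepted value spelling are assumptions):

```python
import nncf
import openvino as ov

model = ov.Core().read_model("model.xml")  # hypothetical path

# ADJUST shrinks the group size for layers whose channel size is not
# divisible by the default; IGNORE (the default) skips such layers;
# ERROR raises an exception instead.
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=128,
    advanced_parameters=nncf.AdvancedCompressionParameters(
        group_size_fallback_mode="ADJUST",  # assumed spelling; may be an enum
    ),
)
```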
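A hedged sketch of the external-quantizer flow in `quantize_pt2e` for the XNNPACK backend follows. The import paths for `XNNPACKQuantizer` and `quantize_pt2e`, and the `calibration_dataset` keyword, are assumptions based on current ExecuTorch and NNCF layouts:

```python
import torch
import nncf
from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import (  # assumed path
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
from nncf.experimental.torch.fx import quantize_pt2e  # assumed path

# Tiny stand-in model so the sketch is self-contained.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# Capture the model with torch.export, then quantize it with an
# external (non-NNCF) quantizer via quantize_pt2e.
exported = torch.export.export(model, example_inputs).module()
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())

quantized = quantize_pt2e(
    exported,
    quantizer,
    calibration_dataset=nncf.Dataset([example_inputs[0]]),
)
```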
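Finally, a minimal sketch of the ONNX data-aware compression path, mirroring the linked TinyLlama example. The file names, input names, and dummy calibration sample are placeholders, not taken from the example itself:

```python
import numpy as np
import onnx
import nncf

model = onnx.load("tiny_llama.onnx")  # hypothetical path

# Placeholder calibration sample; real input names and shapes depend on
# how the model was exported.
calibration_data = [{
    "input_ids": np.ones((1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
}]

# Data-aware 4-bit weight compression with AWQ and scale estimation enabled.
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=64,
    dataset=nncf.Dataset(calibration_data),
    awq=True,
    scale_estimation=True,
)
onnx.save(compressed, "tiny_llama_int4.onnx")
```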

constraints.txt

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,5 +1,5 @@
 # Openvino
-openvino==2025.2.0
+openvino==2025.3.0
 
 # Pytorch
 torch==2.8.0
```

docs/Installation.md

Lines changed: 2 additions & 1 deletion

```diff
@@ -49,7 +49,8 @@ as well as the supported versions of Python:
 
 | NNCF      | OpenVINO   | PyTorch  | ONNX     | TensorFlow | Python |
 |-----------|------------|----------|----------|------------|--------|
-| `develop` | `2025.2.0` | `2.8.0`  | `1.17.0` | `2.15.1`   | `3.10` |
+| `develop` | `2025.3.0` | `2.8.0`  | `1.17.0` | `2.15.1`   | `3.10` |
+| `2.18.0`  | `2025.3.0` | `2.8.0`  | `1.17.0` | `2.15.1`   | `3.10` |
 | `2.17.0`  | `2025.2.0` | `2.7.1`  | `1.17.0` | `2.15.1`   | `3.10` |
 | `2.16.0`  | `2025.1.0` | `2.6.0`  | `1.17.0` | `2.15.1`   | `3.10` |
 | `2.15.0`  | `2025.0.0` | `2.5.1`  | `1.17.0` | `2.15.1`   | `3.10` |
```
Lines changed: 1 addition & 1 deletion

```diff
@@ -1,5 +1,5 @@
 transformers
-openvino==2025.2.0
+openvino==2025.3.0
 optimum-intel[openvino]
 git+https://github.com/onnx/onnx.git@c25eebcf51b781dbfcc75a9c8bdf5dd1781367fe # onnx-1.19.0.dev
 onnxruntime==1.21.1
```

examples/llm_compression/onnx/tiny_llama_scale_estimation/requirements.txt

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,6 +1,6 @@
 torch==2.8.0
 transformers>=4.48.0
-openvino==2025.2.0
+openvino==2025.3.0
 optimum-intel[openvino]
 onnx==1.17.0
 onnxruntime==1.21.1
```
Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-openvino==2025.2.0
+openvino==2025.3.0
 optimum-intel[openvino]>=1.22.0
 transformers>=4.48.0
 onnx==1.17.0
```
Lines changed: 1 addition & 1 deletion

```diff
@@ -1,5 +1,5 @@
 datasets
-openvino==2025.2.0
+openvino==2025.3.0
 optimum-intel[openvino]>=1.22.0
 transformers>=4.48.0
 onnx==1.17.0
```
Lines changed: 1 addition & 1 deletion

```diff
@@ -1,5 +1,5 @@
 transformers>=4.48.0
 datasets==2.14.7
-openvino==2025.2.0
+openvino==2025.3.0
 optimum-intel[openvino]>=1.22.0
 onnx==1.17.0
```
Lines changed: 2 additions & 2 deletions

```diff
@@ -1,7 +1,7 @@
 datasets
-whowhatbench @ git+https://github.com/openvinotoolkit/openvino.genai@2025.2.0.0#subdirectory=tools/who_what_benchmark
+whowhatbench @ git+https://github.com/openvinotoolkit/openvino.genai@2025.3.0.0#subdirectory=tools/who_what_benchmark
 numpy>=1.23.5,<2
-openvino==2025.2.0
+openvino==2025.3.0
 optimum-intel>=1.22.0
 transformers>=4.48.0
 onnx==1.17.0
```
Lines changed: 1 addition & 1 deletion

```diff
@@ -1,7 +1,7 @@
 torch==2.8.0
 datasets==3.0.1
 numpy>=1.23.5,<2
-openvino==2025.2.0
+openvino==2025.3.0
 optimum-intel>=1.22.0
 transformers>=4.48.0
 onnx==1.17.0
```
