diff --git a/ReleaseNotes.md b/ReleaseNotes.md index 1316d854263..ef4b6dd879c 100644 --- a/ReleaseNotes.md +++ b/ReleaseNotes.md @@ -1,5 +1,44 @@ # Release Notes +## New in Release 2.18.0 + +Post-training Quantization: + +- Features: + - (OpenVINO) Introduced new compression data types CB4_F8E4M3 and CODEBOOK. CB4_F8E4M3 is a fixed codebook with 16 fp8 values based on NF4 data type values. CODEBOOK is an arbitrary user-selectable codebook that can be used to experiment with different data types. Both data types are used for weight compression. The AWQ and Scale Estimation algorithms are supported for these data types. + - (OpenVINO) Added support for compressing FP8 (f8e4m3 and f8e5m2) weights to 4-bit data types, which is particularly beneficial for models like DeepSeek-R1. + - Added the `group_size_fallback_mode` parameter for advanced weight compression. It controls how nodes that do not support the default group size are handled. By default (`IGNORE`), such nodes are skipped. With `ERROR`, an exception is raised if the channel size is not divisible by the group size, while `ADJUST` attempts to modify the group size so it becomes valid. + - (TorchFX) Added support for external quantizers in the `quantize_pt2e` API, including [XNNPACKQuantizer](https://docs.pytorch.org/executorch/stable/backends-xnnpack.html#quantization) and [CoreMLQuantizer](https://docs.pytorch.org/executorch/stable/backends-coreml.html#quantization). Users can now quantize their models in [ExecuTorch](https://github.com/pytorch/executorch) for the XNNPACK and CoreML backends via the NNCF `quantize_pt2e` API, employing the smooth quant and bias correction algorithms and a wide range of statistic collectors. + - (ONNX) Added support for data-aware weight compression in the ONNX backend, including the AWQ and Scale Estimation algorithms. 
Provided an [example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/onnx/tiny_llama_scale_estimation) demonstrating the data-aware weight compression pipeline using the `TinyLlama/TinyLlama-1.1B-Chat-v1.0` model in ONNX format. +- Improvements: + - Support for weight compression for models with the Rotary Positional Embedding block. + - Support for weight compression for models with stateful self-attention blocks. +- Tutorials: + - [Post-Training Optimization of Qwen-Agent](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-agent-mcp/llm-agent-mcp.ipynb) + - [Post-Training Optimization of FLUX.1 Kontext Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/flux.1-kontext/flux.1-kontext.ipynb) + - [Post-Training Optimization of Qwen3 Embedding Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/qwen3-embedding/qwen3-embedding.ipynb) + - [Post-Training Optimization of GLM-4.1V-9B-Thinking Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/glm4.1-v-thinking/glm4.1-v-thinking.ipynb) + +Compression-aware training: + +- Features: + - (PyTorch) Enhanced initialization for "QAT with absorbable LoRA" using advanced compression methods (AWQ + Scale Estimation). This improvement replaces the previous basic data-free compression approach, enabling QAT to start with a more accurate model baseline and achieve [superior final accuracy](https://github.com/openvinotoolkit/nncf/pull/3577). +- Improvements: + - (PyTorch) Streamlined "QAT with absorbable LoRA" by removing checkpoint selection based on the validation set. This change significantly reduces overall tuning time and maximum allocated memory. While [the results on Wikitext](/examples/llm_compression/torch/distillation_qat_with_lora/README.md#results-on-wikitext) are slightly worse, it provides a more efficient and faster tuning pipeline (e.g. 
reduced from 32 minutes to 25 minutes for SmolLM-1.7B). +- Tutorials: + - (TorchFX) Added [example](examples/llm_compression/torch_fx/tiny_llama/README.md) for compression of TinyLlama-1.1B. + - Updated [example](examples/llm_compression/onnx/tiny_llama/main.py) to align with the NPU implementation. + - Implemented fast evaluation and improved output in [example](examples/llm_compression/torch/downstream_qat_with_nls/README.md). + +Deprecations/Removals: + +- Removed examples that used the `create_compressed_model` API. + +Requirements: + +- Updated PyTorch (2.8.0) and Torchvision (0.23.0) versions. +- Require `setuptools>=77` to build the package. + ## New in Release 2.17.0 Post-training Quantization: diff --git a/constraints.txt b/constraints.txt index 02c931e7d30..95cb19c2126 100644 --- a/constraints.txt +++ b/constraints.txt @@ -1,5 +1,5 @@ # Openvino -openvino==2025.2.0 +openvino==2025.3.0 # Pytorch torch==2.8.0 diff --git a/docs/Installation.md b/docs/Installation.md index 81e5461e7b1..5dc39e28afa 100644 --- a/docs/Installation.md +++ b/docs/Installation.md @@ -49,7 +49,8 @@ as well as the supported versions of Python: | NNCF | OpenVINO | PyTorch | ONNX | TensorFlow | Python | |-----------|------------|----------|----------|------------|--------| -| `develop` | `2025.2.0` | `2.8.0` | `1.17.0` | `2.15.1` | `3.10` | +| `develop` | `2025.3.0` | `2.8.0` | `1.17.0` | `2.15.1` | `3.10` | +| `2.18.0` | `2025.3.0` | `2.8.0` | `1.17.0` | `2.15.1` | `3.10` | | `2.17.0` | `2025.2.0` | `2.7.1` | `1.17.0` | `2.15.1` | `3.10` | | `2.16.0` | `2025.1.0` | `2.6.0` | `1.17.0` | `2.15.1` | `3.10` | | `2.15.0` | `2025.0.0` | `2.5.1` | `1.17.0` | `2.15.1` | `3.10` | diff --git a/examples/llm_compression/onnx/tiny_llama/requirements.txt b/examples/llm_compression/onnx/tiny_llama/requirements.txt index d4dab94c957..c6950658e5f 100644 --- a/examples/llm_compression/onnx/tiny_llama/requirements.txt +++ b/examples/llm_compression/onnx/tiny_llama/requirements.txt @@ -1,5 +1,5 @@ transformers 
-openvino==2025.2.0 +openvino==2025.3.0 optimum-intel[openvino] git+https://github.com/onnx/onnx.git@c25eebcf51b781dbfcc75a9c8bdf5dd1781367fe # onnx-1.19.0.dev onnxruntime==1.21.1 diff --git a/examples/llm_compression/onnx/tiny_llama_scale_estimation/requirements.txt b/examples/llm_compression/onnx/tiny_llama_scale_estimation/requirements.txt index 57a4825619c..3ef0c58ea7d 100644 --- a/examples/llm_compression/onnx/tiny_llama_scale_estimation/requirements.txt +++ b/examples/llm_compression/onnx/tiny_llama_scale_estimation/requirements.txt @@ -1,6 +1,6 @@ torch==2.8.0 transformers>=4.48.0 -openvino==2025.2.0 +openvino==2025.3.0 optimum-intel[openvino] onnx==1.17.0 onnxruntime==1.21.1 diff --git a/examples/llm_compression/openvino/smollm2_360m_codebook/requirements.txt b/examples/llm_compression/openvino/smollm2_360m_codebook/requirements.txt index d604c249e1c..b4aa067275a 100644 --- a/examples/llm_compression/openvino/smollm2_360m_codebook/requirements.txt +++ b/examples/llm_compression/openvino/smollm2_360m_codebook/requirements.txt @@ -1,4 +1,4 @@ -openvino==2025.2.0 +openvino==2025.3.0 optimum-intel[openvino]>=1.22.0 transformers>=4.48.0 onnx==1.17.0 diff --git a/examples/llm_compression/openvino/smollm2_360m_fp8/requirements.txt b/examples/llm_compression/openvino/smollm2_360m_fp8/requirements.txt index fcab45b1c74..f2c6dc719d9 100644 --- a/examples/llm_compression/openvino/smollm2_360m_fp8/requirements.txt +++ b/examples/llm_compression/openvino/smollm2_360m_fp8/requirements.txt @@ -1,5 +1,5 @@ datasets -openvino==2025.2.0 +openvino==2025.3.0 optimum-intel[openvino]>=1.22.0 transformers>=4.48.0 onnx==1.17.0 diff --git a/examples/llm_compression/openvino/tiny_llama/requirements.txt b/examples/llm_compression/openvino/tiny_llama/requirements.txt index 31ec6786f80..780f15367f8 100644 --- a/examples/llm_compression/openvino/tiny_llama/requirements.txt +++ b/examples/llm_compression/openvino/tiny_llama/requirements.txt @@ -1,5 +1,5 @@ transformers>=4.48.0 
datasets==2.14.7 -openvino==2025.2.0 +openvino==2025.3.0 optimum-intel[openvino]>=1.22.0 onnx==1.17.0 diff --git a/examples/llm_compression/openvino/tiny_llama_find_hyperparams/requirements.txt b/examples/llm_compression/openvino/tiny_llama_find_hyperparams/requirements.txt index 9e394560cbc..839e1b3b973 100644 --- a/examples/llm_compression/openvino/tiny_llama_find_hyperparams/requirements.txt +++ b/examples/llm_compression/openvino/tiny_llama_find_hyperparams/requirements.txt @@ -1,7 +1,7 @@ datasets -whowhatbench @ git+https://github.com/openvinotoolkit/openvino.genai@2025.2.0.0#subdirectory=tools/who_what_benchmark +whowhatbench @ git+https://github.com/openvinotoolkit/openvino.genai@2025.3.0.0#subdirectory=tools/who_what_benchmark numpy>=1.23.5,<2 -openvino==2025.2.0 +openvino==2025.3.0 optimum-intel>=1.22.0 transformers>=4.48.0 onnx==1.17.0 diff --git a/examples/llm_compression/openvino/tiny_llama_synthetic_data/requirements.txt b/examples/llm_compression/openvino/tiny_llama_synthetic_data/requirements.txt index 8478ea0bbbb..7458a2bb443 100644 --- a/examples/llm_compression/openvino/tiny_llama_synthetic_data/requirements.txt +++ b/examples/llm_compression/openvino/tiny_llama_synthetic_data/requirements.txt @@ -1,7 +1,7 @@ torch==2.8.0 datasets==3.0.1 numpy>=1.23.5,<2 -openvino==2025.2.0 +openvino==2025.3.0 optimum-intel>=1.22.0 transformers>=4.48.0 onnx==1.17.0 diff --git a/examples/llm_compression/torch/distillation_qat_with_lora/requirements.txt b/examples/llm_compression/torch/distillation_qat_with_lora/requirements.txt index ab7be7c3cd1..8bb7dc11764 100644 --- a/examples/llm_compression/torch/distillation_qat_with_lora/requirements.txt +++ b/examples/llm_compression/torch/distillation_qat_with_lora/requirements.txt @@ -1,7 +1,7 @@ tensorboard==2.13.0 torch==2.8.0 numpy>=1.23.5,<2 -openvino==2025.2.0 +openvino==2025.3.0 optimum-intel>=1.22.0 transformers>=4.48.0 lm_eval==0.4.8 diff --git 
a/examples/llm_compression/torch/downstream_qat_with_nls/requirements.txt b/examples/llm_compression/torch/downstream_qat_with_nls/requirements.txt index ab7be7c3cd1..8bb7dc11764 100644 --- a/examples/llm_compression/torch/downstream_qat_with_nls/requirements.txt +++ b/examples/llm_compression/torch/downstream_qat_with_nls/requirements.txt @@ -1,7 +1,7 @@ tensorboard==2.13.0 torch==2.8.0 numpy>=1.23.5,<2 -openvino==2025.2.0 +openvino==2025.3.0 optimum-intel>=1.22.0 transformers>=4.48.0 lm_eval==0.4.8 diff --git a/examples/llm_compression/torch_fx/tiny_llama/requirements.txt b/examples/llm_compression/torch_fx/tiny_llama/requirements.txt index 4b9b42e7234..600925868fe 100644 --- a/examples/llm_compression/torch_fx/tiny_llama/requirements.txt +++ b/examples/llm_compression/torch_fx/tiny_llama/requirements.txt @@ -1,6 +1,6 @@ transformers==4.52.1 datasets==2.14.7 -openvino==2025.2.0 +openvino==2025.3.0 optimum==1.24.0 torch==2.8.0 torchvision==0.23.0 diff --git a/examples/post_training_quantization/onnx/mobilenet_v2/requirements.txt b/examples/post_training_quantization/onnx/mobilenet_v2/requirements.txt index e7d39fabf40..3132a5807cd 100644 --- a/examples/post_training_quantization/onnx/mobilenet_v2/requirements.txt +++ b/examples/post_training_quantization/onnx/mobilenet_v2/requirements.txt @@ -4,5 +4,5 @@ scikit-learn fastdownload onnx==1.17.0 onnxruntime==1.21.1 -openvino==2025.2.0 +openvino==2025.3.0 numpy<2 diff --git a/examples/post_training_quantization/onnx/yolov8_quantize_with_accuracy_control/requirements.txt b/examples/post_training_quantization/onnx/yolov8_quantize_with_accuracy_control/requirements.txt index ad4ad2c25e0..6f8786f922b 100644 --- a/examples/post_training_quantization/onnx/yolov8_quantize_with_accuracy_control/requirements.txt +++ b/examples/post_training_quantization/onnx/yolov8_quantize_with_accuracy_control/requirements.txt @@ -1,4 +1,4 @@ ultralytics==8.3.22 onnx==1.17.0 onnxruntime==1.21.1 -openvino==2025.2.0 +openvino==2025.3.0 diff 
--git a/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control/requirements.txt b/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control/requirements.txt index 95ed7709e42..d3a062cea19 100644 --- a/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control/requirements.txt +++ b/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control/requirements.txt @@ -1,3 +1,3 @@ anomalib==0.6.0 -openvino==2025.2.0 +openvino==2025.3.0 numpy<2 diff --git a/examples/post_training_quantization/openvino/mobilenet_v2/requirements.txt b/examples/post_training_quantization/openvino/mobilenet_v2/requirements.txt index fe080270f33..cac45060eff 100644 --- a/examples/post_training_quantization/openvino/mobilenet_v2/requirements.txt +++ b/examples/post_training_quantization/openvino/mobilenet_v2/requirements.txt @@ -2,4 +2,4 @@ torchvision tqdm scikit-learn fastdownload -openvino==2025.2.0 +openvino==2025.3.0 diff --git a/examples/post_training_quantization/openvino/yolov8/requirements.txt b/examples/post_training_quantization/openvino/yolov8/requirements.txt index ff91b643c33..cc3300bebe6 100644 --- a/examples/post_training_quantization/openvino/yolov8/requirements.txt +++ b/examples/post_training_quantization/openvino/yolov8/requirements.txt @@ -1,3 +1,3 @@ ultralytics==8.3.22 onnx==1.17.0 -openvino==2025.2.0 +openvino==2025.3.0 diff --git a/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/requirements.txt b/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/requirements.txt index ff91b643c33..cc3300bebe6 100644 --- a/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/requirements.txt +++ b/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/requirements.txt @@ -1,3 +1,3 @@ ultralytics==8.3.22 onnx==1.17.0 -openvino==2025.2.0 
+openvino==2025.3.0 diff --git a/examples/post_training_quantization/tensorflow/mobilenet_v2/requirements.txt b/examples/post_training_quantization/tensorflow/mobilenet_v2/requirements.txt index cfba182956b..2d4abf69a6e 100644 --- a/examples/post_training_quantization/tensorflow/mobilenet_v2/requirements.txt +++ b/examples/post_training_quantization/tensorflow/mobilenet_v2/requirements.txt @@ -1,4 +1,4 @@ tensorflow==2.15.1 tensorflow-datasets tqdm -openvino==2025.2.0 +openvino==2025.3.0 diff --git a/examples/post_training_quantization/torch/mobilenet_v2/requirements.txt b/examples/post_training_quantization/torch/mobilenet_v2/requirements.txt index 528ac5b3f4b..bde2bf05b6d 100644 --- a/examples/post_training_quantization/torch/mobilenet_v2/requirements.txt +++ b/examples/post_training_quantization/torch/mobilenet_v2/requirements.txt @@ -1,5 +1,5 @@ fastdownload==0.0.7 -openvino==2025.2.0 +openvino==2025.3.0 scikit-learn torch==2.8.0 torchvision==0.23.0 diff --git a/examples/post_training_quantization/torch/ssd300_vgg16/requirements.txt b/examples/post_training_quantization/torch/ssd300_vgg16/requirements.txt index e94fd751458..277810df3b4 100644 --- a/examples/post_training_quantization/torch/ssd300_vgg16/requirements.txt +++ b/examples/post_training_quantization/torch/ssd300_vgg16/requirements.txt @@ -1,6 +1,6 @@ fastdownload==0.0.7 onnx==1.17.0 -openvino==2025.2.0 +openvino==2025.3.0 pycocotools==2.0.7 torch==2.8.0 torchmetrics==1.0.1 diff --git a/examples/post_training_quantization/torch_fx/resnet18/requirements.txt b/examples/post_training_quantization/torch_fx/resnet18/requirements.txt index 002e939b0fc..7600d884353 100644 --- a/examples/post_training_quantization/torch_fx/resnet18/requirements.txt +++ b/examples/post_training_quantization/torch_fx/resnet18/requirements.txt @@ -1,4 +1,4 @@ fastdownload==0.0.7 -openvino==2025.2.0 +openvino==2025.3.0 torch==2.8.0 torchvision==0.23.0 diff --git 
a/examples/quantization_aware_training/tensorflow/mobilenet_v2/requirements.txt b/examples/quantization_aware_training/tensorflow/mobilenet_v2/requirements.txt index 7ea91de8ace..b2191e89dc3 100644 --- a/examples/quantization_aware_training/tensorflow/mobilenet_v2/requirements.txt +++ b/examples/quantization_aware_training/tensorflow/mobilenet_v2/requirements.txt @@ -2,4 +2,4 @@ tensorflow~=2.12.0; python_version < '3.9' tensorflow~=2.15.1; python_version >= '3.9' tensorflow-datasets tqdm -openvino==2025.2.0 +openvino==2025.3.0 diff --git a/examples/quantization_aware_training/torch/resnet18/requirements.txt b/examples/quantization_aware_training/torch/resnet18/requirements.txt index 002e939b0fc..7600d884353 100644 --- a/examples/quantization_aware_training/torch/resnet18/requirements.txt +++ b/examples/quantization_aware_training/torch/resnet18/requirements.txt @@ -1,4 +1,4 @@ fastdownload==0.0.7 -openvino==2025.2.0 +openvino==2025.3.0 torch==2.8.0 torchvision==0.23.0 diff --git a/src/nncf/openvino/cpu_info.py b/src/nncf/openvino/cpu_info.py index a5debd39724..58af0614030 100644 --- a/src/nncf/openvino/cpu_info.py +++ b/src/nncf/openvino/cpu_info.py @@ -24,6 +24,13 @@ def _get_cpu_name() -> str: return ov.Core().get_property("CPU", ov.properties.device.full_name) +def _get_cpu_architecture() -> str: + """ + :return: The architecture of the CPU. + """ + return ov.Core().get_property("CPU", ov.properties.device.architecture) + + def is_arm_cpu() -> bool: """ Checks whether current CPU is an ARM CPU or not. 
@@ -31,7 +38,7 @@ def is_arm_cpu() -> bool: """ global _IS_ARM_CPU if _IS_ARM_CPU is None: - _IS_ARM_CPU = "arm" in _get_cpu_name().lower() + _IS_ARM_CPU = "arm" in _get_cpu_architecture().lower() return _IS_ARM_CPU diff --git a/tests/post_training/requirements.txt b/tests/post_training/requirements.txt index e1fe4227d54..cb93049e1de 100644 --- a/tests/post_training/requirements.txt +++ b/tests/post_training/requirements.txt @@ -19,5 +19,5 @@ tensorflow-io==0.32.0 timm==0.9.2 accelerate==1.9.0 transformers==4.52.1 -whowhatbench @ git+https://github.com/openvinotoolkit/openvino.genai@2025.2.0.0#subdirectory=tools/who_what_benchmark -datasets==3.1.0 +whowhatbench @ git+https://github.com/openvinotoolkit/openvino.genai@2025.3.0.0#subdirectory=tools/who_what_benchmark +datasets==3.6.0 diff --git a/tests/tensorflow/requirements.txt b/tests/tensorflow/requirements.txt index b63a60eeed2..4c8bbacc401 100644 --- a/tests/tensorflow/requirements.txt +++ b/tests/tensorflow/requirements.txt @@ -1,13 +1,15 @@ -c ../../constraints.txt PyYAML -tensorflow-metadata==1.13.0 pytest pytest-cov pytest-mock pytest-dependency pytest-xdist pydot -tensorflow_hub +tensorflow +tensorflow-hub==0.16.1 +tensorflow-metadata==1.13.0 +tf_keras==2.15.1 virtualenv openvino diff --git a/tests/torch/sparsity/movement/test_model_saving.py b/tests/torch/sparsity/movement/test_model_saving.py index 5af9cf54bd0..d13b5ce7e77 100644 --- a/tests/torch/sparsity/movement/test_model_saving.py +++ b/tests/torch/sparsity/movement/test_model_saving.py @@ -150,7 +150,7 @@ def test_same_outputs_in_torch_and_exported_onnx(self, tmp_path: Path, recipe: B [ ParamDesc( nncf_weight_ratio=0.14, - ov_weight_ratio=0.11, + ov_weight_ratio=0.17, recipe=BertRunRecipe().model_config_( max_position_embeddings=2, intermediate_size=4, @@ -163,7 +163,7 @@ def test_same_outputs_in_torch_and_exported_onnx(self, tmp_path: Path, recipe: B ), ParamDesc( nncf_weight_ratio=0.1, - ov_weight_ratio=0.08, + ov_weight_ratio=0.12, 
recipe=Wav2Vec2RunRecipe().model_config_( intermediate_size=4, num_labels=1, @@ -211,12 +211,12 @@ def test_same_outputs_in_torch_and_exported_onnx(self, tmp_path: Path, recipe: B ), ParamDesc( nncf_weight_ratio=0.63, - ov_weight_ratio=0.38, + ov_weight_ratio=0.29, recipe=MobileBertRunRecipe().model_config_(), ), ParamDesc( nncf_weight_ratio=0.42, - ov_weight_ratio=0.33, + ov_weight_ratio=0.25, recipe=MobileBertRunRecipe() .model_config_() .algo_config_( @@ -230,7 +230,7 @@ def test_same_outputs_in_torch_and_exported_onnx(self, tmp_path: Path, recipe: B ), ParamDesc( nncf_weight_ratio=0.15, - ov_weight_ratio=0.12, + ov_weight_ratio=0.18, recipe=ClipVisionRunRecipe().model_config_(), ), ],