
Commit 6aec742

fix conflict
2 parents: bd1adbd + 81f28fd

89 files changed: +1399 -15115 lines


.github/workflows/test_inc.yml
Lines changed: 1 addition & 1 deletion

@@ -32,7 +32,7 @@ jobs:
       - name: Setup Python
         uses: actions/setup-python@v5
         with:
-          python-version: 3.9
+          python-version: "3.10"

       - name: Install dependencies
         run: |

.github/workflows/test_openvino_slow.yml
Lines changed: 1 addition & 1 deletion

@@ -58,7 +58,7 @@ jobs:

       - name: Install dependencies
         run: |
-          pip install --upgrade pip uv
+          python -m pip install --upgrade pip uv
           uv pip install .[openvino,tests,diffusers] transformers[testing]

       - if: ${{ matrix.transformers-version != 'latest' && matrix.transformers-version != 'main' }}

README.md
Lines changed: 1 addition & 1 deletion

@@ -141,7 +141,7 @@ For more details, please refer to the [documentation](https://intel.github.io/in

 ## Running the examples

-Check out the [`examples`](https://github.com/huggingface/optimum-intel/tree/main/examples) and [`notebooks`](https://github.com/huggingface/optimum-intel/tree/main/notebooks) directory to see how 🤗 Optimum Intel can be used to optimize models and accelerate inference.
+Check out the [`notebooks`](https://github.com/huggingface/optimum-intel/tree/main/notebooks) directory to see how 🤗 Optimum Intel can be used to optimize models and accelerate inference.

 Do not forget to install requirements for every example:

docs/source/openvino/export.mdx
Lines changed: 3 additions & 3 deletions

@@ -32,7 +32,7 @@ Check out the help for more options:
 ```text
 usage: optimum-cli export openvino [-h] -m MODEL [--task TASK] [--framework {pt}] [--trust-remote-code]
                                    [--weight-format {fp32,fp16,int8,int4,mxfp4,nf4,cb4}]
-                                   [--quant-mode {int8,f8e4m3,f8e5m2,nf4_f8e4m3,nf4_f8e5m2,cb4_f8e4m3,int4_f8e4m3,int4_f8e5m2}]
+                                   [--quant-mode {int8,f8e4m3,f8e5m2,cb4_f8e4m3,int4_f8e4m3,int4_f8e5m2}]
                                    [--library {transformers,diffusers,timm,sentence_transformers,open_clip}]
                                    [--cache_dir CACHE_DIR] [--pad-token-id PAD_TOKEN_ID] [--ratio RATIO] [--sym]
                                    [--group-size GROUP_SIZE] [--backup-precision {none,int8_sym,int8_asym}]

@@ -69,7 +69,7 @@ Optional arguments:
   --weight-format {fp32,fp16,int8,int4,mxfp4,nf4,cb4}
                         The weight format of the exported model. Option 'cb4' represents a codebook with 16
                         fixed fp8 values in E4M3 format.
-  --quant-mode {int8,f8e4m3,f8e5m2,nf4_f8e4m3,nf4_f8e5m2,cb4_f8e4m3,int4_f8e4m3,int4_f8e5m2}
+  --quant-mode {int8,f8e4m3,f8e5m2,cb4_f8e4m3,int4_f8e4m3,int4_f8e5m2}
                         Quantization precision mode. This is used for applying full model quantization including
                         activations.
   --library {transformers,diffusers,timm,sentence_transformers,open_clip}

@@ -283,5 +283,5 @@ Once the model is exported, you can now [load your OpenVINO model](inference) by

 ## Troubleshooting

-Some models do not work with the latest transformers release. You may see an error message with a maximum supported version. To export these models, install a transformers version that supports the model, for example `pip install transformers==4.53.3`.
+Some models do not work with the latest transformers release. You may see an error message with a maximum supported version. To export these models, install a transformers version that supports the model, for example `pip install transformers==4.53.3`.
 The supported transformers versions compatible with each optimum-intel release are listed on the [Github releases page](https://github.com/huggingface/optimum-intel/releases/).
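The net effect of the `--quant-mode` change above is the removal of the two nf4-based mixed modes. A quick sketch in plain Python (choice lists copied from the usage text above) makes the delta explicit:

```python
# --quant-mode choices before and after this commit, per the usage text above.
old_modes = {"int8", "f8e4m3", "f8e5m2", "nf4_f8e4m3", "nf4_f8e5m2",
             "cb4_f8e4m3", "int4_f8e4m3", "int4_f8e5m2"}
new_modes = {"int8", "f8e4m3", "f8e5m2", "cb4_f8e4m3",
             "int4_f8e4m3", "int4_f8e5m2"}

# The commit only removes modes; nothing new is added to --quant-mode.
removed = sorted(old_modes - new_modes)
print(removed)  # ['nf4_f8e4m3', 'nf4_f8e5m2']
```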

docs/source/openvino/models.mdx
Lines changed: 1 addition & 0 deletions

@@ -148,6 +148,7 @@ Here is the list of the supported architectures :
 - XLM
 - XLM-Roberta
 - XVERSE
+- Zamba2

 ## [Diffusers](https://huggingface.co/docs/diffusers/index)
 - Stable Diffusion

docs/source/openvino/optimization.mdx
Lines changed: 16 additions & 16 deletions

@@ -144,7 +144,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --quant-mode nf4_f8e4m3 --dataset wikitext2 ./save_dir');
+navigator.clipboard.writeText('optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --quant-mode cb4_f8e4m3 --dataset wikitext2 ./save_dir');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -155,7 +155,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('OVModelForCausalLM.from_pretrained(\'TinyLlama/TinyLlama-1.1B-Chat-v1.0\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'nf4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
+navigator.clipboard.writeText('OVModelForCausalLM.from_pretrained(\'TinyLlama/TinyLlama-1.1B-Chat-v1.0\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'cb4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -416,7 +416,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('optimum-cli export openvino -m microsoft/codebert-base --quant-mode nf4_f8e4m3 --dataset wikitext2 ./save_dir');
+navigator.clipboard.writeText('optimum-cli export openvino -m microsoft/codebert-base --quant-mode cb4_f8e4m3 --dataset wikitext2 ./save_dir');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -427,7 +427,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('OVModelForFeatureExtraction.from_pretrained(\'microsoft/codebert-base\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'nf4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
+navigator.clipboard.writeText('OVModelForFeatureExtraction.from_pretrained(\'microsoft/codebert-base\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'cb4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -509,7 +509,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('optimum-cli export openvino --library sentence_transformers -m sentence-transformers/all-mpnet-base-v2 --quant-mode nf4_f8e4m3 --dataset wikitext2 ./save_dir');
+navigator.clipboard.writeText('optimum-cli export openvino --library sentence_transformers -m sentence-transformers/all-mpnet-base-v2 --quant-mode cb4_f8e4m3 --dataset wikitext2 ./save_dir');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -520,7 +520,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('OVSentenceTransformer.from_pretrained(\'sentence-transformers/all-mpnet-base-v2\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'nf4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
+navigator.clipboard.writeText('OVSentenceTransformer.from_pretrained(\'sentence-transformers/all-mpnet-base-v2\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'cb4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -602,7 +602,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('optimum-cli export openvino -m FacebookAI/roberta-base --quant-mode nf4_f8e4m3 --dataset wikitext2 ./save_dir');
+navigator.clipboard.writeText('optimum-cli export openvino -m FacebookAI/roberta-base --quant-mode cb4_f8e4m3 --dataset wikitext2 ./save_dir');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -613,7 +613,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('OVModelForMaskedLM.from_pretrained(\'FacebookAI/roberta-base\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'nf4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
+navigator.clipboard.writeText('OVModelForMaskedLM.from_pretrained(\'FacebookAI/roberta-base\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'cb4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -665,13 +665,13 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button
-onclick="navigator.clipboard.writeText('optimum-cli export openvino -m google-t5/t5-small --quant-mode nf4_f8e4m3 --dataset wikitext2 --smooth-quant-alpha -1 ./save_dir')">
+onclick="navigator.clipboard.writeText('optimum-cli export openvino -m google-t5/t5-small --quant-mode cb4_f8e4m3 --dataset wikitext2 --smooth-quant-alpha -1 ./save_dir')">
 ✅
 </button>
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button
-onclick="navigator.clipboard.writeText('OVModelForSeq2SeqLM.from_pretrained(\'google-t5/t5-small\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'nf4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\', smooth_quant_alpha=-1))).save_pretrained(\'save_dir\')')">
+onclick="navigator.clipboard.writeText('OVModelForSeq2SeqLM.from_pretrained(\'google-t5/t5-small\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'cb4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\', smooth_quant_alpha=-1))).save_pretrained(\'save_dir\')')">
 ✅
 </button>
 </td>

@@ -748,7 +748,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('optimum-cli export openvino -m openai/clip-vit-base-patch16 --quant-mode nf4_f8e4m3 --dataset conceptual_captions ./save_dir');
+navigator.clipboard.writeText('optimum-cli export openvino -m openai/clip-vit-base-patch16 --quant-mode cb4_f8e4m3 --dataset conceptual_captions ./save_dir');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -759,7 +759,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('OVModelForZeroShotImageClassification.from_pretrained(\'openai/clip-vit-base-patch16\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'nf4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'conceptual_captions\'))).save_pretrained(\'save_dir\')');
+navigator.clipboard.writeText('OVModelForZeroShotImageClassification.from_pretrained(\'openai/clip-vit-base-patch16\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'cb4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'conceptual_captions\'))).save_pretrained(\'save_dir\')');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -1022,8 +1022,8 @@ With this, encoder, decoder and decoder-with-past models of the Whisper pipeline
 Traditional optimization methods like post-training 8-bit quantization do not work well for Stable Diffusion (SD) models and can lead to poor generation results. On the other hand, weight compression does not improve performance significantly when applied to Stable Diffusion models, as the size of activations is comparable to weights.
 The U-Net component takes up most of the overall execution time of the pipeline. Thus, optimizing just this one component can bring substantial benefits in terms of inference speed while keeping acceptable accuracy without fine-tuning. Quantizing the rest of the diffusion pipeline does not significantly improve inference performance but could potentially lead to substantial accuracy degradation.
 Therefore, the proposal is to apply quantization in *hybrid mode* for the U-Net model and weight-only quantization for the rest of the pipeline components :
-* U-Net : quantization applied on both the weights and activations
-* The text encoder, VAE encoder / decoder : quantization applied on the weights
+* U-Net : quantization applied on both the weights and activations
+* The text encoder, VAE encoder / decoder : quantization applied on the weights

 The hybrid mode involves the quantization of weights in MatMul and Embedding layers, and activations of other layers, facilitating accuracy preservation post-optimization while reducing the model size.

@@ -1057,7 +1057,7 @@ When running this kind of optimization through Python API, `OVMixedQuantizationC
 model = OVModelForCausalLM.from_pretrained(
     'TinyLlama/TinyLlama-1.1B-Chat-v1.0',
     quantization_config=OVMixedQuantizationConfig(
-        weight_quantization_config=OVWeightQuantizationConfig(bits=4, dtype='nf4'),
+        weight_quantization_config=OVWeightQuantizationConfig(bits=4, dtype='cb4'),
         full_quantization_config=OVQuantizationConfig(dtype='f8e4m3', dataset='wikitext2')
     )
 )

@@ -1066,7 +1066,7 @@ model = OVModelForCausalLM.from_pretrained(
 To apply mixed quantization through CLI, the `--quant-mode` argument should be used. For example:

 ```bash
-optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --quant-mode nf4_f8e4m3 --dataset wikitext2 ./save_dir
+optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --quant-mode cb4_f8e4m3 --dataset wikitext2 ./save_dir
 ```

 Don't forget to provide a dataset since it is required for the calibration procedure during full quantization.
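Every substitution in this file follows one pattern: a mixed `--quant-mode` value is `<weight dtype>_<activation dtype>`, and this commit swaps the weight half from `nf4` to `cb4` while the `f8e4m3` activation half stays put. A minimal plain-Python sketch of that naming convention (no optimum-intel needed; the split rule is an inference from the mode names above, not an official API):

```python
def split_quant_mode(mode: str) -> tuple[str, str]:
    """Split a mixed quant-mode string into (weight dtype, activation dtype)."""
    weight_dtype, activation_dtype = mode.split("_", 1)
    return weight_dtype, activation_dtype

# The dtype pair mirrors the OVMixedQuantizationConfig(OVWeightQuantizationConfig(dtype=...),
# OVQuantizationConfig(dtype=...)) pairing shown in the Python snippets above.
print(split_quant_mode("nf4_f8e4m3"))  # ('nf4', 'f8e4m3'), the old mode
print(split_quant_mode("cb4_f8e4m3"))  # ('cb4', 'f8e4m3'), the replacement
```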

examples/neural_compressor/config/distillation.yml

Lines changed: 0 additions & 33 deletions
This file was deleted.

examples/neural_compressor/config/prune.yml

Lines changed: 0 additions & 31 deletions
This file was deleted.

examples/neural_compressor/config/prune_pattern_lock.yml

Lines changed: 0 additions & 28 deletions
This file was deleted.
