
Commit 6aec742

fix conflict
2 parents: bd1adbd + 81f28fd

89 files changed: +1399 -15115 lines


.github/workflows/test_inc.yml
Lines changed: 1 addition & 1 deletion

@@ -32,7 +32,7 @@ jobs:
       - name: Setup Python
         uses: actions/setup-python@v5
         with:
-          python-version: 3.9
+          python-version: "3.10"

       - name: Install dependencies
         run: |

.github/workflows/test_openvino_slow.yml
Lines changed: 1 addition & 1 deletion

@@ -58,7 +58,7 @@ jobs:

       - name: Install dependencies
         run: |
-          pip install --upgrade pip uv
+          python -m pip install --upgrade pip uv
           uv pip install .[openvino,tests,diffusers] transformers[testing]

       - if: ${{ matrix.transformers-version != 'latest' && matrix.transformers-version != 'main' }}

README.md
Lines changed: 1 addition & 1 deletion

@@ -141,7 +141,7 @@ For more details, please refer to the [documentation](https://intel.github.io/in

 ## Running the examples

-Check out the [`examples`](https://github.com/huggingface/optimum-intel/tree/main/examples) and [`notebooks`](https://github.com/huggingface/optimum-intel/tree/main/notebooks) directory to see how 🤗 Optimum Intel can be used to optimize models and accelerate inference.
+Check out the [`notebooks`](https://github.com/huggingface/optimum-intel/tree/main/notebooks) directory to see how 🤗 Optimum Intel can be used to optimize models and accelerate inference.

 Do not forget to install requirements for every example:

docs/source/openvino/export.mdx
Lines changed: 3 additions & 3 deletions

@@ -32,7 +32,7 @@ Check out the help for more options:
 ```text
 usage: optimum-cli export openvino [-h] -m MODEL [--task TASK] [--framework {pt}] [--trust-remote-code]
                                    [--weight-format {fp32,fp16,int8,int4,mxfp4,nf4,cb4}]
-                                   [--quant-mode {int8,f8e4m3,f8e5m2,nf4_f8e4m3,nf4_f8e5m2,cb4_f8e4m3,int4_f8e4m3,int4_f8e5m2}]
+                                   [--quant-mode {int8,f8e4m3,f8e5m2,cb4_f8e4m3,int4_f8e4m3,int4_f8e5m2}]
                                    [--library {transformers,diffusers,timm,sentence_transformers,open_clip}]
                                    [--cache_dir CACHE_DIR] [--pad-token-id PAD_TOKEN_ID] [--ratio RATIO] [--sym]
                                    [--group-size GROUP_SIZE] [--backup-precision {none,int8_sym,int8_asym}]

@@ -69,7 +69,7 @@ Optional arguments:
   --weight-format {fp32,fp16,int8,int4,mxfp4,nf4,cb4}
                         The weight format of the exported model. Option 'cb4' represents a codebook with 16
                         fixed fp8 values in E4M3 format.
-  --quant-mode {int8,f8e4m3,f8e5m2,nf4_f8e4m3,nf4_f8e5m2,cb4_f8e4m3,int4_f8e4m3,int4_f8e5m2}
+  --quant-mode {int8,f8e4m3,f8e5m2,cb4_f8e4m3,int4_f8e4m3,int4_f8e5m2}
                         Quantization precision mode. This is used for applying full model quantization including
                         activations.
   --library {transformers,diffusers,timm,sentence_transformers,open_clip}

@@ -283,5 +283,5 @@ Once the model is exported, you can now [load your OpenVINO model](inference) by

 ## Troubleshooting

-Some models do not work with the latest transformers release. You may see an error message with a maximum supported version. To export these models, install a transformers version that supports the model, for example `pip install transformers==4.53.3`.
+Some models do not work with the latest transformers release. You may see an error message with a maximum supported version. To export these models, install a transformers version that supports the model, for example `pip install transformers==4.53.3`.
 The supported transformers versions compatible with each optimum-intel release are listed on the [Github releases page](https://github.com/huggingface/optimum-intel/releases/).
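The net effect of the `--quant-mode` change above is the removal of the two nf4-based mixed modes. A quick sketch in plain Python (choice lists copied from the usage text above) makes the delta explicit:

```python
# --quant-mode choices before and after this commit, per the usage text above.
old_modes = {"int8", "f8e4m3", "f8e5m2", "nf4_f8e4m3", "nf4_f8e5m2",
             "cb4_f8e4m3", "int4_f8e4m3", "int4_f8e5m2"}
new_modes = {"int8", "f8e4m3", "f8e5m2", "cb4_f8e4m3",
             "int4_f8e4m3", "int4_f8e5m2"}

# The commit only removes modes; nothing new is added to --quant-mode.
removed = sorted(old_modes - new_modes)
print(removed)  # ['nf4_f8e4m3', 'nf4_f8e5m2']
```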

docs/source/openvino/models.mdx
Lines changed: 1 addition & 0 deletions

@@ -148,6 +148,7 @@ Here is the list of the supported architectures :
 - XLM
 - XLM-Roberta
 - XVERSE
+- Zamba2

 ## [Diffusers](https://huggingface.co/docs/diffusers/index)
 - Stable Diffusion

docs/source/openvino/optimization.mdx
Lines changed: 16 additions & 16 deletions

@@ -144,7 +144,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --quant-mode nf4_f8e4m3 --dataset wikitext2 ./save_dir');
+navigator.clipboard.writeText('optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --quant-mode cb4_f8e4m3 --dataset wikitext2 ./save_dir');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -155,7 +155,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('OVModelForCausalLM.from_pretrained(\'TinyLlama/TinyLlama-1.1B-Chat-v1.0\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'nf4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
+navigator.clipboard.writeText('OVModelForCausalLM.from_pretrained(\'TinyLlama/TinyLlama-1.1B-Chat-v1.0\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'cb4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -416,7 +416,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('optimum-cli export openvino -m microsoft/codebert-base --quant-mode nf4_f8e4m3 --dataset wikitext2 ./save_dir');
+navigator.clipboard.writeText('optimum-cli export openvino -m microsoft/codebert-base --quant-mode cb4_f8e4m3 --dataset wikitext2 ./save_dir');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -427,7 +427,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('OVModelForFeatureExtraction.from_pretrained(\'microsoft/codebert-base\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'nf4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
+navigator.clipboard.writeText('OVModelForFeatureExtraction.from_pretrained(\'microsoft/codebert-base\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'cb4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -509,7 +509,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('optimum-cli export openvino --library sentence_transformers -m sentence-transformers/all-mpnet-base-v2 --quant-mode nf4_f8e4m3 --dataset wikitext2 ./save_dir');
+navigator.clipboard.writeText('optimum-cli export openvino --library sentence_transformers -m sentence-transformers/all-mpnet-base-v2 --quant-mode cb4_f8e4m3 --dataset wikitext2 ./save_dir');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -520,7 +520,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('OVSentenceTransformer.from_pretrained(\'sentence-transformers/all-mpnet-base-v2\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'nf4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
+navigator.clipboard.writeText('OVSentenceTransformer.from_pretrained(\'sentence-transformers/all-mpnet-base-v2\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'cb4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -602,7 +602,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('optimum-cli export openvino -m FacebookAI/roberta-base --quant-mode nf4_f8e4m3 --dataset wikitext2 ./save_dir');
+navigator.clipboard.writeText('optimum-cli export openvino -m FacebookAI/roberta-base --quant-mode cb4_f8e4m3 --dataset wikitext2 ./save_dir');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -613,7 +613,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('OVModelForMaskedLM.from_pretrained(\'FacebookAI/roberta-base\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'nf4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
+navigator.clipboard.writeText('OVModelForMaskedLM.from_pretrained(\'FacebookAI/roberta-base\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'cb4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\'))).save_pretrained(\'save_dir\')');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -665,13 +665,13 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button
-onclick="navigator.clipboard.writeText('optimum-cli export openvino -m google-t5/t5-small --quant-mode nf4_f8e4m3 --dataset wikitext2 --smooth-quant-alpha -1 ./save_dir')">
+onclick="navigator.clipboard.writeText('optimum-cli export openvino -m google-t5/t5-small --quant-mode cb4_f8e4m3 --dataset wikitext2 --smooth-quant-alpha -1 ./save_dir')">
 ✅
 </button>
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button
-onclick="navigator.clipboard.writeText('OVModelForSeq2SeqLM.from_pretrained(\'google-t5/t5-small\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'nf4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\', smooth_quant_alpha=-1))).save_pretrained(\'save_dir\')')">
+onclick="navigator.clipboard.writeText('OVModelForSeq2SeqLM.from_pretrained(\'google-t5/t5-small\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'cb4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'wikitext2\', smooth_quant_alpha=-1))).save_pretrained(\'save_dir\')')">
 ✅
 </button>
 </td>

@@ -748,7 +748,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('optimum-cli export openvino -m openai/clip-vit-base-patch16 --quant-mode nf4_f8e4m3 --dataset conceptual_captions ./save_dir');
+navigator.clipboard.writeText('optimum-cli export openvino -m openai/clip-vit-base-patch16 --quant-mode cb4_f8e4m3 --dataset conceptual_captions ./save_dir');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -759,7 +759,7 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">
 <button onclick="
-navigator.clipboard.writeText('OVModelForZeroShotImageClassification.from_pretrained(\'openai/clip-vit-base-patch16\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'nf4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'conceptual_captions\'))).save_pretrained(\'save_dir\')');
+navigator.clipboard.writeText('OVModelForZeroShotImageClassification.from_pretrained(\'openai/clip-vit-base-patch16\', quantization_config=OVMixedQuantizationConfig(OVWeightQuantizationConfig(bits=4, dtype=\'cb4\'), OVQuantizationConfig(dtype=\'f8e4m3\', dataset=\'conceptual_captions\'))).save_pretrained(\'save_dir\')');
 let m=document.getElementById('copyMsg');
 m.style.display='block';
 clearTimeout(window._copyTimeout);

@@ -1022,8 +1022,8 @@ With this, encoder, decoder and decoder-with-past models of the Whisper pipeline
 Traditional optimization methods like post-training 8-bit quantization do not work well for Stable Diffusion (SD) models and can lead to poor generation results. On the other hand, weight compression does not improve performance significantly when applied to Stable Diffusion models, as the size of activations is comparable to weights.
 The U-Net component takes up most of the overall execution time of the pipeline. Thus, optimizing just this one component can bring substantial benefits in terms of inference speed while keeping acceptable accuracy without fine-tuning. Quantizing the rest of the diffusion pipeline does not significantly improve inference performance but could potentially lead to substantial accuracy degradation.
 Therefore, the proposal is to apply quantization in *hybrid mode* for the U-Net model and weight-only quantization for the rest of the pipeline components :
-* U-Net : quantization applied on both the weights and activations
-* The text encoder, VAE encoder / decoder : quantization applied on the weights
+* U-Net : quantization applied on both the weights and activations
+* The text encoder, VAE encoder / decoder : quantization applied on the weights

 The hybrid mode involves the quantization of weights in MatMul and Embedding layers, and activations of other layers, facilitating accuracy preservation post-optimization while reducing the model size.

@@ -1057,7 +1057,7 @@ When running this kind of optimization through Python API, `OVMixedQuantizationC
 model = OVModelForCausalLM.from_pretrained(
     'TinyLlama/TinyLlama-1.1B-Chat-v1.0',
     quantization_config=OVMixedQuantizationConfig(
-        weight_quantization_config=OVWeightQuantizationConfig(bits=4, dtype='nf4'),
+        weight_quantization_config=OVWeightQuantizationConfig(bits=4, dtype='cb4'),
         full_quantization_config=OVQuantizationConfig(dtype='f8e4m3', dataset='wikitext2')
     )
 )

@@ -1066,7 +1066,7 @@ model = OVModelForCausalLM.from_pretrained(
 To apply mixed quantization through CLI, the `--quant-mode` argument should be used. For example:

 ```bash
-optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --quant-mode nf4_f8e4m3 --dataset wikitext2 ./save_dir
+optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --quant-mode cb4_f8e4m3 --dataset wikitext2 ./save_dir
 ```

 Don't forget to provide a dataset since it is required for the calibration procedure during full quantization.
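Every substitution in this file follows one pattern: a mixed `--quant-mode` value is `<weight dtype>_<activation dtype>`, and this commit swaps the weight half from `nf4` to `cb4` while the `f8e4m3` activation half stays put. A minimal plain-Python sketch of that naming convention (no optimum-intel needed; the split rule is an inference from the mode names above, not an official API):

```python
def split_quant_mode(mode: str) -> tuple[str, str]:
    """Split a mixed quant-mode string into (weight dtype, activation dtype)."""
    weight_dtype, activation_dtype = mode.split("_", 1)
    return weight_dtype, activation_dtype

# The dtype pair mirrors the OVMixedQuantizationConfig(OVWeightQuantizationConfig(dtype=...),
# OVQuantizationConfig(dtype=...)) pairing shown in the Python snippets above.
print(split_quant_mode("nf4_f8e4m3"))  # ('nf4', 'f8e4m3'), the old mode
print(split_quant_mode("cb4_f8e4m3"))  # ('cb4', 'f8e4m3'), the replacement
```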

examples/neural_compressor/config/distillation.yml

Lines changed: 0 additions & 33 deletions
This file was deleted.

examples/neural_compressor/config/prune.yml

Lines changed: 0 additions & 31 deletions
This file was deleted.

examples/neural_compressor/config/prune_pattern_lock.yml

Lines changed: 0 additions & 28 deletions
This file was deleted.
