README.md: 1 addition & 1 deletion
@@ -141,7 +141,7 @@ For more details, please refer to the [documentation](https://intel.github.io/in

## Running the examples

- Check out the [`examples`](https://github.com/huggingface/optimum-intel/tree/main/examples) and [`notebooks`](https://github.com/huggingface/optimum-intel/tree/main/notebooks) directory to see how 🤗 Optimum Intel can be used to optimize models and accelerate inference.
+ Check out the [`notebooks`](https://github.com/huggingface/optimum-intel/tree/main/notebooks) directory to see how 🤗 Optimum Intel can be used to optimize models and accelerate inference.

Do not forget to install requirements for every example:
@@ -283,5 +283,5 @@ Once the model is exported, you can now [load your OpenVINO model](inference) by

## Troubleshooting

- Some models do not work with the latest transformers release. You may see an error message with a maximum supported version. To export these models, install a transformers version that supports the model, for example `pip install transformers==4.53.3`.
+ Some models do not work with the latest transformers release. You may see an error message with a maximum supported version. To export these models, install a transformers version that supports the model, for example `pip install transformers==4.53.3`.

The supported transformers versions compatible with each optimum-intel release are listed on the [Github releases page](https://github.com/huggingface/optimum-intel/releases/).
@@ -1022,8 +1022,8 @@ With this, encoder, decoder and decoder-with-past models of the Whisper pipeline
Traditional optimization methods like post-training 8-bit quantization do not work well for Stable Diffusion (SD) models and can lead to poor generation results. On the other hand, weight compression does not improve performance significantly when applied to Stable Diffusion models, as the size of activations is comparable to weights.

The U-Net component takes up most of the overall execution time of the pipeline. Thus, optimizing just this one component can bring substantial benefits in terms of inference speed while keeping acceptable accuracy without fine-tuning. Quantizing the rest of the diffusion pipeline does not significantly improve inference performance but could potentially lead to substantial accuracy degradation.

Therefore, the proposal is to apply quantization in *hybrid mode* for the U-Net model and weight-only quantization for the rest of the pipeline components:

- * U-Net: quantization applied on both the weights and activations
- * The text encoder, VAE encoder / decoder: quantization applied on the weights
+ * U-Net: quantization applied on both the weights and activations
+ * The text encoder, VAE encoder / decoder: quantization applied on the weights

The hybrid mode involves the quantization of weights in MatMul and Embedding layers, and activations of other layers, facilitating accuracy preservation post-optimization while reducing the model size.
@@ -1057,7 +1057,7 @@ When running this kind of optimization through Python API, `OVMixedQuantizationC
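As a rough illustration of the hybrid mode described above, the sketch below shows how a Stable Diffusion pipeline can be quantized with 🤗 Optimum Intel's `OVStableDiffusionPipeline` and `OVWeightQuantizationConfig`. The model id, calibration dataset, and sample count are illustrative assumptions, not taken from this diff; providing a calibration `dataset` in the weight quantization config is what enables hybrid quantization for diffusion pipelines, though exact parameter names and defaults may vary between optimum-intel versions.

```python
from optimum.intel import OVStableDiffusionPipeline, OVWeightQuantizationConfig

# Illustrative model id; any diffusers-compatible Stable Diffusion checkpoint should work.
model_id = "runwayml/stable-diffusion-v1-5"

# Supplying a calibration dataset switches the diffusion pipeline to hybrid quantization:
# the U-Net is quantized on both weights and activations, while the text encoder and
# VAE encoder / decoder receive weight-only quantization.
quantization_config = OVWeightQuantizationConfig(
    bits=8,
    dataset="conceptual_captions",  # calibration data used to quantize U-Net activations
    num_samples=224,                # number of calibration samples (illustrative value)
)

pipeline = OVStableDiffusionPipeline.from_pretrained(
    model_id,
    export=True,                    # export the PyTorch checkpoint to OpenVINO on the fly
    quantization_config=quantization_config,
)

# Save the optimized pipeline and run a quick sanity check.
pipeline.save_pretrained("stable-diffusion-v1-5-hybrid-int8")
image = pipeline("sailing ship in storm by Rembrandt", num_inference_steps=25).images[0]
```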