
Commit dbe641d

Prepare release v1.21 (#2371)
* Workaround timm dependencies for readme examples. The current Synapse torch version is 2.10.0a0, which is lower than the 2.10.0 minimum declared by the shipped torchvision version. This in turn causes a torch reinstallation when the pip dependency resolver is triggered while installing timm (which depends on torchvision), in order to satisfy torchvision's torch requirement.

* Remove deprecated text-generation README tests.

Signed-off-by: Artur Kloniecki <arturx.kloniecki@intel.com>
1 parent 3e3dea3 commit dbe641d
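The version comparison at the heart of this workaround can be sketched in Python with the `packaging` library, which implements the same PEP 440 ordering that pip's resolver uses (an illustration, not part of the commit):

```python
from packaging.version import Version

# Under PEP 440, a pre-release such as 2.10.0a0 sorts *below* the final
# release 2.10.0, so a "torch>=2.10.0" requirement from torchvision
# rejects the preinstalled Synapse torch build, and pip tries to
# reinstall torch when the resolver runs.
synapse_torch = Version("2.10.0a0")
required_minimum = Version("2.10.0")
print(synapse_torch < required_minimum)  # True: the resolver considers torch too old
```

Passing `--no-deps` skips dependency resolution entirely, which is why the requirements files in this commit enumerate transitive dependencies by hand.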

File tree

9 files changed

+49
-96
lines changed


examples/image-classification/README.md

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@ This directory contains a script that showcases how to fine-tune any model suppo
 
 First, you should install the requirements:
 ```bash
-pip install -r requirements.txt
+pip install -r requirements.txt --no-deps
 ```
 
 ## Single-HPU training

examples/image-classification/requirements.txt

Lines changed: 19 additions & 0 deletions

@@ -4,3 +4,22 @@ datasets>=4.0.0
 evaluate == 0.4.3
 scikit-learn == 1.5.2
 timm>=0.9.16
+
+# dependencies for evaluate and datasets
+dill<0.4.1,>=0.3.0
+multiprocess<0.70.19
+xxhash
+httpx
+pyarrow
+
+# dependencies for httpx
+anyio
+httpcore
+
+# dependencies for httpcore
+h11
+
+# dependencies for timm
+huggingface-hub
+pyyaml
+safetensors
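With `--no-deps`, pip no longer validates ranges like the `dill` pin above at install time. As an illustration (not part of the commit), the same specifier semantics can be checked with the `packaging` library:

```python
from packaging.specifiers import SpecifierSet

# The dill pin from the requirements file above.
dill_pin = SpecifierSet("<0.4.1,>=0.3.0")
print("0.3.8" in dill_pin)  # True: inside the allowed range
print("0.4.1" in dill_pin)  # False: excluded by the upper bound
```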

examples/pytorch-image-models/README.md

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@ This directory contains the scripts that showcase how to inference/fine-tune the
 First, you should install the requirements:
 
 ```bash
-pip install -r requirements.txt
+pip install -r requirements.txt --no-deps
 ```
 
 ## Training
Lines changed: 14 additions & 0 deletions

@@ -1,2 +1,16 @@
 timm
 datasets>=4.0.0
+
+# dependencies for datasets
+dill<0.4.1,>=0.3.0
+multiprocess<0.70.19
+xxhash
+httpx
+pyarrow
+
+# dependencies for httpx
+anyio
+httpcore
+
+# dependencies for httpcore
+h11

examples/table-detection/README.md

Lines changed: 1 addition & 1 deletion

@@ -22,7 +22,7 @@ This folder contains an example for using the [Table Transformer](https://huggin
 
 First, you should install the requirements:
 ```bash
-pip install -r requirements.txt
+pip install -r requirements.txt --no-deps
 ```
 
 ## Single HPU Inference
Lines changed: 3 additions & 0 deletions

@@ -1 +1,4 @@
 timm
+huggingface-hub
+pyyaml
+safetensors

examples/text-generation/README.md

Lines changed: 0 additions & 92 deletions

@@ -365,47 +365,6 @@ PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant.json python .
 --flash_attention_causal_mask
 ```
 
-Here is an example to measure the tensor quantization statistics on Llama3-405B with 8 cards:
-> Please note that Llama3-405B requires minimum 16 cards Gaudi2 and 8 cards Gaudi3.
-```bash
-PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_measure_include_outputs.json python ../gaudi_spawn.py \
---use_deepspeed --world_size 8 run_lm_eval.py \
--o acc_llama3_405b_bs1_quant.txt \
---model_name_or_path meta-llama/Llama-3.1-405B-Instruct \
---use_hpu_graphs \
---use_kv_cache \
---trim_logits \
---batch_size 1 \
---bf16 \
---reuse_cache \
---use_flash_attention \
---flash_attention_recompute \
---flash_attention_causal_mask \
---trust_remote_code
-
-python quantization_tools/postprocess_measurements.py -m hqt_output
-```
-
-Here is an example to quantize the model based on previous measurements for Llama3-405B with 8 cards:
-> Please note that Llama3-405B requires minimum 16 cards Gaudi2 and 8 cards Gaudi3.
-```bash
-PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py \
---use_deepspeed --world_size 8 run_generation.py \
---model_name_or_path meta-llama/Llama-3.1-405B-Instruct \
---use_hpu_graphs \
---use_kv_cache \
---limit_hpu_graphs \
---max_input_tokens 2048 \
---max_new_tokens 2048 \
---batch_size 2 \
---bf16 \
---reuse_cache \
---trim_logits \
---use_flash_attention \
---flash_attention_recompute \
---flash_attention_causal_mask
-```
-
 Here is an example to measure the tensor quantization statistics on Llama3-8b with 1 card:
 
 ```bash
@@ -437,35 +396,6 @@ PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant.json python r
 --reuse_cache
 ```
 
-Here is an example to measure the tensor quantization statistics on gemma with 1 card:
-
-```bash
-PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_generation.py \
---model_name_or_path google/gemma-7b \
---use_hpu_graphs \
---use_kv_cache \
---max_new_tokens 100 \
---batch_size 1 \
---reuse_cache \
---bf16 \
---sdp_on_bf16
-
-python quantization_tools/postprocess_measurements.py -m hqt_output
-```
-
-Here is an example to quantize the model based on previous measurements for gemma with 1 card:
-```bash
-PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant_gemma.json python run_generation.py \
---model_name_or_path google/gemma-7b \
---use_hpu_graphs \
---use_kv_cache \
---max_new_tokens 100 \
---batch_size 1 \
---reuse_cache \
---bf16 \
---sdp_on_bf16
-```
-
 Here is an example for running DeepSeek-R1 FP8 dynamic quantization without INC on 8-cards.
 ```bash
 PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py --world_size 8 \
@@ -574,28 +504,6 @@ PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant.json python .
 > [!NOTE]
 > For multi-card usage, the number of cards loaded and used needs to be kept consistent with that when saving.
 
-### Loading FP8 Checkpoints from Hugging Face
-You can load pre-quantized FP8 models using the `--load_quantized_model_with_inc` argument. The `model_name_or_path` should be a model name from [Neural Magic](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127) or a path to FP8 Checkpoints saved in Hugging Face format.
-
-Below is an example of how to load `neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8` on two cards.
-```bash
-PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
---use_deepspeed --world_size 2 run_lm_eval.py \
--o acc_load_fp8_model.txt \
---model_name_or_path neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8 \
---use_hpu_graphs \
---use_kv_cache \
---trim_logits \
---batch_size 1 \
---bf16 \
---use_flash_attention \
---flash_attention_recompute \
---attn_softmax_bf16 \
---bucket_size=128 \
---bucket_internal \
---load_quantized_model_with_inc
-```
-
 ### Loading 4 Bit Checkpoints from Hugging Face
 
 You can load pre-quantized 4bit models with the argument `--load_quantized_model_with_inc`.

examples/visual-question-answering/README.md

Lines changed: 1 addition & 1 deletion

@@ -34,7 +34,7 @@ The `run_openclip_vqa.py` can be used to run zero shot image classification with
 The requirements for `run_openclip_vqa.py` can be installed with `openclip_requirements.txt` as follows:
 
 ```bash
-pip install -r openclip_requirements.txt
+pip install -r openclip_requirements.txt --no-deps
 ```
 
 By default, the script runs the sample outlined in [BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 notebook](https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/blob/main/biomed_clip_example.ipynb). One can also run other OpenCLIP models by specifying model, classifier labels and image URL(s) like so:
Lines changed: 9 additions & 0 deletions

@@ -1,3 +1,12 @@
 open_clip_torch==2.23.0
 matplotlib
 
+# dependencies for open_clip_torch
+ftfy
+timm
+
+# dependencies for matplotlib
+contourpy
+cycler
+fonttools
+kiwisolver
