
Commit dbe641d

Prepare release v1.21 (#2371)
* Workaround timm dependencies for readme examples. The current Synapse torch version is 2.10.0a0, which is lower than the 2.10.0 minimum declared by the shipped torchvision version. This in turn causes a torch reinstallation when the pip dependency resolver is triggered while installing timm (which depends on torchvision), in order to satisfy torchvision's torch requirement.

* Remove deprecated text-generation README tests.

Signed-off-by: Artur Kloniecki <arturx.kloniecki@intel.com>
1 parent 3e3dea3 commit dbe641d
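The version comparison at the heart of this workaround can be sketched in Python with the `packaging` library, which implements the same PEP 440 ordering that pip's resolver uses (an illustration, not part of the commit):

```python
from packaging.version import Version

# Under PEP 440, a pre-release such as 2.10.0a0 sorts *below* the final
# release 2.10.0, so a "torch>=2.10.0" requirement from torchvision
# rejects the preinstalled Synapse torch build, and pip tries to
# reinstall torch when the resolver runs.
synapse_torch = Version("2.10.0a0")
required_minimum = Version("2.10.0")
print(synapse_torch < required_minimum)  # True: the resolver considers torch too old
```

Passing `--no-deps` skips dependency resolution entirely, which is why the requirements files in this commit enumerate transitive dependencies by hand.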

File tree

9 files changed

+49
-96
lines changed


examples/image-classification/README.md

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@ This directory contains a script that showcases how to fine-tune any model suppo
 
 First, you should install the requirements:
 ```bash
-pip install -r requirements.txt
+pip install -r requirements.txt --no-deps
 ```
 
 ## Single-HPU training

examples/image-classification/requirements.txt

Lines changed: 19 additions & 0 deletions

@@ -4,3 +4,22 @@ datasets>=4.0.0
 evaluate == 0.4.3
 scikit-learn == 1.5.2
 timm>=0.9.16
+
+# dependencies for evaluate and datasets
+dill<0.4.1,>=0.3.0
+multiprocess<0.70.19
+xxhash
+httpx
+pyarrow
+
+# dependencies for httpx
+anyio
+httpcore
+
+# dependencies for httpcore
+h11
+
+# dependencies for timm
+huggingface-hub
+pyyaml
+safetensors
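With `--no-deps`, pip no longer validates ranges like the `dill` pin above at install time. As an illustration (not part of the commit), the same specifier semantics can be checked with the `packaging` library:

```python
from packaging.specifiers import SpecifierSet

# The dill pin from the requirements file above.
dill_pin = SpecifierSet("<0.4.1,>=0.3.0")
print("0.3.8" in dill_pin)  # True: inside the allowed range
print("0.4.1" in dill_pin)  # False: excluded by the upper bound
```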

examples/pytorch-image-models/README.md

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@ This directory contains the scripts that showcase how to inference/fine-tune the
 First, you should install the requirements:
 
 ```bash
-pip install -r requirements.txt
+pip install -r requirements.txt --no-deps
 ```
 
 ## Training
Lines changed: 14 additions & 0 deletions

@@ -1,2 +1,16 @@
 timm
 datasets>=4.0.0
+
+# dependencies for datasets
+dill<0.4.1,>=0.3.0
+multiprocess<0.70.19
+xxhash
+httpx
+pyarrow
+
+# dependencies for httpx
+anyio
+httpcore
+
+# dependencies for httpcore
+h11

examples/table-detection/README.md

Lines changed: 1 addition & 1 deletion

@@ -22,7 +22,7 @@ This folder contains an example for using the [Table Transformer](https://huggin
 
 First, you should install the requirements:
 ```bash
-pip install -r requirements.txt
+pip install -r requirements.txt --no-deps
 ```
 
 ## Single HPU Inference
Lines changed: 3 additions & 0 deletions

@@ -1 +1,4 @@
 timm
+huggingface-hub
+pyyaml
+safetensors

examples/text-generation/README.md

Lines changed: 0 additions & 92 deletions

@@ -365,47 +365,6 @@ PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant.json python .
 --flash_attention_causal_mask
 ```
 
-Here is an example to measure the tensor quantization statistics on Llama3-405B with 8 cards:
-> Please note that Llama3-405B requires minimum 16 cards Gaudi2 and 8 cards Gaudi3.
-```bash
-PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_measure_include_outputs.json python ../gaudi_spawn.py \
---use_deepspeed --world_size 8 run_lm_eval.py \
--o acc_llama3_405b_bs1_quant.txt \
---model_name_or_path meta-llama/Llama-3.1-405B-Instruct \
---use_hpu_graphs \
---use_kv_cache \
---trim_logits \
---batch_size 1 \
---bf16 \
---reuse_cache \
---use_flash_attention \
---flash_attention_recompute \
---flash_attention_causal_mask \
---trust_remote_code
-
-python quantization_tools/postprocess_measurements.py -m hqt_output
-```
-
-Here is an example to quantize the model based on previous measurements for Llama3-405B with 8 cards:
-> Please note that Llama3-405B requires minimum 16 cards Gaudi2 and 8 cards Gaudi3.
-```bash
-PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant.json python ../gaudi_spawn.py \
---use_deepspeed --world_size 8 run_generation.py \
---model_name_or_path meta-llama/Llama-3.1-405B-Instruct \
---use_hpu_graphs \
---use_kv_cache \
---limit_hpu_graphs \
---max_input_tokens 2048 \
---max_new_tokens 2048 \
---batch_size 2 \
---bf16 \
---reuse_cache \
---trim_logits \
---use_flash_attention \
---flash_attention_recompute \
---flash_attention_causal_mask
-```
-
 Here is an example to measure the tensor quantization statistics on Llama3-8b with 1 card:
 
 ```bash
@@ -437,35 +396,6 @@ PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant.json python r
 --reuse_cache
 ```
 
-Here is an example to measure the tensor quantization statistics on gemma with 1 card:
-
-```bash
-PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_generation.py \
---model_name_or_path google/gemma-7b \
---use_hpu_graphs \
---use_kv_cache \
---max_new_tokens 100 \
---batch_size 1 \
---reuse_cache \
---bf16 \
---sdp_on_bf16
-
-python quantization_tools/postprocess_measurements.py -m hqt_output
-```
-
-Here is an example to quantize the model based on previous measurements for gemma with 1 card:
-```bash
-PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant_gemma.json python run_generation.py \
---model_name_or_path google/gemma-7b \
---use_hpu_graphs \
---use_kv_cache \
---max_new_tokens 100 \
---batch_size 1 \
---reuse_cache \
---bf16 \
---sdp_on_bf16
-```
-
 Here is an example for running DeepSeek-R1 FP8 dynamic quantization without INC on 8-cards.
 ```bash
 PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py --world_size 8 \
@@ -574,28 +504,6 @@ PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant.json python .
 > [!NOTE]
 > For multi-card usage, the number of cards loaded and used needs to be kept consistent with that when saving.
 
-### Loading FP8 Checkpoints from Hugging Face
-You can load pre-quantized FP8 models using the `--load_quantized_model_with_inc` argument. The `model_name_or_path` should be a model name from [Neural Magic](https://huggingface.co/collections/neuralmagic/fp8-llms-for-vllm-666742ed2b78b7ac8df13127) or a path to FP8 Checkpoints saved in Hugging Face format.
-
-Below is an example of how to load `neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8` on two cards.
-```bash
-PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
---use_deepspeed --world_size 2 run_lm_eval.py \
--o acc_load_fp8_model.txt \
---model_name_or_path neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8 \
---use_hpu_graphs \
---use_kv_cache \
---trim_logits \
---batch_size 1 \
---bf16 \
---use_flash_attention \
---flash_attention_recompute \
---attn_softmax_bf16 \
---bucket_size=128 \
---bucket_internal \
---load_quantized_model_with_inc
-```
-
 ### Loading 4 Bit Checkpoints from Hugging Face
 
 You can load pre-quantized 4bit models with the argument `--load_quantized_model_with_inc`.

examples/visual-question-answering/README.md

Lines changed: 1 addition & 1 deletion

@@ -34,7 +34,7 @@ The `run_openclip_vqa.py` can be used to run zero shot image classification with
 The requirements for `run_openclip_vqa.py` can be installed with `openclip_requirements.txt` as follows:
 
 ```bash
-pip install -r openclip_requirements.txt
+pip install -r openclip_requirements.txt --no-deps
 ```
 
 By default, the script runs the sample outlined in [BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 notebook](https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224/blob/main/biomed_clip_example.ipynb). One can also run other OpenCLIP models by specifying model, classifier labels and image URL(s) like so:
Lines changed: 9 additions & 0 deletions

@@ -1,3 +1,12 @@
 open_clip_torch==2.23.0
 matplotlib
 
+# dependencies for open_clip_torch
+ftfy
+timm
+
+# dependencies for matplotlib
+contourpy
+cycler
+fonttools
+kiwisolver
