Commit 4c7f5b2

Author: Grzegorz Pluto-Prondzinski

Temporarily remove broken examples for OH 1.21.0 release (#2362)

* remove command
* remove GRPO Training

1 parent 812fe66 · commit 4c7f5b2

File tree

4 files changed: +0 additions, −170 deletions

examples/image-to-text/README.md

Lines changed: 0 additions & 44 deletions

@@ -63,50 +63,6 @@ Inference with FP8 precision is enabled using [Intel Neural Compressor (INC)](ht

Unchanged context:

More information on enabling FP8 in SynapseAI is available here:

[Run Inference Using FP8](https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Quantization/Inference_Using_FP8.html?highlight=fp8)

Removed:

### Single card inference with FP8

Here is an example to measure the tensor quantization statistics on Llava-v1.6-vicuna-13b with SDPA:

```bash
PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-vicuna-13b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
--bf16 \
--sdp_on_bf16
```

Here is an example to quantize the model based on previous measurements for Llava-v1.6-vicuna-13b with SDPA:

```bash
PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant_scale_format_const.json python run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-vicuna-13b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
--bf16 \
--sdp_on_bf16
```

### Multi-card inference with FP8

Here is an example of measuring the tensor quantization statistics on Llava-v1.6-mistral-7b with FusedSDPA on 8 HPUs:

```bash
PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_measure.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
--bf16 \
--use_flash_attention \
--flash_attention_recompute
```

Here is an example of quantizing the model based on previous measurements for Llava-v1.6-mistral-7b with FusedSDPA on 8 HPUs:

```bash
PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant_scale_format_const.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
--bf16 \
--use_flash_attention \
--flash_attention_recompute
```

Unchanged context:

## LORA Finetune

Here are single-/multi-device command examples for meta-llama/Llama-3.2-11B-Vision-Instruct.
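The removed commands above follow a two-step maxabs flow: a measurement run (`maxabs_measure.json`) records per-tensor maximum absolute values, and a second run (`maxabs_quant_scale_format_const.json`) derives FP8 scales from those statistics. A minimal, self-contained sketch of the idea only — this is not Intel Neural Compressor's implementation, the function names are hypothetical, and the FP8-E4M3 maximum of 240 is an assumption about Gaudi's variant:

```python
# Illustrative maxabs-style quantization sketch. Hypothetical helper names,
# not Intel Neural Compressor's actual API.

def measure_maxabs(tensors):
    """Measurement pass: record the maximum absolute value per tensor name."""
    return {name: max(abs(v) for v in vals) for name, vals in tensors.items()}

def quantize(tensors, stats, qmax=240.0):
    """Quantization pass: scale each tensor into the FP8 range using the
    recorded statistics. qmax=240.0 assumes Gaudi's FP8-E4M3 maximum."""
    out = {}
    for name, vals in tensors.items():
        scale = stats[name] / qmax if stats[name] else 1.0
        out[name] = [v / scale for v in vals]
    return out

acts = {"proj": [-3.0, 1.5, 2.0]}
stats = measure_maxabs(acts)   # {"proj": 3.0}
q = quantize(acts, stats)      # values scaled into [-240, 240]
```

The two-pass design mirrors the two commands: measurement must see representative inputs before constant scales can be fixed for the quantized run.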

examples/speech-recognition/README.md

Lines changed: 0 additions & 37 deletions

@@ -274,43 +274,6 @@ PT_HPU_LAZY_MODE=1 python run_speech_recognition_seq2seq.py \

Unchanged context:

If training on a different language, you should be sure to change the `language` argument. The `language` and `task` arguments should be omitted for English speech recognition.

Removed:

### Multi HPU Whisper Training with Seq2Seq

The following example shows how to fine-tune the [Whisper large](https://huggingface.co/openai/whisper-large) checkpoint on the Hindi subset of [Common Voice 11](https://huggingface.co/datasets/regisss/common_voice_11_0_hi) using 8 HPU devices in half-precision:

```bash
PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_speech_recognition_seq2seq.py \
--model_name_or_path="openai/whisper-large" \
--dataset_name="regisss/common_voice_11_0_hi" \
--language="hindi" \
--task="transcribe" \
--train_split_name="train+validation" \
--eval_split_name="test" \
--gaudi_config_name="Habana/whisper" \
--max_steps="625" \
--output_dir="/tmp/whisper-large-hi" \
--per_device_train_batch_size="16" \
--per_device_eval_batch_size="2" \
--logging_steps="25" \
--learning_rate="1e-5" \
--generation_max_length="225" \
--preprocessing_num_workers="1" \
--max_duration_in_seconds="30" \
--text_column_name="sentence" \
--freeze_feature_encoder="False" \
--sdp_on_bf16 \
--bf16 \
--overwrite_output_dir \
--do_train \
--do_eval \
--predict_with_generate \
--use_habana \
--use_hpu_graphs_for_inference \
--label_features_max_length 128 \
--dataloader_num_workers 8 \
--gradient_checkpointing \
--throughput_warmup_steps 3
```

Unchanged context:

#### Single HPU Seq2Seq Inference

The following example shows how to do inference with the [Whisper small](https://huggingface.co/openai/whisper-small) checkpoint on the Hindi subset of [Common Voice 11](https://huggingface.co/datasets/regisss/common_voice_11_0_hi) using 1 HPU device in half-precision:
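The context note above says `--language` and `--task` should be passed for non-English training and omitted entirely for English. That convention can be captured in a small argument-building helper — an illustrative sketch only; the helper name is made up and is not part of the example scripts:

```python
def whisper_seq2seq_args(language=None, task=None):
    """Build the language-related CLI flags for a Whisper seq2seq run.
    For English speech recognition, both flags are omitted (per the README)."""
    args = []
    if language is not None:
        args.append(f'--language="{language}"')
    if task is not None:
        args.append(f'--task="{task}"')
    return args

whisper_seq2seq_args("hindi", "transcribe")  # ['--language="hindi"', '--task="transcribe"']
whisper_seq2seq_args()                       # [] -- English: flags omitted
```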

examples/text-generation/README.md

Lines changed: 0 additions & 32 deletions

@@ -98,23 +98,6 @@ PT_HPU_LAZY_MODE=1 python run_generation.py \

Unchanged context:

> The batch size should be larger than or equal to the number of prompts. Otherwise, only the first N prompts are kept with N being equal to the batch size.

Removed:

### Run Speculative Sampling on Gaudi

If you want to generate a sequence of text from a prompt of your choice using assisted decoding, you can use the following command as an example:

```bash
PT_HPU_LAZY_MODE=1 python run_generation.py \
--model_name_or_path gpt2 \
--assistant_model distilgpt2 \
--batch_size 1 \
--max_new_tokens 100 \
--use_hpu_graphs \
--use_kv_cache \
--num_return_sequences 1 \
--temperature 0 \
--prompt "Alice and Bob" \
--sdp_on_bf16
```

Unchanged context:

### Benchmark

@@ -147,21 +130,6 @@ PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_g

Unchanged context (tail of the preceding command):

```bash
--sdp_on_bf16
```

Removed:

To run Llama3-405B inference on 8 Gaudi3 cards, use the following command:

```bash
PT_HPU_LAZY_MODE=1 ENABLE_LB_BUNDLE_ALL_COMPUTE_MME=0 ENABLE_EXPERIMENTAL_FLAGS=1 \
python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py \
--model_name_or_path meta-llama/Llama-3.1-405B-Instruct \
--max_new_tokens 2048 \
--bf16 \
--use_hpu_graphs \
--use_kv_cache \
--batch_size 1 \
--do_sample \
--use_flash_attention \
--flash_attention_causal_mask
```

Unchanged context:

To run Deepseek-R1-BF16 inference on 16 Gaudi3 cards (2 nodes), use the following command. Ensure you replace the hostfile parameter with the appropriate file. Sample hostfile reference [here](/examples/multi-node-training/hostfile)

> NOTE: This support is currently experimental. Due to memory constraints, only BS=1 is supported for now.
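The batch-size caveat quoted earlier in this file (only the first N prompts are kept when more prompts than the batch size are supplied) amounts to a simple truncation. A minimal sketch of that documented behavior — the function name is illustrative, not the script's actual code:

```python
def select_prompts(prompts, batch_size):
    """Sketch of the documented behavior: when more prompts than batch_size
    are supplied, only the first batch_size prompts are kept."""
    return prompts[:batch_size]

select_prompts(["a", "b", "c"], 2)  # ['a', 'b'] -- prompt "c" is silently dropped
```

This is why the README recommends a batch size at least as large as the number of prompts: anything beyond index N-1 never reaches generation.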

examples/trl/README.md

Lines changed: 0 additions & 57 deletions

@@ -15,63 +15,6 @@ $ pip install -U -r requirements_grpo.txt

Unchanged context (tail of the install block):

```sh
$ pip install -U -r requirements.txt
```

Removed:

## GRPO Training

Installing DeepSpeed:

```sh
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.22.0
```

Running single-card training:

```sh
PT_HPU_MAX_COMPOUND_OP_SIZE=10 PT_HPU_LAZY_MODE=1 python3 grpo.py \
--model_name_or_path Qwen/Qwen2-0.5B-Instruct \
--dataset_name AI-MO/NuminaMath-TIR \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--do_train \
--do_eval \
--use_habana \
--use_lazy_mode \
--bf16 True \
--gradient_accumulation_steps=16 \
--max_prompt_length 512 \
--num_generations 4 \
--max_completion_length 64 \
--use_peft True \
--lora_target_modules q_proj k_proj \
--num_train_epochs 1 \
--save_strategy="epoch"
```

Running multi-card training:

```sh
PT_HPU_MAX_COMPOUND_OP_SIZE=10 PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py --world_size 8 --use_deepspeed grpo.py \
--model_name_or_path Qwen/Qwen2-0.5B-Instruct \
--dataset_name AI-MO/NuminaMath-TIR \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--do_train \
--do_eval \
--use_habana \
--use_lazy_mode \
--bf16 True \
--gradient_accumulation_steps=16 \
--gradient_checkpointing \
--max_prompt_length 512 \
--num_generations 4 \
--max_completion_length 64 \
--use_peft True \
--lora_target_modules q_proj k_proj \
--max_steps=500 \
--logging_steps=10 \
--save_steps=100
```

Unchanged context:

## Supervised Finetuning

1. The following example is for a supervised LoRA finetune of a Qwen2 model on a conversational-format dataset.
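The removed GRPO commands above pass `--num_generations 4`: the trainer samples a group of completions per prompt and scores each against its group, rather than against a learned value function. The core normalization can be sketched as follows — an illustrative sketch of group-relative advantages, not TRL's actual implementation, and the epsilon term is an assumption for numerical safety:

```python
import statistics

def grpo_advantages(rewards, eps=1e-4):
    """Group-relative advantage: normalize each completion's reward against
    the mean and standard deviation of its own group (one group = the
    completions sampled for a single prompt)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

grpo_advantages([1.0, 0.0, 1.0, 0.0])  # roughly [+1, -1, +1, -1]
```

Because advantages are centered within each group, they always sum to zero per prompt: above-average completions are reinforced and below-average ones are penalized, with no separate critic model needed.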

0 commit comments
