Commit 4c7f5b2

Author: Grzegorz Pluto-Prondzinski

Temporarily remove broken examples for OH 1.21.0 release (#2362)

* remove command
* remove GRPO Training

1 parent 812fe66 · commit 4c7f5b2

File tree

4 files changed: +0 additions, −170 deletions

examples/image-to-text/README.md

Lines changed: 0 additions & 44 deletions

@@ -63,50 +63,6 @@ Inference with FP8 precision is enabled using [Intel Neural Compressor (INC)](ht

Unchanged context:

More information on enabling FP8 in SynapseAI is available here:

[Run Inference Using FP8](https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Quantization/Inference_Using_FP8.html?highlight=fp8)

Removed:

### Single card inference with FP8

Here is an example to measure the tensor quantization statistics on Llava-v1.6-vicuna-13b with SDPA:

```bash
PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-vicuna-13b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
--bf16 \
--sdp_on_bf16
```

Here is an example to quantize the model based on previous measurements for Llava-v1.6-vicuna-13b with SDPA:

```bash
PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant_scale_format_const.json python run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-vicuna-13b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
--bf16 \
--sdp_on_bf16
```

### Multi-card inference with FP8

Here is an example of measuring the tensor quantization statistics on Llava-v1.6-mistral-7b with FusedSDPA on 8 HPUs:

```bash
PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_measure.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
--bf16 \
--use_flash_attention \
--flash_attention_recompute
```

Here is an example of quantizing the model based on previous measurements for Llava-v1.6-mistral-7b with FusedSDPA on 8 HPUs:

```bash
PT_HPU_LAZY_MODE=1 QUANT_CONFIG=./quantization_config/maxabs_quant_scale_format_const.json python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py \
--model_name_or_path llava-hf/llava-v1.6-mistral-7b-hf \
--image_path "https://llava-vl.github.io/static/images/view.jpg" \
--use_hpu_graphs \
--bf16 \
--use_flash_attention \
--flash_attention_recompute
```

Unchanged context:

## LORA Finetune

Here are single-/multi-device command examples for meta-llama/Llama-3.2-11B-Vision-Instruct.
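The removed commands above follow a two-step maxabs flow: a measurement run (`maxabs_measure.json`) records per-tensor maximum absolute values, and a second run (`maxabs_quant_scale_format_const.json`) derives FP8 scales from those statistics. A minimal, self-contained sketch of the idea only — this is not Intel Neural Compressor's implementation, the function names are hypothetical, and the FP8-E4M3 maximum of 240 is an assumption about Gaudi's variant:

```python
# Illustrative maxabs-style quantization sketch. Hypothetical helper names,
# not Intel Neural Compressor's actual API.

def measure_maxabs(tensors):
    """Measurement pass: record the maximum absolute value per tensor name."""
    return {name: max(abs(v) for v in vals) for name, vals in tensors.items()}

def quantize(tensors, stats, qmax=240.0):
    """Quantization pass: scale each tensor into the FP8 range using the
    recorded statistics. qmax=240.0 assumes Gaudi's FP8-E4M3 maximum."""
    out = {}
    for name, vals in tensors.items():
        scale = stats[name] / qmax if stats[name] else 1.0
        out[name] = [v / scale for v in vals]
    return out

acts = {"proj": [-3.0, 1.5, 2.0]}
stats = measure_maxabs(acts)   # {"proj": 3.0}
q = quantize(acts, stats)      # values scaled into [-240, 240]
```

The two-pass design mirrors the two commands: measurement must see representative inputs before constant scales can be fixed for the quantized run.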

examples/speech-recognition/README.md

Lines changed: 0 additions & 37 deletions

@@ -274,43 +274,6 @@ PT_HPU_LAZY_MODE=1 python run_speech_recognition_seq2seq.py \

Unchanged context:

If training on a different language, you should be sure to change the `language` argument. The `language` and `task` arguments should be omitted for English speech recognition.

Removed:

### Multi HPU Whisper Training with Seq2Seq

The following example shows how to fine-tune the [Whisper large](https://huggingface.co/openai/whisper-large) checkpoint on the Hindi subset of [Common Voice 11](https://huggingface.co/datasets/regisss/common_voice_11_0_hi) using 8 HPU devices in half-precision:

```bash
PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py \
--world_size 8 --use_mpi run_speech_recognition_seq2seq.py \
--model_name_or_path="openai/whisper-large" \
--dataset_name="regisss/common_voice_11_0_hi" \
--language="hindi" \
--task="transcribe" \
--train_split_name="train+validation" \
--eval_split_name="test" \
--gaudi_config_name="Habana/whisper" \
--max_steps="625" \
--output_dir="/tmp/whisper-large-hi" \
--per_device_train_batch_size="16" \
--per_device_eval_batch_size="2" \
--logging_steps="25" \
--learning_rate="1e-5" \
--generation_max_length="225" \
--preprocessing_num_workers="1" \
--max_duration_in_seconds="30" \
--text_column_name="sentence" \
--freeze_feature_encoder="False" \
--sdp_on_bf16 \
--bf16 \
--overwrite_output_dir \
--do_train \
--do_eval \
--predict_with_generate \
--use_habana \
--use_hpu_graphs_for_inference \
--label_features_max_length 128 \
--dataloader_num_workers 8 \
--gradient_checkpointing \
--throughput_warmup_steps 3
```

Unchanged context:

#### Single HPU Seq2Seq Inference

The following example shows how to do inference with the [Whisper small](https://huggingface.co/openai/whisper-small) checkpoint on the Hindi subset of [Common Voice 11](https://huggingface.co/datasets/regisss/common_voice_11_0_hi) using 1 HPU device in half-precision:
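The context note above says `--language` and `--task` should be passed for non-English training and omitted entirely for English. That convention can be captured in a small argument-building helper — an illustrative sketch only; the helper name is made up and is not part of the example scripts:

```python
def whisper_seq2seq_args(language=None, task=None):
    """Build the language-related CLI flags for a Whisper seq2seq run.
    For English speech recognition, both flags are omitted (per the README)."""
    args = []
    if language is not None:
        args.append(f'--language="{language}"')
    if task is not None:
        args.append(f'--task="{task}"')
    return args

whisper_seq2seq_args("hindi", "transcribe")  # ['--language="hindi"', '--task="transcribe"']
whisper_seq2seq_args()                       # [] -- English: flags omitted
```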

examples/text-generation/README.md

Lines changed: 0 additions & 32 deletions

@@ -98,23 +98,6 @@ PT_HPU_LAZY_MODE=1 python run_generation.py \

Unchanged context:

> The batch size should be larger than or equal to the number of prompts. Otherwise, only the first N prompts are kept with N being equal to the batch size.

Removed:

### Run Speculative Sampling on Gaudi

If you want to generate a sequence of text from a prompt of your choice using assisted decoding, you can use the following command as an example:

```bash
PT_HPU_LAZY_MODE=1 python run_generation.py \
--model_name_or_path gpt2 \
--assistant_model distilgpt2 \
--batch_size 1 \
--max_new_tokens 100 \
--use_hpu_graphs \
--use_kv_cache \
--num_return_sequences 1 \
--temperature 0 \
--prompt "Alice and Bob" \
--sdp_on_bf16
```

Unchanged context:

### Benchmark

@@ -147,21 +130,6 @@ PT_HPU_LAZY_MODE=1 python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_g

Unchanged context (tail of the preceding command):

```bash
--sdp_on_bf16
```

Removed:

To run Llama3-405B inference on 8 Gaudi3 cards, use the following command:

```bash
PT_HPU_LAZY_MODE=1 ENABLE_LB_BUNDLE_ALL_COMPUTE_MME=0 ENABLE_EXPERIMENTAL_FLAGS=1 \
python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py \
--model_name_or_path meta-llama/Llama-3.1-405B-Instruct \
--max_new_tokens 2048 \
--bf16 \
--use_hpu_graphs \
--use_kv_cache \
--batch_size 1 \
--do_sample \
--use_flash_attention \
--flash_attention_causal_mask
```

Unchanged context:

To run Deepseek-R1-BF16 inference on 16 Gaudi3 cards (2 nodes), use the following command. Ensure you replace the hostfile parameter with the appropriate file. Sample hostfile reference [here](/examples/multi-node-training/hostfile)

> NOTE: This support is currently experimental. Due to memory constraints, only BS=1 is supported for now.
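The batch-size caveat quoted earlier in this file (only the first N prompts are kept when more prompts than the batch size are supplied) amounts to a simple truncation. A minimal sketch of that documented behavior — the function name is illustrative, not the script's actual code:

```python
def select_prompts(prompts, batch_size):
    """Sketch of the documented behavior: when more prompts than batch_size
    are supplied, only the first batch_size prompts are kept."""
    return prompts[:batch_size]

select_prompts(["a", "b", "c"], 2)  # ['a', 'b'] -- prompt "c" is silently dropped
```

This is why the README recommends a batch size at least as large as the number of prompts: anything beyond index N-1 never reaches generation.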

examples/trl/README.md

Lines changed: 0 additions & 57 deletions

@@ -15,63 +15,6 @@ $ pip install -U -r requirements_grpo.txt

Unchanged context (tail of the install block):

```sh
$ pip install -U -r requirements.txt
```

Removed:

## GRPO Training

Installing DeepSpeed:

```sh
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.22.0
```

Running single-card training:

```sh
PT_HPU_MAX_COMPOUND_OP_SIZE=10 PT_HPU_LAZY_MODE=1 python3 grpo.py \
--model_name_or_path Qwen/Qwen2-0.5B-Instruct \
--dataset_name AI-MO/NuminaMath-TIR \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--do_train \
--do_eval \
--use_habana \
--use_lazy_mode \
--bf16 True \
--gradient_accumulation_steps=16 \
--max_prompt_length 512 \
--num_generations 4 \
--max_completion_length 64 \
--use_peft True \
--lora_target_modules q_proj k_proj \
--num_train_epochs 1 \
--save_strategy="epoch"
```

Running multi-card training:

```sh
PT_HPU_MAX_COMPOUND_OP_SIZE=10 PT_HPU_LAZY_MODE=1 python3 ../gaudi_spawn.py --world_size 8 --use_deepspeed grpo.py \
--model_name_or_path Qwen/Qwen2-0.5B-Instruct \
--dataset_name AI-MO/NuminaMath-TIR \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--do_train \
--do_eval \
--use_habana \
--use_lazy_mode \
--bf16 True \
--gradient_accumulation_steps=16 \
--gradient_checkpointing \
--max_prompt_length 512 \
--num_generations 4 \
--max_completion_length 64 \
--use_peft True \
--lora_target_modules q_proj k_proj \
--max_steps=500 \
--logging_steps=10 \
--save_steps=100
```

Unchanged context:

## Supervised Finetuning

1. The following example is for a supervised LoRA finetune of a Qwen2 model on a conversational-format dataset.
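The removed GRPO commands above pass `--num_generations 4`: the trainer samples a group of completions per prompt and scores each against its group, rather than against a learned value function. The core normalization can be sketched as follows — an illustrative sketch of group-relative advantages, not TRL's actual implementation, and the epsilon term is an assumption for numerical safety:

```python
import statistics

def grpo_advantages(rewards, eps=1e-4):
    """Group-relative advantage: normalize each completion's reward against
    the mean and standard deviation of its own group (one group = the
    completions sampled for a single prompt)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

grpo_advantages([1.0, 0.0, 1.0, 0.0])  # roughly [+1, -1, +1, -1]
```

Because advantages are centered within each group, they always sum to zero per prompt: above-average completions are reinforced and below-average ones are penalized, with no separate critic model needed.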

0 commit comments
