
Commit c132a8a

Merge remote-tracking branch 'upstream/main'

2 parents 354b002 + 24b806c

File tree

10 files changed: +803 -0 lines changed

examples/kfto-sft-llm/README.md

Lines changed: 142 additions & 0 deletions

# LLM Fine-Tuning with Kubeflow Training on OpenShift AI

This example demonstrates how to fine-tune LLMs with the Kubeflow Training operator on OpenShift AI.
It uses the Hugging Face SFTTrainer, PEFT for LoRA and QLoRA, and PyTorch FSDP to distribute the training across multiple GPUs and nodes (see the sketch after the note below).

> [!IMPORTANT]
> This example has been tested with the configurations listed in the [validation](#validation) section.
> Its configuration space is high-dimensional and tightly coupled to the runtime / hardware configuration.
> You need to adapt it, and validate that it works as expected, with your configuration(s) on your target environment(s).
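
As a rough illustration of the training stack described above (not code taken from this example's notebook), the following sketch shows how the pieces fit together: TRL's `SFTTrainer` with a PEFT `LoraConfig`, and FSDP enabled through the standard Hugging Face training arguments. The dataset preparation, output path, and hyperparameter values are placeholders.

```python
# Illustrative sketch only: the output path, dataset preparation, and
# hyperparameter values are placeholders, not this example's exact code.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# GSM8k rows have "question" and "answer" columns; build a single "text"
# column so SFTTrainer can consume the dataset directly.
dataset = load_dataset("gsm8k", "main", split="train")
dataset = dataset.map(
    lambda x: {"text": f"Question: {x['question']}\nAnswer: {x['answer']}"}
)

peft_config = LoraConfig(
    r=16,
    lora_alpha=8,
    lora_dropout=0.05,
    target_modules="all-linear",
    modules_to_save=["lm_head", "embed_tokens"],
)

training_args = SFTConfig(
    output_dir="/mnt/shared/sft-output",  # hypothetical path on the shared RWX volume
    per_device_train_batch_size=32,
    bf16=True,
    max_seq_length=1024,
    packing=False,
    # FSDP sharding is driven by the regular Hugging Face trainer arguments
    # when the script is launched with torchrun across the workers.
    fsdp="full_shard auto_wrap offload",
    fsdp_config={"activation_checkpointing": True},
)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.3-70B-Instruct",  # any causal LM you have access to
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```

When such a script runs under `torchrun` on every worker of the training job, the FSDP settings shard the model, gradients, and optimizer states across all the GPUs in the job.
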
## Requirements

* An OpenShift cluster with OpenShift AI (RHOAI) 2.17+ installed:
  * The `dashboard`, `trainingoperator` and `workbenches` components enabled
* Sufficient worker nodes for your configuration(s) with NVIDIA GPUs (Ampere-based or newer recommended) or AMD GPUs (AMD Instinct MI300X or newer recommended)
* A dynamic storage provisioner supporting RWX PVC provisioning

## Setup

* Access the OpenShift AI dashboard, for example from the top navigation bar menu:
  ![](./docs/01.png)
* Log in, then go to _Data Science Projects_ and create a project:
  ![](./docs/02.png)
* Once the project is created, click on _Create a workbench_:
  ![](./docs/03.png)
* Then create a workbench with the following settings:
  * Select the `PyTorch` (or the `ROCm-PyTorch`) notebook image:
    ![](./docs/04a.png)
  * Select the `Medium` container size and add an accelerator:
    ![](./docs/04b.png)
    > [!NOTE]
    > Adding an accelerator is only needed to test the fine-tuned model from within the workbench, so you can omit it if you need to spare an accelerator.
  * Create a storage that will be shared between the workbench and the fine-tuning runs.
    Make sure it uses a storage class with RWX capability, and size it according to the model you want to fine-tune:
    ![](./docs/04c.png)
    > [!NOTE]
    > You can attach an existing shared storage instead if you already have one.
  * Review the storage configuration and click _Create workbench_:
    ![](./docs/04d.png)
* From the _Workbenches_ page, click on _Open_ when the workbench you've just created becomes ready:
  ![](./docs/05.png)
* From the workbench, clone this repository, i.e., `https://github.com/opendatahub-io/distributed-workloads.git`:
  ![](./docs/06.png)
* Navigate to the `distributed-workloads/examples/kfto-sft-llm` directory and open the `sft` notebook.

You can now proceed with the instructions from the notebook. Enjoy!
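
For context on what happens behind the scenes, the notebook typically submits the fine-tuning as a distributed PyTorchJob through the Kubeflow Training SDK. The sketch below is illustrative only: the job name, image, worker counts, and resource values are placeholders, and the exact `create_job` parameters depend on your `kubeflow-training` SDK version.

```python
# Illustrative sketch, not the notebook's exact code.
from kubeflow.training import TrainingClient


def main():
    # The fine-tuning logic (dataset loading, SFTTrainer setup, trainer.train())
    # goes here; this function is executed on every worker of the PyTorchJob.
    ...


client = TrainingClient()
client.create_job(
    name="sft-llama",                 # hypothetical job name
    train_func=main,                  # function executed on each worker
    num_workers=8,                    # e.g. one worker per GPU node
    resources_per_worker={
        "nvidia.com/gpu": 8,          # GPUs requested per worker
        "memory": "256Gi",
        "cpu": 32,
    },
    base_image="quay.io/modh/training:py311-cuda121-torch241",  # placeholder image
)

# Follow the job logs from the workbench
client.get_job_logs(name="sft-llama", follow=True)
```
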
## Validation

This example has been validated with the following configurations:

### Llama 3.3 70B Instruct - GSM8k - LoRA

* Cluster:
  * OpenShift AI 2.17
  * 16x `gx2-80x1280x8a100` nodes on IBM Cloud (NVIDIA-A100-SXM4-80GB GPU)
* Configuration:
  ```yaml
  # Model
  model_name_or_path: meta-llama/Llama-3.3-70B-Instruct
  model_revision: main
  torch_dtype: bfloat16
  attn_implementation: flash_attention_2

  # PEFT / LoRA
  use_peft: true
  lora_target_modules: "all-linear"
  lora_modules_to_save: ["lm_head", "embed_tokens"]
  lora_r: 16
  lora_alpha: 8
  lora_dropout: 0.05

  # Quantization / BitsAndBytes
  load_in_4bit: false
  load_in_8bit: false

  # Datasets
  dataset_name: gsm8k
  dataset_config: main

  # SFT
  max_seq_length: 1024
  packing: false

  # Training
  per_device_train_batch_size: 32
  per_device_eval_batch_size: 32

  bf16: true
  tf32: false

  # FSDP
  fsdp: "full_shard auto_wrap offload"
  fsdp_config:
    activation_checkpointing: true
  ```

### Llama 3.1 8B Instruct - GSM8k - LoRA

* Cluster:
  * OpenShift AI 2.17
  * 8x `gx2-80x1280x8a100` nodes on IBM Cloud (NVIDIA-A100-SXM4-80GB GPU)
* Configuration:
  ```yaml
  # Model
  model_name_or_path: meta-llama/Meta-Llama-3.1-8B-Instruct
  model_revision: main
  torch_dtype: bfloat16
  attn_implementation: flash_attention_2

  # PEFT / LoRA
  use_peft: true
  lora_target_modules: "all-linear"
  lora_modules_to_save: ["lm_head", "embed_tokens"]
  lora_r: 16
  lora_alpha: 8
  lora_dropout: 0.05

  # Quantization / BitsAndBytes
  load_in_4bit: false
  load_in_8bit: false

  # Datasets
  dataset_name: gsm8k
  dataset_config: main

  # SFT
  max_seq_length: 1024
  packing: false

  # Training
  per_device_train_batch_size: 32
  per_device_eval_batch_size: 32

  bf16: true
  tf32: false

  # FSDP
  fsdp: "full_shard auto_wrap offload"
  fsdp_config:
    activation_checkpointing: true
  ```
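
The configuration keys above map onto TRL / Hugging Face argument dataclasses (`ScriptArguments`, `SFTConfig`, `ModelConfig`). As a minimal sketch, assuming TRL's `TrlParser` and its stock dataclasses (the script and config file names are placeholders), such a YAML file could be loaded like this:

```python
# Sketch of loading a YAML configuration like the ones above with TRL's parser.
# Run as, for example: python sft.py --config config.yaml
from trl import ModelConfig, ScriptArguments, SFTConfig, TrlParser

parser = TrlParser((ScriptArguments, SFTConfig, ModelConfig))
script_args, training_args, model_args = parser.parse_args_and_config()

# dataset_name / dataset_config land in ScriptArguments, the SFT and training
# keys in SFTConfig, and the model / PEFT / quantization keys in ModelConfig.
print(script_args.dataset_name, model_args.model_name_or_path)
```
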

examples/kfto-sft-llm/docs/01.png (284 KB)

examples/kfto-sft-llm/docs/02.png (342 KB)

examples/kfto-sft-llm/docs/03.png (324 KB)

examples/kfto-sft-llm/docs/04a.png (288 KB)

examples/kfto-sft-llm/docs/04b.png (202 KB)

examples/kfto-sft-llm/docs/04c.png (331 KB)

examples/kfto-sft-llm/docs/04d.png (172 KB)

examples/kfto-sft-llm/docs/05.png (391 KB)
