
Commit 14e0354

Merge remote-tracking branch 'upstream/main'
2 parents ca5ec1d + 1720bf6 commit 14e0354

4 files changed: +78 -1 lines changed

examples/kfto-dreambooth/README.md

Lines changed: 35 additions & 1 deletion
@@ -5,6 +5,8 @@ The finetuning is performed on OpenShift environment using Kubeflow Training ope

This example is based on HuggingFace DreamBooth Hackathon example - https://huggingface.co/learn/diffusion-course/en/hackathon/dreambooth

> [!TIP]
> **Multi-Team Resource Management**: For enterprise scenarios with multiple teams sharing GPU resources, see the [**Kueue Multi-Team Resource Management Workshop**](../../workshops/kueue/README.md). It demonstrates how to use this fine-tuning example with Kueue for fair resource allocation, borrowing policies, and workload scheduling across teams.

## Requirements

@@ -45,4 +47,36 @@ This example is based on HuggingFace DreamBooth Hackathon example - https://hugg
* From the workbench, clone this repository, i.e., `https://github.com/opendatahub-io/distributed-workloads.git`
* Navigate to the `distributed-workloads/examples/kfto-dreambooth` directory and open the `dreambooth` notebook

You can now proceed with the instructions from the notebook. Enjoy!

> [!IMPORTANT]
> **Kueue Integration (RHOAI 2.21+):**
> * If using RHOAI 2.21+, the example supports Kueue integration for workload management:
>   * When using Kueue:
>     * Follow the [Configure Kueue (Optional)](#configure-kueue-optional) section to set up the required resources
>     * Add the local-queue name label to your job configuration to enforce workload management
>   * You can skip Kueue usage by:
>     * Disabling the existing `kueue-validating-admission-policy-binding` (see the sketch below)
>     * Omitting the local-queue-name label in your job configuration
>
> **Note:** Kueue Enablement via Validating Admission Policy was introduced in RHOAI 2.21. You can skip this section if using an earlier RHOAI release version.
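
If you decide to skip Kueue enforcement, the commands below are a minimal sketch of how the binding could be inspected and removed. They are not part of the original instructions: they assume cluster-admin access and the default binding name mentioned above, and deleting the binding disables Kueue enforcement cluster-wide.

```console
# Inspect the ValidatingAdmissionPolicyBinding that enforces the queue-name label
oc get validatingadmissionpolicybinding kueue-validating-admission-policy-binding

# One possible way to skip enforcement: remove the binding (cluster-wide effect)
oc delete validatingadmissionpolicybinding kueue-validating-admission-policy-binding
```
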
### Configure Kueue (Optional)

> [!NOTE]
> This section is only required if you plan to use Kueue for workload management (RHOAI 2.21+) and Kueue is not already configured in your cluster.
> The Kueue resource YAML files referenced below are located in the [Kueue workshop directory](../../workshops/kueue), specifically in `workshops/kueue/resources/`. You can use these files as templates for your own setup or copy them into your project as needed.

* Update the `nodeLabels` in the `workshops/kueue/resources/resource_flavor.yaml` file to match your AI worker nodes
* Create the ResourceFlavor:
  ```console
  oc apply -f workshops/kueue/resources/resource_flavor.yaml
  ```
* Create the ClusterQueue:
  ```console
  oc apply -f workshops/kueue/resources/team1_cluster_queue.yaml
  ```
* Create a LocalQueue in your namespace:
  ```console
  oc apply -f workshops/kueue/resources/team1_local_queue.yaml -n <your-namespace>
  ```
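
As an optional sanity check (not part of the original instructions), you can inspect the worker node labels before editing the ResourceFlavor and confirm the Kueue objects exist after applying the files. The commands below are a minimal sketch and assume the Kueue CRDs are installed:

```console
# List worker nodes with their labels to pick suitable nodeLabels for the ResourceFlavor
oc get nodes -l node-role.kubernetes.io/worker --show-labels

# Confirm the Kueue resources were created
oc get resourceflavors
oc get clusterqueues
oc get localqueues -n <your-namespace>
```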

examples/kfto-dreambooth/dreambooth.ipynb

Lines changed: 1 addition & 0 deletions
@@ -394,6 +394,7 @@
" resources_per_worker={\"gpu\": 2},\n",
" base_image=\"quay.io/modh/training:py311-cuda121-torch241\",\n",
" parameters=parameters,\n",
" # labels={\"kueue.x-k8s.io/queue-name\": \"<LOCAL_QUEUE_NAME>\"}, # Optional: Add local queue name and uncomment this line if using Kueue for resource management\n",
" env_vars=[\n",
" V1EnvVar(name=\"AWS_ACCESS_KEY_ID\", value_from=V1EnvVarSource(secret_key_ref=V1SecretKeySelector(key=\"AWS_ACCESS_KEY_ID\", name=aws_connection_name))),\n",
" V1EnvVar(name=\"AWS_S3_BUCKET\", value_from=V1EnvVarSource(secret_key_ref=V1SecretKeySelector(key=\"AWS_S3_BUCKET\", name=aws_connection_name))),\n",

examples/kfto-feast/README.md

Lines changed: 41 additions & 0 deletions
@@ -35,6 +35,9 @@ By integrating Feast into the fine-tuning pipeline, we ensure that the training

---

> [!TIP]
> **Multi-Team Resource Management**: For enterprise scenarios with multiple teams sharing GPU resources, see the [**Kueue Multi-Team Resource Management Workshop**](../../workshops/kueue/README.md). It demonstrates how to use this LLM fine-tuning example with Kueue for fair resource allocation, borrowing policies, and workload scheduling across teams.

## Requirements

* An OpenShift cluster with OpenShift AI (RHOAI) 2.17+ installed:
@@ -109,4 +112,42 @@ By following this notebook, you'll gain hands-on experience in setting up a **fe

You can now proceed with the instructions from the notebook. Enjoy!

> [!IMPORTANT]
> **Hugging Face Token Requirements:**
> * You will need a Hugging Face token if using gated models:
>   * The examples use gated Llama models that require a token (e.g., https://huggingface.co/meta-llama/Llama-3.1-8B)
>   * Set the `HF_TOKEN` environment variable in your job configuration (a secret-based sketch follows this note)
>   * Note: You can skip the token if switching to non-gated models
>
> **Kueue Integration (RHOAI 2.21+):**
> * If using RHOAI 2.21+, the example supports Kueue integration for workload management:
>   * When using Kueue:
>     * Follow the [Configure Kueue (Optional)](#configure-kueue-optional) section to set up the required resources
>     * Add the local-queue name label to your job configuration to enforce workload management
>   * You can skip Kueue usage by:
>     * Disabling the existing `kueue-validating-admission-policy-binding`
>     * Omitting the local-queue-name label in your job configuration
>
> **Note:** Kueue Enablement via Validating Admission Policy was introduced in RHOAI 2.21. You can skip this section if using an earlier RHOAI release version.
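
For the gated Llama models, the command below is a minimal sketch of keeping the token out of the notebook by storing it in a Secret; the secret name `hf-token` is a placeholder rather than something the notebook prescribes, so align it with however the notebook actually reads `HF_TOKEN`.

```console
# Hypothetical secret name; reference it from the job configuration, or export HF_TOKEN directly in the workbench instead
oc create secret generic hf-token \
  --from-literal=HF_TOKEN=<your-hugging-face-token> \
  -n <your-namespace>
```
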
### Configure Kueue (Optional)

> [!NOTE]
> This section is only required if you plan to use Kueue for workload management (RHOAI 2.21+) and Kueue is not already configured in your cluster.
> The Kueue resource YAML files referenced below are located in the [Kueue workshop directory](../../workshops/kueue), specifically in `workshops/kueue/resources/`. You can use these files as templates for your own setup or copy them into your project as needed.

* Update the `nodeLabels` in the `workshops/kueue/resources/resource_flavor.yaml` file to match your AI worker nodes
* Create the ResourceFlavor:
  ```console
  oc apply -f workshops/kueue/resources/resource_flavor.yaml
  ```
* Create the ClusterQueue:
  ```console
  oc apply -f workshops/kueue/resources/team1_cluster_queue.yaml
  ```
* Create a LocalQueue in your namespace:
  ```console
  oc apply -f workshops/kueue/resources/team1_local_queue.yaml -n <your-namespace>
  ```
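
After submitting the training job with the local-queue label, an optional way to confirm that Kueue picked it up is to look at the generated Workload object. This is a sketch, not part of the original instructions, and assumes the Kueue CRDs are installed:

```console
# Each labeled job should produce a Workload in the namespace; check its Admitted condition
oc get workloads -n <your-namespace>
oc describe workload <workload-name> -n <your-namespace>
```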

examples/kfto-feast/kfto_feast.ipynb

Lines changed: 1 addition & 0 deletions
@@ -1328,6 +1328,7 @@
" \"USE_LORA\": \"true\", # Whether to apply LoRA adapters in the standard (full‑precision) mode.\n",
" \"USE_QLORA\":\"false\", # Whether to apply QLoRA, which loads the model in 4‑bit quantized mode and then applies LoRA adapters.\n",
" }, \n",
" # labels={\"kueue.x-k8s.io/queue-name\": \"<LOCAL_QUEUE_NAME>\"}, # Optional: Add local queue name and uncomment this line if using Kueue for resource management\n",
" volume_mounts=[\n",
" {\n",
" \"name\": \"config-volume\",\n",
