Commit e35db9e

Fine tuning demo (#169)

1 parent ec7ca4e commit e35db9e

1 file changed: +226 -1 lines changed

setup.KubeConEU25/README.md
In this example, `alice` uses [KubeRay](https://github.com/ray-project/kuberay) to run a job that uses [Ray](https://github.com/ray-project/ray) to fine-tune a machine learning model.

This workload is adapted from [this blog post by Red Hat](https://developers.redhat.com/articles/2024/09/30/fine-tune-llama-openshift-ai), which is in turn adapted from [an example in the Ray documentation](https://github.com/ray-project/ray/tree/master/doc/source/templates/04_finetuning_llms_with_deepspeed).
The example fine-tunes Llama 3.1 with Ray, using DeepSpeed and LoRA.

<details>

Let's set up the local environment by installing Ray and the `datasets` package in a virtual environment:

```bash
uv venv myenv --python 3.12 --seed && source myenv/bin/activate && uv pip install ray datasets
```
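
As an optional sanity check (an addition to the original walkthrough), you can confirm that the Ray client library installed correctly, since we rely on it later to submit jobs:

```bash
# Should print the installed Ray version without errors
python -c "import ray; print(ray.__version__)"
```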

We are going to impersonate Alice in this example.

First, we create the PVC used to download the model and to save the checkpoints from the fine-tuning job. We call this PVC `finetuning-pvc`; it is referenced in the Ray cluster YAML below, so if you use another name, update the `claimName` entries in the Ray cluster definition accordingly.

```bash
kubectl apply --as alice -n blue -f- << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: finetuning-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
  storageClassName: nfs-client-pokprod
EOF
```
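
Optionally, before moving on, you can check that the PVC was created (this check is an addition to the original walkthrough):

```bash
# The PVC should eventually report STATUS=Bound
kubectl get pvc finetuning-pvc -n blue --as alice
```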

Now, let's create an AppWrapper version of the Ray cluster. Notice that:

- We are using the container image `quay.io/rhoai/ray:2.35.0-py311-cu121-torch24-fa26` from Red Hat, but you can use the images from Docker Hub instead if preferred.
- We are setting the number of worker replicas to `7`. Since we want to run on a single GPU node, we assign one GPU to the Ray head pod and one GPU to each of the 7 worker pods.

```bash
cd tools/appwrapper-packager/
cat << EOF > ray.yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: ray
spec:
  headGroupSpec:
    enableIngress: false
    rayStartParams:
      block: 'true'
      dashboard-host: 0.0.0.0
      num-gpus: '1'
      resources: '"{}"'
    serviceType: ClusterIP
    template:
      metadata: {}
      spec:
        containers:
        - env:
          - name: MY_POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: RAY_USE_TLS
            value: '0'
          image: 'quay.io/rhoai/ray:2.35.0-py311-cu121-torch24-fa26'
          imagePullPolicy: Always
          lifecycle:
            preStop:
              exec:
                command:
                - /bin/sh
                - '-c'
                - ray stop
          name: ray-head
          ports:
          - containerPort: 6379
            name: gcs
            protocol: TCP
          - containerPort: 8265
            name: dashboard
            protocol: TCP
          - containerPort: 10001
            name: client
            protocol: TCP
          resources:
            limits:
              cpu: '16'
              memory: 256G
              nvidia.com/gpu: '1'
            requests:
              cpu: '16'
              memory: 128G
              nvidia.com/gpu: '1'
          volumeMounts:
          - mountPath: /model
            name: model
        volumes:
        - name: model
          persistentVolumeClaim:
            claimName: finetuning-pvc
  rayVersion: 2.35.0
  workerGroupSpecs:
  - groupName: small-group-ray
    rayStartParams:
      block: 'true'
      num-gpus: '1'
      resources: '"{}"'
    replicas: 7
    scaleStrategy: {}
    template:
      metadata: {}
      spec:
        containers:
        - env:
          - name: MY_POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: RAY_USE_TLS
            value: '0'
          image: 'quay.io/rhoai/ray:2.35.0-py311-cu121-torch24-fa26'
          imagePullPolicy: Always
          lifecycle:
            preStop:
              exec:
                command:
                - /bin/sh
                - '-c'
                - ray stop
          name: machine-learning
          resources:
            limits:
              cpu: '16'
              memory: 256G
              nvidia.com/gpu: '1'
            requests:
              cpu: '16'
              memory: 128G
              nvidia.com/gpu: '1'
          volumeMounts:
          - mountPath: /model
            name: model
        volumes:
        - name: model
          persistentVolumeClaim:
            claimName: finetuning-pvc
EOF
```

Now let's use the tool to create the AppWrapper:

```bash
./awpack.py -o ray-aw.yaml -n ray-appwrapper -i ray.yaml
```

Now we can submit the job while impersonating Alice:

```bash
kubectl create -f ray-aw.yaml -n blue --as alice
```
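
To confirm that the workload was admitted and the Ray cluster is coming up, you can watch the AppWrapper, the RayCluster, and their pods (an optional check, not part of the original walkthrough):

```bash
# The AppWrapper should be admitted and the head/worker pods should become Ready
kubectl get appwrappers,rayclusters -n blue --as alice
kubectl get pods -n blue --as alice
```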

Now that the Ray cluster is set up, we first need to expose the `ray-head` service, as that is the entry point for all job submissions. In another terminal, run:

```bash
kubectl port-forward svc/ray-head-svc 8265:8265 -n blue --as alice
```
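
With the port-forward running, you can optionally verify connectivity to the Ray job API from your machine (an addition to the walkthrough):

```bash
# Lists submitted jobs (empty for now); fails if the port-forward is not active
ray job list --address http://127.0.0.1:8265
```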

Now we can clone the Git repository with the fine-tuning workload:

```bash
git clone https://github.com/opendatahub-io/distributed-workloads
cd distributed-workloads/examples/ray-finetune-llm-deepspeed
```
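
The next step assumes this directory contains the files referenced below, in particular `create_dataset.py`, `ray_finetune_llm_deepspeed.py`, `requirements.txt`, and the `deepspeed_configs/` folder; a quick listing confirms that:

```bash
# Confirm the training script, dataset helper, requirements, and DeepSpeed configs are present
ls create_dataset.py ray_finetune_llm_deepspeed.py requirements.txt deepspeed_configs/
```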

We also create a Python program that launches the job in the Ray cluster using the Ray API.
Notice that:

- We set `--num-devices=8`, which is the total number of accelerators used by the head and the workers.
- We set `HF_HOME` to the shared PVC, so the model is downloaded once and shared among all executors.
- We set the number of epochs to just one for a shorter run.
- We use localhost as the entry point for submitting Ray jobs, since we exposed the service earlier.

```bash
cat << EOF > finetuning.py
import create_dataset
create_dataset.gsm8k_qa_no_tokens_template()

from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://127.0.0.1:8265")

kick_off_pytorch_benchmark = (
    "git clone https://github.com/opendatahub-io/distributed-workloads || true;"
    # Run the fine-tuning script.
    "python ray_finetune_llm_deepspeed.py"
    " --model-name=meta-llama/Meta-Llama-3.1-8B --lora --num-devices=8 --num-epochs=1 --ds-config=./deepspeed_configs/zero_3_offload_optim_param.json --storage-path=/model/ --batch-size-per-device=32 --eval-batch-size-per-device=32"
)

submission_id = client.submit_job(
    entrypoint=kick_off_pytorch_benchmark,
    runtime_env={
        "env_vars": {
            'HF_HOME': "/model/ray_finetune_llm_deepspeed/cache/",
        },
        'pip': 'requirements.txt',
        'working_dir': './',
        "excludes": ["/docs/", "*.ipynb", "*.md"]
    },
)

print("Use the following command to follow this Job's logs:")
print(f"ray job logs '{submission_id}' --address http://127.0.0.1:8265 --follow")
EOF
python finetuning.py
```
The expected output looks like the following:
```bash
2025-03-24 16:37:53,029 INFO dashboard_sdk.py:338 -- Uploading package gcs://_ray_pkg_21ddaa8b13d30deb.zip.
2025-03-24 16:37:53,030 INFO packaging.py:575 -- Creating a file package for local module './'.
Use the following command to follow this Job's logs:
ray job logs 'raysubmit_C6hVCvdhpmapgQB8' --address http://127.0.0.1:8265 --follow
```

We can now either follow the logs in the terminal with the `ray job logs` command, or open the Ray dashboard at `http://localhost:8265` and follow them from there, since we exposed the service on localhost earlier.

Once the job is completed, the checkpoint with the fine-tuned model is saved in the folder
```
/model/meta-llama/Meta-Llama-3.1-8B/TorchTrainer_<timestamp>/TorchTrainer_<id_timestamp>/checkpoint_<ID>
```
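
Since the checkpoint lives on the shared PVC, one way to inspect it is to run a command inside the Ray head pod, which mounts the PVC at `/model`. The pod name below is a placeholder; substitute whatever `kubectl get pods` reports (an optional check, not part of the original walkthrough):

```bash
# Find the head pod name, then list the checkpoints on the shared PVC (mounted at /model)
kubectl get pods -n blue --as alice
kubectl exec -n blue --as alice <ray-head-pod-name> -- ls /model/meta-llama/Meta-Llama-3.1-8B/
```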
</details>
