
Commit 18e4572

Author: Claudia
Commit message: fine tuning demo
Parent: 5c0081f

File tree: 1 file changed (+226 -1 lines)

setup.KubeConEU25/README.md

Lines changed: 226 additions & 1 deletion
@@ -692,8 +692,233 @@ In this example, `alice` uses [KubeRay](https://github.com/ray-project/kuberay)
to run a job that uses [Ray](https://github.com/ray-project/ray) to fine tune a
machine learning model.

This workload is adapted from [this blog post by Red Hat](https://developers.redhat.com/articles/2024/09/30/fine-tune-llama-openshift-ai), which is in turn adapted from [an example in the Ray documentation](https://github.com/ray-project/ray/tree/master/doc/source/templates/04_finetuning_llms_with_deepspeed).
The example fine tunes Llama 3.1 with Ray, using DeepSpeed and LoRA.

<details>

Let's set up the local environment by creating a virtual environment and installing Ray and the `datasets` package:

```bash
uv venv myenv --python 3.12 --seed && source myenv/bin/activate && uv pip install ray datasets
```
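
As a quick sanity check, you can confirm that the Ray client library is importable from the new environment (the exact version printed depends on what `uv` resolved):

```bash
# Optional: confirm the client-side Ray installation works.
python -c "import ray; print(ray.__version__)"
```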

We are going to impersonate Alice in this example.

First, we create the PVC used to download the model and to save the checkpoints from the fine tuning job. We call this PVC `finetuning-pvc`; this name is referenced by the Ray cluster YAML below, so if you use a different name, update the `claimName` entries in the Ray cluster definition accordingly.

```bash
kubectl apply --as alice -n blue -f- << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: finetuning-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
  storageClassName: nfs-client-pokprod
EOF
```
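
Optionally, we can check that the PVC was created. Note that, depending on the storage class's volume binding mode, it may remain `Pending` until the first pod mounts it:

```bash
# Optional: the PVC should exist in the blue namespace; with WaitForFirstConsumer
# binding it may stay Pending until the Ray pods are scheduled.
kubectl get pvc finetuning-pvc -n blue --as alice
```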

Now, let's create an AppWrapper version of the Ray cluster. Notice that:

- We are using the container image `quay.io/rhoai/ray:2.35.0-py311-cu121-torch24-fa26` from Red Hat, but you can use the images from Docker Hub if preferred.
- We are setting the number of worker replicas to `7`. Since we want to run on a single GPU node, we assign one GPU to the Ray head pod and one to each of the 7 worker pods, for 8 GPUs in total.

```bash
cd tools/appwrapper-packager/
cat << EOF > ray.yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: ray
spec:
  headGroupSpec:
    enableIngress: false
    rayStartParams:
      block: 'true'
      dashboard-host: 0.0.0.0
      num-gpus: '1'
      resources: '"{}"'
    serviceType: ClusterIP
    template:
      metadata: {}
      spec:
        containers:
        - env:
          - name: MY_POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: RAY_USE_TLS
            value: '0'
          image: 'quay.io/rhoai/ray:2.35.0-py311-cu121-torch24-fa26'
          imagePullPolicy: Always
          lifecycle:
            preStop:
              exec:
                command:
                - /bin/sh
                - '-c'
                - ray stop
          name: ray-head
          ports:
          - containerPort: 6379
            name: gcs
            protocol: TCP
          - containerPort: 8265
            name: dashboard
            protocol: TCP
          - containerPort: 10001
            name: client
            protocol: TCP
          resources:
            limits:
              cpu: '16'
              memory: 256G
              nvidia.com/gpu: '1'
            requests:
              cpu: '16'
              memory: 128G
              nvidia.com/gpu: '1'
          volumeMounts:
          - mountPath: /model
            name: model
        volumes:
        - name: model
          persistentVolumeClaim:
            claimName: finetuning-pvc
  rayVersion: 2.35.0
  workerGroupSpecs:
  - groupName: small-group-ray
    rayStartParams:
      block: 'true'
      num-gpus: '1'
      resources: '"{}"'
    replicas: 7
    scaleStrategy: {}
    template:
      metadata: {}
      spec:
        containers:
        - env:
          - name: MY_POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: RAY_USE_TLS
            value: '0'
          image: 'quay.io/rhoai/ray:2.35.0-py311-cu121-torch24-fa26'
          imagePullPolicy: Always
          lifecycle:
            preStop:
              exec:
                command:
                - /bin/sh
                - '-c'
                - ray stop
          name: machine-learning
          resources:
            limits:
              cpu: '16'
              memory: 256G
              nvidia.com/gpu: '1'
            requests:
              cpu: '16'
              memory: 128G
              nvidia.com/gpu: '1'
          volumeMounts:
          - mountPath: /model
            name: model
        volumes:
        - name: model
          persistentVolumeClaim:
            claimName: finetuning-pvc
EOF
```
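
Before wrapping the manifest, you can optionally validate it against the cluster with a server-side dry run; this catches schema errors without creating anything (it assumes the KubeRay CRDs are already installed, which is the case in this setup):

```bash
# Optional: validate the RayCluster manifest against the CRD without creating it.
kubectl apply --dry-run=server -f ray.yaml -n blue --as alice
```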

Now let's use the tool to create the AppWrapper:

```bash
./awpack.py -o ray-aw.yaml -n ray-appwrapper -i ray.yaml
```
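
If you are curious about what the packager produced, a quick look at the generated file should show an `AppWrapper` object embedding the `RayCluster` defined above (the exact layout depends on the packager version):

```bash
# Optional: peek at the generated wrapper.
grep 'kind:' ray-aw.yaml
```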

Now we can submit the job while impersonating Alice:

```bash
kubectl create -f ray-aw.yaml -n blue --as alice
```
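
We can then watch the AppWrapper being admitted and the Ray pods coming up; with the sizing above we expect one head pod and seven worker pods (this assumes the AppWrapper CRD exposes the usual `appwrappers` resource name):

```bash
# Expect 1 ray-head pod and 7 worker pods once the AppWrapper is admitted.
kubectl get appwrappers,rayclusters,pods -n blue --as alice
```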

Now that the Ray cluster is set up, we need to expose the `ray-head` service, as that is the entry point for all job submissions. In another terminal, type:

```bash
kubectl port-forward svc/ray-head-svc 8265:8265 -n blue --as alice
```
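
A quick way to confirm the tunnel is working is to query the dashboard from the first terminal; the `/api/version` endpoint is the one the Ray job submission client itself probes, but simply opening `http://localhost:8265` in a browser works too:

```bash
# The Ray dashboard should now answer on localhost through the port-forward.
curl -s http://127.0.0.1:8265/api/version
```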

Now we can clone the git repository containing the fine tuning workload:

```bash
git clone https://github.com/opendatahub-io/distributed-workloads
cd distributed-workloads/examples/ray-finetune-llm-deepspeed
```
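
This directory contains the files that the submission script below relies on, so it is worth a quick look before continuing:

```bash
# The submission below expects to find, among others:
#   ray_finetune_llm_deepspeed.py, create_dataset.py, requirements.txt
#   and the deepspeed_configs/ directory with the ZeRO-3 offload config.
ls
```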

We also create a Python program that launches the job in the Ray cluster using the Ray API. Notice that:

- We set `--num-devices=8`, as that is the total number of accelerators used by the head and the workers.
- We set `HF_HOME` to the shared PVC, so the model is downloaded once and shared among all executors.
- We set the number of epochs to just one for a shorter run.
- We use localhost as the entry point for submitting Ray jobs, since we exposed the service earlier.

```bash
cat << EOF > finetuning.py
# Prepare the GSM8K question-answer dataset for the fine tuning job.
import create_dataset
create_dataset.gsm8k_qa_no_tokens_template()

from ray.job_submission import JobSubmissionClient

# Submit through the port-forwarded ray-head service.
client = JobSubmissionClient("http://127.0.0.1:8265")

kick_off_pytorch_benchmark = (
    "git clone https://github.com/opendatahub-io/distributed-workloads || true;"
    # Run the fine tuning script.
    "python ray_finetune_llm_deepspeed.py"
    " --model-name=meta-llama/Meta-Llama-3.1-8B --lora --num-devices=8 --num-epochs=1 --ds-config=./deepspeed_configs/zero_3_offload_optim_param.json --storage-path=/model/ --batch-size-per-device=32 --eval-batch-size-per-device=32"
)

submission_id = client.submit_job(
    entrypoint=kick_off_pytorch_benchmark,
    runtime_env={
        "env_vars": {
            "HF_HOME": "/model/ray_finetune_llm_deepspeed/cache/",
        },
        "pip": "requirements.txt",
        "working_dir": "./",
        "excludes": ["/docs/", "*.ipynb", "*.md"],
    },
)

print("Use the following command to follow this Job's logs:")
print(f"ray job logs '{submission_id}' --address http://127.0.0.1:8265 --follow")
EOF
python finetuning.py
```

The expected output looks like the following:

```bash
2025-03-24 16:37:53,029 INFO dashboard_sdk.py:338 -- Uploading package gcs://_ray_pkg_21ddaa8b13d30deb.zip.
2025-03-24 16:37:53,030 INFO packaging.py:575 -- Creating a file package for local module './'.
Use the following command to follow this Job's logs:
ray job logs 'raysubmit_C6hVCvdhpmapgQB8' --address http://127.0.0.1:8265 --follow
```
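
Besides tailing the logs, the Ray CLI can also report the state of the job through the same port-forward; replace the submission ID with the one printed for your run:

```bash
# Check the job state (PENDING, RUNNING, SUCCEEDED, ...) for your submission ID.
ray job status 'raysubmit_C6hVCvdhpmapgQB8' --address http://127.0.0.1:8265
# Or list all jobs known to the cluster.
ray job list --address http://127.0.0.1:8265
```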

We can now either follow the logs in the terminal with the `ray job logs` command, or open the Ray dashboard and follow them from there. The dashboard is reachable at `http://localhost:8265`, since we exposed the service earlier.

Once the job is completed, the checkpoint with the fine tuned model is saved in the folder

```
/model/meta-llama/Meta-Llama-3.1-8B/TorchTrainer_<timestamp>/checkpoint_<ID>
```
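
Since the PVC is mounted at `/model` in the Ray pods, one way to confirm the checkpoint is there is to list that path from the head pod; the pod name below is a placeholder, so substitute the actual head pod name reported by `kubectl get pods -n blue --as alice`:

```bash
# <ray-head-pod> is a placeholder for the actual head pod name.
kubectl exec -n blue --as alice <ray-head-pod> -- ls /model/meta-llama/Meta-Llama-3.1-8B/
```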
</details>
