Commit c70d5cc

Small updates to README.
1 parent 55259d6 commit c70d5cc

cloud-infrastructure/ai-infra-gpu/ai-infrastructure/nemo-megatron-training-oke/README.md

Lines changed: 10 additions & 7 deletions
@@ -5,22 +5,22 @@ This repository demonstrates how to train LLM using
 on the Oracle Container Engine for Kubernetes (OKE) using
 [NVIDIA Megatron](https://developer.nvidia.com/megatron-core).
 
-Reference results from NVIDIA to train Llama 2 can be found on the
-[NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dgxc-benchmarking/resources/llama2-dgxc-benchmarking).
+Reference results from NVIDIA to train Llama 3 can be found on the
+[NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dgxc-benchmarking/resources/llama3-dgxc-benchmarking).
 
-Reviewed: 13.03.2025
+Reviewed: 18.03.2025
 
 # When to use this asset?
 
-* If you want to get started with training LLM like Llama 2 on Kubernetes using OCI.
+* If you want to get started with training LLM like Llama 3 on Kubernetes using OCI.
 
 # How to use this asset?
 
 ## Prerequisites
 
 * You have access to an Orcale Cloud Tenancy.
 * You have access to shapes with NVIDIA GPUs such as H100.
-* You have a HuggingFace account and access to `meta-llama/Llama-2-70b-hf`.
+* You have a HuggingFace account and access to `meta-llama/Llama-3.1-8B-Instruct`.
 
 This guide is loosely based on the
 [NVIDIA NeMo Framework Launcher guide for Kubernetes](https://docs.nvidia.com/nemo-framework/user-guide/24.07/playbooks/kubernetes.html).
@@ -45,8 +45,11 @@ This guide is loosely based on the
 Data-Intensive Workloads](https://docs.oracle.com/en-us/iaas/Content/Resources/Assets/whitepapers/scale-out-oci-file-storage-performance-for-data-intensive-workloads.pdf)
 * [File Storage Performance Guide](https://docs.oracle.com/en-us/iaas/Content/Resources/Assets/whitepapers/file-storage-performance-guide.pdf)
 
-3. Install Helm, the NVIDIA GPU Operator, and the Volcano scheduler according to
-[NVIDIA NeMo Framework Launcher guide for Kubernetes](https://docs.nvidia.com/nemo-framework/user-guide/24.07/playbooks/kubernetes.html).
+3. Install the NVIDIA GPU Operator according to
+[NVIDIA NeMo Framework Launcher guide for Kubernetes](https://docs.nvidia.com/nemo-framework/user-guide/24.07/playbooks/kubernetes.html), then install the [Volcano scheduler](https://github.com/volcano-sh/volcano) with:
+```sh
+kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml
+```
 
 4. Copy the [files in this repository](./files) to the Kubernetes operator node.
 You can download them from this repository via:
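
Note on the new Volcano step: the added command applies the manifest from the `master` branch, so its contents may change between runs. A minimal sketch of parameterizing the git ref follows. The `volcano_manifest_url` helper and the `VOLCANO_REF` and `APPLY` variables are hypothetical names introduced here, not part of the README. The `volcano-system` namespace is an assumption about what the manifest creates, and any ref other than `master` should be checked against the upstream repository before use.

```sh
#!/bin/sh
# Hypothetical helper (not in the README): build the raw manifest URL
# for a given Volcano git ref. Defaults to "master", as in the diff.
volcano_manifest_url() {
    ref="${1:-master}"
    printf 'https://raw.githubusercontent.com/volcano-sh/volcano/%s/installer/volcano-development.yaml\n' "$ref"
}

# Show which manifest would be applied.
volcano_manifest_url "${VOLCANO_REF:-master}"

# Only touch the cluster when explicitly requested via APPLY=1.
if [ "${APPLY:-0}" = "1" ]; then
    kubectl apply -f "$(volcano_manifest_url "${VOLCANO_REF:-master}")"
    # Assumption: the manifest deploys into the volcano-system namespace.
    kubectl get pods -n volcano-system
fi
```

Run as-is, the script only prints the URL. Setting `APPLY=1` applies the manifest and lists the scheduler pods.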