
Commit 9b1c797

Authored and committed by Copybara
Copybara import of gpu-recipes:
- 7db23c38a100cf56941839b8404ec8bf7b35f4d6 Remove maxtext-llama-2-7b
- d674e88029b332f62bb1b6d22d7a85e5f2f10c1a Merge "Changing name convention to the default network in...

GitOrigin-RevId: d674e88029b332f62bb1b6d22d7a85e5f2f10c1a
1 parent a27a0e2 commit 9b1c797

27 files changed: +24 / −351 lines

README.md

Lines changed: 6 additions & 7 deletions

@@ -18,13 +18,12 @@ Welcome to the reproducible benchmark recipes repository for GPUs! This reposito
 
 ### Training benchmarks A3 Mega
 
-| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
-| ---------------- | ---------------- | --------- | ------------------- | ------------ | ------------------ |
-| **GPT3-175B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/gpt3-175b/nemo-pretraining-gke/README.md) |
-| **Llama-2-7B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | MaxText | Pre-training | GKE | [Link](./training/a3mega/llama-2-7b/maxtext-pretraining-gke/README.md) |
-| **Llama-3-70B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/llama-3-70b/nemo-pretraining-gke/README.md) |
-| **Llama-3.1-70B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/llama-3.1-70b/nemo-pretraining-gke/README.md) |
-| **Mixtral-8-7B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/mixtral-8x7b/nemo-pretraining-gke/README.md) |
+Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe
+----------------- | --------------------------------------------------------------------------------------------------------- | --------- | ------------- | ------------ | ------------------
+**GPT3-175B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/gpt3-175b/nemo-pretraining-gke/README.md)
+**Llama-3-70B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/llama-3-70b/nemo-pretraining-gke/README.md)
+**Llama-3.1-70B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/llama-3.1-70b/nemo-pretraining-gke/README.md)
+**Mixtral-8-7B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/mixtral-8x7b/nemo-pretraining-gke/README.md)
 
 ### Training benchmarks A3 Ultra
 

inference/a3ultra/deepseek-r1-671b/sglang-serving-gke/README.md

Lines changed: 0 additions & 1 deletion

@@ -157,7 +157,6 @@ The recipe uses the helm chart to run the above steps.
 cd $RECIPE_ROOT
 helm install -f values.yaml \
 --set volumes.gcsMounts[0].bucketName=${GCS_BUCKET} \
---set clusterName=$CLUSTER_NAME \
 --set job.image.repository=${ARTIFACT_REGISTRY}/${SGLANG_IMAGE} \
 --set job.image.tag=${SGLANG_VERSION} \
 $USER-serving-deepseek-r1-model \
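Because the chart no longer reads clusterName (see the values.yaml hunk below), the install command loses exactly one flag. A minimal sketch of the post-change invocation, assuming the environment variables set earlier in the recipe's README are still exported; <CHART_PATH> is a placeholder for the chart argument that follows in the recipe but falls outside this hunk:

```bash
# Sketch only: the recipe's command minus --set clusterName=$CLUSTER_NAME,
# which the chart no longer uses. <CHART_PATH> is a placeholder, not the real path.
cd $RECIPE_ROOT
helm install -f values.yaml \
  --set volumes.gcsMounts[0].bucketName=${GCS_BUCKET} \
  --set job.image.repository=${ARTIFACT_REGISTRY}/${SGLANG_IMAGE} \
  --set job.image.tag=${SGLANG_VERSION} \
  $USER-serving-deepseek-r1-model \
  <CHART_PATH>
```

The vLLM and TensorRT-LLM recipes below receive the identical one-flag removal.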

inference/a3ultra/deepseek-r1-671b/sglang-serving-gke/values.yaml

Lines changed: 0 additions & 1 deletion

@@ -12,7 +12,6 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-clusterName:
 
 huggingface:
 secretName: hf-secret

inference/a3ultra/deepseek-r1-671b/vllm-serving-gke/README.md

Lines changed: 0 additions & 1 deletion

@@ -156,7 +156,6 @@ The recipe uses the helm chart to run the above steps.
 cd $RECIPE_ROOT
 helm install -f values.yaml \
 --set volumes.gcsMounts[0].bucketName=${GCS_BUCKET} \
---set clusterName=$CLUSTER_NAME \
 --set job.image.repository=${ARTIFACT_REGISTRY}/${VLLM_IMAGE} \
 --set job.image.tag=${VLLM_VERSION} \
 $USER-serving-deepseek-r1-model \

inference/a3ultra/deepseek-r1-671b/vllm-serving-gke/values.yaml

Lines changed: 0 additions & 1 deletion

@@ -12,7 +12,6 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-clusterName:
 
 huggingface:
 secretName: hf-secret

inference/a3ultra/llama-3.1-405b/trtllm-inference-gke/single-node/README.md

Lines changed: 0 additions & 1 deletion

@@ -170,7 +170,6 @@ The recipe uses the helm chart to run the above steps.
 cd $RECIPE_ROOT
 helm install -f values.yaml \
 --set volumes.gcsMounts[0].bucketName=${GCS_BUCKET} \
---set clusterName=$CLUSTER_NAME \
 --set job.image.repository=${ARTIFACT_REGISTRY}/${TRT_LLM_IMAGE} \
 --set job.image.tag=${TRT_LLM_VERSION} \
 $USER-benchmark-llama-model \

inference/a3ultra/llama-3.1-405b/trtllm-inference-gke/single-node/values.yaml

Lines changed: 0 additions & 1 deletion

@@ -12,7 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-clusterName:
 
 huggingface:
 secretName: hf-secret

src/helm-charts/a3ultra/maxtext-training/templates/maxtext-launcher-job.yaml

Lines changed: 2 additions & 2 deletions

@@ -64,9 +64,9 @@ spec:
 {{- else }}
 [
 {"interfaceName":"eth0","network":"default"},
-{"interfaceName":"eth1","network":"{{ $root.Values.clusterName }}-sub-1"},
+{"interfaceName":"eth1","network":"gvnic-1"},
 {{- range $i := until 8 }}
-{"interfaceName":"eth{{ add 2 $i }}","network":"{{ $root.Values.clusterName }}-rdma-sub-{{ $i }}"}{{ eq $i 7 | ternary "" ","}}
+{"interfaceName":"eth{{ add 2 $i }}","network":"rdma-{{ $i }}"}{{ eq $i 7 | ternary "" ","}}
 {{- end }}
 ]
 {{- end }}
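Concretely, the else branch now emits an interface list tied to fixed, default-style network names rather than names derived from clusterName, so the chart renders the same list regardless of what the cluster is called. A rough way to preview the rendered list offline, assuming the chart directory is the parent of templates/ and that a suitable values file supplies the chart's other required values (both assumptions, not steps from the repo); the expected entries in the comments follow directly from the template above:

```bash
# Sketch: render the chart locally (no cluster needed) and inspect the interface list.
# "preview" is a throwaway release name; the chart path is inferred from this commit's
# file layout; values.yaml stands in for whichever values file the recipe normally passes.
helm template preview src/helm-charts/a3ultra/maxtext-training \
  -f values.yaml | grep '"interfaceName"'
# Expected entries from the "else" branch after this change:
#   {"interfaceName":"eth0","network":"default"},
#   {"interfaceName":"eth1","network":"gvnic-1"},
#   {"interfaceName":"eth2","network":"rdma-0"},
#   ...
#   {"interfaceName":"eth9","network":"rdma-7"}
```

The nccl-tests and nemo-training charts below receive the same renaming.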

src/helm-charts/a3ultra/nccl-tests/templates/nccl-tests-job.yaml

Lines changed: 2 additions & 2 deletions

@@ -56,9 +56,9 @@ spec:
 {{- else }}
 [
 {"interfaceName":"eth0","network":"default"},
-{"interfaceName":"eth1","network":"{{ $root.Values.clusterName }}-sub-1"},
+{"interfaceName":"eth1","network":"gvnic-1"},
 {{- range $i := until 8 }}
-{"interfaceName":"eth{{ add 2 $i }}","network":"{{ $root.Values.clusterName }}-rdma-sub-{{ $i }}"}{{ eq $i 7 | ternary "" ","}}
+{"interfaceName":"eth{{ add 2 $i }}","network":"rdma-{{ $i }}"}{{ eq $i 7 | ternary "" ","}}
 {{- end }}
 ]
 {{- end }}

src/helm-charts/a3ultra/nemo-training/templates/nemo-launcher-job.yaml

Lines changed: 2 additions & 2 deletions

@@ -64,9 +64,9 @@ spec:
 {{- else }}
 [
 {"interfaceName":"eth0","network":"default"},
-{"interfaceName":"eth1","network":"{{ $root.Values.clusterName }}-sub-1"},
+{"interfaceName":"eth1","network":"gvnic-1"},
 {{- range $i := until 8 }}
-{"interfaceName":"eth{{ add 2 $i }}","network":"{{ $root.Values.clusterName }}-rdma-sub-{{ $i }}"}{{ eq $i 7 | ternary "" ","}}
+{"interfaceName":"eth{{ add 2 $i }}","network":"rdma-{{ $i }}"}{{ eq $i 7 | ternary "" ","}}
 {{- end }}
 ]
 {{- end }}
