Commit 0c86ae4

Author: Copybara

Copybara import of gpu-recipes:
- ddc7b4335809a0077ddac5cba1660cff3a937bf3 Merge "Modifications on Llama-3.1-405B recipes" into main
  GitOrigin-RevId: ddc7b4335809a0077ddac5cba1660cff3a937bf3

1 parent 1a3020b, commit 0c86ae4

File tree: 4 files changed (+7, −4 lines)


README.md (1 addition, 0 deletions)

@@ -32,6 +32,7 @@ Welcome to the reproducible benchmark recipes repository for GPUs! This reposito
 | ---------------- | ---------------- | --------- | ------------------- | ------------ | ------------------ |
 | **Llama-3.1-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | MaxText | Pre-training | GKE | [Link](./training/a3ultra/llama-3.1-70b/maxtext-pretraining-gke/README.md)
 | **Llama-3.1-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo | Pre-training | GKE | [Link](./training/a3ultra/llama-3.1-70b/nemo-pretraining-gke/README.md)
+| **Llama-3.1-405B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo | Pre-training | GKE | [Link](./training/a3ultra/llama-3.1-405b/nemo-pretraining-gke/README.md)
 | **Mixtral-8-7B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | MaxText | Pre-training | GKE | [Link](./training/a3ultra/mixtral-8x7b/maxtext-pretraining-gke/README.md)
 | **Mixtral-8-7B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo | Pre-training | GKE | [Link](./training/a3ultra/mixtral-8x7b/nemo-pretraining-gke/README.md) |

src/frameworks/a3ultra/nemo-configs/llama-3.1-405b-576gpus-a3ultra-bf16.yaml (2 additions, 2 deletions)

@@ -1,5 +1,5 @@
 run:
-  name: llama-3.1-405b-a3u-fp8
+  name: llama-3.1-405b-a3u-bf16
   time_limit: 0-02:30:00
   dependency: singleton
 trainer:
@@ -124,7 +124,7 @@ model:
   deterministic_mode: false
   transformer_engine: true
   fp8: false
-  ub_tp_comm_overlap: false
+  ub_tp_comm_overlap: true
   use_flash_attention: true
   fsdp: false
   fsdp_sharding_strategy: full

src/frameworks/a3ultra/nemo-configs/llama-3.1-405b-576gpus-a3ultra-fp8.yaml (1 addition, 1 deletion)

@@ -130,7 +130,7 @@ model:
   fp8_interval: 1
   fp8_amax_history_len: 1024
   fp8_amax_compute_algo: max
-  ub_tp_comm_overlap: false
+  ub_tp_comm_overlap: true
   use_flash_attention: true
   fsdp: false
   fsdp_sharding_strategy: full

training/a3ultra/llama-3.1-405b/nemo-pretraining-gke/values.yaml (3 additions, 1 deletion)

@@ -40,4 +40,6 @@ network:
   - name: NCCL_DEBUG
     value: "VERSION"
   - name: NCCL_WORK_FIFO_DEPTH
-    value: "4194304"
+    value: "4194304"
+  - name: NVTE_UB_SOCKET_IFNAME
+    value: "eth1"
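Taken together, these changes turn on `ub_tp_comm_overlap` (Transformer Engine's userbuffers-based overlap of tensor-parallel communication with compute) in both the bf16 and fp8 configs, and add `NVTE_UB_SOCKET_IFNAME` so the userbuffers bootstrap uses the `eth1` interface in the pod. A minimal sketch of the equivalent environment setup outside Kubernetes (variable names and values come from this diff; running them as plain exports rather than pod-spec `env` entries is an illustration, not part of the recipe):

```shell
# Environment as set in values.yaml, expressed as shell exports.
# NVTE_UB_SOCKET_IFNAME selects the network interface Transformer Engine's
# userbuffers use for their bootstrap handshake; NCCL_WORK_FIFO_DEPTH is the
# value already present in the file before this commit.
export NVTE_UB_SOCKET_IFNAME=eth1
export NCCL_WORK_FIFO_DEPTH=4194304
export NCCL_DEBUG=VERSION

echo "UB socket ifname: ${NVTE_UB_SOCKET_IFNAME}"
```

If the overlap bootstrap cannot reach peers over the chosen interface, training hangs at startup, which is presumably why the interface is pinned explicitly here rather than left to autodetection.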
