
Commit d57d222

Authored and committed by Copybara
Copybara import of gpu-recipes:

- 984c4b15df2f97682936541a0cd47daa5a08a2f2 Merge "Adding A3 Mega Llama-2-7B MaxText/JAX" into main
- 9ec22b974a257f4f92f299a12028700e614820c9 Removing hard coded NCCL settings

GitOrigin-RevId: 9ec22b974a257f4f92f299a12028700e614820c9
1 parent: dc6ef1a

File tree

11 files changed: +861, -62 lines


README.md

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ Welcome to the reproducible benchmark recipes repository for GPUs! This reposito
 | Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
 | ---------------- | ---------------- | --------- | ------------------- | ------------ | ------------------ |
 | **GPT3-175B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/gpt3-175b/nemo-pretraining-gke/README.md) |
+| **Llama-2-7B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | MaxText | Pre-training | GKE | [Link](./training/a3mega/llama-2-7b/maxtext-pretraining-gke/README.md) |
 | **Llama-3-70B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/llama-3-70b/nemo-pretraining-gke/README.md) |
 | **Llama-3.1-70B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/llama-3.1-70b/nemo-pretraining-gke/README.md) |
 | **Mixtral-8-7B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo | Pre-training | GKE | [Link](./training/a3mega/mixtral-8x7b/nemo-pretraining-gke/README.md) |
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+hardware: gpu
+dcn_data_parallelism: 16
+ici_fsdp_parallelism: 8
+per_device_batch_size: 4
+max_target_length: 4096
+model_name: llama2-7b
+enable_checkpointing: false
+attention: cudnn_flash_te
+remat_policy: minimal_flash
+use_iota_embed: true
+scan_layers: false
+dataset_type: synthetic
+logits_dot_in_fp32: false
+enable_goodput_recording: false
+monitor_goodput: false
+save_config_to_gcs: true
+
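This values file (its path is not shown in this commit view) describes a 16-node job: 16-way data parallelism across nodes (DCN) combined with 8-way FSDP within each node (ICI) covers 128 GPUs, assuming the 8 H100s of an A3 Mega (a3-megagpu-8g) machine. A minimal sketch of that arithmetic, under that assumption:

# Topology and batch arithmetic implied by the values above, assuming
# 8 H100 GPUs per A3 Mega node; the values file's path is not shown here.
dcn_data_parallelism = 16   # data-parallel replicas across nodes (DCN)
ici_fsdp_parallelism = 8    # FSDP shards within a node (ICI / NVLink)
per_device_batch_size = 4   # sequences per GPU per step
max_target_length = 4096    # tokens per sequence

num_gpus = dcn_data_parallelism * ici_fsdp_parallelism   # 128 GPUs = 16 nodes
global_batch = num_gpus * per_device_batch_size          # 512 sequences/step
tokens_per_step = global_batch * max_target_length       # 2,097,152 tokens/step
print(num_gpus, global_batch, tokens_per_step)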
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+hardware: gpu
+dcn_data_parallelism: 32
+ici_fsdp_parallelism: 8
+per_device_batch_size: 4
+max_target_length: 4096
+model_name: llama2-7b
+enable_checkpointing: false
+attention: cudnn_flash_te
+remat_policy: minimal_flash
+use_iota_embed: true
+scan_layers: false
+dataset_type: synthetic
+logits_dot_in_fp32: false
+enable_goodput_recording: false
+monitor_goodput: false
+save_config_to_gcs: true
+
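The second values file is identical except for dcn_data_parallelism: 32, which doubles the data-parallel replicas to 32 nodes (256 GPUs, a global batch of 1024 sequences, about 4.2M tokens per step). MaxText's train.py accepts a base config followed by key=value overrides on the command line, which is how settings like these are typically applied; a hedged sketch of such an invocation (run_name and the base config path below are illustrative placeholders, not taken from this commit):

import shlex

overrides = {
    "hardware": "gpu",
    "dcn_data_parallelism": 32,
    "ici_fsdp_parallelism": 8,
    "per_device_batch_size": 4,
    "max_target_length": 4096,
    "model_name": "llama2-7b",
    "attention": "cudnn_flash_te",
    "dataset_type": "synthetic",
}

# Build a MaxText launch command: base config first, then key=value
# overrides. run_name and the config path are placeholders.
cmd = ["python3", "MaxText/train.py", "MaxText/configs/base.yml",
       "run_name=llama2-7b-32node"] + [f"{k}={v}" for k, v in overrides.items()]
print(shlex.join(cmd))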
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Copyright 2024 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: v2
+name: maxtext_pretraining_workload
+description: maxtext_pretraining_workload_a3mega
+type: application
+version: 0.1.0
+appVersion: "1.16.0"
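Chart.yaml only declares the chart's identity; a recipe would apply it with helm install together with one of the values files above. A hypothetical invocation, driven from Python for consistency with the other sketches (release name, chart path, and values-file name are placeholders, not from this commit):

import subprocess

# Hypothetical: install this chart with a chosen values file. All names
# and paths are placeholders; the recipe README has the actual command.
subprocess.run(
    ["helm", "install", "llama2-7b-maxtext",   # placeholder release name
     "./maxtext_pretraining_workload",         # path to this chart
     "-f", "values.yaml"],                     # placeholder values file
    check=True,
)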
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# Copyright 2024 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: "{{ .Release.Name }}"
+data:
+  maxtext-configuration.yaml: |-
+    {{ .Values.maxtext_config | nindent 4 }}
+  xla-flags: >-
+    --xla_gpu_enable_latency_hiding_scheduler=true
+    --xla_gpu_enable_triton_gemm=false
+    --xla_gpu_graph_level=0
+    --xla_gpu_enable_highest_priority_async_stream=true
+    --xla_gpu_all_reduce_combine_threshold_bytes=536870912
+    --xla_gpu_all_gather_combine_threshold_bytes=134217728
+    --xla_gpu_reduce_scatter_combine_threshold_bytes=67108864
+    --xla_gpu_enable_pipelined_all_gather=true
+    --xla_gpu_enable_pipelined_reduce_scatter=true
+    --xla_gpu_enable_pipelined_all_reduce=true
+    --xla_gpu_enable_while_loop_double_buffering=true
+    --xla_gpu_enable_triton_softmax_fusion=false
+    --xla_gpu_enable_all_gather_combine_by_dim=false
+    --xla_gpu_enable_reduce_scatter_combine_by_dim=false
+    --xla_disable_hlo_passes=rematerialization
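The xla-flags entry collects GPU compiler tuning: a latency-hiding scheduler, pipelined all-gather/reduce-scatter/all-reduce, large collective combine thresholds, and disabled Triton GEMM/softmax fusion. JAX reads these from the XLA_FLAGS environment variable, which must be set before the backend initializes; a minimal sketch of the consumption side (the flag list is abridged, and the pod wiring that injects the ConfigMap value is not shown in this excerpt):

import os

# XLA_FLAGS must be set before JAX initializes its GPU backend. In the
# recipe, the workload pod would presumably surface the ConfigMap's
# "xla-flags" value into this variable; here it is set inline, abridged.
os.environ["XLA_FLAGS"] = " ".join([
    "--xla_gpu_enable_latency_hiding_scheduler=true",
    "--xla_gpu_enable_triton_gemm=false",
    "--xla_gpu_all_reduce_combine_threshold_bytes=536870912",
    # ...remaining flags from the ConfigMap above...
])

import jax  # import after XLA_FLAGS is set so the flags take effect

print(jax.devices())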
