Commit 062b037

Reduced workload name

Shortened the workload name to "$USER-a4-llama3-1-70b" to fit the 32-character limit.

1 parent fa418f9 commit 062b037

File tree

1 file changed: +9 −9 lines changed

  • training/a4/llama3-1-70b/nemo-pretraining-gke/32node-bf16-seq8192-gbs2048/recipe

training/a4/llama3-1-70b/nemo-pretraining-gke/32node-bf16-seq8192-gbs2048/recipe/README.md

Lines changed: 9 additions & 9 deletions
@@ -1,7 +1,7 @@
 <!-- mdformat global-off -->
-# Pretrain llama3-1-70b-seq8192-gbs2048-mbs1-gpus256 workloads on a4 GKE Node pools with Nvidia NeMo Framework
+# Pretrain $USER-a4-llama3-1-70b workloads on a4 GKE Node pools with Nvidia NeMo Framework
 
-This recipe outlines the steps for running a llama3-1-70b-seq8192-gbs2048-mbs1-gpus256 pretraining
+This recipe outlines the steps for running a $USER-a4-llama3-1-70b pretraining
 workload on [a4 GKE Node pools](https://cloud.google.com/kubernetes-engine) by using the
 [NVIDIA NeMo framework](https://github.com/NVIDIA/nemo).
 
@@ -89,7 +89,7 @@ your client:
 
 ```bash
 cd $RECIPE_ROOT
-export WORKLOAD_NAME=$USER-a4-llama3-1-70b-seq8192-gbs2048-mbs1-gpus256-32node
+export WORKLOAD_NAME=$USER-a4-llama3-1-70b
 helm install $WORKLOAD_NAME . -f values.yaml \
 --set-file workload_launcher=launcher.sh \
 --set-file workload_config=llama3-1-70b-seq8192-gbs2048-mbs1-gpus256.py \
@@ -107,7 +107,7 @@ your client:
 
 ```bash
 cd $RECIPE_ROOT
-export WORKLOAD_NAME=$USER-a4-llama3-1-70b-seq8192-gbs2048-mbs1-gpus256-32node
+export WORKLOAD_NAME=$USER-a4-llama3-1-70b
 helm install $WORKLOAD_NAME . -f values.yaml \
 --set-file workload_launcher=launcher.sh \
 --set-file workload_config=llama3-1-70b-seq8192-gbs2048-mbs1-gpus256.py \
@@ -124,12 +124,12 @@ your client:
 To check the status of pods in your job, run the following command:
 
 ```
-kubectl get pods | grep $USER-a4-llama3-1-70b-seq8192-gbs2048-mbs1-gpus256-32node
+kubectl get pods | grep $USER-a4-llama3-1-70b
 ```
 
 Replace the following:
 
-- JOB_NAME_PREFIX - your job name prefix. For example $USER-a4-llama3-1-70b-seq8192-gbs2048-mbs1-gpus256-32node.
+- JOB_NAME_PREFIX - your job name prefix. For example $USER-a4-llama3-1-70b.
 
 To get the logs for one of the pods, run the following command:
 
@@ -141,13 +141,13 @@ Information about the training job's progress, including crucial details such as
 loss, step count, and step time, is generated by the rank 0 process.
 This process runs on the pod whose name begins with
 `JOB_NAME_PREFIX-workload-0-0`.
-For example: `$USER-a4-llama3-1-70b-seq8192-gbs2048-mbs1-gpus256-32node-workload-0-0-s9zrv`.
+For example: `$USER-a4-llama3-1-70b-workload-0-0-s9zrv`.
 
 ### Uninstall the Helm release
 
 You can delete the job and other resources created by the Helm chart. To
 uninstall Helm, run the following command from your client:
 
 ```bash
-helm uninstall $USER-a4-llama3-1-70b-seq8192-gbs2048-mbs1-gpus256-32node
-```
+helm uninstall $USER-a4-llama3-1-70b
+```
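As a quick sanity check on the rename, a shell one-liner can compare the old and new workload name lengths against the 32-character limit cited in the commit message. This is an illustrative sketch, not part of the recipe; the username `alice` is an assumption standing in for `$USER`:

```shell
#!/bin/sh
# Illustrative check: "alice" is a stand-in for $USER.
USER=alice
OLD_NAME=$USER-a4-llama3-1-70b-seq8192-gbs2048-mbs1-gpus256-32node
NEW_NAME=$USER-a4-llama3-1-70b

# Print the length of each name in characters.
echo "old: ${#OLD_NAME} chars"   # 57 chars: over the 32-character limit
echo "new: ${#NEW_NAME} chars"   # 21 chars: fits, with headroom for longer usernames
```

With a 5-character username the old name is 57 characters, so it exceeded the limit even before suffixes like `-workload-0-0-s9zrv` were appended; the shortened name leaves room for both longer usernames and those suffixes.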

0 commit comments