
Commit e2c1eb7

Merge branch 'main' into FilippoSimini-patch-1
2 parents 4705007 + 965d2a8 commit e2c1eb7

13 files changed: +77, -70 lines changed


docs/CODEOWNERS

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ aurora/aurora-pe.md @koysean
 **/filesystem-and-storage/ @kevin-harms

 # All container documentation
-**/containers/ @bcote-anl # ?
+#### **/containers/ @bcote-anl # ?

 # All debugger documentation
 **/debugging*/* @jkwack

docs/ai-testbed/cerebras/csl.md

Lines changed: 3 additions & 3 deletions
@@ -89,7 +89,7 @@ Example script to forward port 8000 to localhost 8008:
 export SDK_PORT=8000
 export LOCAL_PORT=8008
 export ALCFUserID=<your alcf username>
-ssh -L $LOCAL_PORT:localhost:$LOCAL_PORT $ALCFUserID@cer-login-04.ai.alcf.anl.gov -t ssh -L $LOCAL_PORT:localhost:$SDK_PORT -N cer-anl-net001-us-sr01
+ssh -L $LOCAL_PORT:localhost:$LOCAL_PORT $ALCFUserID@cerebras.alcf.anl.gov -t ssh -L $LOCAL_PORT:localhost:$SDK_PORT -N cer-anl-net001-us-sr01
 ```

 Then open the following URL in your web browser: `http://localhost:8008/sdk-gui/`
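The updated command chains two `ssh` hops by hand: local port 8008 to the login node, then the login node to the SDK port on `cer-anl-net001-us-sr01`. As a sketch only (not part of the patched docs, and assuming the same ALCF credentials are accepted on both hops), OpenSSH's `ProxyJump` collapses this into one invocation:

```bash
# Sketch: single-command equivalent of the two-hop tunnel above (OpenSSH 7.3+).
# -J jumps through the login node; -L forwards local $LOCAL_PORT to the SDK
# port on the final host; -N opens no remote shell.
ssh -J $ALCFUserID@cerebras.alcf.anl.gov \
    -L $LOCAL_PORT:localhost:$SDK_PORT \
    -N $ALCFUserID@cer-anl-net001-us-sr01
```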
@@ -114,8 +114,8 @@ pip install --upgrade pip

 **Install SDK Packages:** Install the `cerebras_appliance` and `cerebras_sdk` Python packages in the virtual environment, specifying the appropriate Cerebras Software release:
 ```bash linenums="1"
-pip install cerebras_appliance==2.6.0
-pip install cerebras_sdk==2.6.0
+pip install cerebras_appliance==2.9.0
+pip install cerebras_sdk==2.9.0
 ```

 ### Examples
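A quick way to confirm the bumped pins took effect in the active virtual environment (a sketch; `pip show` output formatting may vary slightly across pip versions):

```bash
# Both packages should report Version: 2.9.0 after the upgrade.
pip show cerebras_appliance cerebras_sdk | grep -E '^(Name|Version):'
```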

docs/ai-testbed/cerebras/customizing-environment.md

Lines changed: 7 additions & 7 deletions
@@ -4,16 +4,16 @@

 #### To make a PyTorch virtual environment for Cerebras

-Clone the Cerebras modelzoo, if it is not already cloned. Check out the R 2.6.0 release.
+Clone the Cerebras modelzoo, if it is not already cloned. Check out the R 2.9.0 release.

 ```console
-mkdir ~/R_2.6.0
-cd ~/R_2.6.0
+mkdir ~/R_2.9.0
+cd ~/R_2.9.0
 export HTTPS_PROXY=http://proxy.alcf.anl.gov:3128
 git clone https://github.com/Cerebras/modelzoo.git
 cd modelzoo
 git tag
-git checkout Release_2.6.0
+git checkout Release_2.9.0
 ```
 Note: a `git pull` will not update the tags; if `modelzoo/setup.py` does not exist after tag checkout, please re-clone `modelzoo`.

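The note about stale tags can be verified directly. A hedged sketch of the check (not part of the patched file):

```bash
# Confirm the working tree is actually at the release tag; if setup.py is
# missing after checkout, the note above says to re-clone modelzoo.
cd ~/R_2.9.0/modelzoo
git describe --tags                # expect: Release_2.9.0
test -f setup.py || echo "setup.py missing; re-clone modelzoo"
```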
@@ -26,8 +26,8 @@ export https_proxy=http://proxy.alcf.anl.gov:3128
 Then build the virtual environment

 ```console
-mkdir ~/R_2.6.0
-cd ~/R_2.6.0
+mkdir ~/R_2.9.0
+cd ~/R_2.9.0
 # Note: "deactivate" does not actually work in scripts.
 deactivate
 rm -r venv_cerebras_pt
@@ -46,7 +46,7 @@ pip install -e modelzoo
 To activate a virtual environment

 ```console
-source ~/R_2.6.0/venv_cerebras_pt/bin/activate
+source ~/R_2.9.0/venv_cerebras_pt/bin/activate
 ```

 To deactivate a virtual environment,
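Pieced together across the three hunks, the end-to-end venv rebuild looks roughly like the sketch below. The `python -m venv` line is an assumption (the creation step itself falls outside the changed lines), so treat this as illustrative only:

```bash
# Hedged sketch: rebuild the R 2.9.0 virtual environment from scratch.
cd ~/R_2.9.0
deactivate 2>/dev/null || true     # per the note above, unreliable in scripts
rm -rf venv_cerebras_pt
python -m venv venv_cerebras_pt    # assumed; not shown in the changed lines
source venv_cerebras_pt/bin/activate
pip install -e modelzoo            # from the "@@ -46" hunk context
```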

docs/ai-testbed/cerebras/example-programs.md

Lines changed: 28 additions & 28 deletions
@@ -4,13 +4,13 @@
 Make a working directory and a local copy of the Cerebras **modelzoo** repository, if not previously done, as follows.

 ```bash
-mkdir ~/R_2.6.0
-cd ~/R_2.6.0
+mkdir ~/R_2.9.0
+cd ~/R_2.9.0
 export HTTPS_PROXY=http://proxy.alcf.anl.gov:3128
 git clone https://github.com/Cerebras/modelzoo.git
 cd modelzoo
 git tag
-git checkout Release_2.6.0
+git checkout Release_2.9.0
 ```

 Note: to access any external web resources from a Cerebras user node, you will need to have a proxy environment variable set (or equivalent). `wget` needs the lower-case proxy environment variable.
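Because some tools read only one casing of the variable (the note calls out `wget` wanting lower-case), a safe sketch is to export both forms before fetching anything external:

```bash
# Both casings of the proxy variable, so curl, wget, git, etc. all pick one up.
export HTTPS_PROXY=http://proxy.alcf.anl.gov:3128
export https_proxy=http://proxy.alcf.anl.gov:3128
```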
@@ -43,17 +43,17 @@ To run Unet with the <a href="https://www.kaggle.com/c/severstal-steel-defect-de
 First, source a Cerebras PyTorch virtual environment.

 ```console
-source ~/R_2.6.0/venv_cerebras_pt/bin/activate
+source ~/R_2.9.0/venv_cerebras_pt/bin/activate
 ```

 Then

 ```console
-cd ~/R_2.6.0/modelzoo/src/cerebras/modelzoo/models/nlp/bert
+cd ~/R_2.9.0/modelzoo/src/cerebras/modelzoo/models/nlp/bert
 cp /software/cerebras/dataset/severstal-steel-defect-detection/params_severstal_binary_rawds.yaml configs/params_severstal_binary_rawds.yaml
 export MODEL_DIR=model_dir_unet
 if [ -d "$MODEL_DIR" ]; then rm -Rf $MODEL_DIR; fi
-python run.py CSX --job_labels name=unet_pt --params configs/params_severstal_binary_rawds.yaml --model_dir $MODEL_DIR --mode train --mount_dirs /home/ /software --python_paths /home/$(whoami)/R_2.6.0/modelzoo/ --compile_dir $(whoami) |& tee mytest.log
+python run.py CSX --job_labels name=unet_pt --params configs/params_severstal_binary_rawds.yaml --model_dir $MODEL_DIR --mode train --mount_dirs /home/ /software --python_paths /home/$(whoami)/R_2.9.0/modelzoo/ --compile_dir $(whoami) |& tee mytest.log
 ```
 --->

@@ -68,7 +68,7 @@ The BraggNN model has two versions:<br>

 ```console
 TODO
-cd ~/R_2.6.0/anl_shared/braggnn/tf
+cd ~/R_2.9.0/anl_shared/braggnn/tf
 # This yaml has a correct path to a BraggNN dataset
 cp /software/cerebras/dataset/BraggN/params_bragg_nonlocal_sampleds.yaml configs/params_bragg_nonlocal_sampleds.yaml
 export MODEL_DIR=model_dir_braggnn
@@ -88,23 +88,23 @@ source /software/cerebras/venvs/venv_cerebras_pt/bin/activate
 # or your personal venv
 --->
 ```console
-source ~/R_2.6.0/venv_cerebras_pt/bin/activate
+source ~/R_2.9.0/venv_cerebras_pt/bin/activate
 ```

 Then

 ```console
-cd ~/R_2.6.0/modelzoo/src/cerebras/modelzoo/models/nlp/bert
+cd ~/R_2.9.0/modelzoo/src/cerebras/modelzoo/models/nlp/bert
 cp /software/cerebras/dataset/bert_large/bert_large_MSL128_sampleds.yaml configs/bert_large_MSL128_sampleds.yaml
 export MODEL_DIR=model_dir_bert_large_pytorch
 if [ -d "$MODEL_DIR" ]; then rm -Rf $MODEL_DIR; fi
 cszoo fit configs/bert_large_MSL128_sampleds.yaml --job_labels name=bert_pt --model_dir $MODEL_DIR |& tee mytest.log
 ```
 <!---
 previously,
-python run.py CSX --job_labels name=bert_pt --params configs/bert_large_MSL128_sampleds.yaml --num_workers_per_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /home/ /software/ --python_paths /home/$(whoami)/R_2.6.0/modelzoo/src --compile_dir $(whoami) |& tee mytest.log
+python run.py CSX --job_labels name=bert_pt --params configs/bert_large_MSL128_sampleds.yaml --num_workers_per_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /home/ /software/ --python_paths /home/$(whoami)/R_2.9.0/modelzoo/src --compile_dir $(whoami) |& tee mytest.log
 --->
-Note: the vocabulary file referenced in `/software/cerebras/dataset/bert_large/bert_large_MSL128_sampleds.yaml` is the same as the one at `/home/$(whoami)/R_2.6.0/modelzoo/src/cerebras/modelzoo/models/vocab/google_research_uncased_L-12_H-768_A-12.txt`.
+Note: the vocabulary file referenced in `/software/cerebras/dataset/bert_large/bert_large_MSL128_sampleds.yaml` is the same as the one at `/home/$(whoami)/R_2.9.0/modelzoo/src/cerebras/modelzoo/models/vocab/google_research_uncased_L-12_H-768_A-12.txt`.

 The last parts of the output should resemble the following, with messages about cuda that should be ignored and are not shown.

@@ -130,13 +130,13 @@ This PyTorch GPT-J 6B parameter pretraining sample uses 1 CS3.
 First, source a Cerebras PyTorch virtual environment.

 ```console
-source ~/R_2.6.0/venv_cerebras_pt/bin/activate
+source ~/R_2.9.0/venv_cerebras_pt/bin/activate
 ```

 Then

 ```console
-cd ~/R_2.6.0/modelzoo/src/cerebras/modelzoo/models/nlp/gptj
+cd ~/R_2.9.0/modelzoo/src/cerebras/modelzoo/models/nlp/gptj
 cp /software/cerebras/dataset/gptj/params_gptj_6B_sampleds.yaml configs/params_gptj_6B_sampleds.yaml
 export MODEL_DIR=model_dir_gptj
 if [ -d "$MODEL_DIR" ]; then rm -Rf $MODEL_DIR; fi
@@ -147,7 +147,7 @@ Note: the validation has been commented out of the yaml to decrease the run time

 <!---
 Previously,
-python run.py CSX --job_labels name=gptj_pt --params configs/params_gptj_6B_sampleds.yaml --num_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /home/ /software --python_paths /home/$(whoami)/R_2.6.0/modelzoo/src --compile_dir $(whoami) |& tee mytest.log
+python run.py CSX --job_labels name=gptj_pt --params configs/params_gptj_6B_sampleds.yaml --num_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /home/ /software --python_paths /home/$(whoami)/R_2.9.0/modelzoo/src --compile_dir $(whoami) |& tee mytest.log
 --->

 The last parts of the output should resemble the following:
@@ -162,7 +162,7 @@ The last parts of the output should resemble the following:
 2025-10-10 20:20:51,668 INFO: Saved checkpoint model_dir_gptj/checkpoint_200.mdl
 2025-10-10 20:21:14,280 INFO: Training completed successfully!
 2025-10-10 20:21:14,286 INFO: Processed 24000 training sample(s) in 1443.67300221 seconds.
-/home/arnoldw/R_2.6.0/venv_cerebras_pt/lib/python3.8/site-packages/pydantic/_internal/_gener
+/home/arnoldw/R_2.9.0/venv_cerebras_pt/lib/python3.8/site-packages/pydantic/_internal/_gener
 ```

 ## Llama2-7B
@@ -171,11 +171,11 @@ The Cerebras llama2 7B model implementation can be found at modelzoo/modelzoo/tr

 First, source a Cerebras PyTorch virtual environment.
 ```bash
-source ~/R_2.6.0/venv_cerebras_pt/bin/activate
+source ~/R_2.9.0/venv_cerebras_pt/bin/activate
 ```
 Instructions for training:
 ```bash
-cd ~/R_2.6.0/modelzoo/src/cerebras/modelzoo/models/nlp/llama
+cd ~/R_2.9.0/modelzoo/src/cerebras/modelzoo/models/nlp/llama
 cp /software/cerebras/dataset/params_llama2_7b.yaml configs/params_llama2_7b.yaml
 export MODEL_DIR=model_dir_llama2_7b
 if [ -d "$MODEL_DIR" ]; then rm -Rf $MODEL_DIR; fi
@@ -185,7 +185,7 @@ cszoo fit configs/params_llama2_7b.yaml --job_labels name=llama2_7b --model_dir
 Note: the validation has been commented out of the yaml to decrease the run time of this sample. To run validation, uncomment the validation sections at the end of `configs/params_llama2_7b.yaml`.
 <!--
 Formerly,
-python run.py CSX --job_labels name=llama2_7b --params configs/params_llama2_7b.yaml --num_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /projects /home/ /software --python_paths /home/$(whoami)/R_2.6.0/modelzoo/src --compile_dir $(whoami) |& tee mytest.log
+python run.py CSX --job_labels name=llama2_7b --params configs/params_llama2_7b.yaml --num_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /projects /home/ /software --python_paths /home/$(whoami)/R_2.9.0/modelzoo/src --compile_dir $(whoami) |& tee mytest.log
 -->

 Please find a sample output
@@ -230,11 +230,11 @@ The Cerebras ESM-2 model implementation can be found at `modelzoo/src/cerebras/m

 First, source a Cerebras PyTorch virtual environment.
 ```bash
-source ~/R_2.6.0/venv_cerebras_pt/bin/activate
+source ~/R_2.9.0/venv_cerebras_pt/bin/activate
 ```
 Instructions for training (for 400 steps):
 ```bash
-cd ~/R_2.6.0/modelzoo/src/cerebras/modelzoo/models/nlp/esm2
+cd ~/R_2.9.0/modelzoo/src/cerebras/modelzoo/models/nlp/esm2
 cp /software/cerebras/dataset/ESM-2/params_esm2_t12_35M_UR50D_modified.yaml configs/params_esm2_t12_35M_UR50D_modified.yaml
 export MODEL_DIR=model_dir_esm2
 if [ -d "$MODEL_DIR" ]; then rm -Rf $MODEL_DIR; fi
@@ -243,7 +243,7 @@ cszoo fit configs/params_esm2_t12_35M_UR50D_modified.yaml --job_labels name=esm2

 <!--
 Formerly,
-python run.py CSX --job_labels name=esm2_t12_35m --params configs/params_esm2_t12_35M_UR50D_modified.yaml --num_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /home/$(whoami)/ /software --python_paths /home/$(whoami)/R_2.6.0/modelzoo/src --compile_dir /$(whoami) |& tee mytest.log
+python run.py CSX --job_labels name=esm2_t12_35m --params configs/params_esm2_t12_35M_UR50D_modified.yaml --num_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /home/$(whoami)/ /software --python_paths /home/$(whoami)/R_2.9.0/modelzoo/src --compile_dir /$(whoami) |& tee mytest.log
 -->

 Note: the validation has been commented out of the yaml to decrease the run time of this sample. To run validation, uncomment the validation sections at the end of `configs/params_esm2_t12_35M_UR50D_modified.yaml`.
@@ -273,27 +273,27 @@ Saving checkpoint: 100%|██████████████████
 Saving checkpoint: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1321/1321 [00:08<00:00, 154.35 tensors/s]
 2025-10-10 23:45:54,994 INFO: Saved checkpoint model_dir_esm2/checkpoint_400.mdl
 2025-10-10 23:46:01,812 INFO: Training completed successfully!
-2025-10-10 23:46:01,861 INFO: Processed 819200 training sample(s) in 4049.286902367 seconds.
+2025-10-10 23:46:01,861 INFO: Processed 819200 training sample(s) in 4049.286902367 seconds
 ```

 ## Vision Transformer
 The cerebras transformer based vision classifier model implementation can be found at `modelzoo/models/vision/vision_transformer`. Configs for base and huge model of the vision transformer can be found at `modelzoo/models/vision/vision_transformer/configs`. This examples uses the ImageNet dataset preprocessed at path `/software/datasets/imagenet/`.

 First, source a Cerebras PyTorch virtual environment.
 ```bash
-source ~/R_2.6.0/venv_cerebras_pt/bin/activate
+source ~/R_2.9.0/venv_cerebras_pt/bin/activate
 ```
 Instructions for training (for 400 steps):
 ```bash
-cd ~/R_2.6.0/modelzoo/src/cerebras/modelzoo/models/vision/vision_transformer
+cd ~/R_2.9.0/modelzoo/src/cerebras/modelzoo/models/vision/vision_transformer
 cp /software/cerebras/dataset/vision_transformer/params_vit_base_patch_16_imagenet_1k.yaml configs/params_vit_base_patch_16_imagenet_1k.yaml
 export MODEL_DIR=model_dir_vit
 if [ -d "$MODEL_DIR" ]; then rm -Rf $MODEL_DIR; fi
 cszoo fit configs/params_vit_base_patch_16_imagenet_1k.yaml --job_labels name=vision_transformer --model_dir $MODEL_DIR |& tee mytest.log
 ```
 <!--
 Formerly,
-python run.py CSX --job_labels name=vision_transformer --params configs/params_vit_base_patch_16_imagenet_1k.yaml --num_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /home/$(whoami)/ /software --python_paths /home/$(whoami)/R_2.6.0/modelzoo/src --compile_dir /$(whoami) |& tee mytest.log
+python run.py CSX --job_labels name=vision_transformer --params configs/params_vit_base_patch_16_imagenet_1k.yaml --num_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /home/$(whoami)/ /software --python_paths /home/$(whoami)/R_2.9.0/modelzoo/src --compile_dir /$(whoami) |& tee mytest.log
 -->

 Note: the validation has been commented out of the yaml to decrease the run time of this sample. To run validation, uncomment the validation sections at the end of `configs/params_vit_base_patch_16_imagenet_1k.yaml`.
@@ -345,20 +345,20 @@ The Cerebras Diffusion Transformer[[1](https://arxiv.org/pdf/2212.09748.pdf)] mo

 First, source a Cerebras PyTorch virtual environment.
 ```bash
-source ~/R_2.6.0/venv_cerebras_pt/bin/activate
+source ~/R_2.9.0/venv_cerebras_pt/bin/activate
 ```

 Instructions for training (for 400 steps):
 ```bash
-cd ~/R_2.6.0/modelzoo/src/cerebras/modelzoo/models/vision/dit
+cd ~/R_2.9.0/modelzoo/src/cerebras/modelzoo/models/vision/dit
 cp /software/cerebras/dataset/params_dit_2B_patchsize_2x2_modified.yaml configs/params_dit_2B_patchsize_2x2_modified.yaml
 export MODEL_DIR=model_dir_dit
 if [ -d "$MODEL_DIR" ]; then rm -Rf $MODEL_DIR; fi
 cszoo fit configs/params_dit_2B_patchsize_2x2_modified.yaml --job_labels name=DiT --model_dir $MODEL_DIR |& tee mytest.log
 ```
 <!---
 Formerly:
-python run.py CSX --job_labels name=DiT --mode train --params configs/params_dit_2B_patchsize_2x2_modified.yaml --python_paths /home/$(whoami)/R_2.6.0/modelzoo/src --model_dir ${MODEL_DIR} |& tee mytest.log
+python run.py CSX --job_labels name=DiT --mode train --params configs/params_dit_2B_patchsize_2x2_modified.yaml --python_paths /home/$(whoami)/R_2.9.0/modelzoo/src --model_dir ${MODEL_DIR} |& tee mytest.log
 --->

 ???+ example "Example output:"
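All of the model walkthroughs touched by this diff share one four-step shape; the sketch below generalizes it, with `<area>`, `<model>`, and `<params>` as placeholders rather than real paths:

```bash
# Generalized run pattern used by every example above (placeholders in <...>).
source ~/R_2.9.0/venv_cerebras_pt/bin/activate                      # 1. activate the venv
cd ~/R_2.9.0/modelzoo/src/cerebras/modelzoo/models/<area>/<model>   # 2. enter the model dir
cp /software/cerebras/dataset/<model>/<params>.yaml configs/        # 3. stage the config
export MODEL_DIR=model_dir_<model>                                  # 4. clean model dir, then train
if [ -d "$MODEL_DIR" ]; then rm -Rf $MODEL_DIR; fi
cszoo fit configs/<params>.yaml --job_labels name=<model> --model_dir $MODEL_DIR |& tee mytest.log
```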

docs/ai-testbed/cerebras/index.md

Lines changed: 2 additions & 2 deletions
@@ -6,15 +6,15 @@ The ALCF CS-3 Cerebras Wafer-Scale Cluster, is designed to support large-scale m

 The Cerebras Wafer-Scale cluster is run as an appliance: a user submits a job to the appliance, and the appliance manages preprocessing and streaming of the data, IO, and device orchestration within the appliance. It provides programming via PyTorch. This installation supports Weight Streaming execution for models being pre-trained or fine-tuned.

-The public Cerebras documentation is available [here](https://training-docs.cerebras.ai/rel-2.6.0/getting-started/overview).
+The public Cerebras documentation is available [here](https://training-docs.cerebras.ai/rel-2.9.0/getting-started/overview).

 A typical Cerebras Wafer-Scale Cluster is shown in the figure below. Users connect via SSH to the login node, `cerebras.alcf.anl.gov` and then ssh to a user node, using either `cer-usn-01` or `cer-usn-02`.
 <!--- The rest of the nodes in the cluster infrastructure are not directly accessible, except by admins.-->
 The trees `/home`, `/projects`, and `/software` are shared across the login nodes and user nodes, the relevant cluster infrastructure nodes, and all ALCF AI testbed platforms.

 ![CS-3 cluster figure](files/topology-of-weight-streaming-on-wsc.png)
 /// caption
-Figure: topology of CS-3 cluster ([source](https://training-docs.cerebras.ai/rel-2.6.0/concepts/cerebras-wafer-scale-cluster))
+Figure: topology of CS-3 cluster ([source](https://training-docs.cerebras.ai/rel-2.9.0/concepts/cerebras-wafer-scale-cluster))
 ///

 As indicated in the figure, which represent a CS-3 cluster with 4 CS-3 WSE, each of the CS-3 engines (marked at the right end corner of the figure) is responsible only for running and accelerating the computations for training and predictions with the model. The other work, including compilation, is performed on the input nodes, and the MemoryX nodes are used for weight storage and broadcast, and SwarmX nodes are used for gradient accumulation.
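The two-hop login described in this page would look like the following in practice (a sketch; either user node works):

```bash
# Hop 1: the shared login node; hop 2: a user node.
ssh <your-alcf-username>@cerebras.alcf.anl.gov
ssh cer-usn-01        # or cer-usn-02
```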

docs/ai-testbed/cerebras/miscellaneous.md

Lines changed: 2 additions & 2 deletions
@@ -3,12 +3,12 @@
 ## Porting applications to the CS-3

 Cerebras documentation for porting code to run on a Cerebras CS-3 system:<br>
-[Port Pytorch Models to Cerebras](https://training-docs.cerebras.ai/rel-2.6.0/model-zoo/migration/porting-pytorch-models-to-cerebras#port-pytorch-models-to-cerebras)
+[Port Pytorch Models to Cerebras](https://training-docs.cerebras.ai/rel-2.9.0/model-zoo/migration/porting-pytorch-models-to-cerebras#port-pytorch-models-to-cerebras)

 ## Finetuning a model using CS-3s

 The Cerebras tutorial for finetuning a model:<br>
-[Fine-Tune Your First Model](https://training-docs.cerebras.ai/rel-2.6.0/getting-started/fine-tune-your-first-model)
+[Fine-Tune Your First Model](https://training-docs.cerebras.ai/rel-2.9.0/getting-started/fine-tune-your-first-model)

 The tutorial covers how to:
