Commit 0862ac3

Add instructions for using custom yaml config for collective benchmark
1 parent 0c081bf commit 0862ac3

microbenchmarks/trillium/collectives/README.md

Lines changed: 19 additions & 0 deletions
@@ -36,4 +36,23 @@ Results will be printed out and also stored at `/tmp/microbenchmarks/collectives
gsutil cp -r /tmp/microbenchmarks/collectives gs://<your-gcs-bucket>
```

### Run with a custom YAML config
To run with a custom YAML config that changes settings such as `warmup_tries`, `tries`, or `matrix_dim_range`, upload the file to a GCS bucket, pull it from GCS inside the workload, and reference it in the benchmark command.

Start by creating a YAML file, `your_config.yaml`; see [1x_v6e_256.yaml](https://github.com/AI-Hypercomputer/accelerator-microbenchmarks/blob/35c10a42e8cfab7593157327dd3ad3150e4c001d/configs/1x_v6e_256.yaml) for an example config. Then upload it to your GCS bucket:
```
gsutil cp your_config.yaml gs://<your-gcs-bucket>
```
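
Before uploading, it can help to confirm that the local file is what you expect, since a missing or empty config would only surface after the workload has started. The following is a minimal sketch, not part of the benchmark repo: `check_config` is a hypothetical helper name, and it only checks the file extension and that the file is non-empty. It does not validate the YAML contents.

```shell
# Hypothetical helper (not part of accelerator-microbenchmarks):
# fail early if the config path does not look like a YAML file
# or the file is missing or empty.
check_config() {
  config_path="$1"

  # Accept only .yaml / .yml file names.
  case "$config_path" in
    *.yaml|*.yml) ;;
    *)
      echo "expected a .yaml or .yml file: $config_path" >&2
      return 1
      ;;
  esac

  # -s is true only if the file exists and has a size greater than zero.
  if [ ! -s "$config_path" ]; then
    echo "missing or empty config: $config_path" >&2
    return 1
  fi
}
```

One way to use it is to chain it in front of the upload, for example `check_config your_config.yaml && gsutil cp your_config.yaml gs://<your-gcs-bucket>`.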

Then launch the workload with a modified command that pulls the YAML file from GCS and passes it to the benchmark via `--config`:
```
python3 ~/xpk/xpk.py workload create \
--cluster=${CLUSTER_NAME} \
--project=${PROJECT} \
--zone=${ZONE} \
--device-type=v6e-256 \
--command="git clone https://github.com/AI-Hypercomputer/accelerator-microbenchmarks.git && cd accelerator-microbenchmarks && git checkout trillium-collectives && pip install -r requirements.txt && echo '4096 41943040 314572800' > /proc/sys/net/ipv4/tcp_rmem && export LIBTPU_INIT_ARGS='--megascale_grpc_premap_memory_bytes=17179869184 --xla_tpu_enable_sunk_dcn_allreduce_done_with_host_reduction=true' && gsutil cp gs://<your-gcs-bucket>/your_config.yaml configs/ && python src/run_benchmark.py --config=configs/your_config.yaml" \
--num-slices=1 \
--docker-image=us-docker.pkg.dev/cloud-tpu-images/jax-stable-stack/tpu:jax0.5.2-rev1 \
--workload=${WORKLOAD_NAME}
```
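
Because the bucket and config file name each appear in the long `--command` string, it is easy to update one occurrence and miss another. As a rough sketch, the string could be assembled by a small shell function so that both values are set in one place. `build_benchmark_command` and its two arguments are hypothetical names introduced here for illustration; the individual steps are copied from the launch command above.

```shell
# Hypothetical helper (not part of xpk or accelerator-microbenchmarks):
# print the workload command string for a given bucket and config file.
#   $1: GCS bucket name, without the gs:// prefix
#   $2: config file name, e.g. your_config.yaml
build_benchmark_command() {
  bucket="$1"
  config_name="$2"

  # Each step is joined with && so the workload stops at the first failure.
  cmd="git clone https://github.com/AI-Hypercomputer/accelerator-microbenchmarks.git"
  cmd="$cmd && cd accelerator-microbenchmarks"
  cmd="$cmd && git checkout trillium-collectives"
  cmd="$cmd && pip install -r requirements.txt"
  cmd="$cmd && echo '4096 41943040 314572800' > /proc/sys/net/ipv4/tcp_rmem"
  cmd="$cmd && export LIBTPU_INIT_ARGS='--megascale_grpc_premap_memory_bytes=17179869184 --xla_tpu_enable_sunk_dcn_allreduce_done_with_host_reduction=true'"
  # Pull the custom config from GCS into the repo's configs/ directory.
  cmd="$cmd && gsutil cp gs://${bucket}/${config_name} configs/"
  # Point the benchmark at the downloaded config.
  cmd="$cmd && python src/run_benchmark.py --config=configs/${config_name}"

  printf '%s\n' "$cmd"
}
```

The result can then be passed to xpk as `--command="$(build_benchmark_command <your-gcs-bucket> your_config.yaml)"`, with the remaining flags unchanged.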
