
Commit 0e8ade8

Author: Copybara

Copybara import of gpu-recipes:

- 62f9bf2fea42bda44b4073f7b6af8a03dd2b738a Adding Llama3.1-405B NeMo pretraining recipe
- ac440a8e8ab0e64b40e18784f2ae5729ff139769 Single A3U node serving of Deepseek R1 671B using SGLang
- d97898ffea46f98ba0c3093b3449d7b58a6b6adf Merge "Delete llama-3.1-405b/nemo-pretraining-gke recipe"...

GitOrigin-RevId: d97898ffea46f98ba0c3093b3449d7b58a6b6adf
1 parent 0c28053 commit 0e8ade8

16 files changed: +1212 -12 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ Welcome to the reproducible benchmark recipes repository for GPUs! This reposito
 | Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
 | ---------------- | ---------------- | --------- | ------------------- | ------------ | ------------------ |
 | **Llama-3.1-405B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | TensorRT-LLM | Inference | GKE | [Link](./inference/a3ultra/llama-3.1-405b/trtllm-inference-gke/single-node/README.md)
+| **DeepSeek R1 671B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | SGLang | Inference | GKE | [Link](./inference/a3ultra/deepseek-r1-671b/sglang-serving-gke/README.md)
 
 
 ## Repository structure

docs/configuring-environment-gke-a3-ultra.md

Lines changed: 23 additions & 10 deletions
@@ -20,7 +20,7 @@ Before you begin, ensure you have completed the following:
 
 3. Requested enough GPU quotas. Each `a3-ultragpu-8g` machine has 8 H200 GPUs attached.
 1. To view quotas, see [View the quotas for your project](/docs/quotas/view-manage).
-In the Filter field, select **Dimensions(e.g location)** and
+In the Filter field, select **Dimensions(e.g location)** and
 specify [`gpu_family:NVIDIA_H200`](https://cloud.google.com/compute/resource-usage#gpu_quota).
 1. If you don't have enough quota, [request a higher quota](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).
 
@@ -29,7 +29,7 @@ Before you begin, ensure you have completed the following:
 The environment comprises of the following components:
 
 - Client workstation: this is used to prepare, submit, and monitor ML workloads.
-- [Google Cloud Storage (GCS) Bucket](https://cloud.google.com/storage/docs): used for storing
+- [Google Cloud Storage (GCS) Bucket](https://cloud.google.com/storage/docs): used for storing
 datasets and logs.
 - [Artifact Registry](https://cloud.google.com/artifact-registry/docs/overview): serves as a
 private container registry for storing and managing Docker images used in the deployment.
@@ -48,9 +48,9 @@ comes with all necessary components pre-installed.
 
 ### Local client
 If you prefer to use your local machine, ensure your local machine has the following
-components installed.
+components installed.
 
-1. Google Cloud SDK. To install, see
+1. Google Cloud SDK. To install, see
 [Install the gcloud CLI](https://cloud.google.com/sdk/docs/install).
 2. kubectl. To install, see the
 [kuberenetes documentation](https://kubernetes.io/docs/tasks/tools/#kubectl).
@@ -71,6 +71,19 @@ Replace the following:
 - `BUCKET_LOCATION`: the location of your bucket. The bucket must be located in
 the same region as the GKE cluster.
 
+Add IAM binding to allow workloads authenticated via a workload identity (with the default service account) to access Cloud Storage objects.
+
+```bash
+PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")
+gcloud storage buckets add-iam-policy-binding gs://<BUCKET_NAME> \
+--role=roles/storage.objectUser \
+--member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/default/sa/default \
+--condition=None
+```
+Replace the following:
+
+- `BUCKET_NAME`: the name of your bucket created in the previous step
+
 ## Set up an Artifact Registry
 
 - If you use Cloud KMS for repository encryption, create your artifact registry by using the
@@ -84,19 +97,19 @@ Replace the following:
 --description="<DESCRIPTION>" \
 ```
 Replace the following:
-
+
 - `REPOSITORY`: the name of the repository. For each repository location in a project,
 repository names must be unique.
 - `LOCATION`: the regional or multi-regional location for the repository. You can omit this
-flag if you set a default region.
+flag if you set a default region.
 - `DESCRIPTION`: a description of the repository. Don't include sensitive data because
 repository descriptions are not encrypted.
 
 
 ## Create a GKE Cluster with A3 Ultra Node Pools
 
-Follow [this guide](https://cloud.google.com/ai-hypercomputer/docs/create/gke-ai-hypercompute) for
-detailed instructions to create a GKE cluster with A3 Ultra node pools, GPUDirect-RDMA and required GPU driver versions.
+Follow [this guide](https://cloud.google.com/ai-hypercomputer/docs/create/gke-ai-hypercompute) for
+detailed instructions to create a GKE cluster with A3 Ultra node pools, GPUDirect-RDMA and required GPU driver versions.
 
 The documentation uses [ Cluster Toolkit](https://cloud.google.com/cluster-toolkit/docs/overview) to create your GKE cluster quickly while incorporating best practices:
 
@@ -108,11 +121,11 @@ The documentation uses [ Cluster Toolkit](https://cloud.google.com/cluster-toolk
 ## What's next
 
 Once you have set up your GKE cluster with A3 Ultra node pools, you can proceed to deploy and
-run your [benchmark recipes](../README.md#benchmarks-support-matrix).
+run your [benchmark recipes](../README.md#benchmarks-support-matrix).
 
 ## Get Help
 
-If you encounter any issues or have questions about this setup, use one of the following
+If you encounter any issues or have questions about this setup, use one of the following
 resources:
 
 - Consult the [official GKE documentation](https://cloud.google.com/kubernetes-engine/docs).
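
Note on the IAM binding added in `docs/configuring-environment-gke-a3-ultra.md`: the `--member` value in that hunk hardcodes the `default` namespace and the `default` Kubernetes service account. The sketch below shows how the same principal string could be assembled for another namespace or service account. `wi_principal` is a hypothetical helper name, not part of `gcloud` or of this commit, and the project values in the example call are placeholders.

```bash
#!/usr/bin/env bash
# Sketch only: wi_principal is a hypothetical helper, not part of gcloud or the recipe.
# It builds a Workload Identity principal in the same shape as the --member value
# used in the diff above.
wi_principal() {
  local project_number="$1"        # numeric project number
  local project_id="$2"            # project ID (prefix of the identity pool name)
  local namespace="${3:-default}"  # Kubernetes namespace; the diff uses "default"
  local ksa="${4:-default}"        # Kubernetes service account; the diff uses "default"
  printf 'principal://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s.svc.id.goog/subject/ns/%s/sa/%s\n' \
    "$project_number" "$project_id" "$namespace" "$ksa"
}

# Example call with made-up placeholder values.
wi_principal 123456789012 my-project
```

The printed string can be passed as `--member="$(wi_principal "$PROJECT_NUMBER" "$PROJECT_ID")"` in the `gcloud storage buckets add-iam-policy-binding` command from the diff. Workloads running under a different namespace or service account would need the third and fourth arguments set accordingly.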
