
Commit eeadb34

Authored and committed by Copybara
Copybara import of gpu-recipes:
- d5a7868d51785b830fa853506437141b120e57b5 Multi node inference of DeepSeek R1 on A3Mega (16xH100) GitOrigin-RevId: d5a7868d51785b830fa853506437141b120e57b5
1 parent 0c86ae4 commit eeadb34

File tree: 10 files changed, +1211 -8 lines changed

README.md

Lines changed: 6 additions & 1 deletion
@@ -32,11 +32,16 @@ Welcome to the reproducible benchmark recipes repository for GPUs! This reposito
 | ---------------- | ---------------- | --------- | ------------------- | ------------ | ------------------ |
 | **Llama-3.1-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | MaxText | Pre-training | GKE | [Link](./training/a3ultra/llama-3.1-70b/maxtext-pretraining-gke/README.md)
 | **Llama-3.1-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo | Pre-training | GKE | [Link](./training/a3ultra/llama-3.1-70b/nemo-pretraining-gke/README.md)
-| **Llama-3.1-405B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo | Pre-training | GKE | [Link](./training/a3ultra/llama-3.1-405b/nemo-pretraining-gke/README.md)
 | **Mixtral-8-7B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | MaxText | Pre-training | GKE | [Link](./training/a3ultra/mixtral-8x7b/maxtext-pretraining-gke/README.md)
 | **Mixtral-8-7B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo | Pre-training | GKE | [Link](./training/a3ultra/mixtral-8x7b/nemo-pretraining-gke/README.md) |
 
 
+### Inference benchmarks A3 Mega
+
+| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
+| ---------------- | ---------------- | --------- | ------------------- | ------------ | ------------------ |
+| **DeepSeek R1 671B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | SGLang | Inference | GKE | [Link](./inference/a3mega/deepseek-r1-671b/sglang-serving-gke/README.md)
+
 ### Inference benchmarks A3 Ultra
 
 | Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |

docs/configuring-environment-gke-a3-ultra.md

Lines changed: 5 additions & 4 deletions
@@ -73,13 +73,14 @@ Replace the following:
 
 Add IAM binding to allow workloads authenticated via a workload identity (with the default service account) to access Cloud Storage objects.
 
-```bash
+```bash
 PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")
 gcloud storage buckets add-iam-policy-binding gs://<BUCKET_NAME> \
---role=roles/storage.objectUser \
---member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/default/sa/default \
---condition=None
+  --role=roles/storage.objectUser \
+  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/$PROJECT_ID.svc.id.goog/subject/ns/default/sa/default \
+  --condition=None
 ```
+
 Replace the following:
 
 - `BUCKET_NAME`: the name of your bucket created in the previous step
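After the binding is added, it can be sanity-checked; for example (this check is not part of the original doc and reuses the same `PROJECT_ID` and `<BUCKET_NAME>` placeholders as above):

```bash
# Show the bucket's IAM bindings and confirm the Workload Identity principal
# is listed under roles/storage.objectUser.
gcloud storage buckets get-iam-policy gs://<BUCKET_NAME> --format="json(bindings)"
```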

inference/a3mega/deepseek-r1-671b/sglang-serving-gke/README.md

Lines changed: 347 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
#!/bin/bash

# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


[ $# -eq 0 ] && {
  echo "Error: No prompt provided."
  echo "Usage: $0 \"Your prompt here\""
  exit 1
}

start_time=$(date +%s.%N)
temp_file="/tmp/temp_response.txt"

# Format the JSON payload to send to the model, with streaming enabled
json_payload=$(jq -n \
  --arg prompt "$1" \
  '{
    model: "default",
    messages: [
      {role: "system", content: "You are a helpful AI assistant"},
      {role: "user", content: $prompt}
    ],
    temperature: 0.6,
    top_p: 0.95,
    max_tokens: 2048,
    stream: true
  }')

echo "Streaming response:"
echo "----------------"

# Send the request to the model and stream the response
curl -sN "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$json_payload" | while IFS= read -r line; do
    [[ -z $line ]] && continue

    line=${line#data: }
    [[ $line == "[DONE]" ]] && continue

    content=$(jq -r '.choices[0].delta.content // empty' <<< "$line")
    [[ -n $content ]] && {
      echo -n "$content"
      echo -n "$content" >> "$temp_file"
    }
done

echo -e "\n\n----------------"

[[ ! -s $temp_file ]] && {
  echo "Error: No response received from the API or an error occurred during streaming." >&2
  rm -f "$temp_file"
  exit 1
}

# Parse the response and extract the reasoning and final answer
full_content=$(<"$temp_file")

[[ $full_content =~ \<think\>([[:print:][:space:]]*)\</think\> ]] && \
  reasoning="${BASH_REMATCH[1]}" || reasoning=""

final_answer=$(sed 's/.*<\/think>//; s/^[[:space:]]*//; s/[[:space:]]*$//' <<< "$full_content")

execution_time=$(bc <<< "$(date +%s.%N) - $start_time")

echo -e "\nParsed Results:"
echo "----------------"
echo -e "Reasoning:\n$reasoning"
echo -e "\nFinal Answer:\n$final_answer"
echo -e "\nExecution time: $execution_time seconds"

rm "$temp_file"
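The script sends its request to `http://localhost:30000/v1/chat/completions`, so the SGLang endpoint must be reachable locally. One plausible way to exercise it from a workstation, assuming the ClusterIP Service rendered by the template later in this commit; the release name and the script file name below are illustrative placeholders, not values from this diff:

```bash
# Forward the serving Service to the local machine, then run the streaming client.
# "deepseek-r1" is a hypothetical Helm release name; use the script's committed file name.
kubectl port-forward svc/deepseek-r1-svc 30000:30000 &
./chat-completions.sh "Summarize the benefits of multi-node inference on A3 Mega."
```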
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

targetPlatform: "gke"

clusterName:
queue:

huggingface:
  secretName: hf-secret
  secretData:
    token: "hf_api_token"

model:
  name: deepseek-ai/DeepSeek-R1
  tp_size: 16  # tensor parallelism across 16 H100 GPUs (2 A3 Mega nodes x 8 GPUs)
  pp_size: 1

job:
  image:
    repository:
    tag:
  gpus: 16

volumes:
  ssdMountPath: "/ssd"
  gcsMounts:
    - bucketName:
      mountPath: "/gcs"

gpuPlatformSettings:
  useHostPlugin: false
  ncclPluginImage: "us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/nccl-plugin-gpudirecttcpx-dev:v1.0.8-1"
  rxdmImage: "us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/tcpgpudmarxd-dev:v1.0.14"
  ncclBuildType: 223

network:
  ncclSettings:
    - name: NCCL_DEBUG
      value: "INFO"
  subnetworks[]:

sglang:
  replicaCount: 1

  service:
    type: ClusterIP
    ports:
      http: 30000
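These values are consumed by the Helm chart added later in this commit. As a rough sketch of how the empty fields might be filled in at install time (the release name, cluster, queue, image, and bucket values are placeholders, and `<RECIPE_DIR>` stands for the chart directory; the authoritative steps live in the recipe README, which is not rendered in this view):

```bash
# Install the chart with the per-environment values that are left blank in values.yaml.
helm install deepseek-r1 <RECIPE_DIR> \
  --set clusterName=my-gke-cluster \
  --set queue=my-queue \
  --set job.image.repository=us-docker.pkg.dev/my-project/sglang \
  --set job.image.tag=latest \
  --set volumes.gcsMounts[0].bucketName=my-benchmark-bucket
```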

src/docker/sglang/sglang.Dockerfile

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,7 @@ RUN apt update && apt install --yes --no-install-recommends \
2323
curl \
2424
gnupg \
2525
cmake \
26+
dnsutils \
2627
&& echo "deb https://packages.cloud.google.com/apt gcsfuse-buster main" \
2728
| tee /etc/apt/sources.list.d/gcsfuse.list \
2829
&& echo "deb https://packages.cloud.google.com/apt cloud-sdk main" \
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v2
name: sglang-deepseek-r1-671b-inference
description: sglang-deepseek-r1-671b-inference
type: application
version: 0.1.0
appVersion: "1.16.0"
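The chart can be validated locally before installing; a quick, non-authoritative check using the same `<RECIPE_DIR>` placeholder as above:

```bash
# Static checks and a dry rendering of the templates.
helm lint <RECIPE_DIR>
helm template test <RECIPE_DIR> | head -n 40
```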
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-svc
spec:
  ports:
  - name: http
    port: {{ .Values.sglang.service.ports.http }}
    targetPort: {{ .Values.sglang.service.ports.http }}
  # Route traffic to the LeaderWorkerSet leader pod, which serves the HTTP API.
  selector:
    leaderworkerset.sigs.k8s.io/name: {{ .Release.Name }}
    role: leader
  type: {{ .Values.sglang.service.type }}
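Once the chart is installed, a quick way to confirm the Service has selected the leader pod (the release name below is a placeholder):

```bash
# Exactly one endpoint address is expected: the leader pod serving HTTP on port 30000.
kubectl get service deepseek-r1-svc
kubectl get endpoints deepseek-r1-svc -o wide
```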
