Commit 0c28053

Authored and committed by Copybara
Copybara import of gpu-recipes:
- 54fe6d92f32a01bdcfe19c4d2829896ef1d37b9f Single node TRT-LLM Benchmarking of Llama 3.1 405B GitOrigin-RevId: 54fe6d92f32a01bdcfe19c4d2829896ef1d37b9f
1 parent 611d8ff commit 0c28053

File tree: 10 files changed (+842 −4 lines)

README.md

Lines changed: 12 additions & 4 deletions
@@ -30,15 +30,23 @@ Welcome to the reproducible benchmark recipes repository for GPUs! This reposito

 | Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
 | ---------------- | ---------------- | --------- | ------------------- | ------------ | ------------------ |
-| **Llama-3.1-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-vms) | MaxText | Pre-training | GKE | [Link](./training/a3ultra/llama-3.1-70b/maxtext-pretraining-gke/README.md)
-| **Llama-3.1-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-vms) | NeMo | Pre-training | GKE | [Link](./training/a3ultra/llama-3.1-70b/nemo-pretraining-gke/README.md)
-| **Mixtral-8-7B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-vms) | MaxText | Pre-training | GKE | [Link](./training/a3ultra/mixtral-8x7b/maxtext-pretraining-gke/README.md)
-| **Mixtral-8-7B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-vms) | NeMo | Pre-training | GKE | [Link](./training/a3ultra/mixtral-8x7b/nemo-pretraining-gke/README.md) |
+| **Llama-3.1-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | MaxText | Pre-training | GKE | [Link](./training/a3ultra/llama-3.1-70b/maxtext-pretraining-gke/README.md)
+| **Llama-3.1-70B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo | Pre-training | GKE | [Link](./training/a3ultra/llama-3.1-70b/nemo-pretraining-gke/README.md)
+| **Mixtral-8-7B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | MaxText | Pre-training | GKE | [Link](./training/a3ultra/mixtral-8x7b/maxtext-pretraining-gke/README.md)
+| **Mixtral-8-7B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo | Pre-training | GKE | [Link](./training/a3ultra/mixtral-8x7b/nemo-pretraining-gke/README.md) |
+
+### Inference benchmarks A3 Ultra
+
+| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
+| ---------------- | ---------------- | --------- | ------------------- | ------------ | ------------------ |
+| **Llama-3.1-405B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | TensorRT-LLM | Inference | GKE | [Link](./inference/a3ultra/llama-3.1-405b/trtllm-inference-gke/single-node/README.md)


 ## Repository structure

 * **[training/](./training)**: Contains recipes to reproduce training benchmarks with GPUs.
+* **[inference/](./inference)**: Contains recipes to reproduce inference benchmarks with GPUs.
 * **[src/](./src)**: Contains shared dependencies required to run benchmarks, such as Docker and Helm charts.
 * **[docs/](./docs)**: Contains supporting documentation for the recipes, such as explanation of benchmark methodologies or configurations.

inference/a3ultra/llama-3.1-405b/trtllm-inference-gke/single-node/README.md

Lines changed: 381 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

clusterName:

huggingface:
  secretName: hf-secret
  secretData:
    token: "hf_api_token"

model:
  name: meta-llama/Llama-3.1-405B
  tp_size: 8
  pp_size: 1

job:
  ttlSecondsAfterFinished: 3600
  image:
    repository:
    tag:
  gpus: 8

volumes:
  ssdMountPath: "/ssd"
  gcsMounts:
    - bucketName:
      mountPath: "/gcs"

network:
  subnetworks[]:

benchmarks:
  experiments:
    # - isl: 1000
    #   osl: 1000
    #   num_requests: 3000
    - isl: 128
      osl: 128
      num_requests: 30000
    # - isl: 128
    #   osl: 2048
    #   num_requests: 3000
    # - isl: 128
    #   osl: 4096
    #   num_requests: 1500
    # - isl: 20000
    #   osl: 2000
    #   num_requests: 1000
    # - isl: 2048
    #   osl: 128
    #   num_requests: 3000
    # - isl: 2048
    #   osl: 2048
    #   num_requests: 1500
    # - isl: 500
    #   osl: 2000
    #   num_requests: 3000
    # - isl: 5000
    #   osl: 500
    #   num_requests: 1500
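
A usage sketch, not part of this commit: assuming the values file above is saved as values.yaml and the recipe's Helm chart (defined later in this diff) sits at a local path written here as <chart-dir>, the benchmark job could be launched against an existing GKE cluster roughly as follows, after filling in clusterName, the image repository and tag, and the GCS bucket name. All names and paths below are placeholders, not values from the recipe.

  # Hypothetical invocation; release name, <chart-dir>, bucket and image values are placeholders.
  helm install trtllm-llama-405b <chart-dir> \
    -f values.yaml \
    --set clusterName=my-a3ultra-cluster \
    --set job.image.repository=us-docker.pkg.dev/my-project/my-repo/trtllm \
    --set job.image.tag=0.16.0 \
    --set volumes.gcsMounts[0].bucketName=my-benchmark-bucket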
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

steps:
- name: 'gcr.io/cloud-builders/docker'
  args:
  - 'build'
  - '--tag=${_ARTIFACT_REGISTRY}/${_TRT_LLM_IMAGE}:${_TRT_LLM_VERSION}'
  - '--file=trtllm.Dockerfile'
  - '.'
  automapSubstitutions: true

images:
- '${_ARTIFACT_REGISTRY}/${_TRT_LLM_IMAGE}:${_TRT_LLM_VERSION}'
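
A hedged usage sketch, not included in the commit: assuming this build config is saved as cloudbuild.yml (the file name is not visible in this view), it would typically be submitted to Cloud Build with the three substitutions it references. The registry path, image name, and version below are placeholders.

  # Hypothetical submission; replace the substitution values with your own.
  gcloud builds submit . \
    --config cloudbuild.yml \
    --substitutions=_ARTIFACT_REGISTRY=us-docker.pkg.dev/my-project/my-repo,_TRT_LLM_IMAGE=trtllm,_TRT_LLM_VERSION=0.16.0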
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
hf_transfer==0.1.9
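
hf_transfer is the Rust-based transfer accelerator for huggingface_hub, presumably pinned here so the Llama 3.1 405B checkpoint can be downloaded quickly to the node. A minimal sketch of how it is usually enabled, as an illustration rather than a command taken from this recipe:

  # Hypothetical download step with hf_transfer enabled; the target path mirrors the /ssd mount used by the benchmark script.
  export HF_HUB_ENABLE_HF_TRANSFER=1
  huggingface-cli download meta-llama/Llama-3.1-405B --local-dir /ssd/meta-llama/Llama-3.1-405B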
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
# pip-compile --generate-hashes requirements.in
#
hf-transfer==0.1.9 \
    --hash=sha256:035572865dab29d17e783fbf1e84cf1cb24f3fcf8f1b17db1cfc7fdf139f02bf \
    --hash=sha256:0d991376f0eac70a60f0cbc95602aa708a6f7c8617f28b4945c1431d67b8e3c8 \
    --hash=sha256:16f208fc678911c37e11aa7b586bc66a37d02e636208f18b6bc53d29b5df40ad \
    --hash=sha256:1a6bd16c667ebe89a069ca163060127a794fa3a3525292c900b8c8cc47985b0d \
    --hash=sha256:2c7fc1b85f4d0f76e452765d7648c9f4bfd0aedb9ced2ae1ebfece2d8cfaf8e2 \
    --hash=sha256:3a736dfbb2c84f5a2c975478ad200c0c8bfcb58a25a35db402678fb87ce17fa4 \
    --hash=sha256:3ebc4ab9023414880c8b1d3c38174d1c9989eb5022d37e814fa91a3060123eb0 \
    --hash=sha256:435cc3cdc8524ce57b074032b8fd76eed70a4224d2091232fa6a8cef8fd6803e \
    --hash=sha256:504b8427fd785dd8546d53b9fafe6e436bd7a3adf76b9dce556507650a7b4567 \
    --hash=sha256:57fd9880da1ee0f47250f735f791fab788f0aa1ee36afc49f761349869c8b4d9 \
    --hash=sha256:5828057e313de59300dd1abb489444bc452efe3f479d3c55b31a8f680936ba42 \
    --hash=sha256:5d561f0520f493c66b016d99ceabe69c23289aa90be38dd802d2aef279f15751 \
    --hash=sha256:6e94e8822da79573c9b6ae4d6b2f847c59a7a06c5327d7db20751b68538dc4f6 \
    --hash=sha256:8669dbcc7a3e2e8d61d42cd24da9c50d57770bd74b445c65123291ca842a7e7a \
    --hash=sha256:8674026f21ed369aa2a0a4b46000aca850fc44cd2b54af33a172ce5325b4fc82 \
    --hash=sha256:89a23f58b7b7effbc047b8ca286f131b17728c99a9f972723323003ffd1bb916 \
    --hash=sha256:8fd0167c4407a3bc4cdd0307e65ada2294ec04f1813d8a69a5243e379b22e9d8 \
    --hash=sha256:a5b366d34cd449fe9b20ef25941e6eef0460a2f74e7389f02e673e1f88ebd538 \
    --hash=sha256:cdca9bfb89e6f8f281890cc61a8aff2d3cecaff7e1a4d275574d96ca70098557 \
    --hash=sha256:d2fde99d502093ade3ab1b53f80da18480e9902aa960dab7f74fb1b9e5bc5746 \
    --hash=sha256:dc7fff1345980d6c0ebb92c811d24afa4b98b3e07ed070c8e38cc91fd80478c5 \
    --hash=sha256:e66acf91df4a8b72f60223059df3003062a5ae111757187ed1a06750a30e911b \
    --hash=sha256:e6ac4eddcd99575ed3735ed911ddf9d1697e2bd13aa3f0ad7e3904dd4863842e \
    --hash=sha256:ee8b10afedcb75f71091bcc197c526a6ebf5c58bbbadb34fdeee6160f55f619f \
    --hash=sha256:fc6bd19e1cc177c66bdef15ef8636ad3bde79d5a4f608c158021153b4573509d
    # via -r requirements.in
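
As the autogenerated header notes, this lock file is produced from requirements.in; a sketch of regenerating it after changing the pin, assuming pip-tools is available in the environment:

  pip install pip-tools
  pip-compile --generate-hashes requirements.in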
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
WORKDIR /workspace

# Copy the directories
COPY --from=nvcr.io/nvidia/pytorch:24.11-py3 /usr/local/lib/python3.12/dist-packages/functorch /usr/local/lib/python3.12/dist-packages/functorch
COPY --from=nvcr.io/nvidia/pytorch:24.11-py3 /usr/local/lib/python3.12/dist-packages/triton /usr/local/lib/python3.12/dist-packages/triton

# GCSfuse components (used to provide shared storage, not intended for high performance)
RUN apt update && apt install --yes --no-install-recommends \
    ca-certificates \
    curl \
    gnupg \
    cmake \
  && echo "deb https://packages.cloud.google.com/apt gcsfuse-buster main" \
    | tee /etc/apt/sources.list.d/gcsfuse.list \
  && echo "deb https://packages.cloud.google.com/apt cloud-sdk main" \
    | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list \
  && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - \
  && apt-get update \
  && apt-get install --yes gcsfuse \
  && apt-get install --yes google-cloud-cli \
  && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* \
  && mkdir /gcs

RUN git clone -b v0.16.0 https://github.com/triton-inference-server/tensorrtllm_backend.git && \
    cd tensorrtllm_backend && \
    git submodule update --init --recursive && \
    git lfs install && \
    git lfs pull

COPY requirements.txt /workspace/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

ENTRYPOINT [ "/bin/bash" ]
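
For local iteration outside Cloud Build, a hedged sketch of building and pushing the same image directly with Docker; the registry path and tag are placeholders:

  # Hypothetical local build; mirrors the Cloud Build step defined earlier in this diff.
  docker build --file trtllm.Dockerfile --tag us-docker.pkg.dev/my-project/my-repo/trtllm:0.16.0 .
  docker push us-docker.pkg.dev/my-project/my-repo/trtllm:0.16.0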
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v2
name: trtllm-llama-3-1-405b-inference
description: trtllm-llama-3-1-405b-inference
type: application
version: 0.1.0
appVersion: "1.16.0"
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-benchmark-script
data:
  run_trtllm_bench.sh: |-
    #!/bin/bash

    # Function to run benchmarks
    run_benchmark() {
      local model_name=$1
      local isl=$2
      local osl=$3
      local num_requests=$4
      local tp_size=$5

      echo "Running benchmark for $model_name with ISL=$isl, OSL=$osl, TP=$tp_size"

      dataset_file="/ssd/token-norm-dist_${model_name##*/}_${isl}_${osl}_tp${tp_size}.json"
      output_file="/ssd/output_${model_name##*/}_isl${isl}_osl${osl}_tp${tp_size}.txt"

      python3 /workspace/tensorrtllm_backend/tensorrt_llm/benchmarks/cpp/prepare_dataset.py --tokenizer=$model_name --stdout token-norm-dist \
        --num-requests=$num_requests --input-mean=$isl --output-mean=$osl \
        --input-stdev=0 --output-stdev=0 > $dataset_file

      pp_size=1

      trtllm-bench --model $model_name --model_path /ssd/${model_name} --workspace /ssd build \
        --tp_size $tp_size --quantization FP8 --dataset $dataset_file

      engine_dir="/ssd/${model_name}/tp_${tp_size}_pp_${pp_size}"

      # Save throughput output to a file
      trtllm-bench --model $model_name --model_path /ssd/${model_name} throughput \
        --dataset $dataset_file --engine_dir $engine_dir \
        --kv_cache_free_gpu_mem_fraction 0.95 > $output_file

      cat $output_file
      gsutil cp $output_file /gcs/benchmark_logs/

      rm -rf $engine_dir
      rm -f $dataset_file
    }

    # Generated benchmark executions
    model_name="{{ .Values.model.name }}"
    tp_size={{ .Values.model.tp_size }}

    {{- range .Values.benchmarks.experiments }}
    run_benchmark "$model_name" {{ .isl }} {{ .osl }} {{ .num_requests }} $tp_size
    {{- end }}
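
To see what the Helm loop at the end expands to, the chart can be rendered locally. With the sample values file shown earlier (one active experiment: isl=128, osl=128, 30000 requests, tp_size=8), the rendered tail of run_trtllm_bench.sh reduces to a single call; <chart-dir> and the values file name are placeholders in this sketch.

  helm template trtllm-llama-405b <chart-dir> -f values.yaml
  # Expected tail of the rendered script for the sample values:
  #   model_name="meta-llama/Llama-3.1-405B"
  #   tp_size=8
  #   run_benchmark "$model_name" 128 128 30000 $tp_size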
