
Commit e2c000c

Add tutorial example (#548)
* Add initial tutorial example for 24.03 release of GenAI-Perf
* Update release example date
* Clarify steps 4 & 5 by combining them.
* Clean up wording and genai-perf command
* Align command line example with args in main instead of frozen 24.03
1 parent d69c60b commit e2c000c


1 file changed (+98, -0 lines)

@@ -0,0 +1,98 @@
<!--
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
 * Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
 * Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
 * Neither the name of NVIDIA CORPORATION nor the names of its
   contributors may be used to endorse or promote products derived
   from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Benchmarking LLMs

This guide shows how to use GenAI-Perf to measure and characterize the
performance of Large Language Models (LLMs).

## Setup: Download and configure the Triton Server and client environments

1. In a clean working directory, run the following snippet to create a Triton
model repository for the `gpt2_vllm` model:

```bash
# model.json holds the vLLM engine arguments (gpu_memory_utilization: 0.5 lets
# vLLM use about half of the GPU memory); config.pbtxt tells Triton to serve the
# model with the vLLM backend using a single model instance.
MODEL_REPO="${PWD}/models"
MODEL_NAME="gpt2_vllm"

mkdir -p "$MODEL_REPO/$MODEL_NAME/1"
echo '{
    "model":"gpt2",
    "disable_log_requests": true,
    "gpu_memory_utilization": 0.5
}' >"$MODEL_REPO/$MODEL_NAME/1/model.json"

echo 'backend: "vllm"
instance_group [
  {
    count: 1
    kind: KIND_MODEL
  }
]' >"$MODEL_REPO/$MODEL_NAME/config.pbtxt"
```
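Before starting the server, it can be worth sanity-checking the repository, since a stray quote in the `echo` commands above produces a `model.json` that the backend cannot parse. The sketch below is an illustration, not part of Triton or GenAI-Perf: `check_model_repo` is a hypothetical helper, and it assumes `python3` is available on the host for JSON validation.

```bash
# Hypothetical helper: verify the files from step 1 exist and model.json is valid JSON.
# Assumes python3 is on PATH (used only for `python3 -m json.tool`).
# $1: model repository path, $2: model name
check_model_repo() {
  local repo="$1" model="$2" f
  for f in "$repo/$model/config.pbtxt" "$repo/$model/1/model.json"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  # json.tool exits non-zero (and prints the parse error) on malformed JSON.
  if ! python3 -m json.tool "$repo/$model/1/model.json" >/dev/null; then
    return 1
  fi
  echo "OK: $repo/$model"
}
```

For example, `check_model_repo "$MODEL_REPO" "$MODEL_NAME"` prints a line starting with `OK:` when both files exist and `model.json` parses.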

2. Download the pre-built Triton Server container with the vLLM backend from the
[NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver)
registry. Replace `<yy.mm>` with an actual release tag before running the commands.

```bash
export RELEASE=<yy.mm> # e.g. for the April 2024 release, use `export RELEASE=24.04`

docker pull nvcr.io/nvidia/tritonserver:${RELEASE}-vllm-python-py3
```
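Because `<yy.mm>` is only a placeholder, pasting that line unchanged fails with a shell syntax error, and a mistyped tag only surfaces later as a `docker pull` error. A minimal guard, assuming release tags follow the two-digit `yy.mm` pattern shown above (`check_release` is a hypothetical helper name), can be run before `docker pull` here and before `docker run` in steps 3 and 4:

```bash
# Hypothetical helper: succeed only if RELEASE looks like a yy.mm tag such as 24.04,
# so it is safe to interpolate into image names like tritonserver:${RELEASE}-py3-sdk.
check_release() {
  case "${RELEASE:-}" in
    [0-9][0-9].[0-9][0-9]) return 0 ;;
  esac
  echo "RELEASE='${RELEASE:-}' is not a yy.mm tag (e.g. 24.04)" >&2
  return 1
}

# Example: only pull once the tag has the expected shape.
# check_release && docker pull nvcr.io/nvidia/tritonserver:${RELEASE}-vllm-python-py3
```

The guard checks only the shape of the tag, not whether that release actually exists on NGC.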

3. Launch the Triton Server container to serve the `gpt2_vllm` model. Run this in
the terminal where `RELEASE` is set:

```bash
# Mount the working directory at /work so that ./models is the repository from step 1.
docker run -it --gpus all --net=host --rm --shm-size=1G --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v "${PWD}":/work -w /work \
  nvcr.io/nvidia/tritonserver:${RELEASE}-vllm-python-py3 \
  tritonserver --model-repository ./models
```
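Loading the model can take a while, and GenAI-Perf will fail if it starts before the server is ready. A minimal sketch for waiting on the server, assuming `curl` is installed wherever you run it and Triton's HTTP endpoint is on its default port 8000 (reachable on `localhost` because the containers use host networking). It polls the standard readiness route `/v2/health/ready`; `wait_for_triton` and its arguments are hypothetical names, not part of Triton or GenAI-Perf:

```bash
# Hypothetical helper: poll Triton's HTTP readiness endpoint until it answers
# with a success status or the attempt budget runs out. Assumes curl is on PATH.
# $1: base URL (default http://localhost:8000), $2: max attempts (default 60)
wait_for_triton() {
  local url="${1:-http://localhost:8000}" attempts="${2:-60}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP error statuses, so a server that is
    # not ready yet (or not reachable at all) counts as a failed attempt.
    if curl -sf -o /dev/null "$url/v2/health/ready"; then
      echo "Triton is ready at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Triton not ready at $url after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_triton http://localhost:8000 120` waits up to about two minutes. Per-model readiness is exposed at `/v2/models/<model>/ready`, e.g. `curl -sf localhost:8000/v2/models/gpt2_vllm/ready`.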

4. In a separate terminal window, pull and launch the pre-built Triton SDK
container, which includes GenAI-Perf and Performance Analyzer, from the
[NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver)
registry:

```bash
export RELEASE=<yy.mm> # e.g. the same release as the server container
# --net host lets the tools in this container reach the server on localhost.
docker run --gpus all -it --net host --shm-size=1g --ulimit stack=67108864 \
  nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk \
  bash
```
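
5. Inside the SDK container, profile GPT-2 with GenAI-Perf:

```bash
# -m selects the model served by Triton, --backend vllm matches the backend
# that serves it, and --streaming enables the streaming API.
genai-perf -m gpt2_vllm --backend vllm --streaming
```

By default, GenAI-Perf saves all metrics to the `profile_export_genai_perf.csv` file.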
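The export is a plain CSV file, so standard command-line tools are enough for a quick look. A minimal sketch, assuming the file is in the current directory and the `column` utility (util-linux or BSD) is installed; `show_metrics` is a hypothetical helper name. It splits on every comma, so it is meant for a quick read, not as a CSV parser for quoted fields:

```bash
# Hypothetical helper: print a GenAI-Perf CSV export as an aligned table.
# Assumes the `column` utility is installed. Splits on every comma, so quoted
# fields that contain commas would be misaligned.
show_metrics() {
  local csv="${1:-profile_export_genai_perf.csv}"
  if [ ! -f "$csv" ]; then
    echo "no such file: $csv" >&2
    return 1
  fi
  # -s, uses commas as the input separator; -t aligns the fields into columns.
  column -s, -t < "$csv"
}
```

Run `show_metrics` with no argument to read the default export, or pass another path, e.g. `show_metrics my_run.csv`.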
