
Commit 941ce54

Merge pull request #1769 from oracle-devrel/llm-benchmarking-docker-compose
Add a tutorial testing LLMs on stand-alone GPU shapes using docker-compose.
2 parents d4d5cd2 + 745842c commit 941ce54

File tree

11 files changed: +771 -0 lines changed

LICENSE
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
Copyright (c) 2025 Oracle and/or its affiliates.

The Universal Permissive License (UPL), Version 1.0

Subject to the condition set forth below, permission is hereby granted to any
person obtaining a copy of this software, associated documentation and/or data
(collectively the "Software"), free of charge and under any and all copyright
rights in the Software, and any and all patent rights owned or freely
licensable by each licensor hereunder covering either (i) the unmodified
Software as contributed to or provided by such licensor, or (ii) the Larger
Works (as defined below), to deal in both

(a) the Software, and
(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
one is included with the Software (each a "Larger Work" to which the Software
is contributed by such licensors),

without restriction, including without limitation the rights to copy, create
derivative works of, display, perform, and distribute the Software and make,
use, sell, offer for sale, import, export, have made, and have sold the
Software and the Larger Work(s), and to sublicense the foregoing rights on
either these or other terms.

This license is subject to the following condition:
The above copyright notice and either this complete permission notice or at
a minimum a reference to the UPL must be included in all copies or
substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md
Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
# LLM Benchmarking with Docker Compose

This repository demonstrates how to benchmark LLMs with
[vLLM](https://vllm.ai)
and
[genai-perf](https://docs.nvidia.com/nim/benchmarking/llm/latest/step-by-step.html#using-genai-perf-to-benchmark)
using
[Docker Compose](https://docs.docker.com/compose/).

Reviewed: 20.05.2025

# When should this asset be used?

* If you want to evaluate the performance of various LLM models or various GPU shapes on OCI.

# How is this asset used?

## Prerequisites

* You have access to an Oracle Cloud tenancy.
* You have access to shapes with NVIDIA GPUs, such as the A10.
* You have a HuggingFace account and access to `meta-llama/Llama-3.1-8B-Instruct`.
## Infrastructure Setup

1. Create a new instance using a GPU shape.

   * Use Ubuntu as the system image for simplicity.

     <img src="files/image.png" alt="Selecting Ubuntu as the OS image" width="75%" />

   * Create a large enough boot volume, e.g., with 200 GB of space.
2. Log into the machine and install the NVIDIA drivers:
   ```sh
   sudo apt-get update
   sudo apt-get install -y ubuntu-drivers-common
   sudo ubuntu-drivers install --gpgpu nvidia:570-server
   ```
   If your shape has an NVLink fabric, also install the matching fabric manager
   from NVIDIA:
   ```sh
   sudo apt-get install -y nvidia-fabricmanager-570
   ```
3. Install Docker Compose:
   ```sh
   sudo apt-get install -y docker-compose
   ```
   and add yourself to the `docker` group:
   ```sh
   sudo usermod -aG docker ubuntu
   ```
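   The new group membership only applies to fresh login sessions. To pick it
   up in the current shell without logging out and back in, one option is:
   ```sh
   # Start a subshell whose primary group is "docker", so that
   # docker and docker-compose commands work without sudo right away.
   newgrp docker
   ```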
4. Then install and configure the NVIDIA container toolkit.
   In-depth instructions for the NVIDIA container toolkit can be found
   [on NVIDIA's website](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#with-apt-ubuntu-debian).
   ```sh
   curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
     && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
       sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
       sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
   sudo apt-get update
   sudo apt-get install -y nvidia-container-toolkit
   sudo nvidia-ctk runtime configure --runtime=docker
   ```

5. Reboot the machine.
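   After the reboot, it is worth checking that containers can see the GPUs
   before continuing. A minimal sanity check, assuming the container toolkit
   was configured as above (the Ubuntu image tag is just an example):
   ```sh
   # Runs nvidia-smi inside a throwaway container; it should print the
   # same GPU table as on the host. If it errors out, revisit the
   # container toolkit installation and the Docker runtime configuration.
   docker run --rm --gpus all ubuntu:24.04 nvidia-smi
   ```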
## Environment Configuration

This section is only needed if you wish to run LLM models from HuggingFace that
are gated and require an access token.

1. Install `uv`:
   ```sh
   sudo snap install --classic astral-uv
   ```

2. Install the HuggingFace package:
   ```sh
   uv venv
   uv pip install huggingface_hub
   ```

3. Log into HuggingFace with your access token:
   ```sh
   uv run huggingface-cli login
   ```
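   To confirm that the token was stored and actually grants access to the
   gated model, two lightweight checks (the `download` invocation assumes the
   model named in the prerequisites):
   ```sh
   # Prints the HuggingFace account the stored token belongs to.
   uv run huggingface-cli whoami
   # Fetches a single small file from the gated repository; this fails
   # if your account has not been granted access to the model.
   uv run huggingface-cli download meta-llama/Llama-3.1-8B-Instruct config.json
   ```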
## Executing the Benchmarks

1. Download the contents of the folder ["files"](./files).
   Then build the necessary containers with `docker-compose`:
   ```sh
   docker-compose --profile benchmark build
   ```
2. Edit the configuration file, `config.json`. It specifies all settings
   with which the LLM is served. For example:
   ```json
   {
     "model": "meta-llama/Llama-3.1-8B-Instruct",
     "gpu_memory_utilization": 0.98,
     "tensor_parallel_size": 1,
     "max_model_len": 8192,
     "max_num_batched_tokens": 8192
   }
   ```
   will run Llama 3.1. Modify this file to the settings you desire.
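   As an illustration of adapting the file to a different setup: to serve a
   larger model, you would mainly change the model name and raise
   `tensor_parallel_size` to the number of GPUs on the shape. A sketch
   (the 70B checkpoint here is only an illustration; it requires far more
   GPU memory than a single A10 provides):
   ```json
   {
     "model": "meta-llama/Llama-3.1-70B-Instruct",
     "gpu_memory_utilization": 0.98,
     "tensor_parallel_size": 4,
     "max_model_len": 8192,
     "max_num_batched_tokens": 8192
   }
   ```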
3. Launch the LLM in the background:
   ```sh
   docker-compose up -d llm
   ```
   You can follow the start-up of the vLLM service with:
   ```sh
   docker-compose logs -f llm
   ```
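   Since the compose file publishes vLLM's OpenAI-compatible API on
   `127.0.0.1:8000`, you can also poll the server directly to see when it is
   ready to accept requests:
   ```sh
   # Returns a JSON list containing the served model once start-up has
   # finished; until then the connection is refused.
   curl -s http://127.0.0.1:8000/v1/models
   ```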
4. Start the benchmarking container:
   ```sh
   docker-compose run perf
   ```
   This executes multiple runs of NVIDIA's `genai-perf` and stores the
   results in the directory `./results`, together with information about the
   vLLM parameters and the shape used.

   To run only certain scenarios and concurrent request settings, modify
   [`compose.yaml`](files/compose.yaml) so that the `command` for the `perf`
   container reads, e.g.:
   ```yaml
   command:
     - "wait-for-it.sh"
     - "--timeout=300"
     - "llm:8000"
     - "--"
     - "/appli/scripts/benchmark.py"
     - "--scenario"
     - "chatbot"
     - "--concurrency"
     - "1"
     - "4"
     - "16"
   ```
5. Run the plotting:
   ```sh
   docker-compose run plot
   ```
   The output files will be in `./plots`.

6. Shut down all remaining containers:
   ```sh
   docker-compose down
   ```
# Acknowledgments

- **Author** - Omar Awile (GPU Specialist)
- **Author** - Matthias Wolf (GPU Specialist)

# License

Copyright (c) 2025 Oracle and/or its affiliates.

Licensed under the Universal Permissive License (UPL), Version 1.0.

See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
benchmark/Dockerfile
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
# Copyright (c) 2025 Oracle and/or its affiliates.
FROM ubuntu:24.04

ENV PATH="/root/.local/bin:/root/.local/share/pipx/venvs/genai-perf/bin/:$PATH"
ARG GENAI_PERF_VERSION=0.0.12

RUN apt-get update \
    && apt-get install -y curl pipx \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN pipx install genai-perf==${GENAI_PERF_VERSION} && pipx ensurepath
RUN curl -o /root/.local/bin/wait-for-it.sh https://raw.githubusercontent.com/vishnubob/wait-for-it/refs/heads/master/wait-for-it.sh
RUN chmod +x /root/.local/bin/wait-for-it.sh
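A quick way to check that this image built correctly and that the pinned
`genai-perf` ended up on the `PATH` is to override the service's command with
a trivial invocation (a sanity check, not part of the benchmark flow):
```sh
# Prints genai-perf's usage text from inside the freshly built container.
docker-compose run --rm perf genai-perf --help
```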
files/compose.yaml
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
# Copyright (c) 2025 Oracle and/or its affiliates.
version: "3"
services:
  llm:
    image: vllm/vllm-openai:v0.8.5.post1
    container_name: llm
    runtime: nvidia
    volumes:
      - "$HOME/.cache/huggingface:/huggingface"
      - "$PWD:/appli"
    ports:
      - "127.0.0.1:8000:8000"
    environment:
      "HF_HOME": "/huggingface"
    working_dir: "/appli"
    entrypoint:
      - "/appli/scripts/startllm.py"
  perf:
    build: benchmark
    container_name: perf
    depends_on:
      - llm
    volumes:
      - "$HOME/.cache/huggingface:/huggingface"
      - "$PWD:/appli"
    environment:
      "HF_HOME": "/huggingface"
    working_dir: "/appli"
    command:
      - "wait-for-it.sh"
      - "--timeout=300"
      - "llm:8000"
      - "--"
      - "/appli/scripts/benchmark.py"
  plot:
    build: plot
    container_name: plot
    volumes:
      - "$PWD:/appli"
    working_dir: "/appli"
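If you adapt the service definitions, the resolved configuration can be
validated without starting any containers:
```sh
# Parses compose.yaml, reports syntax errors, and prints the final,
# variable-expanded configuration.
docker-compose config
```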
files/config.json
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
{
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "gpu_memory_utilization": 0.98,
  "tensor_parallel_size": 1,
  "max_model_len": 8192,
  "max_num_batched_tokens": 8192
}
files/image.png: binary file added, 87 KB (not shown).
plot/Dockerfile
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
# Copyright (c) 2025 Oracle and/or its affiliates.
FROM ubuntu:24.04

ENV PATH="/root/.local/bin:$PATH"

RUN apt-get update \
    && apt-get install -y curl pipx \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /appli

COPY pyproject.toml benchplot.py /appli/

RUN pipx install . && pipx ensurepath

CMD ["benchplot"]
