
Commit 82fe939

Add helm chart for running benchmark
1 parent 123ad68 commit 82fe939

9 files changed, +212 -101 lines changed

config/charts/benchmark/.helmignore

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/

config/charts/benchmark/Chart.yaml

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
apiVersion: v2
name: benchmark
description: A Helm chart for running the benchmark tool

type: application

version: 0.0.0

appVersion: "0.0.0"

config/charts/benchmark/README.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
# Benchmark

A chart to deploy the benchmark tool on top of a vLLM model server deployment created via the [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/#getting-started-with-gateway-api-inference-extension).

## Install

To install the benchmark tool, run:

```txt
$ helm install benchmark-tool ./config/charts/benchmark \
    --set moderlServingEndpoint.mode=gateway \
    --set moderlServingEndpoint.name=inference-gateway \
    --set moderlServingEndpoint.namespace=default
```
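For illustration only (this block is not part of the chart's README): if the vLLM deployment is exposed through a LoadBalancer `Service` rather than a Gateway, the same chart can be pointed at it with `moderlServingEndpoint.mode=service`. The service name below is a placeholder; use whatever LoadBalancer service fronts your vLLM deployment.

```bash
helm install benchmark-tool ./config/charts/benchmark \
  --set moderlServingEndpoint.mode=service \
  --set moderlServingEndpoint.name=vllm-model-server \
  --set moderlServingEndpoint.namespace=default
```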

## Uninstall

Run the following command to uninstall the chart:

```txt
$ helm uninstall benchmark-tool
```

## Configuration

The following table lists the configurable parameters of the chart.

| **Parameter Name**                | **Description** |
|-----------------------------------|-----------------|
| `benchmark.requestRates`          | Comma-separated list of request rates (requests per second). The benchmark is run against the vLLM deployment once for each request rate. |
| `benchmark.timeSeconds`           | Benchmark duration per request rate. The number of prompts is calculated as `requestRate * timeSeconds` for each request rate. |
| `benchmark.maxNumPrompts`         | Maximum number of prompts to process. Only considered when `requestRates` is not set. |
| `benchmark.tokenizer`             | Name or path of the tokenizer. |
| `benchmark.models`                | Comma-separated list of models to benchmark. |
| `benchmark.backend`               | Model serving backend. Default: `vllm`. |
| `benchmark.port`                  | Port of the model serving backend server. |
| `benchmark.inputLength`           | Maximum number of input tokens for filtering the benchmark dataset. |
| `benchmark.outputLength`          | Maximum number of output tokens for filtering the benchmark dataset. |
| `benchmark.filePrefix`            | Prefix to use for the benchmark result output file. |
| `benchmark.trafficSplit`          | Comma-separated list of traffic split proportions for the models, e.g. '0.9,0.1'. Sum must equal 1.0. |
| `benchmark.scrapeServerMetrics`   | Whether to scrape server metrics. |
| `benchmark.saveAggregatedResult`  | Whether to aggregate the results of all models and save the aggregated result. |
| `benchmark.streamRequest`         | Whether to stream the request. Needed for the TTFT metric. |
| `moderlServingEndpoint.mode`      | Mode in which to consume the model serving endpoint for benchmarking. Options are `gateway` or `service`. |
| `moderlServingEndpoint.name`      | Name of the model serving endpoint resource, i.e. the inference gateway name or the LoadBalancer service name. |
| `moderlServingEndpoint.namespace` | Namespace of the moderlServingEndpoint resource, i.e. the namespace of the inference gateway or LoadBalancer service. |
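As an editorial illustration of the parameters above (not part of the chart), a run at two request rates against a single model might be configured like this; the model, gateway name, and durations are placeholders. Note that commas in a `--set` value must be escaped with a backslash.

```bash
helm install benchmark-tool ./config/charts/benchmark \
  --set moderlServingEndpoint.mode=gateway \
  --set moderlServingEndpoint.name=inference-gateway \
  --set moderlServingEndpoint.namespace=default \
  --set benchmark.requestRates="10\,20" \
  --set benchmark.timeSeconds=120 \
  --set benchmark.models="meta-llama/Llama-3.1-8B-Instruct" \
  --set benchmark.tokenizer="meta-llama/Llama-3.1-8B-Instruct" \
  --set benchmark.streamRequest=true
```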
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
{{- $targetIP := "" -}}
{{- if eq .Values.moderlServingEndpoint.mode "gateway" -}}
{{- $gw := lookup "gateway.networking.k8s.io/v1" "Gateway" .Values.moderlServingEndpoint.namespace .Values.moderlServingEndpoint.name -}}
{{- if not $gw }}
{{- fail (printf "Gateway %s not found in namespace %s. Please create it before installing this chart." .Values.moderlServingEndpoint.name .Values.moderlServingEndpoint.namespace) -}}
{{- end }}
{{- if or (not $gw.status) (not $gw.status.addresses) -}}
{{- fail (printf "Gateway %s found, but .status.addresses is not populated yet. Please wait until an IP is assigned." .Values.moderlServingEndpoint.name) -}}
{{- end }}
{{- $targetIP = (index $gw.status.addresses 0).value | quote -}}
{{- end }}
{{- if eq .Values.moderlServingEndpoint.mode "service" -}}
{{- $svc := lookup "v1" "Service" .Values.moderlServingEndpoint.namespace .Values.moderlServingEndpoint.name -}}
{{- if not $svc }}
{{- fail (printf "Service %s not found in namespace %s. Please create it before installing this chart." .Values.moderlServingEndpoint.name .Values.moderlServingEndpoint.namespace) -}}
{{- end }}
{{- if or (not $svc.status) (not $svc.status.loadBalancer) -}}
{{- fail (printf "Service %s found, but .status.loadBalancer is not populated yet. Please wait until an IP is assigned." .Values.moderlServingEndpoint.name) -}}
{{- end }}
{{- $targetIP = (index $svc.status.loadBalancer.ingress 0).ip | quote -}}
{{- end }}

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: {{ .Release.Name }}
  name: {{ .Release.Name }}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
    spec:
      containers:
      # The following image was built from this source https://github.com/AI-Hypercomputer/inference-benchmark/tree/07628c9fe01b748f5a4cc9e5c2ee4234aaf47699
      - image: 'us-docker.pkg.dev/cloud-tpu-images/inference/inference-benchmark@sha256:1c100b0cc949c7df7a2db814ae349c790f034b4b373aaad145e77e815e838438'
        imagePullPolicy: Always
        name: {{ .Release.Name }}
        command:
        - bash
        - -c
        - ./latency_throughput_curve.sh
        env:
        - name: IP
          value: {{ $targetIP }}
        - name: REQUEST_RATES
          value: {{ .Values.benchmark.requestRates | quote }}
        - name: BENCHMARK_TIME_SECONDS
          value: {{ .Values.benchmark.timeSeconds | quote }}
        - name: MAX_NUM_PROMPTS
          value: {{ .Values.benchmark.maxNumPrompts | quote }}
        - name: TOKENIZER
          value: {{ .Values.benchmark.tokenizer | quote }}
        - name: MODELS
          value: {{ .Values.benchmark.models | quote }}
        - name: BACKEND
          value: {{ .Values.benchmark.backend | quote }}
        - name: PORT
          value: {{ .Values.benchmark.port | quote }}
        - name: INPUT_LENGTH
          value: {{ .Values.benchmark.inputLength | quote }}
        - name: OUTPUT_LENGTH
          value: {{ .Values.benchmark.outputLength | quote }}
        - name: FILE_PREFIX
          value: {{ .Values.benchmark.filePrefix | quote }}
        - name: PROMPT_DATASET_FILE
          value: ShareGPT_V3_unfiltered_cleaned_split.json
        - name: TRAFFIC_SPLIT
          value: {{ .Values.benchmark.trafficSplit | quote }}
        - name: SCRAPE_SERVER_METRICS
          value: {{ .Values.benchmark.scrapeServerMetrics | quote }}
        - name: SAVE_AGGREGATION_RESULT
          value: {{ .Values.benchmark.saveAggregatedResult | quote }}
        - name: STREAM_REQUEST
          value: {{ .Values.benchmark.streamRequest | quote }}
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              key: token
              name: hf-token
        resources:
          limits:
            cpu: "2"
            memory: 20Gi
          requests:
            cpu: "2"
            memory: 20Gi
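A practical note rather than part of the commit: the template above resolves the target IP at install time via `lookup`, so the Gateway (or Service) must already have an address when `helm install` runs, otherwise the chart's `fail` checks abort the install. A pre-flight check along these lines can be run first; the gateway name matches the examples in this change, and the service name is a placeholder.

```bash
# mode=gateway: the chart fails if .status.addresses is empty, so verify an address exists first.
kubectl get gateway/inference-gateway -n default \
  -o jsonpath='{.status.addresses[0].value}'

# mode=service: the chart reads .status.loadBalancer.ingress[0].ip (service name is a placeholder).
kubectl get service/vllm-model-server -n default \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```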

config/charts/benchmark/values.yaml

Lines changed: 21 additions & 0 deletions

@@ -0,0 +1,21 @@
benchmark:
  requestRates: "10,20,30"
  timeSeconds: 60
  maxNumPrompts:
  tokenizer: "meta-llama/Llama-3.1-8B-Instruct"
  models: "meta-llama/Llama-3.1-8B-Instruct"
  backend: "vllm"
  port: 80
  inputLength: 1024
  outputLength: 2048
  filePrefix: "benchmark"
  trafficSplit:
  scrapeServerMetrics:
  saveAggregatedResult:
  streamRequest:
moderlServingEndpoint:
  # `gateway` to select the endpoint from the inference gateway
  # `service` to select the endpoint from a LoadBalancer service created on top of the vLLM model server deployment
  mode: gateway
  name: vllm-llama3-8b-instruct
  namespace: default
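For illustration only (not part of the chart), the same overrides can be kept in a separate values file instead of repeated `--set` flags; the file name and values below are placeholders.

```bash
# Write a custom values file overriding the defaults above, then install with -f.
cat > benchmark-values.yaml <<'EOF'
benchmark:
  requestRates: "5,10"
  timeSeconds: 120
  tokenizer: "meta-llama/Llama-3.1-8B-Instruct"
  models: "meta-llama/Llama-3.1-8B-Instruct"
moderlServingEndpoint:
  mode: gateway
  name: inference-gateway
  namespace: default
EOF

helm install benchmark-tool ./config/charts/benchmark -f benchmark-values.yaml
```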

config/manifests/benchmark/benchmark.yaml

Lines changed: 0 additions & 60 deletions
This file was deleted.

config/manifests/benchmark/model-server-service.yaml

Lines changed: 0 additions & 12 deletions
This file was deleted.

site-src/performance/benchmark/index.md

Lines changed: 13 additions & 26 deletions
@@ -36,41 +36,28 @@ The LPG benchmark tool works by sending traffic to the specified target IP and p
 Follow the steps below to run a single benchmark. Multiple LPG instances can be deployed to run benchmarks in
 parallel against different targets.
 
-1. Check out the repo.
-
+1. Install the LPG benchmark tool via the helm chart below.
 ```bash
-git clone https://github.com/kubernetes-sigs/gateway-api-inference-extension
-cd gateway-api-inference-extension
+export BENCHMARK_DEPLOYMENT_NAME=benchmark-tool
+helm install $BENCHMARK_DEPLOYMENT_NAME \
+  --set moderlServingEndpoint.mode=gateway \
+  --set moderlServingEndpoint.name=inference-gateway \
+  --set moderlServingEndpoint.namespace=default \
+  oci://registry.k8s.io/gateway-api-inference-extension/charts/benchmark
 ```
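As an illustrative aside (not part of this change): the note above about deploying multiple LPG instances in parallel could translate into installing the chart several times under different release names, each pointed at a different target. The release and gateway names below are placeholders.

```bash
# Two benchmark releases, each targeting a different gateway.
helm install benchmark-gw-a \
  --set moderlServingEndpoint.mode=gateway \
  --set moderlServingEndpoint.name=inference-gateway-a \
  --set moderlServingEndpoint.namespace=default \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/benchmark

helm install benchmark-gw-b \
  --set moderlServingEndpoint.mode=gateway \
  --set moderlServingEndpoint.name=inference-gateway-b \
  --set moderlServingEndpoint.namespace=default \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/benchmark
```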
 
-1. Get the target IP. The examples below shows how to get the IP of a gateway or a k8s service.
+## Download the results
+1. Check out the repo to use the tools available to download and analyse the benchmark results.
 
 ```bash
-# Get gateway IP
-GW_IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
-# Get LoadBalancer k8s service IP
-SVC_IP=$(kubectl get service/vllm-llama2-7b -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
-
-echo $GW_IP
-echo $SVC_IP
-```
-
-1. Then update the `<target-ip>` in `./config/manifests/benchmark/benchmark.yaml` to the value of `$SVC_IP` or `$GW_IP`.
-Feel free to adjust other parameters such as `request_rates` as well. For a complete list of LPG configurations, refer to the
-[LPG user guide](https://github.com/AI-Hypercomputer/inference-benchmark?tab=readme-ov-file#configuring-the-benchmark).
-
-1. Start the benchmark tool.
-
-```bash
-kubectl apply -f ./config/manifests/benchmark/benchmark.yaml
+git clone https://github.com/kubernetes-sigs/gateway-api-inference-extension
+cd gateway-api-inference-extension
 ```
 
-1. Wait for benchmark to finish and download the results. Use the `benchmark_id` environment variable to specify what this
-benchmark is for. For instance, `inference-extension` or `k8s-svc`. When the LPG tool finishes benchmarking, it will print
-a log line `LPG_FINISHED`. The script below will watch for that log line and then start downloading results.
+1. When the LPG tool finishes benchmarking, it will print a log line `LPG_FINISHED`. The script below watches for that log line and then starts downloading results. Use the `benchmark_id` environment variable to specify what this benchmark is for, for instance `inference-extension` or `k8s-svc`. Use the `BENCHMARK_DEPLOYMENT_NAME` environment variable to specify the deployment name used in the previous step to install the LPG benchmark helm chart, so the results are downloaded from that deployment.
 
 ```bash
-benchmark_id='k8s-svc' ./tools/benchmark/download-benchmark-results.bash
+benchmark_id='k8s-svc' BENCHMARK_DEPLOYMENT_NAME=benchmark-tool ./tools/benchmark/download-benchmark-results.bash
 ```
 
 After the script finishes, you should see benchmark results under `./tools/benchmark/output/default-run/k8s-svc/results/json` folder.
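To confirm the download (an editorial aside, using the `k8s-svc` benchmark id from the example above; the `default-run` directory comes from the output path shown in that step):

```bash
# List the downloaded result files for this benchmark_id.
ls ./tools/benchmark/output/default-run/k8s-svc/results/json/
```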

tools/benchmark/download-benchmark-results.bash

Lines changed: 3 additions & 3 deletions
@@ -2,8 +2,8 @@
 
 # Downloads the benchmark result files from the benchmark tool pod.
 download_benchmark_results() {
-  until echo $(kubectl logs deployment/benchmark-tool -n ${namespace}) | grep -q -m 1 "LPG_FINISHED"; do sleep 30 ; done;
-  benchmark_pod=$(kubectl get pods -l app=benchmark-tool -n ${namespace} -o jsonpath="{.items[0].metadata.name}")
+  until echo $(kubectl logs deployment/$BENCHMARK_DEPLOYMENT_NAME -n ${namespace}) | grep -q -m 1 "LPG_FINISHED"; do sleep 30 ; done;
+  benchmark_pod=$(kubectl get pods -l app=$BENCHMARK_DEPLOYMENT_NAME -n ${namespace} -o jsonpath="{.items[0].metadata.name}")
   echo "Downloading JSON results from pod ${benchmark_pod}"
   kubectl exec ${benchmark_pod} -n ${namespace} -- rm -f ShareGPT_V3_unfiltered_cleaned_split.json
   for f in $(kubectl exec ${benchmark_pod} -n ${namespace} -- /bin/sh -c ls -l | grep json); do
@@ -27,4 +27,4 @@ benchmark_output_dir=${SCRIPT_DIR}/${output_dir}/${run_id}/${benchmark_id}
 
 echo "Saving benchmark results to ${benchmark_output_dir}/results/json/"
 download_benchmark_results
-kubectl delete -f ${SCRIPT_DIR}/../../config/manifests/benchmark/benchmark.yaml
+helm uninstall $BENCHMARK_DEPLOYMENT_NAME
