Commit 9af81f7

Merge pull request #2 from neuralmagic/bench-m
Add basic config for benchmark

2 parents: 582823e + 1d0b9f9

File tree: 5 files changed, +29 −4 lines changed

README.md — 6 additions & 0 deletions

```diff
@@ -8,3 +8,9 @@ The `accuracy` folder contains YAML files for each model that configures informa
 * client.yml: contains settings for the llm-eval-test harness for the model
 * accuracy.yml: contains evaluation tasks and accuracy expectations for the model
 * storage.yml: specifies where mode and dataset is located
+
+The `benchmark` folder contains YAML files for each model that configures information needed for the model to be validated through the [guidellm](https://github.com/neuralmagic/guidellm). There are config files for each model:
+
+* server.yml: contains settings to start a vllm server with the model
+* client.yml: contains settings for the guidellm for the model
+* storage.yml: specifies where mode and dataset is located
```
Lines changed: 4 additions & 4 deletions

```diff
@@ -1,6 +1,6 @@
 # server configs for https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
 model: "meta-llama/Meta-Llama-3.1-8B-Instruct"
-trust_remote_code: true
-enable_chunked_prefill: true
-tensor_parallel_size:
-max_model_len: 4096
+trust-remote-code: true
+enable-chunked-prefill: true
+tensor-parallel-size:
+max-model-len: 4096
```
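The only change in this hunk is renaming the keys from snake_case to kebab-case. A plausible reading (an assumption, not stated in the commit) is that the kebab-case names line up with the server's CLI flag spellings, so a config mapping can be turned into command-line arguments mechanically. A minimal sketch, with `config_to_cli_args` as a hypothetical helper:

```python
def config_to_cli_args(config: dict) -> list[str]:
    """Turn a flat server-config mapping into CLI-style arguments.

    Hypothetical helper, not part of the repo: `model` is treated as a
    positional argument, booleans become bare flags, and unset keys
    (like `tensor-parallel-size:` above) are skipped.
    """
    args = []
    for key, value in config.items():
        if key == "model":
            continue  # positional argument, handled separately
        flag = "--" + key.replace("_", "-")
        if value is True:
            args.append(flag)  # boolean flags take no value
        elif value is None:
            continue  # key present in YAML but left unset
        else:
            args.extend([flag, str(value)])
    return args

server_cfg = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "trust-remote-code": True,
    "enable-chunked-prefill": True,
    "tensor-parallel-size": None,
    "max-model-len": 4096,
}
print(config_to_cli_args(server_cfg))
# → ['--trust-remote-code', '--enable-chunked-prefill', '--max-model-len', '4096']
```

With the old snake_case keys, the same translation would have produced flags the server does not recognize, which may be why the commit renames them.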
Lines changed: 8 additions & 0 deletions

```diff
@@ -0,0 +1,8 @@
+target: "http://localhost:8000/v1"
+model: "meta-llama/Llama-3.1-8B-Instruct"
+data:
+  prompt_tokens: 64
+  output_tokens: 16
+rate-type: throughput
+max-seconds: 400
+output_path: ""
```
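This client config describes a benchmark run against the local server: a target endpoint, a synthetic-data spec, and a rate/duration. As a rough illustration only — the flag names below are assumptions, not taken from the guidellm documentation — such a mapping could be rendered into a command line with a small hypothetical helper:

```python
import shlex

def client_config_to_command(cfg: dict) -> str:
    """Render a client-config mapping as a benchmark command line.

    Sketch only: `guidellm benchmark` and every flag name here are
    assumed spellings, not verified against the guidellm CLI.
    """
    parts = ["guidellm", "benchmark",
             "--target", cfg["target"],
             "--model", cfg["model"]]
    data = cfg.get("data", {})
    if data:
        # flatten the nested data spec into inline key=value pairs
        parts += ["--data", ",".join(f"{k}={v}" for k, v in data.items())]
    for key in ("rate-type", "max-seconds"):
        if key in cfg:
            parts += [f"--{key}", str(cfg[key])]
    return shlex.join(parts)

cfg = {
    "target": "http://localhost:8000/v1",
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "data": {"prompt_tokens": 64, "output_tokens": 16},
    "rate-type": "throughput",
    "max-seconds": 400,
}
print(client_config_to_command(cfg))
```

Keeping these values in YAML rather than in a script means the same harness can run every model's benchmark by swapping config files.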
Lines changed: 8 additions & 0 deletions

```diff
@@ -0,0 +1,8 @@
+# server configs for https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
+# config.yaml
+model: meta-llama/Llama-3.1-8B-Instruct
+uvicorn-log-level: "debug"
+trust-remote-code: true
+enable-chunked-prefill: true
+tensor-parallel-size: 1
+max-model-len: 4096
```
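All of the new files share the same flat `key: value` shape, so any consumer can read them uniformly. A minimal sketch of such a reader — a real harness would use PyYAML's `safe_load` instead of this hand-rolled subset:

```python
def parse_flat_yaml(text: str) -> dict:
    """Read a flat `key: value` config like the files in this commit.

    Sketch only: handles comments, quoted strings, bare `true`, and
    unset keys; numeric values stay as strings.
    """
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        key, _, value = line.partition(":")
        value = value.strip().strip('"')
        if value == "true":
            cfg[key.strip()] = True
        else:
            cfg[key.strip()] = value or None  # empty value -> unset
    return cfg

server_yaml = """\
# server configs for https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
model: meta-llama/Llama-3.1-8B-Instruct
uvicorn-log-level: "debug"
trust-remote-code: true
enable-chunked-prefill: true
tensor-parallel-size: 1
max-model-len: 4096
"""
print(parse_flat_yaml(server_yaml)["trust-remote-code"])
# → True
```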
Lines changed: 3 additions & 0 deletions

```diff
@@ -0,0 +1,3 @@
+# storage configs for https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
+model: hf
+data: hf
```