
Conversation

@parfeniukink (Contributor) commented Feb 25, 2025

Set up the environment

  1. Run the model via vLLM or llama.cpp (a server-launch sketch follows this list)
  2. Execute the guidellm command
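
For step 1, either server works as long as it exposes an OpenAI-compatible API on the target port. A minimal sketch, assuming default builds; binary names, model paths, and flags are illustrative and may differ by version:

  # vLLM: serves an OpenAI-compatible API on the given port
  vllm serve neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16 --port 8080

  # llama.cpp: llama-server exposes the same /v1 endpoints for a local GGUF file
  ./llama-server -m Phi-3-mini-4k-instruct-q4.gguf --port 8080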

Command

guidellm --target "http://localhost:8080/v1" --model "Phi-3-mini-4k-instruct-q4.gguf" --tokenizer "hf-internal-testing/llama-tokenizer" --data-type emulated --data "prompt_tokens=128,generated_tokens=128" --rate-type constant --rate 8 --max-seconds 100 --batch-size 2
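
Here --data-type emulated synthesizes prompts of the requested token lengths, --rate-type constant --rate 8 holds a fixed request rate, --max-seconds 100 caps the run at 100 seconds, and --batch-size 2 is the parameter this PR introduces. Before benchmarking, the target can be sanity-checked with a plain OpenAI-compatible call (the /v1/models endpoint is standard for both servers):

  curl http://localhost:8080/v1/models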

Output

  Generating report... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ (1/1) [ 0:01:40 < 0:00:00 ]
╭─ GuideLLM Benchmarks Report (stdout) ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ╭─ Benchmark Report 1 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │
│ │ Backend(type=openai_server, target=http://localhost:8080/v1, model=Phi-3-mini-4k-instruct-q4.gguf)                                                                                     │ │
│ │ Data(type=emulated, source=prompt_tokens=128,generated_tokens=128, tokenizer=hf-internal-testing/llama-tokenizer)                                                                      │ │
│ │ Rate(type=constant, rate=(8.0,))                                                                                                                                                       │ │
│ │ Limits(max_number=None requests, max_duration=100 sec)                                                                                                                                 │ │
│ │                                                                                                                                                                                        │ │
│ │                                                                                                                                                                                        │ │
│ │ Requests Data by Benchmark                                                                                                                                                             │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓                                                                                │ │
│ │ ┃ Benchmark                 ┃ Requests Completed ┃ Request Failed ┃ Duration  ┃ Start Time ┃ End Time ┃                                                                                │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩                                                                                │ │
│ │ │ constant@8.00 req/sec     │ 12/12              │ 0/12           │ 90.03 sec │ 21:46:41   │ 21:48:11 │                                                                                │ │
│ │ └───────────────────────────┴────────────────────┴────────────────┴───────────┴────────────┴──────────┘                                                                                │ │
│ │                                                                                                                                                                                        │ │
│ │ Tokens Data by Benchmark                                                                                                                                                               │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓                                                                  │ │
│ │ ┃ Benchmark                 ┃ Prompt ┃ Prompt (1%, 5%, 50%, 95%, 99%)    ┃ Output ┃ Output (1%, 5%, 50%, 95%, 99%)  ┃                                                                  │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩                                                                  │ │
│ │ │ constant@8.00 req/sec     │ 128.25 │ 128.0, 128.0, 128.0, 129.0, 129.0 │ 117.42 │ 56.3, 65.5, 128.0, 128.0, 128.0 │                                                                  │ │
│ │ └───────────────────────────┴────────┴───────────────────────────────────┴────────┴─────────────────────────────────┘                                                                  │ │
│ │                                                                                                                                                                                        │ │
│ │ Performance Stats by Benchmark                                                                                                                                                         │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │ │
│ │ ┃                           ┃ Request Latency [1%, 5%, 10%, 50%, 90%, 95%,     ┃ Time to First Token [1%, 5%, 10%, 50%, 90%, 95%, ┃ Inter Token Latency [1%, 5%, 10%, 50%, 90%, 95%, ┃ │ │
│ │ ┃ Benchmark                 ┃ 99%] (sec)                                       ┃ 99%] (ms)                                        ┃ 99%] (ms)                                        ┃ │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ │
│ │ │ constant@8.00 req/sec     │ 7.76, 7.83, 7.91, 10.56, 16.01, 16.60, 17.16     │ 828.5, 830.6, 833.0, 4789.5, 9004.5, 9510.3,     │ 49.7, 51.1, 51.8, 55.0, 66.4, 70.4, 75.6         │ │ │
│ │ │                           │                                                  │ 9994.8                                           │                                                  │ │ │
│ │ └───────────────────────────┴──────────────────────────────────────────────────┴──────────────────────────────────────────────────┴──────────────────────────────────────────────────┘ │ │
│ │                                                                                                                                                                                        │ │
│ │ Performance Summary by Benchmark                                                                                                                                                       │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓                                            │ │
│ │ ┃ Benchmark                 ┃ Requests per Second ┃ Request Latency ┃ Time to First Token ┃ Inter Token Latency ┃ Output Token Throughput ┃                                            │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩                                            │ │
│ │ │ constant@8.00 req/sec     │ 0.13 req/sec        │ 11.60 sec       │ 4941.58 ms          │ 57.20 ms            │ 15.65 tokens/sec        │                                            │ │
│ │ └───────────────────────────┴─────────────────────┴─────────────────┴─────────────────────┴─────────────────────┴─────────────────────────┘                                            │ │
│ ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
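
As a quick cross-check (back-of-the-envelope, not part of the report), the summary figures follow from the tables above:

  12 requests / 90.03 sec ≈ 0.13 requests per second
  12 requests x 117.42 mean output tokens ≈ 1409 tokens; 1409 tokens / 90.03 sec ≈ 15.65 tokens/sec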

@parfeniukink parfeniukink self-assigned this Feb 25, 2025
@parfeniukink parfeniukink marked this pull request as draft February 25, 2025 20:01
@parfeniukink parfeniukink removed the request for review from markurtz February 25, 2025 20:01
@markurtz (Collaborator) commented:

Closing this out, as all this will do is run a set number of requests equal to the batch size in parallel. To add true batch support, we'll either need to run vLLM locally or go through the OpenAI batch processing API, which is a significant expansion in scope and work.
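
For context, the closed-out behavior amounts to concurrent dispatch rather than server-side batching. A rough sketch of what that means (hypothetical and simplified, not guidellm's actual code; the httpx client, endpoint, and payload fields are assumptions):

  import asyncio

  import httpx

  async def send_request(client: httpx.AsyncClient, prompt: str) -> dict:
      # One plain OpenAI-compatible completion request; payload fields are illustrative.
      resp = await client.post(
          "http://localhost:8080/v1/completions",
          json={"model": "Phi-3-mini-4k-instruct-q4.gguf",
                "prompt": prompt, "max_tokens": 128},
      )
      return resp.json()

  async def run_batch(prompts: list[str], batch_size: int = 2) -> list[dict]:
      # "Batching" here is just firing batch_size requests concurrently; the
      # server still sees batch_size independent requests, which is why true
      # batch support needs local vLLM or the OpenAI batch API instead.
      results: list[dict] = []
      async with httpx.AsyncClient(timeout=None) as client:
          for i in range(0, len(prompts), batch_size):
              chunk = prompts[i : i + batch_size]
              results += await asyncio.gather(*(send_request(client, p) for p in chunk))
      return results

  # Assumes a server is already running on localhost:8080.
  asyncio.run(run_batch(["Hello world"] * 4, batch_size=2))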

@markurtz markurtz closed this Mar 10, 2025
@github-project-automation github-project-automation bot moved this from In progress to Done in GuideLLM Kanban Board Mar 10, 2025
@markurtz markurtz deleted the parfeniukink/batch-size-cli-parameter branch April 21, 2025 15:02