@parfeniukink (Contributor) commented Feb 27, 2025

Execution example

(py38) ➜  guidellm git:(parfeniukink/concurrent-load-generation-v2) python src/guidellm/main.py --target http://localhost:8080/v1 --model Phi-3-mini-4k-instruct-q4.gguf --data 'prompt_tokens=128,generated_tokens=128' --data-type emulated --tokenizer "hf-internal-testing/llama-tokenizer" --max-requests 2 --rate-type concurrent --rate 2
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
╭─ Benchmarks ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ [14:08:32]   100% concurrent   (0.12 req/sec avg)                                                                                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  Generating report... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ (1/1) [ 0:00:16 < 0:00:00 ]
╭─ GuideLLM Benchmarks Report (stdout) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ╭─ Benchmark Report 1 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │
│ │ Backend(type=openai_server, target=http://localhost:8080/v1, model=Phi-3-mini-4k-instruct-q4.gguf)                                                                                   │ │
│ │ Data(type=emulated, source=prompt_tokens=128,generated_tokens=128, tokenizer=hf-internal-testing/llama-tokenizer)                                                                    │ │
│ │ Rate(type=concurrent, rate=(2.0,))                                                                                                                                                   │ │
│ │ Limits(max_number=2 requests, max_duration=120 sec)                                                                                                                                  │ │
│ │                                                                                                                                                                                      │ │
│ │                                                                                                                                                                                      │ │
│ │ Requests Data by Benchmark                                                                                                                                                           │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓                                                                              │ │
│ │ ┃ Benchmark                 ┃ Requests Completed ┃ Request Failed ┃ Duration  ┃ Start Time ┃ End Time ┃                                                                              │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩                                                                              │ │
│ │ │ [email protected] req/sec │ 4/4                │ 0/4            │ 32.05 sec │ 14:08:32   │ 14:09:04 │                                                                              │ │
│ │ └───────────────────────────┴────────────────────┴────────────────┴───────────┴────────────┴──────────┘                                                                              │ │
│ │                                                                                                                                                                                      │ │
│ │ Tokens Data by Benchmark                                                                                                                                                             │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓                                                              │ │
│ │ ┃ Benchmark                 ┃ Prompt ┃ Prompt (1%, 5%, 50%, 95%, 99%)    ┃ Output ┃ Output (1%, 5%, 50%, 95%, 99%)    ┃                                                              │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩                                                              │ │
│ │ │ [email protected] req/sec │ 129.00 │ 129.0, 129.0, 129.0, 129.0, 129.0 │ 128.00 │ 128.0, 128.0, 128.0, 128.0, 128.0 │                                                              │ │
│ │ └───────────────────────────┴────────┴───────────────────────────────────┴────────┴───────────────────────────────────┘                                                              │ │
│ │                                                                                                                                                                                      │ │
│ │ Performance Stats by Benchmark                                                                                                                                                       │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │ │
│ │ ┃                           ┃ Request Latency [1%, 5%, 10%, 50%, 90%, 95%,    ┃ Time to First Token [1%, 5%, 10%, 50%, 90%,     ┃ Inter Token Latency [1%, 5%, 10%, 50%, 90% 95%,  ┃ │ │
│ │ ┃ Benchmark                 ┃ 99%] (sec)                                      ┃ 95%, 99%] (ms)                                  ┃ 99%] (ms)                                        ┃ │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ │
│ │ │ [email protected] req/sec │ 8.50, 9.46, 10.65, 20.20, 29.68, 30.85, 31.80   │ 1365.0, 2316.8, 3506.6, 13039.3, 22588.4,       │ 50.8, 51.4, 51.7, 54.7, 62.3, 68.1, 73.6         │ │ │
│ │ │                           │                                                 │ 23781.6, 24736.2                                │                                                  │ │ │
│ │ └───────────────────────────┴─────────────────────────────────────────────────┴─────────────────────────────────────────────────┴──────────────────────────────────────────────────┘ │ │
│ │                                                                                                                                                                                      │ │
│ │ Performance Summary by Benchmark                                                                                                                                                     │ │
│ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓                                          │ │
│ │ ┃ Benchmark                 ┃ Requests per Second ┃ Request Latency ┃ Time to First Token ┃ Inter Token Latency ┃ Output Token Throughput ┃                                          │ │
│ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩                                          │ │
│ │ │ [email protected] req/sec │ 0.12 req/sec        │ 20.17 sec       │ 13045.15 ms         │ 56.12 ms            │ 15.98 tokens/sec        │                                          │ │
│ │ └───────────────────────────┴─────────────────────┴─────────────────┴─────────────────────┴─────────────────────┴─────────────────────────┘                                          │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
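As a sanity check on the summary table, the headline rates can be recomputed from the counts and duration reported above (4 completed requests, 128 output tokens each, 32.05 s total):

```python
# Cross-check of the "Performance Summary" row: the request rate and output
# token throughput follow directly from the counts reported in the tables.
completed_requests = 4
output_tokens_per_request = 128
duration_s = 32.05

req_per_sec = completed_requests / duration_s
tok_per_sec = completed_requests * output_tokens_per_request / duration_s

print(round(req_per_sec, 2))  # matches the reported 0.12 req/sec
print(round(tok_per_sec, 2))  # matches the reported 15.98 tokens/sec
```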

Local tox report

For some reason GitHub is having issues with the quality checks; here is the report from a local run:

[Screenshot: local tox report terminal output (Alacritty, 2025-02-27)]

Use the `--rate` CLI parameter to specify the number of concurrent workers.
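As a sketch of how the flags fit together, the invocation from the execution example can be assembled programmatically. The values are copied from that example; `build_command` is an illustrative helper, not part of GuideLLM:

```python
# Illustrative helper (not part of GuideLLM): assemble the CLI invocation
# from the execution example above. With --rate-type concurrent, --rate sets
# the number of concurrent workers rather than a requests-per-second target.
def build_command(workers: int, max_requests: int) -> str:
    args = [
        "python", "src/guidellm/main.py",
        "--target", "http://localhost:8080/v1",
        "--rate-type", "concurrent",
        "--rate", str(workers),              # concurrent worker count
        "--max-requests", str(max_requests),
    ]
    return " ".join(args)

command = build_command(workers=2, max_requests=2)
```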
@markurtz (Collaborator)

Closing this out as it is being reworked and included in #96

@markurtz markurtz closed this Mar 10, 2025
@github-project-automation github-project-automation bot moved this from In progress to Done in GuideLLM Kanban Board Mar 10, 2025
markurtz added a commit that referenced this pull request Apr 11, 2025
…ation Refactor (#96)

Full refactor of GuideLLM that improves overall performance and minimizes benchmarking overhead via a new multiprocess, threaded scheduler, along with significant updates to the output formats for better analysis, visibility, and clarity.

<img width="668" alt="Screenshot 2025-04-11 at 2 26 13 PM"
src="https://github.com/user-attachments/assets/a723854a-7fe0-4eb2-9408-f632e747c3c2"
/>
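The "concurrent" load pattern described above can be pictured with a minimal sketch. This illustrates the general idea (a bounded number of requests in flight at once), not the actual GuideLLM scheduler, which is multiprocess and threaded; `send_request` is a stand-in for the real HTTP call:

```python
# Minimal sketch of a "concurrent" load pattern: at most `workers` requests
# are in flight at any time. A thread pool stands in for GuideLLM's real
# multiprocess/threaded scheduler to keep the example self-contained.
from concurrent.futures import ThreadPoolExecutor
import time

def send_request(i: int) -> int:
    time.sleep(0.01)  # stand-in for the HTTP round trip to the server
    return i

def run_benchmark(workers: int, max_requests: int) -> list:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves submission order in its results
        return list(pool.map(send_request, range(max_requests)))

results = run_benchmark(workers=2, max_requests=4)
```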

Fixes:
- #92 
- #77 
- #47 
- #79

---------

Co-authored-by: Alexandre Marques <[email protected]>
Co-authored-by: Samuel Monson <[email protected]>
Co-authored-by: David Gray <[email protected]>
@markurtz markurtz deleted the parfeniukink/concurrent-load-generation-v2 branch April 21, 2025 15:02
Labels: load-request (load-request workstream)
