Description
Describe the bug
Guidellm benchmarking fails when run against SGLang 0.4.6 servers. The same configuration works with SGLang 0.5.5.post3, but requests sent to SGLang 0.4.6 produce extremely long generation sequences (hundreds to thousands of tokens), causing requests to hang and constraints such as --max-requests to be ignored.
Expected behavior
Guidellm should be able to benchmark SGLang's OpenAI-compatible API normally: generation should stop at the requested output_tokens limit, and --max-requests should end the run after the specified number of requests.
Environment
- OS [e.g. Ubuntu 22.04]:
- Python version [e.g. 3.13.7]:
- Guidellm version: latest (from source)
- Working SGLang version: 0.5.5.post3
- Non-working SGLang version: 0.4.6
- Model: Qwen3-32B
To Reproduce
Exact steps to reproduce the behavior:
Run the following benchmark command against the target server:
guidellm benchmark \
  --target "http://ip:port" \
  --processor "/root/qwen-model/tokenizer/qwen3-32b" \
  --rate-type "concurrent" \
  --rate 1 \
  --max-requests 1 \
  --data "prompt_tokens=32,output_tokens=32,samples=1"
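To isolate whether the problem is in guidellm or in the server, it can help to bypass guidellm entirely and send one request by hand. Below is a minimal sketch (not part of the original report) that assumes the server exposes the standard OpenAI-compatible /v1/completions endpoint at http://ip:port and serves the model under the name qwen3-32b; adjust both to your deployment.

```python
# Hypothetical diagnostic: send one completion request directly to the
# SGLang server and check whether max_tokens is enforced, independent
# of guidellm. Endpoint URL and model name are assumptions.
import requests

resp = requests.post(
    "http://ip:port/v1/completions",
    json={
        "model": "qwen3-32b",  # assumed served model name
        "prompt": "Hello",
        "max_tokens": 32,      # the limit guidellm is expected to apply
    },
    timeout=60,
)
resp.raise_for_status()
usage = resp.json().get("usage", {})
# A value <= 32 here means the server enforces the cap and the bug is
# likely in how guidellm builds the request; a much larger number
# points at the server (or at a missing/ignored field) instead.
print("completion_tokens:", usage.get("completion_tokens"))
```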
When using guidellm to benchmark against SGLang 0.4.6, the following issues occur:
- Unlimited token generation: requests with output_tokens=64 generate hundreds or thousands of tokens instead of stopping at the specified limit
- Request state management failure: the affected requests never complete in guidellm, so the --max-requests constraint is never satisfied
Errors
Suspected parameter mapping issue: guidellm's output_tokens=64 setting does not appear to map to SGLang 0.4.6's max_tokens parameter, so the server never receives an effective generation limit.
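One plausible cause, offered here as an assumption rather than a confirmed diagnosis, is a field-name mismatch: OpenAI-style clients may send the newer max_completion_tokens field on /v1/chat/completions, which an older server can silently ignore, while the classic max_tokens field is still honored. The sketch below probes both names against the same server; the endpoint and model name are again assumptions.

```python
# Hypothetical probe: send the same chat request twice, once with each
# token-limit field, and compare how many tokens the server generates.
# If max_tokens is respected but max_completion_tokens is not, the
# mismatch would explain the runaway generation on SGLang 0.4.6.
import requests

for field in ("max_tokens", "max_completion_tokens"):
    resp = requests.post(
        "http://ip:port/v1/chat/completions",  # assumed endpoint
        json={
            "model": "qwen3-32b",  # assumed served model name
            "messages": [{"role": "user", "content": "Hello"}],
            field: 32,
        },
        timeout=60,
    )
    used = resp.json().get("usage", {}).get("completion_tokens")
    print(f"{field}: completion_tokens={used}")
```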