@git-jxj git-jxj commented Sep 11, 2025

Summary

This PR fixes a KeyError: 'content' that occurs when processing streaming chat completions.

Details

When using the chat_completions endpoint with stream=True, the final delta chunk sent by the server may not contain a content key; this is standard API behavior that signals the end of the stream.

The existing code in _extract_completions_delta_content did not account for this possibility and tried to access delta['content'] directly, leading to a KeyError and causing the benchmark process to crash when the stream ended.

Test Plan

This was discovered while running guidellm benchmark against an OpenAI-compatible API endpoint (via litellm) that correctly implements the streaming protocol.

guidellm benchmark \
  --target "http://10.64.1.62:4000/v1" \
  --model "qwen3-06b-2" \
  --processor "Qwen/Qwen3-0.6B" \
  --rate-type "synchronous" \
  --max-requests 1 \
  --data "prompt_tokens=32,output_tokens=32,samples=1"

Related Issues

#315

  • Resolves #

  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

sjmonson previously approved these changes Sep 11, 2025

@sjmonson sjmonson left a comment
Looks good to me but needs signoff.

@git-jxj git-jxj force-pushed the fix/streaming-keyerror-content branch from 55d32b0 to 697ea5e Compare September 12, 2025 02:41
@sjmonson sjmonson merged commit 0ce21da into vllm-project:main Sep 12, 2025
17 checks passed
tukwila pushed a commit to tukwila/guidellm that referenced this pull request Sep 17, 2025
…#316)

Signed-off-by: xinjun.jiang <[email protected]>
Co-authored-by: Samuel Monson <[email protected]>