@Xunzhuo Xunzhuo commented Oct 9, 2025

When a semantic cache hit occurred on a streaming request, the cached response (in chat.completion JSON format) was returned directly without being converted to SSE format (chat.completion.chunk), so streaming clients received malformed responses.

This fix:

  • Updates CreateCacheHitResponse() to accept isStreaming parameter
  • Converts cached chat.completion to chat.completion.chunk format for streaming
  • Sets appropriate content-type header (text/event-stream vs application/json)
  • Maintains backward compatibility for non-streaming requests
  • Adds comprehensive unit tests for both streaming and non-streaming cases

Similar to the fix in a0f0581 for jailbreak/PII violations, this ensures consistent response format handling across all direct response scenarios.

Resolves streaming client hanging issues when cache hits occur.

Signed-off-by: bitliu <[email protected]>

netlify bot commented Oct 9, 2025

Deploy Preview for vllm-semantic-router ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | c18becb |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/68e769c3c92285000832c611 |
| 😎 Deploy Preview | https://deploy-preview-378--vllm-semantic-router.netlify.app |


github-actions bot commented Oct 9, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/utils/http/response_test.go
  • src/semantic-router/pkg/extproc/request_handler.go
  • src/semantic-router/pkg/utils/http/response.go


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

Signed-off-by: bitliu <[email protected]>
@rootfs rootfs merged commit be38a4b into main Oct 9, 2025
9 checks passed
@Xunzhuo Xunzhuo deleted the fix/semantic-cache-streaming-response branch October 10, 2025 11:43
joyful-ii-V-I pushed a commit to joyful-ii-V-I/semantic-router that referenced this pull request Oct 13, 2025
…-project#378)

* fix: resolve semantic cache hit streaming response format issue


Signed-off-by: bitliu <[email protected]>

* fix lint

Signed-off-by: bitliu <[email protected]>

---------

Signed-off-by: bitliu <[email protected]>
