Unexpected behavior of PostResponse plugins for streaming requests #1483

@dagrayvid

Currently, PostResponse functions in plugins are called when the response headers are handled (HandleResponseHeaders -> s.director.HandleResponse -> s.runPostResponsePlugins). As a result, for streaming requests the PostResponse plugin is called almost immediately after the request is made, not at the end of the response.

This is unexpected. We should either change when PostResponse is called or add a separate hook that is called at the end of the response.

This came up while trying out the active-request scorer in llm-d-inference-scheduler; the scorer currently does not work for streaming requests.
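To make the timing problem concrete, here is a minimal, self-contained sketch. It is not the project's code: `tracker`, `PreRequest`, `ResponseComplete`, `decrementOnComplete`, and `simulateStream` are hypothetical names invented for illustration, and the sketch only assumes that an active-request scorer needs an accurate count of in-flight requests. It compares decrementing that count when headers are handled (the current PostResponse timing) with decrementing it in a hypothetical end-of-response hook.

```go
package main

import "fmt"

// tracker is a hypothetical stand-in for the state an active-request
// scorer might keep. It is NOT the real plugin type from either project.
type tracker struct {
	// decrementOnComplete selects which hook releases the request:
	// false = PostResponse (headers time), true = ResponseComplete.
	decrementOnComplete bool
	active              int
}

// PreRequest marks a request as in flight.
func (t *tracker) PreRequest() { t.active++ }

// PostResponse models the current timing: it runs when the response
// headers are handled, before any streamed chunk has been sent.
func (t *tracker) PostResponse() {
	if !t.decrementOnComplete {
		t.active--
	}
}

// ResponseComplete is the hypothetical extra hook proposed in this
// issue: it runs once, after the final chunk of the response.
func (t *tracker) ResponseComplete() {
	if t.decrementOnComplete {
		t.active--
	}
}

// simulateStream walks through one streaming request and returns the
// active-request count observed while each chunk is being streamed.
func simulateStream(t *tracker, chunks []string) []int {
	t.PreRequest()
	t.PostResponse() // response headers handled
	observed := make([]int, 0, len(chunks))
	for range chunks {
		observed = append(observed, t.active)
	}
	t.ResponseComplete() // end of stream
	return observed
}

func main() {
	chunks := []string{"tok1", "tok2", "tok3"}

	// Releasing at headers time: the request looks finished while it
	// is still streaming.
	fmt.Println(simulateStream(&tracker{}, chunks)) // [0 0 0]

	// Releasing at end of stream: the request is counted as active
	// for the whole response.
	fmt.Println(simulateStream(&tracker{decrementOnComplete: true}, chunks)) // [1 1 1]
}
```

With the headers-time hook the count drops to zero before the first chunk is streamed, which would explain a scorer undercounting load for streaming requests; with an end-of-response hook the request stays counted until the stream finishes.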

cc: @vMaroon

Labels: triage/accepted