generated from kubernetes/kubernetes-template-project
-
Notifications
You must be signed in to change notification settings - Fork 176
Open
Labels
triage/acceptedIndicates an issue or PR is ready to be actively worked on.Indicates an issue or PR is ready to be actively worked on.
Description
Currently PostResponse functions in plugins are called when the response headers are handled (HandleResponseHeaders -> S.directory.HandleResponse -> s.runPostResponsePlugins). This means that for streaming requests, the PostResponse plugin is called almost immediately after the request is made, not at the end of the response.
This is unexpected and we should change this or add a separate hook which is called at the end of the response.
This came up when trying out the active-request scorer in llm-d-inference-scheduler, which currently does not work for streaming requests.
cc: @vMaroon
Metadata
Metadata
Assignees
Labels
triage/acceptedIndicates an issue or PR is ready to be actively worked on.Indicates an issue or PR is ready to be actively worked on.