Description
What would you like to be added:
- Support the following requirement for batch inference via an inter-flow policy: when the interactive load is high, forward mostly interactive requests with a trickle of batch requests; when the interactive load is low, forward a growing stream of batch requests.
  While the above considers two flows (interactive and batch), the policy could be generic: it could support any number of flows with a hierarchy, a minimum forwarding quantity per flow (the minimum can correlate with the flow's relative priority), and logic that calculates the forwarding quantity for each flow from the flows' load metrics.
- The current default inter-flow policy risks starving lower-priority flows. The requirement here is to implement a policy that prevents starvation (e.g. the suggestion above) and to set it as the default policy.
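To make the suggestion concrete, here is a minimal sketch of one way such a policy could compute per-flow forwarding quantities for a single dispatch cycle. It is illustrative only and is not the project's API: `Flow`, `allocate`, `min_quota`, and the use of queue depth as the load metric are all hypothetical names and assumptions. It also assumes the sum of the minimum quotas does not exceed the cycle capacity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical per-flow state; field names are assumptions."""
    name: str
    priority: int   # higher value = more important (assumption)
    min_quota: int  # guaranteed slots per dispatch cycle (assumption)
    queued: int     # load metric: requests currently waiting


def allocate(capacity: int, flows: list[Flow]) -> dict[str, int]:
    """Split `capacity` dispatch slots among flows for one cycle."""
    grants = {f.name: 0 for f in flows}
    remaining = capacity
    by_priority = sorted(flows, key=lambda f: f.priority, reverse=True)

    # Pass 1: hand out each flow's guaranteed minimum, bounded by its
    # demand, so a flow with queued work still gets its trickle.
    for f in by_priority:
        grant = min(f.min_quota, f.queued, remaining)
        grants[f.name] += grant
        remaining -= grant

    # Pass 2: give leftover slots in strict priority order, bounded by
    # each flow's remaining demand. Unused slots fall to lower priorities.
    for f in by_priority:
        grant = min(f.queued - grants[f.name], remaining)
        grants[f.name] += grant
        remaining -= grant

    return grants
```

With a capacity of 10, an interactive flow (priority 10, minimum 2) and a batch flow (priority 1, minimum 1), a backlog of 50 requests in each flow yields 9 interactive and 1 batch grant. If the interactive backlog drops to 3, the same call yields 3 interactive and 7 batch grants, which matches the "trickle, then growing stream" behavior described above.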
Why is this needed:
- To facilitate batch inference and additional use cases.
- To prevent starvation of lower-priority flows.