[Flow Control] Inter-flow policies to support batch inference and to prevent starvation of flows

**What would you like to be added**:

1. Support batch inference through an inter-flow policy with the following behavior: when the interactive workload is high, forward mostly interactive requests with a trickle of batch requests; when the interactive workload is low, forward a growing stream of batch requests.
Although the example above considers two flows (interactive and batch), the policy could be generic: it could support any number of flows arranged in a hierarchy, a minimum forwarding quantity per flow (which can correlate with the flow's relative priority), and logic that calculates the forwarding quantity for each flow from the flows' load metrics.

2. The current default inter-flow policy risks starving lower-priority flows. The requirement is to implement a policy that prevents starvation (e.g. the one suggested above) and to make it the default policy.
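
To make the two requirements concrete, here is a minimal sketch of one way such a policy could compute per-flow forwarding quantities. It is written in Python for brevity and is not the project's API: `Flow`, `min_slots`, `queued`, and `allocate_slots` are hypothetical names, and using queue depth as the load metric is an assumption. Each dispatch cycle first grants every flow its guaranteed minimum (which prevents starvation) and then hands the leftover capacity to flows in priority order.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    priority: int   # hypothetical: higher value means more important
    min_slots: int  # hypothetical guaranteed minimum per dispatch cycle
    queued: int     # assumed load metric: requests currently waiting


def allocate_slots(flows: list[Flow], capacity: int) -> dict[str, int]:
    """Split `capacity` dispatch slots among flows for one cycle."""
    ordered = sorted(flows, key=lambda f: f.priority, reverse=True)
    grants = {f.name: 0 for f in flows}
    remaining = capacity

    # Pass 1: every flow with queued work gets its guaranteed minimum,
    # so a lower-priority flow is never starved outright.
    for f in ordered:
        grant = min(f.min_slots, f.queued, remaining)
        grants[f.name] += grant
        remaining -= grant

    # Pass 2: leftover slots go to flows in priority order, bounded by demand.
    for f in ordered:
        grant = min(f.queued - grants[f.name], remaining)
        grants[f.name] += grant
        remaining -= grant

    return grants


# Heavy interactive load: batch receives only its trickle.
busy = [Flow("interactive", 1, 2, 50), Flow("batch", 0, 1, 100)]
print(allocate_slots(busy, capacity=10))

# Light interactive load: batch absorbs the spare capacity.
quiet = [Flow("interactive", 1, 2, 3), Flow("batch", 0, 1, 100)]
print(allocate_slots(quiet, capacity=10))
```

With 10 slots per cycle, the busy case yields 9 interactive and 1 batch, while the quiet case yields 3 interactive and 7 batch. A real policy would likely work on richer load metrics and a flow hierarchy; this sketch only illustrates the minimum-plus-priority idea.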

**Why is this needed**:

1. To facilitate batch inference and additional use cases.
2. To prevent starvation of lower-priority flows.

[Flow Control] Inter-flow policies to support batch inference and to prevent starvation of flows #1861
