
[Flow Control] Inter-flow policies to support batch inference and to prevent starvation of flows #1861

@lioraron


What would you like to be added:

  1. Support the following batch-inference requirement via an inter-flow policy: when the interactive load is high, forward mostly interactive requests with a trickle of batch requests; when the interactive load is low, forward a growing stream of batch requests.
    Although the above considers two flows (interactive and batch), the policy could be generic: it could support any number of flows arranged in a hierarchy, a minimum forwarding quantity per flow (which can correlate with the flow's relative priority), and logic that calculates each flow's forwarding quantity from the flows' load metrics.

  2. In the current implementation, the default inter-flow policy risks starving lower-priority flows. The requirement here is to implement a policy that prevents starvation (e.g. the one suggested above) and to make it the default policy.
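
To make the idea concrete, here is a minimal sketch of one way such a policy could compute per-flow forwarding quantities for a single dispatch cycle. It is in Python purely for illustration and is not the existing flow-control API: `Flow`, `min_quantity`, `allocate`, and the per-cycle `capacity` are all hypothetical names, and "higher `priority` value is served first" is an assumption. Each flow first receives its guaranteed minimum (so no backlogged flow with a non-zero minimum is starved), then the leftover capacity is handed out in priority order.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    priority: int      # assumption: higher value is served first
    min_quantity: int  # hypothetical guaranteed requests per cycle
    queued: int        # requests currently waiting in this flow


def allocate(flows, capacity):
    """Return {flow name: requests to forward in this cycle}."""
    grants = {f.name: 0 for f in flows}
    remaining = capacity
    ordered = sorted(flows, key=lambda f: f.priority, reverse=True)

    # Pass 1: give every flow its guaranteed minimum, bounded by its
    # backlog and by whatever capacity is still left.
    for f in ordered:
        guaranteed = min(f.queued, f.min_quantity, remaining)
        grants[f.name] = guaranteed
        remaining -= guaranteed

    # Pass 2: hand out leftover capacity in strict priority order.
    for f in ordered:
        extra = min(f.queued - grants[f.name], remaining)
        grants[f.name] += extra
        remaining -= extra

    return grants


# High interactive load: batch only gets its guaranteed trickle.
busy = [Flow("interactive", 1, 4, 100), Flow("batch", 0, 2, 100)]
print(allocate(busy, 20))   # {'interactive': 18, 'batch': 2}

# Low interactive load: batch absorbs the unused capacity.
quiet = [Flow("interactive", 1, 4, 3), Flow("batch", 0, 2, 100)]
print(allocate(quiet, 20))  # {'interactive': 3, 'batch': 17}
```

In this sketch the "growing stream" of batch requests falls out of pass 2: the fewer interactive requests are queued, the more capacity is left for batch. A real policy would presumably derive the quantities from richer load metrics and a flow hierarchy, which this example does not model.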

Why is this needed:

  1. To facilitate batch inference and additional use cases.
  2. To prevent starvation of lower-priority flows.


Labels: needs-triage
