
[Flow Control] Reconciliation of the flow control logic with the scheduling logic #1860

@lioraron

Description


What would you like to be added:
In the current implementation, a decision made by the flow control component can conflict with the subsequent decision made by the scheduling component. For example, flow control decides (based on logic 1) that the pool has some capacity overall and forwards some requests, expecting them to land on model replicas that have capacity. The scheduler then decides (based on logic 2) to route these requests to a busier model replica, where they end up queued locally. This can cause underutilization of resources and excess latency for inference requests.
The requirement is to add a mechanism that enables better knowledge sharing between these components, so that their decisions align.
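
To make the conflict concrete, here is a minimal toy sketch. It is not the project's code: every name in it (`Replica`, `capacity`, `prefix_hit`, `flow_control_admits`, `schedule`, `schedule_aligned`) is hypothetical, and "logic 1" and "logic 2" are stand-ins I chose for illustration (aggregate pool headroom versus cache-affinity scoring). It shows flow control admitting a request on pool-level capacity while the scheduler sends it to a saturated replica, and one possible way to align the two by sharing the per-replica headroom view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    queued: int               # requests currently waiting on the replica
    capacity: int             # hypothetical per-replica limit (assumption)
    prefix_hit: bool = False  # hypothetical scheduler-only signal (assumption)

    @property
    def has_headroom(self) -> bool:
        return self.queued < self.capacity


def flow_control_admits(pool):
    """Stand-in for logic 1: admit if the pool as a whole has spare capacity."""
    return sum(r.queued for r in pool) < sum(r.capacity for r in pool)


def schedule(pool):
    """Stand-in for logic 2: prefer cache affinity, then the shortest queue.

    Note that it never checks whether the chosen replica has headroom.
    """
    return max(pool, key=lambda r: (r.prefix_hit, -r.queued))


def schedule_aligned(pool):
    """Same scoring, restricted to replicas whose headroom justified admission."""
    candidates = [r for r in pool if r.has_headroom]
    if not candidates:
        return None  # nothing to route to; the request should stay in flow control
    return schedule(candidates)


pool = [
    Replica("a", queued=8, capacity=8, prefix_hit=True),  # saturated, but cache-warm
    Replica("b", queued=1, capacity=8),                   # mostly idle
]

admitted = flow_control_admits(pool)
independent = schedule(pool)
aligned = schedule_aligned(pool)

print(admitted, independent.name, independent.has_headroom, aligned.name)
```

In this toy pool the request is admitted because the pool has aggregate headroom, the independent scheduler picks replica `a` (which has none, so the request queues locally), and the aligned variant picks replica `b`. Filtering by shared headroom is only one possible alignment mechanism; the issue itself does not prescribe a design.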

Why is this needed:
Without the above, the decisions of flow control and the scheduler can conflict, causing underutilization of resources and excess latency for inference requests.

Metadata

Labels

needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

