Description
What would you like to be added:
In the current implementation, a decision made by the flow control component can conflict with the subsequent decision made by the scheduling component. For example, flow control decides (based on logic 1) that the pool has some capacity overall and forwards some requests, expecting them to land on model replicas that have capacity. The scheduler then decides (based on logic 2) to route these requests to a busier model replica, where they end up queued locally. This can cause underutilization of resources and excess latency for inference requests.
The requirement is to add a mechanism to enable better knowledge sharing between these components, to ensure that their decisions align.
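To make the request concrete, here is one possible shape for such a mechanism, written as a minimal Python sketch rather than a proposal for the actual implementation. It assumes that both components consult a single shared capacity view, that flow control admits a request only when it can reserve a slot on a specific replica, and that the scheduler honors that reservation. All names (`Replica`, `SharedCapacityView`, `flow_control_admit`, `schedule`, `max_in_flight`) are hypothetical and do not correspond to anything in the existing code base.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Replica:
    name: str
    max_in_flight: int  # hypothetical per-replica concurrency limit
    in_flight: int = 0

    def free_slots(self) -> int:
        return self.max_in_flight - self.in_flight


@dataclass
class SharedCapacityView:
    """Hypothetical state shared by flow control and the scheduler."""
    replicas: Dict[str, Replica] = field(default_factory=dict)

    def add(self, replica: Replica) -> None:
        self.replicas[replica.name] = replica

    def reserve(self) -> Optional[str]:
        """Reserve a slot on the replica with the most free slots.

        Returns the replica name, or None when no replica has a free slot.
        """
        candidates = [r for r in self.replicas.values() if r.free_slots() > 0]
        if not candidates:
            return None
        best = max(candidates, key=lambda r: r.free_slots())
        best.in_flight += 1
        return best.name

    def release(self, name: str) -> None:
        """Free the slot once the request has completed."""
        self.replicas[name].in_flight -= 1


def flow_control_admit(view: SharedCapacityView) -> Optional[str]:
    """Admit only if a concrete replica slot can be reserved; None means hold."""
    return view.reserve()


def schedule(reserved_replica: str) -> str:
    """The scheduler honors the reservation instead of re-deciding separately."""
    return reserved_replica


if __name__ == "__main__":
    view = SharedCapacityView()
    view.add(Replica("replica-a", max_in_flight=1))
    view.add(Replica("replica-b", max_in_flight=2))
    for _ in range(4):
        reserved = flow_control_admit(view)
        print("held in flow control" if reserved is None else schedule(reserved))
```

In this sketch a request leaves flow control only when a specific replica has room for it, so it cannot end up queued locally on a busy replica. A real design would also need to handle concurrency, stale metrics, and richer scheduling criteria, none of which are modeled here.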
Why is this needed:
Without such a mechanism, the decisions of flow control and the scheduler can conflict, causing underutilization of resources and excess latency for inference requests.