metrics staleness check should be moved from saturation detector to getting candidate pods per request #1497

@nirrozenbaum

What happened:
Today, SaturationDetector checks 3 conditions per pod:

  1. Metrics are fresh (not stale).
  2. WaitingQueueSize <= QueueDepthThreshold.
  3. KVCacheUsagePercent <= KVCacheUtilThreshold.
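
As a rough illustration of the three conditions above, here is a minimal sketch of a per-pod check. It is not the project's actual code: the `PodMetrics` and `Config` types, the `MetricsStalenessThreshold` field, the `hasCapacity` and `exampleVerdicts` helpers, and all threshold values are assumptions made up for this example. Only `WaitingQueueSize`, `KVCacheUsagePercent`, `QueueDepthThreshold` and `KVCacheUtilThreshold` are names taken from this issue.

```go
package main

import (
	"fmt"
	"time"
)

// PodMetrics is a hypothetical, simplified view of the metrics scraped from one pod.
type PodMetrics struct {
	WaitingQueueSize    int
	KVCacheUsagePercent float64   // assumed here to be a fraction in [0, 1]
	UpdateTime          time.Time // when these metrics were last refreshed
}

// Config holds the thresholds. MetricsStalenessThreshold is an assumed name.
type Config struct {
	QueueDepthThreshold       int
	KVCacheUtilThreshold      float64
	MetricsStalenessThreshold time.Duration
}

// hasCapacity applies the three per-pod conditions listed in the issue.
func hasCapacity(m PodMetrics, cfg Config, now time.Time) bool {
	if now.Sub(m.UpdateTime) > cfg.MetricsStalenessThreshold {
		return false // 1. metrics are stale
	}
	if m.WaitingQueueSize > cfg.QueueDepthThreshold {
		return false // 2. waiting queue is too deep
	}
	return m.KVCacheUsagePercent <= cfg.KVCacheUtilThreshold // 3. KV cache usage
}

// exampleVerdicts evaluates three illustrative pods against illustrative thresholds.
func exampleVerdicts() map[string]bool {
	now := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC) // fixed clock for determinism
	cfg := Config{
		QueueDepthThreshold:       5,
		KVCacheUtilThreshold:      0.8,
		MetricsStalenessThreshold: 200 * time.Millisecond,
	}
	pods := map[string]PodMetrics{
		"fresh-idle": {WaitingQueueSize: 0, KVCacheUsagePercent: 0, UpdateTime: now.Add(-50 * time.Millisecond)},
		"stale-idle": {WaitingQueueSize: 0, KVCacheUsagePercent: 0, UpdateTime: now.Add(-10 * time.Second)},
		"fresh-busy": {WaitingQueueSize: 9, KVCacheUsagePercent: 0.5, UpdateTime: now.Add(-50 * time.Millisecond)},
	}
	verdicts := make(map[string]bool, len(pods))
	for name, m := range pods {
		verdicts[name] = hasCapacity(m, cfg, now)
	}
	return verdicts
}

func main() {
	for name, ok := range exampleVerdicts() {
		fmt.Printf("%s has capacity: %v\n", name, ok)
	}
}
```

Note that in this sketch a stale pod only fails the capacity check; nothing removes it from the set of pods that can be picked later.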

Consider the case where a pod came up and hasn't reported metrics for a while. Assume that the last time it reported, it was in great shape (e.g., on startup: waiting queue size = 0, KV cache usage = 0).

So we have:
pod1 - able to serve requests and reports as healthy, but hasn't reported metrics for a while; the last time we collected metrics from it, it was idle.
pod2 - partially busy.
pod3 - partially busy.

For every incoming request, saturation detection identifies that pod1 shouldn't be considered as a final target. However, pod1 can still be selected to serve the request, and because its stale metrics show it as idle, it has a high chance of being selected.
This can happen continuously, so multiple requests can be sent to pod1, leading to an unpredictable state (pod1 may not be able to handle the load).

What you expected to happen:
Metrics freshness shouldn't be part of the saturation detector. Instead, pods with stale metrics should be filtered out when getting the candidate pods from the datastore. Stale metrics are not a sign of saturation; they are a sign of an invalid pod that shouldn't be considered for serving the request.
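
One possible shape for this, sketched under assumptions: drop pods with stale metrics at the point where candidates are listed, so that later selection never sees them. The `Pod` type, `freshCandidates`, `leastQueued`, `scenario`, the 200ms staleness limit and the metric values are all hypothetical and chosen only to mirror the pod1/pod2/pod3 example above; `leastQueued` is a stand-in for whatever selection logic actually runs.

```go
package main

import (
	"fmt"
	"time"
)

// Pod is a hypothetical candidate with its last known metrics.
type Pod struct {
	Name             string
	WaitingQueueSize int
	UpdateTime       time.Time // when metrics were last refreshed
}

// freshCandidates keeps only pods whose metrics were refreshed within maxAge of now.
func freshCandidates(pods []Pod, maxAge time.Duration, now time.Time) []Pod {
	fresh := make([]Pod, 0, len(pods))
	for _, p := range pods {
		if now.Sub(p.UpdateTime) <= maxAge {
			fresh = append(fresh, p)
		}
	}
	return fresh
}

// leastQueued is a stand-in picker: it returns the pod with the shortest waiting queue.
// The boolean is false when there are no pods to choose from.
func leastQueued(pods []Pod) (Pod, bool) {
	if len(pods) == 0 {
		return Pod{}, false
	}
	best := pods[0]
	for _, p := range pods[1:] {
		if p.WaitingQueueSize < best.WaitingQueueSize {
			best = p
		}
	}
	return best, true
}

// scenario reproduces the issue's example and returns the picked pod name
// without and with the staleness filter.
func scenario() (withoutFilter, withFilter string) {
	now := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC) // fixed clock for determinism
	pods := []Pod{
		{Name: "pod1", WaitingQueueSize: 0, UpdateTime: now.Add(-30 * time.Second)},      // stale, looked idle
		{Name: "pod2", WaitingQueueSize: 2, UpdateTime: now.Add(-50 * time.Millisecond)}, // partially busy
		{Name: "pod3", WaitingQueueSize: 3, UpdateTime: now.Add(-80 * time.Millisecond)}, // partially busy
	}
	if p, ok := leastQueued(pods); ok {
		withoutFilter = p.Name
	}
	if p, ok := leastQueued(freshCandidates(pods, 200*time.Millisecond, now)); ok {
		withFilter = p.Name
	}
	return withoutFilter, withFilter
}

func main() {
	without, with := scenario()
	fmt.Println("picked without staleness filter:", without)
	fmt.Println("picked with staleness filter:", with)
}
```

In this sketch the unfiltered pick is pod1, because its stale metrics make it look idle, while the filtered pick is pod2. With the filter applied at candidate listing, a saturation check would only need to look at queue depth and KV cache usage.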

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Inference extension version (use git describe --tags --dirty --always):
  • Cloud provider or hardware configuration:
  • Install tools:
  • Others:

cc: @LukeAVanDrie

Labels

kind/bug, triage/accepted
