Description
What happened:
Today the SaturationDetector checks three conditions per pod (sketched below):
- Metrics are fresh (not stale).
- WaitingQueueSize <= QueueDepthThreshold.
- KVCacheUsagePercent <= KVCacheUtilThreshold.
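A minimal sketch (in Go) of the per-pod check described above, assuming hypothetical `podMetrics` and `detectorConfig` types; the extension's actual SaturationDetector types, field names, and thresholds may differ:

```go
package main

import (
	"fmt"
	"time"
)

// podMetrics and detectorConfig are hypothetical stand-ins for the
// extension's real metrics and configuration types.
type podMetrics struct {
	UpdateTime          time.Time // when metrics were last scraped
	WaitingQueueSize    int
	KVCacheUsagePercent float64
}

type detectorConfig struct {
	MetricsStalenessThreshold time.Duration
	QueueDepthThreshold       int
	KVCacheUtilThreshold      float64
}

// hasCapacity mirrors the three conditions listed above: fresh metrics,
// waiting queue depth at or below the threshold, and KV-cache utilization
// at or below the threshold. Stale metrics make the pod count as having
// no capacity, which is the behavior this issue questions.
func hasCapacity(now time.Time, m podMetrics, cfg detectorConfig) bool {
	if now.Sub(m.UpdateTime) > cfg.MetricsStalenessThreshold {
		return false
	}
	return m.WaitingQueueSize <= cfg.QueueDepthThreshold &&
		m.KVCacheUsagePercent <= cfg.KVCacheUtilThreshold
}

func main() {
	cfg := detectorConfig{
		MetricsStalenessThreshold: 200 * time.Millisecond,
		QueueDepthThreshold:       5,
		KVCacheUtilThreshold:      0.8,
	}
	// An idle-looking pod with stale metrics is treated as saturated.
	stale := podMetrics{UpdateTime: time.Now().Add(-time.Second)}
	fmt.Println(hasCapacity(time.Now(), stale, cfg)) // false
}
```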
Consider the case where one pod came up and hasn't reported metrics for a while. Assume that the last time it did report, it was in great shape (e.g., on startup: waiting queue size = 0, KV cache = 0).
So we have:
pod1 - able to serve requests and reported as healthy, but it hasn't reported metrics for a while, and the last metrics we collected from it show it as idle.
pod2 - partially busy.
pod3 - partially busy.
Every incoming request that goes through saturation detection identifies that pod1 should not be considered a final target. However, pod1 can still be selected to serve requests, and because its stale metrics show it as idle, pod1 has a high chance of being selected.
This can happen continuously, so multiple requests may be sent to pod1, leading to an unpredictable state (pod1 may not be able to handle the load).
What you expected to happen:
Freshness of a pod's metrics should not be part of the saturation detector. Instead, such pods should be filtered out when getting the candidate pods from the datastore (see the sketch below). Stale metrics are not a sign of saturation; they are a sign of an invalid pod that should not be considered for serving the request.
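A hedged sketch of the suggested alternative, reusing the hypothetical `podMetrics` type from the sketch above: drop pods with stale metrics while building the candidate list from the datastore, so saturation detection only reasons about queue depth and KV-cache utilization of pods with fresh data. `freshCandidates` is an illustrative helper, not an existing datastore function:

```go
// freshCandidates keeps only pods whose metrics were updated within the
// staleness window; stale pods are excluded from candidate selection
// entirely instead of being reported as "saturated".
func freshCandidates(now time.Time, pods []podMetrics, staleness time.Duration) []podMetrics {
	out := make([]podMetrics, 0, len(pods))
	for _, p := range pods {
		if now.Sub(p.UpdateTime) <= staleness {
			out = append(out, p)
		}
	}
	return out
}
```

With this split, pod1 from the scenario above would never appear in the candidate set in the first place, so its stale "idle" metrics could not attract traffic.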
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`):
- Inference extension version (use `git describe --tags --dirty --always`):
- Cloud provider or hardware configuration:
- Install tools:
- Others:
cc: @LukeAVanDrie