
ModelNotHereException causing 8 retry iterations exhausted for model #523

@GolanLevy

Description

Describe the bug

From time to time, our system spins out of control, throwing many ModelNotHereExceptions that eventually lead to "8 retry iterations exhausted for model" failures.

Our registration process is completely automated and is triggered by a registerModel gRPC request (instead of a yaml configuration), followed by an ensureLoaded request to validate that the registration completed successfully.
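For clarity, here is a minimal sketch of that flow. The real calls go over gRPC to ModelMesh; the client class, method names, and timing below are illustrative stand-ins, not the actual API surface.

```python
import time

class FakeModelMeshClient:
    """Hypothetical in-memory stand-in for our ModelMesh gRPC client.

    The real system issues registerModel / ensureLoaded gRPC requests;
    this fake just simulates an asynchronous load completing shortly
    after registration.
    """

    def __init__(self):
        self._registered = {}

    def register_model(self, model_id):
        # Record when registration happened; loading is asynchronous.
        self._registered[model_id] = time.monotonic()

    def ensure_loaded(self, model_id):
        # Pretend the model finishes loading ~50 ms after registration.
        t = self._registered.get(model_id)
        return t is not None and time.monotonic() - t >= 0.05

def register_and_wait(client, model_id, timeout_s=30.0, poll_interval_s=0.01):
    """Register a model, then poll ensureLoaded until it reports success."""
    client.register_model(model_id)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if client.ensure_loaded(model_id):
            return True
        time.sleep(poll_interval_s)
    return False

client = FakeModelMeshClient()
print(register_and_wait(client, "4774912c"))  # True once loading completes
```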

Models:
The issue is not consistent per model: a failing invocation of a model can succeed on the next try, if that request happens to be directed to a non-faulty mm pod (see the next section).

MM pods:
We have a few dozen mm pods, and the issue is prominent in only some of them (<50%), referred to below as "faulty" pods. Faulty pods are still functioning, meaning they are able to serve, run predictions, and invoke internal requests, but they have a very high error rate due to the ModelNotHereExceptions.
It looks like faulty pods are somehow out of sync with ETCD and direct internal requests to seemingly random pods.
None of the mm pods is new; each had been running for hours or days before the issue started.
Note that non-faulty pods also throw these errors from time to time.
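To make the failure shape concrete, here is a toy sketch (not ModelMesh code) of how a forwarding retry loop can exhaust its budget: if every candidate pod in the forwarder's (possibly stale) view of placements answers ModelNotHere, 8 retries are burned and the request fails with an error like the one in the title. All names and the retry structure are assumptions for illustration.

```python
import random

MAX_RETRIES = 8  # matches the "8 retry iterations exhausted" message we see

class ModelNotHereException(Exception):
    pass

def invoke_with_retries(model_id, candidate_pods, pods_with_model):
    """Forward to randomly chosen candidates until one has the model."""
    last_err = None
    for _ in range(MAX_RETRIES):
        pod = random.choice(candidate_pods)
        if pod in pods_with_model:
            return f"served by {pod}"
        last_err = ModelNotHereException(f"{model_id} not on {pod}")
    raise RuntimeError(
        f"{MAX_RETRIES} retry iterations exhausted for model {model_id}"
    ) from last_err

# The pod that actually holds the model is absent from the candidate list,
# mirroring the q9564 case in the attached log.
candidates = [f"faulty-pod-{i}" for i in range(8)]
try:
    invoke_with_retries(
        "4774912c",
        candidates,
        pods_with_model={"modelmesh-serving-triton-2.x-768448c4fb-q9564"},
    )
except RuntimeError as e:
    print(e)  # 8 retry iterations exhausted for model 4774912c
```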

ETCD:
We do, however, suspect ETCD, since its pods were restarted (for reasons still unclear to us) and the faulty pods are precisely the ones that were created prior to the ETCD restart.
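Our working hypothesis, sketched below as a self-contained simulation (again, not ModelMesh code): a pod whose ETCD watch silently died at the restart keeps routing from a stale snapshot of the model placement registry, so any model registered after the restart is unknown to it. The registry/pod classes are invented for this illustration.

```python
class Registry:
    """Stands in for the ETCD-backed model -> pod placement registry."""

    def __init__(self):
        self.placement = {}  # model_id -> pod name

class Pod:
    def __init__(self, name, registry):
        self.name = name
        self.registry = registry
        self.cache = dict(registry.placement)  # local view, refreshed by a watch
        self.watching = True

    def on_watch_event(self):
        if self.watching:  # a dead watch never refreshes the cache
            self.cache = dict(self.registry.placement)

    def route(self, model_id):
        pod = self.cache.get(model_id)
        if pod is None:
            raise LookupError(f"ModelNotHere: {model_id} unknown to {self.name}")
        return pod

registry = Registry()
old_pod = Pod("pre-restart-pod", registry)

# ETCD restarts; suppose the old pod's watch dies and is never re-established.
old_pod.watching = False

# A new model is registered and placed after the restart.
registry.placement["4774912c"] = "modelmesh-serving-triton-2.x-768448c4fb-q9564"
old_pod.on_watch_event()  # no-op: the watch is dead, the cache stays stale

fresh_pod = Pod("post-restart-pod", registry)
print(fresh_pod.route("4774912c"))  # a post-restart pod routes correctly

try:
    old_pod.route("4774912c")
except LookupError as e:
    print(e)  # the stale pod cannot find the model, like ModelNotHereException
```

This would explain why only pods created before the ETCD restart are faulty, and why they misroute only some models (those registered after the restart).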

Mitigation:
The issue usually stops when there is a scale-in event, i.e. when some of the pods are terminated.
Note that the faulty pod itself might not be the one terminated; the errors can stop due to the termination of a different pod (perhaps one that the problematic model was loaded on).

Example:
In the attached log file, you can see that a newly registered model 4774912c is facing this issue, even though it was loaded on modelmesh-serving-triton-2.x-768448c4fb-q9564.
The external requests to the many faulty pods are directed to 8 pods, none of which is modelmesh-serving-triton-2.x-768448c4fb-q9564.

report.csv

As you can see, the situation is very peculiar, and we are not sure how to investigate further.
We are curious:

  1. Why does the ForwardingLB decide to forward inference requests to seemingly random other pods, assuming the model is already loaded there?
  2. How should we continue this investigation?

Thanks!
