Enable collection of configured / loaded models in each inference serving endpoint #466

@elevran

Description

What would you like to be added:
The ability to collect the model list from each endpoint's /v1/models response and use it to direct traffic to the specific endpoints that serve the requested model.

Why is this needed:
Models can be loaded at runtime, so the set of models available on an endpoint can change over time.
To support divergent model availability across endpoints, we need to collect the set of models served by each endpoint and confirm a model's availability before routing an inference request to it.
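The collection step described above could be sketched as follows: parse each endpoint's /v1/models response (assuming the OpenAI-compatible list format, `{"object": "list", "data": [{"id": ...}]}`) and build a model-to-endpoints index for routing. The endpoint URLs and model names below are hypothetical, and a real implementation would fetch the responses over HTTP and refresh them periodically.

```python
import json

def parse_model_ids(models_response: str) -> set[str]:
    """Extract model IDs from a /v1/models JSON response
    (OpenAI-compatible list format: {"object": "list", "data": [{"id": ...}]})."""
    payload = json.loads(models_response)
    return {model["id"] for model in payload.get("data", [])}

def build_model_index(endpoint_responses: dict[str, str]) -> dict[str, list[str]]:
    """Map each model ID to the endpoints that report serving it.

    endpoint_responses: endpoint URL -> raw /v1/models response body.
    """
    index: dict[str, list[str]] = {}
    for endpoint, body in endpoint_responses.items():
        for model_id in parse_model_ids(body):
            index.setdefault(model_id, []).append(endpoint)
    return index

# Hypothetical example: two endpoints with divergent model availability.
responses = {
    "http://pod-a:8000": '{"object":"list","data":[{"id":"llama-3-8b"},{"id":"adapter-x"}]}',
    "http://pod-b:8000": '{"object":"list","data":[{"id":"llama-3-8b"}]}',
}
index = build_model_index(responses)
# A request for "adapter-x" should only be routed to pod-a.
assert index["adapter-x"] == ["http://pod-a:8000"]
```

A router would consult this index on each request and reject (or queue) requests for models no endpoint currently reports, rather than forwarding them blindly.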

Metadata

Assignees: none
Labels: none
Status: Backlog
Milestone: none
Relationships: none yet
Development: no branches or pull requests
