What would you like to be added:
The ability to collect the list of available models from each endpoint's /v1/models API and use that information to direct traffic to the endpoints that serve the requested model.
Why is this needed:
Models can be loaded at runtime, so the set of models available on an endpoint can change while it is running.
To support endpoints with divergent model availability, we would need to collect the set of models on each endpoint and confirm that the requested model is available there when serving an inference request.
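
A rough sketch of the idea in Python, for illustration only and not a proposed implementation. It assumes each endpoint answers /v1/models with an OpenAI-style body (`{"object": "list", "data": [{"id": "..."}]}`). The function names (`parse_model_ids`, `collect_models`, `endpoints_for_model`) and the endpoint URLs are hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, List, Set
from urllib.request import urlopen


def parse_model_ids(body: str) -> Set[str]:
    """Extract model IDs from a /v1/models response body.

    Assumption: the body follows the OpenAI-style list shape,
    {"object": "list", "data": [{"id": "<model name>"}, ...]}.
    """
    payload = json.loads(body)
    return {entry["id"] for entry in payload.get("data", []) if "id" in entry}


def http_fetch(endpoint: str, timeout: float = 2.0) -> str:
    """Default fetcher: GET <endpoint>/v1/models and return the body as text."""
    url = f"{endpoint.rstrip('/')}/v1/models"
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def collect_models(
    endpoints: Iterable[str],
    fetch: Callable[[str], str] = http_fetch,
) -> Dict[str, Set[str]]:
    """Build a map of endpoint -> set of model IDs it currently reports."""
    models_by_endpoint: Dict[str, Set[str]] = {}
    for endpoint in endpoints:
        try:
            models_by_endpoint[endpoint] = parse_model_ids(fetch(endpoint))
        except (OSError, ValueError, KeyError, TypeError, AttributeError):
            # Unreachable endpoint or malformed response: treat it as
            # serving no models so it is never selected for routing.
            models_by_endpoint[endpoint] = set()
    return models_by_endpoint


def endpoints_for_model(
    models_by_endpoint: Dict[str, Set[str]], model: str
) -> List[str]:
    """Return the endpoints that report the requested model, sorted by name."""
    return sorted(ep for ep, models in models_by_endpoint.items() if model in models)
```

Because models can change at runtime, `collect_models` would have to be re-run periodically (or on demand) and the routing decision made against the latest snapshot. How often to refresh, and what to do when no endpoint reports the requested model, are open questions.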