/kind feature
Describe the solution you'd like
As of now, while the model is being loaded into memory, user queries may still reach a pod that is not yet ready to answer them (since the model is not loaded). The larger the model, the longer it takes to load, and the longer this "downtime" lasts. Ideally, a pod would only be marked ready, and start receiving traffic, once the model has finished loading.
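A minimal sketch of the idea, using only the standard library (the `ModelServer` class and its method names are hypothetical, not an existing API): the readiness check reports ready only after the model load completes, so a Kubernetes `readinessProbe` wired to it would keep traffic away from the pod until then.

```python
import threading
import time


class ModelServer:
    """Toy model server whose readiness check only reports ready
    once the model has finished loading (hypothetical sketch)."""

    def __init__(self):
        self._loaded = threading.Event()

    def load_model(self, load_seconds=0.2):
        # Simulate a (potentially long) model load in the background.
        def _load():
            time.sleep(load_seconds)  # stand-in for reading weights into memory
            self._loaded.set()
        threading.Thread(target=_load, daemon=True).start()

    def readiness_probe(self):
        # What a readinessProbe endpoint would return: the pod should
        # only receive traffic once this is True.
        return self._loaded.is_set()


server = ModelServer()
assert not server.readiness_probe()   # model not loaded yet: not ready
server.load_model(load_seconds=0.1)
server._loaded.wait(timeout=5)        # block until the load completes
assert server.readiness_probe()       # now ready to serve queries
```

In a real deployment the probe would be an HTTP endpoint (e.g. returning 503 until loaded, 200 after), and the pod's `readinessProbe` would poll it.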
Anything else you would like to add:
This could also be useful for other tasks, such as model/runtime canary rollouts, replica auto-scaling, etc.