It may be useful to add support for OpenRouter, which acts as a unified API gateway for many of the already supported models and more.
The docs state that OpenRouter “will select the least expensive and best GPUs available to serve the request” and “fall back … if you are rate-limited.” which I think it's interesting for both resilience and potential cost efficiency.