TODO: explain how a user or org can specify the order of selection for providers
TODO: explain implementation details? (no URL rewrite, just proxy)
## Provider Selection
The Inference Providers API acts as a unified proxy layer that sits between your application and multiple AI providers. Understanding how provider selection works is crucial for optimizing performance, cost, and reliability in your applications.
### API as a Proxy Service
When using Inference Providers, your requests go through Hugging Face's proxy infrastructure, which provides several key benefits:
- **Unified Authentication & Billing**: Use a single Hugging Face token for all providers
- **Automatic Failover**: If one provider is unavailable, requests can be routed to alternatives
- **Rate Limiting & Load Balancing**: Intelligent distribution of requests across providers
- **Consistent API Interface**: The same request format works across different providers

Because the API acts as a proxy, the exact HTTP request varies between providers: each provider has its own API requirements and response format. When using the Hugging Face inference clients (JavaScript or Python), you can either explicitly specify a provider or let the system choose one automatically; the client then handles these provider-specific differences for you, formatting the HTTP request to match the selected provider's API.

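The per-provider formatting the clients perform can be pictured with a small, purely hypothetical sketch — the payload shapes and the `format_chat_request` helper below are illustrative, not the real provider APIs:

```python
# Hypothetical sketch of what an inference client does internally:
# one unified request is reshaped into each provider's expected format.
# The payload shapes below are illustrative, NOT the real provider APIs.

def format_chat_request(provider: str, model: str, prompt: str) -> dict:
    """Translate a unified chat request into a provider-specific payload."""
    if provider == "together":
        # illustrative OpenAI-style chat payload
        return {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if provider == "fal-ai":
        # illustrative prompt-style payload
        return {"model_id": model, "input": {"prompt": prompt}}
    raise ValueError(f"unsupported provider: {provider}")

payload = format_chat_request("together", "meta-llama/Llama-3.1-8B-Instruct", "Hello!")
print(payload["messages"][0]["content"])  # -> Hello!
```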
- `provider: "auto"` (default): Selects the first available provider for the model, sorted by your preference order in [Inference Provider settings](https://hf.co/settings/inference-providers)
- `provider: "specific-provider"`: Forces use of a specific provider (e.g., "together", "replicate", "fal-ai", ...)
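Conceptually, `provider: "auto"` resolves to the first provider in your preference order that actually serves the requested model, while an explicit provider name bypasses that ordering. A minimal sketch of this resolution logic (the provider names and availability data are illustrative):

```python
def select_provider(preference_order, available, requested=None):
    """Resolve which provider handles a request.

    preference_order: providers as ordered in your account settings.
    available: providers that actually serve the requested model.
    requested: an explicit provider name, or None for "auto".
    """
    if requested is not None:  # provider: "specific-provider"
        if requested not in available:
            raise ValueError(f"{requested} does not serve this model")
        return requested
    # provider: "auto" — first preferred provider that serves the model
    for provider in preference_order:
        if provider in available:
            return provider
    raise ValueError("no provider serves this model")

# Illustrative data only:
prefs = ["together", "fal-ai", "replicate"]
avail = {"replicate", "together"}
print(select_provider(prefs, avail))                 # -> together
print(select_provider(prefs, avail, "replicate"))    # -> replicate
```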
If you prefer to work with familiar OpenAI APIs or want to migrate existing chat completion code with minimal changes, we offer a drop-in compatible endpoint that handles all provider selection automatically on the server side.
**Note**: This OpenAI-compatible endpoint is currently available for chat completion tasks only. For other tasks like text-to-image, embeddings, or speech processing, use the Hugging Face inference clients shown above.

This endpoint can also be called directly over HTTP, making it suitable for integration with any HTTP client or application that needs to talk to the chat completion service without a Hugging Face client library.
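A stdlib-only sketch of such a direct HTTP call, assuming the router's OpenAI-compatible endpoint at `https://router.huggingface.co/v1/chat/completions` (the request is built but not sent, so no token or network access is needed to follow along):

```python
import json
import urllib.request

# Assumed OpenAI-compatible chat completion endpoint of the HF router.
URL = "https://router.huggingface.co/v1/chat/completions"

def build_chat_request(token: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # your Hugging Face token
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    token="hf_xxx",  # placeholder
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Sending it would be: urllib.request.urlopen(req)
print(req.get_method(), req.full_url)
```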