
Commit a329c65

(vibe-coded) more comprehensive documentation
1 parent 8164742 commit a329c65

1 file changed: +173 -1 lines changed

docs/inference-providers/index.md

Lines changed: 173 additions & 1 deletion
@@ -216,7 +216,7 @@ import OpenAI from "openai";
 
 const client = new OpenAI({
   baseURL: "https://router.huggingface.co/v1",
-  apiKey: process.env["HF_TOKEN"],
+  apiKey: process.env.HF_TOKEN,
 });
 
 const completion = await client.chat.completions.create({

@@ -345,6 +345,178 @@ TODO: explain how a user or org can specify the order of selection for providers
TODO: explain implementation details? (no URL rewrite, just proxy)

## Provider Selection

The Inference Providers API acts as a unified proxy layer that sits between your application and multiple AI providers. Understanding how provider selection works is crucial for optimizing performance, cost, and reliability in your applications.

### API as a Proxy Service

When using Inference Providers, your requests go through Hugging Face's proxy infrastructure, which provides several key benefits:

- **Unified Authentication & Billing**: Use a single Hugging Face token for all providers
- **Automatic Failover**: If one provider is unavailable, requests can be routed to alternatives
- **Rate Limiting & Load Balancing**: Intelligent distribution of requests across providers
- **Consistent API Interface**: The same request format works across different providers

Because the API acts as a proxy, the exact HTTP request may vary between providers, as each provider has its own API requirements and response formats. The Hugging Face inference clients handle these provider-specific differences automatically when you use `provider="auto"` or specify a particular provider.

### Client-Side Provider Selection (Inference Clients)

When using the Hugging Face inference clients (JavaScript or Python), you can explicitly specify a provider or let the system choose automatically. The client then formats the HTTP request to match the selected provider's API requirements.

<hfoptions id="client-side-provider-selection">

<hfoption id="javascript">

```javascript
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

// Explicit provider selection
await client.chatCompletion({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  provider: "sambanova", // Specific provider
  messages: [{ role: "user", content: "Hello!" }],
});

// Automatic provider selection (default: "auto")
await client.chatCompletion({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  // Defaults to "auto" selection of the provider
  // provider: "auto",
  messages: [{ role: "user", content: "Hello!" }],
});
```

</hfoption>

<hfoption id="python">

```python
import os
from huggingface_hub import InferenceClient

# Explicit provider selection: the provider is set when the client is created
client = InferenceClient(provider="sambanova", token=os.environ["HF_TOKEN"])

result = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Automatic provider selection (default: "auto")
# When no provider is given, the client defaults to "auto"
client = InferenceClient(token=os.environ["HF_TOKEN"])

result = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

</hfoption>

</hfoptions>

**Provider Selection Policy:**

- `provider: "auto"` (default): Selects the first available provider for the model, sorted by your preference order in [Inference Provider settings](https://hf.co/settings/inference-providers)
- `provider: "specific-provider"`: Forces use of a specific provider (e.g., "together", "replicate", "fal-ai", ...)

### Alternative: OpenAI-Compatible Chat Completions Endpoint (Chat Only)

If you prefer to work with familiar OpenAI APIs or want to migrate existing chat completion code with minimal changes, we offer a drop-in compatible endpoint that handles all provider selection automatically on the server side.

**Note**: This OpenAI-compatible endpoint is currently available for chat completion tasks only. For other tasks like text-to-image, embeddings, or speech processing, use the Hugging Face inference clients shown above.

<hfoptions id="openai-compatible">

<hfoption id="javascript">

```javascript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const completion = await client.chat.completions.create({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  messages: [{ role: "user", content: "Hello!" }],
});
```

</hfoption>

<hfoption id="python">

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(completion.choices[0].message.content)
```

</hfoption>

</hfoptions>

This endpoint can also be called over plain HTTP, which makes it suitable for any HTTP client or application that needs to talk to the chat completion service directly.

```bash
curl https://router.huggingface.co/v1/chat/completions \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
```

**Key Features:**

- **Server-Side Provider Selection**: The server automatically chooses the best available provider
- **Model Listing**: GET `/v1/models` returns available models across all providers (see the sketch below)
- **OpenAI SDK Compatibility**: Works with existing OpenAI client libraries
- **Chat Tasks Only**: Limited to conversational workloads

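The model listing mentioned above can be reached through the same OpenAI client by pointing it at the router. This is a minimal sketch, not part of the original examples; it assumes the listing behaves like OpenAI's `GET /v1/models` and that each entry exposes at least an `id` field (other metadata may differ):

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

# Calls GET /v1/models and iterates over the returned entries.
for model in client.models.list():
    print(model.id)
```
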
### Choosing the Right Approach

**Use Inference Clients when:**

- You need support for all task types (text-to-image, speech, embeddings, etc.)
- You want explicit control over provider selection
- You're building applications that use multiple AI tasks

**Use OpenAI-Compatible Endpoint when:**

- You're only doing chat completions
- You want to migrate existing OpenAI-based code with minimal changes
- You prefer server-side provider management

**Use Direct HTTP when:**

- You're implementing custom request logic
- You need fine-grained control over the request/response cycle
- You're working in environments without available client libraries (see the sketch after this list)

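As an illustration of the direct-HTTP route, here is the same request as the curl example above, issued from Python with the `requests` package. This is a minimal sketch: the endpoint, headers, and payload are copied from the curl example, `requests` is just one possible HTTP client, and the response is assumed to follow the OpenAI-style chat completion shape shown earlier:

```python
import os

import requests  # any HTTP client works; requests is used here for brevity

response = requests.post(
    "https://router.huggingface.co/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```
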
## Next Steps

Now that you understand the basics, explore these resources to make the most of Inference Providers:
