diff --git a/docs/inference-providers/_toctree.yml b/docs/inference-providers/_toctree.yml
index e1a2c9591..7b9093114 100644
--- a/docs/inference-providers/_toctree.yml
+++ b/docs/inference-providers/_toctree.yml
@@ -11,6 +11,13 @@
- local: security
title: Security
+- title: Guides
+ sections:
+ - local: guides/first-api-call
+ title: Your First API Call
+ - local: guides/building-first-app
+ title: Building Your First AI App
+
- title: Providers
sections:
- local: providers/cerebras
diff --git a/docs/inference-providers/guides/building-first-app.md b/docs/inference-providers/guides/building-first-app.md
new file mode 100644
index 000000000..b69d567ee
--- /dev/null
+++ b/docs/inference-providers/guides/building-first-app.md
@@ -0,0 +1,581 @@
+# Building Your First AI App with Inference Providers
+
+You've learned the basics and understand the provider ecosystem. Now let's build something practical: an **AI Meeting Notes** app that transcribes audio files and generates summaries with action items.
+
+This project demonstrates real-world AI orchestration using multiple specialized providers within a single application.
+
+## Project Overview
+
+Our app will:
+1. **Accept audio** through a simple web interface (file upload or microphone)
+2. **Transcribe speech** using a fast speech-to-text model
+3. **Generate summaries** using a powerful language model
+4. **Deploy to the web** for easy sharing
+
+
+
+
+**Tech Stack**: Gradio (for the UI) + Inference Providers (for the AI)
+
+
+
+
+**Tech Stack**: HTML/JavaScript (for the UI) + Inference Providers (for the AI)
+
+We'll use plain HTML and JavaScript for the UI to keep things simple and framework-agnostic, but if you want to see more mature examples, check out the [Hugging Face JS spaces](https://huggingface.co/huggingfacejs/spaces) page.
+
+
+
+
+## Step 1: Set Up Authentication
+
+
+
+
+Before we start coding, authenticate with Hugging Face using the CLI:
+
+```bash
+pip install huggingface_hub
+huggingface-cli login
+```
+
+When prompted, paste your Hugging Face token. You can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). The CLI stores the token and handles authentication automatically for all your inference calls.
+
+
+
+
+You'll need a Hugging Face token; you can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). Set it as an environment variable so your app can read it:
+
+```bash
+export HF_TOKEN="your_token_here"
+```
+
+```javascript
+// Read the token from the environment (process.env is available when running under Node.js)
+const HF_TOKEN = process.env.HF_TOKEN;
+```
+
+
+
+When we deploy our app to Hugging Face Spaces, we'll need to add our token as a secret. This is a secure way to handle the token and avoid exposing it in the code.
+
+
+
+
+
+
+## Step 2: Build the User Interface
+
+
+
+
+Now let's create a simple web interface using Gradio:
+
+```python
+import gradio as gr
+from huggingface_hub import InferenceClient
+
+def process_meeting_audio(audio_file):
+ """Process uploaded audio file and return transcript + summary"""
+ if audio_file is None:
+ return "Please upload an audio file.", ""
+
+ # We'll implement the AI logic next
+ return "Transcript will appear here...", "Summary will appear here..."
+
+# Create the Gradio interface
+app = gr.Interface(
+ fn=process_meeting_audio,
+ inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
+ outputs=[
+ gr.Textbox(label="Transcript", lines=10),
+ gr.Textbox(label="Summary & Action Items", lines=8)
+ ],
+ title="🎤 AI Meeting Notes",
+ description="Upload an audio file to get an instant transcript and summary with action items."
+)
+
+if __name__ == "__main__":
+ app.launch()
+```
+
+Here we're using Gradio's `gr.Audio` component to either upload an audio file or use the microphone input. We're keeping things simple with two outputs: a transcript and a summary with action items.
+
+
+
+
+For JavaScript, we'll create a clean HTML interface with native file upload and a simple loading state. A minimal version of the markup could look like this (the element ids are what the script below expects):
+
+```html
+<h1>🎤 AI Meeting Notes</h1>
+
+<label for="file" class="upload-area">
+    <span>Upload audio file</span>
+    <input type="file" id="file" accept="audio/*" hidden />
+</label>
+
+<div id="loading" class="hidden">Processing...</div>
+<div id="error" class="hidden"></div>
+
+<div id="results" class="hidden">
+    <section>
+        <h2>📝 Transcript</h2>
+        <pre id="transcript"></pre>
+    </section>
+    <section>
+        <h2>📋 Summary</h2>
+        <div id="summary"></div>
+    </section>
+</div>
+```
+
+This creates a clean upload interface with separate sections for the transcript and summary; the loading, error, and results sections start hidden and are toggled by the script.
+
+Our application can then use the `InferenceClient` from `huggingface.js` to call the transcription and summarization functions.
+
+```javascript
+import { InferenceClient } from 'https://esm.sh/@huggingface/inference';
+
+// Access the token from Hugging Face Spaces secrets
+const HF_TOKEN = window.huggingface?.variables?.HF_TOKEN;
+// Or if you're running locally, you can set it as an environment variable
+// const HF_TOKEN = process.env.HF_TOKEN;
+
+document.getElementById('file').onchange = async (e) => {
+ if (!e.target.files[0]) return;
+
+ const file = e.target.files[0];
+
+ show(document.getElementById('loading'));
+ hide(document.getElementById('results'), document.getElementById('error'));
+
+ try {
+ const transcript = await transcribe(file);
+ const summary = await summarize(transcript);
+
+ document.getElementById('transcript').textContent = transcript;
+ document.getElementById('summary').textContent = summary;
+
+ hide(document.getElementById('loading'));
+ show(document.getElementById('results'));
+ } catch (error) {
+ hide(document.getElementById('loading'));
+ showError(`Error: ${error.message}`);
+ }
+};
+```
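+
+The handler relies on a few small DOM helpers (`show`, `hide`, `showError`) that aren't part of any library. A minimal sketch, assuming a `.hidden { display: none; }` utility class like the markup above uses, could look like this:
+
+```javascript
+// Toggle visibility by adding or removing the `hidden` class
+function show(...elements) {
+    elements.forEach((el) => el.classList.remove('hidden'));
+}
+
+function hide(...elements) {
+    elements.forEach((el) => el.classList.add('hidden'));
+}
+
+// Put an error message into the #error element and reveal it
+function showError(message) {
+    const error = document.getElementById('error');
+    error.textContent = message;
+    show(error);
+}
+```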
+
+We'll also need to implement the `transcribe` and `summarize` functions.
+
+
+
+
+## Step 3: Add Speech Transcription
+
+
+
+
+Now let's implement the transcription using OpenAI's `whisper-large-v3` model for fast, reliable speech processing.
+
+We'll use the `auto` provider to automatically select the first available provider for the model. You can define your own provider priority order on your [Inference Providers settings](https://huggingface.co/settings/inference-providers) page.
+
+
+```python
+def transcribe_audio(audio_file_path):
+ """Transcribe audio using fal.ai for speed"""
+ client = InferenceClient(provider="auto")
+
+ # Pass the file path directly - the client handles file reading
+ transcript = client.automatic_speech_recognition(
+ audio=audio_file_path,
+ model="openai/whisper-large-v3"
+ )
+
+ return transcript.text
+```
+
+
+
+
+
+Now let's implement the transcription using OpenAI's `whisper-large-v3-turbo` model for fast, reliable speech processing.
+
+We'll use the `auto` provider to automatically select the first available provider for the model. You can define your own provider priority order on your [Inference Providers settings](https://huggingface.co/settings/inference-providers) page.
+
+
+```javascript
+import { InferenceClient } from 'https://esm.sh/@huggingface/inference';
+
+async function transcribe(file) {
+ const client = new InferenceClient(HF_TOKEN);
+
+ const output = await client.automaticSpeechRecognition({
+ data: file,
+ model: "openai/whisper-large-v3-turbo",
+ provider: "auto"
+ });
+
+ return output.text || output || 'Transcription completed';
+}
+```
+
+
+
+
+
+## Step 4: Add AI Summarization
+
+
+
+
+Next, we'll generate the summary with a powerful language model, `deepseek-ai/DeepSeek-R1-0528` from DeepSeek, via an Inference Provider. Just like in the previous step, we'll use the `auto` provider to automatically select the first available provider for the model.
+We will define a custom prompt so the output is formatted as a summary with action items and the decisions made:
+
+```python
+def generate_summary(transcript):
+ """Generate summary using an Inference Provider"""
+ client = InferenceClient(provider="auto")
+
+ prompt = f"""
+ Analyze this meeting transcript and provide:
+ 1. A concise summary of key points
+ 2. Action items with responsible parties
+ 3. Important decisions made
+
+ Transcript: {transcript}
+
+ Format with clear sections:
+ ## Summary
+ ## Action Items
+ ## Decisions Made
+ """
+
+ response = client.chat.completions.create(
+ model="deepseek-ai/DeepSeek-R1-0528",
+ messages=[{"role": "user", "content": prompt}],
+ max_tokens=1000
+ )
+
+ return response.choices[0].message.content
+```
+
+
+
+
+
+Next, we'll generate the summary with a powerful language model, `deepseek-ai/DeepSeek-R1-0528` from DeepSeek, via an Inference Provider. Just like in the previous step, we'll use the `auto` provider to automatically select the first available provider for the model.
+We will define a custom prompt so the output is formatted as a summary with action items and the decisions made:
+
+```javascript
+async function summarize(transcript) {
+ const client = new InferenceClient(HF_TOKEN);
+
+ const prompt = `Analyze this meeting transcript and provide:
+ 1. A concise summary of key points
+ 2. Action items with responsible parties
+ 3. Important decisions made
+
+ Transcript: ${transcript}
+
+ Format with clear sections:
+ ## Summary
+ ## Action Items
+ ## Decisions Made`;
+
+ const response = await client.chatCompletion({
+ model: "deepseek-ai/DeepSeek-R1-0528",
+ messages: [
+ {
+ role: "user",
+ content: prompt
+ }
+ ],
+ max_tokens: 1000
+ }, {
+ provider: "auto"
+ });
+
+ return response.choices?.[0]?.message?.content || response || 'No summary available';
+}
+```
+
+
+
+
+
+## Step 5: Deploy on Hugging Face Spaces
+
+
+
+
+To deploy, we'll need to create an `app.py` file and upload it to Hugging Face Spaces.
+
+
+Here is the complete `app.py` file:
+
+```python
+import gradio as gr
+from huggingface_hub import InferenceClient
+
+
+def transcribe_audio(audio_file_path):
+ """Transcribe audio using an Inference Provider"""
+ client = InferenceClient(provider="auto")
+
+ # Pass the file path directly - the client handles file reading
+ transcript = client.automatic_speech_recognition(
+ audio=audio_file_path, model="openai/whisper-large-v3"
+ )
+
+ return transcript.text
+
+
+def generate_summary(transcript):
+ """Generate summary using an Inference Provider"""
+ client = InferenceClient(provider="auto")
+
+ prompt = f"""
+ Analyze this meeting transcript and provide:
+ 1. A concise summary of key points
+ 2. Action items with responsible parties
+ 3. Important decisions made
+
+ Transcript: {transcript}
+
+ Format with clear sections:
+ ## Summary
+ ## Action Items
+ ## Decisions Made
+ """
+
+ response = client.chat.completions.create(
+ model="deepseek-ai/DeepSeek-R1-0528",
+ messages=[{"role": "user", "content": prompt}],
+ max_tokens=1000,
+ )
+
+ return response.choices[0].message.content
+
+
+def process_meeting_audio(audio_file):
+ """Main processing function"""
+ if audio_file is None:
+ return "Please upload an audio file.", ""
+
+ try:
+ # Step 1: Transcribe
+ transcript = transcribe_audio(audio_file)
+
+ # Step 2: Summarize
+ summary = generate_summary(transcript)
+
+ return transcript, summary
+
+ except Exception as e:
+ return f"Error processing audio: {str(e)}", ""
+
+
+# Create Gradio interface
+app = gr.Interface(
+ fn=process_meeting_audio,
+ inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
+ outputs=[
+ gr.Textbox(label="Transcript", lines=10),
+ gr.Textbox(label="Summary & Action Items", lines=8),
+ ],
+ title="🎤 AI Meeting Notes",
+ description="Upload audio to get instant transcripts and summaries.",
+)
+
+if __name__ == "__main__":
+ app.launch()
+```
+
+Run the app locally with `python app.py`; Gradio will serve it on port 7860 by default.
+
+
+
+
+
+To deploy, we'll need to create a new Space and upload our files.
+
+1. **Create a new Space**: Go to [huggingface.co/new-space](https://huggingface.co/new-space)
+2. **Choose Gradio SDK** and make it public
+3. **Upload your files**: Upload `app.py`
+4. **Add your token**: In Space settings, add `HF_TOKEN` as a secret (get it from [your settings](https://huggingface.co/settings/tokens))
+5. **Launch**: Your app will be live at `https://huggingface.co/spaces/your-username/your-space-name`
+
+> **Note**: While we used CLI authentication locally, Spaces requires the token to be set as a secret in the deployment environment. Once `HF_TOKEN` is added as a secret, the `InferenceClient` picks it up automatically from the environment variable.
+
+
+
+
+For JavaScript deployment, create a simple static HTML file:
+
+
+Here's a minimal complete `index.html`. Paste the JavaScript from the previous steps into its `<script type="module">` block:
+
+```html
+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8" />
+    <title>🎤 AI Meeting Notes</title>
+    <style>
+        body { font-family: sans-serif; max-width: 720px; margin: 2rem auto; }
+        .hidden { display: none; }
+    </style>
+</head>
+<body>
+    <h1>🎤 AI Meeting Notes</h1>
+
+    <label for="file" class="upload-area">
+        <span>Upload audio file</span>
+        <input type="file" id="file" accept="audio/*" hidden />
+    </label>
+
+    <div id="loading" class="hidden">Processing...</div>
+    <div id="error" class="hidden"></div>
+
+    <div id="results" class="hidden">
+        <section>
+            <h2>📝 Transcript</h2>
+            <pre id="transcript"></pre>
+        </section>
+        <section>
+            <h2>📋 Summary</h2>
+            <div id="summary"></div>
+        </section>
+    </div>
+
+    <script type="module">
+        // Paste the JavaScript from the previous steps here:
+        // the InferenceClient import, transcribe(), summarize(),
+        // the file-input handler, and the show/hide/showError helpers.
+    </script>
+</body>
+</html>
+```
+
+You can test the app locally by opening `index.html` directly in your browser, or by serving the folder with any static file server.
+
+
+
+
+
+To deploy:
+
+1. **Create a new Space**: Go to [huggingface.co/new-space](https://huggingface.co/new-space)
+2. **Choose Static SDK** and make it public
+3. **Upload your file**: Upload `index.html`
+4. **Add your token as a secret**: In Space settings, add `HF_TOKEN` as a **Secret**
+5. **Launch**: Your app will be live at `https://huggingface.co/spaces/your-username/your-space-name`
+
+> **Note**: The token is securely managed by Hugging Face Spaces and accessed via `window.huggingface.variables.HF_TOKEN`.
+
+
+
+
+## Next Steps
+
+Congratulations! You've built a working AI application that handles a real-world task, provides a clean interface, and scales automatically with pay-as-you-go pricing. If you want to explore more providers, check out the [Inference Providers](https://huggingface.co/inference-providers) page. Here are some ideas for next steps:
+
+- **Improve your prompt**: Try different prompts to improve the quality for your use case
+- **Try different models**: Experiment with various speech and text models
+- **Compare performance**: Benchmark speed vs. accuracy across providers
diff --git a/docs/inference-providers/guides/first-api-call.md b/docs/inference-providers/guides/first-api-call.md
new file mode 100644
index 000000000..24dd39649
--- /dev/null
+++ b/docs/inference-providers/guides/first-api-call.md
@@ -0,0 +1,240 @@
+# Your First Inference Provider Call
+
+In this guide we're going to help you make your first API call with Inference Providers.
+
+Many developers avoid using open source AI models because they assume deployment is complex. This guide will show you how to use a state-of-the-art model in under five minutes, with no infrastructure setup required.
+
+We're going to use the [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) model, which is a powerful text-to-image model.
+
+
+
+This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).
+
+
+
+## Step 1: Find a Model on the Hub
+
+Visit the [Hugging Face Hub](https://huggingface.co/models?pipeline_tag=text-to-image&inference_provider=fal-ai,hf-inference,nebius,nscale,replicate,together&sort=trending) and look for models with the "Inference Providers" filter; there you can select the provider that you want. We'll go with `fal`.
+
+
+
+For this example, we'll use [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell), a powerful text-to-image model. Next, navigate to the model page and scroll down to find the inference widget on the right side.
+
+## Step 2: Try the Interactive Widget
+
+Before writing any code, try the widget directly on the [model page](https://huggingface.co/black-forest-labs/FLUX.1-dev?inference_provider=fal-ai):
+
+
+
+Here, you can test the model directly in the browser from any of the available providers. You can also copy relevant code snippets to use in your own projects.
+
+1. Enter a prompt like "A serene mountain landscape at sunset"
+2. Click **"Generate"**
+3. Watch as the model creates an image in seconds
+
+This widget uses the same endpoint you're about to implement in code.
+
+
+
+You'll need a Hugging Face account (free at [huggingface.co](https://huggingface.co)) and remaining credits to use the model.
+
+
+
+## Step 3: From Clicks to Code
+
+Now let's replicate this in code. Click the **"View Code Snippets"** button in the widget to see the [generated code snippets](https://huggingface.co/black-forest-labs/FLUX.1-dev?inference_api=true&language=python&inference_provider=auto).
+
+
+
+You will need to populate this snippet with a valid Hugging Face User Access Token. You can find your User Access Token in your [settings page](https://huggingface.co/settings/tokens).
+
+Set your token as an environment variable:
+
+```bash
+export HF_TOKEN="your_token_here"
+```
+
+The Python or TypeScript code snippet will use the token from the environment variable.
+
+
+
+
+
+Install the required package:
+
+```bash
+pip install huggingface_hub
+```
+
+You can now use the code snippet to generate an image in your app.
+
+```python
+import os
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(
+ provider="auto",
+ api_key=os.environ["HF_TOKEN"],
+)
+
+# output is a PIL.Image object
+image = client.text_to_image(
+ "Astronaut riding a horse",
+ model="black-forest-labs/FLUX.1-schnell",
+)
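+
+# To save the image locally, you could call, for example:
+# image.save("astronaut.png")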
+```
+
+
+
+
+
+Install the required package:
+
+```bash
+npm install @huggingface/inference
+```
+
+Then, you can use the code snippet to generate an image in your app.
+
+```typescript
+import { InferenceClient } from "@huggingface/inference";
+
+const client = new InferenceClient(process.env.HF_TOKEN);
+
+const image = await client.textToImage({
+ provider: "auto",
+ model: "black-forest-labs/FLUX.1-schnell",
+ inputs: "Astronaut riding a horse",
+ parameters: { num_inference_steps: 5 },
+});
+// Use the generated image (it's a Blob)
+```
+
+
+
+
+
+## What Just Happened?
+
+Nice work! You've successfully used a production-grade AI model without any complex setup. In just a few lines of code, you:
+
+- Connected to a powerful text-to-image model
+- Generated a custom image from text
+- Received an image you can save or display (a PIL Image in Python, a Blob in JavaScript)
+
+The model you just used runs on professional infrastructure, handling scaling, optimization, and reliability automatically.
+
+## Dive Deeper: Provider Selection
+
+You might have noticed the `provider="auto"` parameter in the code examples above. This is a key feature of Inference Providers that gives you control over which infrastructure provider handles your request.
+
+`auto` is powerful because:
+
+1. It makes it easy to switch between providers, and to test different providers' performance for your use case.
+2. It gives you a fallback mechanism in case a provider is unavailable.
+
+But if you want to be more specific, you can also specify a provider. Let's see how.
+
+### Understanding Provider Selection
+
+When you use `provider="auto"` (which is the default), the system automatically selects the first available provider for your chosen model based on your preference order in your [Inference Provider settings](https://hf.co/settings/inference-providers). This provides:
+
+- **Automatic failover**: If one provider is unavailable, the system tries the next one
+- **Simplified setup**: No need to research which providers support your model
+- **Optimal routing**: The system handles provider selection for you
+
+### Specifying a Specific Provider
+
+Alternatively, you can explicitly choose a provider if you have specific requirements:
+
+
+
+
+
+```python
+import os
+from huggingface_hub import InferenceClient
+
+client = InferenceClient(api_key=os.environ["HF_TOKEN"])
+
+# Using automatic provider selection (default)
+image_auto = client.text_to_image(
+ "Astronaut riding a horse",
+ model="black-forest-labs/FLUX.1-schnell",
+ provider="auto" # This is the default
+)
+
+# Using a specific provider
+image_fal = client.text_to_image(
+ "Astronaut riding a horse",
+ model="black-forest-labs/FLUX.1-schnell",
+ provider="fal-ai" # Explicitly use Fal AI
+)
+
+# Using another specific provider
+image_replicate = client.text_to_image(
+ "Astronaut riding a horse",
+ model="black-forest-labs/FLUX.1-schnell",
+ provider="replicate" # Explicitly use Replicate
+)
+```
+
+
+
+
+
+```typescript
+import { InferenceClient } from "@huggingface/inference";
+
+const client = new InferenceClient(process.env.HF_TOKEN);
+
+// Using automatic provider selection (default)
+const imageAuto = await client.textToImage({
+ model: "black-forest-labs/FLUX.1-schnell",
+ inputs: "Astronaut riding a horse",
+ provider: "auto", // This is the default
+ parameters: { num_inference_steps: 5 },
+});
+
+// Using a specific provider
+const imageFal = await client.textToImage({
+ model: "black-forest-labs/FLUX.1-schnell",
+ inputs: "Astronaut riding a horse",
+ provider: "fal-ai", // Explicitly use Fal AI
+ parameters: { num_inference_steps: 5 },
+});
+
+// Using another specific provider
+const imageReplicate = await client.textToImage({
+ model: "black-forest-labs/FLUX.1-schnell",
+ inputs: "Astronaut riding a horse",
+ provider: "replicate", // Explicitly use Replicate
+ parameters: { num_inference_steps: 5 },
+});
+```
+
+
+
+
+
+### When to Use Each Approach
+
+**Use `provider="auto"` when:**
+- You're just getting started with Inference Providers
+- You want the simplest setup and maximum reliability
+- You don't have specific infrastructure requirements
+- You want automatic failover if a provider is unavailable
+
+**Use a specific provider when:**
+- You need consistent performance characteristics
+- You have specific billing or cost requirements
+- You want to test different providers' performance for your use case
+
+## Next Steps
+
+Now that you've seen how easy it is to use AI models, you might wonder:
+- What was that "provider" system doing behind the scenes?
+- How does billing work?
+- What other models can you use?
+
+Continue to the next guide to understand the provider ecosystem and make informed choices about authentication and billing.
\ No newline at end of file
diff --git a/docs/inference-providers/index.md b/docs/inference-providers/index.md
index c3970c044..adec7b7a9 100644
--- a/docs/inference-providers/index.md
+++ b/docs/inference-providers/index.md
@@ -13,22 +13,22 @@ To learn more about the launch of Inference Providers, check out our [announceme
Here is the complete list of partners integrated with Inference Providers, and the supported tasks for each of them:
-| Provider | Chat completion (LLM) | Chat completion (VLM) | Feature Extraction | Text to Image | Text to video |
-| ---------------------------------------- | :-------------------: | :-------------------: | :----------------: | :-----------: | :-----------: |
-| [Cerebras](./providers/cerebras) | ✅ | | | | |
-| [Cohere](./providers/cohere) | ✅ | ✅ | | | |
-| [Fal AI](./providers/fal-ai) | | | | ✅ | ✅ |
-| [Featherless AI](./providers/featherless-ai) | ✅ | ✅ | | | |
-| [Fireworks](./providers/fireworks-ai) | ✅ | ✅ | | | |
-| [Groq](./providers/groq) | ✅ | | | | |
-| [HF Inference](./providers/hf-inference) | ✅ | ✅ | ✅ | ✅ | |
-| [Hyperbolic](./providers/hyperbolic) | ✅ | ✅ | | | |
-| [Nebius](./providers/nebius) | ✅ | ✅ | ✅ | ✅ | |
-| [Novita](./providers/novita) | ✅ | ✅ | | | ✅ |
-| [Nscale](./providers/nscale) | ✅ | ✅ | | ✅ | |
-| [Replicate](./providers/replicate) | | | | ✅ | ✅ |
-| [SambaNova](./providers/sambanova) | ✅ | | ✅ | | |
-| [Together](./providers/together) | ✅ | ✅ | | ✅ | |
+| Provider | Chat completion (LLM) | Chat completion (VLM) | Feature Extraction | Text to Image | Text to video | Speech to text |
+| ---------------------------------------- | :-------------------: | :-------------------: | :----------------: | :-----------: | :-----------: | :-----------: |
+| [Cerebras](./providers/cerebras) | ✅ | | | | | |
+| [Cohere](./providers/cohere) | ✅ | ✅ | | | | |
+| [Fal AI](./providers/fal-ai) | | | | ✅ | ✅ | ✅ |
+| [Featherless AI](./providers/featherless-ai) | ✅ | ✅ | | | | |
+| [Fireworks](./providers/fireworks-ai) | ✅ | ✅ | | | | |
+| [Groq](./providers/groq) | ✅ | | | | | |
+| [HF Inference](./providers/hf-inference) | ✅ | ✅ | ✅ | ✅ | | ✅ |
+| [Hyperbolic](./providers/hyperbolic) | ✅ | ✅ | | | | |
+| [Nebius](./providers/nebius) | ✅ | ✅ | ✅ | ✅ | | |
+| [Novita](./providers/novita) | ✅ | ✅ | | | ✅ | |
+| [Nscale](./providers/nscale) | ✅ | ✅ | | ✅ | | |
+| [Replicate](./providers/replicate) | | | | ✅ | ✅ | ✅ |
+| [SambaNova](./providers/sambanova) | ✅ | | ✅ | | | |
+| [Together](./providers/together) | ✅ | ✅ | | ✅ | | |
## Why use Inference Providers?