[GUIDES] Improve inference providers documentation with guides #1797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Changes shown from 6 of 16 commits:
- 588daf9: add more detail to the billing page (burtenshaw)
- cd9801b: add first basic guide on inference (burtenshaw)
- 3ee3d4f: update index with tts providers (burtenshaw)
- bbe629c: add building your first app guide (burtenshaw)
- 3a51d0b: update toc with guides pages (burtenshaw)
- 20b5364: move pricing change to separate pr 1799 (burtenshaw)
- 186e741: add js implementation to first app (burtenshaw)
- 4212e4f: use auto not together (burtenshaw)
- 381be97: use auto in full app code (burtenshaw)
- 4316d65: add app screenshots (burtenshaw)
- 2f56129: use auto in first api call (burtenshaw)
- 9b7050b: simplify js logic (burtenshaw)
- 49f164b: Update docs/inference-providers/guides/building-first-app.md (burtenshaw)
- 26f9027: Apply suggestions from code review (burtenshaw)
- cd49e24: add section on specify provider vs auto (burtenshaw)
- d92998f: Merge branch 'improve-inference-providers-documentation' of https://g… (burtenshaw)
`docs/inference-providers/guides/building-first-app.md` (new file):
# Building Your First AI App with Inference Providers

You've learned the basics and understand the provider ecosystem. Now let's build something practical: an **AI Meeting Notes** app that transcribes audio files and generates summaries with action items.

This project demonstrates real-world AI orchestration using multiple specialized providers within a single application.

## Project Overview

Our app will:
1. **Accept audio** as a microphone input through a web interface
2. **Transcribe speech** using a fast speech-to-text model
3. **Generate summaries** using a powerful language model
4. **Deploy to the web** for easy sharing

**Tech Stack**: Gradio (for the UI) + Inference Providers (for the AI)

## Step 1: Set Up Authentication

Before we start coding, authenticate with Hugging Face using the CLI:

```bash
pip install huggingface_hub
huggingface-cli login
```
When prompted, paste your Hugging Face token; you can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). The CLI stores it locally, so authentication is handled automatically for all your inference calls.
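
If you prefer to stay in Python, the `huggingface_hub` library also exposes a `login()` helper that does the same thing; a minimal sketch:

```python
from huggingface_hub import login

# Prompts for your token and caches it locally,
# just like `huggingface-cli login`
login()
```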

## Step 2: Build the User Interface

Now let's create a simple web interface using Gradio:

```python
import gradio as gr
from huggingface_hub import InferenceClient

def process_meeting_audio(audio_file):
    """Process uploaded audio file and return transcript + summary"""
    if audio_file is None:
        return "Please upload an audio file.", ""

    # We'll implement the AI logic next
    return "Transcript will appear here...", "Summary will appear here..."

# Create the Gradio interface
app = gr.Interface(
    fn=process_meeting_audio,
    inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcript", lines=10),
        gr.Textbox(label="Summary & Action Items", lines=8)
    ],
    title="🎤 AI Meeting Notes",
    description="Upload an audio file to get an instant transcript and summary with action items."
)

if __name__ == "__main__":
    app.launch()
```

Here we're using Gradio's `gr.Audio` component to either upload an audio file or use the microphone input. We're keeping things simple with two outputs: a transcript and a summary with action items.
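
By default, `gr.Audio` accepts both uploads and microphone recordings. If you want to make that explicit, recent Gradio versions take a `sources` parameter; a small variation on the input component, assuming Gradio 4.x:

```python
import gradio as gr

# Explicitly allow both file upload and microphone recording
audio_input = gr.Audio(
    label="Upload or Record Meeting Audio",
    sources=["upload", "microphone"],
    type="filepath",
)
```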

## Step 3: Add Speech Transcription

Now let's implement transcription using `fal.ai` with OpenAI's `whisper-large-v3` model for fast, reliable speech processing:

```python
def transcribe_audio(audio_file_path):
    """Transcribe audio using fal.ai for speed"""
    client = InferenceClient(provider="fal-ai")

    # Pass the file path directly - the client handles file reading
    transcript = client.automatic_speech_recognition(
        audio=audio_file_path,
        model="openai/whisper-large-v3"
    )

    return transcript.text
```
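
You can smoke-test this function on its own before wiring it into the UI; here `meeting.wav` is a placeholder for any local audio file:

```python
if __name__ == "__main__":
    # Replace "meeting.wav" with the path to any local audio file
    print(transcribe_audio("meeting.wav"))
```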

## Step 4: Add AI Summarization

Next, we'll use a powerful language model, `Qwen/Qwen3-235B-A22B-FP8`, via Together AI for summarization:

```python
def generate_summary(transcript):
    """Generate summary using Together AI"""
    client = InferenceClient(provider="together")

    prompt = f"""
    Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: {transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made
    """

    response = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-FP8",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000
    )

    return response.choices[0].message.content
```

Note that we're also defining a custom prompt to ensure the output is formatted as a summary with action items and decisions made.

## Step 5: Deploy on Hugging Face Spaces

To deploy, we'll need to create a `requirements.txt` file and an `app.py` file.

`requirements.txt`:

```txt
gradio
huggingface_hub
```

`app.py`:

<details>
<summary><strong>📋 Click to view the complete app.py file</strong></summary>

```python
import gradio as gr
from huggingface_hub import InferenceClient


def transcribe_audio(audio_file_path):
    """Transcribe audio using fal.ai for speed"""
    client = InferenceClient(provider="fal-ai")

    # Pass the file path directly - the client handles file reading
    transcript = client.automatic_speech_recognition(
        audio=audio_file_path, model="openai/whisper-large-v3"
    )

    return transcript.text


def generate_summary(transcript):
    """Generate summary using Together AI"""
    client = InferenceClient(provider="together")

    prompt = f"""
    Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: {transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made
    """

    response = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-FP8",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000,
    )

    return response.choices[0].message.content


def process_meeting_audio(audio_file):
    """Main processing function"""
    if audio_file is None:
        return "Please upload an audio file.", ""

    try:
        # Step 1: Transcribe
        transcript = transcribe_audio(audio_file)

        # Step 2: Summarize
        summary = generate_summary(transcript)

        return transcript, summary

    except Exception as e:
        return f"Error processing audio: {str(e)}", ""


# Create Gradio interface
app = gr.Interface(
    fn=process_meeting_audio,
    inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcript", lines=10),
        gr.Textbox(label="Summary & Action Items", lines=8),
    ],
    title="🎤 AI Meeting Notes",
    description="Upload audio to get instant transcripts and summaries.",
)

if __name__ == "__main__":
    app.launch()
```

</details>

To deploy, we'll need to create a new Space and upload our files.

1. **Create a new Space**: Go to [huggingface.co/new-space](https://huggingface.co/new-space)
2. **Choose Gradio SDK** and make it public
3. **Upload your files**: Upload `app.py` and `requirements.txt` (or push them from the terminal, as sketched below)
4. **Add your token**: In Space settings, add `HF_TOKEN` as a secret (get it from [your settings](https://huggingface.co/settings/tokens))
5. **Launch**: Your app will be live at `https://huggingface.co/spaces/your-username/your-space-name`

> **Note**: While we used CLI authentication locally, Spaces requires the token as a secret for the deployment environment.
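
If you'd rather upload from the terminal than the web UI in step 3, the `huggingface-cli upload` command can push both files to an existing Space; a sketch, where `your-username/your-space-name` is a placeholder for your Space id:

```bash
huggingface-cli upload your-username/your-space-name app.py app.py --repo-type=space
huggingface-cli upload your-username/your-space-name requirements.txt requirements.txt --repo-type=space
```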

## Next Steps

Congratulations! You've created a production-ready AI application that handles real-world tasks, provides a professional interface, scales automatically, and runs cost-efficiently. If you want to explore more providers, you can check out the [Inference Providers](https://huggingface.co/inference-providers) page. Or here are some ideas for next steps:

- **Improve your prompt**: Try different prompts to improve the quality for your use case
- **Try different models**: Experiment with various speech and text models
- **Compare performance**: Benchmark speed vs. accuracy across providers, as in the sketch below
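
As a starting point for that last idea, here's a rough latency harness. This is a sketch: the provider list and audio file name are placeholders to adapt to whichever providers serve your model on the Hub:

```python
import time

from huggingface_hub import InferenceClient

# Placeholder list: check the model page on the Hub
# for the providers that actually serve it
providers = ["fal-ai"]

for provider in providers:
    client = InferenceClient(provider=provider)
    start = time.perf_counter()
    transcript = client.automatic_speech_recognition(
        audio="meeting.wav", model="openai/whisper-large-v3"
    )
    elapsed = time.perf_counter() - start
    print(f"{provider}: {elapsed:.1f}s, {len(transcript.text)} characters")
```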

---

The PR also adds a second new file, the first-call guide:
# Your First Inference Provider Call

In this guide we're going to help you make your first API call with Inference Providers.

Many developers avoid using open source AI models because they assume deployment is complex. This guide will show you how to use a state-of-the-art model in under five minutes, with no infrastructure setup required.

We're going to use the [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) model, which is a powerful text-to-image model.

<Tip>

This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).

</Tip>

## Step 1: Find a Model on the Hub

Visit the [Hugging Face Hub](https://huggingface.co/models) and look for models with the "Inference Providers" filter; you can select the provider that you want. We'll go with `fal`.

![search image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/search-image.png)

For this example, we'll use [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell), a powerful text-to-image model. Next, navigate to the model page and scroll down to find the inference widget on the right side.

## Step 2: Try the Interactive Widget

Before writing any code, try the widget directly on the model page:

![widget image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/widget-image.png)

Here, you can test the model directly in the browser from any of the available providers. You can also copy relevant code snippets to use in your own projects.

1. Enter a prompt like "A serene mountain landscape at sunset"
2. Click **"Generate"**
3. Watch as the model creates an image in seconds

This widget calls the same endpoint you're about to use in code.

<Tip warning={true}>

You'll need a Hugging Face account (free at [huggingface.co](https://huggingface.co)) and remaining credits to use the model.

</Tip>

## Step 3: From Clicks to Code

Now let's replicate this in code. Click the **"View Code Snippets"** button in the widget to see the generated code snippets.

![code snippets image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/code-snippets-image.png)

You will need to populate this snippet with a valid Hugging Face User Access Token. You can find your User Access Token in your [settings page](https://huggingface.co/settings/tokens).

Set your token as an environment variable:

```bash
export HF_TOKEN="your_token_here"
```

The Python or TypeScript code snippet will use the token from the environment variable.

<hfoptions id="python-code-snippet">

<hfoption id="python">

Install the required package:

```bash
pip install huggingface_hub
```

You can now use the code snippet to generate an image:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="fal-ai",
    api_key=os.environ["HF_TOKEN"],
)

# output is a PIL.Image object
image = client.text_to_image(
    "Astronaut riding a horse",
    model="black-forest-labs/FLUX.1-schnell",
)
```
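
Since the result is a `PIL.Image` object, saving it locally is one more line (the file name is your choice):

```python
# Save the generated image to disk
image.save("astronaut.png")
```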

</hfoption>

<hfoption id="typescript">

Install the required package:

```bash
npm install @huggingface/inference
```

```typescript
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const image = await client.textToImage({
  provider: "fal-ai",
  model: "black-forest-labs/FLUX.1-schnell",
  inputs: "Astronaut riding a horse",
  parameters: { num_inference_steps: 5 },
});
// Use the generated image (it's a Blob)
```

</hfoption>

</hfoptions>

## What Just Happened?

Nice work! You've successfully used a production-grade AI model without any complex setup. In just a few lines of code, you:

- Connected to a powerful text-to-image model
- Generated a custom image from text
- Saved the result locally

The model you just used runs on professional infrastructure, handling scaling, optimization, and reliability automatically.

## Next Steps

Now that you've seen how easy it is to use AI models, you might wonder:

- What was that "provider" system doing behind the scenes?
- How does billing work?
- What other models can you use?

Continue to the next guide to understand the provider ecosystem and make informed choices about authentication and billing.
Reviewer: you can upload these as a colab notebook, so that people can just execute these as well.

burtenshaw: Nice idea. I'm gonna come back to this and just re-use the new model repo notebooks.