7 changes: 7 additions & 0 deletions docs/inference-providers/_toctree.yml
@@ -11,6 +11,13 @@
    - local: security
      title: Security

- title: Guides
  sections:
    - local: guides/first-api-call
      title: Your First API Call
    - local: guides/building-first-app
      title: Building Your First AI App

- title: Providers
  sections:
    - local: providers/cerebras
226 changes: 226 additions & 0 deletions docs/inference-providers/guides/building-first-app.md
@@ -0,0 +1,226 @@
# Building Your First AI App with Inference Providers

You've learned the basics and understand the provider ecosystem. Now let's build something practical: an **AI Meeting Notes** app that transcribes audio files and generates summaries with action items.

This project demonstrates real-world AI orchestration using multiple specialized providers within a single application.

## Project Overview

Our app will:
1. **Accept audio** via file upload or microphone input through a web interface
2. **Transcribe speech** using a fast speech-to-text model
3. **Generate summaries** using a powerful language model
4. **Deploy to the web** for easy sharing

**Tech Stack**: Gradio (for the UI) + Inference Providers (for the AI)

## Step 1: Set Up Authentication

Before we start coding, authenticate with Hugging Face using the CLI:

```bash
pip install huggingface_hub
huggingface-cli login
```

When prompted, paste your Hugging Face token (you can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained)). The CLI stores it locally, so all your inference calls are authenticated automatically.
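If you want to double-check that the login worked, a quick sanity check from Python looks like this (a minimal sketch; `whoami` is part of `huggingface_hub`):

```python
from huggingface_hub import whoami

# Prints your account name if the stored token is valid; raises an error otherwise.
print(whoami()["name"])
```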

## Step 2: Build the User Interface
> **Review comment (Member):** you can upload these as a colab notebook, so that people can just execute these as well.
>
> **Reply (Author):** Nice idea. I'm going to come back to this and just re-use the new model repo notebooks.


Now let's create a simple web interface using Gradio:

```python
import gradio as gr
from huggingface_hub import InferenceClient

def process_meeting_audio(audio_file):
    """Process uploaded audio file and return transcript + summary"""
    if audio_file is None:
        return "Please upload an audio file.", ""

    # We'll implement the AI logic next
    return "Transcript will appear here...", "Summary will appear here..."

# Create the Gradio interface
app = gr.Interface(
    fn=process_meeting_audio,
    inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcript", lines=10),
        gr.Textbox(label="Summary & Action Items", lines=8)
    ],
    title="🎤 AI Meeting Notes",
    description="Upload an audio file to get an instant transcript and summary with action items."
)

if __name__ == "__main__":
    app.launch()
```

Here we're using Gradio's `gr.Audio` component to accept either an uploaded audio file or microphone input. We're keeping things simple with two outputs: a transcript and a summary with action items.
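If you want to be explicit about allowing both input sources, recent Gradio releases expose a `sources` parameter (a small sketch, assuming Gradio 4.x):

```python
import gradio as gr

# Explicitly allow both file uploads and microphone recordings (Gradio 4.x API).
audio_input = gr.Audio(
    sources=["upload", "microphone"],
    type="filepath",
    label="Upload Meeting Audio",
)
```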

## Step 3: Add Speech Transcription

Now let's implement the transcription using `fal.ai` and OpenAI's `whisper-large-v3` model for fast, reliable speech processing:

```python
def transcribe_audio(audio_file_path):
    """Transcribe audio using fal.ai for speed"""
    client = InferenceClient(provider="fal-ai")

    # Pass the file path directly - the client handles file reading
    transcript = client.automatic_speech_recognition(
        audio=audio_file_path,
        model="openai/whisper-large-v3"
    )

    return transcript.text
```
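You can already try this function on its own before wiring it into the UI (the file name below is just a placeholder for an audio file on your machine):

```python
# Quick standalone check - replace the path with a real recording.
if __name__ == "__main__":
    print(transcribe_audio("sample_meeting.wav"))
```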

## Step 4: Add AI Summarization

Next, we'll use a powerful language model, `Qwen/Qwen3-235B-A22B-FP8`, served via Together AI, for summarization:

```python
def generate_summary(transcript):
    """Generate summary using Together AI"""
    client = InferenceClient(provider="together")

    prompt = f"""
    Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: {transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made
    """

    response = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-FP8",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000
    )

    return response.choices[0].message.content
```

Note that we're also defining a custom prompt to ensure the output is formatted with a summary, action items, and decisions made.
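With both helpers in place, the placeholder `process_meeting_audio` from Step 2 simply chains them together. This is the same wiring used in the complete file below:

```python
def process_meeting_audio(audio_file):
    """Main processing function"""
    if audio_file is None:
        return "Please upload an audio file.", ""

    try:
        transcript = transcribe_audio(audio_file)
        summary = generate_summary(transcript)
        return transcript, summary
    except Exception as e:
        return f"Error processing audio: {str(e)}", ""
```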

## Step 5: Deploy on Hugging Face Spaces

To deploy, we'll need to create a `requirements.txt` file and an `app.py` file.

`requirements.txt`:

```txt
gradio
huggingface_hub
```

`app.py`:

<details>
<summary><strong>📋 Click to view the complete app.py file</strong></summary>

```python
import gradio as gr
from huggingface_hub import InferenceClient


def transcribe_audio(audio_file_path):
    """Transcribe audio using fal.ai for speed"""
    client = InferenceClient(provider="fal-ai")

    # Pass the file path directly - the client handles file reading
    transcript = client.automatic_speech_recognition(
        audio=audio_file_path, model="openai/whisper-large-v3"
    )

    return transcript.text


def generate_summary(transcript):
    """Generate summary using Together AI"""
    client = InferenceClient(provider="together")

    prompt = f"""
    Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: {transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made
    """

    response = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-FP8",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000,
    )

    return response.choices[0].message.content


def process_meeting_audio(audio_file):
    """Main processing function"""
    if audio_file is None:
        return "Please upload an audio file.", ""

    try:
        # Step 1: Transcribe
        transcript = transcribe_audio(audio_file)

        # Step 2: Summarize
        summary = generate_summary(transcript)

        return transcript, summary

    except Exception as e:
        return f"Error processing audio: {str(e)}", ""


# Create Gradio interface
app = gr.Interface(
    fn=process_meeting_audio,
    inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcript", lines=10),
        gr.Textbox(label="Summary & Action Items", lines=8),
    ],
    title="🎤 AI Meeting Notes",
    description="Upload audio to get instant transcripts and summaries.",
)

if __name__ == "__main__":
    app.launch()
```

</details>

With both files ready, create a new Space and upload them:

1. **Create a new Space**: Go to [huggingface.co/new-space](https://huggingface.co/new-space)
2. **Choose Gradio SDK** and make it public
3. **Upload your files**: Upload `app.py` and `requirements.txt`
4. **Add your token**: In Space settings, add `HF_TOKEN` as a secret (get it from [your settings](https://huggingface.co/settings/tokens))
5. **Launch**: Your app will be live at `https://huggingface.co/spaces/your-username/your-space-name`

> **Note**: While we used CLI authentication locally, Spaces requires the token as a secret for the deployment environment.
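If you prefer to pass the secret explicitly rather than relying on the client picking it up from the environment, a minimal sketch looks like this:

```python
import os
from huggingface_hub import InferenceClient

# On Spaces, secrets are exposed to your app as environment variables.
client = InferenceClient(provider="fal-ai", api_key=os.environ["HF_TOKEN"])
```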

## Next Steps

Congratulations! You've created a production-ready AI application that handles real-world tasks, provides a professional interface, scales automatically, and runs cost-efficiently. If you want to explore more providers, check out the [Inference Providers](https://huggingface.co/inference-providers) page, or try some of these next steps:

- **Improve your prompt**: Try different prompts to improve the quality for your use case
- **Try different models**: Experiment with various speech and text models
- **Compare performance**: Benchmark speed vs. accuracy across providers (see the rough timing sketch below)
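For example, a rough way to compare transcription speed across providers (the provider names here are just examples; check the model page for which providers actually serve it):

```python
import time
from huggingface_hub import InferenceClient

# Rough timing comparison - not a rigorous benchmark.
for provider in ["fal-ai", "hf-inference"]:
    client = InferenceClient(provider=provider)
    start = time.perf_counter()
    client.automatic_speech_recognition(
        audio="sample_meeting.wav", model="openai/whisper-large-v3"
    )
    print(f"{provider}: {time.perf_counter() - start:.1f}s")
```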
131 changes: 131 additions & 0 deletions docs/inference-providers/guides/first-api-call.md
@@ -0,0 +1,131 @@
# Your First Inference Provider Call

In this guide we're going to help you make your first API call with Inference Providers.

Many developers avoid using open source AI models because they assume deployment is complex. This guide will show you how to use a state-of-the-art model in under five minutes, with no infrastructure setup required.

We're going to use the [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) model, which is a powerful text-to-image model.

<Tip>

This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).

</Tip>

## Step 1: Find a Model on the Hub

Visit the [Hugging Face Hub](https://huggingface.co/models) and use the "Inference Providers" filter to find models, selecting the provider you want. We'll go with `fal`.

![search image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/search.png)

Navigate to the [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) model page and scroll down to find the inference widget on the right side.

## Step 2: Try the Interactive Widget

Before writing any code, try the widget directly on the model page:

![widget image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/widget.png)

Here, you can test the model directly in the browser from any of the available providers. You can also copy relevant code snippets to use in your own projects.

1. Enter a prompt like "A serene mountain landscape at sunset"
2. Click **"Generate"**
3. Watch as the model creates an image in seconds

This widget uses the same endpoint you're about to implement in code.

<Tip warning={true}>

You'll need a Hugging Face account (free at [huggingface.co](https://huggingface.co)) and remaining credits to use the model.

</Tip>

## Step 3: From Clicks to Code

Now let's replicate this in code. Click the **"View Code Snippets"** button in the widget to see the generated snippets.

![code snippets image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/code-snippets.png)

You will need to populate this snippet with a valid Hugging Face User Access Token, which you can create on your [settings page](https://huggingface.co/settings/tokens).

Set your token as an environment variable:

```bash
export HF_TOKEN="your_token_here"
```

The Python or TypeScript code snippet will use the token from the environment variable.

<hfoptions id="python-code-snippet">

<hfoption id="python">

Install the required package:

```bash
pip install huggingface_hub
```

You can now use the code snippet to generate an image:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="fal-ai",
    api_key=os.environ["HF_TOKEN"],
)

# output is a PIL.Image object
image = client.text_to_image(
    "Astronaut riding a horse",
    model="black-forest-labs/FLUX.1-schnell",
)
```
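The call returns a standard `PIL.Image`, so you can save or display it right away (the filename is just an example):

```python
# Save the generated image to disk, or call image.show() to preview it.
image.save("astronaut.png")
```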

</hfoption>

<hfoption id="typescript">

Install the required package:

```bash
npm install @huggingface/inference
```

```typescript
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const image = await client.textToImage({
  provider: "fal-ai",
  model: "black-forest-labs/FLUX.1-schnell",
  inputs: "Astronaut riding a horse",
  parameters: { num_inference_steps: 5 },
});
// Use the generated image (it's a Blob)
```
</hfoption>

</hfoptions>

## What Just Happened?

Nice work! You've successfully used a production-grade AI model without any complex setup. In just a few lines of code, you:

- Connected to a powerful text-to-image model
- Generated a custom image from text
- Received the generated image, ready to save or display locally

The model you just used runs on professional infrastructure, handling scaling, optimization, and reliability automatically.

## Next Steps

Now that you've seen how easy it is to use AI models, you might wonder:
- What was that "provider" system doing behind the scenes?
- How does billing work?
- What other models can you use?

Continue to the next guide to understand the provider ecosystem and make informed choices about authentication and billing.