[GUIDES] Improve inference providers documentation with guides #1797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Changes shown from 6 of 16 commits:
- 588daf9: add more detail to the billing page (burtenshaw)
- cd9801b: add first basic guide on inference (burtenshaw)
- 3ee3d4f: update index with tts providers (burtenshaw)
- bbe629c: add building your first app guide (burtenshaw)
- 3a51d0b: update toc with guides pages (burtenshaw)
- 20b5364: move pricing change to separate pr 1799 (burtenshaw)
- 186e741: add js implementation to first app (burtenshaw)
- 4212e4f: use auto not together (burtenshaw)
- 381be97: use auto in full app code (burtenshaw)
- 4316d65: add app screenshots (burtenshaw)
- 2f56129: use auto in first api call (burtenshaw)
- 9b7050b: simplify js logic (burtenshaw)
- 49f164b: Update docs/inference-providers/guides/building-first-app.md (burtenshaw)
- 26f9027: Apply suggestions from code review (burtenshaw)
- cd49e24: add section on specify provider vs auto (burtenshaw)
- d92998f: Merge branch 'improve-inference-providers-documentation' of https://g… (burtenshaw)
`docs/inference-providers/guides/building-first-app.md` (new file):
# Building Your First AI App with Inference Providers

You've learned the basics and understand the provider ecosystem. Now let's build something practical: an **AI Meeting Notes** app that transcribes audio files and generates summaries with action items.

This project demonstrates real-world AI orchestration using multiple specialized providers within a single application.

## Project Overview

Our app will:
1. **Accept audio** as a microphone input through a web interface
2. **Transcribe speech** using a fast speech-to-text model
3. **Generate summaries** using a powerful language model
4. **Deploy to the web** for easy sharing

**Tech Stack**: Gradio (for the UI) + Inference Providers (for the AI)

## Step 1: Set Up Authentication

Before we start coding, authenticate with Hugging Face using the CLI:

```bash
pip install huggingface_hub
huggingface-cli login
```
When prompted, paste your Hugging Face token; you can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). The CLI stores it locally, so authentication is handled automatically for all your inference calls.
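
If you prefer to stay in Python, the `huggingface_hub` library also exposes a `login()` helper that does the same thing; a minimal sketch:

```python
from huggingface_hub import login

# Prompts for your token and caches it locally,
# just like `huggingface-cli login`
login()
```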

## Step 2: Build the User Interface

Now let's create a simple web interface using Gradio:

```python
import gradio as gr
from huggingface_hub import InferenceClient

def process_meeting_audio(audio_file):
    """Process uploaded audio file and return transcript + summary"""
    if audio_file is None:
        return "Please upload an audio file.", ""

    # We'll implement the AI logic next
    return "Transcript will appear here...", "Summary will appear here..."

# Create the Gradio interface
app = gr.Interface(
    fn=process_meeting_audio,
    inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcript", lines=10),
        gr.Textbox(label="Summary & Action Items", lines=8)
    ],
    title="🎤 AI Meeting Notes",
    description="Upload an audio file to get an instant transcript and summary with action items."
)

if __name__ == "__main__":
    app.launch()
```

Here we're using Gradio's `gr.Audio` component to either upload an audio file or use the microphone input. We're keeping things simple with two outputs: a transcript and a summary with action items.
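
By default, `gr.Audio` accepts both uploads and microphone recordings. If you want to make that explicit, recent Gradio versions take a `sources` parameter; a small variation on the input component, assuming Gradio 4.x:

```python
import gradio as gr

# Explicitly allow both file upload and microphone recording
audio_input = gr.Audio(
    label="Upload or Record Meeting Audio",
    sources=["upload", "microphone"],
    type="filepath",
)
```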

## Step 3: Add Speech Transcription

Now let's implement transcription using `fal.ai` with OpenAI's `whisper-large-v3` model for fast, reliable speech processing:

```python
def transcribe_audio(audio_file_path):
    """Transcribe audio using fal.ai for speed"""
    client = InferenceClient(provider="fal-ai")

    # Pass the file path directly - the client handles file reading
    transcript = client.automatic_speech_recognition(
        audio=audio_file_path,
        model="openai/whisper-large-v3"
    )

    return transcript.text
```
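
You can smoke-test this function on its own before wiring it into the UI; here `meeting.wav` is a placeholder for any local audio file:

```python
if __name__ == "__main__":
    # Replace "meeting.wav" with the path to any local audio file
    print(transcribe_audio("meeting.wav"))
```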

## Step 4: Add AI Summarization

Next, we'll use a powerful language model, `Qwen/Qwen3-235B-A22B-FP8`, via Together AI for summarization:

```python
def generate_summary(transcript):
    """Generate summary using Together AI"""
    client = InferenceClient(provider="together")

    prompt = f"""
    Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: {transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made
    """

    response = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-FP8",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000
    )

    return response.choices[0].message.content
```

Note that we're also defining a custom prompt to ensure the output is formatted as a summary with action items and decisions made.

## Step 5: Deploy on Hugging Face Spaces

To deploy, we'll need to create a `requirements.txt` file and an `app.py` file.

`requirements.txt`:

```txt
gradio
huggingface_hub
```

`app.py`:

<details>
<summary><strong>📋 Click to view the complete app.py file</strong></summary>

```python
import gradio as gr
from huggingface_hub import InferenceClient


def transcribe_audio(audio_file_path):
    """Transcribe audio using fal.ai for speed"""
    client = InferenceClient(provider="fal-ai")

    # Pass the file path directly - the client handles file reading
    transcript = client.automatic_speech_recognition(
        audio=audio_file_path, model="openai/whisper-large-v3"
    )

    return transcript.text


def generate_summary(transcript):
    """Generate summary using Together AI"""
    client = InferenceClient(provider="together")

    prompt = f"""
    Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: {transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made
    """

    response = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-FP8",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000,
    )

    return response.choices[0].message.content


def process_meeting_audio(audio_file):
    """Main processing function"""
    if audio_file is None:
        return "Please upload an audio file.", ""

    try:
        # Step 1: Transcribe
        transcript = transcribe_audio(audio_file)

        # Step 2: Summarize
        summary = generate_summary(transcript)

        return transcript, summary

    except Exception as e:
        return f"Error processing audio: {str(e)}", ""


# Create Gradio interface
app = gr.Interface(
    fn=process_meeting_audio,
    inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcript", lines=10),
        gr.Textbox(label="Summary & Action Items", lines=8),
    ],
    title="🎤 AI Meeting Notes",
    description="Upload audio to get instant transcripts and summaries.",
)

if __name__ == "__main__":
    app.launch()
```

</details>

To deploy, we'll need to create a new Space and upload our files.

1. **Create a new Space**: Go to [huggingface.co/new-space](https://huggingface.co/new-space)
2. **Choose Gradio SDK** and make it public
3. **Upload your files**: Upload `app.py` and `requirements.txt` (or push them from the terminal, as sketched below)
4. **Add your token**: In Space settings, add `HF_TOKEN` as a secret (get it from [your settings](https://huggingface.co/settings/tokens))
5. **Launch**: Your app will be live at `https://huggingface.co/spaces/your-username/your-space-name`

> **Note**: While we used CLI authentication locally, Spaces requires the token as a secret for the deployment environment.
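
If you'd rather upload from the terminal than the web UI in step 3, the `huggingface-cli upload` command can push both files to an existing Space; a sketch, where `your-username/your-space-name` is a placeholder for your Space id:

```bash
huggingface-cli upload your-username/your-space-name app.py app.py --repo-type=space
huggingface-cli upload your-username/your-space-name requirements.txt requirements.txt --repo-type=space
```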

## Next Steps

Congratulations! You've created a production-ready AI application that handles real-world tasks, provides a professional interface, scales automatically, and runs cost-efficiently. If you want to explore more providers, you can check out the [Inference Providers](https://huggingface.co/inference-providers) page. Or here are some ideas for next steps:

- **Improve your prompt**: Try different prompts to improve the quality for your use case
- **Try different models**: Experiment with various speech and text models
- **Compare performance**: Benchmark speed vs. accuracy across providers, as in the sketch below
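
As a starting point for that last idea, here's a rough latency harness. This is a sketch: the provider list and audio file name are placeholders to adapt to whichever providers serve your model on the Hub:

```python
import time

from huggingface_hub import InferenceClient

# Placeholder list: check the model page on the Hub
# for the providers that actually serve it
providers = ["fal-ai"]

for provider in providers:
    client = InferenceClient(provider=provider)
    start = time.perf_counter()
    transcript = client.automatic_speech_recognition(
        audio="meeting.wav", model="openai/whisper-large-v3"
    )
    elapsed = time.perf_counter() - start
    print(f"{provider}: {elapsed:.1f}s, {len(transcript.text)} characters")
```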

---

The PR also adds a second new file, the first-call guide:
# Your First Inference Provider Call

In this guide we're going to help you make your first API call with Inference Providers.

Many developers avoid using open source AI models because they assume deployment is complex. This guide will show you how to use a state-of-the-art model in under five minutes, with no infrastructure setup required.

We're going to use the [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) model, which is a powerful text-to-image model.

<Tip>

This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).

</Tip>

## Step 1: Find a Model on the Hub

Visit the [Hugging Face Hub](https://huggingface.co/models) and look for models with the "Inference Providers" filter; you can select the provider that you want. We'll go with `fal`.

![search image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/search-image.png)

For this example, we'll use [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell), a powerful text-to-image model. Next, navigate to the model page and scroll down to find the inference widget on the right side.

## Step 2: Try the Interactive Widget

Before writing any code, try the widget directly on the model page:

![widget image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/widget-image.png)

Here, you can test the model directly in the browser from any of the available providers. You can also copy relevant code snippets to use in your own projects.

1. Enter a prompt like "A serene mountain landscape at sunset"
2. Click **"Generate"**
3. Watch as the model creates an image in seconds

This widget calls the same endpoint you're about to use in code.

<Tip warning={true}>

You'll need a Hugging Face account (free at [huggingface.co](https://huggingface.co)) and remaining credits to use the model.

</Tip>

## Step 3: From Clicks to Code

Now let's replicate this in code. Click the **"View Code Snippets"** button in the widget to see the generated code snippets.

![code snippets image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers/code-snippets-image.png)

You will need to populate this snippet with a valid Hugging Face User Access Token. You can find your User Access Token in your [settings page](https://huggingface.co/settings/tokens).

Set your token as an environment variable:

```bash
export HF_TOKEN="your_token_here"
```

The Python or TypeScript code snippet will use the token from the environment variable.

<hfoptions id="python-code-snippet">

<hfoption id="python">

Install the required package:

```bash
pip install huggingface_hub
```

You can now use the code snippet to generate an image:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="fal-ai",
    api_key=os.environ["HF_TOKEN"],
)

# output is a PIL.Image object
image = client.text_to_image(
    "Astronaut riding a horse",
    model="black-forest-labs/FLUX.1-schnell",
)
```
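
Since the result is a `PIL.Image` object, saving it locally is one more line (the file name is your choice):

```python
# Save the generated image to disk
image.save("astronaut.png")
```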

</hfoption>

<hfoption id="typescript">

Install the required package:

```bash
npm install @huggingface/inference
```

```typescript
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const image = await client.textToImage({
  provider: "fal-ai",
  model: "black-forest-labs/FLUX.1-schnell",
  inputs: "Astronaut riding a horse",
  parameters: { num_inference_steps: 5 },
});
// Use the generated image (it's a Blob)
```

</hfoption>

</hfoptions>

## What Just Happened?

Nice work! You've successfully used a production-grade AI model without any complex setup. In just a few lines of code, you:

- Connected to a powerful text-to-image model
- Generated a custom image from text
- Saved the result locally

The model you just used runs on professional infrastructure, handling scaling, optimization, and reliability automatically.

## Next Steps

Now that you've seen how easy it is to use AI models, you might wonder:

- What was that "provider" system doing behind the scenes?
- How does billing work?
- What other models can you use?

Continue to the next guide to understand the provider ecosystem and make informed choices about authentication and billing.
Reviewer: you can upload these as a colab notebook, so that people can just execute these as well.

burtenshaw: Nice idea. I'm gonna come back to this and just re-use the new model repo notebooks.