# Building Your First AI App with Inference Providers

You've learned the basics and understand the provider ecosystem. Now let's build something practical: an **AI Meeting Notes** app that transcribes audio files and generates summaries with action items.

This project demonstrates real-world AI orchestration using multiple specialized providers within a single application.

## Project Overview

Our app will:
1. **Accept audio** uploads or microphone input through a web interface
2. **Transcribe speech** using a fast speech-to-text model
3. **Generate summaries** using a powerful language model
4. **Deploy to the web** for easy sharing

**Tech Stack**: Gradio (for the UI) + Inference Providers (for the AI)

## Step 1: Set Up Authentication

Before we start coding, authenticate with Hugging Face using the CLI:

```bash
pip install huggingface_hub
huggingface-cli login
```

When prompted, paste your Hugging Face token; you can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). Logging in once handles authentication automatically for all your inference calls.
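
If you'd rather not use the CLI login (for example in a notebook or CI job), you can also pass a token to the client directly. A minimal sketch, assuming the token is stored in an environment variable named `HF_TOKEN` (the variable name is a convention, not something this guide sets up for you):

```python
import os
from huggingface_hub import InferenceClient

# Read the token from the environment instead of relying on `huggingface-cli login`.
client = InferenceClient(token=os.environ["HF_TOKEN"])
```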
## Step 2: Build the User Interface

Now let's create a simple web interface using Gradio:

```python
import gradio as gr
from huggingface_hub import InferenceClient

def process_meeting_audio(audio_file):
    """Process uploaded audio file and return transcript + summary"""
    if audio_file is None:
        return "Please upload an audio file.", ""

    # We'll implement the AI logic next
    return "Transcript will appear here...", "Summary will appear here..."

# Create the Gradio interface
app = gr.Interface(
    fn=process_meeting_audio,
    inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcript", lines=10),
        gr.Textbox(label="Summary & Action Items", lines=8)
    ],
    title="🎤 AI Meeting Notes",
    description="Upload an audio file to get an instant transcript and summary with action items."
)

if __name__ == "__main__":
    app.launch()
```

Here we're using Gradio's `gr.Audio` component, which accepts either an uploaded audio file or microphone input; `type="filepath"` means our function receives a path to the recording on disk. We're keeping things simple with two outputs: a transcript and a summary with action items.
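
By default `gr.Audio` shows both the upload and microphone sources. If you want to be explicit about which input methods appear (and in what order), recent Gradio releases accept a `sources` argument; a small sketch of that variant:

```python
audio_input = gr.Audio(
    label="Upload Meeting Audio",
    sources=["upload", "microphone"],  # offer both input methods explicitly
    type="filepath",                   # hand our function a file path, not raw samples
)
```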
## Step 3: Add Speech Transcription

Now let's implement the transcription using `fal.ai` and OpenAI's `whisper-large-v3` model for fast, reliable speech processing:

```python
def transcribe_audio(audio_file_path):
    """Transcribe audio using fal.ai for speed"""
    client = InferenceClient(provider="fal-ai")

    # Pass the file path directly - the client handles file reading
    transcript = client.automatic_speech_recognition(
        audio=audio_file_path,
        model="openai/whisper-large-v3"
    )

    return transcript.text
```
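
It's worth sanity-checking the function from a Python shell before wiring it into the UI. A quick local test, using a hypothetical `meeting.wav` file (substitute any audio file you have on disk):

```python
if __name__ == "__main__":
    # Manual smoke test: transcribe a local recording and preview the result.
    text = transcribe_audio("meeting.wav")
    print(text[:500])
```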
## Step 4: Add AI Summarization

Next, we'll use a powerful language model, Qwen's `Qwen/Qwen3-235B-A22B-FP8`, served via Together AI for the summarization:

```python
def generate_summary(transcript):
    """Generate summary using Together AI"""
    client = InferenceClient(provider="together")

    prompt = f"""
    Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: {transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made
    """

    response = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-FP8",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000
    )

    return response.choices[0].message.content
```

Note that we're also defining a custom prompt to ensure the output is formatted with a summary, action items, and decisions made.
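
For long meetings the summary can take several seconds to generate. If you'd prefer to surface tokens as they arrive, the same chat completion call also supports streaming; here's a minimal sketch of that variant (same provider and model as above):

```python
def stream_summary(transcript):
    """Like generate_summary, but prints tokens as they are produced."""
    client = InferenceClient(provider="together")

    stream = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-FP8",
        messages=[{"role": "user", "content": f"Summarize this meeting:\n{transcript}"}],
        max_tokens=1000,
        stream=True,
    )

    for chunk in stream:
        # Each chunk carries an incremental piece of the assistant's reply.
        print(chunk.choices[0].delta.content or "", end="", flush=True)
```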
## Step 5: Deploy on Hugging Face Spaces

To deploy, we'll need to create a `requirements.txt` file and an `app.py` file.

`requirements.txt`:

```txt
gradio
huggingface_hub
```

`app.py`:

<details>
<summary><strong>📋 Click to view the complete app.py file</strong></summary>

```python
import gradio as gr
from huggingface_hub import InferenceClient


def transcribe_audio(audio_file_path):
    """Transcribe audio using fal.ai for speed"""
    client = InferenceClient(provider="fal-ai")

    # Pass the file path directly - the client handles file reading
    transcript = client.automatic_speech_recognition(
        audio=audio_file_path, model="openai/whisper-large-v3"
    )

    return transcript.text


def generate_summary(transcript):
    """Generate summary using Together AI"""
    client = InferenceClient(provider="together")

    prompt = f"""
    Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: {transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made
    """

    response = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-FP8",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000,
    )

    return response.choices[0].message.content


def process_meeting_audio(audio_file):
    """Main processing function"""
    if audio_file is None:
        return "Please upload an audio file.", ""

    try:
        # Step 1: Transcribe
        transcript = transcribe_audio(audio_file)

        # Step 2: Summarize
        summary = generate_summary(transcript)

        return transcript, summary

    except Exception as e:
        return f"Error processing audio: {str(e)}", ""


# Create Gradio interface
app = gr.Interface(
    fn=process_meeting_audio,
    inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcript", lines=10),
        gr.Textbox(label="Summary & Action Items", lines=8),
    ],
    title="🎤 AI Meeting Notes",
    description="Upload audio to get instant transcripts and summaries.",
)

if __name__ == "__main__":
    app.launch()
```

</details>

To deploy, we'll need to create a new Space and upload our files.

1. **Create a new Space**: Go to [huggingface.co/new-space](https://huggingface.co/new-space)
2. **Choose Gradio SDK** and make it public
3. **Upload your files**: Upload `app.py` and `requirements.txt`
4. **Add your token**: In Space settings, add `HF_TOKEN` as a secret (get it from [your settings](https://huggingface.co/settings/tokens))
5. **Launch**: Your app will be live at `https://huggingface.co/spaces/your-username/your-space-name`

> **Note**: While we used CLI authentication locally, Spaces requires the token as a secret for the deployment environment.
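
If you prefer scripting the deployment, the same steps can be done with the `huggingface_hub` API. A sketch, assuming you're already logged in locally and using a placeholder repo id (`your-username/ai-meeting-notes`):

```python
from huggingface_hub import HfApi

api = HfApi()
repo_id = "your-username/ai-meeting-notes"  # placeholder - use your own namespace

# Create the Space with the Gradio SDK (no-op if it already exists).
api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="gradio", exist_ok=True)

# Upload the two files the app needs.
for filename in ["app.py", "requirements.txt"]:
    api.upload_file(
        path_or_fileobj=filename,
        path_in_repo=filename,
        repo_id=repo_id,
        repo_type="space",
    )
```

You'll still need to add the `HF_TOKEN` secret in the Space settings (or via `HfApi.add_space_secret`) as described above.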

## Next Steps

Congratulations! You've created a production-ready AI application that handles a real-world task, provides a professional interface, scales automatically, and runs cost-efficiently. If you want to explore more providers, check out the [Inference Providers](https://huggingface.co/inference-providers) page. Or try one of these next steps:

- **Improve your prompt**: Try different prompts to improve the quality for your use case
- **Try different models**: Experiment with various speech and text models
- **Compare performance**: Benchmark speed vs. accuracy across providers (a rough timing sketch follows below)
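
As a starting point for that last item, you can time the same transcription call against two providers. A rough sketch (single runs, not a rigorous benchmark), reusing the hypothetical `meeting.wav` file and assuming both providers are enabled for your account:

```python
import time
from huggingface_hub import InferenceClient

def time_transcription(provider, audio_path="meeting.wav"):
    """Run one transcription call and return (elapsed seconds, transcript)."""
    client = InferenceClient(provider=provider)
    start = time.perf_counter()
    result = client.automatic_speech_recognition(
        audio=audio_path, model="openai/whisper-large-v3"
    )
    return time.perf_counter() - start, result.text

for provider in ["fal-ai", "hf-inference"]:
    seconds, text = time_transcription(provider)
    print(f"{provider}: {seconds:.1f}s, {len(text)} characters")
```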
