
Commit 34d1e7e

Duplication - FR titles - Editing EN title for tuto_08
1 parent 236e26c commit 34d1e7e

170 files changed, +39608 −2 lines changed


pages/index.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1086,7 +1086,7 @@
 + [AI Endpoints - Build a Python Chatbot with LangChain](public_cloud/ai_machine_learning/endpoints_tuto_05_chatbot_langchain_python)
 + [AI Endpoints - Build a JavaScript Chatbot with LangChain](public_cloud/ai_machine_learning/endpoints_tuto_06_chatbot_langchain_javascript)
 + [AI Endpoints - Create your own AI chatbot using LangChain4j and Quarkus](public_cloud/ai_machine_learning/endpoints_tuto_07_chatbot_langchain4j_quarkus)
-+ [AI Endpoints - Streaming Chatbot with LangChain4j and Quarkus](public_cloud/ai_machine_learning/endpoints_tuto_08_streaming_chatbot_langchain4j_quarkus)
++ [AI Endpoints - Create a Streaming Chatbot with LangChain4j and Quarkus](public_cloud/ai_machine_learning/endpoints_tuto_08_streaming_chatbot_langchain4j_quarkus)
 + [AI Endpoints - Enable conversational memory in your chatbot using LangChain](public_cloud/ai_machine_learning/endpoints_tuto_09_chatbot_memory_langchain)
 + [AI Endpoints - Create a Memory Chatbot with LangChain4j](public_cloud/ai_machine_learning/endpoints_tuto_10_memory_chatbot_langchain4j)
 + [AI Endpoints - Build a RAG Chatbot with LangChain](public_cloud/ai_machine_learning/endpoints_tuto_11_rag_chatbot_langchain)
```
Lines changed: 301 additions & 0 deletions
---
title: AI Endpoints - Create your own audio summarizer
excerpt: Summarize hours of meetings using ASR and LLM AI Endpoints
updated: 2025-04-18
---

> [!primary]
>
> AI Endpoints is currently in **Beta**. Although we aim to offer a production-ready product even in this testing phase, service availability may not be guaranteed. Please be careful if you use endpoints for production, as the Beta phase is not yet complete.
>
> AI Endpoints is covered by the **[OVHcloud AI Endpoints Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/48743bf-AI_Endpoints-ALL-1.1.pdf)** and the **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
>

## Introduction

Are you looking for a way to efficiently summarize your meetings, broadcasts, and podcasts for quick reference or to provide to others? Look no further!

## Objective

In this tutorial, you will create an Audio Summarizer assistant that can not only transcribe but also summarize all your audio files.

Indeed, thanks to [AI Endpoints](https://endpoints.ai.cloud.ovh.net/), it’s never been easier to create a virtual assistant that can help you stay on top of your meetings and keep track of important information.

This tutorial will explore how AI APIs can be connected to create an advanced virtual assistant capable of transcribing and summarizing any audio file using **ASR (Automatic Speech Recognition)** technologies and popular **LLMs (Large Language Models)**. We will also build an app to use our assistant!

![connect-ai-apis](images/ai-endpoint-puzzles-connexion.png)

## Definitions

- **Automatic Speech Recognition (ASR)**: Technology that converts spoken language into written text. ASR will be used in this context to transcribe long audio recordings into text, which will then be summarized using LLMs.
- **Large Language Models (LLMs)**: Advanced models trained to understand context and generate human-like responses. In this use case, the LLM prompt will be designed to generate a summary of the input text based on the output from the ASR endpoint.
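
Conceptually, the assistant is a straight composition of these two building blocks: audio goes through ASR to become text, and that text goes through the LLM to become a summary. A minimal sketch of this pipeline (the function names and placeholder stages below are illustrative, not part of the tutorial code):

```python
def summarize_audio(audio_path, transcribe, summarize):
    """Two-stage pipeline: speech-to-text (ASR), then text-to-summary (LLM)."""
    transcript = transcribe(audio_path)
    return summarize(transcript)

# placeholder stages, standing in for the real ASR and LLM API calls built later
fake_transcribe = lambda path: f"transcript of {path}"
fake_summarize = lambda text: text.upper()

print(summarize_audio("meeting.mp3", fake_transcribe, fake_summarize))
# TRANSCRIPT OF MEETING.MP3
```

The rest of the tutorial implements the two real stages and wires them into a web UI.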

## Requirements

- A [Public Cloud project](/links/public-cloud/public-cloud) in your OVHcloud account
- An access token for **OVHcloud AI Endpoints**. To create an API token, follow the instructions in the [AI Endpoints - Getting Started](/pages/public_cloud/ai_machine_learning/endpoints_guide_01_getting_started) guide.

## Instructions

### Set up the environment

In order to use the AI Endpoints APIs easily, create a `.env` file to store the environment variables:

```bash
ASR_AI_ENDPOINT=https://nvr-asr-en-gb.endpoints.kepler.ai.cloud.ovh.net/api/v1/asr/recognize
LLM_AI_ENDPOINT=https://mixtral-8x22b-instruct-v01.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1
OVH_AI_ENDPOINTS_ACCESS_TOKEN=<ai-endpoints-api-token>
```

**Make sure to replace the token value (`OVH_AI_ENDPOINTS_ACCESS_TOKEN`) with yours.** If you do not have one yet, follow the instructions in the [AI Endpoints - Getting Started](/pages/public_cloud/ai_machine_learning/endpoints_guide_01_getting_started) guide.

Then, create a `requirements.txt` file with the following libraries:

```bash
openai==1.13.3
gradio==4.36.1
pydub==0.25.1
python-dotenv==1.0.1
```

Next, install these dependencies:

```console
pip install -r requirements.txt
```

*Note that Python 3.11 is used in this tutorial.*

### Importing necessary libraries and variables

Once this is done, create a Python file named `audio-summarizer-app.py`, where you will first import the Python libraries as follows:

```python
import gradio as gr
import io
import os
import requests
from pydub import AudioSegment
from dotenv import load_dotenv
from openai import OpenAI
```

After these lines, load and access the environment variables from your `.env` file:

```python
# access the environment variables from the .env file
load_dotenv()

asr_ai_endpoint_url = os.getenv("ASR_AI_ENDPOINT")
llm_ai_endpoint_url = os.getenv("LLM_AI_ENDPOINT")
ai_endpoint_token = os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
```

💡 You are now ready to start coding your web app.
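
A missing or misspelled variable in `.env` is a common source of confusing 401 errors later on. As an optional safety net, you can fail fast when a variable is absent; the `require_env` helper below is our own suggestion, not part of the tutorial code:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}. Check your .env file.")
    return value
```

You could then replace the three `os.getenv(...)` calls with `require_env(...)` so that a misconfigured environment is reported at startup rather than at the first API call.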

### Transcribe audio file with ASR

First, create the **Automatic Speech Recognition** function in order to transcribe audio files into text:

```python
def asr_transcription(audio):

    if audio is None:
        return " "

    else:
        # preprocess audio: mono .wav file at a 16 kHz frame rate
        processed_audio = "/tmp/my_audio.wav"
        audio_input = AudioSegment.from_file(audio)  # let pydub infer the input format (.wav or .mp3)
        process_audio_to_wav = audio_input.set_channels(1)
        process_audio_to_wav = process_audio_to_wav.set_frame_rate(16000)
        process_audio_to_wav.export(processed_audio, format="wav")

        # headers
        headers = {
            'accept': 'application/json',
            "Authorization": f"Bearer {ai_endpoint_token}",
        }

        # put the processed audio file as endpoint input
        files = [
            ('audio', open(processed_audio, 'rb')),
        ]

        # get response from endpoint
        response = requests.post(
            asr_ai_endpoint_url,
            files=files,
            headers=headers
        )

        # return complete transcription
        resp = ''
        if response.status_code == 200:
            # handle response
            response_data = response.json()
            for alternative in response_data:
                resp += alternative['alternatives'][0]['transcript']
        else:
            print("Error:", response.status_code)

        return resp
```

**In this function:**

- The audio file is preprocessed as follows: `.wav` format, `1` channel, `16000` frame rate
- The transformed audio `processed_audio` is read
- An API call is made to the ASR endpoint named `nvr-asr-en-gb`
- The full response is stored in the `resp` variable and returned by the function

🎉 Now that you have this function, you are ready to transcribe audio files.
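
The ASR endpoint answers with a JSON list of segments, each carrying ranked `alternatives`, and the loop above keeps the top-ranked transcript of each segment. Here is that parsing step in isolation, run on a made-up payload (the sample values are illustrative, only the field names come from the tutorial code):

```python
# illustrative example of the JSON structure returned by the ASR endpoint:
# a list of segments, each with ranked transcription alternatives
response_data = [
    {"alternatives": [{"transcript": "Hello everyone, "}]},
    {"alternatives": [{"transcript": "welcome to the meeting."}]},
]

# concatenate the top-ranked transcript of every segment
resp = ""
for segment in response_data:
    resp += segment["alternatives"][0]["transcript"]

print(resp)  # Hello everyone, welcome to the meeting.
```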

Now it’s time to call an LLM to summarize the transcribed text.

### Summarize audio with LLM

In this second step, create the `chat_completion` function to use `Mixtral-8x22B` effectively (or any other model):

**What to do?**

- Check that the transcription exists
- Use the OpenAI API compatibility to call the LLM
- Customize your prompt in order to specify the LLM's task
- Return the audio summary

```python
def chat_completion(new_message):

    if new_message == " ":
        return "Please, send an input audio to get its summary!"

    else:
        # auth
        client = OpenAI(
            base_url=llm_ai_endpoint_url,
            api_key=ai_endpoint_token
        )

        # prompt
        history_openai_format = [{"role": "user", "content": f"Summarize the following text in a few words: {new_message}"}]

        # return summary
        return client.chat.completions.create(
            model="Mixtral-8x22B-Instruct-v0.1",
            messages=history_openai_format,
            temperature=0,
            max_tokens=1024
        ).choices[0].message.content
```
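
Very long recordings can yield transcripts that exceed the model's context window. A small optional guard, not part of the tutorial code, that truncates the transcript before building the prompt (the character budget and helper name are our own assumptions):

```python
def build_summary_prompt(transcript: str, max_chars: int = 8000) -> list:
    """Build the chat messages for the summarization call, truncating
    overly long transcripts to a rough character budget."""
    if len(transcript) > max_chars:
        transcript = transcript[:max_chars] + " [transcript truncated]"
    return [{"role": "user",
             "content": f"Summarize the following text in a few words: {transcript}"}]
```

You could call `build_summary_prompt(new_message)` inside `chat_completion` instead of building `history_openai_format` inline; a more robust version would count tokens rather than characters.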

⚡️ You're almost there! The final step is to build your web app, making your solution easy to use with just a few lines of code.

### Build the app with Gradio

[Gradio](https://www.gradio.app/) is an open-source Python library that lets you quickly create user interfaces for Machine Learning models and demos.

**What does it mean in practice?**

Inside a Gradio Block, you can:

- Define a theme for your UI
- Add a title to your web app with `gr.HTML()`
- Upload audio thanks to the dedicated component, `gr.Audio()`
- Obtain the result of the written transcription with `gr.Textbox()`
- Get a summary of the audio with the powerful LLM and a second `gr.Textbox()` component
- Add a clear button with `gr.ClearButton()` to reset the page of the web app

```python
with gr.Blocks(theme=gr.themes.Default(primary_hue="blue"), fill_height=True) as demo:

    # add title and description
    with gr.Row():
        gr.HTML(
            """
            <div align="center">
                <h1>Welcome to the Audio Summarizer web app 💬!</h1>
                <i>Transcribe and summarize your broadcast, meetings, conversations, podcasts and much more...</i>
            </div>
            <br>
            """
        )

    # audio zone for user question
    gr.Markdown("## Upload your audio file 📢")
    with gr.Row():
        inp_audio = gr.Audio(
            label = "Audio file in .wav or .mp3 format:",
            sources = ['upload'],
            type = "filepath",
        )

    # written transcription of user question
    with gr.Row():
        inp_text = gr.Textbox(
            label = "Audio transcription into text:",
        )

    # chatbot answer
    gr.Markdown("## Chatbot summarization 🤖")
    with gr.Row():
        out_resp = gr.Textbox(
            label = "Get a summary of your audio:",
        )

    with gr.Row():

        # clear inputs
        clear = gr.ClearButton([inp_audio, inp_text, out_resp])

    # update functions
    inp_audio.change(
        fn = asr_transcription,
        inputs = inp_audio,
        outputs = inp_text
    )
    inp_text.change(
        fn = chat_completion,
        inputs = inp_text,
        outputs = out_resp
    )
```

Then, you can launch it in the `main`:

```python
if __name__ == '__main__':
    demo.launch(server_name="0.0.0.0", server_port=8000)
```

### Launch Gradio web app locally

🚀 That’s it! Your web app is now ready to be used! You can start this Gradio app locally by launching the following command:

```console
python audio-summarizer-app.py
```

![app-overview](images/app-overview.png)

You can upload your audio files, get a transcript, and then a summary!

## Conclusion

Well done 🎉! You have learned how to build your own Audio Summarizer app in a few lines of code. You’ve also seen how easy it is to use AI Endpoints to create innovative turnkey solutions.

➡️ Access the full code [here](https://github.com/ovh/public-cloud-examples/tree/main/ai/ai-endpoints/audio-summarizer-assistant).

## Going further

If you want to go further and deploy your web app in the cloud, making your interface accessible to everyone, refer to the following articles and tutorials:

- [AI Deploy – Tutorial – Build & use a custom Docker image](/pages/public_cloud/ai_machine_learning/deploy_tuto_12_build_custom_image)
- [AI Deploy – Tutorial – Deploy a Gradio app for sketch recognition](/pages/public_cloud/ai_machine_learning/deploy_tuto_05_gradio_sketch_recognition)

If you need training or technical assistance to implement our solutions, contact your sales representative or click on [this link](/links/professional-services) to get a quote and ask our Professional Services experts for a custom analysis of your project.

## Feedback

Please feel free to send us your questions, feedback, and suggestions regarding AI Endpoints and its features:

- In the #ai-endpoints channel of the OVHcloud [Discord server](https://discord.gg/ovhcloud), where you can engage with the community and OVHcloud team members.
