---
title: AI Endpoints - Create your own audio summarizer
excerpt: Summarize hours of meetings with ASR and LLM AI endpoints
updated: 2025-04-18
---

> [!primary]
>
> AI Endpoints is currently in **Beta**. Although we aim to offer a production-ready product even in this testing phase, service availability may not be guaranteed. Please be careful if you use endpoints in production, as the Beta phase is not yet complete.
>
> AI Endpoints is covered by the **[OVHcloud AI Endpoints Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/48743bf-AI_Endpoints-ALL-1.1.pdf)** and the **[OVHcloud Public Cloud Special Conditions](https://storage.gra.cloud.ovh.net/v1/AUTH_325716a587c64897acbef9a4a4726e38/contracts/d2a208c-Conditions_particulieres_OVH_Stack-WE-9.0.pdf)**.
>

## Introduction

Are you looking for a way to efficiently summarize your meetings, broadcasts, and podcasts for quick reference or to provide to others? Look no further!

## Objective

In this tutorial, you will create an Audio Summarizer assistant that can not only transcribe but also summarize all your audio files.

Indeed, thanks to [AI Endpoints](https://endpoints.ai.cloud.ovh.net/), it’s never been easier to create a virtual assistant that can help you stay on top of your meetings and keep track of important information.

This tutorial will explore how AI APIs can be connected to create an advanced virtual assistant capable of transcribing and summarizing any audio file using **ASR (Automatic Speech Recognition)** technologies and popular **LLMs (Large Language Models)**. We will also build an app to use our assistant!

## Definitions

- **Automatic Speech Recognition (ASR)**: Technology that converts spoken language into written text. ASR will be used in this context to transcribe long audio recordings into text, which will then be summarized using LLMs.
- **Large Language Models (LLMs)**: Advanced models trained to understand context and generate human-like responses. In this use case, the LLM prompt will be designed to generate a summary of the input text based on the output from the ASR endpoint.

## Requirements

- A [Public Cloud project](/links/public-cloud/public-cloud) in your OVHcloud account
- An access token for **OVHcloud AI Endpoints**. To create an API token, follow the instructions in the [AI Endpoints - Getting Started](/pages/public_cloud/ai_machine_learning/endpoints_guide_01_getting_started) guide.

## Instructions

### Set up the environment

In order to use the AI Endpoints APIs easily, create a `.env` file to store the environment variables:

```bash
ASR_AI_ENDPOINT=https://nvr-asr-en-gb.endpoints.kepler.ai.cloud.ovh.net/api/v1/asr/recognize
LLM_AI_ENDPOINT=https://mixtral-8x22b-instruct-v01.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1
OVH_AI_ENDPOINTS_ACCESS_TOKEN=<ai-endpoints-api-token>
```

**Make sure to replace the token value (`OVH_AI_ENDPOINTS_ACCESS_TOKEN`) with yours.** If you do not have one yet, follow the instructions in the [AI Endpoints - Getting Started](/pages/public_cloud/ai_machine_learning/endpoints_guide_01_getting_started) guide.
Then, create a `requirements.txt` file with the following libraries:

```bash
openai==1.13.3
gradio==4.36.1
pydub==0.25.1
python-dotenv==1.0.1
```

Next, install these dependencies:

```console
pip install -r requirements.txt
```

*Note that Python 3.11 is used in this tutorial.*

### Import the necessary libraries and variables

Once this is done, create a Python file named `audio-summarizer-app.py`, where you will first import the Python libraries as follows:

```python
import gradio as gr
import io
import os
import requests
from pydub import AudioSegment
from dotenv import load_dotenv
from openai import OpenAI
```

After these lines, load and access the environment variables of your `.env` file:

```python
# access the environment variables from the .env file
load_dotenv()

asr_ai_endpoint_url = os.getenv("ASR_AI_ENDPOINT")
llm_ai_endpoint_url = os.getenv("LLM_AI_ENDPOINT")
ai_endpoint_token = os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
```

💡 You are now ready to start coding your web app.
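Since every later step depends on these three variables, you may want to fail fast when one is missing. A minimal optional check; the helper name `require_env` is ours, not part of the tutorial code:

```python
import os

def require_env(*names):
    """Return the values of the given environment variables, raising a clear error if any is missing."""
    missing = [n for n in names if not os.getenv(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return [os.getenv(n) for n in names]
```

You could call `require_env("ASR_AI_ENDPOINT", "LLM_AI_ENDPOINT", "OVH_AI_ENDPOINTS_ACCESS_TOKEN")` right after `load_dotenv()` to get a readable error instead of a failed API call later.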

### Transcribe audio file with ASR

First, create the **Automatic Speech Recognition** function in order to transcribe audio files into text:

```python
def asr_transcription(audio):

    if audio is None:
        return " "

    # preprocess audio: mono channel, 16 kHz sample rate, .wav format
    processed_audio = "/tmp/my_audio.wav"
    audio_input = AudioSegment.from_file(audio)
    process_audio_to_wav = audio_input.set_channels(1)
    process_audio_to_wav = process_audio_to_wav.set_frame_rate(16000)
    process_audio_to_wav.export(processed_audio, format="wav")

    # headers
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {ai_endpoint_token}",
    }

    # send the processed audio file to the endpoint
    with open(processed_audio, "rb") as audio_file:
        response = requests.post(
            asr_ai_endpoint_url,
            files={"audio": audio_file},
            headers=headers,
        )

    # return the complete transcription
    resp = ""
    if response.status_code == 200:
        for alternative in response.json():
            resp += alternative["alternatives"][0]["transcript"]
    else:
        print("Error:", response.status_code)

    return resp
```

**In this function:**

- The audio file is preprocessed as follows: `.wav` format, `1` channel, a `16000` Hz sample rate
- The transformed audio `processed_audio` is read
- An API call is made to the ASR endpoint named `nvr-asr-en-gb`
- The full transcription is stored in the `resp` variable and returned by the function

🎉 Now that you have this function, you are ready to transcribe audio files.
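For reference, the response loop above assumes the endpoint returns a JSON list of segments, each carrying ranked `alternatives` (this shape is inferred from the code, not an official schema guarantee). The extraction step can be isolated and checked on a sample payload:

```python
def join_transcripts(response_data):
    """Concatenate the best (first) transcript of each ASR result segment."""
    return "".join(seg["alternatives"][0]["transcript"] for seg in response_data)

# sample payload mimicking the structure the tutorial's loop expects
sample = [
    {"alternatives": [{"transcript": "Hello everyone, "}]},
    {"alternatives": [{"transcript": "let's start the meeting."}]},
]

print(join_transcripts(sample))  # → Hello everyone, let's start the meeting.
```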

Now it’s time to call an LLM to summarize the transcribed text.

### Summarize audio with LLM

In this second step, create the `chat_completion` function to use `Mixtral-8x22B` (or any other model) effectively:

**What to do?**

- Check that the transcription exists
- Use the OpenAI API compatibility to call the LLM
- Customize your prompt to specify the LLM's task
- Return the audio summary

```python
def chat_completion(new_message):

    if new_message == " ":
        return "Please, send an input audio to get its summary!"

    # auth
    client = OpenAI(
        base_url=llm_ai_endpoint_url,
        api_key=ai_endpoint_token,
    )

    # prompt
    history_openai_format = [
        {"role": "user", "content": f"Summarize the following text in a few words: {new_message}"}
    ]

    # return summary
    return client.chat.completions.create(
        model="Mixtral-8x22B-Instruct-v0.1",
        messages=history_openai_format,
        temperature=0,
        max_tokens=1024,
    ).choices[0].message.content
```
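The prompt is the main lever for the summary's shape. If you want a different style (bullet points, one sentence, a given language), you can build the message list from a parameter. A small sketch; the `style` argument and its wording are illustrative, not part of the tutorial code:

```python
def build_summary_prompt(text, style="a few words"):
    """Build an OpenAI-format message list asking for a summary in the given style."""
    return [{
        "role": "user",
        "content": f"Summarize the following text in {style}: {text}",
    }]
```

You would then pass the result as `messages` to `client.chat.completions.create(...)`, for example with `style="three bullet points"`.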

⚡️ You're almost there! The final step is to build your web app, making your solution easy to use with just a few lines of code.

### Build the app with Gradio

[Gradio](https://www.gradio.app/) is an open-source Python library that lets you quickly create user interfaces for Machine Learning models and demos.

**What does it mean in practice?**

Inside a Gradio Block, you can:

- Define a theme for your UI
- Add a title to your web app with `gr.HTML()`
- Upload audio thanks to the dedicated component, `gr.Audio()`
- Obtain the written transcription with a `gr.Textbox()` component
- Get a summary of the audio with the powerful LLM and a second `gr.Textbox()` component
- Add a clear button with `gr.ClearButton()` to reset the page of the web app

```python
with gr.Blocks(theme=gr.themes.Default(primary_hue="blue"), fill_height=True) as demo:

    # add title and description
    with gr.Row():
        gr.HTML(
            """
            <div align="center">
                <h1>Welcome to the Audio Summarizer web app 💬!</h1>
                <i>Transcribe and summarize your broadcasts, meetings, conversations, podcasts and much more...</i>
            </div>
            <br>
            """
        )

    # audio zone for user question
    gr.Markdown("## Upload your audio file 📢")
    with gr.Row():
        inp_audio = gr.Audio(
            label="Audio file in .wav or .mp3 format:",
            sources=['upload'],
            type="filepath",
        )

    # written transcription of user question
    with gr.Row():
        inp_text = gr.Textbox(
            label="Audio transcription into text:",
        )

    # chatbot answer
    gr.Markdown("## Chatbot summarization 🤖")
    with gr.Row():
        # clear inputs
        clear = gr.ClearButton([inp_audio, inp_text, out_resp])

    # update functions
    inp_audio.change(
        fn=asr_transcription,
        inputs=inp_audio,
        outputs=inp_text,
    )
    inp_text.change(
        fn=chat_completion,
        inputs=inp_text,
        outputs=out_resp,
    )
```
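The two `change` events chain everything: a new audio file triggers `asr_transcription`, whose text output then triggers `chat_completion`. The same pipeline can be expressed without a UI; a sketch with the two steps injected as parameters so it works with any transcriber/summarizer pair (the `summarize_audio` helper is ours, not part of the tutorial):

```python
def summarize_audio(audio_path, transcribe, summarize):
    """Run the transcribe -> summarize pipeline that the Gradio callbacks implement."""
    transcript = transcribe(audio_path)
    return summarize(transcript)
```

In the app's context you would call `summarize_audio("meeting.mp3", asr_transcription, chat_completion)`.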

Then, you can launch it in the `main`:

```python
if __name__ == '__main__':
    demo.launch(server_name="0.0.0.0", server_port=8000)
```

### Launch the Gradio web app locally

🚀 That’s it! Your web app is now ready to be used! You can start this Gradio app locally by launching the following command:

```console
python audio-summarizer-app.py
```

You can then upload your audio files, get a transcript, and then a summary!

## Conclusion

Well done 🎉! You have learned how to build your own Audio Summarizer app in a few lines of code. You’ve also seen how easy it is to use AI Endpoints to create innovative turnkey solutions.

➡️ Access the full code [here](https://github.com/ovh/public-cloud-examples/tree/main/ai/ai-endpoints/audio-summarizer-assistant).

## Going further

If you want to go further and deploy your web app in the cloud, making your interface accessible to everyone, refer to the following articles and tutorials:

- [AI Deploy – Tutorial – Build & use a custom Docker image](/pages/public_cloud/ai_machine_learning/deploy_tuto_12_build_custom_image)
- [AI Deploy – Tutorial – Deploy a Gradio app for sketch recognition](/pages/public_cloud/ai_machine_learning/deploy_tuto_05_gradio_sketch_recognition)

If you need training or technical assistance to implement our solutions, contact your sales representative or click on [this link](/links/professional-services) to get a quote and ask our Professional Services experts for a custom analysis of your project.

## Feedback

Please feel free to send us your questions, feedback, and suggestions regarding AI Endpoints and its features:

- In the #ai-endpoints channel of the OVHcloud [Discord server](https://discord.gg/ovhcloud), where you can engage with the community and OVHcloud team members.