
Commit f15ba37

Merge pull request #8460 from ovh/mb-ai-endpints-replace-riva
[AI Endpoints] - Replace RIVA by Whisper Model
2 parents 25d679a + ddb1282 commit f15ba37

File tree

15 files changed: +1934 −884 lines

pages/public_cloud/ai_machine_learning/endpoints_tuto_02_voice_virtual_assistant/guide.de-de.md

Lines changed: 129 additions & 59 deletions
@@ -1,7 +1,7 @@
---
title: AI Endpoints - Create your own voice assistant
excerpt: Create a voice-enabled chatbot using ASR, LLM, and TTS endpoints in under 100 lines of code
-updated: 2025-07-31
+updated: 2025-10-01
---

> [!primary]
@@ -44,14 +44,16 @@ All of this is done by connecting **AI Endpoints** like puzzle pieces—allowing
In order to use AI Endpoints APIs easily, create a `.env` file to store environment variables:

```bash
-ASR_GRPC_ENDPOINT=nvr-asr-en-us.endpoints-grpc.kepler.ai.cloud.ovh.net:443
+ASR_AI_ENDPOINT=https://whisper-large-v3.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1
TTS_GRPC_ENDPOINT=nvr-tts-en-us.endpoints-grpc.kepler.ai.cloud.ovh.net:443
LLM_AI_ENDPOINT=https://mixtral-8x7b-instruct-v01.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1
OVH_AI_ENDPOINTS_ACCESS_TOKEN=<ai-endpoints-api-token>
```

**Make sure to replace the token value (`OVH_AI_ENDPOINTS_ACCESS_TOKEN`) with yours.** If you do not have one yet, follow the instructions in the [AI Endpoints - Getting Started](/pages/public_cloud/ai_machine_learning/endpoints_guide_01_getting_started) guide.

+In this tutorial, we will be using the `Whisper-Large-V3` and `Mixtral-8x7b-Instruct-V01` models. Feel free to choose alternative models available on the [AI Endpoints catalog](https://catalog.endpoints.ai.ovh.net/).
+
Then, create a `requirements.txt` file with the following libraries:

```bash
@@ -89,62 +91,89 @@ After these lines, load and access the environment variables of your `.env` file
```python
# access the environment variables from the .env file
load_dotenv()
+
+ASR_AI_ENDPOINT = os.environ.get('ASR_AI_ENDPOINT')
+TTS_GRPC_ENDPOINT = os.environ.get('TTS_GRPC_ENDPOINT')
+LLM_AI_ENDPOINT = os.environ.get('LLM_AI_ENDPOINT')
+OVH_AI_ENDPOINTS_ACCESS_TOKEN = os.environ.get('OVH_AI_ENDPOINTS_ACCESS_TOKEN')
+```
+
+Next, define the clients that will be used to interact with the models:
+
+```python
+llm_client = OpenAI(
+    base_url=LLM_AI_ENDPOINT,
+    api_key=OVH_AI_ENDPOINTS_ACCESS_TOKEN
+)
+
+tts_client = riva.client.SpeechSynthesisService(
+    riva.client.Auth(
+        uri=TTS_GRPC_ENDPOINT,
+        use_ssl=True,
+        metadata_args=[["authorization", f"bearer {OVH_AI_ENDPOINTS_ACCESS_TOKEN}"]]
+    )
+)
+
+asr_client = OpenAI(
+    base_url=ASR_AI_ENDPOINT,
+    api_key=OVH_AI_ENDPOINTS_ACCESS_TOKEN
+)
```

💡 You are now ready to start coding your web app!

### Transcribe input question with ASR

-First, create the **Automatic Speech Recognition (ASR)** function in order to transcribe microphone input into text.
+First, create the **Automatic Speech Recognition (ASR)** function in order to transcribe microphone input into text:

```python
-def asr_transcription(question):
+def asr_transcription(question, asr_client):
+    return asr_client.audio.transcriptions.create(
+        model="whisper-large-v3",
+        file=question
+    ).text
+```

-    asr_service = riva.client.ASRService(
-        riva.client.Auth(uri=os.environ.get('ASR_GRPC_ENDPOINT'), use_ssl=True, metadata_args=[["authorization", f"bearer {os.environ.get('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}"]])
-    )
-
-    # set up config
-    asr_config = riva.client.RecognitionConfig(
-        language_code="en-US", # languages: en-US
-        max_alternatives=1,
-        enable_automatic_punctuation=True,
-        audio_channel_count = 1,
-    )
+**In this function:**
+
+- The audio input is sent from the microphone recording, as `question`.
+- An API call is made to the ASR AI Endpoint named `whisper-large-v3`.
+- The text from the transcript response is returned by the function.
+
+🎉 Now that you have this function, you are ready to transcribe audio files.
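
As a quick aside (not part of this commit): a minimal standalone smoke test of the Whisper-based function added above could look like the sketch below. It assumes the same `.env` file described earlier and a local recording saved as `question.wav`, which is a hypothetical file name.

```python
# Illustrative sketch only, not part of the committed guide.
# Assumes the .env file from the guide and a hypothetical local file question.wav.
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

# Same Whisper endpoint and token as declared in the guide's .env file
asr_client = OpenAI(
    base_url=os.environ.get("ASR_AI_ENDPOINT"),
    api_key=os.environ.get("OVH_AI_ENDPOINTS_ACCESS_TOKEN"),
)

with open("question.wav", "rb") as audio_file:
    transcription = asr_client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )

print(transcription.text)
```
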
+
+### Generate LLM response to input question

-    # get asr model response
-    response = asr_service.offline_recognize(question, asr_config)
+Now, create a function that calls the LLM client to provide responses to questions:

-    return response.results[0].alternatives[0].transcript
+```python
+def llm_answer(input, llm_client):
+    response = llm_client.chat.completions.create(
+        model="Mixtral-8x7B-Instruct-v0.1",
+        messages=input,
+        temperature=0,
+        max_tokens=1024,
+    )
+    msg = response.choices[0].message.content
+
+    return msg
```

**In this function:**

-- The audio input is sent from microphone recording
-- An API call is made to the ASR AI Endpoint named `nvr-asr-en-gb`
-- The full response is stored in `resp` variable and returned by the function
+- The conversation/messages are retrieved as parameters.
+- A call is made to the chat completion LLM endpoint, using the `Mixtral8x7B` model.
+- The model's response is extracted and the final message text is returned.

-🎉 Now that you have this function, you are ready to transcribe audio files.
-
-Now it’s time to implement the TTS to transform the LLM response into spoken words.
+⏳ Almost there! All that remains is to implement the TTS to transform the LLM response into spoken words.

### Return the response using TTS

Then, build the **Text To Speech (TTS)** function in order to transform the written answer into an oral reply:

-**What to do?**
-
-- The LLM response is retrieved
-- A call is made to the TTS AI endpoint named `nvr-tts-en-us`
-- The audio sample and the sample rate are returned to play the audio automatically
-
```python
-def tts_synthesis(response):
-
-    tts_service = riva.client.SpeechSynthesisService(
-        riva.client.Auth(uri=os.environ.get('TTS_GRPC_ENDPOINT'), use_ssl=True, metadata_args=[["authorization", f"bearer {os.environ.get('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}"]])
-    )
-
+def tts_synthesis(response, tts_client):
+
    # set up config
    sample_rate_hz = 48000
    req = {
@@ -153,31 +182,37 @@ def tts_synthesis(response):
        "sample_rate_hz" : sample_rate_hz, # sample rate: 48KHz audio
        "voice_name" : "English-US.Female-1" # voices: `English-US.Female-1`, `English-US.Male-1`
    }
-
+
    # return response
    req["text"] = response
-    response = tts_service.synthesize(**req)
-
-    return np.frombuffer(response.audio, dtype=np.int16), sample_rate_hz
+    synthesized_response = tts_client.synthesize(**req)
+
+    return np.frombuffer(synthesized_response.audio, dtype=np.int16), sample_rate_hz
```

+**In this function:**
+
+- The LLM response is retrieved.
+- A call is made to the TTS AI endpoint named `nvr-tts-en-us`.
+- The audio sample and the sample rate are returned to play the audio automatically.
+
⚡️ You're almost there! The final step is to build your web app, making your solution easy to use with just a few lines of code.

### Build the LLM chat app with Streamlit

-In this last step, create the chatbot app using [Mixtral8x7B](https://endpoints.ai.cloud.ovh.net/models/e2ecb4a7-98d5-420d-9789-e0aa6ddf0ffc) endpoint (or any other model) and [Streamlit](https://streamlit.io/), an open-source Python library that allows to quickly create user interfaces for Machine Learning models and demos. Here is a working code example:
+In this last step, create the chatbot app using [Streamlit](https://streamlit.io/), an open-source Python library that allows you to quickly create user interfaces for Machine Learning models and demos. Here is a working code example:

```python
# streamlit interface
with st.container():
    st.title("💬 Audio Virtual Assistant Chatbot")
-
+
    with st.container(height=600):
        messages = st.container()
-
+
        if "messages" not in st.session_state:
            st.session_state["messages"] = [{"role": "system", "content": "Hello, I'm AVA!", "avatar":"🤖"}]
-
+
        for msg in st.session_state.messages:
            messages.chat_message(msg["role"], avatar=msg["avatar"]).write(msg["content"])

@@ -191,26 +226,19 @@ with st.container():
            use_container_width=True,
            key='recorder'
        )
-
+
        if recording:
-            user_question = asr_transcription(recording['bytes'])
-
+            user_question = asr_transcription(recording['bytes'], asr_client)
+
            if prompt := user_question:
-                client = OpenAI(base_url=os.getenv("LLM_AI_ENDPOINT"), api_key=os.environ.get('OVH_AI_ENDPOINTS_ACCESS_TOKEN'))
                st.session_state.messages.append({"role": "user", "content": prompt, "avatar":"👤"})
                messages.chat_message("user", avatar="👤").write(prompt)
-                response = client.chat.completions.create(
-                    model="Mixtral-8x7B-Instruct-v0.1",
-                    messages=st.session_state.messages,
-                    temperature=0,
-                    max_tokens=1024,
-                )
-                msg = response.choices[0].message.content
-                st.session_state.messages.append({"role": "system", "content": msg, "avatar": "🤖"})
+                msg = llm_answer(st.session_state.messages, llm_client)
+                st.session_state.messages.append({"role": "assistant", "content": msg, "avatar": "🤖"})
                messages.chat_message("system", avatar="🤖").write(msg)

                if msg is not None:
-                    audio_samples, sample_rate_hz = tts_synthesis(msg)
+                    audio_samples, sample_rate_hz = tts_synthesis(msg, tts_client)
                    placeholder.audio(audio_samples, sample_rate=sample_rate_hz, autoplay=True)
```

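Another illustrative aside (not part of this commit): the same three helpers can be exercised without the Streamlit UI. The sketch below assumes the clients and the `asr_transcription`, `llm_answer`, and `tts_synthesis` functions defined in the guide are already in scope, and uses a hypothetical local recording named `question.wav`.

```python
# Illustrative sketch, not part of the committed guide: run the ASR -> LLM -> TTS
# pipeline once, outside Streamlit. Assumes asr_client, llm_client, tts_client and
# the asr_transcription / llm_answer / tts_synthesis functions defined above.
with open("question.wav", "rb") as audio_file:  # hypothetical local recording
    question_text = asr_transcription(audio_file, asr_client)

conversation = [
    {"role": "system", "content": "You are a helpful voice assistant."},
    {"role": "user", "content": question_text},
]
answer = llm_answer(conversation, llm_client)
print(answer)

# tts_synthesis returns int16 PCM samples and their sample rate, ready to be
# played (for example with st.audio) or written to a WAV file.
audio_samples, sample_rate_hz = tts_synthesis(answer, tts_client)
```
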
@@ -219,11 +247,53 @@ with st.container():
🚀 That’s it! Now your web app is ready to be used! You can start this Streamlit app locally by launching the following command:

```bash
-streamlit run audio-virtual-assistant.py
+streamlit run audio-virtual-assistant-app.py
```

![app-overview](images/app_overview.png)

+### Improvements
+
+By default, the `nvr-tts-en-us` model supports only a limited number of characters per request when generating audio. If you exceed this limit, you will encounter errors in your application.
+
+To work around this limitation, you can replace the existing `tts_synthesis` function with the following implementation, which processes text in chunks:
+
+```python
+def tts_synthesis(response, tts_client):
+    # Split response into chunks of max 1000 characters
+    max_chunk_length = 1000
+    words = response.split()
+    chunks = []
+    current_chunk = ""
+
+    for word in words:
+        if len(current_chunk) + len(word) + 1 <= max_chunk_length:
+            current_chunk += " " + word if current_chunk else word
+        else:
+            chunks.append(current_chunk)
+            current_chunk = word
+    if current_chunk:
+        chunks.append(current_chunk)
+
+    all_audio = np.array([], dtype=np.int16)
+    sample_rate_hz = 16000
+
+    # Process each chunk and concatenate the resulting audio
+    for text in chunks:
+        req = {
+            "language_code": "en-US",
+            "encoding": riva.client.AudioEncoding.LINEAR_PCM,
+            "sample_rate_hz": sample_rate_hz,
+            "voice_name": "English-US.Female-1",
+            "text": text.strip(),
+        }
+        synthesized = tts_client.synthesize(**req)
+        audio_segment = np.frombuffer(synthesized.audio, dtype=np.int16)
+        all_audio = np.concatenate((all_audio, audio_segment))
+
+    return all_audio, sample_rate_hz
+```
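
One more illustrative aside (not part of this commit): since the chunked `tts_synthesis` above returns raw int16 PCM samples, writing them to a WAV file with Python's built-in `wave` module is a simple way to check the concatenated audio outside Streamlit. The `save_wav` helper name and the `answer.wav` path are made up for this example.

```python
# Illustrative helper, not part of the committed guide: write the int16 samples
# returned by tts_synthesis() to a mono 16-bit WAV file for inspection.
import wave

import numpy as np


def save_wav(path, samples, sample_rate_hz):
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)    # the guide synthesizes a single audio channel
        wav_file.setsampwidth(2)    # LINEAR_PCM, 16-bit -> 2 bytes per sample
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(np.asarray(samples, dtype=np.int16).tobytes())


# Example usage with the values returned by the chunked function:
# audio_samples, sample_rate_hz = tts_synthesis(msg, tts_client)
# save_wav("answer.wav", audio_samples, sample_rate_hz)
```
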
+
## Conclusion

You’ve just created an Audio Virtual Assistant capable of natural conversation using voice, powered by ASR, LLM, and TTS endpoints.
@@ -245,4 +315,4 @@ If you need training or technical assistance to implement our solutions, contact

Please feel free to send us your questions, feedback, and suggestions regarding AI Endpoints and its features:

-- In the #ai-endpoints channel of the OVHcloud [Discord server](https://discord.gg/ovhcloud), where you can engage with the community and OVHcloud team members.
+- In the #ai-endpoints channel of the OVHcloud [Discord server](https://discord.gg/ovhcloud), where you can engage with the community and OVHcloud team members.
