
Commit 2e6d16c

Merge pull request #278129 from sally-baolian/patch-255
Text streaming
2 parents 0d99de3 + d7b451f commit 2e6d16c

3 files changed: +69 −0 lines changed


articles/ai-services/speech-service/how-to-lower-speech-synthesis-latency.md

Lines changed: 67 additions & 0 deletions
@@ -318,6 +318,73 @@ For Linux and Windows, `GStreamer` is required to enable this feature.
Refer to [this instruction](how-to-use-codec-compressed-audio-input-streams.md) to install and configure `GStreamer` for the Speech SDK.

For Android, iOS, and macOS, no extra configuration is needed starting with version 1.20.
## Text streaming
Text streaming allows real-time text processing for rapid audio generation. It's well suited to dynamic text vocalization, such as reading output from AI models like GPT in real time. This feature minimizes latency and improves the fluidity and responsiveness of audio output, making it ideal for interactive applications, live events, and responsive AI-driven dialogues.
### How to use text streaming
To use the text streaming feature, connect to the websocket V2 endpoint: `wss://{region}.tts.speech.microsoft.com/cognitiveservices/websocket/v2`
::: zone pivot="programming-language-csharp"
The following sample code shows how to set the endpoint:
```csharp
using System;
using Microsoft.CognitiveServices.Speech;

// IMPORTANT: MUST use the websocket v2 endpoint
var ttsEndpoint = $"wss://{Environment.GetEnvironmentVariable("AZURE_TTS_REGION")}.tts.speech.microsoft.com/cognitiveservices/websocket/v2";
var speechConfig = SpeechConfig.FromEndpoint(
    new Uri(ttsEndpoint),
    Environment.GetEnvironmentVariable("AZURE_TTS_API_KEY"));
```
#### Key steps
1. **Create a text stream request**: Use `SpeechSynthesisRequestInputType.TextStream` to initiate a text stream.
1. **Set global properties**: Adjust settings such as output format and voice name directly, because the feature handles partial text input and doesn't support SSML. The following sample code shows how to set them. OpenAI text to speech voices aren't supported by the text streaming feature. See this [language table](language-support.md?tabs=tts#supported-languages) for full language support.
```csharp
// Set output format
speechConfig.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Raw24Khz16BitMonoPcm);

// Set a voice name
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_SynthVoice, "en-US-AvaMultilingualNeural");
```
1. **Stream your text**: For each text chunk generated from a GPT model, use `request.InputStream.Write(text);` to send the text to the stream.
1. **Close the stream**: Once the GPT model completes its output, close the stream using `request.InputStream.Close();`. The sketch after this list shows how the steps fit together.
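For orientation, here's a minimal end-to-end sketch. It assumes the `SpeechSynthesisRequest` constructor and `SpeakAsync(request)` overload used in the GitHub sample linked below; `GenerateChunksAsync` is a hypothetical placeholder standing in for your GPT streaming loop.

```csharp
// Minimal sketch, not the complete sample: SpeechSynthesisRequest and the
// SpeakAsync(request) overload follow the linked GitHub sample, and
// GenerateChunksAsync is a hypothetical stand-in for your GPT streaming loop.
using var synthesizer = new SpeechSynthesizer(speechConfig);

// 1. Create a text stream request.
using var request = new SpeechSynthesisRequest(SpeechSynthesisRequestInputType.TextStream);

// Start synthesis; audio is generated while text is still arriving.
var synthesisTask = synthesizer.SpeakAsync(request);

// 2. Stream each text chunk to the request as the GPT model produces it.
await foreach (var text in GenerateChunksAsync()) // hypothetical chunk source
{
    request.InputStream.Write(text);
}

// 3. Close the stream once the model completes its output.
request.InputStream.Close();

// Wait for synthesis to finish.
var result = await synthesisTask;
```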
For detailed implementation, see the [sample code on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/csharp/tts-text-stream).
::: zone-end
::: zone pivot="programming-language-python"
The following sample code shows how to set the endpoint:
```python
import os

import azure.cognitiveservices.speech as speechsdk

# IMPORTANT: MUST use the websocket v2 endpoint
speech_config = speechsdk.SpeechConfig(
    endpoint=f"wss://{os.getenv('AZURE_TTS_REGION')}.tts.speech.microsoft.com/cognitiveservices/websocket/v2",
    subscription=os.getenv("AZURE_TTS_API_KEY"),
)
```
#### Key steps
1. **Create a text stream request**: Use `speechsdk.SpeechSynthesisRequestInputType.TextStream` to initiate a text stream.
1. **Set global properties**: Adjust settings such as output format and voice name directly, because the feature handles partial text input and doesn't support SSML. The following sample code shows how to set them. OpenAI text to speech voices aren't supported by the text streaming feature. See this [language table](language-support.md?tabs=tts#supported-languages) for full language support.
```python
# Set output format
speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Raw24Khz16BitMonoPcm)

# Set a voice name
speech_config.speech_synthesis_voice_name = "en-US-AvaMultilingualNeural"
```
1. **Stream your text**: For each text chunk generated from a GPT model, use `request.input_stream.write(text)` to send the text to the stream.
1. **Close the stream**: Once the GPT model completes its output, close the stream using `request.input_stream.close()`. The sketch after this list shows how the steps fit together.
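As in the C# pivot, here's a minimal end-to-end sketch. It assumes the `SpeechSynthesisRequest` constructor and `speak_async(request)` call used in the GitHub sample linked below; `generate_chunks` is a hypothetical placeholder standing in for your GPT streaming loop.

```python
# Minimal sketch, not the complete sample: SpeechSynthesisRequest and
# speak_async(request) follow the linked GitHub sample, and generate_chunks
# is a hypothetical stand-in for your GPT streaming loop.
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# 1. Create a text stream request.
request = speechsdk.SpeechSynthesisRequest(
    input_type=speechsdk.SpeechSynthesisRequestInputType.TextStream)

# Start synthesis; audio is generated while text is still arriving.
result_future = speech_synthesizer.speak_async(request)

# 2. Stream each text chunk to the request as the GPT model produces it.
for text in generate_chunks():  # hypothetical chunk source
    request.input_stream.write(text)

# 3. Close the stream once the model completes its output.
request.input_stream.close()

# Wait for synthesis to finish.
result = result_future.get()
```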
For detailed implementation, see the [sample code on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/python/tts-text-stream).
::: zone-end
## Other tips

### Cache CRL files

articles/ai-services/speech-service/includes/quickstarts/openai-speech/csharp.md

Lines changed: 1 addition & 0 deletions
@@ -230,6 +230,7 @@ Here are some more considerations:

- To change the speech recognition language, replace `en-US` with another [supported language](~/articles/ai-services/speech-service/language-support.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US`. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/ai-services/speech-service/language-identification.md).
- To change the voice that you hear, replace `en-US-JennyMultilingualNeural` with another [supported voice](~/articles/ai-services/speech-service/language-support.md#prebuilt-neural-voices). If the voice doesn't speak the language of the text returned from Azure OpenAI, the Speech service doesn't output synthesized audio.
- To reduce latency for text to speech output, use the text streaming feature. It processes text in real time for faster audio generation, improving the fluidity and responsiveness of audio output. See [how to use text streaming](~/articles/ai-services/speech-service/how-to-lower-speech-synthesis-latency.md#text-streaming).
- To use a different [model](/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability), replace `gpt-35-turbo-instruct` with the ID of another [deployment](/azure/ai-services/openai/how-to/create-resource?pivots=web-portal#deploy-a-model). The deployment ID isn't necessarily the same as the model name. You named your deployment when you created it in [Azure OpenAI Studio](https://oai.azure.com/).
- Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses might be filtered if harmful content is detected. For more information, see the [content filtering](/azure/ai-services/openai/concepts/content-filter) article.

articles/ai-services/speech-service/includes/quickstarts/openai-speech/python.md

Lines changed: 1 addition & 0 deletions
@@ -177,6 +177,7 @@ Here are some more considerations:

- To change the speech recognition language, replace `en-US` with another [supported language](~/articles/ai-services/speech-service/language-support.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US`. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/ai-services/speech-service/language-identification.md).
- To change the voice that you hear, replace `en-US-JennyMultilingualNeural` with another [supported voice](~/articles/ai-services/speech-service/language-support.md#prebuilt-neural-voices). If the voice doesn't speak the language of the text returned from Azure OpenAI, the Speech service doesn't output synthesized audio.
- To reduce latency for text to speech output, use the text streaming feature. It processes text in real time for faster audio generation, improving the fluidity and responsiveness of audio output. See [how to use text streaming](~/articles/ai-services/speech-service/how-to-lower-speech-synthesis-latency.md#text-streaming).
- To use a different [model](/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability), replace `gpt-35-turbo-instruct` with the ID of another [deployment](/azure/ai-services/openai/how-to/create-resource#deploy-a-model). Keep in mind that the deployment ID isn't necessarily the same as the model name. You named your deployment when you created it in [Azure OpenAI Studio](https://oai.azure.com/).
- Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses might be filtered if harmful content is detected. For more information, see the [content filtering](/azure/ai-services/openai/concepts/content-filter) article.
