
Commit 2e6d16c

Merge pull request #278129 from sally-baolian/patch-255
Text streaming
2 parents 0d99de3 + d7b451f commit 2e6d16c

3 files changed: +69 −0 lines changed


articles/ai-services/speech-service/how-to-lower-speech-synthesis-latency.md

Lines changed: 67 additions & 0 deletions
@@ -318,6 +318,73 @@ For Linux and Windows, `GStreamer` is required to enable this feature.
Refer to [this instruction](how-to-use-codec-compressed-audio-input-streams.md) to install and configure `GStreamer` for the Speech SDK.

For Android, iOS, and macOS, no extra configuration is needed starting with version 1.20.
## Text streaming
Text streaming allows real-time text processing for rapid audio generation. It's well suited to dynamic text vocalization, such as reading output from AI models like GPT in real time. This feature minimizes latency and improves the fluidity and responsiveness of audio output, making it ideal for interactive applications, live events, and responsive AI-driven dialogues.
### How to use text streaming
To use the text streaming feature, connect to the websocket V2 endpoint: `wss://{region}.tts.speech.microsoft.com/cognitiveservices/websocket/v2`
::: zone pivot="programming-language-csharp"
The following sample code shows how to set the endpoint:
```csharp
using System;
using Microsoft.CognitiveServices.Speech;

// IMPORTANT: MUST use the websocket v2 endpoint
var ttsEndpoint = $"wss://{Environment.GetEnvironmentVariable("AZURE_TTS_REGION")}.tts.speech.microsoft.com/cognitiveservices/websocket/v2";
var speechConfig = SpeechConfig.FromEndpoint(
    new Uri(ttsEndpoint),
    Environment.GetEnvironmentVariable("AZURE_TTS_API_KEY"));
```
#### Key steps
1. **Create a text stream request**: Use `SpeechSynthesisRequestInputType.TextStream` to initiate a text stream.
1. **Set global properties**: Adjust settings such as output format and voice name directly, because the feature handles partial text input and doesn't support SSML. The following sample code shows how to set them. OpenAI text to speech voices aren't supported by the text streaming feature. See this [language table](language-support.md?tabs=tts#supported-languages) for full language support.
```csharp
// Set output format
speechConfig.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Raw24Khz16BitMonoPcm);

// Set a voice name
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_SynthVoice, "en-US-AvaMultilingualNeural");
```
1. **Stream your text**: For each text chunk generated from a GPT model, use `request.InputStream.Write(text);` to send the text to the stream.
1. **Close the stream**: Once the GPT model completes its output, close the stream using `request.InputStream.Close();`. The sketch after this list shows how the steps fit together.
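For orientation, here's a minimal end-to-end sketch. It assumes the `SpeechSynthesisRequest` constructor and `SpeakAsync(request)` overload used in the GitHub sample linked below; `GenerateChunksAsync` is a hypothetical placeholder standing in for your GPT streaming loop.

```csharp
// Minimal sketch, not the complete sample: SpeechSynthesisRequest and the
// SpeakAsync(request) overload follow the linked GitHub sample, and
// GenerateChunksAsync is a hypothetical stand-in for your GPT streaming loop.
using var synthesizer = new SpeechSynthesizer(speechConfig);

// 1. Create a text stream request.
using var request = new SpeechSynthesisRequest(SpeechSynthesisRequestInputType.TextStream);

// Start synthesis; audio is generated while text is still arriving.
var synthesisTask = synthesizer.SpeakAsync(request);

// 2. Stream each text chunk to the request as the GPT model produces it.
await foreach (var text in GenerateChunksAsync()) // hypothetical chunk source
{
    request.InputStream.Write(text);
}

// 3. Close the stream once the model completes its output.
request.InputStream.Close();

// Wait for synthesis to finish.
var result = await synthesisTask;
```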
For detailed implementation, see the [sample code on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/csharp/tts-text-stream).
::: zone-end
::: zone pivot="programming-language-python"
The following sample code shows how to set the endpoint:
```python
import os

import azure.cognitiveservices.speech as speechsdk

# IMPORTANT: MUST use the websocket v2 endpoint
speech_config = speechsdk.SpeechConfig(
    endpoint=f"wss://{os.getenv('AZURE_TTS_REGION')}.tts.speech.microsoft.com/cognitiveservices/websocket/v2",
    subscription=os.getenv("AZURE_TTS_API_KEY"),
)
```
#### Key steps
1. **Create a text stream request**: Use `speechsdk.SpeechSynthesisRequestInputType.TextStream` to initiate a text stream.
1. **Set global properties**: Adjust settings such as output format and voice name directly, because the feature handles partial text input and doesn't support SSML. The following sample code shows how to set them. OpenAI text to speech voices aren't supported by the text streaming feature. See this [language table](language-support.md?tabs=tts#supported-languages) for full language support.
```python
# Set output format
speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Raw24Khz16BitMonoPcm)

# Set a voice name
speech_config.speech_synthesis_voice_name = "en-US-AvaMultilingualNeural"
```
1. **Stream your text**: For each text chunk generated from a GPT model, use `request.input_stream.write(text)` to send the text to the stream.
1. **Close the stream**: Once the GPT model completes its output, close the stream using `request.input_stream.close()`. The sketch after this list shows how the steps fit together.
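As in the C# pivot, here's a minimal end-to-end sketch. It assumes the `SpeechSynthesisRequest` constructor and `speak_async(request)` call used in the GitHub sample linked below; `generate_chunks` is a hypothetical placeholder standing in for your GPT streaming loop.

```python
# Minimal sketch, not the complete sample: SpeechSynthesisRequest and
# speak_async(request) follow the linked GitHub sample, and generate_chunks
# is a hypothetical stand-in for your GPT streaming loop.
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# 1. Create a text stream request.
request = speechsdk.SpeechSynthesisRequest(
    input_type=speechsdk.SpeechSynthesisRequestInputType.TextStream)

# Start synthesis; audio is generated while text is still arriving.
result_future = speech_synthesizer.speak_async(request)

# 2. Stream each text chunk to the request as the GPT model produces it.
for text in generate_chunks():  # hypothetical chunk source
    request.input_stream.write(text)

# 3. Close the stream once the model completes its output.
request.input_stream.close()

# Wait for synthesis to finish.
result = result_future.get()
```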
For detailed implementation, see the [sample code on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/python/tts-text-stream).
::: zone-end
## Other tips

### Cache CRL files

articles/ai-services/speech-service/includes/quickstarts/openai-speech/csharp.md

Lines changed: 1 addition & 0 deletions
@@ -230,6 +230,7 @@ Here are some more considerations:

- To change the speech recognition language, replace `en-US` with another [supported language](~/articles/ai-services/speech-service/language-support.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US`. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/ai-services/speech-service/language-identification.md).
- To change the voice that you hear, replace `en-US-JennyMultilingualNeural` with another [supported voice](~/articles/ai-services/speech-service/language-support.md#prebuilt-neural-voices). If the voice doesn't speak the language of the text returned from Azure OpenAI, the Speech service doesn't output synthesized audio.
- To reduce latency for text to speech output, use the text streaming feature. It processes text in real time for faster audio generation, improving the fluidity and responsiveness of audio output. See [how to use text streaming](~/articles/ai-services/speech-service/how-to-lower-speech-synthesis-latency.md#text-streaming).
- To use a different [model](/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability), replace `gpt-35-turbo-instruct` with the ID of another [deployment](/azure/ai-services/openai/how-to/create-resource?pivots=web-portal#deploy-a-model). The deployment ID isn't necessarily the same as the model name. You named your deployment when you created it in [Azure OpenAI Studio](https://oai.azure.com/).
- Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses might be filtered if harmful content is detected. For more information, see the [content filtering](/azure/ai-services/openai/concepts/content-filter) article.

articles/ai-services/speech-service/includes/quickstarts/openai-speech/python.md

Lines changed: 1 addition & 0 deletions
@@ -177,6 +177,7 @@ Here are some more considerations:

- To change the speech recognition language, replace `en-US` with another [supported language](~/articles/ai-services/speech-service/language-support.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US`. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/ai-services/speech-service/language-identification.md).
- To change the voice that you hear, replace `en-US-JennyMultilingualNeural` with another [supported voice](~/articles/ai-services/speech-service/language-support.md#prebuilt-neural-voices). If the voice doesn't speak the language of the text returned from Azure OpenAI, the Speech service doesn't output synthesized audio.
- To reduce latency for text to speech output, use the text streaming feature. It processes text in real time for faster audio generation, improving the fluidity and responsiveness of audio output. See [how to use text streaming](~/articles/ai-services/speech-service/how-to-lower-speech-synthesis-latency.md#text-streaming).
- To use a different [model](/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability), replace `gpt-35-turbo-instruct` with the ID of another [deployment](/azure/ai-services/openai/how-to/create-resource#deploy-a-model). Keep in mind that the deployment ID isn't necessarily the same as the model name. You named your deployment when you created it in [Azure OpenAI Studio](https://oai.azure.com/).
- Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses might be filtered if harmful content is detected. For more information, see the [content filtering](/azure/ai-services/openai/concepts/content-filter) article.
