Commit 6c0f882

Update how-to-lower-speech-synthesis-latency.md
1 parent aa8dc6d commit 6c0f882

1 file changed

+36
-0
lines changed

articles/ai-services/speech-service/how-to-lower-speech-synthesis-latency.md

Lines changed: 36 additions & 0 deletions
@@ -318,6 +318,42 @@ For Linux and Windows, `GStreamer` is required to enable this feature.
Refer to [these instructions](how-to-use-codec-compressed-audio-input-streams.md) to install and configure `GStreamer` for the Speech SDK.
For Android, iOS, and macOS, no extra configuration is needed starting with version 1.20.

## Text stream

The text stream API processes text in real time as it arrives, so audio can be generated quickly. It suits dynamic text vocalization, such as reading the output of an AI model like GPT as it's generated. Streaming text this way minimizes latency and makes audio output more fluid and responsive, which helps interactive applications, live events, and AI-driven dialogues.

### How to use the text stream API

To use the text stream API, connect to the WebSocket v2 endpoint: `wss://{region}.tts.speech.microsoft.com/cognitiveservices/websocket/v2`

::: zone pivot="programming-language-csharp"

#### Key steps

1. **Create a text stream request**: Use `SpeechSynthesisRequestInputType.TextStream` to initiate a text stream.
1. **Set global properties**: Set options such as the output format and voice name directly on the speech configuration, because the API handles partial text inputs and doesn't support SSML. The following sample code shows how to set them.

```csharp
// Set output format
speechConfig.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Raw24Khz16BitMonoPcm);

// Set a voice name (SetProperty is an instance method, so call it on the speechConfig object)
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_SynthVoice, "en-US-AvaMultilingualNeural");
```

1. **Stream your text**: For each text chunk that the GPT model generates, call `request.InputStream.Write(text);` to send the text to the stream.
1. **Close the stream**: After the GPT model completes its output, close the stream by calling `request.InputStream.Close();`.

For a detailed implementation, see the [sample code on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/csharp/tts-text-stream).

::: zone-end
::: zone pivot="programming-language-python"
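
The key steps described in this article (create a text stream request, set global properties, write each chunk, close the stream) can be sketched in Python. This is a minimal sketch, not the official sample: it assumes the `azure-cognitiveservices-speech` package exposes `SpeechSynthesisRequest`, `SpeechSynthesisRequestInputType.TextStream`, and `input_stream.write`/`close` as counterparts of the C# calls named in this article. The helper names `build_endpoint`, `feed_text_stream`, and `synthesize_text_stream` are hypothetical, and the key, region, and chunk source are placeholders.

```python
from typing import Iterable


def build_endpoint(region: str) -> str:
    """Return the WebSocket v2 endpoint for the given Speech resource region."""
    return f"wss://{region}.tts.speech.microsoft.com/cognitiveservices/websocket/v2"


def feed_text_stream(input_stream, chunks: Iterable[str]) -> int:
    """Write each non-empty chunk to the stream, then close it.

    Returns the number of chunks written. The stream is closed even if the
    chunk source raises, so the request isn't left waiting for more text.
    """
    written = 0
    try:
        for chunk in chunks:
            if chunk:  # streaming model output can contain empty deltas
                input_stream.write(chunk)
                written += 1
    finally:
        input_stream.close()
    return written


def synthesize_text_stream(key: str, region: str, chunks: Iterable[str]):
    """Assumed SDK usage; needs the Speech SDK installed and a valid key."""
    # Imported lazily so the helpers above work without the SDK.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        endpoint=build_endpoint(region), subscription=key
    )
    # Global properties replace SSML, which the text stream doesn't support.
    speech_config.set_speech_synthesis_output_format(
        speechsdk.SpeechSynthesisOutputFormat.Raw24Khz16BitMonoPcm
    )
    speech_config.set_property(
        speechsdk.PropertyId.SpeechServiceConnection_SynthVoice,
        "en-US-AvaMultilingualNeural",
    )
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

    request = speechsdk.SpeechSynthesisRequest(
        input_type=speechsdk.SpeechSynthesisRequestInputType.TextStream
    )
    task = synthesizer.speak_async(request)  # start before any text is written
    feed_text_stream(request.input_stream, chunks)
    return task.get()
```

In this sketch, `speak_async` is started before any text is written so that synthesis can begin as soon as the first chunks arrive. Pass any iterable of strings as `chunks`, for example the text deltas from a streaming chat completion.
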
::: zone-end

## Other tips

### Cache CRL files
