Commit 0d2425c

Merge pull request #4170 from MicrosoftDocs/main
Publish to live, Wednesday 4AM PST, 4/16
2 parents 6e661da + cd7af18 commit 0d2425c

1 file changed

+38
-18
lines changed

articles/ai-services/speech-service/how-to-lower-speech-synthesis-latency.md

Lines changed: 38 additions & 18 deletions
@@ -24,15 +24,19 @@ Normally, we measure the latency by `first byte latency` and `finish latency`, a
2424

2525
| Latency | Description | [SpeechSynthesisResult](/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesisresult) property key |
2626
|-----------|-------------|------------|
27-
| first byte latency | Indicates the time delay between the start of the synthesis task and receipt of the first chunk of audio data. | SpeechServiceResponse_SynthesisFirstByteLatencyMs |
28-
| finish latency | Indicates the time delay between the start of the synthesis task and the receipt of the whole synthesized audio data. | SpeechServiceResponse_SynthesisFinishLatencyMs |
27+
| `first byte client latency` | Indicates the time delay between the start of synthesis and the client's receipt of the first audio chunk, including network latency. | `SpeechServiceResponse_SynthesisFirstByteLatencyMs` |
28+
| `finish client latency` | Indicates the time delay between the start of synthesis and the client's receipt of the whole synthesized audio, including network latency. | `SpeechServiceResponse_SynthesisFinishLatencyMs` |
29+
| `network latency` | The network latency between the client and the Azure TTS service. | `SpeechServiceResponse_SynthesisNetworkLatencyMs` |
30+
| `first byte service latency` | Indicates the time delay between the Azure TTS service receiving the synthesis request and returning the first audio chunk. | `SpeechServiceResponse_SynthesisServiceLatencyMs` |
2931

3032
The Speech SDK puts the latency durations in the Properties collection of [`SpeechSynthesisResult`](/dotnet/api/microsoft.cognitiveservices.speech.speechsynthesisresult). The following sample code shows these values.
3133

3234
```csharp
3335
var result = await synthesizer.SpeakTextAsync(text);
34-
Console.WriteLine($"first byte latency: \t{result.Properties.GetProperty(PropertyId.SpeechServiceResponse_SynthesisFirstByteLatencyMs)} ms");
35-
Console.WriteLine($"finish latency: \t{result.Properties.GetProperty(PropertyId.SpeechServiceResponse_SynthesisFinishLatencyMs)} ms");
36+
Console.WriteLine($"first byte client latency: \t{result.Properties.GetProperty(PropertyId.SpeechServiceResponse_SynthesisFirstByteLatencyMs)} ms");
37+
Console.WriteLine($"finish client latency: \t{result.Properties.GetProperty(PropertyId.SpeechServiceResponse_SynthesisFinishLatencyMs)} ms");
38+
Console.WriteLine($"network latency: \t{result.Properties.GetProperty(PropertyId.SpeechServiceResponse_SynthesisNetworkLatencyMs)} ms");
39+
Console.WriteLine($"first byte service latency: \t{result.Properties.GetProperty(PropertyId.SpeechServiceResponse_SynthesisServiceLatencyMs)} ms");
3640
// You can also get the result ID and send it to us when you need help with diagnosis.
3741
var resultId = result.ResultId;
3842
```
@@ -43,15 +47,19 @@ var resultId = result.ResultId;
4347

4448
| Latency | Description | [SpeechSynthesisResult](/cpp/cognitive-services/speech/speechsynthesisresult) property key |
4549
|-----------|-------------|------------|
46-
| `first byte latency` | Indicates the time delay between the synthesis starts and the first audio chunk is received. | `SpeechServiceResponse_SynthesisFirstByteLatencyMs` |
47-
| `finish latency` | Indicates the time delay between the synthesis starts and the whole synthesized audio is received. | `SpeechServiceResponse_SynthesisFinishLatencyMs` |
50+
| `first byte client latency` | Indicates the time delay between the start of synthesis and the client's receipt of the first audio chunk, including network latency. | `SpeechServiceResponse_SynthesisFirstByteLatencyMs` |
51+
| `finish client latency` | Indicates the time delay between the start of synthesis and the client's receipt of the whole synthesized audio, including network latency. | `SpeechServiceResponse_SynthesisFinishLatencyMs` |
52+
| `network latency` | The network latency between the client and the Azure TTS service. | `SpeechServiceResponse_SynthesisNetworkLatencyMs` |
53+
| `first byte service latency` | Indicates the time delay between the Azure TTS service receiving the synthesis request and returning the first audio chunk. | `SpeechServiceResponse_SynthesisServiceLatencyMs` |
4854

4955
The Speech SDK measures the latencies and puts them in the property bag of [`SpeechSynthesisResult`](/cpp/cognitive-services/speech/speechsynthesisresult). Refer to the following code to get them.
5056

5157
```cpp
5258
auto result = synthesizer->SpeakTextAsync(text).get();
5359
auto firstByteLatency = std::stoi(result->Properties.GetProperty(PropertyId::SpeechServiceResponse_SynthesisFirstByteLatencyMs));
5460
auto finishedLatency = std::stoi(result->Properties.GetProperty(PropertyId::SpeechServiceResponse_SynthesisFinishLatencyMs));
61+
auto networkLatency = std::stoi(result->Properties.GetProperty(PropertyId::SpeechServiceResponse_SynthesisNetworkLatencyMs));
62+
auto firstByteServiceLatency = std::stoi(result->Properties.GetProperty(PropertyId::SpeechServiceResponse_SynthesisServiceLatencyMs));
5563
// You can also get the result ID and send it to us when you need help with diagnosis.
5664
auto resultId = result->ResultId;
5765
```
@@ -62,15 +70,19 @@ auto resultId = result->ResultId;
6270
6371
| Latency | Description | [SpeechSynthesisResult](/java/api/com.microsoft.cognitiveservices.speech.speechsynthesisresult) property key |
6472
|-----------|-------------|------------|
65-
| `first byte latency` | Indicates the time delay between the synthesis starts and the first audio chunk is received. | `SpeechServiceResponse_SynthesisFirstByteLatencyMs` |
66-
| `finish latency` | Indicates the time delay between the synthesis starts and the whole synthesized audio is received. | `SpeechServiceResponse_SynthesisFinishLatencyMs` |
73+
| `first byte client latency` | Indicates the time delay between the start of synthesis and the client's receipt of the first audio chunk, including network latency. | `SpeechServiceResponse_SynthesisFirstByteLatencyMs` |
74+
| `finish client latency` | Indicates the time delay between the start of synthesis and the client's receipt of the whole synthesized audio, including network latency. | `SpeechServiceResponse_SynthesisFinishLatencyMs` |
75+
| `network latency` | The network latency between the client and the Azure TTS service. | `SpeechServiceResponse_SynthesisNetworkLatencyMs` |
76+
| `first byte service latency` | Indicates the time delay between the Azure TTS service receiving the synthesis request and returning the first audio chunk. | `SpeechServiceResponse_SynthesisServiceLatencyMs` |
6777
6878
The Speech SDK measures the latencies and puts them in the property bag of [`SpeechSynthesisResult`](/java/api/com.microsoft.cognitiveservices.speech.speechsynthesisresult). Refer to the following code to get them.
6979
7080
```java
7181
SpeechSynthesisResult result = synthesizer.SpeakTextAsync(text).get();
72-
System.out.println("first byte latency: \t" + result.getProperties().getProperty(PropertyId.SpeechServiceResponse_SynthesisFirstByteLatencyMs) + " ms.");
73-
System.out.println("finish latency: \t" + result.getProperties().getProperty(PropertyId.SpeechServiceResponse_SynthesisFinishLatencyMs) + " ms.");
82+
System.out.println("first byte client latency: \t" + result.getProperties().getProperty(PropertyId.SpeechServiceResponse_SynthesisFirstByteLatencyMs) + " ms.");
83+
System.out.println("finish client latency: \t" + result.getProperties().getProperty(PropertyId.SpeechServiceResponse_SynthesisFinishLatencyMs) + " ms.");
84+
System.out.println("network latency: \t" + result.getProperties().getProperty(PropertyId.SpeechServiceResponse_SynthesisNetworkLatencyMs) + " ms.");
85+
System.out.println("first byte service latency: \t" + result.getProperties().getProperty(PropertyId.SpeechServiceResponse_SynthesisServiceLatencyMs) + " ms.");
7486
// You can also get the result ID and send it to us when you need help with diagnosis.
7587
String resultId = result.getResultId();
7688
```
@@ -82,15 +94,19 @@ String resultId = result.getResultId();
8294

8395
| Latency | Description | [SpeechSynthesisResult](/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.speechsynthesisresult) property key |
8496
|-----------|-------------|------------|
85-
| `first byte latency` | Indicates the time delay between the synthesis starts and the first audio chunk is received. | `SpeechServiceResponse_SynthesisFirstByteLatencyMs` |
86-
| `finish latency` | Indicates the time delay between the synthesis starts and the whole synthesized audio is received. | `SpeechServiceResponse_SynthesisFinishLatencyMs` |
97+
| `first byte client latency` | Indicates the time delay between the start of synthesis and the client's receipt of the first audio chunk, including network latency. | `SpeechServiceResponse_SynthesisFirstByteLatencyMs` |
98+
| `finish client latency` | Indicates the time delay between the start of synthesis and the client's receipt of the whole synthesized audio, including network latency. | `SpeechServiceResponse_SynthesisFinishLatencyMs` |
99+
| `network latency` | The network latency between the client and the Azure TTS service. | `SpeechServiceResponse_SynthesisNetworkLatencyMs` |
100+
| `first byte service latency` | Indicates the time delay between the Azure TTS service receiving the synthesis request and returning the first audio chunk. | `SpeechServiceResponse_SynthesisServiceLatencyMs` |
87101

88102
The Speech SDK measures the latencies and puts them in the property bag of [`SpeechSynthesisResult`](/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.speechsynthesisresult). Refer to the following code to get them.
89103

90104
```python
91105
result = synthesizer.speak_text_async(text).get()
92-
first_byte_latency = int(result.properties.get_property(speechsdk.PropertyId.SpeechServiceResponse_SynthesisFirstByteLatencyMs))
93-
finished_latency = int(result.properties.get_property(speechsdk.PropertyId.SpeechServiceResponse_SynthesisFinishLatencyMs))
106+
first_byte_client_latency = int(result.properties.get_property(speechsdk.PropertyId.SpeechServiceResponse_SynthesisFirstByteLatencyMs))
107+
finished_client_latency = int(result.properties.get_property(speechsdk.PropertyId.SpeechServiceResponse_SynthesisFinishLatencyMs))
108+
network_latency = int(result.properties.get_property(speechsdk.PropertyId.SpeechServiceResponse_SynthesisNetworkLatencyMs))
109+
first_byte_service_latency = int(result.properties.get_property(speechsdk.PropertyId.SpeechServiceResponse_SynthesisServiceLatencyMs))
94110
# You can also get the result ID and send it to us when you need help with diagnosis.
95111
result_id = result.result_id
96112
```
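The Python sample above converts each property with `int(...)`, which raises `ValueError` if a value is not a numeric string. The following is a minimal sketch of a more tolerant conversion, not part of the Speech SDK: `parse_latency_ms` and `summarize_latencies` are hypothetical helper names, the sample numbers are made up for illustration, and the sketch assumes a property that was not reported comes back as an empty string. It works on plain strings so it runs without the SDK installed.

```python
from typing import Dict, Optional


def parse_latency_ms(raw: str) -> Optional[int]:
    """Convert a latency property string to milliseconds.

    Returns None when the value is empty or not an integer
    (assumption: an unreported property is an empty string).
    """
    raw = raw.strip()
    if not raw:
        return None
    try:
        return int(raw)
    except ValueError:
        return None


def summarize_latencies(raw_values: Dict[str, str]) -> Dict[str, Optional[int]]:
    """Parse every latency string in a name -> raw value mapping."""
    return {name: parse_latency_ms(value) for name, value in raw_values.items()}


if __name__ == "__main__":
    # Illustrative values only; real values would come from
    # result.properties.get_property(...) as in the sample above.
    sample = {
        "first byte client latency": "180",
        "finish client latency": "950",
        "network latency": "40",
        "first byte service latency": "",  # simulates a missing value
    }
    for name, value in summarize_latencies(sample).items():
        print(f"{name}: {value if value is not None else 'not reported'}")
```

In SDK code, each raw string would be the return value of `result.properties.get_property(...)` for the corresponding property key.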
@@ -101,15 +117,19 @@ result_id = result.result_id
101117

102118
| Latency | Description | [SPXSpeechSynthesisResult](/objectivec/cognitive-services/speech/spxspeechsynthesisresult) property key |
103119
|-----------|-------------|------------|
104-
| `first byte latency` | Indicates the time delay between the synthesis starts and the first audio chunk is received. | `SPXSpeechServiceResponseSynthesisFirstByteLatencyMs` |
105-
| `finish latency` | Indicates the time delay between the synthesis starts and the whole synthesized audio is received. | `SPXSpeechServiceResponseSynthesisFinishLatencyMs` |
120+
| `first byte client latency` | Indicates the time delay between the start of synthesis and the client's receipt of the first audio chunk, including network latency. | `SPXSpeechServiceResponseSynthesisFirstByteLatencyMs` |
121+
| `finish client latency` | Indicates the time delay between the start of synthesis and the client's receipt of the whole synthesized audio, including network latency. | `SPXSpeechServiceResponseSynthesisFinishLatencyMs` |
122+
| `network latency` | The network latency between the client and the Azure TTS service. | `SPXSpeechServiceResponseSynthesisNetworkLatencyMs` |
123+
| `first byte service latency` | Indicates the time delay between the Azure TTS service receiving the synthesis request and returning the first audio chunk. | `SPXSpeechServiceResponseSynthesisServiceLatencyMs` |
106124

107125
The Speech SDK measures the latencies and puts them in the property bag of [`SPXSpeechSynthesisResult`](/objectivec/cognitive-services/speech/spxspeechsynthesisresult). Refer to the following code to get them.
108126

109127
```Objective-C
110128
SPXSpeechSynthesisResult *speechResult = [speechSynthesizer speakText:text];
111-
int firstByteLatency = [intString [speechResult.properties getPropertyById:SPXSpeechServiceResponseSynthesisFirstByteLatencyMs]];
112-
int finishedLatency = [intString [speechResult.properties getPropertyById:SPXSpeechServiceResponseSynthesisFinishLatencyMs]];
129+
int firstByteClientLatency = [[speechResult.properties getPropertyById:SPXSpeechServiceResponseSynthesisFirstByteLatencyMs] intValue];
130+
int finishedClientLatency = [[speechResult.properties getPropertyById:SPXSpeechServiceResponseSynthesisFinishLatencyMs] intValue];
131+
int networkLatency = [[speechResult.properties getPropertyById:SPXSpeechServiceResponseSynthesisNetworkLatencyMs] intValue];
132+
int firstByteServiceLatency = [[speechResult.properties getPropertyById:SPXSpeechServiceResponseSynthesisServiceLatencyMs] intValue];
113133
// You can also get the result ID and send it to us when you need help with diagnosis.
114134
NSString *resultId = speechResult.resultId;
115135
```
