Commit 5229394

Docs Editor: Update language-identification.md

1 parent c2216d2 commit 5229394
1 file changed: 46 additions, 50 deletions

articles/cognitive-services/Speech-Service/language-identification.md
@@ -13,144 +13,159 @@ ms.author: eur
zone_pivot_groups: programming-languages-speech-services-nomore-variant
---

# Language identification (preview)

Language identification is used to identify languages spoken in audio when compared against a list of [supported languages](language-support.md#language-identification).

Language identification (LID) use cases include:

* [Standalone language identification](#standalone-language-identification) when you only need to identify the language in an audio source.
* [Speech-to-text recognition](#speech-to-text) when you need to identify the language in an audio source and then transcribe it to text.
* [Speech translation](#speech-translation) when you need to identify the language in an audio source and then translate it to another language.

Note that for speech recognition, the initial latency is higher with language identification. Include this optional feature only as needed.

## Configuration options

Whether you use language identification [on its own](#standalone-language-identification), with [speech-to-text](#speech-to-text), or with [speech translation](#speech-translation), there are some common concepts and configuration options:

- Define a list of [candidate languages](#candidate-languages) that you expect in the audio.
- Decide whether to use [at-start or continuous](#at-start-and-continuous-language-identification) language identification.
- Prioritize [low latency or high accuracy](#accuracy-and-latency-prioritization) of results.

Then you make a [recognize once or continuous recognition](#recognize-once-or-continuous) request to the Speech service.

Code snippets are included with the concepts described next. Complete samples for each use case are provided later in this article.

### Candidate languages

You provide candidate languages, at least one of which is expected to be in the audio. You can include up to 4 languages for [at-start LID](#at-start-and-continuous-language-identification) or up to 10 languages for [continuous LID](#at-start-and-continuous-language-identification).

You must provide the full 4-letter locale, but language identification only uses one locale per base language. Don't include multiple locales (for example, "en-US" and "en-GB") for the same language.
::: zone pivot="programming-language-csharp"

```csharp
var autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.FromLanguages(new string[] { "en-US", "de-DE", "zh-CN" });
```

::: zone-end
::: zone pivot="programming-language-cpp"

```cpp
auto autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig::FromLanguages({ "en-US", "de-DE", "zh-CN" });
```

::: zone-end
::: zone pivot="programming-language-python"

```python
auto_detect_source_language_config = \
    speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "zh-CN"])
```

::: zone-end
::: zone pivot="programming-language-java"

```java
AutoDetectSourceLanguageConfig autoDetectSourceLanguageConfig =
    AutoDetectSourceLanguageConfig.fromLanguages(Arrays.asList("en-US", "de-DE", "zh-CN"));
```

::: zone-end
::: zone pivot="programming-language-javascript"

```javascript
var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages(["en-US", "de-DE", "zh-CN"]);
```

::: zone-end
::: zone pivot="programming-language-objectivec"

```objective-c
NSArray *languages = @[@"en-US", @"de-DE", @"zh-CN"];
SPXAutoDetectSourceLanguageConfiguration* autoDetectSourceLanguageConfig = \
    [[SPXAutoDetectSourceLanguageConfiguration alloc] init:languages];
```

::: zone-end

For more information, see [supported languages](language-support.md#language-identification).
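To illustrate the one-locale-per-base-language rule, the sketch below uses a small helper to flag candidate lists that repeat a base language. The helper is hypothetical, not part of the Speech SDK, and works on plain locale strings:

```python
def duplicate_base_languages(candidates):
    """Group candidate locales by base language and return any base language
    that appears with more than one locale (hypothetical helper, not an SDK API)."""
    by_base = {}
    for locale in candidates:
        base = locale.split("-")[0]
        by_base.setdefault(base, []).append(locale)
    return {base: locales for base, locales in by_base.items() if len(locales) > 1}

# "en-US" and "en-GB" share the base language "en", so this list breaks the rule.
print(duplicate_base_languages(["en-US", "en-GB", "de-DE"]))  # → {'en': ['en-US', 'en-GB']}
```

An empty result means the candidate list follows the rule and each base language appears only once.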

### At-start and continuous language identification

Speech supports both at-start and continuous language identification (LID).

> [!NOTE]
> Continuous language identification is only supported with Speech SDKs in C#, C++, and Python.

- At-start LID identifies the language once within the first few seconds of audio. Use at-start LID if the language in the audio won't change.
- Continuous LID can identify multiple languages for the duration of the audio. Use continuous LID if the language in the audio could change. Continuous LID doesn't support changing languages within the same sentence. For example, if you're primarily speaking Spanish and insert some English words, it won't detect the language change per word.

You implement at-start LID or continuous LID by calling methods for [recognize once or continuous](#recognize-once-or-continuous). Results also depend on your [accuracy and latency prioritization](#accuracy-and-latency-prioritization).

### Accuracy and latency prioritization

You can choose to prioritize accuracy or latency with language identification.

> [!NOTE]
> Latency is prioritized by default with the Speech SDK. You can choose to prioritize accuracy or latency with the Speech SDKs for C#, C++, and Python.

Prioritize `Latency` if you need a low-latency result, such as during live streaming. Set the priority to `Accuracy` if the audio quality may be poor and more latency is acceptable. For example, a voicemail could have background noise or some silence at the beginning. Allowing the engine more time improves language identification results.

* **At-start:** With at-start LID in `Latency` mode, the result is returned in less than 5 seconds. With at-start LID in `Accuracy` mode, the result is returned within 30 seconds. You set the priority for at-start LID with the `SpeechServiceConnection_SingleLanguageIdPriority` property.
* **Continuous:** With continuous LID in `Latency` mode, the results are returned every 2 seconds for the duration of the audio. With continuous LID in `Accuracy` mode, the results are returned within no set time frame for the duration of the audio. You set the priority for continuous LID with the `SpeechServiceConnection_ContinuousLanguageIdPriority` property.

> [!IMPORTANT]
> With [speech-to-text](#speech-to-text) and [speech translation](#speech-translation) continuous recognition, don't set `Accuracy` with the `SpeechServiceConnection_ContinuousLanguageIdPriority` property. The setting is ignored without error, and the default priority of `Latency` remains in effect. Only [standalone language identification](#standalone-language-identification) supports continuous LID with `Accuracy` prioritization.

Speech uses at-start LID with `Latency` prioritization by default. You need to set a priority property for any other LID configuration.

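The two bullets above pair each LID mode with its own priority property. As a quick reference, this sketch (a hypothetical helper, not an SDK API) returns the property name to set for a given mode:

```python
def lid_priority_property(continuous: bool) -> str:
    """Return the Speech SDK property name that sets LID priority,
    per the at-start and continuous bullets above (hypothetical helper)."""
    return ("SpeechServiceConnection_ContinuousLanguageIdPriority"
            if continuous
            else "SpeechServiceConnection_SingleLanguageIdPriority")

print(lid_priority_property(True))   # → SpeechServiceConnection_ContinuousLanguageIdPriority
print(lid_priority_property(False))  # → SpeechServiceConnection_SingleLanguageIdPriority
```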
::: zone pivot="programming-language-csharp"
Here's an example of using continuous LID while still prioritizing latency.

```csharp
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_ContinuousLanguageIdPriority, "Latency");
```

::: zone-end
::: zone pivot="programming-language-cpp"
Here's an example of using continuous LID while still prioritizing latency.

```cpp
speechConfig->SetProperty(PropertyId::SpeechServiceConnection_ContinuousLanguageIdPriority, "Latency");
```

::: zone-end
::: zone pivot="programming-language-python"
Here's an example of using continuous LID while still prioritizing latency.

```python
speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnection_ContinuousLanguageIdPriority, value='Latency')
```

::: zone-end

When you prioritize `Latency`, the Speech service returns one of the candidate languages provided even if those languages weren't in the audio. For example, if `fr-FR` (French) and `en-US` (English) are provided as candidates, but German is spoken, either `fr-FR` or `en-US` is returned. When you prioritize `Accuracy`, the Speech service returns the string `Unknown` as the detected language if none of the candidate languages are detected or if the language identification confidence is low.

> [!NOTE]
> You may see cases where an empty string is returned instead of `Unknown`, due to Speech service inconsistency. Applications should check for both the `Unknown` and empty string cases and treat them identically.

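Per the note above, a defensive check should treat `Unknown` and the empty string identically. The sketch below uses a hypothetical helper (`normalize_detected_language` is not an SDK API) to map both cases to `None`:

```python
def normalize_detected_language(detected):
    """Return None when the service could not identify the language,
    treating "Unknown" and the empty string identically (hypothetical helper)."""
    if detected is None or detected == "" or detected == "Unknown":
        return None
    return detected

print(normalize_detected_language("en-US"))    # → en-US
print(normalize_detected_language("Unknown"))  # → None
print(normalize_detected_language(""))         # → None
```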
### Recognize once or continuous

Language identification is completed with recognition objects and operations. You make a request to the Speech service for recognition of audio.

> [!NOTE]
> Don't confuse recognition with identification. Recognition can be used with or without language identification.

Let's map these concepts to the code. You call either the recognize once method, or the start and stop continuous recognition methods. You choose from:

- Recognize once with at-start LID
- Continuous recognition with at-start LID
- Continuous recognition with continuous LID

The `SpeechServiceConnection_ContinuousLanguageIdPriority` property is always required for continuous LID. Without it, the Speech service defaults to at-start LID.

::: zone pivot="programming-language-csharp"

```csharp
// Recognize once with At-start LID
var result = await recognizer.RecognizeOnceAsync();
```

@@ -164,8 +179,10 @@ speechConfig.SetProperty(PropertyId.SpeechServiceConnection_ContinuousLanguageId

```csharp
await recognizer.StartContinuousRecognitionAsync();
await recognizer.StopContinuousRecognitionAsync();
```

::: zone-end
::: zone pivot="programming-language-cpp"

```cpp
// Recognize once with At-start LID
auto result = recognizer->RecognizeOnceAsync().get();
```

@@ -179,8 +196,10 @@ speechConfig->SetProperty(PropertyId::SpeechServiceConnection_ContinuousLanguage

```cpp
recognizer->StartContinuousRecognitionAsync().get();
recognizer->StopContinuousRecognitionAsync().get();
```

::: zone-end
::: zone pivot="programming-language-python"

```python
# Recognize once with At-start LID
result = recognizer.recognize_once()
```

@@ -194,23 +213,21 @@ speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnect

```python
source_language_recognizer.start_continuous_recognition()
source_language_recognizer.stop_continuous_recognition()
```

::: zone-end

## Standalone language identification

You use standalone language identification when you only need to identify the language in an audio source.

> [!NOTE]
> Standalone source language identification is only supported with the Speech SDKs for C#, C++, and Python.

::: zone pivot="programming-language-csharp"

### [Recognize once](#tab/once)

:::code language="csharp" source="~/samples-cognitive-services-speech-sdk/samples/csharp/sharedcontent/console/standalone_language_detection_samples.cs" id="languageDetectionInAccuracyWithFile":::

### [Continuous recognition](#tab/continuous)

:::code language="csharp" source="~/samples-cognitive-services-speech-sdk/samples/csharp/sharedcontent/console/standalone_language_detection_samples.cs" id="languageDetectionContinuousWithFile":::
@@ -235,39 +252,31 @@ See more examples of standalone language identification on [GitHub](https://gith

See more examples of standalone language identification on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/windows/console/samples/standalone_language_detection_samples.cpp).

::: zone-end

::: zone pivot="programming-language-python"

### [Recognize once](#tab/once)

:::code language="python" source="~/samples-cognitive-services-speech-sdk/samples/python/console/speech_language_detection_sample.py" id="SpeechLanguageDetectionWithFile":::

### [Continuous recognition](#tab/continuous)

:::code language="python" source="~/samples-cognitive-services-speech-sdk/samples/python/console/speech_language_detection_sample.py" id="SpeechContinuousLanguageDetectionWithFile":::

---

See more examples of standalone language identification on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/python/console/speech_language_detection_sample.py).

::: zone-end

## Speech-to-text

You use speech-to-text recognition when you need to identify the language in an audio source and then transcribe it to text. For more information, see [Speech-to-text overview](speech-to-text.md).

> [!NOTE]
> Speech-to-text recognition with at-start language identification is supported with Speech SDKs in C#, C++, Python, Java, JavaScript, and Objective-C. Speech-to-text recognition with continuous language identification is only supported with Speech SDKs in C#, C++, and Python.
>
> Currently, for speech-to-text recognition with continuous language identification, you must create a SpeechConfig from the `wss://{region}.stt.speech.microsoft.com/speech/universal/v2` endpoint string, as shown in code examples. In a future SDK release, you won't need to set it.

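The v2 endpoint in the note above follows a fixed template, so it can be filled in from the region name. This sketch builds the string with a hypothetical helper (`v2_endpoint` is not an SDK function, and the SDK usage in the comment assumes your own key and region):

```python
def v2_endpoint(region: str) -> str:
    """Build the v2 endpoint string required for continuous LID
    (hypothetical helper; the template comes from the note above)."""
    return f"wss://{region}.stt.speech.microsoft.com/speech/universal/v2"

# With the Speech SDK for Python, you would then pass the endpoint to SpeechConfig,
# for example (placeholders for your own key and region):
#   speech_config = speechsdk.SpeechConfig(
#       subscription="YourSubscriptionKey", endpoint=v2_endpoint("YourServiceRegion"))
print(v2_endpoint("westus"))  # → wss://westus.stt.speech.microsoft.com/speech/universal/v2
```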
::: zone pivot="programming-language-csharp"

### [Recognize once](#tab/once)

@@ -389,7 +398,6 @@ See more examples of speech-to-text recognition with language identification on

::: zone pivot="programming-language-cpp"

### [Recognize once](#tab/once)

@@ -414,14 +422,12 @@ auto autoDetectSourceLanguageResult =

```cpp
auto detectedLanguage = autoDetectSourceLanguageResult->Language;
```

### [Continuous recognition](#tab/continuous)

:::code language="cpp" source="~/samples-cognitive-services-speech-sdk/samples/cpp/windows/console/samples/speech_recognition_samples.cpp" id="SpeechContinuousRecognitionAndLanguageIdWithMultiLingualFile":::

See more examples of speech-to-text recognition with language identification on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/windows/console/samples/speech_recognition_samples.cpp).

::: zone-end

::: zone pivot="programming-language-java"

@@ -454,7 +460,6 @@ See more examples of speech-to-text recognition with language identification on

::: zone pivot="programming-language-python"

### [Recognize once](#tab/once)

@@ -469,10 +474,8 @@ auto_detect_source_language_result = speechsdk.AutoDetectSourceLanguageResult(re

```python
detected_language = auto_detect_source_language_result.language
```

### [Continuous recognition](#tab/continuous)

```python
import azure.cognitiveservices.speech as speechsdk
import time
```

@@ -539,7 +542,6 @@ SPXAutoDetectSourceLanguageResult *languageDetectionResult = [[SPXAutoDetectSour

```objective-c
NSString *detectedLanguage = [languageDetectionResult language];
```

::: zone-end

::: zone pivot="programming-language-javascript"

@@ -556,7 +558,6 @@ speechRecognizer.recognizeOnceAsync((result: SpeechSDK.SpeechRecognitionResult)

::: zone-end

### Using Speech-to-text custom models

::: zone pivot="programming-language-csharp"

@@ -647,19 +648,15 @@ var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fr

::: zone-end

650-
651651
## Speech translation
652652

653653
You use Speech translation when you need to identify the language in an audio source and then translate it to another language. For more information, see [Speech translation overview](speech-translation.md).
654654

655655
> [!NOTE]
656656
> Speech translation with language identification is only supported with Speech SDKs in C#, C++, and Python.
657-
>
658657
> Currently for speech translation with language identification, you must create a SpeechConfig from the `wss://{region}.stt.speech.microsoft.com/speech/universal/v2` endpoint string, as shown in code examples. In a future SDK release you won't need to set it.
659-
660658
::: zone pivot="programming-language-csharp"
661659
662-
663660
### [Recognize once](#tab/once)
664661
665662
```csharp
@@ -991,7 +988,6 @@ See more examples of speech translation with language identification on [GitHub]
991988
992989
::: zone pivot="programming-language-python"
993990
994-
995991
### [Recognize once](#tab/once)
996992
997993
```python
