Commit ba80e22

Merge pull request #188315 from dargilco/dargilco/lid-doc-undetected-language
Update LID doc to address the 'Unknown` vs. empty string bug
2 parents 2779884 + a89028c commit ba80e22

1 file changed

+6
-2
lines changed

articles/cognitive-services/Speech-Service/language-identification.md

Lines changed: 6 additions & 2 deletions
@@ -99,7 +99,7 @@ You implement at-start LID or continuous LID by calling methods for [recognize o
 You can choose to prioritize accuracy or latency with language identification.
 
 > [!NOTE]
-> Low latency is prioritized by default with the Speech SDK. You can choose to prioritize accuracy or latency with the Speech SDKs for C#, C++, and Python.
+> Latency is prioritized by default with the Speech SDK. You can choose to prioritize accuracy or latency with the Speech SDKs for C#, C++, and Python.
 
 Prioritize `Latency` if you need a low-latency result such as during live streaming. Set the priority to `Accuracy` if the audio quality may be poor, and more latency is acceptable. For example, a voicemail could have background noise, or some silence at the beginning. Allowing the engine more time will improve language identification results.
 
@@ -130,7 +130,11 @@ speech_config.set_property(property_id=speechsdk.PropertyId.SpeechServiceConnect
 ```
 ::: zone-end
 
-For continuous LID using `Latency` as the priority, the Speech service returns one of the candidate languages provided even if those languages were not in the audio. For example, if `fr-FR` (French) and `en-US` (English) are provided as candidates, but German is spoken, either "French" or "English" would be returned. Otherwise the Speech service returns "Unknown" if none of the candidate languages are detected or if the language identification confidence is low.
+When prioritizing `Latency`, the Speech service returns one of the candidate languages provided even if those languages were not in the audio. For example, if `fr-FR` (French) and `en-US` (English) are provided as candidates, but German is spoken, either `fr-FR` or `en-US` would be returned. When prioritizing `Accuracy`, the Speech service will return the string `Unknown` as the detected language if none of the candidate languages are detected or if the language identification confidence is low.
+
+> [!NOTE]
+> You may see cases where an empty string will be returned instead of `Unknown`, due to Speech service inconsistency.
+> While this note is present, applications should check for both the `Unknown` and empty string case and treat them identically.
 
 ### Recognize once or continuous
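
The note added in this commit asks applications to treat `Unknown` and an empty string identically. A minimal sketch of such a check might look like the following. The helper name `normalize_detected_language` and the `UNDETECTED_VALUES` set are hypothetical and not part of the Speech SDK; the sketch also assumes the detected language is already available as a plain string (for example, read from the SDK's auto-detect result object).

```python
from typing import Optional

# Values that the note says should be handled the same way.
# This set is an illustrative assumption, not an SDK constant.
UNDETECTED_VALUES = {"Unknown", ""}


def normalize_detected_language(language: Optional[str]) -> Optional[str]:
    """Return the detected locale, or None when no language was identified.

    Hypothetical helper: maps both "Unknown" and the empty string
    (and a missing value) to None so callers have a single case to handle.
    """
    if language is None:
        return None
    cleaned = language.strip()
    if cleaned in UNDETECTED_VALUES:
        return None
    return cleaned


if __name__ == "__main__":
    # Each candidate value is normalized before the application acts on it.
    for raw in ("en-US", "Unknown", ""):
        print(repr(raw), "->", normalize_detected_language(raw))
```

Collapsing both cases to `None` keeps the workaround in one place, so it can be removed easily if the service later returns `Unknown` consistently.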
