
Commit 3e027f0

committed
Adding Speech Known Issues page
1 parent 385b96e commit 3e027f0

File tree

1 file changed: +5 -5 lines changed

articles/ai-services/speech-service/known-issues.md

Lines changed: 5 additions & 5 deletions
@@ -25,11 +25,11 @@ This table lists the current known issues for the Speech to text feature:
 
 |Issue ID|Category|Title|Description|Workaround|Issues publish date|
 |--------|--------|----|-----------|----------|-------------------|
-| 1001 | Content | STT transcriptions with pound units | In certain instances, the use of pound units can pose difficulties for transcription. When phrases are spoken in a UK dialect, they're often inaccurately converted during real-time transcription, leading to the term 'pounds' being automatically translated to 'lbs' irrespective of the language setting. | Users can use Custom Display Post Processing (DPP) to train a custom speech model to correct default DPP results (for example, Pounds {tab} Pounds). Refer to [Custom Rewrite Rules](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-display-text-format#custom-rewrite). | June 6, 2025 |
-| 1002 | Content | STT transcriptions with cardinal directions | The speech recognition model 20241218 might inaccurately interpret audio inputs that include cardinal directions, resulting in unexpected transcription outcomes. For instance, an audio file containing "SW123456" might be transcribed as "Southwest 123456," and similar errors can occur with other cardinal directions. | A potential workaround is to use Custom Display formatting, where "Southwest" is mapped to "SW" in a rewrite rule: [Custom Rewrite Rules](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-display-text-format#custom-rewrite). | June 6, 2025 |
+| 1001 | Content | STT transcriptions with pound units | In certain instances, the use of pound units can pose difficulties for transcription. When phrases are spoken in a UK dialect, they're often inaccurately converted during real-time transcription, leading to the term 'pounds' being automatically translated to 'lbs' irrespective of the language setting. | Users can use Custom Display Post Processing (DPP) to train a custom speech model to correct default DPP results (for example, Pounds {tab} Pounds). Refer to [Custom Rewrite Rules](/azure/ai-services/speech-service/how-to-custom-speech-display-text-format#custom-rewrite). | June 6, 2025 |
+| 1002 | Content | STT transcriptions with cardinal directions | The speech recognition model 20241218 might inaccurately interpret audio inputs that include cardinal directions, resulting in unexpected transcription outcomes. For instance, an audio file containing "SW123456" might be transcribed as "Southwest 123456," and similar errors can occur with other cardinal directions. | A potential workaround is to use Custom Display formatting, where "Southwest" is mapped to "SW" in a rewrite rule: [Custom Rewrite Rules](/azure/ai-services/speech-service/how-to-custom-speech-display-text-format#custom-rewrite). | June 6, 2025 |
 | 1003 | Model | STT transcriptions might include unexpected internal system tags. | Unexpected tags like 'nsnoise' have been appearing in transcription results. Initially reported for the Arabic model (ar-SA), this issue was also observed in English models (en-US and en-GB). These tags cause intermittent problems in the transcription outputs. To address this issue, a filter will be added to remove 'nsnoise' from the training data in future model updates. | N/A | June 6, 2025 |
-| 1004 | Model | STT transcriptions with inaccurate spellings of language-specific names and words | Inaccurate transcription of language-specific names due to lack of entity coverage in the base model for tier 2 locales (a scenario specific to words our base models haven't seen before). | Customers can train [Custom Speech](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/custom-speech-overview) models to include unknown names and words as training data. As a second step, unknown words can be added as a [Phrase List](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/improve-accuracy-phrase-list?tabs=terminal&pivots=programming-language-csharp) at runtime. Biasing the phrase list toward a word known in the training corpus can greatly improve recognition accuracy. | June 6, 2025 |
-| 1005 | File types | Words out of context occasionally added to STT real-time output | Audio files that consist solely of background noise can result in inaccurate transcriptions. Ideally, only spoken sentences should be transcribed, but this isn't occurring with the nl-NL model. | Audio files that consist of background noise, captured echo reflections from surfaces in an environment, or audio playback from a device while the device microphone is active can result in inaccurate transcriptions. Customers can use the Microsoft Audio Stack built into the Speech SDK for noise suppression and echo cancellation, which helps optimize the audio being fed to the STT service: [Use the Microsoft Audio Stack (MAS)](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/audio-processing-speech-sdk?tabs=java). | June 6, 2025 |
+| 1004 | Model | STT transcriptions with inaccurate spellings of language-specific names and words | Inaccurate transcription of language-specific names due to lack of entity coverage in the base model for tier 2 locales (a scenario specific to words our base models haven't seen before). | Customers can train [Custom Speech](/azure/ai-services/speech-service/custom-speech-overview) models to include unknown names and words as training data. As a second step, unknown words can be added as a [Phrase List](/azure/ai-services/speech-service/improve-accuracy-phrase-list?tabs=terminal&pivots=programming-language-csharp) at runtime. Biasing the phrase list toward a word known in the training corpus can greatly improve recognition accuracy. | June 6, 2025 |
+| 1005 | File types | Words out of context occasionally added to STT real-time output | Audio files that consist solely of background noise can result in inaccurate transcriptions. Ideally, only spoken sentences should be transcribed, but this isn't occurring with the nl-NL model. | Audio files that consist of background noise, captured echo reflections from surfaces in an environment, or audio playback from a device while the device microphone is active can result in inaccurate transcriptions. Customers can use the Microsoft Audio Stack built into the Speech SDK for noise suppression and echo cancellation, which helps optimize the audio being fed to the STT service: [Use the Microsoft Audio Stack (MAS)](/azure/ai-services/speech-service/audio-processing-speech-sdk?tabs=java). | June 6, 2025 |
 
 ### Text to speech (TTS)
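The rewrite-rule workaround cited for issues 1001 and 1002 can be sketched as a display-format file fragment. This is a hedged illustration: the `#rewrite` section and tab-separated pair layout follow the linked Custom Rewrite Rules page, but the entries themselves are hypothetical examples, not shipped rules (fields are separated by a literal tab, shown here as `{tab}`):

```
#rewrite
lbs{tab}pounds
Southwest{tab}SW
```

Each line maps a phrase produced by default display post-processing (left field) to the desired replacement (right field).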

@@ -50,7 +50,7 @@ This table lists the current known issues for the Speech SDK/Runtime feature.
 |--------|--------|----|-----------|----------|-------------------|
 | 3001 | SDK/SR Runtime | Handling of the InitialSilenceTimeout parameter | The issue is related to the handling of the InitialSilenceTimeout parameter. When set to 0, it unexpectedly caused customers to encounter 400 errors. Additionally, the endSilenceTimeout parameter might lead to incorrect transcriptions: when endSilenceTimeout is set to a value other than "0", the system disregards user input after the specified duration, even if the user continues speaking. Customers want all parts of the conversation to be transcribed, including segments after pauses, to ensure no user input is lost. | The 400 error occurs because the "InitialSilenceTimeout" parameter isn't currently exposed directly in the Real-time Speech Recognition endpoint, resulting in a failed URL consistency check. To bypass this error, customers can perform the following steps: <br> Adjust their production code to use Region/Key instantiation of the SpeechConfig object. <ul> <li>SpeechConfig = fromSubscription(String subscriptionKey, String region); where region is the Azure region where the Speech resource is located. </li> <li>Set the parameter "InitialSilenceTimeoutMs" to 0, which in effect disables the timeout due to initial silence in the recognition audio stream. </li> </ul> Note: For single-shot recognition, the session is terminated after 30 seconds of initial silence. For continuous recognition, the service reports an empty phrase after 30 seconds and continues the recognition process. This behavior is due to a second parameter, "Speech_SegmentationMaximumTimeMs", which determines the maximum length of a phrase and has a default value of 30,000 ms. | June 6, 2025 |
 | 3002 | SDK/SR Runtime | Handling of the SegmentationTimeout parameter | Customers experience random words being generated as part of speech recognition results (hallucinations) when the SegmentationSilenceTimeout parameter is set to more than 1,000 ms. | Customers should maintain the default "SegmentationSilenceTimeout" value of 650 ms. | June 6, 2025 |
-| 3003 | SDK/SR Runtime | Handling of speaker duration during real-time diarization in STT | The Python SDK doesn't show the duration of speakers when using real-time diarization with STT. | Check the offset and duration on the result by following the steps in the documentation: [Conversation Transcription Result Class](https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.transcription.conversationtranscriptionresult?view=azure-python). | June 6, 2025 |
+| 3003 | SDK/SR Runtime | Handling of speaker duration during real-time diarization in STT | The Python SDK doesn't show the duration of speakers when using real-time diarization with STT. | Check the offset and duration on the result by following the steps in the documentation: [Conversation Transcription Result Class](/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.transcription.conversationtranscriptionresult?view=azure-python). | June 6, 2025 |
 | 3004 | SDK/TTS Avatar | Frequent disconnections with the JavaScript SDK | TTS Avatar isn't loading, or a custom avatar frequently disconnects and reconnects when using the JavaScript SDK. | Customers should open UDP port 3478. | June 6, 2025 |
 
 ## Recently closed known issues
