Merge pull request #6476 from goergenj/20250808-speech-known-issues-update

prmerger-automator[bot] · web-flow · commit c73743b604f7 · 2025-08-08T17:08:44.000Z
Updated Known Issues page for August
diff --git a/articles/ai-services/speech-service/known-issues.md b/articles/ai-services/speech-service/known-issues.md
@@ -28,6 +28,7 @@ This table lists the current known issues for the Speech to text feature:
 | 1003   | Model | STT transcriptions might include unexpected internal system tags. | Unexpected tags like 'nsnoise' have been appearing in transcription results. Customers initially reported this issue for the Arabic model (ar-SA), and it was also observed in English models (en-US and en-GB). These tags cause intermittent problems in the transcription outputs. To address this issue, a filter will be added to remove 'nsnoise' from the training data in future model updates. | N/A | June 9, 2025 |
 | 1004   | Model | STT transcriptions with inaccurate spellings of language-specific names and words | Language-specific names can be transcribed inaccurately because the base model for tier 2 locales lacks entity coverage (this applies when the base model hasn't seen a specific word before). | Customers can train [Custom Speech](/azure/ai-services/speech-service/custom-speech-overview) models to include unknown names and words as training data. As a second step, unknown words can be added to a [Phrase List](/azure/ai-services/speech-service/improve-accuracy-phrase-list?tabs=terminal&pivots=programming-language-csharp) at runtime. Biasing the phrase list toward a word known in the training corpus can greatly improve recognition accuracy. | June 9, 2025 |
 | 1005   | File types | Out-of-context words occasionally added to STT real-time output | Audio files that consist solely of background noise can result in inaccurate transcriptions. Ideally, only spoken sentences should be transcribed, but this isn't occurring with the nl-NL model. | Audio files that consist of background noise, captured echo reflections from surfaces in an environment, or audio playback from a device while the device microphone is active can result in inaccurate transcriptions. Customers can use the Microsoft Audio Stack built into the Speech SDK for noise suppression of observed background noise and for echo cancellation. This should help optimize the audio being fed to the STT service: [Use the Microsoft Audio Stack (MAS)](/azure/ai-services/speech-service/audio-processing-speech-sdk?tabs=java). | June 9, 2025 |
+| 1006  | File types  | MP4 decoding failure due to 'moov atom' position | Decoding of MP4 container files might fail when the 'moov atom' is located at the end of the file instead of the beginning. This structure makes the file unstreamable for the current service and the underlying Microsoft MTS service, especially for files larger than 10 MB. Supporting such formats would require fundamental changes. | Preprocess the file using audio codec utilities to move the 'moov atom' to the beginning of the file, or convert the file to MP3. | August 8, 2025 |
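
For issue 1006, it can help to check where the 'moov atom' sits before uploading a file. The following is a minimal sketch, not part of the Speech service or SDK: it walks the top-level MP4 boxes and reports whether `moov` appears before `mdat`. The function names `top_level_boxes` and `moov_before_mdat` are hypothetical. If the check returns `False`, one commonly used way to relocate the atom is ffmpeg's `-movflags +faststart` option with stream copy; treat that as one possible tool, not a documented requirement.

```python
import struct


def top_level_boxes(f):
    """Yield (box_type, offset, size) for each top-level box in a binary file object."""
    f.seek(0, 2)              # seek to the end to learn the total length
    end = f.tell()
    offset = 0
    while offset + 8 <= end:
        f.seek(offset)
        size, box_type = struct.unpack(">I4s", f.read(8))
        header = 8
        if size == 1:         # a 64-bit size follows the type field
            if offset + 16 > end:
                break
            (size,) = struct.unpack(">Q", f.read(8))
            header = 16
        elif size == 0:       # the box extends to the end of the file
            size = end - offset
        if size < header:     # malformed box, stop scanning
            break
        yield box_type.decode("latin-1"), offset, size
        offset += size


def moov_before_mdat(f):
    """Return True if 'moov' precedes 'mdat' (or no 'mdat' exists); False otherwise."""
    order = [box_type for box_type, _, _ in top_level_boxes(f)]
    if "moov" not in order:
        return False
    if "mdat" not in order:
        return True
    return order.index("moov") < order.index("mdat")
```

A typical call would be `moov_before_mdat(open(path, "rb"))`, where `path` is a local MP4 file. The scan only reads box headers, so it stays cheap for large files.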
 
 ## Active known issues text to speech (TTS)
 
@@ -40,6 +41,7 @@ This table lists the current known issues for the Text-to-Speech feature.
 | 2003   | TTS Avatar | Missing Blob file names | The 'outputs': 'result' URL of a batch avatar synthesis job doesn't include the blob file name. | Customers should use 'subtitleType = soft_embedded' as a temporary workaround. | June 9, 2025 |
 | 2004   | TTS Avatar | Batch synthesis unsupported for TTS | Batch synthesis for avatar doesn't support bring-your-own-storage (BYOS) and it requires the storage account to allow external traffic. | N/A | June 9, 2025 |
 | 2005   | Service | DNS cache refresh before end of July 2025  | Due to compliance reasons, the legacy Speech TTS clusters in Asia are removed on July 31, 2025. All traffic is migrated from the old IPs to the new ones.<br>Some customers are still accessing the old clusters even after DNS redirection has been completed. This indicates that some customers might have persistent local or secondary DNS caches. | To avoid service downtime, refresh the DNS cache before the end of July 2025. | July 24, 2025 |
+| 2006  | TTS | Word boundary duplication in output | Azure TTS sometimes returns duplicated word boundary entries in the synthesis output, particularly when using certain SSML configurations. This can lead to inaccurate timing data and misalignment in downstream applications. | Post-process the output to filter out duplicate word boundaries based on timestamp and word content. | August 8, 2025 |
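
The workaround for issue 2006 can be sketched as a small post-processing step. This is an illustrative example under stated assumptions, not the service's own fix: `WordBoundary` is a hypothetical record holding the audio offset and word text that an application might copy out of each word boundary event, and `dedupe_word_boundaries` is a hypothetical helper that keeps the first entry for each (offset, text) pair.

```python
from typing import Iterable, List, NamedTuple


class WordBoundary(NamedTuple):
    """Hypothetical record of one word boundary entry collected by the application."""
    audio_offset: int  # timestamp of the word, in whatever unit the application stores
    text: str          # the word content reported for that timestamp


def dedupe_word_boundaries(events: Iterable[WordBoundary]) -> List[WordBoundary]:
    """Drop entries whose (audio_offset, text) pair was already seen, keeping order."""
    seen = set()
    unique = []
    for event in events:
        key = (event.audio_offset, event.text)
        if key in seen:
            continue  # duplicate boundary: same timestamp and same word
        seen.add(key)
        unique.append(event)
    return unique
```

Keying on both timestamp and word content means a word that legitimately repeats later in the audio is kept, because its offset differs.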
 
 ## Active known issues speech SDK/Runtime