Commit cd157b7

Merge pull request #5452 from MicrosoftDocs/main
6/10/2025 11:00 AM IST Publish
2 parents 4e306c3 + 2d456de commit cd157b7

8 files changed

+249
-186
lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,8 @@ _repo.*/
1111

1212
.openpublishing.buildcore.ps1
1313

14+
.vscode/
15+
1416
*sec.endpointdlp
1517

1618
# CoPilot instructions and prompts

.vscode/settings.json

Lines changed: 0 additions & 5 deletions
This file was deleted.

articles/ai-services/content-understanding/toc.yml

Lines changed: 53 additions & 55 deletions
Original file line numberDiff line numberDiff line change
@@ -26,65 +26,63 @@ items:
2626
- name: Glossary
2727
displayName: glossary, definition, updates, previews
2828
href: glossary.md
29-
- name: Azure AI Foundry portal
29+
- name: Quickstarts
3030
items:
31-
- name: Analyzer templates
31+
- name: Try Azure AI Foundry portal
32+
items:
33+
- name: Use Content Understanding with a single file
34+
displayName: quickstart, extract, text, images, OCR, optical character recognition, foundry, standard, mode
35+
href: quickstart/use-ai-foundry.md
36+
- name: Use Content Understanding with multiple files
37+
displayName: quickstart, extract, text, images, OCR, optical character recognition, foundry, pro, mode
38+
href: quickstart/use-ai-foundry-pro-mode.md
39+
- name: Try Content Understanding REST API
40+
displayName: quickstart, extract, text, images, OCR, optical character recognition, rest, standard, mode
41+
href: quickstart/use-rest-api.md
42+
- name: Modalities
43+
items:
44+
- name: Document
45+
items:
46+
- name: Overview
47+
displayName: document, text, images, video, audio, visual, structured, content, field, extraction
48+
href: document/overview.md
49+
- name: Elements 🆕
50+
displayName: document, text, images, video, audio, visual, structured, content, field, extraction
51+
href: document/elements.md
52+
- name: Markdown 🆕
53+
displayName: document, text, images, video, audio, visual, structured, content, field, extraction
54+
href: document/markdown.md
55+
- name: Image
56+
displayName: image, OCR, optical character recognition, text, extraction, analysis, detection, recognition, model
57+
href: image/overview.md
58+
- name: Audio
59+
displayName: speech, audio, voice, recognition, synthesis, speaker, identification, verification, diarization, transcription, translation, language, understanding, sentiment, analysis, emotion, detection, pronunciation, model
60+
href: audio/overview.md
61+
- name: Video
62+
href: video/overview.md
63+
displayName: video, audio, voice, recognition, synthesis, speaker, identification, verification, diarization, transcription, translation, language, understanding, sentiment, analysis, emotion, detection, pronunciation, model
64+
- name: Concepts
65+
items:
66+
- name: Analyzer templates in Azure AI Foundry
3267
displayName: analyzer, templates, document, text, images, video, audio, multimodal, visual, structured, content, field, extraction
3368
href: concepts/analyzer-templates.md
34-
- name: Quickstarts
35-
items:
36-
- name: Try Content Understanding with a single file"
37-
displayName: quickstart, extract, text, images, OCR, optical character recognition, foundry, standard, mode
38-
href: quickstart/use-ai-foundry.md
39-
- name: Try Content Understanding with multiple files"
40-
displayName: quickstart, extract, text, images, OCR, optical character recognition, foundry, pro, mode
41-
href: quickstart/use-ai-foundry-pro-mode.md
42-
- name: Analyzers
69+
- name: Prebuilt analyzers 🆕
70+
displayName: analyzer, templates, document, text, images, video, audio, multimodal, visual, structured, content, field, extraction
71+
href: concepts/prebuilt-analyzers.md
72+
- name: "Modes: standard and pro 🆕"
73+
displayName: standard, pro, modes, analyzers, optimization, fields
74+
href: concepts/standard-pro-modes.md
75+
- name: Best practices
76+
displayName: best practices, analyzers, optimization, fields
77+
href: concepts/best-practices.md
78+
- name: Tutorials
4379
items:
44-
- name: Quickstart
45-
displayName: quickstart, extract, text, images, OCR, optical character recognition, content filtering, filter
46-
href: quickstart/use-rest-api.md
47-
- name: Modalities
48-
items:
49-
- name: Document
50-
items:
51-
- name: Overview
52-
displayName: document, text, images, video, audio, visual, structured, content, field, extraction
53-
href: document/overview.md
54-
- name: Elements 🆕
55-
displayName: document, text, images, video, audio, visual, structured, content, field, extraction
56-
href: document/elements.md
57-
- name: xMarkdown 🆕
58-
displayName: document, text, images, video, audio, visual, structured, content, field, extraction
59-
href: document/markdown.md
60-
- name: Image
61-
displayName: image, OCR, optical character recognition, text, extraction, analysis, detection, recognition, model
62-
href: image/overview.md
63-
- name: Audio
64-
displayName: speech, audio, voice, recognition, synthesis, speaker, identification, verification, diarization, transcription, translation, language, understanding, sentiment, analysis, emotion, detection, pronunciation, model
65-
href: audio/overview.md
66-
- name: Video
67-
href: video/overview.md
68-
displayName: video, audio, voice, recognition, synthesis, speaker, identification, verification, diarization, transcription, translation, language, understanding, sentiment, analysis, emotion, detection, pronunciation, model
69-
- name: Concepts
70-
items:
71-
- name: Prebuilt analyzers 🆕
72-
displayName: analyzer, templates, document, text, images, video, audio, multimodal, visual, structured, content, field, extraction
73-
href: concepts/prebuilt-analyzers.md
74-
- name: "Modes: standard and pro 🆕"
75-
displayName: standard, pro, modes, analyzers, optimization, fields
76-
href: concepts/standard-pro-modes.md
77-
- name: Best practices
78-
displayName: best practices, analyzers, optimization, fields
79-
href: concepts/best-practices.md
80-
- name: Tutorials
81-
items:
82-
- name: Create a custom analyzer 🆕
83-
displayName: custom, analyzer, document, text, images, video, audio, multimodal, visual, structured, content, field, extraction
84-
href: tutorial/create-custom-analyzer.md
85-
- name: Build a retrieval-augmented solution
86-
displayName: RAG, retrieval, augmented, generation, knowledge, base, search, index, vector
87-
href: tutorial/build-rag-solution.md
80+
- name: Create a custom analyzer 🆕
81+
displayName: custom, analyzer, document, text, images, video, audio, multimodal, visual, structured, content, field, extraction
82+
href: tutorial/create-custom-analyzer.md
83+
- name: Build a retrieval-augmented solution
84+
displayName: RAG, retrieval, augmented, generation, knowledge, base, search, index, vector
85+
href: tutorial/build-rag-solution.md
8886
- name: Classifiers 🆕
8987
items:
9088
- name: Overview

articles/ai-services/openai/faq.yml

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -109,9 +109,16 @@ sections:
109109
How do I fix InternalServerError - 500 - Failed to create completion as the model generated invalid Unicode output?
110110
answer:
111111
You can minimize the occurrence of these errors by reducing the temperature of your prompts to less than 1 and ensuring you're using a client with retry logic. Reattempting the request often results in a successful response.
112+
- question: |
113+
How do I fix Server error (500): Unexpected special token?
114+
answer: |
115+
This is a known issue. You can minimize the occurrence of these errors by reducing the temperature of your prompts to less than 1 and ensuring you're using a client with retry logic. Reattempting the request often results in a successful response.
116+
117+
If reducing the temperature to less than 1 doesn't reduce the frequency of this error, an alternative workaround is to set the presence and frequency penalties and logit biases to their default values. In some cases, it might help to set `top_p` to a non-default, lower value to encourage the model to avoid sampling lower-probability tokens.
118+
112119
- question: |
113120
We noticed charges associated with API calls that failed to complete with status code 400. Why are failed API calls generating a charge?
114-
answer:
121+
answer:
115122
If the service performs processing, you will be charged even if the status code is not successful (not 200).
116123
Common examples of this are, a 400 error due to a content filter or input limit, or a 408 error due to a time-out. Charges will also occur when a `status 200` is received with a `finish_reason` of `content_filter`.
117124
In this case the prompt did not have any issues, but the completion generated by the model was detected to violate the content filtering rules, which result in the completion being filtered.
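
The retry guidance in the two 500-error answers above can be sketched roughly as follows. This is a minimal illustration, not the service's client: `TransientServerError`, `call_with_retry`, and `send_request` are hypothetical names, and the sampling values are assumptions chosen only to satisfy "temperature below 1" and "a lower, non-default `top_p`".

```python
import random
import time


class TransientServerError(Exception):
    """Hypothetical stand-in for an HTTP 500 response from the service."""


def call_with_retry(send_request, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send_request(), retrying on TransientServerError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request()
        except TransientServerError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff plus jitter so retries don't arrive in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


# Assumed sampling settings in the spirit of the FAQ: temperature below 1,
# penalties left at their defaults, and a slightly lowered top_p.
SAMPLING_PARAMS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
}
```

In practice, `send_request` would wrap whatever client call you already make, passing `SAMPLING_PARAMS` along with the prompt. Many HTTP and SDK clients ship their own retry options, which may be preferable to a hand-rolled loop.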

articles/ai-services/speech-service/includes/language-support/stt.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ ms.author: eur
2424
| `ar-OM` | Arabic (Oman) | No | Audio + human-labeled transcript<br/><br/>Plain text |
2525
| `ar-PS` | Arabic (Palestinian Authority) | No | Audio + human-labeled transcript<br/><br/>Plain text |
2626
| `ar-QA` | Arabic (Qatar) | No | Audio + human-labeled transcript<br/><br/>Plain text |
27-
| `ar-SA` | Arabic (Saudi Arabia) | No | Audio + human-labeled transcript<br/><br/>Plain text<br/><br/>Structured text<br/><br/>Phrase list |
27+
| `ar-SA` | Arabic (Saudi Arabia) | Yes | Audio + human-labeled transcript<br/><br/>Plain text<br/><br/>Structured text<br/><br/>Phrase list |
2828
| `ar-SY` | Arabic (Syria) | No | Audio + human-labeled transcript<br/><br/>Plain text |
2929
| `ar-TN` | Arabic (Tunisia) | No | Audio + human-labeled transcript<br/><br/>Plain text |
3030
| `ar-YE` | Arabic (Yemen) | No | Audio + human-labeled transcript<br/><br/>Plain text |
Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,62 @@
1+
---
2+
title: Azure AI Speech known issues
3+
titlesuffix: Azure AI services
4+
description: Known and common issues with Azure AI Speech.
5+
manager: heikora
6+
ms.service: azure-ai-speech
7+
ms.topic: reference
8+
ms.date: 06/09/2025
9+
author: goergenj
10+
ms.author: jagoerge
11+
---
12+
13+
# Azure AI Speech known issues
14+
15+
Azure AI Speech is updated regularly and we're continually improving and enhancing its features and capabilities. This page details known issues related to Azure AI Speech and provides steps to resolve them. Before submitting a support request, review the following list to see if your problem is already being addressed and to find a possible solution.
16+
17+
* For more information regarding service-level outages, *see* the [Azure status page](https://azure.status.microsoft/en-us/status).
18+
* To set up outage notifications and alerts, *see* the [Azure Service Health Portal](/azure/service-health/service-health-portal-update).
19+
20+
## Active known issues speech to text (STT)
21+
22+
This table lists the current known issues for the Speech to text feature:
23+
24+
|Issue ID|Category|Title|Description|Workaround|Issues publish date|
25+
|--------|--------|----|-----------|----------|-------------------|
26+
| 1001 | Content | STT transcriptions with pound units | In certain instances, the use of pound units can pose difficulties for transcription. When phrases are spoken in a UK dialect, they're often inaccurately converted during real-time transcription, leading to the term 'pounds' being automatically translated to 'lbs' irrespective of the language setting. | Users can use Custom Display Post Processing (DPP) to train a custom speech model to correct default DPP results (for example, Pounds {tab} Pounds). Refer to [Custom Rewrite Rules](/azure/ai-services/speech-service/how-to-custom-speech-display-text-format#custom-rewrite). | June 9, 2025 |
27+
| 1002 | Content | STT transcriptions with cardinal directions | The speech recognition model 20241218 might inaccurately interpret audio inputs that include cardinal directions, resulting in unexpected transcription outcomes. For instance, an audio file containing "SW123456" might be transcribed as "Southwest 123456," and similar errors can occur with other cardinal directions. | Potential workaround is to use Custom Display formatting where "Southwest" is mapped to "SW" in a rewrite rule: [Custom Rewrite Rules](/azure/ai-services/speech-service/how-to-custom-speech-display-text-format#custom-rewrite). | June 9, 2025 |
28+
| 1003 | Model | STT transcriptions might include unexpected internal system tags. | Unexpected tags like 'nsnoise' have been appearing in transcription results. Customers initially reported this issue for the Arabic model (ar-SA), and it was also observed in English models (en-US and en-GB). These tags are causing intermittent problems in the transcription outputs. To address this issue, a filter will be added to remove 'nsnoise' from the training data in future model updates. | N/A | June 9, 2025 |
29+
| 1004 | Model | STT transcriptions with inaccurate spellings of language-specific names and words | Inaccurate transcription of language-specific names due to a lack of entity coverage in the base model for tier 2 locales (specifically, when the base model hasn't seen a word before). | Customers can train [Custom Speech](/azure/ai-services/speech-service/custom-speech-overview) models to include unknown names and words as training data. As a second step, unknown words can be added as a [Phrase List](/azure/ai-services/speech-service/improve-accuracy-phrase-list?tabs=terminal&pivots=programming-language-csharp) at runtime. Biasing the phrase list toward a word known in the training corpus can greatly improve recognition accuracy. | June 9, 2025 |
30+
| 1005 | File types | Words out of context added in STT real time output occasionally | Audio files that consist solely of background noise can result in inaccurate transcriptions. Ideally, only spoken sentences should be transcribed, but this isn't occurring with the nl-NL model. | Audio files that consist of background noise, captured echo reflections from surfaces in an environment or audio playback from a device while device microphone is active can result in inaccurate transcriptions. Customers can use the Microsoft Audio Stack built into the Speech SDK for noise suppression of observed background noise and echo cancellation. This should help optimize the audio being fed to the STT service: [Use the Microsoft Audio Stack (MAS)](/azure/ai-services/speech-service/audio-processing-speech-sdk?tabs=java). | June 9, 2025 |
31+
32+
## Active known issues text to speech (TTS)
33+
34+
This table lists the current known issues for the Text-to-Speech feature.
35+
36+
|Issue ID|Category|Title|Description|Workaround|Issues publish date|
37+
|--------|--------|----|-----------|----------|-------------------|
38+
| 2001 | Service | Model copying via REST API | The TTS service doesn't allow model copying via the REST API for disaster recovery purposes. | N/A | June 9, 2025 |
39+
| 2002 | TTS Avatar | Missing parameters | TTS Avatar parameters "avatarPosition" and "avatarSize" not supported in Batch synthesis. | N/A | June 9, 2025 |
40+
| 2003 | TTS Avatar | Missing Blob file names | The 'outputs': 'result' url of Batch avatar synthesis job doesn't have the blob file name. | Customers should use 'subtitleType = soft_embedded' as a temporary workaround. | June 9, 2025 |
41+
| 2004 | TTS Avatar | Batch synthesis unsupported for TTS | Batch synthesis for avatar doesn't support bring-your-own-storage (BYOS) and it requires the storage account to allow external traffic. | N/A | June 9, 2025 |
42+
43+
## Active known issues speech SDK/Runtime
44+
45+
This table lists the current known issues for the Speech SDK/Runtime feature.
46+
47+
|Issue ID|Category|Title|Description|Workaround|Issues publish date|
48+
|--------|--------|----|-----------|----------|-------------------|
49+
| 3001 | SDK/SR Runtime | Handling of the InitialSilenceTimeout parameter | The issue is related to the handling of the InitialSilenceTimeout parameter. When set to 0, it unexpectedly caused customers to encounter 400 errors. Additionally, the endSilenceTimeout parameter might lead to incorrect transcriptions. When the endSilenceTimeout is set to a value other than "0", the system disregards user input after the specified duration, even if the user continues speaking. Customers want all parts of the conversation to be transcribed, including segments after pauses, to ensure no user input is lost. | The 400 error is due to "InitialSilenceTimeout" parameter not being currently exposed directly in Real-time Speech Recognition endpoint resulting in a failed URL consistency check. To bypass this error, customers can perform the following steps: <br> Adjust their production code to use Region/Key instantiation of SpeechConfig object. <ul> <li>SpeechConfig = fromSubscription (String subscriptionKey, String region); where region is the Azure Region where the Speech resource is located. </li> <li>Set the parameter "InitialSilenceTimeoutMs" to 0, which in effect disables timeout due to initial silence in the recognition audio stream. </li> </ul> Note: For single shot recognition, the session will be terminated after 30 seconds of initial silence. For continuous recognition, the service will report empty phrase after 30 seconds and continue the recognition process. This issue is due to a second parameter "Speech_SegmentationMaximumTimeMs" which determines the maximum length of a phrase and has default value of 30,000 ms. | June 9, 2025 |
50+
| 3002 | SDK/SR Runtime | Handling of SegmentationSilenceTimeout parameter | Customers experience random words being generated as part of speech recognition results (hallucinations) when the SegmentationSilenceTimeout parameter is set to more than 1,000 ms. | Customers should maintain the default "SegmentationSilenceTimeout" value of 650 ms. | June 9, 2025 |
51+
| 3003 | SDK/SR Runtime | Handling of speaker duration during Real-time diarization in STT | Python SDK not showing duration of speakers when using Real-time diarization with STT. | Check offset and duration on the result following steps on the following Documentation: [Conversation Transcription Result Class](/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.transcription.conversationtranscriptionresult). | June 9, 2025 |
52+
| 3004 | SDK/TTS Avatar | Frequent disconnections with JavaScript SDK | TTS Avatar isn't loading/Frequent disconnections and reconnections of a custom avatar using the JavaScript SDK. | Customers should open the UDP 3478 port. | June 9, 2025 |
53+
54+
## Recently closed known issues
55+
56+
Fixed known issues are organized in this section in descending order by fixed date. Fixed issues are retained for at least 60 days.
57+
58+
## Related content
59+
60+
* [Azure Service Health Portal](/azure/service-health/service-health-portal-update)
61+
* [Azure Status overview](/azure/service-health/azure-status-overview)
62+
* [What's new in Azure AI Translator?](./releasenotes.md)
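
The rewrite-rule workarounds for issues 1001 and 1002 in the new known-issues page describe tab-separated pairs (for example, `Pounds {tab} Pounds`). The following is a minimal sketch of generating such a rule list, assuming one tab-separated pair per line. The helper name `build_rewrite_rules` is hypothetical, and the exact file format should be confirmed against the linked Custom Rewrite Rules article.

```python
def build_rewrite_rules(pairs):
    """Return rewrite rules as text, one 'original<TAB>replacement' pair per line."""
    lines = []
    for original, replacement in pairs:
        # A tab inside either side would make the pair ambiguous, so reject it.
        if "\t" in original or "\t" in replacement:
            raise ValueError("rule text must not contain a tab character")
        lines.append(f"{original}\t{replacement}")
    return "\n".join(lines) + "\n"


# Illustrative pairs taken from the issue table; adjust them to your own data.
rules = build_rewrite_rules([
    ("Pounds", "Pounds"),
    ("Southwest", "SW"),
])
print(rules, end="")
```

The resulting text could be saved to a file and supplied as display-format training data for a custom speech model. That upload step is outside the scope of this sketch.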

articles/ai-services/speech-service/toc.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -573,6 +573,8 @@ items:
573573
href: /legal/cognitive-services/speech-service/speaker-recognition/data-privacy-speaker-recognition?context=/azure/ai-services/speech-service/context/context
574574
- name: Resources
575575
items:
576+
- name: Known issues
577+
href: known-issues.md
576578
- name: Support and help options
577579
href: ../cognitive-services-support-options.md?context=/azure/ai-services/speech-service/context/context
578580
- name: Pricing calculator
