Update articles/ai-services/speech-service/faq-stt.yml

wangkenpu · PatrickFarley · web-flow · commit 97782184306b · 2025-10-22T12:00:40.000+08:00
Co-authored-by: Patrick Farley <pafarley@microsoft.com>
diff --git a/articles/ai-services/speech-service/faq-stt.yml b/articles/ai-services/speech-service/faq-stt.yml
@@ -195,7 +195,7 @@ sections:
         answer: |
           The recognized text is generated based on the audio input, the reference text, the `EnableMiscue` configuration and the assessment mode.
 
-          In **scripted assessment**, which offers two modes, single-shot and continuous, and the behavior differs slightly. In single-shot mode, if `EnableMiscue` is set to `false`, the system forces the recognized text to match the reference text. When `EnableMiscue` is `true`, only words present in the reference text are considered as recognized results from the audio input. Continuous mode does not support the `EnableMiscue` option and behaves similarly to single-shot mode with `EnableMiscue` set to `true`. Differences between recognized and reference text may occur due to factors such as pronunciation variations, background noise, or limitations in the speech recognition model.
+          In **scripted assessment**, there are two modes, single-shot and continuous, and the behavior differs slightly. In single-shot mode, if `EnableMiscue` is set to `false`, the system forces the recognized text to match the reference text. When `EnableMiscue` is `true`, only the words present in the reference text are considered as recognized results from the audio input. Continuous mode does not support the `EnableMiscue` option and behaves similarly to single-shot mode with `EnableMiscue` set to `true`. Differences between recognized and reference text might occur due to factors such as pronunciation variations, background noise, or limitations in the speech recognition model.
 
           In **unscripted assessment**, the recognized text is generated solely from the audio input without any reference text, which can lead to discrepancies between the recognized text and the intended content. In these cases, the recognized text reflects what the system interprets from the audio and may not always align with the expected message. If you notice significant differences, review the audio quality and speaker clarity, or consider using Azure Speech-to-Text to transcribe the audio first. You can then use that transcription as the reference text for a more accurate assessment.
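
For readers of this change, the scripted single-shot behavior described in the answer can be illustrated with a minimal sketch of the assessment configuration. It assumes the JSON form of the pronunciation assessment parameters (`ReferenceText`, `GradingSystem`, `Granularity`, `EnableMiscue`) passed as a base64-encoded value, as in the short-audio REST API; the helper names `build_assessment_config` and `decode_assessment_config` are hypothetical and not part of any SDK. Treat the field names as assumptions to check against the current service documentation.

```python
import base64
import json


def build_assessment_config(reference_text: str, enable_miscue: bool) -> str:
    """Return a base64-encoded JSON config for a scripted assessment.

    Hypothetical helper: the field names are assumed to match the JSON
    parameters accepted by the pronunciation assessment feature.
    """
    params = {
        "ReferenceText": reference_text,   # the script the speaker should read
        "GradingSystem": "HundredMark",    # assumed value: scores on a 0-100 scale
        "Granularity": "Word",             # assumed value: word-level results
        "EnableMiscue": enable_miscue,     # the option discussed in the FAQ answer
    }
    raw = json.dumps(params).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_assessment_config(encoded: str) -> dict:
    """Decode a config produced by build_assessment_config (for inspection)."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


# Single-shot, miscue disabled: per the FAQ answer, the recognized text is
# forced to match the reference text.
forced = build_assessment_config("Good morning.", enable_miscue=False)

# Single-shot, miscue enabled: per the FAQ answer, only words present in the
# reference text are reported as recognized.
miscue = build_assessment_config("Good morning.", enable_miscue=True)
```

The sketch only builds and inspects the configuration; it does not call the service. Continuous mode is not covered because, per the answer text, it does not support the `EnableMiscue` option.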