Commit 9c2b041

Merge pull request #227722 from sally-baolian/patch-100
Update pronunciation-assessment-tool.md
2 parents eb732bb + 0b6cdd6 commit 9c2b041

3 files changed

+72
-17
lines changed

articles/cognitive-services/Speech-Service/how-to-pronunciation-assessment.md

Lines changed: 47 additions & 1 deletion
@@ -678,7 +678,53 @@ Pronunciation assessment results for the spoken word "hello" are shown as a JSON
678678
}
679679
```
680680

681+
## Pronunciation assessment in streaming mode
682+
683+
Pronunciation assessment supports uninterrupted streaming mode. With the Speech SDK, the recording time can be unlimited. As long as you don't stop recording, the evaluation process doesn't finish, and you can pause and resume evaluation conveniently. In streaming mode, the `AccuracyScore`, `FluencyScore`, and `CompletenessScore` vary over time throughout the recording and evaluation process.
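
As a rough illustration of how scores can vary over time in streaming mode, the following Python sketch keeps running values as each recognized segment arrives. It is not taken from the Speech SDK or its samples: `SegmentResult`, `StreamingScoreTracker`, the duration weighting, and the word-count completeness estimate are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class SegmentResult:
    """Scores for one recognized segment (hypothetical container, not an SDK type)."""
    accuracy: float      # segment accuracy, 0-100
    fluency: float       # segment fluency, 0-100
    duration_s: float    # segment length in seconds
    words_matched: int   # reference words pronounced in this segment


class StreamingScoreTracker:
    """Keeps running scores while recording continues (illustrative only)."""

    def __init__(self, reference_word_count: int) -> None:
        if reference_word_count <= 0:
            raise ValueError("reference_word_count must be positive")
        self.reference_word_count = reference_word_count
        self._accuracy_weighted = 0.0
        self._fluency_weighted = 0.0
        self._duration = 0.0
        self._words_matched = 0

    def add(self, segment: SegmentResult) -> None:
        # Assumption: longer segments contribute proportionally more.
        self._accuracy_weighted += segment.accuracy * segment.duration_s
        self._fluency_weighted += segment.fluency * segment.duration_s
        self._duration += segment.duration_s
        self._words_matched += segment.words_matched

    def current(self) -> dict:
        """Return the scores as they stand at this point in the recording."""
        if self._duration == 0:
            return {"AccuracyScore": 0.0, "FluencyScore": 0.0, "CompletenessScore": 0.0}
        completeness = 100.0 * self._words_matched / self.reference_word_count
        return {
            "AccuracyScore": self._accuracy_weighted / self._duration,
            "FluencyScore": self._fluency_weighted / self._duration,
            "CompletenessScore": min(100.0, completeness),
        }


tracker = StreamingScoreTracker(reference_word_count=10)
tracker.add(SegmentResult(accuracy=80, fluency=90, duration_s=2.0, words_matched=4))
first = tracker.current()
tracker.add(SegmentResult(accuracy=100, fluency=60, duration_s=2.0, words_matched=4))
second = tracker.current()
print(first)
print(second)
```

In a real application, the per-segment values would come from the recognition results that the Speech SDK delivers during continuous recognition; the linked samples show the actual calls.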
684+
685+
::: zone pivot="programming-language-csharp"
686+
687+
To use pronunciation assessment in streaming mode in your own application, see the [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs#:~:text=PronunciationAssessmentWithStream).
688+
689+
::: zone-end
690+
691+
::: zone pivot="programming-language-cpp"
692+
693+
To use pronunciation assessment in streaming mode in your own application, see the [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/windows/console/samples/speech_recognition_samples.cpp#:~:text=PronunciationAssessmentWithStream).
694+
695+
::: zone-end
696+
697+
::: zone pivot="programming-language-java"
698+
699+
To use pronunciation assessment in streaming mode in your own application, see the [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/java/android/sdkdemo/app/src/main/java/com/microsoft/cognitiveservices/speech/samples/sdkdemo/MainActivity.java#L548).
700+
701+
::: zone-end
702+
703+
::: zone pivot="programming-language-python"
704+
705+
706+
::: zone-end
707+
708+
::: zone pivot="programming-language-javascript"
709+
710+
To use pronunciation assessment in streaming mode in your own application, see the [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/node/pronunciationAssessment.js).
711+
712+
::: zone-end
713+
714+
::: zone pivot="programming-language-objectivec"
715+
716+
::: zone-end
717+
718+
::: zone pivot="programming-language-swift"
719+
720+
::: zone-end
721+
722+
::: zone pivot="programming-language-go"
723+
724+
::: zone-end
725+
681726
## Next steps
682727

728+
- Learn about our quality [benchmark](https://techcommunity.microsoft.com/t5/ai-cognitive-services-blog/speech-service-update-hierarchical-transformer-for-pronunciation/ba-p/3740866)
683729
- Try out [pronunciation assessment in Speech Studio](pronunciation-assessment-tool.md)
684-
- Try out the [pronunciation assessment demo](https://github.com/Azure-Samples/Cognitive-Speech-TTS/tree/master/PronunciationAssessment/BrowserJS) and watch the [video tutorial](https://www.youtube.com/watch?v=zFlwm7N4Awc) of pronunciation assessment.
730+
- Check out the easy-to-deploy pronunciation assessment [demo](https://github.com/Azure-Samples/Cognitive-Speech-TTS/tree/master/PronunciationAssessment/BrowserJS) and watch the [video tutorial](https://www.youtube.com/watch?v=zFlwm7N4Awc) of pronunciation assessment.

articles/cognitive-services/Speech-Service/pronunciation-assessment-tool.md

Lines changed: 25 additions & 16 deletions
@@ -19,7 +19,7 @@ Pronunciation assessment uses the Speech-to-Text capability to provide subjectiv
1919
Pronunciation assessment provides various assessment results in different granularities, from individual phonemes to the entire text input.
2020
- At the full-text level, pronunciation assessment offers additional Fluency and Completeness scores: Fluency indicates how closely the speech matches a native speaker's use of silent breaks between words, and Completeness indicates the ratio of pronounced words to the words in the reference text input. An overall score aggregated from Accuracy, Fluency, and Completeness is then given to indicate the overall pronunciation quality of the given speech.
2121
- At the word level, pronunciation assessment can automatically detect miscues and provide an accuracy score at the same time, which gives more detailed information on omissions, repetitions, insertions, and mispronunciations in the given speech.
22-
- Syllable-level accuracy scores are currently only available via the [JSON file](?tabs=json#scores-within-words) or [Speech SDK](how-to-pronunciation-assessment.md).
22+
- Syllable-level accuracy scores are currently available via the [JSON file](?tabs=json#pronunciation-assessment-results) or [Speech SDK](how-to-pronunciation-assessment.md).
2323
- At the phoneme level, pronunciation assessment provides accuracy scores of each phoneme, helping learners to better understand the pronunciation details of their speech.
2424

2525
This article describes how to use the pronunciation assessment tool through the [Speech Studio](https://speech.microsoft.com). You can get immediate feedback on the accuracy and fluency of your speech without writing any code. For information about how to integrate pronunciation assessment in your speech applications, see [How to use pronunciation assessment](how-to-pronunciation-assessment.md).
@@ -56,27 +56,12 @@ Follow these steps to assess your pronunciation of the reference text:
5656

5757
:::image type="content" source="media/pronunciation-assessment/pa-upload.png" alt-text="Screenshot of uploading recorded audio to be assessed.":::
5858

59-
6059
## Pronunciation assessment results
6160

6261
Once you've recorded the reference text or uploaded the recorded audio, the **Assessment result** is displayed. The result includes your spoken audio and feedback on its accuracy and fluency, produced by comparing a machine-generated transcript of the input audio with the reference text. You can listen to your spoken audio, and download it if necessary.
6362

6463
You can also check the pronunciation assessment result in JSON. The word-level, syllable-level, and phoneme-level accuracy scores are included in the JSON file.
6564

66-
### Overall scores
67-
68-
Pronunciation Assessment evaluates three aspects of pronunciation: accuracy, fluency, and completeness. At the bottom of **Assessment result**, you can see **Pronunciation score**, **Accuracy score**, **Fluency score**, and **Completeness score**. The **Accuracy score** and the **Fluency score** will vary over time throughout the recording process. The **Completeness score** is only calculated at the end of the evaluation. The **Pronunciation score** is overall score indicating the pronunciation quality of the given speech. During recording, the **Pronunciation score** is aggregated from **Accuracy score** and **Fluency score** with weight. Once completing recording, this overall score is aggregated from **Accuracy score**, **Fluency score**, and **Completeness score** with weight.
69-
70-
**During recording**
71-
72-
:::image type="content" source="media/pronunciation-assessment/pa-recording-display-score.png" alt-text="Screenshot of overall assessment scores when recording." lightbox="media/pronunciation-assessment/pa-recording-display-score.png":::
73-
74-
**Completing recording**
75-
76-
:::image type="content" source="media/pronunciation-assessment/pa-after-recording-display-score.png" alt-text="Screenshot of overall assessment scores after recording." lightbox="media/pronunciation-assessment/pa-after-recording-display-score.png":::
77-
78-
### Scores within words
79-
8065
### [Display](#tab/display)
8166

8267
The complete transcription is shown in the **Display** window. If a word is omitted, inserted, or mispronounced compared to the reference text, the word will be highlighted according to the error type. While hovering over each word, you can see accuracy scores for the whole word or specific phonemes.
@@ -233,7 +218,31 @@ The complete transcription is shown in the `text` attribute. You can see accurac
233218

234219
---
235220

221+
### Assessment scores in streaming mode
222+
223+
Pronunciation Assessment supports uninterrupted streaming mode. The Speech Studio demo supports up to 60 minutes of recording in streaming mode for evaluation. As long as you don't press the stop recording button, the evaluation process doesn't finish, and you can pause and resume evaluation conveniently.
224+
225+
Pronunciation Assessment evaluates three aspects of pronunciation: accuracy, fluency, and completeness. At the bottom of **Assessment result**, you can see the **Pronunciation score**, an aggregated overall score that combines three sub-aspects: **Accuracy score**, **Fluency score**, and **Completeness score**. In streaming mode, these scores vary over time throughout the recording process, so Speech Studio demonstrates an approach that displays an approximate overall score incrementally before the end of the evaluation, weighted only from the **Accuracy score** and **Fluency score**. The **Completeness score** is calculated only at the end of the evaluation, after you press the stop button, so the final overall score is a weighted aggregate of the **Accuracy score**, **Fluency score**, and **Completeness score**.
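
The two-stage aggregation described above can be sketched in Python as follows. This is a minimal illustration, not the formula Speech Studio uses: the function name `overall_score` and the weights `(0.5, 0.25, 0.25)` are hypothetical, since the article doesn't state the actual weights.

```python
from typing import Optional, Tuple

# Hypothetical weights for (accuracy, fluency, completeness).
# The real weights used by Speech Studio are not given in the article.
ASSUMED_WEIGHTS: Tuple[float, float, float] = (0.5, 0.25, 0.25)


def overall_score(
    accuracy: float,
    fluency: float,
    completeness: Optional[float] = None,
    weights: Tuple[float, float, float] = ASSUMED_WEIGHTS,
) -> float:
    """Weighted average of the available sub-scores (each 0-100).

    While recording, completeness is not yet known (None), so only
    accuracy and fluency are combined. After recording stops, all
    three sub-scores are combined.
    """
    w_acc, w_flu, w_comp = weights
    if completeness is None:
        # Interim score: renormalize over the two available weights.
        return (w_acc * accuracy + w_flu * fluency) / (w_acc + w_flu)
    total = w_acc + w_flu + w_comp
    return (w_acc * accuracy + w_flu * fluency + w_comp * completeness) / total


# Approximate score shown during recording.
interim = overall_score(accuracy=90.0, fluency=60.0)
# Final score after the stop button is pressed.
final = overall_score(accuracy=90.0, fluency=60.0, completeness=100.0)
print(interim, final)
```

Renormalizing by the sum of the weights in use keeps the interim score on the same 0-100 scale as the final score, so the displayed value doesn't jump in scale when completeness becomes available.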
236226

227+
Refer to the demo examples below for the whole process of evaluating pronunciation in streaming mode.
228+
229+
**Start recording**
230+
231+
As you start recording, the scores at the bottom begin to change from 0.
232+
233+
:::image type="content" source="media/pronunciation-assessment/initial-recording.png" alt-text="Screenshot of overall assessment scores when starting to record." lightbox="media/pronunciation-assessment/initial-recording.png":::
234+
235+
**During recording**
236+
237+
While recording a long paragraph, you can pause recording at any time. You can continue to evaluate your recording as long as you don't press the stop button.
238+
239+
:::image type="content" source="media/pronunciation-assessment/pa-recording-display-score.png" alt-text="Screenshot of overall assessment scores when recording." lightbox="media/pronunciation-assessment/pa-recording-display-score.png":::
240+
241+
**Finish recording**
242+
243+
After you press the stop button, you can see **Pronunciation score**, **Accuracy score**, **Fluency score**, and **Completeness score** at the bottom.
244+
245+
:::image type="content" source="media/pronunciation-assessment/pa-after-recording-display-score.png" alt-text="Screenshot of overall assessment scores after recording." lightbox="media/pronunciation-assessment/pa-after-recording-display-score.png":::
237246

238247
## Next steps
239248
