articles/cognitive-services/Speech-Service/how-to-pronunciation-assessment.md
Lines changed: 47 additions & 1 deletion
@@ -678,7 +678,53 @@ Pronunciation assessment results for the spoken word "hello" are shown as a JSON
678
678
}
679
679
```
680
680
681
+
## Pronunciation assessment in streaming mode
682
+
683
+
Pronunciation assessment supports uninterrupted streaming mode. Through the Speech SDK, the recording time can be unlimited. As long as you don't stop recording, the evaluation process doesn't finish, and you can pause and resume evaluation conveniently. In streaming mode, the `AccuracyScore`, `FluencyScore`, and `CompletenessScore` vary over time throughout the recording and evaluation process.
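As a rough illustration of the streaming flow described above, the following Python sketch wires continuous recognition to pronunciation assessment and logs the scores reported for each recognized utterance. It's a minimal sketch, not the official sample: `ScoreLog` and `assess_continuously` are hypothetical names, the key, region, file path, and reference text are placeholders that you supply, and the code assumes the `azure-cognitiveservices-speech` package is installed.

```python
import time


class ScoreLog:
    """Collects the scores reported for each recognized utterance."""

    def __init__(self):
        self.history = []

    def record(self, result):
        # `result` only needs these three attributes, so a stand-in object works too.
        self.history.append(
            (result.accuracy_score, result.fluency_score, result.completeness_score)
        )


def assess_continuously(key, region, wav_path, reference_text):
    """Run assessment until the audio ends; all arguments are placeholders (assumptions)."""
    # Imported lazily so ScoreLog can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    pa_config = speechsdk.PronunciationAssessmentConfig(
        reference_text=reference_text,
        grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
        granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
        enable_miscue=True,
    )
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
    pa_config.apply_to(recognizer)

    log = ScoreLog()
    done = False

    def on_recognized(evt):
        # Each recognized utterance carries its own assessment scores.
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            log.record(speechsdk.PronunciationAssessmentResult(evt.result))

    def on_stop(evt):
        nonlocal done
        done = True

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_stop)
    recognizer.canceled.connect(on_stop)

    # Evaluation keeps running until the session stops or is canceled.
    recognizer.start_continuous_recognition()
    while not done:
        time.sleep(0.5)
    recognizer.stop_continuous_recognition()
    return log
```

Because the scores arrive per utterance, `log.history` grows while recording continues, which is one way to observe how the values change over time.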
684
+
685
+
::: zone pivot="programming-language-csharp"
686
+
687
+
To learn how to use pronunciation assessment in streaming mode in your own application, see the [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs#:~:text=PronunciationAssessmentWithStream).
688
+
689
+
::: zone-end
690
+
691
+
::: zone pivot="programming-language-cpp"
692
+
693
+
To learn how to use pronunciation assessment in streaming mode in your own application, see the [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/windows/console/samples/speech_recognition_samples.cpp#:~:text=PronunciationAssessmentWithStream).
694
+
695
+
::: zone-end
696
+
697
+
::: zone pivot="programming-language-java"
698
+
699
+
To learn how to use pronunciation assessment in streaming mode in your own application, see the [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/java/android/sdkdemo/app/src/main/java/com/microsoft/cognitiveservices/speech/samples/sdkdemo/MainActivity.java#L548).
700
+
701
+
::: zone-end
702
+
703
+
::: zone pivot="programming-language-python"
704
+
705
+
706
+
::: zone-end
707
+
708
+
::: zone pivot="programming-language-javascript"
709
+
710
+
To learn how to use pronunciation assessment in streaming mode in your own application, see the [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/node/pronunciationAssessment.js).
- Try out [pronunciation assessment in Speech Studio](pronunciation-assessment-tool.md)
684
-
-Try out the [pronunciation assessment demo](https://github.com/Azure-Samples/Cognitive-Speech-TTS/tree/master/PronunciationAssessment/BrowserJS) and watch the [video tutorial](https://www.youtube.com/watch?v=zFlwm7N4Awc) of pronunciation assessment.
730
+
-Check out the easy-to-deploy Pronunciation Assessment [demo](https://github.com/Azure-Samples/Cognitive-Speech-TTS/tree/master/PronunciationAssessment/BrowserJS) and watch the [video tutorial](https://www.youtube.com/watch?v=zFlwm7N4Awc) of pronunciation assessment.
articles/cognitive-services/Speech-Service/pronunciation-assessment-tool.md
Lines changed: 25 additions & 16 deletions
@@ -19,7 +19,7 @@ Pronunciation assessment uses the Speech-to-Text capability to provide subjectiv
19
19
Pronunciation assessment provides various assessment results in different granularities, from individual phonemes to the entire text input.
20
20
- At the full-text level, pronunciation assessment offers additional Fluency and Completeness scores: Fluency indicates how closely the speech matches a native speaker's use of silent breaks between words, and Completeness indicates how many words are pronounced in the speech to the reference text input. An overall score aggregated from Accuracy, Fluency and Completeness is then given to indicate the overall pronunciation quality of the given speech.
21
21
- At the word-level, pronunciation assessment can automatically detect miscues and provide accuracy score simultaneously, which provides more detailed information on omission, repetition, insertions, and mispronunciation in the given speech.
22
-
- Syllable-level accuracy scores are currently only available via the [JSON file](?tabs=json#scores-within-words) or [Speech SDK](how-to-pronunciation-assessment.md).
22
+
- Syllable-level accuracy scores are currently available via the [JSON file](?tabs=json#pronunciation-assessment-results) or [Speech SDK](how-to-pronunciation-assessment.md).
23
23
- At the phoneme level, pronunciation assessment provides accuracy scores of each phoneme, helping learners to better understand the pronunciation details of their speech.
24
24
25
25
This article describes how to use the pronunciation assessment tool through the [Speech Studio](https://speech.microsoft.com). You can get immediate feedback on the accuracy and fluency of your speech without writing any code. For information about how to integrate pronunciation assessment in your speech applications, see [How to use pronunciation assessment](how-to-pronunciation-assessment.md).
@@ -56,27 +56,12 @@ Follow these steps to assess your pronunciation of the reference text:
56
56
57
57
:::image type="content" source="media/pronunciation-assessment/pa-upload.png" alt-text="Screenshot of uploading recorded audio to be assessed.":::
58
58
59
-
60
59
## Pronunciation assessment results
61
60
62
61
Once you've recorded the reference text or uploaded the recorded audio, the **Assessment result** will be output. The result includes your spoken audio and the feedback on the accuracy and fluency of spoken audio, by comparing a machine generated transcript of the input audio with the reference text. You can listen to your spoken audio, and download it if necessary.
63
62
64
63
You can also check the pronunciation assessment result in JSON. The word-level, syllable-level, and phoneme-level accuracy scores are included in the JSON file.
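To show how the scores at each level might be read from the JSON result, here's a minimal Python sketch. The payload is illustrative only: the field layout (`NBest`, `Words`, `Syllables`, `Phonemes`, and `PronunciationAssessment.AccuracyScore`) is assumed to match the result file, all values are made up, and `flatten_scores` is a hypothetical helper.

```python
import json

# Illustrative payload: the layout is an assumption and the numbers are invented.
SAMPLE = """
{
  "NBest": [{
    "Words": [{
      "Word": "hello",
      "PronunciationAssessment": {"AccuracyScore": 95.0, "ErrorType": "None"},
      "Syllables": [
        {"Syllable": "heh", "PronunciationAssessment": {"AccuracyScore": 93.0}},
        {"Syllable": "low", "PronunciationAssessment": {"AccuracyScore": 97.0}}
      ],
      "Phonemes": [
        {"Phoneme": "h", "PronunciationAssessment": {"AccuracyScore": 92.0}},
        {"Phoneme": "eh", "PronunciationAssessment": {"AccuracyScore": 94.0}},
        {"Phoneme": "l", "PronunciationAssessment": {"AccuracyScore": 96.0}},
        {"Phoneme": "ow", "PronunciationAssessment": {"AccuracyScore": 98.0}}
      ]
    }]
  }]
}
"""


def flatten_scores(payload):
    """Return (level, unit, accuracy) rows for every word, syllable, and phoneme."""
    rows = []
    for word in payload["NBest"][0]["Words"]:
        rows.append(("word", word["Word"], word["PronunciationAssessment"]["AccuracyScore"]))
        # Syllables and phonemes may be absent at coarser granularities.
        for syl in word.get("Syllables", []):
            rows.append(("syllable", syl["Syllable"], syl["PronunciationAssessment"]["AccuracyScore"]))
        for ph in word.get("Phonemes", []):
            rows.append(("phoneme", ph["Phoneme"], ph["PronunciationAssessment"]["AccuracyScore"]))
    return rows


for level, unit, score in flatten_scores(json.loads(SAMPLE)):
    print(f"{level:<9}{unit:<6}{score}")
```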
65
64
66
-
### Overall scores
67
-
68
-
Pronunciation Assessment evaluates three aspects of pronunciation: accuracy, fluency, and completeness. At the bottom of **Assessment result**, you can see **Pronunciation score**, **Accuracy score**, **Fluency score**, and **Completeness score**. The **Accuracy score** and the **Fluency score** will vary over time throughout the recording process. The **Completeness score** is only calculated at the end of the evaluation. The **Pronunciation score** is overall score indicating the pronunciation quality of the given speech. During recording, the **Pronunciation score** is aggregated from **Accuracy score** and **Fluency score** with weight. Once completing recording, this overall score is aggregated from **Accuracy score**, **Fluency score**, and **Completeness score** with weight.
69
-
70
-
**During recording**
71
-
72
-
:::image type="content" source="media/pronunciation-assessment/pa-recording-display-score.png" alt-text="Screenshot of overall assessment scores when recording." lightbox="media/pronunciation-assessment/pa-recording-display-score.png":::
73
-
74
-
**Completing recording**
75
-
76
-
:::image type="content" source="media/pronunciation-assessment/pa-after-recording-display-score.png" alt-text="Screenshot of overall assessment scores after recording." lightbox="media/pronunciation-assessment/pa-after-recording-display-score.png":::
77
-
78
-
### Scores within words
79
-
80
65
### [Display](#tab/display)
81
66
82
67
The complete transcription is shown in the **Display** window. If a word is omitted, inserted, or mispronounced compared to the reference text, the word will be highlighted according to the error type. While hovering over each word, you can see accuracy scores for the whole word or specific phonemes.
@@ -233,7 +218,31 @@ The complete transcription is shown in the `text` attribute. You can see accurac
233
218
234
219
---
235
220
221
+
### Assessment scores in streaming mode
222
+
223
+
Pronunciation Assessment supports uninterrupted streaming mode. The Speech Studio demo allows for up to 60 minutes of recording in streaming mode for evaluation. As long as you don't press the stop recording button, the evaluation process doesn't finish, and you can pause and resume evaluation conveniently.
224
+
225
+
Pronunciation Assessment evaluates three aspects of pronunciation: accuracy, fluency, and completeness. At the bottom of **Assessment result**, you can see the **Pronunciation score**, an aggregated overall score that combines three sub-aspects: **Accuracy score**, **Fluency score**, and **Completeness score**. In streaming mode, the scores vary over time throughout the recording process, so Speech Studio demonstrates an approach that displays an approximate overall score incrementally before the end of the evaluation. This approximate score is weighted only from the **Accuracy score** and the **Fluency score**. The **Completeness score** is calculated only at the end of the evaluation, after you press the stop button, so the final overall score is a weighted aggregate of the **Accuracy score**, **Fluency score**, and **Completeness score**.
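The two-stage aggregation described above can be sketched as follows. The weights are hypothetical, because the actual weights that Speech Studio uses aren't stated here, and `overall_score` is an illustrative helper rather than part of any SDK.

```python
def overall_score(accuracy, fluency, completeness=None, weights=(3, 1, 1)):
    """Weighted average of the sub-scores.

    `weights` is (accuracy, fluency, completeness) and is a made-up default.
    While recording, completeness isn't known yet, so pass None and the
    average is taken over accuracy and fluency only.
    """
    w_acc, w_flu, w_comp = weights
    if completeness is None:
        # Approximate score shown before the evaluation ends.
        return (w_acc * accuracy + w_flu * fluency) / (w_acc + w_flu)
    # Final score once the stop button is pressed.
    return (w_acc * accuracy + w_flu * fluency + w_comp * completeness) / (
        w_acc + w_flu + w_comp
    )


# During recording: only accuracy and fluency contribute.
interim = overall_score(accuracy=80, fluency=60)
# After recording: completeness joins the aggregate.
final = overall_score(accuracy=80, fluency=60, completeness=100)
print(interim, final)
```

Renormalizing by the sum of the weights in use keeps the interim and final values on the same 0 to 100 scale, which is one reasonable design choice and not necessarily the one Speech Studio makes.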
236
226
227
+
Refer to the demo examples below for the whole process of evaluating pronunciation in streaming mode.
228
+
229
+
**Start recording**
230
+
231
+
As you start recording, the scores at the bottom begin to change from 0.
232
+
233
+
:::image type="content" source="media/pronunciation-assessment/initial-recording.png" alt-text="Screenshot of overall assessment scores when starting to record." lightbox="media/pronunciation-assessment/initial-recording.png":::
234
+
235
+
**During recording**
236
+
237
+
While recording a long paragraph, you can pause recording at any time. You can continue to evaluate your recording as long as you don't press the stop button.
238
+
239
+
:::image type="content" source="media/pronunciation-assessment/pa-recording-display-score.png" alt-text="Screenshot of overall assessment scores when recording." lightbox="media/pronunciation-assessment/pa-recording-display-score.png":::
240
+
241
+
**Finish recording**
242
+
243
+
After you press the stop button, you can see **Pronunciation score**, **Accuracy score**, **Fluency score**, and **Completeness score** at the bottom.
244
+
245
+
:::image type="content" source="media/pronunciation-assessment/pa-after-recording-display-score.png" alt-text="Screenshot of overall assessment scores after recording." lightbox="media/pronunciation-assessment/pa-after-recording-display-score.png":::