Commit f39758a

Update how-to-pronunciation-assessment.md
1 parent 861a865 commit f39758a

1 file changed

+64
-60
lines changed

articles/ai-services/speech-service/how-to-pronunciation-assessment.md

Lines changed: 64 additions & 60 deletions
Original file line number | Diff line number | Diff line change
@@ -23,6 +23,9 @@ zone_pivot_groups: programming-languages-ai-services
2323

2424
In this article, you learn how to evaluate pronunciation with speech to text through the Speech SDK. Pronunciation assessment evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio.
2525

26+
> [!NOTE]
27+
> Pronunciation assessment uses a specific version of the speech-to-text model, different from the standard speech to text model, to ensure consistent and accurate pronunciation assessment.
28+
2629
## Use pronunciation assessment in streaming mode
2730

2831
Pronunciation assessment supports uninterrupted streaming mode. The recording time can be unlimited through the Speech SDK. As long as you don't stop recording, the evaluation process doesn't finish and you can pause and resume evaluation conveniently.
@@ -77,6 +80,55 @@ For how to use Pronunciation Assessment in streaming mode in your own applicatio
7780

7881
::: zone-end
7982

83+
### Continuous recognition
84+
85+
::: zone pivot="programming-language-csharp"
86+
87+
If your audio file exceeds 30 seconds, use continuous mode for processing. The sample code for continuous mode can be found on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs) under the function `PronunciationAssessmentContinuousWithFile`.
88+
89+
::: zone-end
90+
91+
::: zone pivot="programming-language-cpp"
92+
93+
If your audio file exceeds 30 seconds, use continuous mode for processing.
94+
95+
::: zone-end
96+
97+
::: zone pivot="programming-language-java"
98+
99+
If your audio file exceeds 30 seconds, use continuous mode for processing. The sample code for continuous mode can be found on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/java/jre/console/src/com/microsoft/cognitiveservices/speech/samples/console/SpeechRecognitionSamples.java) under the function `pronunciationAssessmentContinuousWithFile`.
100+
101+
::: zone-end
102+
103+
::: zone pivot="programming-language-python"
104+
105+
If your audio file exceeds 30 seconds, use continuous mode for processing. The sample code for continuous mode can be found on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/261160e26dfcae4c3aee93308d58d74e36739b6f/samples/python/console/speech_sample.py) under the function `pronunciation_assessment_continuous_from_file`.
106+
107+
::: zone-end
108+
109+
::: zone pivot="programming-language-javascript"
110+
111+
If your audio file exceeds 30 seconds, use continuous mode for processing. The sample code for continuous mode can be found on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/261160e26dfcae4c3aee93308d58d74e36739b6f/samples/js/node/pronunciationAssessmentContinue.js).
112+
113+
::: zone-end
114+
115+
::: zone pivot="programming-language-objectivec"
116+
117+
If your audio file exceeds 30 seconds, use continuous mode for processing. The sample code for continuous mode can be found on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/objective-c/ios/speech-samples/speech-samples/ViewController.m) under the function `pronunciationAssessFromFile`.
118+
119+
::: zone-end
120+
121+
::: zone pivot="programming-language-swift"
122+
123+
If your audio file exceeds 30 seconds, use continuous mode for processing. The sample code for continuous mode can be found on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/swift/ios/speech-samples/speech-samples/ViewController.swift) under the function `continuousPronunciationAssessment`.
124+
125+
::: zone-end
126+
127+
::: zone pivot="programming-language-go"
128+
129+
::: zone-end
130+
131+
80132
## Set configuration parameters
81133

82134
::: zone pivot="programming-language-go"
@@ -262,6 +314,8 @@ This table lists some of the optional methods you can set for the `Pronunciation
262314
> Content and prosody assessments are only available in the [en-US](./language-support.md?tabs=pronunciation-assessment) locale.
263315
>
264316
> To explore the content and prosody assessments, upgrade to the SDK version 1.35.0 or later.
317+
>
318+
> There is no length limit for the topic parameter.
265319
266320
| Method | Description |
267321
|-----------|-------------|
@@ -1029,71 +1083,21 @@ pronunciationAssessmentConfig?.nbestPhonemeCount = 5
10291083

10301084
::: zone-end
10311085

1032-
## Other tips on configuration and SDK usage
1086+
## Pronunciation score calculation
10331087

1034-
- Pronunciation assessment uses a fixed version of the speech to text model, different from the standard speech to text model.
1035-
- Pronunciation scores are calculated by weighting accuracy, prosody, fluency, and completeness scores based on specific formulas for reading and speaking scenarios.
1088+
Pronunciation scores are calculated by weighting accuracy, prosody, fluency, and completeness scores based on specific formulas for reading and speaking scenarios.
10361089

1037-
When sorting the scores of accuracy, prosody, fluency, and completeness from low to high (if each score is available) and representing the lowest score to the highest score as s0 to s3, the pronunciation score is calculated as follows:
1038-
1039-
For reading scenario:
1040-
- With prosody score: PronScore = 0.4 * s0 + 0.2 * s1 + 0.2 * s2 + 0.2 * s3
1041-
- Without prosody score: PronScore = 0.6 * s0 + 0.2 * s1 + 0.2 * s2
1042-
1043-
For the speaking scenario (the completeness score isn't applicable):
1044-
- With prosody score: PronScore = 0.6 * s0 + 0.2 * s1 + 0.2 * s2
1045-
- Without prosody score: PronScore = 0.6 * s0 + 0.4 * s1
1046-
1047-
This formula provides a weighted calculation based on the importance of each score, ensuring a comprehensive evaluation of pronunciation.
1048-
1049-
- Currently, only `en-US` is supported for topics in pronunciation assessment. There is no length limit for the topic parameter.
1050-
### Continuous recognition
1051-
::: zone pivot="programming-language-csharp"
1052-
1053-
- If your audio file exceeds 30 seconds, use continuous mode for processing. The sample code for continuous mode can be found on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs) under the function `PronunciationAssessmentContinuousWithFile`.
1054-
1055-
::: zone-end
1056-
1057-
::: zone pivot="programming-language-cpp"
1058-
1059-
- If your audio file exceeds 30 seconds, use continuous mode for processing.
1060-
1061-
::: zone-end
1062-
1063-
::: zone pivot="programming-language-java"
1064-
1065-
- If your audio file exceeds 30 seconds, use continuous mode for processing. The sample code for continuous mode can be found on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/java/jre/console/src/com/microsoft/cognitiveservices/speech/samples/console/SpeechRecognitionSamples.java) under the function `pronunciationAssessmentContinuousWithFile`.
1066-
1067-
::: zone-end
1068-
1069-
::: zone pivot="programming-language-python"
1070-
1071-
- If your audio file exceeds 30 seconds, use continuous mode for processing. The sample code for continuous mode can be found on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/261160e26dfcae4c3aee93308d58d74e36739b6f/samples/python/console/speech_sample.py) under the function `pronunciation_assessment_continuous_from_file`.
1072-
1073-
::: zone-end
1074-
1075-
::: zone pivot="programming-language-javascript"
1076-
1077-
- If your audio file exceeds 30 seconds, use continuous mode for processing. The sample code for continuous mode can be found on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/261160e26dfcae4c3aee93308d58d74e36739b6f/samples/js/node/pronunciationAssessmentContinue.js).
1078-
1079-
::: zone-end
1080-
1081-
::: zone pivot="programming-language-objectivec"
1090+
When sorting the scores of accuracy, prosody, fluency, and completeness from low to high (if each score is available) and representing the lowest score to the highest score as s0 to s3, the pronunciation score is calculated as follows:
10821091

1083-
- If your audio file exceeds 30 seconds, use continuous mode for processing. The sample code for continuous mode can be found on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/objective-c/ios/speech-samples/speech-samples/ViewController.m) under the function `pronunciationAssessFromFile`.
1092+
For reading scenario:
1093+
- With prosody score: PronScore = 0.4 * s0 + 0.2 * s1 + 0.2 * s2 + 0.2 * s3
1094+
- Without prosody score: PronScore = 0.6 * s0 + 0.2 * s1 + 0.2 * s2
10841095

1085-
::: zone-end
1086-
1087-
::: zone pivot="programming-language-swift"
1088-
1089-
- If your audio file exceeds 30 seconds, use continuous mode for processing. The sample code for continuous mode can be found on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/swift/ios/speech-samples/speech-samples/ViewController.swift) under the function `continuousPronunciationAssessment`.
1090-
1091-
::: zone-end
1092-
1093-
::: zone pivot="programming-language-go"
1094-
1095-
::: zone-end
1096+
For the speaking scenario (the completeness score isn't applicable):
1097+
- With prosody score: PronScore = 0.6 * s0 + 0.2 * s1 + 0.2 * s2
1098+
- Without prosody score: PronScore = 0.6 * s0 + 0.4 * s1
10961099

1100+
This formula provides a weighted calculation based on the importance of each score, ensuring a comprehensive evaluation of pronunciation.
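
The formulas above can be sketched in a few lines of Python. In all four cases the weights depend only on how many scores are available: four scores use 0.4/0.2/0.2/0.2, three use 0.6/0.2/0.2, and two use 0.6/0.4. The function name `pron_score` and its parameters are hypothetical, and the sketch assumes that accuracy and fluency are always present. It illustrates the arithmetic only and isn't the service's implementation.

```python
from typing import Optional

# Weights for s0..sN (lowest to highest), keyed by the number of available scores.
_WEIGHTS = {
    4: (0.4, 0.2, 0.2, 0.2),  # reading scenario, with prosody
    3: (0.6, 0.2, 0.2),       # reading without prosody, or speaking with prosody
    2: (0.6, 0.4),            # speaking scenario, without prosody
}


def pron_score(accuracy: float,
               fluency: float,
               completeness: Optional[float] = None,
               prosody: Optional[float] = None) -> float:
    """Combine the available scores using the weighted formulas above.

    Pass completeness=None for the speaking scenario and prosody=None
    when no prosody score is available (hypothetical helper).
    """
    available = (accuracy, fluency, completeness, prosody)
    # Sort from low to high so the lowest score gets the largest weight.
    scores = sorted(s for s in available if s is not None)
    weights = _WEIGHTS[len(scores)]
    return sum(w * s for w, s in zip(weights, scores))
```

For example, `pron_score(80, 90, completeness=100, prosody=70)` sorts the scores to 70, 80, 90, 100 and evaluates 0.4 * 70 + 0.2 * 80 + 0.2 * 90 + 0.2 * 100, which is 82 up to floating-point rounding.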
10971101

10981102
## Related content
10991103
