articles/cognitive-services/Speech-Service/conversation-transcription.md
Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
1
1
---
2
2
title: What is Conversation Transcription (Preview) - Speech Service
3
3
titleSuffix: Azure Cognitive Services
4
-
description: Conversation Transcription is a speech-to-text solution that combines speech recognition, speaker identification, and sentence attribution to each speaker (also known as diarization) to provide real-time and/or asynchronous transcription of any conversation. Conversation Transcription makes conversations inclusive for everyone, such as participants who are deaf and hard of hearing.
4
+
description: Conversation Transcription is a speech-to-text solution that combines speech recognition, speaker identification, and sentence attribution to each speaker (also known as diarization) to provide real-time and/or asynchronous transcription of any conversation.
title: Human-labeled transcriptions guidelines - Speech Service
3
3
titleSuffix: Azure Cognitive Services
4
-
description: "If you're looking to improve recognition accuracy, especially issues that are caused when words are deleted or incorrectly substituted, you'll want to use human-labeled transcriptions along with your audio data. What are human-labeled transcriptions? That's easy, they're word-by-word, verbatim transcriptions of an audio file."
4
+
description: To improve speech recognition accuracy, such as when words are deleted or incorrectly substituted, you can use human-labeled transcriptions along with your audio data. Human-labeled transcriptions are word-by-word, verbatim transcriptions of an audio file.
5
5
services: cognitive-services
6
6
author: erhopf
7
7
manager: nitinme
8
+
8
9
ms.service: cognitive-services
9
10
ms.subservice: speech-service
10
11
ms.topic: conceptual
@@ -25,7 +26,7 @@ Human-labeled transcriptions for English audio must be provided as plain text, o
25
26
Here are a few examples:
26
27
27
28
| Characters to avoid | Substitution | Notes |
28
-
|---------------------|--------------|-------|
29
+
|-------------------|------------|-----|
29
30
| “Hello world” | "Hello world" | The opening and closing quotation marks have been substituted with the appropriate ASCII characters. |
30
31
| John’s day | John's day | The apostrophe has been substituted with the appropriate ASCII character. |
31
32
| it was good—no, it was great! | it was good--no, it was great! | The em dash was substituted with two hyphens. |
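The substitutions in the rows above can be applied mechanically. The following is a minimal Python sketch, not part of the documented tooling: the function name `to_ascii_punctuation` and the mapping are assumptions that cover only the characters shown in the table (plus the matching single quotation marks).

```python
# Assumed mapping, based on the example rows in the table; extend as needed.
ASCII_SUBSTITUTIONS = {
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (typographic apostrophe)
    "\u2014": "--",  # em dash becomes two hyphens
}
_TRANSLATION_TABLE = str.maketrans(ASCII_SUBSTITUTIONS)


def to_ascii_punctuation(text: str) -> str:
    """Replace typographic punctuation with the ASCII forms from the table."""
    return text.translate(_TRANSLATION_TABLE)
```

Running this over each transcription line before upload is one way to catch characters introduced by a word processor or by scraping web pages.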
@@ -34,88 +35,88 @@ Here are a few examples:
34
35
35
36
Text normalization is the transformation of words into a consistent format used when training a model. Some normalization rules are applied to text automatically; however, we recommend following these guidelines as you prepare your human-labeled transcription data:
36
37
37
-
* Write out abbreviations in words.
38
-
* Write out non-standard numeric strings in words (such as accounting terms).
39
-
* Non-alphabetic characters or mixed alphanumeric characters should be transcribed as pronounced.
40
-
* Abbreviations that are pronounced as words shouldn't be edited (such as "radar", "laser", "RAM", or "NATO").
41
-
* Write out abbreviations that are pronounced as separate letters with each letter separated by a space.
38
+
- Write out abbreviations in words.
39
+
- Write out non-standard numeric strings in words (such as accounting terms).
40
+
- Non-alphabetic characters or mixed alphanumeric characters should be transcribed as pronounced.
41
+
- Abbreviations that are pronounced as words shouldn't be edited (such as "radar", "laser", "RAM", or "NATO").
42
+
- Write out abbreviations that are pronounced as separate letters with each letter separated by a space.
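The abbreviation rules in this list can be sketched in Python as follows. This is an illustration under stated assumptions, not the service's own normalizer: the lookup tables `SPOKEN_FORMS` and `PRONOUNCED_AS_WORDS` are hypothetical and hold only the examples mentioned in this article, and any other all-uppercase token is assumed to be pronounced letter by letter.

```python
# Hypothetical lookup tables for illustration; real data needs its own lists.
SPOKEN_FORMS = {"Dr.": "Doctor"}
PRONOUNCED_AS_WORDS = {"radar", "laser", "RAM", "NATO"}


def normalize_token(token: str) -> str:
    """Apply the abbreviation guidelines to a single whitespace-separated token."""
    if token in SPOKEN_FORMS:
        return SPOKEN_FORMS[token]      # write the abbreviation out in words
    if token in PRONOUNCED_AS_WORDS:
        return token                    # pronounced as a word: leave unchanged
    if len(token) > 1 and token.isalpha() and token.isupper():
        return " ".join(token)          # assumed to be spelled out letter by letter
    return token


def normalize_abbreviations(text: str) -> str:
    """Normalize every token in a transcription line."""
    return " ".join(normalize_token(token) for token in text.split())
```

For example, `normalize_abbreviations("Dr. Bruce Banner")` gives `"Doctor Bruce Banner"`. Tokens with attached punctuation (such as a trailing comma) are left untouched by this sketch and would need separate handling.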
42
43
43
44
Here are a few examples of normalization that you should perform on the transcription:
44
45
45
-
| Original text | Text after normalization |
46
-
|---------------|--------------------------|
47
-
| Dr. Bruce Banner | Doctor Bruce Banner |
48
-
| James Bond, 007 | James Bond, double oh seven |
49
-
| Ke$ha | Kesha |
50
-
| How long is the 2x4 | How long is the two by four |
| "Holy cow!" said Batman. | holy cow said batman|
68
69
| "What?" said Batman's sidekick, Robin. | what said batman's sidekick robin |
69
-
| Go get -em! | go get em |
70
-
| I'm double-jointed | I'm double jointed |
71
-
| 104 Elm Street | one oh four Elm street |
72
-
| Tune to 102.7 | tune to one oh two point seven |
73
-
| Pi is about 3.14 | pi is about three point one four |
74
-
| It costs $3.14 | it costs three fourteen |
70
+
| Go get -em! | go get em|
71
+
| I'm double-jointed | I'm double jointed|
72
+
| 104 Elm Street | one oh four Elm street|
73
+
| Tune to 102.7 | tune to one oh two point seven|
74
+
| Pi is about 3.14 | pi is about three point one four|
75
+
|It costs \$3.14| it costs three fourteen|
75
76
76
77
## Mandarin Chinese (zh-CN)
77
78
78
79
Human-labeled transcriptions for Mandarin Chinese audio must be UTF-8 encoded with a byte-order marker. Avoid the use of half-width punctuation characters. These characters can be included inadvertently when you prepare the data in a word-processing program or scrape data from web pages. If these characters are present, make sure to update them with the appropriate full-width substitution.
79
80
80
81
Here are a few examples:
81
82
82
-
| Characters to avoid | Substitution | Notes |
83
-
|---------------------|--------------|-------|
83
+
| Characters to avoid | Substitution | Notes |
84
+
|-------------------|--------------|-----|
84
85
| "你好" | “你好” | The opening and closing quotation marks have been substituted with the appropriate characters. |
85
-
| 需要什么帮助? | 需要什么帮助?| The question mark has been substituted with appropriate character. |
86
+
| 需要什么帮助? | 需要什么帮助?| The question mark has been substituted with appropriate character. |
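The full-width substitution described above can be sketched in Python. This is a minimal example under assumptions: the function name `to_full_width_punctuation` and the mapping are illustrative and cover only a few common marks.

```python
# Assumed mapping from half-width punctuation to full-width equivalents.
FULL_WIDTH_PUNCTUATION = {
    "?": "\uff1f",  # full-width question mark
    "!": "\uff01",  # full-width exclamation mark
    ",": "\uff0c",  # full-width comma
    ";": "\uff1b",  # full-width semicolon
}
_TRANSLATION_TABLE = str.maketrans(FULL_WIDTH_PUNCTUATION)


def to_full_width_punctuation(text: str) -> str:
    """Replace half-width punctuation with its full-width counterpart."""
    return text.translate(_TRANSLATION_TABLE)
```

The sketch converts every occurrence, so a comma inside a number such as `1,000` would also be changed; check numeric strings separately.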
86
87
87
88
### Text normalization for Mandarin Chinese
88
89
89
90
Text normalization is the transformation of words into a consistent format used when training a model. Some normalization rules are applied to text automatically; however, we recommend following these guidelines as you prepare your human-labeled transcription data:
90
91
91
-
* Write out abbreviations in words.
92
-
* Write out numeric strings in spoken form.
92
+
- Write out abbreviations in words.
93
+
- Write out numeric strings in spoken form.
93
94
94
95
Here are a few examples of normalization that you should perform on the transcription:
95
96
96
97
| Original text | Text after normalization |
97
-
|---------------|--------------------------|
98
-
|我今年21| 我今年二十一 |
99
-
|3号楼504| 三号 楼 五 零 四 |
98
+
|-------------|------------------------|
99
+
|我今年 21| 我今年二十一 |
100
+
|3 号楼 504| 三号 楼 五 零 四 |
100
101
101
102
The following normalization rules are automatically applied to transcriptions:
102
103
103
-
* Remove all punctuation
104
-
* Expand numbers to spoken form
105
-
* Convert full-width letters to half-width letters
106
-
* Using uppercase letters for all English words
104
+
- Remove all punctuation
105
+
- Expand numbers to spoken form
106
+
- Convert full-width letters to half-width letters
107
+
- Using uppercase letters for all English words
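To make three of these automatic rules concrete, here is a rough Python approximation. It is not the service's implementation: the function name `auto_normalize_sketch` is hypothetical, punctuation is detected through Unicode general categories, and number expansion and word segmentation are deliberately left out.

```python
import unicodedata

# Full-width Latin letters sit 0xFEE0 code points above their ASCII forms.
FULL_WIDTH_OFFSET = 0xFEE0


def _to_half_width_letter(ch: str) -> str:
    """Map a full-width Latin letter to its half-width form; leave others alone."""
    code = ord(ch)
    if 0xFF21 <= code <= 0xFF3A or 0xFF41 <= code <= 0xFF5A:
        return chr(code - FULL_WIDTH_OFFSET)
    return ch


def auto_normalize_sketch(text: str) -> str:
    """Drop punctuation, convert full-width letters, and uppercase Latin letters."""
    kept = []
    for ch in text:
        if unicodedata.category(ch).startswith("P"):
            continue  # any Unicode punctuation category (Po, Ps, Pe, ...)
        kept.append(_to_half_width_letter(ch))
    return "".join(kept).upper()
```

Currency symbols such as `¥` fall under a symbol category rather than punctuation, so this sketch keeps them.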
107
108
108
109
Here are a few examples of normalization automatically performed on the transcription:
109
110
110
111
| Original text | Text after normalization |
111
-
|---------------|--------------------------|
112
+
|-------------|------------------------|
112
113
| 3.1415 | 三 点 一 四 一 五 |
113
-
| ¥3.5 | 三 元 五 角 |
114
-
| w f y z |W F Y Z |
115
-
|1992年8月8日| 一 九 九 二 年 八 月 八 日 |
114
+
| ¥3.5 | 三 元 五 角 |
115
+
| w f y z |W F Y Z |
116
+
|1992 年 8 月 8 日| 一 九 九 二 年 八 月 八 日 |
116
117
| 你吃饭了吗? | 你 吃饭 了 吗 |
117
-
|下午5:00的航班| 下午 五点 的 航班 |
118
-
|我今年21岁| 我 今年 二十 一 岁 |
118
+
|下午 5:00 的航班| 下午 五点 的 航班 |
119
+
|我今年 21 岁| 我 今年 二十 一 岁 |
119
120
120
121
## German (de-DE) and other languages
121
122
@@ -125,42 +126,42 @@ Human-labeled transcriptions for German audio (and other non-English or Mandarin
125
126
126
127
Text normalization is the transformation of words into a consistent format used when training a model. Some normalization rules are applied to text automatically; however, we recommend following these guidelines as you prepare your human-labeled transcription data:
127
128
128
-
* Write decimal points as "," and not ".".
129
-
* Write time separators as ":" and not "." (for example: 12:00 Uhr).
130
-
* Abbreviations such as "ca." aren't replaced. We recommend that you use the full spoken form.
131
-
* The four main mathematical operators (+, -, \*, and /) are removed. We recommend replacing them with the written form: "plus," "minus," "mal," and "geteilt."
132
-
* Comparison operators are removed (=, <, and >). We recommend replacing them with "gleich," "kleiner als," and "grösser als."
133
-
* Write fractions, such as 3/4, in written form (for example: "drei viertel" instead of 3/4).
134
-
* Replace the "€" symbol with its written form "Euro."
129
+
- Write decimal points as "," and not ".".
130
+
- Write time separators as ":" and not "." (for example: 12:00 Uhr).
131
+
- Abbreviations such as "ca." aren't replaced. We recommend that you use the full spoken form.
132
+
- The four main mathematical operators (+, -, \*, and /) are removed. We recommend replacing them with the written form: "plus," "minus," "mal," and "geteilt."
133
+
- Comparison operators are removed (=, <, and >). We recommend replacing them with "gleich," "kleiner als," and "grösser als."
134
+
- Write fractions, such as 3/4, in written form (for example: "drei viertel" instead of 3/4).
135
+
- Replace the "€" symbol with its written form "Euro."
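The operator and currency guidelines in this list could be applied with a small script before upload. The Python sketch below is illustrative only: the function name `normalize_german_symbols` is hypothetical, operators are replaced only when surrounded by whitespace (so hyphenated words and fractions such as 3/4 stay untouched), and the "€" symbol is assumed to follow the amount.

```python
import re

# Written forms taken from the guidelines in the list above.
OPERATOR_WORDS = {
    "+": "plus",
    "-": "minus",
    "*": "mal",
    "/": "geteilt",
    "=": "gleich",
    "<": "kleiner als",
    ">": "grösser als",
}
# Match an operator only when it stands alone between whitespace characters.
_STANDALONE_OPERATOR = re.compile(r"(?<=\s)[+\-*/=<>](?=\s)")


def normalize_german_symbols(text: str) -> str:
    """Replace standalone operators and the euro sign with written forms."""
    text = _STANDALONE_OPERATOR.sub(lambda m: OPERATOR_WORDS[m.group(0)], text)
    text = re.sub(r"\s*€", " Euro", text)  # assumes the symbol follows the amount
    return text.strip()
```

Fractions, decimal separators, and abbreviations such as "ca." are not handled here and still need to be written out by hand.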
135
136
136
137
Here are a few examples of normalization that you should perform on the transcription:
137
138
138
-
| Original text | Text after user normalization | Text after system normalization |
articles/cognitive-services/Speech-Service/how-to-custom-speech-inspect-data.md
Lines changed: 8 additions & 8 deletions
@@ -1,7 +1,7 @@
1
1
---
2
-
title: "Inspect data quality for Custom Speech - Speech Service"
2
+
title: Inspect data quality for Custom Speech - Speech Service
3
3
titleSuffix: Azure Cognitive Services
4
-
description: "Custom Speech provides tools that allow you to visually inspect the recognition quality of a model by comparing audio data with the corresponding recognition result. From the Custom Speech portal, you can play back uploaded audio and determine if the provided recognition result is correct. This tool allows you to quickly inspect quality of our baseline speech-to-text model or a trained custom model without having to transcribe any audio data."
4
+
description: Custom Speech provides tools that allow you to visually inspect the recognition quality of a model by comparing audio data with the corresponding recognition result. You can play back uploaded audio and determine if the provided recognition result is correct.
5
5
services: cognitive-services
6
6
author: erhopf
7
7
manager: nitinme
@@ -38,18 +38,18 @@ After a test has been successfully created, you can compare the models side by s
38
38
39
39
## Side-by-side model comparisons
40
40
41
-
When the test status is *Succeeded*, click in the test item name to see details of the test. This detail page lists all the utterances in your dataset, indicating the recognition results of the two models alongside the transcription from the submitted dataset.
41
+
When the test status is _Succeeded_, click in the test item name to see details of the test. This detail page lists all the utterances in your dataset, indicating the recognition results of the two models alongside the transcription from the submitted dataset.
42
42
43
43
To help inspect the side-by-side comparison, you can toggle various error types including insertion, deletion, and substitution. By listening to the audio and comparing recognition results in each column (showing human-labeled transcription and the results of two speech-to-text models), you can decide which model meets your needs and where improvements are needed.
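To illustrate what the three error types mean, the following Python sketch counts substitutions, insertions, and deletions between a human-labeled transcription and a recognition result, using a word-level edit-distance alignment. It is an assumption-laden illustration, not the portal's algorithm: the function name `count_errors` is hypothetical, and when several alignments have the same total cost the breakdown it reports is just one of them.

```python
def count_errors(reference: str, hypothesis: str) -> dict:
    """Count word-level substitutions, insertions, and deletions."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = (total, substitutions, insertions, deletions)
    # for aligning ref[:i] with hyp[:j].
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, 0, i)  # every reference word was deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, j, 0)  # every hypothesis word was inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            total, subs, ins, dels = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (total, subs, ins, dels)          # words match
            else:
                diagonal = (total + 1, subs + 1, ins, dels)  # substitution
            total, subs, ins, dels = dp[i - 1][j]
            deletion = (total + 1, subs, ins, dels + 1)
            total, subs, ins, dels = dp[i][j - 1]
            insertion = (total + 1, subs, ins + 1, dels)
            dp[i][j] = min(diagonal, deletion, insertion, key=lambda c: c[0])
    _, subs, ins, dels = dp[-1][-1]
    return {"substitutions": subs, "insertions": ins, "deletions": dels}
```

Comparing the counts for two models against the same human-labeled transcription gives a rough sense of where each model goes wrong.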
44
44
45
-
Inspecting quality testing is useful to validate if the quality of a speech recognition endpoint is enough for an application. For an objective measure of accuracy, requiring transcribed audio, follow the instructions found in [Evaluate Accuracy](how-to-custom-speech-evaluate-data.md).
45
+
Inspecting quality testing is useful to validate if the quality of a speech recognition endpoint is enough for an application. For an objective measure of accuracy, requiring transcribed audio, follow the instructions found in [Evaluate Accuracy](how-to-custom-speech-evaluate-data.md).
46
46
47
47
## Next steps
48
48
49
-
* [Evaluate your data](how-to-custom-speech-evaluate-data.md)
50
-
* [Train your model](how-to-custom-speech-train-model.md)
51
-
* [Deploy your model](how-to-custom-speech-deploy-model.md)
49
+
- [Evaluate your data](how-to-custom-speech-evaluate-data.md)
50
+
- [Train your model](how-to-custom-speech-train-model.md)
51
+
- [Deploy your model](how-to-custom-speech-deploy-model.md)
52
52
53
53
## Additional resources
54
54
55
-
* [Prepare test data for Custom Speech](how-to-custom-speech-test-data.md)
55
+
- [Prepare test data for Custom Speech](how-to-custom-speech-test-data.md)