articles/cognitive-services/Speech-Service/batch-transcription.md
34 lines changed: 34 additions & 0 deletions
@@ -97,6 +97,40 @@ Polling for transcription status may not be the most performant, or provide the
For more details, see [Webhooks](webhooks.md).

## Speaker Separation (Diarization)

Diarization is the process of separating speakers in a piece of audio. The Batch pipeline supports diarization and can recognize two speakers on mono-channel recordings.

To request diarization for a transcription, add the relevant parameter to the HTTP request, as shown below.

```json
{
  "recordingsUrl": "<URL to the Azure blob to transcribe>",
  "models": [{"Id":"<optional acoustic model ID>"},{"Id":"<optional language model ID>"}],
  "locale": "<locale to use, for example en-US>",
  "name": "<user defined name of the transcription batch>",
  "description": "<optional description of the transcription>",
  "properties": {
    "AddWordLevelTimestamps" : "True",
    "AddDiarization" : "True"
  }
}
```
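
As a minimal sketch, a request body like the one above could be assembled in Python as follows. The helper name `build_diarization_request` and its arguments are hypothetical and not part of the service API; only the JSON field names and the `"True"` string values are taken from the request above. Sending the request (endpoint, authentication) is deliberately left out.

```python
import json


def build_diarization_request(recordings_url, locale, name,
                              description="", model_ids=()):
    """Build a transcription request body with diarization enabled.

    Hypothetical helper for illustration; only the JSON field names
    come from the request shown in this article.
    """
    return {
        "recordingsUrl": recordings_url,
        # Optional acoustic/language model IDs, in the same shape as above.
        "models": [{"Id": model_id} for model_id in model_ids],
        "locale": locale,
        "name": name,
        "description": description,
        "properties": {
            # Both properties are set together, as in the request above.
            "AddWordLevelTimestamps": "True",
            "AddDiarization": "True",
        },
    }


if __name__ == "__main__":
    body = build_diarization_request(
        recordings_url="<URL to the Azure blob to transcribe>",
        locale="en-US",
        name="diarization-example",
    )
    print(json.dumps(body, indent=2))
```

Keeping the two properties in one place makes it harder to enable `AddDiarization` without `AddWordLevelTimestamps` by accident.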

Note that word-level timestamps must also be turned on, as the parameters in the request above indicate.

In the output for the corresponding audio, each speaker is identified by a number (currently only two voices are supported, so the speakers are identified as 'Speaker 1' and 'Speaker 2'), followed by the transcription output.

Also note that diarization is not available for stereo recordings. Furthermore, all JSON output contains the Speaker tag. If diarization is not used, it simply shows as 'Speaker: Null'.

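
To illustrate how the Speaker tag might be consumed, here is a hedged Python sketch that groups transcribed text by speaker. The segment shape used here (a list of dicts with `"Speaker"` and `"Text"` keys) is an assumption made for illustration, and the sample data is invented; the actual output schema may differ. A null speaker is treated as output produced without diarization.

```python
from collections import defaultdict


def group_text_by_speaker(segments):
    """Collect transcribed text per speaker label.

    `segments` is an assumed, simplified shape: a list of dicts with a
    "Speaker" key (1, 2, or None) and a "Text" key. The real output
    schema may differ.
    """
    grouped = defaultdict(list)
    for segment in segments:
        speaker = segment.get("Speaker")
        # A null speaker means diarization was not used for this output.
        label = "Undiarized" if speaker is None else f"Speaker {speaker}"
        grouped[label].append(segment["Text"])
    return dict(grouped)


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        {"Speaker": 1, "Text": "Hello, how can I help?"},
        {"Speaker": 2, "Text": "I have a billing question."},
        {"Speaker": 1, "Text": "Sure, go ahead."},
    ]
    for label, texts in group_text_by_speaker(sample).items():
        print(label, "->", " ".join(texts))
```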
Supported locales are listed below.

| Language | Locale |
|--------|-------|
| English | en-US |
| Chinese | zh-CN |
| German | de-DE |

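The constraints in this section (mono recordings only, three supported locales) can be checked before a request is submitted. The following is a minimal sketch; the function name `can_request_diarization` and the `channel_count` argument are hypothetical, and the locale comparison is an exact, case-sensitive match against the table above.

```python
# Locales taken from the table above.
DIARIZATION_LOCALES = {"en-US", "zh-CN", "de-DE"}


def can_request_diarization(locale, channel_count):
    """Return True if the locale is listed above and the audio is mono."""
    return locale in DIARIZATION_LOCALES and channel_count == 1


if __name__ == "__main__":
    print(can_request_diarization("en-US", 1))  # mono, supported locale
    print(can_request_diarization("en-US", 2))  # stereo is not supported
```
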
## Sentiment

Sentiment is a new feature in the Batch Transcription API and is an important feature in the call center domain. Customers can use the `AddSentiment` parameters to their requests to