Commit 5650f00

Jill Grant authored
Merge pull request #19 from eric-urban/eur/real-time-stt-diarization
intermediate speaker ID
2 parents dd37764 + 58615c3 commit 5650f00

12 files changed

+570
-54
lines changed

articles/ai-services/speech-service/get-started-stt-diarization.md

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -7,11 +7,11 @@ manager: nitinme
77
ms.service: azure-ai-speech
88
ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python
99
ms.topic: quickstart
10-
ms.date: 01/30/2024
10+
ms.date: 9/18/2024
1111
ms.author: eur
1212
zone_pivot_groups: programming-languages-speech-services
1313
keywords: speech to text, speech to text software
14-
#customer intent: As a developer, I want to create speech to text applications that use diarization to improve readability of multiple person conversations.
14+
#customer intent: As a developer, I want to create speech to text applications that use diarization to identify speakers in multiple person conversations.
1515
---
1616

1717
# Quickstart: Create real-time diarization

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/cli.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
22
author: eric-urban
33
ms.service: azure-ai-speech
44
ms.topic: include
5-
ms.date: 08/16/2023
5+
ms.date: 9/18/2024
66
ms.author: eur
77
---
88

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/cpp.md

Lines changed: 165 additions & 11 deletions
@@ -2,7 +2,7 @@
22
author: eric-urban
33
ms.service: azure-ai-speech
44
ms.topic: include
5-
ms.date: 01/30/2024
5+
ms.date: 9/18/2024
66
ms.author: eur
77
---
88

@@ -60,6 +60,7 @@ Follow these steps to create a console application and install the Speech SDK.
6060
}
6161

6262
auto speechConfig = SpeechConfig::FromSubscription(speechKey, speechRegion);
63+
speechConfig->SetProperty(PropertyId::SpeechServiceResponse_DiarizeIntermediateResults, "true");
6364

6465
speechConfig->SetSpeechRecognitionLanguage("en-US");
6566

@@ -73,14 +74,15 @@ Follow these steps to create a console application and install the Speech SDK.
7374
conversationTranscriber->Transcribing.Connect([](const ConversationTranscriptionEventArgs& e)
7475
{
7576
std::cout << "TRANSCRIBING:" << e.Result->Text << std::endl;
77+
std::cout << "Speaker ID=" << e.Result->SpeakerId << std::endl;
7678
});
7779

7880
conversationTranscriber->Transcribed.Connect([](const ConversationTranscriptionEventArgs& e)
7981
{
8082
if (e.Result->Reason == ResultReason::RecognizedSpeech)
8183
{
82-
std::cout << "TRANSCRIBED: Text=" << e.Result->Text << std::endl;
83-
std::cout << "Speaker ID=" << e.Result->SpeakerId << std::endl;
84+
std::cout << "\n" << "TRANSCRIBED: Text=" << e.Result->Text << std::endl;
85+
std::cout << "Speaker ID=" << e.Result->SpeakerId << "\n" << std::endl;
8486
}
8587
else if (e.Result->Reason == ResultReason::NoMatch)
8688
{
@@ -152,18 +154,170 @@ Follow these steps to create a console application and install the Speech SDK.
152154
The transcribed conversation should be output as text:
153155

154156
```output
155-
TRANSCRIBED: Text=Good morning, Steve. Speaker ID=Unknown
156-
TRANSCRIBED: Text=Good morning. Katie. Speaker ID=Unknown
157-
TRANSCRIBED: Text=Have you tried the latest real time diarization in Microsoft Speech Service which can tell you who said what in real time? Speaker ID=Guest-1
158-
TRANSCRIBED: Text=Not yet. I've been using the batch transcription with diarization functionality, but it produces diarization result until whole audio get processed. Speaker ID=Guest-2
159-
TRANSRIBED: Text=Is the new feature can diarize in real time? Speaker ID=Guest-2
160-
TRANSCRIBED: Text=Absolutely. Speaker ID=GUEST-1
161-
TRANSCRIBED: Text=That's exciting. Let me try it right now. Speaker ID=GUEST-2
162-
CANCELED: Reason=EndOfStream
157+
TRANSCRIBING:good morning
158+
Speaker ID=Unknown
159+
TRANSCRIBING:good morning steve
160+
Speaker ID=Unknown
161+
TRANSCRIBING:good morning steve how are you doing
162+
Speaker ID=Guest-1
163+
TRANSCRIBING:good morning steve how are you doing today
164+
Speaker ID=Guest-1
165+
166+
TRANSCRIBED: Text=Good morning, Steve. How are you doing today?
167+
Speaker ID=Guest-1
168+
169+
TRANSCRIBING:good
170+
Speaker ID=Unknown
171+
TRANSCRIBING:good morning
172+
Speaker ID=Unknown
173+
TRANSCRIBING:good morning kat
174+
Speaker ID=Unknown
175+
TRANSCRIBING:good morning katie i hope you're having a
176+
Speaker ID=Guest-2
177+
TRANSCRIBING:good morning katie i hope you're having a great start to your day
178+
Speaker ID=Guest-2
179+
180+
TRANSCRIBED: Text=Good morning, Katie. I hope you're having a great start to your day.
181+
Speaker ID=Guest-2
182+
183+
TRANSCRIBING:have you
184+
Speaker ID=Unknown
185+
TRANSCRIBING:have you tried
186+
Speaker ID=Unknown
187+
TRANSCRIBING:have you tried the latest
188+
Speaker ID=Unknown
189+
TRANSCRIBING:have you tried the latest real
190+
Speaker ID=Guest-1
191+
TRANSCRIBING:have you tried the latest real time
192+
Speaker ID=Guest-1
193+
TRANSCRIBING:have you tried the latest real time diarization
194+
Speaker ID=Guest-1
195+
TRANSCRIBING:have you tried the latest real time diarization in
196+
Speaker ID=Guest-1
197+
TRANSCRIBING:have you tried the latest real time diarization in microsoft
198+
Speaker ID=Guest-1
199+
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech
200+
Speaker ID=Guest-1
201+
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service
202+
Speaker ID=Guest-1
203+
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which
204+
Speaker ID=Guest-1
205+
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which can
206+
Speaker ID=Guest-1
207+
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which can tell you
208+
Speaker ID=Guest-1
209+
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which can tell you who said
210+
Speaker ID=Guest-1
211+
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which can tell you who said what
212+
Speaker ID=Guest-1
213+
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which can tell you who said what in
214+
Speaker ID=Guest-1
215+
TRANSCRIBING:have you tried the latest real time diarization in microsoft speech service which can tell you who said what in real time
216+
Speaker ID=Guest-1
217+
218+
TRANSCRIBED: Text=Have you tried the latest real time diarization in Microsoft Speech Service which can tell you who said what in real time?
219+
Speaker ID=Guest-1
220+
221+
TRANSCRIBING:not yet
222+
Speaker ID=Unknown
223+
TRANSCRIBING:not yet i
224+
Speaker ID=Guest-2
225+
TRANSCRIBING:not yet i've been using
226+
Speaker ID=Guest-2
227+
TRANSCRIBING:not yet i've been using the
228+
Speaker ID=Guest-2
229+
TRANSCRIBING:not yet i've been using the batch
230+
Speaker ID=Guest-2
231+
TRANSCRIBING:not yet i've been using the batch trans
232+
Speaker ID=Guest-2
233+
TRANSCRIBING:not yet i've been using the batch transcription with
234+
Speaker ID=Guest-2
235+
TRANSCRIBING:not yet i've been using the batch transcription with diarization
236+
Speaker ID=Guest-2
237+
TRANSCRIBING:not yet i've been using the batch transcription with diarization function
238+
Speaker ID=Guest-2
239+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality
240+
Speaker ID=Guest-2
241+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but
242+
Speaker ID=Guest-2
243+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it
244+
Speaker ID=Guest-2
245+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces
246+
Speaker ID=Guest-2
247+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces di
248+
Speaker ID=Guest-2
249+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization
250+
Speaker ID=Guest-2
251+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results
252+
Speaker ID=Guest-2
253+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after
254+
Speaker ID=Guest-2
255+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the
256+
Speaker ID=Guest-2
257+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole
258+
Speaker ID=Guest-2
259+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio
260+
Speaker ID=Guest-2
261+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is
262+
Speaker ID=Guest-2
263+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed
264+
Speaker ID=Guest-2
265+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the
266+
Speaker ID=Guest-2
267+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new
268+
Speaker ID=Guest-2
269+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature
270+
Speaker ID=Guest-2
271+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to
272+
Speaker ID=Guest-2
273+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to di
274+
Speaker ID=Guest-2
275+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to diarize
276+
Speaker ID=Guest-2
277+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to diarize in real
278+
Speaker ID=Guest-2
279+
TRANSCRIBING:not yet i've been using the batch transcription with diarization functionality but it produces diarization results after the whole audio is processed is the new feature able to diarize in real time
280+
Speaker ID=Guest-2
281+
282+
TRANSCRIBED: Text=Not yet. I've been using the batch transcription with diarization functionality, but it produces diarization results after the whole audio is processed. Is the new feature able to diarize in real time?
283+
Speaker ID=Guest-2
284+
285+
TRANSCRIBING:absolutely
286+
Speaker ID=Unknown
287+
TRANSCRIBING:absolutely i
288+
Speaker ID=Unknown
289+
TRANSCRIBING:absolutely i recom
290+
Speaker ID=Guest-1
291+
TRANSCRIBING:absolutely i recommend
292+
Speaker ID=Guest-1
293+
TRANSCRIBING:absolutely i recommend you
294+
Speaker ID=Guest-1
295+
TRANSCRIBING:absolutely i recommend you give it a try
296+
Speaker ID=Guest-1
297+
298+
TRANSCRIBED: Text=Absolutely, I recommend you give it a try.
299+
Speaker ID=Guest-1
300+
301+
TRANSCRIBING:that's exc
302+
Speaker ID=Unknown
303+
TRANSCRIBING:that's exciting
304+
Speaker ID=Unknown
305+
TRANSCRIBING:that's exciting let me
306+
Speaker ID=Guest-2
307+
TRANSCRIBING:that's exciting let me try
308+
Speaker ID=Guest-2
309+
TRANSCRIBING:that's exciting let me try it right now
310+
Speaker ID=Guest-2
311+
312+
TRANSCRIBED: Text=That's exciting. Let me try it right now.
313+
Speaker ID=Guest-2
163314
```
164315

165316
Speakers are identified as Guest-1, Guest-2, and so on, depending on the number of speakers in the conversation.
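As an illustration of how the `Guest-N` labels can be consumed, the following is a minimal sketch that pairs each final `TRANSCRIBED` text with the speaker ID printed on the next line. `ParseFinalResults` is a hypothetical helper, not part of the Speech SDK, and it assumes the exact console format shown in the sample output above (Unix line endings, one `Speaker ID=` line after each result).

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the Speech SDK): extracts
// (speaker ID, final text) pairs from console output in the format
// that the quickstart prints. Intermediate results are skipped.
std::vector<std::pair<std::string, std::string>> ParseFinalResults(const std::string& log)
{
    const std::string finalPrefix = "TRANSCRIBED: Text=";
    const std::string intermediatePrefix = "TRANSCRIBING:";
    const std::string speakerPrefix = "Speaker ID=";

    std::vector<std::pair<std::string, std::string>> results;
    std::istringstream stream(log);
    std::string line;
    std::string pendingText;
    bool havePendingText = false;

    while (std::getline(stream, line))
    {
        if (line.compare(0, finalPrefix.size(), finalPrefix) == 0)
        {
            // Remember the final text until its speaker line arrives.
            pendingText = line.substr(finalPrefix.size());
            havePendingText = true;
        }
        else if (line.compare(0, intermediatePrefix.size(), intermediatePrefix) == 0)
        {
            // The speaker line that follows belongs to an intermediate result.
            havePendingText = false;
        }
        else if (havePendingText &&
                 line.compare(0, speakerPrefix.size(), speakerPrefix) == 0)
        {
            results.emplace_back(line.substr(speakerPrefix.size()), pendingText);
            havePendingText = false;
        }
    }
    return results;
}
```

In an application you would more likely collect `e.Result->SpeakerId` and `e.Result->Text` directly in the `Transcribed` handler; parsing the log is only a convenient way to show the pairing without the SDK.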
166317

318+
> [!NOTE]
319+
> You might see `Speaker ID=Unknown` in some of the early intermediate results when the speaker is not yet identified. Without intermediate diarization results (if you don't set the `PropertyId::SpeechServiceResponse_DiarizeIntermediateResults` property to "true"), the speaker ID is always "Unknown".
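Because early intermediate results can report `Unknown`, an application that displays live captions might want to avoid flickering between `Unknown` and a `Guest-N` label. The following is a minimal sketch of one way to do that, assuming you feed it the speaker ID from each `Transcribing` and `Transcribed` event. `SpeakerLabelTracker` and the `(identifying...)` placeholder are hypothetical names introduced for illustration and are not part of the Speech SDK.

```cpp
#include <string>

// Hypothetical helper (not part of the Speech SDK): remembers the most
// recent identified speaker within one utterance so that a later
// "Unknown" intermediate result doesn't replace a known label.
class SpeakerLabelTracker
{
public:
    // Call with the speaker ID of each intermediate result.
    // Returns the label to display for the utterance in progress.
    std::string OnIntermediate(const std::string& speakerId)
    {
        if (IsKnown(speakerId))
        {
            lastKnown_ = speakerId;
        }
        return lastKnown_.empty() ? "(identifying...)" : lastKnown_;
    }

    // Call with the speaker ID of the final result.
    // Returns it unchanged and resets the state for the next utterance.
    std::string OnFinal(const std::string& speakerId)
    {
        lastKnown_.clear();
        return speakerId;
    }

private:
    static bool IsKnown(const std::string& speakerId)
    {
        return !speakerId.empty() && speakerId != "Unknown";
    }

    std::string lastKnown_;
};
```

With the Speech SDK, you would call `OnIntermediate(e.Result->SpeakerId)` inside the `Transcribing` handler and `OnFinal(e.Result->SpeakerId)` inside the `Transcribed` handler. Resetting on the final result reflects the sample output, where each new utterance starts again with `Unknown`.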
320+
167321
## Clean up resources
168322

169323
[!INCLUDE [Delete resource](../../common/delete-resource.md)]
