
Commit 2f1ef2d

Merge pull request #248309 from eric-urban/eur/stt-diarization
STT diarization edits
2 parents 15437f8 + 39137a6 commit 2f1ef2d

File tree

8 files changed: +35 −30 lines changed


articles/ai-services/speech-service/get-started-stt-diarization.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ zone_pivot_groups: programming-languages-set-twenty-two
 keywords: speech to text, speech to text software
 ---
 
-# Quickstart: Real-time diarization (preview)
+# Quickstart: Real-time diarization (Preview)
 
 ::: zone pivot="programming-language-csharp"
 [!INCLUDE [C# include](includes/quickstarts/stt-diarization/csharp.md)]

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/cpp.md

Lines changed: 1 addition & 1 deletion
@@ -134,7 +134,7 @@ Follow these steps to create a new console application and install the Speech SDK
     }
     ```
 
-1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/quickstart/csharp/dotnet/conversation-transcription/helloworld/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
+1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
    > [!NOTE]
    > The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.
 1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/csharp.md

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ Follow these steps to create a new console application and install the Speech SDK
     }
     ```
 
-1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/quickstart/csharp/dotnet/conversation-transcription/helloworld/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
+1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
   > [!NOTE]
   > The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.
 1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/intro.md

Lines changed: 3 additions & 0 deletions
@@ -8,6 +8,9 @@ ms.author: eur
 
 In this quickstart, you run an application for speech to text transcription with real-time diarization. Here, diarization is distinguishing between the different speakers participating in the conversation. The Speech service provides information about which speaker was speaking a particular part of transcribed speech.
 
+> [!NOTE]
+> Real-time diarization is currently in public preview.
+
 The speaker information is included in the result in the speaker ID field. The speaker ID is a generic identifier assigned to each conversation participant by the service during the recognition as different speakers are being identified from the provided audio content.
 
 > [!TIP]
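
The speaker ID field described in this intro is easiest to see in code. The following is a minimal sketch distilled from the Python quickstart changed later in this commit; it wires up only the `transcribed` callback, and `katiesteve.wav` stands in for any multi-speaker audio file:

```Python
import os
import azure.cognitiveservices.speech as speechsdk

def transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
    # Each recognized phrase carries its text plus a generic speaker ID
    # (Guest-1, Guest-2, ..., or Unknown when the speaker can't be distinguished).
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print('Text={} Speaker ID={}'.format(evt.result.text, evt.result.speaker_id))

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
audio_config = speechsdk.audio.AudioConfig(filename="katiesteve.wav")
conversation_transcriber = speechsdk.transcription.ConversationTranscriber(
    speech_config=speech_config, audio_config=audio_config)
conversation_transcriber.transcribed.connect(transcribed_cb)
# Start, wait for completion, and stop as shown in the full python.md sample below.
```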

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/java.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ Follow these steps to create a new console application for conversation transcription
     }
     ```
 
-1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/quickstart/csharp/dotnet/conversation-transcription/helloworld/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
+1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
   > [!NOTE]
   > The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.
 1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/python.md

Lines changed: 24 additions & 23 deletions
@@ -38,28 +38,28 @@ Follow these steps to create a new console application.
 1. Copy the following code into `conversation_transcription.py`:
 
     ```Python
-import os
-import time
-import azure.cognitiveservices.speech as speechsdk
+    import os
+    import time
+    import azure.cognitiveservices.speech as speechsdk
 
-def conversation_transcriber_recognition_canceled_cb(evt: speechsdk.SessionEventArgs):
+    def conversation_transcriber_recognition_canceled_cb(evt: speechsdk.SessionEventArgs):
         print('Canceled event')
 
-def conversation_transcriber_session_stopped_cb(evt: speechsdk.SessionEventArgs):
+    def conversation_transcriber_session_stopped_cb(evt: speechsdk.SessionEventArgs):
         print('SessionStopped event')
 
-def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
+    def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
         print('TRANSCRIBED:')
         if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
-        print('\tText={}'.format(evt.result.text))
-        print('\tSpeaker ID={}'.format(evt.result.speaker_id))
+            print('\tText={}'.format(evt.result.text))
+            print('\tSpeaker ID={}'.format(evt.result.speaker_id))
         elif evt.result.reason == speechsdk.ResultReason.NoMatch:
-        print('\tNOMATCH: Speech could not be TRANSCRIBED: {}'.format(evt.result.no_match_details))
+            print('\tNOMATCH: Speech could not be TRANSCRIBED: {}'.format(evt.result.no_match_details))
 
-def conversation_transcriber_session_started_cb(evt: speechsdk.SessionEventArgs):
+    def conversation_transcriber_session_started_cb(evt: speechsdk.SessionEventArgs):
         print('SessionStarted event')
 
-def recognize_from_file():
+    def recognize_from_file():
         # This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
         speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
         speech_config.speech_recognition_language="en-US"
@@ -68,11 +68,12 @@ Follow these steps to create a new console application.
         conversation_transcriber = speechsdk.transcription.ConversationTranscriber(speech_config=speech_config, audio_config=audio_config)
 
         transcribing_stop = False
+
         def stop_cb(evt: speechsdk.SessionEventArgs):
-        #"""callback that signals to stop continuous recognition upon receiving an event `evt`"""
-        print('CLOSING on {}'.format(evt))
-        nonlocal transcribing_stop
-        transcribing_stop = True
+            #"""callback that signals to stop continuous recognition upon receiving an event `evt`"""
+            print('CLOSING on {}'.format(evt))
+            nonlocal transcribing_stop
+            transcribing_stop = True
 
         # Connect callbacks to the events fired by the conversation transcriber
         conversation_transcriber.transcribed.connect(conversation_transcriber_transcribed_cb)
@@ -82,24 +83,24 @@ Follow these steps to create a new console application.
         # stop transcribing on either session stopped or canceled events
         conversation_transcriber.session_stopped.connect(stop_cb)
         conversation_transcriber.canceled.connect(stop_cb)
-
+
         conversation_transcriber.start_transcribing_async()
 
         # Waits for completion.
         while not transcribing_stop:
-        time.sleep(.5)
+            time.sleep(.5)
 
         conversation_transcriber.stop_transcribing_async()
 
-# Main
+    # Main
 
-try:
+    try:
         recognize_from_file()
-except Exception as err:
+    except Exception as err:
         print("Encountered exception. {}".format(err))
     ```
 
-1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/quickstart/csharp/dotnet/conversation-transcription/helloworld/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
+1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
   > [!NOTE]
   > The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.
 1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).
@@ -139,9 +140,9 @@ TRANSCRIBED:
     Text=That's exciting. Let me try it right now.
     Speaker ID=Guest-2
 Canceled event
-CLOSING on ConversationTranscriptionCanceledEventArgs(session_id=606e8b5e65b94419b824d224127d9f92, result=ConversationTranscriptionResult(result_id=21d17c5738b442f8a7d428d0d5363fa8, speaker_id=, text=, reason=ResultReason.Canceled))
+CLOSING on ConversationTranscriptionCanceledEventArgs(session_id=92a0abb68636471dac07041b335d9be3, result=ConversationTranscriptionResult(result_id=ad1b1d83b5c742fcacca0692baa8df74, speaker_id=, text=, reason=ResultReason.Canceled))
 SessionStopped event
-CLOSING on SessionEventArgs(session_id=606e8b5e65b94419b824d224127d9f92)
+CLOSING on SessionEventArgs(session_id=92a0abb68636471dac07041b335d9be3)
 ```
 
 Speakers are identified as Guest-1, Guest-2, and so on, depending on the number of speakers in the conversation.
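
The language step in this file links out to language identification, which is not part of this commit. As a rough illustration of the general mechanism only, the Speech SDK's `AutoDetectSourceLanguageConfig` lets a `SpeechRecognizer` pick between candidate languages; whether `ConversationTranscriber` accepts the same option is not shown in these files, so treat this as a hedged sketch:

```Python
import os
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
# Candidate languages the service chooses between; en-US and es-ES as in the article's example.
auto_detect_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["en-US", "es-ES"])
audio_config = speechsdk.audio.AudioConfig(filename="katiesteve.wav")
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    auto_detect_source_language_config=auto_detect_config,
    audio_config=audio_config)

result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    # The detected language rides along with the recognition result.
    detected = speechsdk.AutoDetectSourceLanguageResult(result).language
    print('Recognized [{}]: {}'.format(detected, result.text))
```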

articles/ai-services/speech-service/speech-to-text.md

Lines changed: 3 additions & 2 deletions
@@ -27,16 +27,17 @@ For a full list of available speech to text languages, see [Language and voice support]
 
 With real-time speech to text, the audio is transcribed as speech is recognized from a microphone or file. Use real-time speech to text for applications that need to transcribe audio in real-time such as:
 - Transcriptions, captions, or subtitles for live meetings
+- [Diarization](get-started-stt-diarization.md)
+- [Pronunciation assessment](how-to-pronunciation-assessment.md)
 - Contact center agent assist
 - Dictation
 - Voice agents
-- Pronunciation assessment
 
 Real-time speech to text is available via the [Speech SDK](speech-sdk.md) and the [Speech CLI](spx-overview.md).
 
 ## Batch transcription
 
-Batch transcription is used to transcribe a large amount of audio in storage. You can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results. Use batch transcription for applications that need to transcribe audio in bulk such as:
+[Batch transcription](batch-transcription.md) is used to transcribe a large amount of audio in storage. You can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results. Use batch transcription for applications that need to transcribe audio in bulk such as:
 - Transcriptions, captions, or subtitles for pre-recorded audio
 - Contact center post-call analytics
 - Diarization
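
The diarization bullet above refers to batch transcription, which is documented outside this commit. As a sketch only, creating a batch job is a single REST call; the v3.1 `transcriptions` endpoint and the `diarizationEnabled` property are assumptions drawn from the batch transcription REST API rather than from these files, and `SAS_URL` is a hypothetical environment variable holding your audio's SAS URI:

```Python
import os
import requests

# Assumption: Speech to text REST API v3.1; SAS_URL points to your audio in storage.
endpoint = "https://{}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions".format(
    os.environ["SPEECH_REGION"])
body = {
    "displayName": "Batch transcription with diarization",
    "locale": "en-US",
    "contentUrls": [os.environ["SAS_URL"]],  # SAS URI of the audio file(s)
    "properties": {"diarizationEnabled": True},
}
resp = requests.post(
    endpoint,
    headers={"Ocp-Apim-Subscription-Key": os.environ["SPEECH_KEY"]},
    json=body)
resp.raise_for_status()
# The job runs asynchronously; poll the returned transcription URL for results.
print(resp.json()["self"])
```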

articles/ai-services/speech-service/toc.yml

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ items:
       href: how-to-recognize-speech.md
     - name: Get speech recognition results
       href: get-speech-recognition-results.md
-    - name: Real-time diarization
+    - name: Real-time diarization quickstart
       href: get-started-stt-diarization.md
     - name: Batch transcription
       items:
