
Commit 2f1ef2d

Merge pull request #248309 from eric-urban/eur/stt-diarization
STT diarization edits
2 parents 15437f8 + 39137a6 commit 2f1ef2d

File tree

8 files changed: +35 −30 lines changed


articles/ai-services/speech-service/get-started-stt-diarization.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ zone_pivot_groups: programming-languages-set-twenty-two
 keywords: speech to text, speech to text software
 ---
 
-# Quickstart: Real-time diarization (preview)
+# Quickstart: Real-time diarization (Preview)
 
 ::: zone pivot="programming-language-csharp"
 [!INCLUDE [C# include](includes/quickstarts/stt-diarization/csharp.md)]

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/cpp.md

Lines changed: 1 addition & 1 deletion
@@ -134,7 +134,7 @@ Follow these steps to create a new console application and install the Speech SDK
     }
     ```
 
-1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/quickstart/csharp/dotnet/conversation-transcription/helloworld/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
+1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
    > [!NOTE]
    > The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.
 1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/csharp.md

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ Follow these steps to create a new console application and install the Speech SDK
     }
     ```
 
-1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/quickstart/csharp/dotnet/conversation-transcription/helloworld/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
+1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
   > [!NOTE]
   > The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.
 1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/intro.md

Lines changed: 3 additions & 0 deletions
@@ -8,6 +8,9 @@ ms.author: eur
 
 In this quickstart, you run an application for speech to text transcription with real-time diarization. Here, diarization is distinguishing between the different speakers participating in the conversation. The Speech service provides information about which speaker was speaking a particular part of transcribed speech.
 
+> [!NOTE]
+> Real-time diarization is currently in public preview.
+
 The speaker information is included in the result in the speaker ID field. The speaker ID is a generic identifier assigned to each conversation participant by the service during the recognition as different speakers are being identified from the provided audio content.
 
 > [!TIP]
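
The speaker ID field described in this intro is easiest to see in code. The following is a minimal sketch distilled from the Python quickstart changed later in this commit; it wires up only the `transcribed` callback, and `katiesteve.wav` stands in for any multi-speaker audio file:

```Python
import os
import azure.cognitiveservices.speech as speechsdk

def transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
    # Each recognized phrase carries its text plus a generic speaker ID
    # (Guest-1, Guest-2, ..., or Unknown when the speaker can't be distinguished).
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print('Text={} Speaker ID={}'.format(evt.result.text, evt.result.speaker_id))

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
audio_config = speechsdk.audio.AudioConfig(filename="katiesteve.wav")
conversation_transcriber = speechsdk.transcription.ConversationTranscriber(
    speech_config=speech_config, audio_config=audio_config)
conversation_transcriber.transcribed.connect(transcribed_cb)
# Start, wait for completion, and stop as shown in the full python.md sample below.
```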

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/java.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ Follow these steps to create a new console application for conversation transcription
     }
     ```
 
-1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/quickstart/csharp/dotnet/conversation-transcription/helloworld/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
+1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
   > [!NOTE]
   > The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.
 1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/python.md

Lines changed: 24 additions & 23 deletions
@@ -38,28 +38,28 @@ Follow these steps to create a new console application.
 1. Copy the following code into `conversation_transcription.py`:
 
     ```Python
-import os
-import time
-import azure.cognitiveservices.speech as speechsdk
+    import os
+    import time
+    import azure.cognitiveservices.speech as speechsdk
 
-def conversation_transcriber_recognition_canceled_cb(evt: speechsdk.SessionEventArgs):
+    def conversation_transcriber_recognition_canceled_cb(evt: speechsdk.SessionEventArgs):
         print('Canceled event')
 
-def conversation_transcriber_session_stopped_cb(evt: speechsdk.SessionEventArgs):
+    def conversation_transcriber_session_stopped_cb(evt: speechsdk.SessionEventArgs):
         print('SessionStopped event')
 
-def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
+    def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
         print('TRANSCRIBED:')
         if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
-        print('\tText={}'.format(evt.result.text))
-        print('\tSpeaker ID={}'.format(evt.result.speaker_id))
+            print('\tText={}'.format(evt.result.text))
+            print('\tSpeaker ID={}'.format(evt.result.speaker_id))
         elif evt.result.reason == speechsdk.ResultReason.NoMatch:
-        print('\tNOMATCH: Speech could not be TRANSCRIBED: {}'.format(evt.result.no_match_details))
+            print('\tNOMATCH: Speech could not be TRANSCRIBED: {}'.format(evt.result.no_match_details))
 
-def conversation_transcriber_session_started_cb(evt: speechsdk.SessionEventArgs):
+    def conversation_transcriber_session_started_cb(evt: speechsdk.SessionEventArgs):
         print('SessionStarted event')
 
-def recognize_from_file():
+    def recognize_from_file():
         # This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
         speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
         speech_config.speech_recognition_language="en-US"
@@ -68,11 +68,12 @@ Follow these steps to create a new console application.
         conversation_transcriber = speechsdk.transcription.ConversationTranscriber(speech_config=speech_config, audio_config=audio_config)
 
         transcribing_stop = False
+
         def stop_cb(evt: speechsdk.SessionEventArgs):
-        #"""callback that signals to stop continuous recognition upon receiving an event `evt`"""
-        print('CLOSING on {}'.format(evt))
-        nonlocal transcribing_stop
-        transcribing_stop = True
+            #"""callback that signals to stop continuous recognition upon receiving an event `evt`"""
+            print('CLOSING on {}'.format(evt))
+            nonlocal transcribing_stop
+            transcribing_stop = True
 
         # Connect callbacks to the events fired by the conversation transcriber
         conversation_transcriber.transcribed.connect(conversation_transcriber_transcribed_cb)
@@ -82,24 +83,24 @@ Follow these steps to create a new console application.
         # stop transcribing on either session stopped or canceled events
         conversation_transcriber.session_stopped.connect(stop_cb)
         conversation_transcriber.canceled.connect(stop_cb)
-
+
         conversation_transcriber.start_transcribing_async()
 
         # Waits for completion.
         while not transcribing_stop:
-        time.sleep(.5)
+            time.sleep(.5)
 
         conversation_transcriber.stop_transcribing_async()
 
-# Main
+    # Main
 
-try:
+    try:
         recognize_from_file()
-except Exception as err:
+    except Exception as err:
         print("Encountered exception. {}".format(err))
     ```
 
-1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/quickstart/csharp/dotnet/conversation-transcription/helloworld/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
+1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
   > [!NOTE]
   > The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.
 1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).
@@ -139,9 +140,9 @@ TRANSCRIBED:
     Text=That's exciting. Let me try it right now.
     Speaker ID=Guest-2
 Canceled event
-CLOSING on ConversationTranscriptionCanceledEventArgs(session_id=606e8b5e65b94419b824d224127d9f92, result=ConversationTranscriptionResult(result_id=21d17c5738b442f8a7d428d0d5363fa8, speaker_id=, text=, reason=ResultReason.Canceled))
+CLOSING on ConversationTranscriptionCanceledEventArgs(session_id=92a0abb68636471dac07041b335d9be3, result=ConversationTranscriptionResult(result_id=ad1b1d83b5c742fcacca0692baa8df74, speaker_id=, text=, reason=ResultReason.Canceled))
 SessionStopped event
-CLOSING on SessionEventArgs(session_id=606e8b5e65b94419b824d224127d9f92)
+CLOSING on SessionEventArgs(session_id=92a0abb68636471dac07041b335d9be3)
 ```
 
 Speakers are identified as Guest-1, Guest-2, and so on, depending on the number of speakers in the conversation.
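
The language step in this file links out to language identification, which is not part of this commit. As a rough illustration of the general mechanism only, the Speech SDK's `AutoDetectSourceLanguageConfig` lets a `SpeechRecognizer` pick between candidate languages; whether `ConversationTranscriber` accepts the same option is not shown in these files, so treat this as a hedged sketch:

```Python
import os
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
# Candidate languages the service chooses between; en-US and es-ES as in the article's example.
auto_detect_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["en-US", "es-ES"])
audio_config = speechsdk.audio.AudioConfig(filename="katiesteve.wav")
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    auto_detect_source_language_config=auto_detect_config,
    audio_config=audio_config)

result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    # The detected language rides along with the recognition result.
    detected = speechsdk.AutoDetectSourceLanguageResult(result).language
    print('Recognized [{}]: {}'.format(detected, result.text))
```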

articles/ai-services/speech-service/speech-to-text.md

Lines changed: 3 additions & 2 deletions
@@ -27,16 +27,17 @@ For a full list of available speech to text languages, see [Language and voice support]
 
 With real-time speech to text, the audio is transcribed as speech is recognized from a microphone or file. Use real-time speech to text for applications that need to transcribe audio in real-time such as:
 - Transcriptions, captions, or subtitles for live meetings
+- [Diarization](get-started-stt-diarization.md)
+- [Pronunciation assessment](how-to-pronunciation-assessment.md)
 - Contact center agent assist
 - Dictation
 - Voice agents
-- Pronunciation assessment
 
 Real-time speech to text is available via the [Speech SDK](speech-sdk.md) and the [Speech CLI](spx-overview.md).
 
 ## Batch transcription
 
-Batch transcription is used to transcribe a large amount of audio in storage. You can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results. Use batch transcription for applications that need to transcribe audio in bulk such as:
+[Batch transcription](batch-transcription.md) is used to transcribe a large amount of audio in storage. You can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results. Use batch transcription for applications that need to transcribe audio in bulk such as:
 - Transcriptions, captions, or subtitles for pre-recorded audio
 - Contact center post-call analytics
 - Diarization
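
The diarization bullet above refers to batch transcription, which is documented outside this commit. As a sketch only, creating a batch job is a single REST call; the v3.1 `transcriptions` endpoint and the `diarizationEnabled` property are assumptions drawn from the batch transcription REST API rather than from these files, and `SAS_URL` is a hypothetical environment variable holding your audio's SAS URI:

```Python
import os
import requests

# Assumption: Speech to text REST API v3.1; SAS_URL points to your audio in storage.
endpoint = "https://{}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions".format(
    os.environ["SPEECH_REGION"])
body = {
    "displayName": "Batch transcription with diarization",
    "locale": "en-US",
    "contentUrls": [os.environ["SAS_URL"]],  # SAS URI of the audio file(s)
    "properties": {"diarizationEnabled": True},
}
resp = requests.post(
    endpoint,
    headers={"Ocp-Apim-Subscription-Key": os.environ["SPEECH_KEY"]},
    json=body)
resp.raise_for_status()
# The job runs asynchronously; poll the returned transcription URL for results.
print(resp.json()["self"])
```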

articles/ai-services/speech-service/toc.yml

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ items:
       href: how-to-recognize-speech.md
     - name: Get speech recognition results
       href: get-speech-recognition-results.md
-    - name: Real-time diarization
+    - name: Real-time diarization quickstart
       href: get-started-stt-diarization.md
     - name: Batch transcription
       items:
