Commit da83b8d

Merge pull request #82063 from jimxieJ/jimxie/update_diarization_feature
Update cts doc for differentiate speaker property
2 parents: 5345244 + 417bae7

File tree: 3 files changed, +13 −3 lines changed

articles/cognitive-services/Speech-Service/conversation-transcription.md

Lines changed: 5 additions & 2 deletions
@@ -50,10 +50,13 @@ This is a high-level overview of how Conversation Transcription works.

  ## Expected inputs

  - **Multi-channel audio stream** – For specification and design details, see [Microsoft Speech Device SDK Microphone](./speech-devices-sdk-microphone.md). To learn more or purchase a development kit, see [Get Microsoft Speech Device SDK](./get-speech-devices-sdk.md).
- - **User voice samples** – Conversation Transcription needs user profiles in advance of the conversation. You will need to collect audio recordings from each user, then send the recordings to the [Signature Generation Service](https://aka.ms/cts/signaturegenservice) to validate the audio and generate user profiles.
+ - **User voice samples** – Conversation Transcription needs user profiles in advance of the conversation for speaker identification. You will need to collect audio recordings from each user, then send the recordings to the [Signature Generation Service](https://aka.ms/cts/signaturegenservice) to validate the audio and generate user profiles.

  > [!NOTE]
- > User voice samples are optional. Without this input, the transcription will still distinguish speakers, but they are shown as "Speaker1", "Speaker2", etc. instead of being recognized by their pre-enrolled speaker names.
+ > User voice samples for voice signatures are required for speaker identification. Speakers without voice samples are recognized as "Unidentified". Unidentified speakers can still be differentiated when the `DifferentiateGuestSpeakers` property is enabled (see the example below). The transcription output then shows speakers as "Guest_0", "Guest_1", etc. instead of their pre-enrolled speaker names.
+ > ```csharp
+ > config.SetProperty("DifferentiateGuestSpeakers", "true");
+ > ```

  ## Real-time vs. asynchronous
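The note added in this diff describes the labeling behavior, not how it is implemented. As a minimal, self-contained sketch of that behavior (plain JavaScript mocks, not the Speech SDK; `makeConfig` and `labelSpeakers` are illustrative names invented here), enrolled speakers keep their names, guests become "Guest_0", "Guest_1", ... when `DifferentiateGuestSpeakers` is on, and collapse to "Unidentified" otherwise:

```javascript
// Mock stand-in for the SDK's property bag; NOT the real Speech SDK config class.
function makeConfig() {
  const props = {};
  return {
    setProperty: (name, value) => { props[name] = value; },
    getProperty: (name) => props[name] ?? "false",
  };
}

// Labels speakers the way the note describes: enrolled speakers keep their
// names; guests get "Guest_0", "Guest_1", ... when DifferentiateGuestSpeakers
// is enabled, otherwise every guest is reported as "Unidentified".
function labelSpeakers(config, enrolledNames) {
  const differentiate =
    config.getProperty("DifferentiateGuestSpeakers") === "true";
  let guestIndex = 0;
  return enrolledNames.map((name) =>
    name !== null ? name : differentiate ? `Guest_${guestIndex++}` : "Unidentified"
  );
}

const config = makeConfig();
config.setProperty("DifferentiateGuestSpeakers", "true");
console.log(labelSpeakers(config, ["Katie", null, null]));
// → [ 'Katie', 'Guest_0', 'Guest_1' ]
```

Without the `setProperty` call, the same input would yield `[ 'Katie', 'Unidentified', 'Unidentified' ]`, which is the default behavior the updated note warns about.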

articles/cognitive-services/Speech-Service/includes/how-to/conversation-transcription/real-time-csharp.md

Lines changed: 4 additions & 1 deletion
@@ -107,6 +107,7 @@ This sample code does the following:

  * Creates a `ConversationTranscriber` using the constructor, and subscribes to the necessary events.
  * Adds participants to the conversation. The strings `voiceSignatureStringUser1` and `voiceSignatureStringUser2` should come as output from the steps above, from the function `GetVoiceSignatureString()`.
  * Joins the conversation and begins transcription.
+ * If you want to differentiate speakers without providing voice samples, enable the `DifferentiateGuestSpeakers` feature as described in the [Conversation Transcription overview](../../../conversation-transcription.md).

  > [!NOTE]
  > `AudioStreamReader` is a helper class you can get on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/quickstart/csharp/dotnet/conversation-transcription/helloworld/AudioStreamReader.cs).

@@ -133,7 +134,9 @@ public static async Task TranscribeConversationsAsync(string voiceSignatureStrin

      var config = SpeechConfig.FromSubscription(subscriptionKey, region);
      config.SetProperty("ConversationTranscriptionInRoomAndOnline", "true");
- // config.SpeechRecognitionLanguage = "zh-cn"; // en-us by default. This code specifies Chinese.
+ // The recognition language is en-US by default. Add this code to specify another language, such as zh-cn.
+ // config.SpeechRecognitionLanguage = "zh-cn";
      var stopRecognition = new TaskCompletionSource<int>();

      using (var audioInput = AudioConfig.FromWavFileInput(filepath))

articles/cognitive-services/Speech-Service/includes/how-to/conversation-transcription/real-time-javascript.md

Lines changed: 4 additions & 0 deletions
@@ -72,6 +72,7 @@ This sample code does the following:

  * Creates a `ConversationTranscriber` using the constructor.
  * Adds participants to the conversation. The strings `voiceSignatureStringUser1` and `voiceSignatureStringUser2` should come as output from the steps above.
  * Registers to events and begins transcription.
+ * If you want to differentiate speakers without providing voice samples, enable the `DifferentiateGuestSpeakers` feature as described in the [Conversation Transcription overview](../../../conversation-transcription.md).

  ```javascript
  (function() {

@@ -93,6 +94,9 @@ This sample code does the following:

      var speechTranslationConfig = sdk.SpeechTranslationConfig.fromSubscription(subscriptionKey, region);
      var audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
+     speechTranslationConfig.setProperty("ConversationTranscriptionInRoomAndOnline", "true");
+
+     // The recognition language is en-US by default. Add this code to specify another language, such as zh-cn.
      speechTranslationConfig.speechRecognitionLanguage = "en-US";

      // create conversation and transcriber
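The steps this sample is described as performing (create a transcriber, add participants by voice signature, register events, receive per-speaker results) can be sketched with plain mock objects. This is not the Speech SDK API; `createMockTranscriber`, `emitResult`, and the signature strings are all illustrative inventions for this sketch:

```javascript
// Minimal mock of the described flow; NOT the Speech SDK ConversationTranscriber.
function createMockTranscriber() {
  const participants = new Map(); // voice signature -> display name
  const handlers = { transcribed: [] };
  return {
    // Mirrors "adds participants to the conversation" using a voice signature.
    addParticipant: (name, voiceSignature) => participants.set(voiceSignature, name),
    // Mirrors "registers to events".
    on: (event, fn) => handlers[event].push(fn),
    // Simulates the service emitting a result for a recognized (or unknown) signature.
    emitResult: (voiceSignature, text) => {
      const speaker = participants.get(voiceSignature) ?? "Unidentified";
      handlers.transcribed.forEach((fn) => fn({ speaker, text }));
    },
  };
}

const transcriber = createMockTranscriber();
transcriber.addParticipant("Katie", "sig-user-1");
transcriber.addParticipant("Steve", "sig-user-2");

const lines = [];
transcriber.on("transcribed", (r) => lines.push(`${r.speaker}: ${r.text}`));

transcriber.emitResult("sig-user-1", "Good morning everyone.");
transcriber.emitResult("sig-unknown", "Hi, I just joined.");
console.log(lines);
// → [ 'Katie: Good morning everyone.', 'Unidentified: Hi, I just joined.' ]
```

The point of the sketch is the mapping: results for an enrolled voice signature carry the participant's name, while results for an unknown signature fall back to "Unidentified", which is exactly the case the `DifferentiateGuestSpeakers` property addresses in the real service.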

0 commit comments