
Commit 0524199

Merge pull request #264568 from TimShererWithAquent/us200722d

Freshness update: Azure AI Speech service

2 parents 7984ed9 + 41ac651

File tree: 7 files changed, +132 -91 lines


articles/ai-services/speech-service/get-started-stt-diarization.md (5 additions, 5 deletions)

@@ -1,19 +1,20 @@
 ---
 title: "Real-time diarization quickstart - Speech service"
 titleSuffix: Azure AI services
-description: In this quickstart, you convert speech to text continuously from a file. The service transcribes the speech and identifies one or more speakers.
+description: In this quickstart, you convert speech to text continuously from a file. The Speech service transcribes the speech and identifies one or more speakers.
 author: eric-urban
 manager: nitinme
 ms.service: azure-ai-speech
 ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python
 ms.topic: quickstart
-ms.date: 7/27/2023
+ms.date: 01/30/2024
 ms.author: eur
 zone_pivot_groups: programming-languages-speech-services
 keywords: speech to text, speech to text software
+#customer intent: As a developer, I want to create speech to text applications that use diarization to improve readability of multiple person conversations.
 ---

-# Quickstart: Real-time diarization (Preview)
+# Quickstart: Create real-time diarization (Preview)

 ::: zone pivot="programming-language-csharp"
 [!INCLUDE [C# include](includes/quickstarts/stt-diarization/csharp.md)]
@@ -55,8 +56,7 @@ keywords: speech to text, speech to text software
 [!INCLUDE [CLI include](includes/quickstarts/stt-diarization/cli.md)]
 ::: zone-end

-
-## Next steps
+## Next step

 > [!div class="nextstepaction"]
 > [Learn more about speech recognition](how-to-recognize-speech.md)

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/cpp.md (24 additions, 17 deletions)

@@ -2,7 +2,7 @@
 author: eric-urban
 ms.service: azure-ai-speech
 ms.topic: include
-ms.date: 7/27/2023
+ms.date: 01/30/2024
 ms.author: eur
 ---

@@ -15,23 +15,27 @@ ms.author: eur
 [!INCLUDE [Prerequisites](../../common/azure-prerequisites.md)]

 ## Set up the environment
+
 The Speech SDK is available as a [NuGet package](https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech) and implements .NET Standard 2.0. You install the Speech SDK later in this guide, but first check the [SDK installation guide](../../../quickstarts/setup-platform.md?pivots=programming-language-cpp) for any more requirements.

 ### Set environment variables

 [!INCLUDE [Environment variables](../../common/environment-variables.md)]

-## Diarization from file with conversation transcription
+## Implement diarization from file with conversation transcription
+
+Follow these steps to create a console application and install the Speech SDK.

-Follow these steps to create a new console application and install the Speech SDK.
+1. Create a new C++ console project in [Visual Studio Community 2022](https://visualstudio.microsoft.com/downloads/) named `ConversationTranscription`.

-1. Create a new C++ console project in Visual Studio Community 2022 named `ConversationTranscription`.
-1. Install the Speech SDK in your new project with the NuGet package manager.
-```powershell
+1. Select **Tools** > **Nuget Package Manager** > **Package Manager Console**. In the **Package Manager Console**, run this command:
+
+```console
 Install-Package Microsoft.CognitiveServices.Speech
 ```
-1. Replace the contents of `ConversationTranscription.cpp` with the following code:
-
+
+1. Replace the contents of `ConversationTranscription.cpp` with the following code.
+
 ```cpp
 #include <iostream>
 #include <stdlib.h>
@@ -134,20 +138,23 @@ Follow these steps to create a new console application and install the Speech SD
 }
 ```

-1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
-> [!NOTE]
-> The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.
-1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).
+1. Get the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) or use your own `.wav` file. Replace `katiesteve.wav` with the path and name of your `.wav` file.
+
+The application recognizes speech from multiple participants in the conversation. Your audio file should contain multiple speakers.
+
+> [!NOTE]
+> The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.

+1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).

-[Build and run](/cpp/build/vscpp-step-2-build) your application to start conversation transcription:
+1. [Build and run](/cpp/build/vscpp-step-2-build) your application to start conversation transcription:

-> [!IMPORTANT]
-> Make sure that you set the `SPEECH_KEY` and `SPEECH_REGION` environment variables as described [above](#set-environment-variables). If you don't set these variables, the sample will fail with an error message.
+> [!IMPORTANT]
+> Make sure that you set the `SPEECH_KEY` and `SPEECH_REGION` [environment variables](#set-environment-variables). If you don't set these variables, the sample fails with an error message.

-The transcribed conversation should be output as text:
+The transcribed conversation should be output as text:

-```console
+```output
 TRANSCRIBED: Text=Good morning, Steve. Speaker ID=Unknown
 TRANSCRIBED: Text=Good morning. Katie. Speaker ID=Unknown
 TRANSCRIBED: Text=Have you tried the latest real time diarization in Microsoft Speech Service which can tell you who said what in real time? Speaker ID=Guest-1
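The `TRANSCRIBED:` lines in the expected output above are plain text, so they're easy to post-process. As an illustrative aside (not part of the quickstart), here's a minimal Python sketch that parses such lines into (speaker, text) pairs; the line format is taken from the sample output, and the function name is hypothetical:

```python
import re

# Matches the output format shown above, for example:
# TRANSCRIBED: Text=Good morning, Steve. Speaker ID=Unknown
LINE = re.compile(r"^TRANSCRIBED: Text=(?P<text>.*) Speaker ID=(?P<speaker>\S+)$")

def parse_transcript(lines):
    """Return (speaker, text) pairs from TRANSCRIBED output lines.

    Lines that don't match the expected format are skipped.
    """
    turns = []
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            turns.append((m.group("speaker"), m.group("text")))
    return turns
```

This is only a sketch of working with the console output; in a real application you would read the speaker ID and text directly from the SDK's transcribed event instead of parsing printed lines.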

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/csharp.md (27 additions, 17 deletions)

@@ -2,7 +2,7 @@
 author: eric-urban
 ms.service: azure-ai-speech
 ms.topic: include
-ms.date: 7/27/2023
+ms.date: 01/30/2024
 ms.author: eur
 ---

@@ -15,25 +15,32 @@ ms.author: eur
 [!INCLUDE [Prerequisites](../../common/azure-prerequisites.md)]

 ## Set up the environment
+
 The Speech SDK is available as a [NuGet package](https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech) and implements .NET Standard 2.0. You install the Speech SDK later in this guide, but first check the [SDK installation guide](../../../quickstarts/setup-platform.md?pivots=programming-language-csharp) for any more requirements.

 ### Set environment variables

 [!INCLUDE [Environment variables](../../common/environment-variables.md)]

-## Diarization from file with conversation transcription
+## Implement diarization from file with conversation transcription
+
+Follow these steps to create a console application and install the Speech SDK.

-Follow these steps to create a new console application and install the Speech SDK.
+1. Open a command prompt window in the folder where you want the new project. Run this command to create a console application with the .NET CLI.

-1. Open a command prompt where you want the new project, and create a console application with the .NET CLI. The `Program.cs` file should be created in the project directory.
 ```dotnetcli
 dotnet new console
 ```
+
+This command creates the *Program.cs* file in your project directory.
+
 1. Install the Speech SDK in your new project with the .NET CLI.
+
 ```dotnetcli
 dotnet add package Microsoft.CognitiveServices.Speech
 ```
-1. Replace the contents of `Program.cs` with the following code.
+
+1. Replace the contents of `Program.cs` with the following code.

 ```csharp
 using Microsoft.CognitiveServices.Speech;
@@ -110,23 +117,27 @@ Follow these steps to create a new console application and install the Speech SD
 }
 ```

-1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
-> [!NOTE]
-> The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.
-1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).
+1. Get the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) or use your own `.wav` file. Replace `katiesteve.wav` with the path and name of your `.wav` file.

-Run your new console application to start conversation transcription:
+The application recognizes speech from multiple participants in the conversation. Your audio file should contain multiple speakers.

-```console
-dotnet run
-```
+> [!NOTE]
+> The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.
+
+1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).
+
+1. Run your console application to start conversation transcription:
+
+```dotnetcli
+dotnet run
+```

 > [!IMPORTANT]
-> Make sure that you set the `SPEECH_KEY` and `SPEECH_REGION` environment variables as described [above](#set-environment-variables). If you don't set these variables, the sample will fail with an error message.
+> Make sure that you set the `SPEECH_KEY` and `SPEECH_REGION` [environment variables](#set-environment-variables). If you don't set these variables, the sample fails with an error message.

-The transcribed conversation should be output as text:
+The transcribed conversation should be output as text:

-```console
+```output
 TRANSCRIBED: Text=Good morning, Steve. Speaker ID=Unknown
 TRANSCRIBED: Text=Good morning. Katie. Speaker ID=Unknown
 TRANSCRIBED: Text=Have you tried the latest real time diarization in Microsoft Speech Service which can tell you who said what in real time? Speaker ID=Guest-1
@@ -142,4 +153,3 @@ Speakers are identified as Guest-1, Guest-2, and so on, depending on the number
 ## Clean up resources

 [!INCLUDE [Delete resource](../../common/delete-resource.md)]
-
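The IMPORTANT note in each language's steps says the sample reads `SPEECH_KEY` and `SPEECH_REGION` from the environment and fails if they're missing. As a language-neutral illustration (the variable names come from the article; the helper function itself is hypothetical), the same fail-fast guard looks like this in Python:

```python
import os

def require_speech_config():
    """Fail fast if the Speech resource key/region variables are missing,
    mirroring the quickstart's IMPORTANT note."""
    key = os.environ.get("SPEECH_KEY")
    region = os.environ.get("SPEECH_REGION")
    if not key or not region:
        raise RuntimeError(
            "Set the SPEECH_KEY and SPEECH_REGION environment variables "
            "before running the sample."
        )
    return key, region
```

Checking both variables up front gives a clear error message instead of an opaque authentication failure from the service later.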
articles/ai-services/speech-service/includes/quickstarts/stt-diarization/intro.md (5 additions, 5 deletions)

@@ -2,16 +2,16 @@
 author: eric-urban
 ms.service: azure-ai-speech
 ms.topic: include
-ms.date: 05/08/2023
+ms.date: 01/30/2024
 ms.author: eur
 ---

-In this quickstart, you run an application for speech to text transcription with real-time diarization. Here, diarization is distinguishing between the different speakers participating in the conversation. The Speech service provides information about which speaker was speaking a particular part of transcribed speech.
+In this quickstart, you run an application for speech to text transcription with real-time diarization. Diarization distinguishes between the different speakers who participate in the conversation. The Speech service provides information about which speaker was speaking a particular part of transcribed speech.

 > [!NOTE]
-> Real-time diarization is currently in public preview.
+> Real-time diarization is currently in public preview.

-The speaker information is included in the result in the speaker ID field. The speaker ID is a generic identifier assigned to each conversation participant by the service during the recognition as different speakers are being identified from the provided audio content.
+The speaker information is included in the result in the speaker ID field. The speaker ID is a generic identifier assigned to each conversation participant by the service during the recognition as different speakers are being identified from the provided audio content.

 > [!TIP]
-> You can try real-time speech to text in [Speech Studio](https://aka.ms/speechstudio/speechtotexttool) without signing up or writing any code. However, the Speech Studio doesn't yet support diarization.
+> You can try real-time speech to text in [Speech Studio](https://aka.ms/speechstudio/speechtotexttool) without signing up or writing any code. However, the Speech Studio doesn't yet support diarization.

articles/ai-services/speech-service/includes/quickstarts/stt-diarization/java.md (25 additions, 16 deletions)

@@ -2,7 +2,7 @@
 author: eric-urban
 ms.service: azure-ai-speech
 ms.topic: include
-ms.date: 7/27/2023
+ms.date: 01/30/2024
 ms.author: eur
 ---

@@ -16,10 +16,12 @@ ms.author: eur

 ## Set up the environment

-Before you can do anything, you need to install the Speech SDK. The sample in this quickstart works with the [Java Runtime](~/articles/cognitive-services/speech-service/quickstarts/setup-platform.md?pivots=programming-language-java&tabs=jre).
+To set up your environment, [install the Speech SDK](~/articles/ai-services/speech-service/quickstarts/setup-platform.md?pivots=programming-language-java&tabs=jre). The sample in this quickstart works with the [Java Runtime](~/articles/cognitive-services/speech-service/quickstarts/setup-platform.md?pivots=programming-language-java&tabs=jre).

 1. Install [Apache Maven](https://maven.apache.org/install.html). Then run `mvn -v` to confirm successful installation.
+
 1. Create a new `pom.xml` file in the root of your project, and copy the following into it:
+
 ```xml
 <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
 <modelVersion>4.0.0</modelVersion>
@@ -48,7 +50,9 @@ Before you can do anything, you need to install the Speech SDK. The sample in th
 </dependencies>
 </project>
 ```
+
 1. Install the Speech SDK and dependencies.
+
 ```console
 mvn clean dependency:copy-dependencies
 ```
@@ -57,11 +61,12 @@ Before you can do anything, you need to install the Speech SDK. The sample in th

 [!INCLUDE [Environment variables](../../common/environment-variables.md)]

-## Diarization from file with conversation transcription
+## Implement diarization from file with conversation transcription

-Follow these steps to create a new console application for conversation transcription.
+Follow these steps to create a console application for conversation transcription.

 1. Create a new file named `ConversationTranscription.java` in the same project root directory.
+
 1. Copy the following code into `ConversationTranscription.java`:

 ```java
@@ -139,24 +144,28 @@ Follow these steps to create a new console application for conversation transcri
 }
 ```

-1. Replace `katiesteve.wav` with the filepath and filename of your `.wav` file. The intent of this quickstart is to recognize speech from multiple participants in the conversation. Your audio file should contain multiple speakers. For example, you can use the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) provided in the Speech SDK samples repository on GitHub.
-> [!NOTE]
-> The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.
-1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).
+1. Get the [sample audio file](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/sampledata/audiofiles/katiesteve.wav) or use your own `.wav` file. Replace `katiesteve.wav` with the path and name of your `.wav` file.

-Run your new console application to start conversation transcription:
+The application recognizes speech from multiple participants in the conversation. Your audio file should contain multiple speakers.

-```console
-javac ConversationTranscription.java -cp ".;target\dependency\*"
-java -cp ".;target\dependency\*" ConversationTranscription
-```
+> [!NOTE]
+> The service performs best with at least 7 seconds of continuous audio from a single speaker. This allows the system to differentiate the speakers properly. Otherwise the Speaker ID is returned as `Unknown`.
+
+1. To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).
+
+1. Run your new console application to start conversation transcription:
+
+```console
+javac ConversationTranscription.java -cp ".;target\dependency\*"
+java -cp ".;target\dependency\*" ConversationTranscription
+```

 > [!IMPORTANT]
-> Make sure that you set the `SPEECH_KEY` and `SPEECH_REGION` environment variables as described [above](#set-environment-variables). If you don't set these variables, the sample will fail with an error message.
+> Make sure that you set the `SPEECH_KEY` and `SPEECH_REGION` [environment variables](#set-environment-variables). If you don't set these variables, the sample fails with an error message.

-The transcribed conversation should be output as text:
+The transcribed conversation should be output as text:

-```console
+```output
 TRANSCRIBED: Text=Good morning, Steve. Speaker ID=Unknown
 TRANSCRIBED: Text=Good morning. Katie. Speaker ID=Unknown
 TRANSCRIBED: Text=Have you tried the latest real time diarization in Microsoft Speech Service which can tell you who said what in real time? Speaker ID=Guest-1
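The docs note that speakers are identified as Guest-1, Guest-2, and so on, and the commit's stated customer intent is improving the readability of multi-person conversations. As a hedged, illustrative sketch of that idea (not part of the quickstart; it assumes transcription turns are already available as (speaker ID, text) pairs), consecutive turns from the same speaker can be merged into readable blocks:

```python
from itertools import groupby

def format_by_speaker(turns):
    """Merge consecutive turns from the same speaker into one block.

    `turns` is a list of (speaker_id, text) pairs in conversation order.
    Returns lines like "Guest-1: Hello. How are you?".
    """
    blocks = []
    for speaker, group in groupby(turns, key=lambda t: t[0]):
        text = " ".join(t[1] for t in group)
        blocks.append(f"{speaker}: {text}")
    return blocks
```

Because `groupby` only merges adjacent items, alternating speakers stay in distinct blocks while back-to-back turns from one speaker collapse into a single line.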

Comments (0)