
Commit 8faf138

Merge pull request #202590 from eric-urban/eur/samples-repo-root
user story 1907156
2 parents b8b3eb2 + 5145569 commit 8faf138

File tree

6 files changed: +572 -56 lines changed


articles/cognitive-services/Speech-Service/includes/common/rest.md

Lines changed: 1 addition & 1 deletion

@@ -7,4 +7,4 @@ ms.topic: include
 ms.author: eur
 ---

-[Speech-to-text REST API v3.0 reference](https://westus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-0) | [Speech-to-text REST API for short audio reference](../../rest-speech-to-text-short.md) | [Additional Samples on GitHub](https://github.com/Azure-Samples/cognitive-services-quickstart-code)
+[Speech-to-text REST API v3.0 reference](https://westus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-0) | [Speech-to-text REST API for short audio reference](../../rest-speech-to-text-short.md) | [Additional Samples on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk)

articles/cognitive-services/Speech-Service/includes/how-to/recognize-speech/rest.md

Lines changed: 17 additions & 5 deletions

@@ -17,10 +17,22 @@ At a command prompt, run the following command. Insert the following values into
 - Your Speech service region.
 - The path for input audio files. You can generate audio files by using [text-to-speech](../../../get-started-text-to-speech.md).

-:::code language="curl" source="~/cognitive-services-quickstart-code/curl/speech/speech-to-text.sh" id="request":::
-
-You should receive a response like the following one:
-
-:::code language="curl" source="~/cognitive-services-quickstart-code/curl/speech/speech-to-text.sh" id="response":::
+```curl
+curl --location --request POST 'https://INSERT_REGION_HERE.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US' \
+--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
+--header 'Content-Type: audio/wav' \
+--data-binary @'INSERT_AUDIO_FILE_PATH_HERE'
+```
+
+You should receive a response with a JSON body like the following one:
+
+```json
+{
+    "RecognitionStatus": "Success",
+    "DisplayText": "My voice is my passport, verify me.",
+    "Offset": 6600000,
+    "Duration": 32100000
+}
+```

 For more information, see the [speech-to-text REST API reference](../../../rest-speech-to-text.md).
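
The `Offset` and `Duration` values in this response are expressed in 100-nanosecond ticks (10,000,000 ticks per second, matching the `ticks_per_second` constant in the speaker-recognition quickstart later in this commit). A minimal sketch of the conversion:

```cpp
#include <iostream>

int main()
{
    // Offset and Duration in the JSON response are 100-nanosecond ticks.
    auto ticks_per_second = 10000000.0;
    std::cout << "Offset: " << 6600000 / ticks_per_second << " seconds\n";    // 0.66 seconds
    std::cout << "Duration: " << 32100000 / ticks_per_second << " seconds\n"; // 3.21 seconds
}
```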

articles/cognitive-services/Speech-Service/includes/how-to/speech-synthesis/rest.md

Lines changed: 18 additions & 4 deletions

@@ -25,17 +25,31 @@ You might also want to change the following values:
 - The output voice. To get a list of voices available for your Speech service endpoint, see the next section.
 - The output file. In this example, we direct the response from the server into a file named *output.mp3*.

-:::code language="curl" source="~/cognitive-services-quickstart-code/curl/speech/text-to-speech.sh":::
+```curl
+curl --location --request POST 'https://INSERT_REGION_HERE.tts.speech.microsoft.com/cognitiveservices/v1' \
+--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
+--header 'Content-Type: application/ssml+xml' \
+--header 'X-Microsoft-OutputFormat: audio-16khz-128kbitrate-mono-mp3' \
+--header 'User-Agent: curl' \
+--data-raw '<speak version='\''1.0'\'' xml:lang='\''en-US'\''>
+    <voice xml:lang='\''en-US'\'' xml:gender='\''Female'\'' name='\''en-US-JennyNeural'\''>
+        my voice is my passport verify me
+    </voice>
+</speak>' > output.mp3
+```

 ## List available voices for your Speech service endpoint

 To list the available voices for your Speech service endpoint, run the following command:

-:::code language="curl" source="~/cognitive-services-quickstart-code/curl/speech/get-voices.sh" id="request":::
+```curl
+curl --location --request GET 'https://INSERT_ENDPOINT_HERE.tts.speech.microsoft.com/cognitiveservices/voices/list' \
+--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE'
+```

-You should receive a response like the following one:
+You should receive a response with a JSON body like the following one:

-```http
+```json
 [
     {
         "Name": "Microsoft Server Speech Text to Speech Voice (en-US, ChristopherNeural)",

articles/cognitive-services/Speech-Service/includes/quickstarts/speaker-recognition-basics/cpp.md

Lines changed: 142 additions & 14 deletions

@@ -27,13 +27,34 @@ Before you start, you must install the Speech SDK. Depending on your platform, u

 To run the examples in this article, add the following statements at the top of your .cpp file:

-:::code language="cpp" source="~/cognitive-services-quickstart-code/cpp/speech/speaker-recognition.cpp" id="dependencies":::
+```cpp
+#include <iostream>
+#include <stdexcept>
+// Note: Install the NuGet package Microsoft.CognitiveServices.Speech.
+#include <speechapi_cxx.h>
+
+using namespace std;
+using namespace Microsoft::CognitiveServices::Speech;
+
+// Note: Change the locale if desired.
+auto profile_locale = "en-us";
+auto audio_config = Audio::AudioConfig::FromDefaultMicrophoneInput();
+auto ticks_per_second = 10000000;
+```

 ## Create a speech configuration

 To call the Speech service by using the Speech SDK, create a [`SpeechConfig`](/cpp/cognitive-services/speech/speechconfig) class. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

-:::code language="cpp" source="~/cognitive-services-quickstart-code/cpp/speech/speaker-recognition.cpp" id="get_speech_config":::
+```cpp
+shared_ptr<SpeechConfig> GetSpeechConfig()
+{
+    auto subscription_key = "PASTE_YOUR_SPEECH_SUBSCRIPTION_KEY_HERE";
+    auto region = "PASTE_YOUR_SPEECH_ENDPOINT_REGION_HERE";
+    auto config = SpeechConfig::FromSubscription(subscription_key, region);
+    return config;
+}
+```

 ## Text-dependent verification

@@ -43,7 +64,19 @@ Speaker verification is the act of confirming that a speaker matches a known, or

 Start by creating the `TextDependentVerification` function:

-:::code language="cpp" source="~/cognitive-services-quickstart-code/cpp/speech/speaker-recognition.cpp" id="text_dependent_verification":::
+```cpp
+void TextDependentVerification(shared_ptr<VoiceProfileClient> client, shared_ptr<SpeakerRecognizer> recognizer)
+{
+    std::cout << "Text Dependent Verification:\n\n";
+    // Create the profile.
+    auto profile = client->CreateProfileAsync(VoiceProfileType::TextDependentVerification, profile_locale).get();
+    std::cout << "Created profile ID: " << profile->GetId() << "\n";
+    AddEnrollmentsToTextDependentProfile(client, profile);
+    SpeakerVerify(profile, recognizer);
+    // Delete the profile.
+    client->DeleteProfileAsync(profile);
+}
+```

 This function creates a [VoiceProfile](/cpp/cognitive-services/speech/speaker-voiceprofile) object with the [CreateProfileAsync](/cpp/cognitive-services/speech/speaker-voiceprofileclient#createprofileasync) method. There are three [types](/cpp/cognitive-services/speech/microsoft-cognitiveservices-speech-namespace#enum-voiceprofiletype) of `VoiceProfile`:

@@ -59,15 +92,44 @@ You then call two helper functions that you'll define next, `AddEnrollmentsToTex

 Define the following function to enroll a voice profile:

-:::code language="cpp" source="~/cognitive-services-quickstart-code/cpp/speech/speaker-recognition.cpp" id="add_enrollments_dependent":::
+```cpp
+void AddEnrollmentsToTextDependentProfile(shared_ptr<VoiceProfileClient> client, shared_ptr<VoiceProfile> profile)
+{
+    shared_ptr<VoiceProfileEnrollmentResult> enroll_result = nullptr;
+    auto phraseResult = client->GetActivationPhrasesAsync(profile->GetType(), profile_locale).get();
+    auto phrases = phraseResult->GetPhrases();
+    while (enroll_result == nullptr || enroll_result->GetEnrollmentInfo(EnrollmentInfoType::RemainingEnrollmentsCount) > 0)
+    {
+        if (phrases != nullptr && phrases->size() > 0)
+        {
+            std::cout << "Please say the passphrase, \"" << phrases->at(0) << "\"\n";
+            enroll_result = client->EnrollProfileAsync(profile, audio_config).get();
+            std::cout << "Remaining enrollments needed: " << enroll_result->GetEnrollmentInfo(EnrollmentInfoType::RemainingEnrollmentsCount) << ".\n";
+        }
+        else
+        {
+            std::cout << "No passphrases received, enrollment not attempted.\n\n";
+        }
+    }
+    std::cout << "Enrollment completed.\n\n";
+}
+```

 In this function, you enroll audio samples in a `while` loop that tracks the number of samples that are still required for enrollment. In each iteration, [EnrollProfileAsync](/cpp/cognitive-services/speech/speaker-voiceprofileclient#enrollprofileasync) prompts you to speak the passphrase into your microphone, and it adds the sample to the voice profile.

 ### SpeakerVerify function

 Define `SpeakerVerify` as follows:

-:::code language="cpp" source="~/cognitive-services-quickstart-code/cpp/speech/speaker-recognition.cpp" id="speaker_verify":::
+```cpp
+void SpeakerVerify(shared_ptr<VoiceProfile> profile, shared_ptr<SpeakerRecognizer> recognizer)
+{
+    shared_ptr<SpeakerVerificationModel> model = SpeakerVerificationModel::FromProfile(profile);
+    std::cout << "Speak the passphrase to verify: \"My voice is my passport, verify me.\"\n";
+    shared_ptr<SpeakerRecognitionResult> result = recognizer->RecognizeOnceAsync(model).get();
+    std::cout << "Verified voice profile for speaker: " << result->ProfileId << ". Score is: " << result->GetScore() << ".\n\n";
+}
+```

 In this function, you create a [SpeakerVerificationModel](/cpp/cognitive-services/speech/speaker-speakerverificationmodel) object with the [SpeakerVerificationModel::FromProfile](/cpp/cognitive-services/speech/speaker-speakerverificationmodel#fromprofile) method, passing in the [VoiceProfile](/cpp/cognitive-services/speech/speaker-voiceprofile) object you created earlier.

@@ -81,7 +143,19 @@ In contrast to *text-dependent* verification, *text-independent* verification do

 Start by creating the `TextIndependentVerification` function:

-:::code language="cpp" source="~/cognitive-services-quickstart-code/cpp/speech/speaker-recognition.cpp" id="text_independent_verification":::
+```cpp
+void TextIndependentVerification(shared_ptr<VoiceProfileClient> client, shared_ptr<SpeakerRecognizer> recognizer)
+{
+    std::cout << "Text Independent Verification:\n\n";
+    // Create the profile.
+    auto profile = client->CreateProfileAsync(VoiceProfileType::TextIndependentVerification, profile_locale).get();
+    std::cout << "Created profile ID: " << profile->GetId() << "\n";
+    AddEnrollmentsToTextIndependentProfile(client, profile);
+    SpeakerVerify(profile, recognizer);
+    // Delete the profile.
+    client->DeleteProfileAsync(profile);
+}
+```

 Like the `TextDependentVerification` function, this function creates a [VoiceProfile](/cpp/cognitive-services/speech/speaker-voiceprofile) object with the [CreateProfileAsync](/cpp/cognitive-services/speech/speaker-voiceprofileclient#createprofileasync) method.

@@ -93,7 +167,28 @@ You then call two helper functions: `AddEnrollmentsToTextIndependentProfile`, wh

 Define the following function to enroll a voice profile:

-:::code language="cpp" source="~/cognitive-services-quickstart-code/cpp/speech/speaker-recognition.cpp" id="add_enrollments_independent":::
+```cpp
+void AddEnrollmentsToTextIndependentProfile(shared_ptr<VoiceProfileClient> client, shared_ptr<VoiceProfile> profile)
+{
+    shared_ptr<VoiceProfileEnrollmentResult> enroll_result = nullptr;
+    auto phraseResult = client->GetActivationPhrasesAsync(profile->GetType(), profile_locale).get();
+    auto phrases = phraseResult->GetPhrases();
+    while (enroll_result == nullptr || enroll_result->GetEnrollmentInfo(EnrollmentInfoType::RemainingEnrollmentsSpeechLength) > 0)
+    {
+        if (phrases != nullptr && phrases->size() > 0)
+        {
+            std::cout << "Please say the activation phrase, \"" << phrases->at(0) << "\"\n";
+            enroll_result = client->EnrollProfileAsync(profile, audio_config).get();
+            std::cout << "Remaining audio time needed: " << enroll_result->GetEnrollmentInfo(EnrollmentInfoType::RemainingEnrollmentsSpeechLength) / ticks_per_second << " seconds.\n";
+        }
+        else
+        {
+            std::cout << "No activation phrases received, enrollment not attempted.\n\n";
+        }
+    }
+    std::cout << "Enrollment completed.\n\n";
+}
+```

 In this function, you enroll audio samples in a `while` loop that tracks the number of seconds of audio still required for enrollment. In each iteration, [EnrollProfileAsync](/cpp/cognitive-services/speech/speaker-voiceprofileclient#enrollprofileasync) prompts you to speak into your microphone, and it adds the sample to the voice profile.

@@ -105,7 +200,19 @@ Speaker identification is used to determine *who* is speaking from a given group

 Start by creating the `TextIndependentIdentification` function:

-:::code language="cpp" source="~/cognitive-services-quickstart-code/cpp/speech/speaker-recognition.cpp" id="text_independent_indentification":::
+```cpp
+void TextIndependentIdentification(shared_ptr<VoiceProfileClient> client, shared_ptr<SpeakerRecognizer> recognizer)
+{
+    std::cout << "Speaker Identification:\n\n";
+    // Create the profile.
+    auto profile = client->CreateProfileAsync(VoiceProfileType::TextIndependentIdentification, profile_locale).get();
+    std::cout << "Created profile ID: " << profile->GetId() << "\n";
+    AddEnrollmentsToTextIndependentProfile(client, profile);
+    SpeakerIdentify(profile, recognizer);
+    // Delete the profile.
+    client->DeleteProfileAsync(profile);
+}
+```

 Like the `TextDependentVerification` and `TextIndependentVerification` functions, this function creates a [VoiceProfile](/cpp/cognitive-services/speech/speaker-voiceprofile) object with the [CreateProfileAsync](/cpp/cognitive-services/speech/speaker-voiceprofileclient#createprofileasync) method.

@@ -117,7 +224,16 @@ You then call two helper functions: `AddEnrollmentsToTextIndependentProfile`, wh

 Define the `SpeakerIdentify` function as follows:

-:::code language="cpp" source="~/cognitive-services-quickstart-code/cpp/speech/speaker-recognition.cpp" id="speaker_identify":::
+```cpp
+void SpeakerIdentify(shared_ptr<VoiceProfile> profile, shared_ptr<SpeakerRecognizer> recognizer)
+{
+    shared_ptr<SpeakerIdentificationModel> model = SpeakerIdentificationModel::FromProfiles({ profile });
+    // Note: We need at least four seconds of audio after pauses are subtracted.
+    std::cout << "Please speak for at least ten seconds to identify who it is from your list of enrolled speakers.\n";
+    shared_ptr<SpeakerRecognitionResult> result = recognizer->RecognizeOnceAsync(model).get();
+    std::cout << "The most similar voice profile is: " << result->ProfileId << " with similarity score: " << result->GetScore() << ".\n\n";
+}
+```

 In this function, you create a [SpeakerIdentificationModel](/cpp/cognitive-services/speech/speaker-speakeridentificationmodel) object with the [SpeakerIdentificationModel::FromProfiles](/cpp/cognitive-services/speech/speaker-speakeridentificationmodel#fromprofiles) method. `SpeakerIdentificationModel::FromProfiles` accepts a list of [VoiceProfile](/cpp/cognitive-services/speech/speaker-voiceprofile) objects. In this case, you pass in the `VoiceProfile` object you created earlier. If you want, you can pass in multiple `VoiceProfile` objects, each enrolled with audio samples from a different voice.
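
For example, a minimal sketch of identification across several speakers, where `profile_alice` and `profile_bob` are hypothetical profiles, each created and enrolled as shown earlier, and `recognizer` is the `SpeakerRecognizer` from this quickstart:

```cpp
// Hypothetical sketch: identify the speaker among multiple enrolled profiles.
auto model = SpeakerIdentificationModel::FromProfiles({ profile_alice, profile_bob });
auto result = recognizer->RecognizeOnceAsync(model).get();
std::cout << "Closest match: " << result->ProfileId << "\n";
```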
123239
@@ -127,11 +243,23 @@ Next, [SpeechRecognizer::RecognizeOnceAsync](/cpp/cognitive-services/speech/spee

 Finally, define the `main` function as follows:

-:::code language="cpp" source="~/cognitive-services-quickstart-code/cpp/speech/speaker-recognition.cpp" id="main":::
+```cpp
+int main()
+{
+    auto speech_config = GetSpeechConfig();
+    auto client = VoiceProfileClient::FromConfig(speech_config);
+    auto recognizer = SpeakerRecognizer::FromConfig(speech_config, audio_config);
+    TextDependentVerification(client, recognizer);
+    TextIndependentVerification(client, recognizer);
+    TextIndependentIdentification(client, recognizer);
+    std::cout << "End of quickstart.\n";
+}
+```

 This function calls the functions you defined previously. First, it creates a [VoiceProfileClient](/cpp/cognitive-services/speech/speaker-voiceprofileclient) object and a [SpeakerRecognizer](/cpp/cognitive-services/speech/speaker-speakerrecognizer) object.

-```
+
+```cpp
 auto speech_config = GetSpeechConfig();
 auto client = VoiceProfileClient::FromConfig(speech_config);
 auto recognizer = SpeakerRecognizer::FromConfig(speech_config, audio_config);
@@ -143,14 +271,14 @@ The `VoiceProfileClient` object is used to create, enroll, and delete voice prof

 The examples in this article use the default device microphone as input for audio samples. In scenarios where you need to use audio files instead of microphone input, change the following line:

-```
+```cpp
 auto audio_config = Audio::AudioConfig::FromDefaultMicrophoneInput();
 ```

 to:

-```
-auto audio_config = Audio::AudioConfig::FromWavFileInput(path/to/your/file.wav);
+```cpp
+auto audio_config = Audio::AudioConfig::FromWavFileInput("path/to/your/file.wav");
 ```

 Or replace any use of `audio_config` with [Audio::AudioConfig::FromWavFileInput](/cpp/cognitive-services/speech/audio-audioconfig#fromwavfileinput). You can also have mixed inputs by using a microphone for enrollment and files for verification, for example.
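
A minimal sketch of such a mixed setup, reusing the `client`, `profile`, and `speech_config` objects from the examples above ("verification-sample.wav" is a placeholder path):

```cpp
// Hypothetical sketch: enroll from the default microphone, then verify from a file.
auto mic_config = Audio::AudioConfig::FromDefaultMicrophoneInput();
auto file_config = Audio::AudioConfig::FromWavFileInput("verification-sample.wav");

auto enroll_result = client->EnrollProfileAsync(profile, mic_config).get();
auto recognizer = SpeakerRecognizer::FromConfig(speech_config, file_config);
```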
