articles/ai-services/luis/faq.md (0 additions & 4 deletions)
@@ -24,10 +24,6 @@ LUIS has several limit areas. The first is the model limit, which controls inten
 An authoring resource lets you create, manage, train, test, and publish your applications. A prediction resource lets you query your prediction endpoint beyond the 1,000 requests provided by the authoring resource. See [Authoring and query prediction endpoint keys in LUIS](luis-how-to-azure-subscription.md) to learn about the differences between the authoring key and the prediction runtime key.
 
-## Does LUIS support speech to text?
-
-Yes, [Speech](../speech-service/how-to-recognize-intents-from-speech-csharp.md#luis-and-speech) to text is provided as an integration with LUIS.
-
 ## What are Synonyms and word variations?
 
 LUIS has little or no knowledge of the broader _NLP_ aspects, such as semantic similarity, without explicit identification in examples. For example, the following tokens (words) are three different things until they're used in similar contexts in the examples provided:
(The diff moves to a second file here; its name and the hunk header weren't captured.)

 |[Individual utterances + matching transcript](#individual-utterances--matching-transcript)| A collection (.zip) of audio files (.wav) as individual utterances. Each audio file should be 15 seconds or less in length, paired with a formatted transcript (.txt). | Professional recordings with matching transcripts | Ready for training. |
-|[Long audio + transcript](#long-audio--transcript-preview)| A collection (.zip) of long, unsegmented audio files (.wav or .mp3, longer than 20 seconds, at most 1000 audio files), paired with a collection (.zip) of transcripts that contains all spoken words. | You have audio files and matching transcripts, but they aren't segmented into utterances. | Segmentation (using batch transcription).<br>Audio format transformation wherever required. |
-|[Audio only (Preview)](#audio-only-preview)| A collection (.zip) of audio files (.wav or .mp3, at most 1000 audio files) without a transcript. | You only have audio files available, without transcripts. | Segmentation + transcript generation (using batch transcription).<br>Audio format transformation wherever required.|
+|[Long audio + transcript](#long-audio--transcript-preview)| A collection (.zip) of long, unsegmented audio files (.wav or .mp3, longer than 20 seconds, at most 1,000 audio files), paired with a collection (.zip) of transcripts that contains all spoken words. | You have audio files and matching transcripts, but they aren't segmented into utterances. | Segmentation (using batch transcription).<br>Audio format transformation wherever required. |
+|[Audio only (Preview)](#audio-only-preview)| A collection (.zip) of audio files (.wav or .mp3, at most 1,000 audio files) without a transcript. | You only have audio files available, without transcripts. | Segmentation + transcript generation (using batch transcription).<br>Audio format transformation wherever required.|
 
 Files should be grouped by type into a dataset and uploaded as a zip file. Each dataset can only contain a single data type.
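As a hedged illustration of that grouping rule, here is a minimal C# sketch that packages one dataset per data type; the folder and archive names are assumptions for the example, not names the service requires:

```csharp
using System.IO.Compression;

class DatasetPackager
{
    static void Main()
    {
        // One data type per dataset: zip the audio files and the
        // transcripts separately rather than mixing types in one archive.
        // Folder and file names here are illustrative only.
        ZipFile.CreateFromDirectory("long-audio", "long-audio.zip");
        ZipFile.CreateFromDirectory("transcripts", "transcripts.zip");
    }
}
```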
@@ -107,12 +108,12 @@ Follow these guidelines when preparing audio for segmentation.
 | Sample format |RIFF(.wav): PCM, at least 16-bit.<br/><br/>mp3: At least 256 KBps bit rate.|
 | Audio length | Longer than 20 seconds |
 | Archive format | .zip |
-| Maximum archive size | 2048 MB, at most 1000 audio files included |
+| Maximum archive size | 2048 MB, at most 1,000 audio files included |
 
 > [!NOTE]
 > The default sampling rate for a custom neural voice is 24,000 Hz. Audio files with a sampling rate lower than 16,000 Hz will be rejected. Your audio files with a sampling rate higher than 16,000 Hz and lower than 24,000 Hz will be up-sampled to 24,000 Hz to train a neural voice. It's recommended that you should use a sample rate of 24,000 Hz for your training data.
 
-All audio files should be grouped into a zip file. It's OK to put .wav files and .mp3 files into the same zip file. For example, you can upload a 45second audio file named 'kingstory.wav' and a 200second long audio file named 'queenstory.mp3' in the same zip file. All .mp3 files will be transformed into the .wav format after processing.
+All audio files should be grouped into a zip file. It's OK to put .wav files and .mp3 files into the same zip file. For example, you can upload a 45-second audio file named 'kingstory.wav' and a 200-second long audio file named 'queenstory.mp3' in the same zip file. All .mp3 files will be transformed into the .wav format after processing.
 
 ### Transcription data for Long audio + transcript
@@ -126,7 +127,7 @@ Transcripts must be prepared to the specifications listed in this table. Each au
 | # of utterances per line | No limit |
 | Maximum file size | 2048 MB |
 
-All transcripts files in this data type should be grouped into a zip file. For example, you might upload a 45second audio file named 'kingstory.wav' and a 200second long audio file named 'queenstory.mp3' in the same zip file. You need to upload another zip file containing the corresponding two transcripts--one named 'kingstory.txt' and the other one named 'queenstory.txt'. Within each plain text file, you provide the full correct transcription for the matching audio.
+All transcripts files in this data type should be grouped into a zip file. For example, you might upload a 45-second audio file named 'kingstory.wav' and a 200-second long audio file named 'queenstory.mp3' in the same zip file. You need to upload another zip file containing the corresponding two transcripts--one named 'kingstory.txt' and the other one named 'queenstory.txt'. Within each plain text file, you provide the full correct transcription for the matching audio.
 
 After your dataset is successfully uploaded, we'll help you segment the audio file into utterances based on the transcript provided. You can check the segmented utterances and the matching transcripts by downloading the dataset. Unique IDs are assigned to the segmented utterances automatically. It's important that you make sure the transcripts you provide are 100% accurate. Errors in the transcripts can reduce the accuracy during the audio segmentation and further introduce quality loss in the training phase that comes later.
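Since every audio file needs a transcript with the same base name, a quick pre-upload check can catch mismatches. A minimal sketch, assuming the two datasets sit unzipped in local folders named `long-audio` and `transcripts` (hypothetical names):

```csharp
using System;
using System.IO;
using System.Linq;

class TranscriptPairingCheck
{
    static void Main()
    {
        // Hypothetical local folders holding the unzipped datasets.
        var audioFiles = Directory.EnumerateFiles("long-audio")
            .Where(f => f.EndsWith(".wav") || f.EndsWith(".mp3"));

        foreach (string audio in audioFiles)
        {
            // e.g. kingstory.wav must be matched by kingstory.txt.
            string name = Path.GetFileNameWithoutExtension(audio);
            string transcript = Path.Combine("transcripts", name + ".txt");
            if (!File.Exists(transcript))
                Console.WriteLine($"Missing transcript: {transcript}");
        }
    }
}
```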
@@ -150,7 +151,7 @@ Follow these guidelines when preparing audio.
 | Sample format |RIFF(.wav): PCM, at least 16-bit<br>mp3: At least 256 KBps bit rate.|
 | Audio length | No limit |
 | Archive format | .zip |
-| Maximum archive size | 2048 MB, at most 1000 audio files included |
+| Maximum archive size | 2048 MB, at most 1,000 audio files included |
 
 > [!NOTE]
 > The default sampling rate for a custom neural voice is 24,000 Hz. Your audio files with a sampling rate higher than 16,000 Hz and lower than 24,000 Hz will be up-sampled to 24,000 Hz to train a neural voice. It's recommended that you should use a sample rate of 24,000 Hz for your training data.
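Given those sampling-rate rules, it can be worth checking .wav files before upload. A hedged sketch that reads the sample-rate field of a canonical RIFF/WAV header (it assumes the standard layout with the rate at byte offset 24; unusual WAV variants need a real parser):

```csharp
using System;
using System.IO;

class WavSampleRateCheck
{
    static void Main(string[] args)
    {
        foreach (string path in args)
        {
            using var reader = new BinaryReader(File.OpenRead(path));
            // In a canonical RIFF/WAV header, the sample rate is a
            // little-endian uint32 at byte offset 24.
            reader.BaseStream.Seek(24, SeekOrigin.Begin);
            uint rate = reader.ReadUInt32();

            string verdict = rate < 16000 ? "would be rejected"
                : rate < 24000 ? "would be up-sampled to 24,000 Hz"
                : "meets the recommended 24,000 Hz";
            Console.WriteLine($"{path}: {rate} Hz ({verdict})");
        }
    }
}
```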
articles/ai-services/speech-service/how-to-get-speech-session-id.md (6 additions & 4 deletions)
@@ -2,12 +2,14 @@
 title: How to get speech to text session ID and transcription ID
 titleSuffix: Azure AI services
 description: Learn how to get speech to text session ID and transcription ID
-author: alexeyo26
+author: eric-urban
+ms.author: eur
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 1/21/2024
-ms.author: alexeyo
+ms.date: 9/20/2024
+ms.reviewer: alexeyo
+#Customer intent: As a developer, I need to know how to get the session ID and transcription ID for speech to text so that I can debug issues with my application.
 ---
 
 # How to get speech to text session ID and transcription ID
@@ -68,7 +70,7 @@ spx help translate log
 
 Unlike Speech SDK, [Speech to text REST API for short audio](rest-speech-to-text-short.md) doesn't automatically generate a Session ID. You need to generate it yourself and provide it within the REST request.
 
-Generate a GUID inside your code or using any standard tool. Use the GUID value *without dashes or other dividers*. As an example we'll use `9f4ffa5113a846eba289aa98b28e766f`.
+Generate a GUID inside your code or using any standard tool. Use the GUID value *without dashes or other dividers*. As an example we use `9f4ffa5113a846eba289aa98b28e766f`.
 
 As a part of your REST request use `X-ConnectionId=<GUID>` expression. For our example, a sample request looks like this:
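As a sketch of that GUID step in C#: the `"N"` format string gives the dashless form; the region, path, and key below are placeholders for your own values, not prescribed ones:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class ShortAudioSessionId
{
    static async Task Main()
    {
        // "N" formats a GUID as 32 hex digits with no dashes,
        // e.g. 9f4ffa5113a846eba289aa98b28e766f.
        string sessionId = Guid.NewGuid().ToString("N");

        // Region and key are placeholders.
        string uri = "https://westus.stt.speech.microsoft.com/speech/recognition/"
                   + "conversation/cognitiveservices/v1?language=en-US"
                   + $"&X-ConnectionId={sessionId}";

        using var client = new HttpClient();
        using var request = new HttpRequestMessage(HttpMethod.Post, uri);
        request.Headers.Add("Ocp-Apim-Subscription-Key", "<your-speech-key>");
        // Attach the audio body here before sending, then keep sessionId
        // so you can quote it when troubleshooting.
        HttpResponseMessage response = await client.SendAsync(request);
        Console.WriteLine($"X-ConnectionId: {sessionId}, status: {response.StatusCode}");
    }
}
```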
(The diff moves to another file here; its name and the preceding front-matter rows weren't captured.)

 #Customer intent: As a developer, I need to know how to lower speech synthesis latency using Speech SDK so that I can improve the performance of my application.
 ---
 
 # Lower speech synthesis latency using Speech SDK
 
-The synthesis latency is critical to your applications.
-In this article, we'll introduce the best practices to lower the latency and bring the best performance to your end users.
+In this article, we introduce the best practices to lower the text to speech synthesis latency and bring the best performance to your end users.
 
 Normally, we measure the latency by `first byte latency` and `finish latency`, as follows:
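Roughly, first byte latency is the time from the start of synthesis until the first audio chunk arrives, and finish latency runs until the full audio is available. A hedged C# sketch for observing both with the Speech SDK, timed with a local stopwatch (`<key>` and `<region>` are placeholders; with the default speaker output the finish number also includes playback):

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class LatencyProbe
{
    static async Task Main()
    {
        var config = SpeechConfig.FromSubscription("<key>", "<region>");
        using var synthesizer = new SpeechSynthesizer(config);

        var stopwatch = new Stopwatch();
        long firstByteMs = -1;

        // Synthesizing fires as audio chunks stream back; the first
        // callback approximates the first byte latency.
        synthesizer.Synthesizing += (_, _) =>
        {
            if (firstByteMs < 0) firstByteMs = stopwatch.ElapsedMilliseconds;
        };

        stopwatch.Start();
        await synthesizer.SpeakTextAsync("Hello, world.");
        Console.WriteLine($"first byte latency: ~{firstByteMs} ms");
        Console.WriteLine($"finish latency:     ~{stopwatch.ElapsedMilliseconds} ms");
    }
}
```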
(hunk header not captured)
-> If the synthesize text is available, just call `SpeakTextAsync` to synthesize the audio. The SDK will handle the connection.
+> If the text is available, just call `SpeakTextAsync` to synthesize the audio. The SDK will handle the connection.
 ### Reuse SpeechSynthesizer
 
 Another way to reduce the connection latency is to reuse the `SpeechSynthesizer` so you don't need to create a new `SpeechSynthesizer` for each synthesis.
-We recommend using object pool in service scenario, see our sample code for [C#](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_synthesis_server_scenario_sample.cs) and [Java](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/java/jre/console/src/com/microsoft/cognitiveservices/speech/samples/console/SpeechSynthesisScenarioSamples.java).
+We recommend using object pool in service scenario. See our sample code for [C#](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_synthesis_server_scenario_sample.cs) and [Java](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/java/jre/console/src/com/microsoft/cognitiveservices/speech/samples/console/SpeechSynthesisScenarioSamples.java).
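The linked samples implement a full production pool; the core idea fits in a few lines. A minimal sketch (this `SynthesizerPool` class is illustrative, not an API from those samples):

```csharp
using System.Collections.Concurrent;
using Microsoft.CognitiveServices.Speech;

// Rent a free SpeechSynthesizer when one exists, create one otherwise,
// and return instances after use instead of disposing them, so later
// requests reuse an already-established connection.
class SynthesizerPool
{
    private readonly SpeechConfig _config;
    private readonly ConcurrentBag<SpeechSynthesizer> _pool = new();

    public SynthesizerPool(SpeechConfig config) => _config = config;

    public SpeechSynthesizer Rent() =>
        _pool.TryTake(out var synthesizer) ? synthesizer : new SpeechSynthesizer(_config);

    public void Return(SpeechSynthesizer synthesizer) => _pool.Add(synthesizer);
}
```

A request handler would then `Rent()` a synthesizer, call `SpeakTextAsync`, and `Return()` the instance rather than disposing it.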
 ## Transmit compressed audio over the network

@@ -313,10 +313,10 @@ Meanwhile, a compressed audio format helps to save the users' network bandwidth,
 
 We support many compressed formats including `opus`, `webm`, `mp3`, `silk`, and so on, see the full list in [SpeechSynthesisOutputFormat](/cpp/cognitive-services/speech/microsoft-cognitiveservices-speech-namespace#speechsynthesisoutputformat).
 For example, the bitrate of `Riff24Khz16BitMonoPcm` format is 384 kbps, while `Audio24Khz48KBitRateMonoMp3` only costs 48 kbps.
-Our Speech SDK will automatically use a compressed format for transmission when a `pcm` output format is set.
+The Speech SDK automatically uses a compressed format for transmission when a `pcm` output format is set.
 For Linux and Windows, `GStreamer` is required to enable this feature.
 Refer [this instruction](how-to-use-codec-compressed-audio-input-streams.md) to install and configure `GStreamer` for Speech SDK.
-For Android, iOS and macOS, no extra configuration is needed starting version 1.20.
+For Android, iOS, and macOS, no extra configuration is needed starting version 1.20.
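The 384 kbps figure follows directly from the format: 24,000 samples/s × 16 bits × 1 channel = 384,000 bits/s. If you want a compressed result stream explicitly rather than relying on the SDK's transport-level compression, selecting the mp3 format looks like this in C# (key and region are placeholders):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class CompressedOutput
{
    static async Task Main()
    {
        var config = SpeechConfig.FromSubscription("<key>", "<region>");
        // 48 kbps mp3 instead of 384 kbps raw PCM at the same 24 kHz rate.
        config.SetSpeechSynthesisOutputFormat(
            SpeechSynthesisOutputFormat.Audio24Khz48KBitRateMonoMp3);

        // A null AudioConfig keeps the mp3 bytes in result.AudioData
        // instead of rendering them to the default speaker.
        using var synthesizer = new SpeechSynthesizer(config, null as AudioConfig);
        var result = await synthesizer.SpeakTextAsync("Hello, world.");
        Console.WriteLine($"mp3 bytes: {result.AudioData.Length}");
    }
}
```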
articles/ai-services/speech-service/how-to-migrate-to-custom-neural-voice.md (2 additions & 1 deletion)
@@ -7,8 +7,9 @@ ms.author: eur
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 1/21/2024
+ms.date: 9/20/2024
 ms.reviewer: v-baolianzou
+#Customer intent: As a developer, I need to know how to migrate from custom voice to custom neural voice so that I can use the latest technology in my applications.
 ---
 
 # Migrate from custom voice to custom neural voice
articles/ai-services/speech-service/how-to-migrate-to-prebuilt-neural-voice.md (2 additions & 1 deletion)
@@ -6,8 +6,9 @@ author: eric-urban
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 1/21/2024
+ms.date: 9/20/2024
 ms.author: eur
+#Customer intent: As a developer, I need to know how to migrate from prebuilt standard voice to prebuilt neural voice so that I can use the latest technology in my applications.
 ---
 
 # Migrate from prebuilt standard voice to prebuilt neural voice
# Migrate from prebuilt standard voice to prebuilt neural voice
0 commit comments