articles/cognitive-services/Speech-Service/audio-processing-overview.md (1 addition, 1 deletion)

@@ -36,7 +36,7 @@ The Microsoft Audio Stack also powers a wide range of Microsoft products:

## Speech SDK integration

The Speech SDK integrates Microsoft Audio Stack (MAS), allowing any application or product to use its audio processing capabilities on input audio. Some of the key Microsoft Audio Stack features available via the Speech SDK include:

-* **Realtime microphone input & file input** - Microsoft Audio Stack processing can be applied to real-time microphone input, streams, and file-based input.
+* **Real-time microphone input & file input** - Microsoft Audio Stack processing can be applied to real-time microphone input, streams, and file-based input.

* **Selection of enhancements** - To allow for full control of your scenario, the SDK allows you to disable individual enhancements like dereverberation, noise suppression, automatic gain control, and acoustic echo cancellation. For example, if your scenario does not include rendering output audio that needs to be suppressed from the input audio, you have the option to disable acoustic echo cancellation.
* **Custom microphone geometries** - The SDK allows you to provide your own custom microphone geometry information, in addition to supporting preset geometries like linear two-mic, linear four-mic, and circular 7-mic arrays (see more information on supported preset geometries at [Microphone array recommendations](speech-sdk-microphone.md#microphone-geometry)).
* **Beamforming angles** - Specific beamforming angles can be provided to optimize audio input originating from a predetermined location, relative to the microphones.
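As a sketch of the feature this file documents, here is roughly how MAS processing could be enabled on microphone input from the Python Speech SDK. The `AudioProcessingOptions` and `AudioProcessingConstants` names mirror the C# API described in these docs; the exact Python signatures, including the `audio_processing_options` parameter, are assumptions, so check the SDK reference for your version.

```python
import azure.cognitiveservices.speech as speechsdk

# Assumption: the Python SDK exposes MAS options under speechsdk.audio with the
# same names as the C# API (AudioProcessingOptions, AudioProcessingConstants).
mas_options = speechsdk.audio.AudioProcessingOptions(
    speechsdk.audio.AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT)

# Apply MAS processing to real-time microphone input (streams and files work too).
# Assumption: AudioConfig accepts an audio_processing_options keyword.
audio_config = speechsdk.audio.AudioConfig(
    use_default_microphone=True, audio_processing_options=mas_options)

# Placeholders: substitute your own key and region.
speech_config = speechsdk.SpeechConfig(subscription="<your-speech-key>", region="<your-region>")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
print(recognizer.recognize_once().text)
```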
articles/cognitive-services/Speech-Service/batch-synthesis.md (1 addition, 1 deletion)

@@ -19,7 +19,7 @@ The Batch synthesis API (Preview) can synthesize a large volume of text input (l

> [!IMPORTANT]
> The Batch synthesis API is currently in public preview. Once it's generally available, the Long Audio API will be deprecated. For more information, see [Migrate to batch synthesis API](migrate-to-batch-synthesis.md).

-The batch synthesis API is asynchronous and doesn't return synthesized audio in realtime. You submit text files to be synthesized, poll for the status, and download the audio output when the status indicates success. The text inputs must be plain text or [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md) text.
+The batch synthesis API is asynchronous and doesn't return synthesized audio in real-time. You submit text files to be synthesized, poll for the status, and download the audio output when the status indicates success. The text inputs must be plain text or [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md) text.

This diagram provides a high-level overview of the workflow.
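To illustrate the submit, poll, and download flow described in the changed line, here is a minimal Python sketch. The preview endpoint path and the payload and response fields (`id`, `status`, `outputs.result`) are assumptions based on the preview API; confirm the current schema in the batch synthesis article before relying on them.

```python
import time
import requests

# Placeholders and assumptions: key, region, and the preview endpoint path.
SPEECH_KEY = "<your-speech-key>"
REGION = "<your-region>"
ENDPOINT = (f"https://{REGION}.customvoice.api.speech.microsoft.com"
            "/api/texttospeech/3.1-preview1/batchsynthesis")  # assumed preview path
HEADERS = {"Ocp-Apim-Subscription-Key": SPEECH_KEY, "Content-Type": "application/json"}

# 1. Submit the synthesis job (plain text input; SSML is also accepted).
job = requests.post(ENDPOINT, headers=HEADERS, json={
    "displayName": "batch synthesis sample",
    "textType": "PlainText",
    "inputs": [{"text": "The rainbow has seven colors."}],
    "synthesisConfig": {"voice": "en-US-JennyNeural"},
}).json()

# 2. Poll for the status until the job finishes.
status_url = f"{ENDPOINT}/{job['id']}"  # assumed: job id returned on creation
while True:
    job = requests.get(status_url, headers=HEADERS).json()
    if job["status"] in ("Succeeded", "Failed"):
        break
    time.sleep(10)

# 3. Download the audio output when the status indicates success.
if job["status"] == "Succeeded":
    audio_zip = requests.get(job["outputs"]["result"])  # assumed: URL to a zip of audio
    with open("synthesis_result.zip", "wb") as f:
        f.write(audio_zip.content)
```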
articles/cognitive-services/Speech-Service/captioning-concepts.md (4 additions, 4 deletions)

@@ -36,7 +36,7 @@ The following are aspects to consider when using captioning:

>
> Try the [Azure Video Indexer](../../azure-video-indexer/video-indexer-overview.md) as a demonstration of how you can get captions for videos that you upload.

-Captioning can accompany realtime or pre-recorded speech. Whether you're showing captions in realtime or with a recording, you can use the [Speech SDK](speech-sdk.md) or [Speech CLI](spx-overview.md) to recognize speech and get transcriptions. You can also use the [Batch transcription API](batch-transcription.md) for pre-recorded video.
+Captioning can accompany real-time or pre-recorded speech. Whether you're showing captions in real-time or with a recording, you can use the [Speech SDK](speech-sdk.md) or [Speech CLI](spx-overview.md) to recognize speech and get transcriptions. You can also use the [Batch transcription API](batch-transcription.md) for pre-recorded video.

## Caption output format

@@ -68,13 +68,13 @@ Welcome to applied Mathematics course 201.

## Input audio to the Speech service

-For realtime captioning, use a microphone or audio input stream instead of file input. For examples of how to recognize speech from a microphone, see the [Speech to text quickstart](get-started-speech-to-text.md) and [How to recognize speech](how-to-recognize-speech.md) documentation. For more information about streaming, see [How to use the audio input stream](how-to-use-audio-input-streams.md).
+For real-time captioning, use a microphone or audio input stream instead of file input. For examples of how to recognize speech from a microphone, see the [Speech to text quickstart](get-started-speech-to-text.md) and [How to recognize speech](how-to-recognize-speech.md) documentation. For more information about streaming, see [How to use the audio input stream](how-to-use-audio-input-streams.md).

For captioning of a prerecording, send file input to the Speech service. For more information, see [How to use compressed input audio](how-to-use-codec-compressed-audio-input-streams.md).

## Caption and speech synchronization

-You'll want to synchronize captions with the audio track, whether it's done in realtime or with a prerecording.
+You'll want to synchronize captions with the audio track, whether it's done in real-time or with a prerecording.

The Speech service returns the offset and duration of the recognized speech.

@@ -91,7 +91,7 @@ Consider when to start displaying captions, and how many words to show at a time

For captioning of prerecorded speech or wherever latency isn't a concern, you could wait for the complete transcription of each utterance before displaying any words. Given the final offset and duration of each word in an utterance, you know when to show subsequent words at pace with the soundtrack.

-Realtime captioning presents tradeoffs with respect to latency versus accuracy. You could show the text from each `Recognizing` event as soon as possible. However, if you can accept some latency, you can improve the accuracy of the caption by displaying the text from the `Recognized` event. There's also some middle ground, which is referred to as "stable partial results".
+Real-time captioning presents tradeoffs with respect to latency versus accuracy. You could show the text from each `Recognizing` event as soon as possible. However, if you can accept some latency, you can improve the accuracy of the caption by displaying the text from the `Recognized` event. There's also some middle ground, which is referred to as "stable partial results".

You can request that the Speech service return fewer `Recognizing` events that are more accurate. This is done by setting the `SpeechServiceResponse_StablePartialResultThreshold` property to a value between `0` and `2147483647`. The value that you set is the number of times a word has to be recognized before the Speech service returns a `Recognizing` event. For example, if you set the `SpeechServiceResponse_StablePartialResultThreshold` property value to `5`, the Speech service will affirm recognition of a word at least five times before returning the partial results to you with a `Recognizing` event.
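A minimal Python Speech SDK sketch of the captioning choices discussed in this file: real-time microphone input, low-latency `Recognizing` output versus final `Recognized` output, and the stable partial result threshold. The key and region are placeholders.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders: substitute your own key and region.
speech_config = speechsdk.SpeechConfig(subscription="<your-speech-key>", region="<your-region>")

# Hold partial results until a word has been recognized 5 times, trading a
# little latency for steadier captions.
speech_config.set_property(
    speechsdk.PropertyId.SpeechServiceResponse_StablePartialResultThreshold, "5")

# Real-time captioning uses microphone (or stream) input rather than file input.
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# Low-latency captions: show Recognizing text as soon as it arrives.
recognizer.recognizing.connect(lambda evt: print(f"[partial] {evt.result.text}"))
# Higher-accuracy captions: wait for the final Recognized text.
recognizer.recognized.connect(lambda evt: print(f"[final]   {evt.result.text}"))

recognizer.start_continuous_recognition()
input("Press Enter to stop...\n")
recognizer.stop_continuous_recognition()
```

Raising the threshold value makes the partial captions steadier at the cost of slightly higher latency.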
articles/cognitive-services/Speech-Service/conversation-transcription.md (2 additions, 2 deletions)

@@ -41,7 +41,7 @@ See the real-time conversation transcription [quickstart](how-to-use-conversatio

## Use cases

-To make meetings inclusive for everyone, such as participants who are deaf and hard of hearing, it's important to have transcription in realtime. Conversation transcription in real-time mode takes meeting audio and determines who is saying what, allowing all meeting participants to follow the transcript and participate in the meeting, without a delay.
+To make meetings inclusive for everyone, such as participants who are deaf and hard of hearing, it's important to have transcription in real-time. Conversation transcription in real-time mode takes meeting audio and determines who is saying what, allowing all meeting participants to follow the transcript and participate in the meeting, without a delay.

Meeting participants can focus on the meeting and leave note-taking to conversation transcription. Participants can actively engage in the meeting and quickly follow up on next steps, using the transcript instead of taking notes and potentially missing something during the meeting.
articles/cognitive-services/Speech-Service/custom-neural-voice.md (1 addition, 1 deletion)

@@ -48,7 +48,7 @@ Here's an overview of the steps to create a custom neural voice in Speech Studio

1. [Test your voice](how-to-custom-voice-create-voice.md#test-your-voice-model). Prepare test scripts for your voice model that cover the different use cases for your apps. It’s a good idea to use scripts within and outside the training dataset, so you can test the quality more broadly for different content.
1. [Deploy and use your voice model](how-to-deploy-and-use-endpoint.md) in your apps.

-You can tune, adjust, and use your custom voice, similarly as you would use a prebuilt neural voice. Convert text into speech in realtime, or generate audio content offline with text input. You can do this by using the [REST API](./rest-text-to-speech.md), the [Speech SDK](./get-started-text-to-speech.md), or the [Speech Studio](https://speech.microsoft.com/audiocontentcreation).
+You can tune, adjust, and use your custom voice, similarly as you would use a prebuilt neural voice. Convert text into speech in real-time, or generate audio content offline with text input. You can do this by using the [REST API](./rest-text-to-speech.md), the [Speech SDK](./get-started-text-to-speech.md), or the [Speech Studio](https://speech.microsoft.com/audiocontentcreation).

The style and the characteristics of the trained voice model depend on the style and the quality of the recordings from the voice talent used for training. However, you can make several adjustments by using [SSML (Speech Synthesis Markup Language)](./speech-synthesis-markup.md?tabs=csharp) when you make the API calls to your voice model to generate synthetic speech. SSML is the markup language used to communicate with the text-to-speech service to convert text into audio. The adjustments you can make include change of pitch, rate, intonation, and pronunciation correction. If the voice model is built with multiple styles, you can also use SSML to switch the styles.
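For context, here is a minimal Python sketch of calling a deployed voice model with SSML, as the changed line describes. The key, region, deployment ID, and voice name are placeholders, and the prosody values are only examples of the adjustments mentioned above.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders: your key, region, custom voice deployment (endpoint) ID, and voice name.
speech_config = speechsdk.SpeechConfig(subscription="<your-speech-key>", region="<your-region>")
speech_config.endpoint_id = "<your-custom-voice-deployment-id>"

# Write the synthesized audio to a file; omit audio_config to use the default speaker.
audio_config = speechsdk.audio.AudioOutputConfig(filename="output.wav")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

# SSML lets you adjust rate, pitch, pronunciation, and (for multi-style models) style.
ssml = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="YourCustomVoiceName">
    <prosody rate="-10%" pitch="+5%">Thank you for calling. How can I help?</prosody>
  </voice>
</speak>
"""

result = synthesizer.speak_ssml_async(ssml).get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Audio written to output.wav")
```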
articles/cognitive-services/Speech-Service/gaming-concepts.md (1 addition, 1 deletion)

@@ -58,7 +58,7 @@ It's not unusual that players in the same game session natively speak different

For an example, see the [Speech translation quickstart](get-started-speech-translation.md).

> [!NOTE]
-> Besides the Speech service, you can also use the [Translator service](../translator/translator-overview.md). To execute text translation between supported source and target languages in realtime see [Text translation](../translator/text-translation-overview.md).
+> Besides the Speech service, you can also use the [Translator service](../translator/translator-overview.md). To execute text translation between supported source and target languages in real-time see [Text translation](../translator/text-translation-overview.md).
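As a companion to that note, here is a minimal Python sketch of translating an in-game chat message with the Translator REST API (v3.0). The key and region are placeholders.

```python
import requests

# Placeholders: your Translator resource key and region.
TRANSLATOR_KEY = "<your-translator-key>"
TRANSLATOR_REGION = "<your-region>"

def translate_chat(text: str, to_lang: str, from_lang: str = "en") -> str:
    """Translate a single chat message between a source and target language."""
    response = requests.post(
        "https://api.cognitive.microsofttranslator.com/translate",
        params={"api-version": "3.0", "from": from_lang, "to": to_lang},
        headers={
            "Ocp-Apim-Subscription-Key": TRANSLATOR_KEY,
            "Ocp-Apim-Subscription-Region": TRANSLATOR_REGION,
            "Content-Type": "application/json",
        },
        json=[{"text": text}],
    )
    response.raise_for_status()
    # The response is a list with one entry per input text.
    return response.json()[0]["translations"][0]["text"]

print(translate_chat("Nice shot! Regroup at the bridge.", to_lang="de"))
```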
articles/cognitive-services/Speech-Service/how-to-async-conversation-transcription.md (2 additions, 2 deletions)

@@ -19,9 +19,9 @@ In this article, asynchronous Conversation Transcription is demonstrated using t

## Asynchronous vs. real-time + asynchronous

-With asynchronous transcription, you stream the conversation audio, but don't need a transcription returned in realtime. Instead, after the audio is sent, use the `conversationId` of `Conversation` to query for the status of the asynchronous transcription. When the asynchronous transcription is ready, you'll get a `RemoteConversationTranscriptionResult`.
+With asynchronous transcription, you stream the conversation audio, but don't need a transcription returned in real-time. Instead, after the audio is sent, use the `conversationId` of `Conversation` to query for the status of the asynchronous transcription. When the asynchronous transcription is ready, you'll get a `RemoteConversationTranscriptionResult`.

-With real-time plus asynchronous, you get the transcription in realtime, but also get the transcription by querying with the `conversationId` (similar to asynchronous scenario).
+With real-time plus asynchronous, you get the transcription in real-time, but also get the transcription by querying with the `conversationId` (similar to asynchronous scenario).

Two steps are required to accomplish asynchronous transcription. The first step is to upload the audio, choosing either asynchronous only or real-time plus asynchronous. The second step is to get the transcription results.
articles/cognitive-services/Speech-Service/how-to-audio-content-creation.md (1 addition, 1 deletion)

@@ -18,7 +18,7 @@ You can use the [Audio Content Creation](https://speech.microsoft.com/portal/aud

Build highly natural audio content for a variety of scenarios, such as audiobooks, news broadcasts, video narrations, and chat bots. With Audio Content Creation, you can efficiently fine-tune text-to-speech voices and design customized audio experiences.

-The tool is based on [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md). It allows you to adjust text-to-speech output attributes in realtime or batch synthesis, such as voice characters, voice styles, speaking speed, pronunciation, and prosody.
+The tool is based on [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md). It allows you to adjust text-to-speech output attributes in real-time or batch synthesis, such as voice characters, voice styles, speaking speed, pronunciation, and prosody.

- No-code approach: You can use the Audio Content Creation tool for text-to-speech synthesis without writing any code. The output audio might be the final deliverable that you want. For example, you can use the output audio for a podcast or a video narration.
- Developer-friendly: You can listen to the output audio and adjust the SSML to improve speech synthesis. Then you can use the [Speech SDK](speech-sdk.md) or [Speech CLI](spx-basics.md) to integrate the SSML into your applications. For example, you can use the SSML for building a chat bot.
articles/cognitive-services/Speech-Service/includes/how-to/conversation-transcription/real-time-csharp.md (1 addition, 1 deletion)

@@ -89,7 +89,7 @@ Running the function `GetVoiceSignatureString()` returns a voice signature strin

## Transcribe conversations

-The following sample code demonstrates how to transcribe conversations in realtime for two speakers. It assumes you've already created voice signature strings for each speaker as shown above. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe.
+The following sample code demonstrates how to transcribe conversations in real-time for two speakers. It assumes you've already created voice signature strings for each speaker as shown above. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe.

If you don't use pre-enrolled user profiles, it will take a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, etc.
articles/cognitive-services/Speech-Service/includes/how-to/conversation-transcription/real-time-javascript.md (1 addition, 1 deletion)

@@ -58,7 +58,7 @@ Running this script returns a voice signature string in the variable `voiceSigna

## Transcribe conversations

-The following sample code demonstrates how to transcribe conversations in realtime for two speakers. It assumes you've already created voice signature strings for each speaker as shown above. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe.
+The following sample code demonstrates how to transcribe conversations in real-time for two speakers. It assumes you've already created voice signature strings for each speaker as shown above. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe.

If you don't use pre-enrolled user profiles, it will take a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, etc.