
Commit fa25e74

Merge pull request #233507 from eric-urban/eur/stt-realtime-batch
Distinguish batch vs real-time
2 parents 5fd9e5c + 362ce99 commit fa25e74

42 files changed: +142 -98 lines changed

articles/cognitive-services/Speech-Service/audio-processing-overview.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ The Microsoft Audio Stack also powers a wide range of Microsoft products:
 ## Speech SDK integration

 The Speech SDK integrates Microsoft Audio Stack (MAS), allowing any application or product to use its audio processing capabilities on input audio. Some of the key Microsoft Audio Stack features available via the Speech SDK include:
-* **Realtime microphone input & file input** - Microsoft Audio Stack processing can be applied to real-time microphone input, streams, and file-based input.
+* **Real-time microphone input & file input** - Microsoft Audio Stack processing can be applied to real-time microphone input, streams, and file-based input.
 * **Selection of enhancements** - To allow for full control of your scenario, the SDK allows you to disable individual enhancements like dereverberation, noise suppression, automatic gain control, and acoustic echo cancellation. For example, if your scenario does not include rendering output audio that needs to be suppressed from the input audio, you have the option to disable acoustic echo cancellation.
 * **Custom microphone geometries** - The SDK allows you to provide your own custom microphone geometry information, in addition to supporting preset geometries like linear two-mic, linear four-mic, and circular 7-mic arrays (see more information on supported preset geometries at [Microphone array recommendations](speech-sdk-microphone.md#microphone-geometry)).
 * **Beamforming angles** - Specific beamforming angles can be provided to optimize audio input originating from a predetermined location, relative to the microphones.

articles/cognitive-services/Speech-Service/batch-synthesis.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ The Batch synthesis API (Preview) can synthesize a large volume of text input (l
 > [!IMPORTANT]
 > The Batch synthesis API is currently in public preview. Once it's generally available, the Long Audio API will be deprecated. For more information, see [Migrate to batch synthesis API](migrate-to-batch-synthesis.md).

-The batch synthesis API is asynchronous and doesn't return synthesized audio in real time. You submit text files to be synthesized, poll for the status, and download the audio output when the status indicates success. The text inputs must be plain text or [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md) text.
+The batch synthesis API is asynchronous and doesn't return synthesized audio in real-time. You submit text files to be synthesized, poll for the status, and download the audio output when the status indicates success. The text inputs must be plain text or [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md) text.

 This diagram provides a high-level overview of the workflow.
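A minimal Python sketch of the submit/poll/download flow described above, against the preview batch synthesis REST endpoint. The endpoint path, JSON property names, key, and region are placeholders and assumptions for illustration, not a definitive contract; check the batch synthesis reference for the exact shape.

```python
import time
import requests

# Sketch of the asynchronous batch synthesis flow: submit text, poll status, download audio.
# The endpoint path and JSON property names below are assumptions based on the preview API.
SPEECH_KEY = "YourSubscriptionKey"  # placeholder
REGION = "eastus"                   # placeholder
BASE = f"https://{REGION}.customvoice.api.speech.microsoft.com/api/texttospeech/3.1-preview1/batchsynthesis"
HEADERS = {"Ocp-Apim-Subscription-Key": SPEECH_KEY, "Content-Type": "application/json"}

# 1. Submit plain text (or SSML) to be synthesized.
body = {
    "displayName": "My batch synthesis",
    "textType": "PlainText",
    "inputs": [{"text": "The rainbow has seven colors."}],
    "synthesisConfig": {"voice": "en-US-JennyNeural"},
}
job = requests.post(BASE, headers=HEADERS, json=body).json()
job_url = f"{BASE}/{job['id']}"

# 2. Poll for the status, then download the audio output when the status indicates success.
while True:
    status = requests.get(job_url, headers=HEADERS).json()
    if status["status"] in ("Succeeded", "Failed"):
        break
    time.sleep(10)

if status["status"] == "Succeeded":
    audio = requests.get(status["outputs"]["result"]).content
    with open("synthesis-results.zip", "wb") as f:
        f.write(audio)
```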

articles/cognitive-services/Speech-Service/batch-transcription-create.md

Lines changed: 6 additions & 3 deletions
@@ -113,7 +113,7 @@ To create a transcription, use the `spx batch transcription create` command. Con
 Here's an example Speech CLI command that creates a transcription job:

 ```azurecli-interactive
-spx batch transcription create --api-version v3.1 --name "My Transcription" --language "en-US" --content https://crbn.us/hello.wav;https://crbn.us/whatstheweatherlike.wav
+spx batch transcription create --name "My Transcription" --language "en-US" --content https://crbn.us/hello.wav;https://crbn.us/whatstheweatherlike.wav
 ```

 You should receive a response body in the following format:
@@ -223,12 +223,15 @@ curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-
 ::: zone pivot="speech-cli"

 ```azurecli-interactive
-spx batch transcription create --api-version v3.1 --name "My Transcription" --language "en-US" --content https://crbn.us/hello.wav;https://crbn.us/whatstheweatherlike.wav --model "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base/1aae1070-7972-47e9-a977-87e3b05c457d"
+spx batch transcription create --name "My Transcription" --language "en-US" --content https://crbn.us/hello.wav;https://crbn.us/whatstheweatherlike.wav --model "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base/1aae1070-7972-47e9-a977-87e3b05c457d"
 ```

 ::: zone-end

-To use a Custom Speech model for batch transcription, you need the model's URI. You can retrieve the model location when you create or get a model. The top-level `self` property in the response body is the model's URI. For an example, see the JSON response example in the [Create a model](how-to-custom-speech-train-model.md?pivots=rest-api#create-a-model) guide. A [custom model deployment endpoint](how-to-custom-speech-deploy-model.md) isn't needed for the batch transcription service.
+To use a Custom Speech model for batch transcription, you need the model's URI. You can retrieve the model location when you create or get a model. The top-level `self` property in the response body is the model's URI. For an example, see the JSON response example in the [Create a model](how-to-custom-speech-train-model.md?pivots=rest-api#create-a-model) guide.
+
+> [!TIP]
+> A [hosted deployment endpoint](how-to-custom-speech-deploy-model.md) isn't required to use custom speech with the batch transcription service. You can conserve resources if the [custom speech model](how-to-custom-speech-train-model.md) is only used for batch transcription.

 Batch transcription requests for expired models will fail with a 4xx error. You'll want to set the `model` property to a base model or custom model that hasn't yet expired. Otherwise don't include the `model` property to always use the latest base model. For more information, see [Choose a model](how-to-custom-speech-create-project.md#choose-your-model) and [Custom Speech model lifecycle](how-to-custom-speech-model-and-endpoint-lifecycle.md).
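For the REST pivot, a short Python sketch of passing the custom model's `self` URI in the `model` property when creating a transcription; the key, region, content URLs, and model URI are placeholders taken from the examples above.

```python
import requests

# Create a batch transcription that references a custom model by its `self` URI.
# Key, region, content URLs, and model URI are placeholders.
SPEECH_KEY = "YourSubscriptionKey"
ENDPOINT = "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"

body = {
    "displayName": "My Transcription",
    "locale": "en-US",
    "contentUrls": [
        "https://crbn.us/hello.wav",
        "https://crbn.us/whatstheweatherlike.wav",
    ],
    # The model's URI, as returned in the top-level `self` property when you create or get a model.
    "model": {
        "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base/1aae1070-7972-47e9-a977-87e3b05c457d"
    },
}

response = requests.post(
    ENDPOINT,
    headers={"Ocp-Apim-Subscription-Key": SPEECH_KEY, "Content-Type": "application/json"},
    json=body,
)
response.raise_for_status()
print(response.json()["self"])  # URL of the new transcription; poll it for status
```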

articles/cognitive-services/Speech-Service/captioning-concepts.md

Lines changed: 4 additions & 4 deletions
@@ -36,7 +36,7 @@ The following are aspects to consider when using captioning:
 >
 > Try the [Azure Video Indexer](../../azure-video-indexer/video-indexer-overview.md) as a demonstration of how you can get captions for videos that you upload.

-Captioning can accompany real time or pre-recorded speech. Whether you're showing captions in real time or with a recording, you can use the [Speech SDK](speech-sdk.md) or [Speech CLI](spx-overview.md) to recognize speech and get transcriptions. You can also use the [Batch transcription API](batch-transcription.md) for pre-recorded video.
+Captioning can accompany real-time or pre-recorded speech. Whether you're showing captions in real-time or with a recording, you can use the [Speech SDK](speech-sdk.md) or [Speech CLI](spx-overview.md) to recognize speech and get transcriptions. You can also use the [Batch transcription API](batch-transcription.md) for pre-recorded video.

 ## Caption output format
@@ -68,13 +68,13 @@ Welcome to applied Mathematics course 201.
 ## Input audio to the Speech service

-For real time captioning, use a microphone or audio input stream instead of file input. For examples of how to recognize speech from a microphone, see the [Speech to text quickstart](get-started-speech-to-text.md) and [How to recognize speech](how-to-recognize-speech.md) documentation. For more information about streaming, see [How to use the audio input stream](how-to-use-audio-input-streams.md).
+For real-time captioning, use a microphone or audio input stream instead of file input. For examples of how to recognize speech from a microphone, see the [Speech to text quickstart](get-started-speech-to-text.md) and [How to recognize speech](how-to-recognize-speech.md) documentation. For more information about streaming, see [How to use the audio input stream](how-to-use-audio-input-streams.md).

 For captioning of a prerecording, send file input to the Speech service. For more information, see [How to use compressed input audio](how-to-use-codec-compressed-audio-input-streams.md).

 ## Caption and speech synchronization

-You'll want to synchronize captions with the audio track, whether it's done in real time or with a prerecording.
+You'll want to synchronize captions with the audio track, whether it's done in real-time or with a prerecording.

 The Speech service returns the offset and duration of the recognized speech.
@@ -91,7 +91,7 @@ Consider when to start displaying captions, and how many words to show at a time
 For captioning of prerecorded speech or wherever latency isn't a concern, you could wait for the complete transcription of each utterance before displaying any words. Given the final offset and duration of each word in an utterance, you know when to show subsequent words at pace with the soundtrack.

-Real time captioning presents tradeoffs with respect to latency versus accuracy. You could show the text from each `Recognizing` event as soon as possible. However, if you can accept some latency, you can improve the accuracy of the caption by displaying the text from the `Recognized` event. There's also some middle ground, which is referred to as "stable partial results".
+Real-time captioning presents tradeoffs with respect to latency versus accuracy. You could show the text from each `Recognizing` event as soon as possible. However, if you can accept some latency, you can improve the accuracy of the caption by displaying the text from the `Recognized` event. There's also some middle ground, which is referred to as "stable partial results".

 You can request that the Speech service return fewer `Recognizing` events that are more accurate. This is done by setting the `SpeechServiceResponse_StablePartialResultThreshold` property to a value between `0` and `2147483647`. The value that you set is the number of times a word has to be recognized before the Speech service returns a `Recognizing` event. For example, if you set the `SpeechServiceResponse_StablePartialResultThreshold` property value to `5`, the Speech service will affirm recognition of a word at least five times before returning the partial results to you with a `Recognizing` event.
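A brief Python sketch of the stable-partial-results setting described above, assuming the Python Speech SDK exposes the `SpeechServiceResponse_StablePartialResultThreshold` property ID under the same name as the other SDKs; the key and region are placeholders.

```python
import time
import azure.cognitiveservices.speech as speechsdk

# Ask for fewer, more stable Recognizing events for real-time captions.
speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="eastus")

# A word must be recognized five times before it appears in a Recognizing event.
speech_config.set_property(
    speechsdk.PropertyId.SpeechServiceResponse_StablePartialResultThreshold, "5"
)

audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# Show lower-latency partial captions, then replace them with the final text.
recognizer.recognizing.connect(lambda evt: print("partial:", evt.result.text))
recognizer.recognized.connect(lambda evt: print("final:  ", evt.result.text))

recognizer.start_continuous_recognition()
time.sleep(30)  # caption the microphone for 30 seconds
recognizer.stop_continuous_recognition()
```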

articles/cognitive-services/Speech-Service/conversation-transcription.md

Lines changed: 2 additions & 2 deletions
@@ -41,7 +41,7 @@ See the real-time conversation transcription [quickstart](how-to-use-conversatio

 ## Use cases

-To make meetings inclusive for everyone, such as participants who are deaf and hard of hearing, it's important to have transcription in real time. Conversation transcription in real-time mode takes meeting audio and determines who is saying what, allowing all meeting participants to follow the transcript and participate in the meeting, without a delay.
+To make meetings inclusive for everyone, such as participants who are deaf and hard of hearing, it's important to have transcription in real-time. Conversation transcription in real-time mode takes meeting audio and determines who is saying what, allowing all meeting participants to follow the transcript and participate in the meeting, without a delay.

 Meeting participants can focus on the meeting and leave note-taking to conversation transcription. Participants can actively engage in the meeting and quickly follow up on next steps, using the transcript instead of taking notes and potentially missing something during the meeting.
@@ -90,4 +90,4 @@ Currently, conversation transcription supports [all speech-to-text languages](la
 ## Next steps

 > [!div class="nextstepaction"]
-> [Transcribe conversations in real time](how-to-use-conversation-transcription.md)
+> [Transcribe conversations in real-time](how-to-use-conversation-transcription.md)

articles/cognitive-services/Speech-Service/custom-neural-voice.md

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ Here's an overview of the steps to create a custom neural voice in Speech Studio
 1. [Test your voice](how-to-custom-voice-create-voice.md#test-your-voice-model). Prepare test scripts for your voice model that cover the different use cases for your apps. It’s a good idea to use scripts within and outside the training dataset, so you can test the quality more broadly for different content.
 1. [Deploy and use your voice model](how-to-deploy-and-use-endpoint.md) in your apps.

-You can tune, adjust, and use your custom voice, similarly as you would use a prebuilt neural voice. Convert text into speech in real time, or generate audio content offline with text input. You can do this by using the [REST API](./rest-text-to-speech.md), the [Speech SDK](./get-started-text-to-speech.md), or the [Speech Studio](https://speech.microsoft.com/audiocontentcreation).
+You can tune, adjust, and use your custom voice, similarly as you would use a prebuilt neural voice. Convert text into speech in real-time, or generate audio content offline with text input. You can do this by using the [REST API](./rest-text-to-speech.md), the [Speech SDK](./get-started-text-to-speech.md), or the [Speech Studio](https://speech.microsoft.com/audiocontentcreation).

 The style and the characteristics of the trained voice model depend on the style and the quality of the recordings from the voice talent used for training. However, you can make several adjustments by using [SSML (Speech Synthesis Markup Language)](./speech-synthesis-markup.md?tabs=csharp) when you make the API calls to your voice model to generate synthetic speech. SSML is the markup language used to communicate with the text-to-speech service to convert text into audio. The adjustments you can make include change of pitch, rate, intonation, and pronunciation correction. If the voice model is built with multiple styles, you can also use SSML to switch the styles.
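A rough Python sketch of making such SSML adjustments through the Speech SDK against a deployed custom voice; the key, region, endpoint ID, and voice name are placeholders, and the prosody values are arbitrary.

```python
import azure.cognitiveservices.speech as speechsdk

# Synthesize SSML with a deployed custom neural voice, adjusting rate and pitch via prosody.
speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="eastus")
speech_config.endpoint_id = "YourCustomVoiceEndpointId"            # deployment endpoint ID (placeholder)
speech_config.speech_synthesis_voice_name = "YourCustomVoiceName"  # custom voice name (placeholder)

synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

ssml = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="YourCustomVoiceName">
    <prosody rate="-10%" pitch="+5%">
      Welcome! This sentence is spoken a little slower and slightly higher.
    </prosody>
  </voice>
</speak>
"""

result = synthesizer.speak_ssml_async(ssml).get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Synthesized", len(result.audio_data), "bytes of audio")
```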

articles/cognitive-services/Speech-Service/custom-speech-overview.md

Lines changed: 6 additions & 5 deletions
@@ -15,15 +15,12 @@ ms.custom: contperf-fy21q2, references_regions

 # What is Custom Speech?

-With Custom Speech, you can evaluate and improve the Microsoft speech-to-text accuracy for your applications and products.
+With Custom Speech, you can evaluate and improve the accuracy of speech recognition for your applications and products. A custom speech model can be used for [real-time speech-to-text](speech-to-text.md), [speech translation](speech-translation.md), and [batch transcription](batch-transcription.md).

-Out of the box, speech to text utilizes a Universal Language Model as a base model that is trained with Microsoft-owned data and reflects commonly used spoken language. The base model is pre-trained with dialects and phonetics representing a variety of common domains. When you make a speech recognition request, the most recent base model for each [supported language](language-support.md?tabs=stt) is used by default. The base model works very well in most speech recognition scenarios.
+Out of the box, speech recognition utilizes a Universal Language Model as a base model that is trained with Microsoft-owned data and reflects commonly used spoken language. The base model is pre-trained with dialects and phonetics representing a variety of common domains. When you make a speech recognition request, the most recent base model for each [supported language](language-support.md?tabs=stt) is used by default. The base model works very well in most speech recognition scenarios.

 A custom model can be used to augment the base model to improve recognition of domain-specific vocabulary specific to the application by providing text data to train the model. It can also be used to improve recognition based for the specific audio conditions of the application by providing audio data with reference transcriptions.

-> [!NOTE]
-> You pay to use Custom Speech models, but you are not charged for training a model. Usage includes hosting of your deployed custom endpoint in addition to using the endpoint for speech-to-text. For more information, see [Speech service pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
-
 ## How does it work?

 With Custom Speech, you can upload your own data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint.
@@ -37,7 +34,11 @@ Here's more information about the sequence of steps shown in the previous diagra
 1. [Test recognition quality](how-to-custom-speech-inspect-data.md). Use the [Speech Studio](https://aka.ms/speechstudio/customspeech) to play back uploaded audio and inspect the speech recognition quality of your test data.
 1. [Test model quantitatively](how-to-custom-speech-evaluate-data.md). Evaluate and improve the accuracy of the speech-to-text model. The Speech service provides a quantitative word error rate (WER), which you can use to determine if additional training is required.
 1. [Train a model](how-to-custom-speech-train-model.md). Provide written transcripts and related text, along with the corresponding audio data. Testing a model before and after training is optional but recommended.
+> [!NOTE]
+> You pay for Custom Speech model usage and endpoint hosting, but you are not charged for training a model.
 1. [Deploy a model](how-to-custom-speech-deploy-model.md). Once you're satisfied with the test results, deploy the model to a custom endpoint. With the exception of [batch transcription](batch-transcription.md), you must deploy a custom endpoint to use a Custom Speech model.
+> [!TIP]
+> A hosted deployment endpoint isn't required to use Custom Speech with the [Batch transcription API](batch-transcription.md). You can conserve resources if the custom speech model is only used for batch transcription. For more information, see [Speech service pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).

 ## Next steps

articles/cognitive-services/Speech-Service/gaming-concepts.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ It's not unusual that players in the same game session natively speak different
 For an example, see the [Speech translation quickstart](get-started-speech-translation.md).

 > [!NOTE]
-> Besides the Speech service, you can also use the [Translator service](../translator/translator-overview.md). To execute text translation between supported source and target languages in real time see [Text translation](../translator/text-translation-overview.md).
+> Besides the Speech service, you can also use the [Translator service](../translator/translator-overview.md). To execute text translation between supported source and target languages in real-time see [Text translation](../translator/text-translation-overview.md).

 ## Next steps
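A minimal Python sketch of translating a player's speech with the Speech SDK, in the spirit of the quickstart linked above; the key, region, and languages are placeholders.

```python
import azure.cognitiveservices.speech as speechsdk

# Recognize English speech from the default microphone and translate it to German.
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription="YourSubscriptionKey", region="eastus"
)
translation_config.speech_recognition_language = "en-US"
translation_config.add_target_language("de")

recognizer = speechsdk.translation.TranslationRecognizer(translation_config=translation_config)
result = recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    print("Recognized:", result.text)
    print("Translated:", result.translations["de"])
```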

articles/cognitive-services/Speech-Service/how-to-async-conversation-transcription.md

Lines changed: 2 additions & 2 deletions
@@ -19,9 +19,9 @@ In this article, asynchronous Conversation Transcription is demonstrated using t

 ## Asynchronous vs. real-time + asynchronous

-With asynchronous transcription, you stream the conversation audio, but don't need a transcription returned in real time. Instead, after the audio is sent, use the `conversationId` of `Conversation` to query for the status of the asynchronous transcription. When the asynchronous transcription is ready, you'll get a `RemoteConversationTranscriptionResult`.
+With asynchronous transcription, you stream the conversation audio, but don't need a transcription returned in real-time. Instead, after the audio is sent, use the `conversationId` of `Conversation` to query for the status of the asynchronous transcription. When the asynchronous transcription is ready, you'll get a `RemoteConversationTranscriptionResult`.

-With real-time plus asynchronous, you get the transcription in real time, but also get the transcription by querying with the `conversationId` (similar to asynchronous scenario).
+With real-time plus asynchronous, you get the transcription in real-time, but also get the transcription by querying with the `conversationId` (similar to asynchronous scenario).

 Two steps are required to accomplish asynchronous transcription. The first step is to upload the audio, choosing either asynchronous only or real-time plus asynchronous. The second step is to get the transcription results.
