articles/cognitive-services/Speech-Service/batch-transcription.md (11 additions & 9 deletions)
@@ -61,8 +61,8 @@ Configuration parameters are provided as JSON:
 {
   "recordingsUrl": "<URL to the Azure blob to transcribe>",
   "models": [{"Id":"<optional acoustic model ID>"},{"Id":"<optional language model ID>"}],
-  "locale": "<local to us, for example en-US>",
-  "name": "<user define name of the transcription batch>",
+  "locale": "<locale to use, for example en-US>",
+  "name": "<user defined name of the transcription batch>",
   "description": "<optional description of the transcription>",
   "properties": {
     "ProfanityFilterMode": "Masked",
@@ -78,12 +78,14 @@ Configuration parameters are provided as JSON:

 ### Configuration properties

-| Parameter | Description | Required / Optional |
-|-----------|-------------|---------------------|
-|`ProfanityFilterMode`| Specifies how to handle profanity in recognition results. Accepted values are `none` which disables profanity filtering, `masked` which replaces profanity with asterisks, `removed` which removes all profanity from the result, or `tags` which adds "profanity" tags. The default setting is `masked`. | Optional |
-|`PunctuationMode`| Specifies how to handle punctuation in recognition results. Accepted values are `none` which disables punctuation, `dictated` which implies explicit punctuation, `automatic` which lets the decoder deal with punctuation, or `dictatedandautomatic` which implies dictated punctuation marks or automatic. | Optional |
-|`AddWordLevelTimestamps`| Specifies if word level timestamps should be added to the output. Accepted values are `true` which enables word level timestamps and `false` (the default value) to disable it. | Optional |
-|`AddSentiment`| Specifies sentiment should be added to the utterance. Accepted values are `true` which enables sentiment per utterance and `false` (the default value) to disable it. | Optional |
+Use these optional properties to configure transcription:
+
+| Parameter | Description |
+|-----------|-------------|
+|`ProfanityFilterMode`| Specifies how to handle profanity in recognition results. Accepted values are `none` which disables profanity filtering, `masked` which replaces profanity with asterisks, `removed` which removes all profanity from the result, or `tags` which adds "profanity" tags. The default setting is `masked`. |
+|`PunctuationMode`| Specifies how to handle punctuation in recognition results. Accepted values are `none` which disables punctuation, `dictated` which implies explicit punctuation, `automatic` which lets the decoder deal with punctuation, or `dictatedandautomatic` which implies dictated punctuation marks, automatic punctuation, or both. |
+|`AddWordLevelTimestamps`| Specifies if word level timestamps should be added to the output. Accepted values are `true` which enables word level timestamps and `false` (the default value) to disable it. |
+|`AddSentiment`| Specifies if sentiment should be added to the utterance. Accepted values are `true` which enables sentiment per utterance and `false` (the default value) to disable it. |

 ### Storage

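To make the configuration shape in the hunks above concrete, here is a minimal sketch of a filled-in request body in Python. Every value is a placeholder chosen for illustration (the SAS URI, names, and property casing are not taken from the article), and whether an empty `models` list falls back to the baseline models is an assumption.

```python
# Illustrative batch transcription definition mirroring the JSON shape shown above.
# All values below are placeholders; substitute your own SAS URI, locale, and names.
transcription_definition = {
    "recordingsUrl": "https://<storage-account>.blob.core.windows.net/<container>/<file>?<SAS token>",
    "models": [],  # assumption: leave empty for baseline models, or add {"Id": "..."} entries
    "locale": "en-US",
    "name": "Support calls nightly batch",
    "description": "Transcription of recorded support calls",
    "properties": {
        "ProfanityFilterMode": "Masked",            # default, per the table above
        "PunctuationMode": "DictatedAndAutomatic",  # casing here is illustrative
        "AddWordLevelTimestamps": "True",
        "AddSentiment": "True",
    },
}
```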
@@ -148,7 +150,7 @@ The features uses a Sentiment model which is currently in Beta.

 ## Sample code

-The complete sample is available in the [GitHub sample repository](https://aka.ms/csspeech/samples) inside the `samples/batch` subdirectory.
+Complete samples are available in the [GitHub sample repository](https://aka.ms/csspeech/samples) inside the `samples/batch` subdirectory.

 You have to customize the sample code with your subscription information, the service region, the SAS URI pointing to the audio file to transcribe, and model IDs in case you want to use a custom acoustic or language model.
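As a companion to the customization note above (subscription key, region, SAS URI, model IDs), here is a minimal REST sketch in Python for submitting a definition like the one shown earlier. The endpoint path, header name, and use of the `Location` response header are assumptions based on the v2.0 Batch Transcription REST API of this period; the linked GitHub samples remain the authoritative reference.

```python
import requests

SUBSCRIPTION_KEY = "<your Speech Services subscription key>"   # placeholder
REGION = "westus"                                              # your service region

# Assumed v2.0 endpoint; verify against the samples/batch code before relying on it.
ENDPOINT = f"https://{REGION}.cris.ai/api/speechtotext/v2.0/transcriptions"

def submit_transcription(definition: dict) -> str:
    """POST a transcription definition and return the URL used to poll its status."""
    response = requests.post(
        ENDPOINT,
        json=definition,
        headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
    )
    response.raise_for_status()
    # Batch transcription is asynchronous; the created resource is referenced by the
    # Location header (assumption based on the v2.0 API behavior).
    return response.headers["Location"]
```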
articles/cognitive-services/Speech-Service/call-center-transcription.md (8 additions & 8 deletions)
@@ -29,7 +29,7 @@ Let's review some of the technology and related features Azure Speech Services o

 ## Azure Technology for Call Centers

-Beyond, the functional aspect of the Speech Services their primary purpose -when applied to the call center- is to improve the customer experience. Three clear domains exist in this regard
+Beyond the functional aspect of the Speech Services, their primary purpose, when applied to the call center, is to improve the customer experience. Three clear domains exist in this regard:

 * Post-call analytics, that is, batch processing of call recordings
 * Real-time analytics, processing of the audio signal to extract various insights as the call is taking place (with sentiment being a prominent use case), and
@@ -44,21 +44,21 @@ Whether the domain is post-call or real-time, Azure offers a set of mature and e

 ### Speech to text (STT)

-[Speech-to-text](speech-to-text.md) is the most sought after feature in any call center solution. Since many of the downstream analytics processes rely on transcribed text, the word error rate (WER) is of utmost importance. One of the key challenges in call center transcription is the noise that’s prevalent in the call center (for example – other agents speaking in the background), the rich variety of language locales and dialects as well as the low quality of the actual telephone signal. WER is highly correlated with how well the acoustic and language models are trained for a given locale, thus being able to customize the model to your locale is important. Our latest Unified version 4.x models are the solution to both transcription accuracy and latency. Trained with tens of thousands of hours of acoustic data and billions of lexical information Unified models are the most accurate models in the market to transcribe call center data.
+[Speech-to-text](speech-to-text.md) is the most sought-after feature in any call center solution. Since many of the downstream analytics processes rely on transcribed text, the word error rate (WER) is of utmost importance. One of the key challenges in call center transcription is the noise that’s prevalent in the call center (for example, other agents speaking in the background), the rich variety of language locales and dialects, as well as the low quality of the actual telephone signal. WER is highly correlated with how well the acoustic and language models are trained for a given locale, so being able to customize the model to your locale is important. Our latest Unified version 4.x models are the solution to both transcription accuracy and latency. Trained on tens of thousands of hours of acoustic data and billions of items of lexical information, Unified models are the most accurate models on the market for transcribing call center data.

 ### Sentiment
 Gauging whether the customer had a good experience is one of the most important areas of Speech analytics when applied to the call center space. Our [Batch Transcription API](batch-transcription.md) offers sentiment analysis per utterance. You can aggregate the set of values obtained as part of a call transcript to determine the sentiment of the call for both your agents and the customer.

 ### Silence (non-talk)
-it is not uncommon for 35 percent of a support call to be what we call non-talk time. Some scenarios which non-talk occurs are: agents looking up prior case history with a customer, agents using tools which allow them to access the customer's desktop and perform functions, customers sitting on hold waiting for a transfer and so on. It is extremely important to can gauge when silence is occurring in a call as there are number of important customer sensitivities that occur around these types of scenarios and where they occur in the call.
+It is not uncommon for 35 percent of a support call to be what we call non-talk time. Some scenarios in which non-talk occurs are: agents looking up prior case history with a customer, agents using tools which allow them to access the customer's desktop and perform functions, customers sitting on hold waiting for a transfer, and so on. It is extremely important to be able to gauge when silence is occurring in a call, as there are a number of important customer sensitivities that occur around these types of scenarios and where they occur in the call.

 ### Translation
 Some companies are experimenting with providing translated transcripts from foreign-language support calls so that delivery managers can understand the world-wide experience of their customers. Our [translation](translation.md) capabilities are unsurpassed. We can translate audio to audio or audio to text from a large number of locales.

 ### Text to Speech
 [Text-to-speech](text-to-speech.md) is another important area in implementing bots that interact with the customers. The typical pathway is that the customer speaks, their voice is transcribed to text, the text is analyzed for intents, a response is synthesized based on the recognized intent, and then an asset is either surfaced to the customer or a synthesized voice response is generated. Of course all of this has to occur quickly – thus latency is an important component in the success of these systems.

-Our end-to-end latency is pretty low considering the various technologies involved such as [Speech-to-text](speech-to-text.md), [Luis](https://azure.microsoft.com/services/cognitive-services/language-understanding-intelligent-service/), [Bot Framework](https://dev.botframework.com/), [Text-to-Speech](text-to-speech.md).
+Our end-to-end latency is quite low considering the various technologies involved, such as [Speech-to-text](speech-to-text.md), [LUIS](https://azure.microsoft.com/services/cognitive-services/language-understanding-intelligent-service/), [Bot Framework](https://dev.botframework.com/), and [Text-to-Speech](text-to-speech.md).

 Our new voices are also indistinguishable from human voices. You can use our voices to give your bot its unique personality.

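The Sentiment section in the hunk above suggests aggregating per-utterance values into a call-level score. The sketch below shows one way to do that averaging; the `Positive`/`Neutral`/`Negative` field names are illustrative and not the exact result schema returned by the Batch Transcription API.

```python
from statistics import mean

def call_level_sentiment(utterances):
    """Average per-utterance sentiment scores into a single call-level summary.

    `utterances` is assumed to be a list of dicts with illustrative keys
    "Positive", "Neutral", and "Negative", each a score between 0 and 1.
    """
    if not utterances:
        return None
    return {
        "Positive": mean(u["Positive"] for u in utterances),
        "Neutral": mean(u["Neutral"] for u in utterances),
        "Negative": mean(u["Negative"] for u in utterances),
    }
```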
@@ -75,10 +75,10 @@ Let's now have a look at the batch processing and the real-time pipelines for sp
 For transcribing bulk of audio we developed the [Batch Transcription API](batch-transcription.md). The Batch Transcription API was developed to transcribe large amounts of audio data asynchronously. With regards to transcribing call center data, our solution is based on these pillars:

 * **Accuracy**: With fourth-generation Unified models, we offer unsurpassed transcription quality.
-* **Latency**: We understand that when doing bulk transcriptions, the transcriptions are needed quickly. The transcription jobs initiated via the [Batch Transcription API](batch-transcription.md) will be queued immediately, and once the job is executed it's performed faster than real-time transcription.
+* **Latency**: We understand that when doing bulk transcriptions, the transcriptions are needed quickly. The transcription jobs initiated via the [Batch Transcription API](batch-transcription.md) will be queued immediately, and once the job starts running it's performed faster than real-time transcription.
 * **Security**: We understand that calls may contain sensitive data. Rest assured that security is one of our highest priorities. Our service has obtained ISO, SOC, HIPAA, PCI certifications.

-Call Centers generate large volumes of audio data on a daily basis. If your business stores telephony data in a central location, such as Azure Storage, you can use the [Batch Transcription API]((batch-transcription.md) to asynchronously request and receive transcriptions.
+Call Centers generate large volumes of audio data on a daily basis. If your business stores telephony data in a central location, such as Azure Storage, you can use the [Batch Transcription API](batch-transcription.md) to asynchronously request and receive transcriptions.

 A typical solution uses these services:

@@ -94,7 +94,7 @@ Internally we are using the above technologies to support Microsoft customer cal

 Some businesses are required to transcribe conversations in real-time. Real-time transcription can be used to identify key-words and trigger searches for content and resources relevant to the conversation, for monitoring sentiment, to improve accessibility, or to provide translations for customers and agents who aren't native speakers.

-For scenarios that require real-time transcription, we recommend using the [Speech SDK](speech-sdk.md). Currently, speech-to-text is available in [more than 20 languages](language-support.md), and the SDK is available in C++, C#, Java, Python, Node.js, and Javascript. Samples are available in each language on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk). For the latest news and updates, see [Release notes](releasenotes.md).
+For scenarios that require real-time transcription, we recommend using the [Speech SDK](speech-sdk.md). Currently, speech-to-text is available in [more than 20 languages](language-support.md), and the SDK is available in C++, C#, Java, Python, Node.js, Objective-C, and JavaScript. Samples are available in each language on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk). For the latest news and updates, see [Release notes](releasenotes.md).

 Internally we are using the above technologies to analyze in real-time Microsoft customer calls as they happen.

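To complement the Speech SDK recommendation in the hunk above, here is a minimal real-time recognition sketch using the SDK's Python package; the subscription key and region are placeholders, and the per-language samples in the linked GitHub repository remain the authoritative reference.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders: supply your own subscription key and service region.
speech_config = speechsdk.SpeechConfig(subscription="<subscription key>", region="westus")

# Recognize a single utterance from the default microphone.
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized:", result.text)
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized.")
```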
@@ -106,7 +106,7 @@ Speech Services can be easily integrated in any solution by using either the [Sp

 Several IVR or telephony service products (such as Genesys or AudioCodes) offer integration capabilities that can be leveraged to enable inbound and outbound audio passthrough to an Azure Service. Basically, a custom Azure service might provide a specific interface to define phone call sessions (such as Call Start or Call End) and expose a WebSocket API to receive inbound stream audio that is used with the Speech Services. Outbound responses, such as conversation transcription or connections with the Bot Framework, can be synthesized with Microsoft's text-to-speech service and returned to the IVR for playback.

-Another scenario is Direct SIP integration. An Azure service connects to a SIP Server, thus getting an inbound stream and an outbound stream, which is used for the speech-to-text and text-to-speech phases. To connect to a SIP Server there are commercial software offerings, such as Ozieki SDK, or [The Teams calling and meetings API](https://docs.microsoft.com/graph/api/resources/calls-api-overview?view=graph-rest-beta) (currently in beta), that are designed to support this type of scenario for audio calls.
+Another scenario is Direct SIP integration. An Azure service connects to a SIP Server, thus getting an inbound stream and an outbound stream, which is used for the speech-to-text and text-to-speech phases. To connect to a SIP Server there are commercial software offerings, such as Ozeki SDK, or [the Teams calling and meetings API](https://docs.microsoft.com/graph/api/resources/calls-api-overview?view=graph-rest-beta) (currently in beta), that are designed to support this type of scenario for audio calls.
articles/cognitive-services/Speech-Service/how-to-custom-speech-human-labeled-transcriptions.md (1 addition & 1 deletion)
@@ -73,7 +73,7 @@ Here are a few examples of normalization automatically performed on the transcri
 | Pi is about 3.14 | pi is about three point one four |
 | It costs $3.14 | it costs three fourteen |

-## Mandarin Chinese (zh-cn)
+## Mandarin Chinese (zh-CN)

 Human-labeled transcriptions for Mandarin Chinese audio must be UTF-8 encoded with a byte-order marker. Avoid the use of half-width punctuation characters. These characters can be included inadvertently when you prepare the data in a word-processing program or scrape data from web pages. If these characters are present, make sure to update them with the appropriate full-width substitution.
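To illustrate the half-width/full-width guidance above, here is a small, hypothetical clean-up sketch in Python. The punctuation mapping is deliberately partial, the file name is a placeholder, and `utf-8-sig` is the standard-library encoding that writes the required byte-order marker.

```python
# Partial, illustrative mapping from half-width punctuation to full-width equivalents.
HALF_TO_FULL = str.maketrans({",": "，", "?": "？", "!": "！", ":": "：", ";": "；"})

def normalize_line(line: str) -> str:
    """Replace common half-width punctuation with full-width characters."""
    return line.translate(HALF_TO_FULL)

# "utf-8-sig" prepends the byte-order marker that the article requires.
with open("trans.txt", "w", encoding="utf-8-sig") as out:
    out.write(normalize_line("你好,世界!") + "\n")
```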
articles/cognitive-services/Speech-Service/how-to-custom-speech-inspect-data.md (1 addition & 1 deletion)
@@ -41,7 +41,7 @@ When the test status is *Succeeded*, click in the test item name to see details

 To help inspect the side-by-side comparison, you can toggle various error types including insertion, deletion, and substitution. By listening to the audio and comparing recognition results in each column (showing human-labeled transcription and the results of two speech-to-text models), you can decide which model meets your needs and where improvements are needed.

-Inspecting quality testing is useful to validate if the quality of a speech recognition endpoint is enough for an application. For an objective measure of accuracy, requiring transcribed audio, follow the instructions found in Testing: Evaluate Accuracy.
+Inspecting quality testing is useful to validate if the quality of a speech recognition endpoint is enough for an application. For an objective measure of accuracy, requiring transcribed audio, follow the instructions found in [Evaluate Accuracy](how-to-custom-speech-evaluate-data.md).
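Since the hunk above distinguishes insertion, deletion, and substitution errors and points to the accuracy evaluation article, here is the standard word error rate calculation as a small illustrative sketch (the function and the example counts are not taken from the article).

```python
def word_error_rate(substitutions: int, deletions: int, insertions: int, reference_words: int) -> float:
    """Standard WER: (S + D + I) / N, where N is the word count of the reference transcript."""
    return (substitutions + deletions + insertions) / reference_words

# Example: 10 substitutions, 5 deletions, and 3 insertions against a 200-word reference.
print(word_error_rate(10, 5, 3, 200))  # 0.09 -> 9% WER
```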
articles/cognitive-services/Speech-Service/how-to-custom-speech.md (2 additions & 2 deletions)
@@ -24,7 +24,7 @@ This diagram highlights the pieces that make up the Custom Speech portal. Use th

 

-1. [Subscribe and create a project](#set-up-your-azure-account) - Create an Azure account and subscribe the Speech Services. This unified subscription gives you access to speech-to-text, text-to-speech, speech translation, and the custom speech portal. Then, using your Speech Services subscription, create your first Custom Speech project.
+1. [Subscribe and create a project](#set-up-your-azure-account) - Create an Azure account and subscribe to the Speech Services. This unified subscription gives you access to speech-to-text, text-to-speech, speech translation, and the Custom Speech portal. Then, using your Speech Services subscription, create your first Custom Speech project.

 2. [Upload test data](how-to-custom-speech-test-data.md) - Upload test data (audio files) to evaluate Microsoft's speech-to-text offering for your applications, tools, and products.

@@ -47,7 +47,7 @@ Once you've created an Azure account and a Speech Services subscription, you'll

 1. Get your Speech Services subscription key from the Azure portal.
 2. Sign-in to the [Custom Speech portal](https://aka.ms/custom-speech).
-3. Select the subscription you need to work on and creat a speech project.
+3. Select the subscription you need to work on and create a speech project.
 4. If you'd like to modify your subscription, use the **cog** icon located in the top navigation.