Commit 6d38ede

[CogSvcs] Speech: a few more doc fixes
1 parent 7a17b3d commit 6d38ede

9 files changed, +28 -26 lines changed

articles/cognitive-services/Speech-Service/batch-transcription.md

Lines changed: 11 additions & 9 deletions
@@ -61,8 +61,8 @@ Configuration parameters are provided as JSON:
{
"recordingsUrl": "<URL to the Azure blob to transcribe>",
"models": [{"Id":"<optional acoustic model ID>"},{"Id":"<optional language model ID>"}],
- "locale": "<local to us, for example en-US>",
- "name": "<user define name of the transcription batch>",
+ "locale": "<locale to us, for example en-US>",
+ "name": "<user defined name of the transcription batch>",
"description": "<optional description of the transcription>",
"properties": {
"ProfanityFilterMode": "Masked",

@@ -78,12 +78,14 @@ Configuration parameters are provided as JSON:

### Configuration properties

- | Parameter | Description | Required / Optional |
- |-----------|-------------|---------------------|
- | `ProfanityFilterMode` | Specifies how to handle profanity in recognition results. Accepted values are `none` which disables profanity filtering, `masked` which replaces profanity with asterisks, `removed` which removes all profanity from the result, or `tags` which adds "profanity" tags. The default setting is `masked`. | Optional |
- | `PunctuationMode` | Specifies how to handle punctuation in recognition results. Accepted values are `none` which disables punctuation, `dictated` which implies explicit punctuation, `automatic` which lets the decoder deal with punctuation, or `dictatedandautomatic` which implies dictated punctuation marks or automatic. | Optional |
- | `AddWordLevelTimestamps` | Specifies if word level timestamps should be added to the output. Accepted values are `true` which enables word level timestamps and `false` (the default value) to disable it. | Optional |
- | `AddSentiment` | Specifies sentiment should be added to the utterance. Accepted values are `true` which enables sentiment per utterance and `false` (the default value) to disable it. | Optional |
+ Use these optional properties to configure transcription:
+
+ | Parameter | Description |
+ |-----------|-------------|
+ | `ProfanityFilterMode` | Specifies how to handle profanity in recognition results. Accepted values are `none` which disables profanity filtering, `masked` which replaces profanity with asterisks, `removed` which removes all profanity from the result, or `tags` which adds "profanity" tags. The default setting is `masked`. |
+ | `PunctuationMode` | Specifies how to handle punctuation in recognition results. Accepted values are `none` which disables punctuation, `dictated` which implies explicit punctuation, `automatic` which lets the decoder deal with punctuation, or `dictatedandautomatic` which implies dictated punctuation marks or automatic. |
+ | `AddWordLevelTimestamps` | Specifies if word level timestamps should be added to the output. Accepted values are `true` which enables word level timestamps and `false` (the default value) to disable it. |
+ | `AddSentiment` | Specifies sentiment should be added to the utterance. Accepted values are `true` which enables sentiment per utterance and `false` (the default value) to disable it. |

### Storage

@@ -148,7 +150,7 @@ The features uses a Sentiment model which is currently in Beta.

## Sample code

- The complete sample is available in the [GitHub sample repository](https://aka.ms/csspeech/samples) inside the `samples/batch` subdirectory.
+ Complete samples are available in the [GitHub sample repository](https://aka.ms/csspeech/samples) inside the `samples/batch` subdirectory.

You have to customize the sample code with your subscription information, the service region, the SAS URI pointing to the audio file to transcribe, and model IDs in case you want to use a custom acoustic or language model.
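The batch-transcription.md hunks above document the JSON configuration accepted by the Batch Transcription API. As a companion illustration only (it is not part of this commit or of the linked samples), here is a minimal C# sketch of submitting such a configuration over REST; the endpoint, region, placeholder key, and property values are assumptions you would replace with your own:

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class BatchTranscriptionSketch
{
    static async Task Main()
    {
        // Assumed endpoint and placeholder key; adjust the region and API version to your subscription.
        var endpoint = "https://westus.cris.ai/api/speechtotext/v2.0/transcriptions";
        var subscriptionKey = "YourSubscriptionKey";

        // Configuration JSON mirroring the parameters documented above.
        var configuration = @"{
          ""recordingsUrl"": ""<SAS URL of the Azure blob to transcribe>"",
          ""models"": [],
          ""locale"": ""en-US"",
          ""name"": ""My transcription batch"",
          ""description"": ""Optional description of the transcription"",
          ""properties"": { ""ProfanityFilterMode"": ""Masked"", ""AddWordLevelTimestamps"": ""True"" }
        }";

        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            var content = new StringContent(configuration, Encoding.UTF8, "application/json");
            var response = await client.PostAsync(endpoint, content);

            // The service is expected to accept the job asynchronously and point at the new transcription resource.
            Console.WriteLine($"Status: {(int)response.StatusCode}, Location: {response.Headers.Location}");
        }
    }
}
```

After submission you would typically poll the returned transcription resource until its status reports completion, then retrieve the result files.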

articles/cognitive-services/Speech-Service/call-center-transcription.md

Lines changed: 8 additions & 8 deletions
@@ -29,7 +29,7 @@ Let's review some of the technology and related features Azure Speech Services o

## Azure Technology for Call Centers

- Beyond, the functional aspect of the Speech Services their primary purpose -when applied to the call center- is to improve the customer experience. Three clear domains exist in this regard
+ Beyond the functional aspect of the Speech Services their primary purpose when applied to the call center is to improve the customer experience. Three clear domains exist in this regard:

* Post-call analytics that is, batch processing of call recordings
* Real-time analytics processing of the audio signal to extract various insights as the call is taking place (with sentiment being a prominent use case) and

@@ -44,21 +44,21 @@ Whether the domain is post-call or real-time, Azure offers a set of mature and e

### Speech to text (STT)

- [Speech-to-text](speech-to-text.md) is the most sought after feature in any call center solution. Since many of the downstream analytics processes rely on transcribed text, the word error rate (WER) is of utmost importance. One of the key challenges in call center transcription is the noise that’s prevalent in the call center (for example other agents speaking in the background), the rich variety of language locales and dialects as well as the low quality of the actual telephone signal. WER is highly correlated with how well the acoustic and language models are trained for a given locale, thus being able to customize the model to your locale is important. Our latest Unified version 4.x models are the solution to both transcription accuracy and latency. Trained with tens of thousands of hours of acoustic data and billions of lexical information Unified models are the most accurate models in the market to transcribe call center data.
+ [Speech-to-text](speech-to-text.md) is the most sought after feature in any call center solution. Since many of the downstream analytics processes rely on transcribed text, the word error rate (WER) is of utmost importance. One of the key challenges in call center transcription is the noise that’s prevalent in the call center (for example other agents speaking in the background), the rich variety of language locales and dialects as well as the low quality of the actual telephone signal. WER is highly correlated with how well the acoustic and language models are trained for a given locale, thus being able to customize the model to your locale is important. Our latest Unified version 4.x models are the solution to both transcription accuracy and latency. Trained with tens of thousands of hours of acoustic data and billions of lexical information Unified models are the most accurate models in the market to transcribe call center data.

### Sentiment
Gauging whether the customer had a good experience is one of the most important areas of Speech analytics when applied to the call center space. Our [Batch Transcription API](batch-transcription.md) offers sentiment analysis per utterance. You can aggregate the set of values obtained as part of a call transcript to determine the sentiment of the call for both your agents and the customer.

### Silence (non-talk)
- it is not uncommon for 35 percent of a support call to be what we call non-talk time. Some scenarios which non-talk occurs are: agents looking up prior case history with a customer, agents using tools which allow them to access the customer's desktop and perform functions, customers sitting on hold waiting for a transfer and so on. It is extremely important to can gauge when silence is occurring in a call as there are number of important customer sensitivities that occur around these types of scenarios and where they occur in the call.
+ It is not uncommon for 35 percent of a support call to be what we call non-talk time. Some scenarios which non-talk occurs are: agents looking up prior case history with a customer, agents using tools which allow them to access the customer's desktop and perform functions, customers sitting on hold waiting for a transfer and so on. It is extremely important to can gauge when silence is occurring in a call as there are number of important customer sensitivities that occur around these types of scenarios and where they occur in the call.

### Translation
Some companies are experimenting with providing translated transcripts from foreign languages support calls so that delivery managers can understand the world-wide experience of their customers. Our [translation](translation.md) capabilities are unsurpassed. We can translate audio to audio or audio to text from a large number of locales.

### Text to Speech
[Text-to-speech](text-to-speech.md) is another important area in implementing bots that interact with the customers. The typical pathway is that the customer speaks, their voice is transcribed to text, the text is analyzed for intents, a response is synthesized based on the recognized intent, and then an asset is either surfaced to the customer or a synthesized voice response is generated. Of course all of this has to occur quickly – thus latency is an important component in the success of these systems.

- Our end-to-end latency is pretty low considering the various technologies involved such as [Speech-to-text](speech-to-text.md), [Luis](https://azure.microsoft.com/services/cognitive-services/language-understanding-intelligent-service/), [Bot Framework](https://dev.botframework.com/), [Text-to-Speech](text-to-speech.md).
+ Our end-to-end latency is pretty low considering the various technologies involved such as [Speech-to-text](speech-to-text.md), [LUIS](https://azure.microsoft.com/services/cognitive-services/language-understanding-intelligent-service/), [Bot Framework](https://dev.botframework.com/), [Text-to-Speech](text-to-speech.md).

Our new voices are also indistinguishable from human voices. You can use out voices to give your bot its unique personality.

@@ -75,10 +75,10 @@ Let's now have a look at the batch processing and the real-time pipelines for sp
For transcribing bulk of audio we developed the [Batch Transcription API](batch-transcription.md). The Batch Transcription API was developed to transcribe large amounts of audio data asynchronously. With regards to transcribing call center data, our solution is based on these pillars:

* **Accuracy**: With fourth-generation Unified models, we offer unsurpassed transcription quality.
- * **Latency**: We understand that when doing bulk transcriptions, the transcriptions are needed quickly. The transcription jobs initiated via the [Batch Transcription API](batch-transcription.md) will be queued immediately, and once the job is executed it's performed faster than real-time transcription.
+ * **Latency**: We understand that when doing bulk transcriptions, the transcriptions are needed quickly. The transcription jobs initiated via the [Batch Transcription API](batch-transcription.md) will be queued immediately, and once the job starts running it's performed faster than real-time transcription.
* **Security**: We understand that calls may contain sensitive data. Rest assured that security is one of our highest priorities. Our service has obtained ISO, SOC, HIPAA, PCI certifications.

- Call Centers generate large volumes of audio data on a daily basis. If your business stores telephony data in a central location, such as Azure Storage, you can use the [Batch Transcription API]((batch-transcription.md) to asynchronously request and receive transcriptions.
+ Call Centers generate large volumes of audio data on a daily basis. If your business stores telephony data in a central location, such as Azure Storage, you can use the [Batch Transcription API](batch-transcription.md) to asynchronously request and receive transcriptions.

A typical solution uses these services:

@@ -94,7 +94,7 @@ Internally we are using the above technologies to support Microsoft customer cal

Some businesses are required to transcribe conversations in real-time. Real-time transcription can be used to identify key-words and trigger searches for content and resources relevant to the conversation, for monitoring sentiment, to improve accessibility, or to provide translations for customers and agents who aren't native speakers.

- For scenarios that require real-time transcription, we recommend using the [Speech SDK](speech-sdk.md). Currently, speech-to-text is available in [more than 20 languages](language-support.md), and the SDK is available in C++, C#, Java, Python, Node.js, and Javascript. Samples are available in each language on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk). For the latest news and updates, see [Release notes](releasenotes.md).
+ For scenarios that require real-time transcription, we recommend using the [Speech SDK](speech-sdk.md). Currently, speech-to-text is available in [more than 20 languages](language-support.md), and the SDK is available in C++, C#, Java, Python, Node.js, Objective-C, and JavaScript. Samples are available in each language on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk). For the latest news and updates, see [Release notes](releasenotes.md).

Internally we are using the above technologies to analyze in real-time Microsoft customer calls as they happen.

@@ -106,7 +106,7 @@ Speech Services can be easily integrated in any solution by using either the [Sp

Several IVR or telephony service products (such as Genesys or AudioCodes) offer integration capabilities that can be leveraged to enable inbound and outbound audio passthrough to an Azure Service. Basically, a custom Azure service might provide a specific interface to define phone call sessions (such as Call Start or Call End) and expose a WebSocket API to receive inbound stream audio that is used with the Speech Services. Outbound responses, such as conversation transcription or connections with the Bot Framework, can be synthesized with Microsoft's text-to-speech service and returned to the IVR for playback.

- Another scenario is Direct SIP integration. An Azure service connects to a SIP Server, thus getting an inbound stream and an outbound stream, which is used for the speech-to-text and text-to-speech phases. To connect to a SIP Server there are commercial software offerings, such as Ozieki SDK, or [The Teams calling and meetings API](https://docs.microsoft.com/graph/api/resources/calls-api-overview?view=graph-rest-beta) (currently in beta), that are designed to support this type of scenario for audio calls.
+ Another scenario is Direct SIP integration. An Azure service connects to a SIP Server, thus getting an inbound stream and an outbound stream, which is used for the speech-to-text and text-to-speech phases. To connect to a SIP Server there are commercial software offerings, such as Ozeki SDK, or [the Teams calling and meetings API](https://docs.microsoft.com/graph/api/resources/calls-api-overview?view=graph-rest-beta) (currently in beta), that are designed to support this type of scenario for audio calls.

## Customize existing experiences
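The real-time transcription paragraph changed above recommends the Speech SDK for live call analytics. As a companion illustration only (not part of this commit), a minimal continuous-recognition sketch in C# might look like the following; the subscription key, service region, and console wiring are placeholder assumptions:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class RealTimeTranscriptionSketch
{
    static async Task Main()
    {
        // Placeholder subscription key and service region; replace with your own values.
        var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

        // Uses the default microphone as the audio input.
        using (var recognizer = new SpeechRecognizer(config))
        {
            // Print each finalized utterance as it is recognized.
            recognizer.Recognized += (s, e) =>
            {
                if (e.Result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"Transcript: {e.Result.Text}");
                }
            };

            await recognizer.StartContinuousRecognitionAsync();
            Console.WriteLine("Listening... press Enter to stop.");
            Console.ReadLine();
            await recognizer.StopContinuousRecognitionAsync();
        }
    }
}
```

Continuous recognition suits ongoing call audio; for a single short utterance, the SDK's RecognizeOnceAsync call is usually the simpler choice.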

articles/cognitive-services/Speech-Service/how-to-custom-speech-human-labeled-transcriptions.md

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ Here are a few examples of normalization automatically performed on the transcri
| Pi is about 3.14 | pi is about three point one four |
| It costs $3.14 | it costs three fourteen |

- ## Mandarin Chinese (zh-cn)
+ ## Mandarin Chinese (zh-CN)

Human-labeled transcriptions for Mandarin Chinese audio must be UTF-8 encoded with a byte-order marker. Avoid the use of half-width punctuation characters. These characters can be included inadvertently when you prepare the data in a word-processing program or scrape data from web pages. If these characters are present, make sure to update them with the appropriate full-width substitution.

articles/cognitive-services/Speech-Service/how-to-custom-speech-inspect-data.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ When the test status is *Succeeded*, click in the test item name to see details

To help inspect the side-by-side comparison, you can toggle various error types including insertion, deletion, and substitution. By listening to the audio and comparing recognition results in each column (showing human-labeled transcription and the results of two speech-to-text models), you can decide which model meets your needs and where improvements are needed.

- Inspecting quality testing is useful to validate if the quality of a speech recognition endpoint is enough for an application. For an objective measure of accuracy, requiring transcribed audio, follow the instructions found in Testing: Evaluate Accuracy.
+ Inspecting quality testing is useful to validate if the quality of a speech recognition endpoint is enough for an application. For an objective measure of accuracy, requiring transcribed audio, follow the instructions found in [Evaluate Accuracy](how-to-custom-speech-evaluate-data.md).

## Next steps

articles/cognitive-services/Speech-Service/how-to-custom-speech-test-data.md

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ Each dataset you upload must meet the requirements for the data type that you ch
After your dataset is uploaded, you have a few options:

* You can navigate to the **Testing** tab and visually inspect audio only or audio + human-labeled transcription data.
- * You can navigate to the **Training** tab and us audio + human transcription data or related text data to train a custom model.
+ * You can navigate to the **Training** tab and use audio + human transcription data or related text data to train a custom model.

## Audio data for testing

articles/cognitive-services/Speech-Service/how-to-custom-speech-train-model.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ If you're encountering recognition issues with your model, using human-labeled t

| Use case | Data type | Data quantity |
|----------|-----------|---------------|
- | Proper names are misrecognized | Relate text (sentences/utterances) | 10 MB to 500 MB |
+ | Proper names are misrecognized | Related text (sentences/utterances) | 10 MB to 500 MB |
| Words are misrecognized because of an accent | Related text (pronunciation) | Provide the misrecognized words |
| Common words are deleted or misrecognized | Audio + human-labeled transcripts | 10 to 1,000 transcription hours |

articles/cognitive-services/Speech-Service/how-to-custom-speech.md

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ This diagram highlights the pieces that make up the Custom Speech portal. Use th

![Highlights the different components that make up the Custom Speech portal.](./media/custom-speech/custom-speech-overview.png)

- 1. [Subscribe and create a project](#set-up-your-azure-account) - Create an Azure account and subscribe the Speech Services. This unified subscription gives you access to speech-to-text, text-to-speech, speech translation, and the custom speech portal. Then, using your Speech Services subscription, create your first Custom Speech project.
+ 1. [Subscribe and create a project](#set-up-your-azure-account) - Create an Azure account and subscribe the Speech Services. This unified subscription gives you access to speech-to-text, text-to-speech, speech translation, and the Custom Speech portal. Then, using your Speech Services subscription, create your first Custom Speech project.

2. [Upload test data](how-to-custom-speech-test-data.md) - Upload test data (audio files) to evaluate Microsoft's speech-to-text offering for your applications, tools, and products.

@@ -47,7 +47,7 @@ Once you've created an Azure account and a Speech Services subscription, you'll

1. Get your Speech Services subscription key from the Azure portal.
2. Sign-in to the [Custom Speech portal](https://aka.ms/custom-speech).
- 3. Select the subscription you need to work on and creat a speech project.
+ 3. Select the subscription you need to work on and create a speech project.
4. If you'd like to modify your subscription, use the **cog** icon located in the top navigation.

## How to create a project

articles/cognitive-services/Speech-Service/how-to-use-conversation-transcription-service.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ The first step is to create voice signatures for the conversation participants.
* The input audio wave file for creating voice signatures shall be in 16-bit samples, 16 kHz sample rate, and a single channel (Mono) format.
* The recommended length for each audio sample is between 30 seconds and two minutes.

- The following example shows two different ways to create voice signature by [using the REST API.] (https://aka.ms/cts/signaturegenservice) from C#:
+ The following example shows two different ways to create voice signature by [using the REST API](https://aka.ms/cts/signaturegenservice) from C#:

```csharp
class Program

0 commit comments