articles/cognitive-services/Speech-Service/batch-transcription.md (11 additions & 9 deletions)
@@ -61,8 +61,8 @@ Configuration parameters are provided as JSON:
 {
   "recordingsUrl": "<URL to the Azure blob to transcribe>",
   "models": [{"Id":"<optional acoustic model ID>"},{"Id":"<optional language model ID>"}],
-  "locale": "<local to us, for example en-US>",
-  "name": "<user define name of the transcription batch>",
+  "locale": "<locale to use, for example en-US>",
+  "name": "<user defined name of the transcription batch>",
   "description": "<optional description of the transcription>",
   "properties": {
     "ProfanityFilterMode": "Masked",
@@ -78,12 +78,14 @@ Configuration parameters are provided as JSON:

 ### Configuration properties

-| Parameter | Description | Required / Optional |
-|-----------|-------------|---------------------|
-|`ProfanityFilterMode`| Specifies how to handle profanity in recognition results. Accepted values are `none` which disables profanity filtering, `masked` which replaces profanity with asterisks, `removed` which removes all profanity from the result, or `tags` which adds "profanity" tags. The default setting is `masked`. | Optional |
-|`PunctuationMode`| Specifies how to handle punctuation in recognition results. Accepted values are `none` which disables punctuation, `dictated` which implies explicit punctuation, `automatic` which lets the decoder deal with punctuation, or `dictatedandautomatic` which implies dictated punctuation marks or automatic. | Optional |
-|`AddWordLevelTimestamps`| Specifies if word level timestamps should be added to the output. Accepted values are `true` which enables word level timestamps and `false` (the default value) to disable it. | Optional |
-|`AddSentiment`| Specifies sentiment should be added to the utterance. Accepted values are `true` which enables sentiment per utterance and `false` (the default value) to disable it. | Optional |
+Use these optional properties to configure transcription:
+
+| Parameter | Description |
+|-----------|-------------|
+|`ProfanityFilterMode`| Specifies how to handle profanity in recognition results. Accepted values are `none` which disables profanity filtering, `masked` which replaces profanity with asterisks, `removed` which removes all profanity from the result, or `tags` which adds "profanity" tags. The default setting is `masked`. |
+|`PunctuationMode`| Specifies how to handle punctuation in recognition results. Accepted values are `none` which disables punctuation, `dictated` which implies explicit punctuation, `automatic` which lets the decoder deal with punctuation, or `dictatedandautomatic` which implies dictated punctuation marks, automatic punctuation, or both. |
+|`AddWordLevelTimestamps`| Specifies if word level timestamps should be added to the output. Accepted values are `true` which enables word level timestamps and `false` (the default value) to disable it. |
+|`AddSentiment`| Specifies if sentiment should be added to the utterance. Accepted values are `true` which enables sentiment per utterance and `false` (the default value) to disable it. |

 ### Storage

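To make the configuration shape in the hunks above concrete, here is a minimal sketch of a filled-in request body in Python. Every value is a placeholder chosen for illustration (the SAS URI, names, and property casing are not taken from the article), and whether an empty `models` list falls back to the baseline models is an assumption.

```python
# Illustrative batch transcription definition mirroring the JSON shape shown above.
# All values below are placeholders; substitute your own SAS URI, locale, and names.
transcription_definition = {
    "recordingsUrl": "https://<storage-account>.blob.core.windows.net/<container>/<file>?<SAS token>",
    "models": [],  # assumption: leave empty for baseline models, or add {"Id": "..."} entries
    "locale": "en-US",
    "name": "Support calls nightly batch",
    "description": "Transcription of recorded support calls",
    "properties": {
        "ProfanityFilterMode": "Masked",            # default, per the table above
        "PunctuationMode": "DictatedAndAutomatic",  # casing here is illustrative
        "AddWordLevelTimestamps": "True",
        "AddSentiment": "True",
    },
}
```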
@@ -148,7 +150,7 @@ The features uses a Sentiment model which is currently in Beta.

 ## Sample code

-The complete sample is available in the [GitHub sample repository](https://aka.ms/csspeech/samples) inside the `samples/batch` subdirectory.
+Complete samples are available in the [GitHub sample repository](https://aka.ms/csspeech/samples) inside the `samples/batch` subdirectory.

 You have to customize the sample code with your subscription information, the service region, the SAS URI pointing to the audio file to transcribe, and model IDs in case you want to use a custom acoustic or language model.
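As a companion to the customization note above (subscription key, region, SAS URI, model IDs), here is a minimal REST sketch in Python for submitting a definition like the one shown earlier. The endpoint path, header name, and use of the `Location` response header are assumptions based on the v2.0 Batch Transcription REST API of this period; the linked GitHub samples remain the authoritative reference.

```python
import requests

SUBSCRIPTION_KEY = "<your Speech Services subscription key>"   # placeholder
REGION = "westus"                                              # your service region

# Assumed v2.0 endpoint; verify against the samples/batch code before relying on it.
ENDPOINT = f"https://{REGION}.cris.ai/api/speechtotext/v2.0/transcriptions"

def submit_transcription(definition: dict) -> str:
    """POST a transcription definition and return the URL used to poll its status."""
    response = requests.post(
        ENDPOINT,
        json=definition,
        headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
    )
    response.raise_for_status()
    # Batch transcription is asynchronous; the created resource is referenced by the
    # Location header (assumption based on the v2.0 API behavior).
    return response.headers["Location"]
```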
articles/cognitive-services/Speech-Service/call-center-transcription.md (8 additions & 8 deletions)
@@ -29,7 +29,7 @@ Let's review some of the technology and related features Azure Speech Services o

 ## Azure Technology for Call Centers

-Beyond, the functional aspect of the Speech Services their primary purpose -when applied to the call center- is to improve the customer experience. Three clear domains exist in this regard
+Beyond the functional aspect of the Speech Services, their primary purpose, when applied to the call center, is to improve the customer experience. Three clear domains exist in this regard:

 * Post-call analytics, that is, batch processing of call recordings
 * Real-time analytics, processing of the audio signal to extract various insights as the call is taking place (with sentiment being a prominent use case), and
@@ -44,21 +44,21 @@ Whether the domain is post-call or real-time, Azure offers a set of mature and e

 ### Speech to text (STT)

-[Speech-to-text](speech-to-text.md) is the most sought after feature in any call center solution. Since many of the downstream analytics processes rely on transcribed text, the word error rate (WER) is of utmost importance. One of the key challenges in call center transcription is the noise that’s prevalent in the call center (for example – other agents speaking in the background), the rich variety of language locales and dialects as well as the low quality of the actual telephone signal. WER is highly correlated with how well the acoustic and language models are trained for a given locale, thus being able to customize the model to your locale is important. Our latest Unified version 4.x models are the solution to both transcription accuracy and latency. Trained with tens of thousands of hours of acoustic data and billions of lexical information Unified models are the most accurate models in the market to transcribe call center data.
+[Speech-to-text](speech-to-text.md) is the most sought-after feature in any call center solution. Since many of the downstream analytics processes rely on transcribed text, the word error rate (WER) is of utmost importance. One of the key challenges in call center transcription is the noise that’s prevalent in the call center (for example, other agents speaking in the background), the rich variety of language locales and dialects, as well as the low quality of the actual telephone signal. WER is highly correlated with how well the acoustic and language models are trained for a given locale, so being able to customize the model to your locale is important. Our latest Unified version 4.x models are the solution to both transcription accuracy and latency. Trained on tens of thousands of hours of acoustic data and billions of items of lexical information, Unified models are the most accurate models on the market for transcribing call center data.

 ### Sentiment
 Gauging whether the customer had a good experience is one of the most important areas of Speech analytics when applied to the call center space. Our [Batch Transcription API](batch-transcription.md) offers sentiment analysis per utterance. You can aggregate the set of values obtained as part of a call transcript to determine the sentiment of the call for both your agents and the customer.

 ### Silence (non-talk)
-it is not uncommon for 35 percent of a support call to be what we call non-talk time. Some scenarios which non-talk occurs are: agents looking up prior case history with a customer, agents using tools which allow them to access the customer's desktop and perform functions, customers sitting on hold waiting for a transfer and so on. It is extremely important to can gauge when silence is occurring in a call as there are number of important customer sensitivities that occur around these types of scenarios and where they occur in the call.
+It is not uncommon for 35 percent of a support call to be what we call non-talk time. Some scenarios in which non-talk occurs are: agents looking up prior case history with a customer, agents using tools which allow them to access the customer's desktop and perform functions, customers sitting on hold waiting for a transfer, and so on. It is extremely important to be able to gauge when silence is occurring in a call, as there are a number of important customer sensitivities that occur around these types of scenarios and where they occur in the call.

 ### Translation
 Some companies are experimenting with providing translated transcripts from foreign-language support calls so that delivery managers can understand the world-wide experience of their customers. Our [translation](translation.md) capabilities are unsurpassed. We can translate audio to audio or audio to text from a large number of locales.

 ### Text to Speech
 [Text-to-speech](text-to-speech.md) is another important area in implementing bots that interact with the customers. The typical pathway is that the customer speaks, their voice is transcribed to text, the text is analyzed for intents, a response is synthesized based on the recognized intent, and then an asset is either surfaced to the customer or a synthesized voice response is generated. Of course all of this has to occur quickly – thus latency is an important component in the success of these systems.

-Our end-to-end latency is pretty low considering the various technologies involved such as [Speech-to-text](speech-to-text.md), [Luis](https://azure.microsoft.com/services/cognitive-services/language-understanding-intelligent-service/), [Bot Framework](https://dev.botframework.com/), [Text-to-Speech](text-to-speech.md).
+Our end-to-end latency is quite low considering the various technologies involved, such as [Speech-to-text](speech-to-text.md), [LUIS](https://azure.microsoft.com/services/cognitive-services/language-understanding-intelligent-service/), [Bot Framework](https://dev.botframework.com/), and [Text-to-Speech](text-to-speech.md).

 Our new voices are also indistinguishable from human voices. You can use our voices to give your bot its unique personality.

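The Sentiment section in the hunk above suggests aggregating per-utterance values into a call-level score. The sketch below shows one way to do that averaging; the `Positive`/`Neutral`/`Negative` field names are illustrative and not the exact result schema returned by the Batch Transcription API.

```python
from statistics import mean

def call_level_sentiment(utterances):
    """Average per-utterance sentiment scores into a single call-level summary.

    `utterances` is assumed to be a list of dicts with illustrative keys
    "Positive", "Neutral", and "Negative", each a score between 0 and 1.
    """
    if not utterances:
        return None
    return {
        "Positive": mean(u["Positive"] for u in utterances),
        "Neutral": mean(u["Neutral"] for u in utterances),
        "Negative": mean(u["Negative"] for u in utterances),
    }
```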
@@ -75,10 +75,10 @@ Let's now have a look at the batch processing and the real-time pipelines for sp
 For transcribing bulk of audio we developed the [Batch Transcription API](batch-transcription.md). The Batch Transcription API was developed to transcribe large amounts of audio data asynchronously. With regards to transcribing call center data, our solution is based on these pillars:

 * **Accuracy**: With fourth-generation Unified models, we offer unsurpassed transcription quality.
-* **Latency**: We understand that when doing bulk transcriptions, the transcriptions are needed quickly. The transcription jobs initiated via the [Batch Transcription API](batch-transcription.md) will be queued immediately, and once the job is executed it's performed faster than real-time transcription.
+* **Latency**: We understand that when doing bulk transcriptions, the transcriptions are needed quickly. The transcription jobs initiated via the [Batch Transcription API](batch-transcription.md) will be queued immediately, and once the job starts running it's performed faster than real-time transcription.
 * **Security**: We understand that calls may contain sensitive data. Rest assured that security is one of our highest priorities. Our service has obtained ISO, SOC, HIPAA, PCI certifications.

-Call Centers generate large volumes of audio data on a daily basis. If your business stores telephony data in a central location, such as Azure Storage, you can use the [Batch Transcription API]((batch-transcription.md) to asynchronously request and receive transcriptions.
+Call Centers generate large volumes of audio data on a daily basis. If your business stores telephony data in a central location, such as Azure Storage, you can use the [Batch Transcription API](batch-transcription.md) to asynchronously request and receive transcriptions.

 A typical solution uses these services:

@@ -94,7 +94,7 @@ Internally we are using the above technologies to support Microsoft customer cal

 Some businesses are required to transcribe conversations in real-time. Real-time transcription can be used to identify key-words and trigger searches for content and resources relevant to the conversation, for monitoring sentiment, to improve accessibility, or to provide translations for customers and agents who aren't native speakers.

-For scenarios that require real-time transcription, we recommend using the [Speech SDK](speech-sdk.md). Currently, speech-to-text is available in [more than 20 languages](language-support.md), and the SDK is available in C++, C#, Java, Python, Node.js, and Javascript. Samples are available in each language on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk). For the latest news and updates, see [Release notes](releasenotes.md).
+For scenarios that require real-time transcription, we recommend using the [Speech SDK](speech-sdk.md). Currently, speech-to-text is available in [more than 20 languages](language-support.md), and the SDK is available in C++, C#, Java, Python, Node.js, Objective-C, and JavaScript. Samples are available in each language on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk). For the latest news and updates, see [Release notes](releasenotes.md).

 Internally we are using the above technologies to analyze in real-time Microsoft customer calls as they happen.

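To complement the Speech SDK recommendation in the hunk above, here is a minimal real-time recognition sketch using the SDK's Python package; the subscription key and region are placeholders, and the per-language samples in the linked GitHub repository remain the authoritative reference.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders: supply your own subscription key and service region.
speech_config = speechsdk.SpeechConfig(subscription="<subscription key>", region="westus")

# Recognize a single utterance from the default microphone.
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized:", result.text)
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized.")
```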
@@ -106,7 +106,7 @@ Speech Services can be easily integrated in any solution by using either the [Sp

 Several IVR or telephony service products (such as Genesys or AudioCodes) offer integration capabilities that can be leveraged to enable inbound and outbound audio passthrough to an Azure Service. Basically, a custom Azure service might provide a specific interface to define phone call sessions (such as Call Start or Call End) and expose a WebSocket API to receive inbound stream audio that is used with the Speech Services. Outbound responses, such as conversation transcription or connections with the Bot Framework, can be synthesized with Microsoft's text-to-speech service and returned to the IVR for playback.

-Another scenario is Direct SIP integration. An Azure service connects to a SIP Server, thus getting an inbound stream and an outbound stream, which is used for the speech-to-text and text-to-speech phases. To connect to a SIP Server there are commercial software offerings, such as Ozieki SDK, or [The Teams calling and meetings API](https://docs.microsoft.com/graph/api/resources/calls-api-overview?view=graph-rest-beta) (currently in beta), that are designed to support this type of scenario for audio calls.
+Another scenario is Direct SIP integration. An Azure service connects to a SIP Server, thus getting an inbound stream and an outbound stream, which is used for the speech-to-text and text-to-speech phases. To connect to a SIP Server there are commercial software offerings, such as Ozeki SDK, or [the Teams calling and meetings API](https://docs.microsoft.com/graph/api/resources/calls-api-overview?view=graph-rest-beta) (currently in beta), that are designed to support this type of scenario for audio calls.
articles/cognitive-services/Speech-Service/how-to-custom-speech-human-labeled-transcriptions.md (1 addition & 1 deletion)
@@ -73,7 +73,7 @@ Here are a few examples of normalization automatically performed on the transcri
 | Pi is about 3.14 | pi is about three point one four |
 | It costs $3.14 | it costs three fourteen |

-## Mandarin Chinese (zh-cn)
+## Mandarin Chinese (zh-CN)

 Human-labeled transcriptions for Mandarin Chinese audio must be UTF-8 encoded with a byte-order marker. Avoid the use of half-width punctuation characters. These characters can be included inadvertently when you prepare the data in a word-processing program or scrape data from web pages. If these characters are present, make sure to update them with the appropriate full-width substitution.
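To illustrate the half-width/full-width guidance above, here is a small, hypothetical clean-up sketch in Python. The punctuation mapping is deliberately partial, the file name is a placeholder, and `utf-8-sig` is the standard-library encoding that writes the required byte-order marker.

```python
# Partial, illustrative mapping from half-width punctuation to full-width equivalents.
HALF_TO_FULL = str.maketrans({",": "，", "?": "？", "!": "！", ":": "：", ";": "；"})

def normalize_line(line: str) -> str:
    """Replace common half-width punctuation with full-width characters."""
    return line.translate(HALF_TO_FULL)

# "utf-8-sig" prepends the byte-order marker that the article requires.
with open("trans.txt", "w", encoding="utf-8-sig") as out:
    out.write(normalize_line("你好,世界!") + "\n")
```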
articles/cognitive-services/Speech-Service/how-to-custom-speech-inspect-data.md (1 addition & 1 deletion)
@@ -41,7 +41,7 @@ When the test status is *Succeeded*, click in the test item name to see details

 To help inspect the side-by-side comparison, you can toggle various error types including insertion, deletion, and substitution. By listening to the audio and comparing recognition results in each column (showing human-labeled transcription and the results of two speech-to-text models), you can decide which model meets your needs and where improvements are needed.

-Inspecting quality testing is useful to validate if the quality of a speech recognition endpoint is enough for an application. For an objective measure of accuracy, requiring transcribed audio, follow the instructions found in Testing: Evaluate Accuracy.
+Inspecting quality testing is useful to validate if the quality of a speech recognition endpoint is enough for an application. For an objective measure of accuracy, requiring transcribed audio, follow the instructions found in [Evaluate Accuracy](how-to-custom-speech-evaluate-data.md).
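Since the hunk above distinguishes insertion, deletion, and substitution errors and points to the accuracy evaluation article, here is the standard word error rate calculation as a small illustrative sketch (the function and the example counts are not taken from the article).

```python
def word_error_rate(substitutions: int, deletions: int, insertions: int, reference_words: int) -> float:
    """Standard WER: (S + D + I) / N, where N is the word count of the reference transcript."""
    return (substitutions + deletions + insertions) / reference_words

# Example: 10 substitutions, 5 deletions, and 3 insertions against a 200-word reference.
print(word_error_rate(10, 5, 3, 200))  # 0.09 -> 9% WER
```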
articles/cognitive-services/Speech-Service/how-to-custom-speech.md (2 additions & 2 deletions)
@@ -24,7 +24,7 @@ This diagram highlights the pieces that make up the Custom Speech portal. Use th

 

-1. [Subscribe and create a project](#set-up-your-azure-account) - Create an Azure account and subscribe the Speech Services. This unified subscription gives you access to speech-to-text, text-to-speech, speech translation, and the custom speech portal. Then, using your Speech Services subscription, create your first Custom Speech project.
+1. [Subscribe and create a project](#set-up-your-azure-account) - Create an Azure account and subscribe to the Speech Services. This unified subscription gives you access to speech-to-text, text-to-speech, speech translation, and the Custom Speech portal. Then, using your Speech Services subscription, create your first Custom Speech project.

 2. [Upload test data](how-to-custom-speech-test-data.md) - Upload test data (audio files) to evaluate Microsoft's speech-to-text offering for your applications, tools, and products.

@@ -47,7 +47,7 @@ Once you've created an Azure account and a Speech Services subscription, you'll

 1. Get your Speech Services subscription key from the Azure portal.
 2. Sign-in to the [Custom Speech portal](https://aka.ms/custom-speech).
-3. Select the subscription you need to work on and creat a speech project.
+3. Select the subscription you need to work on and create a speech project.
 4. If you'd like to modify your subscription, use the **cog** icon located in the top navigation.