Speech-to-text offers several formatting features that make the transcribed text clear and legible. The following sections describe each feature and how it improves the final text output.

## ITN

Inverse Text Normalization (ITN) is a process that converts spoken words into their written form. For example, the spoken word "four" is converted to the written form "4". The speech-to-text service performs this process, and it isn't configurable. Supported text formats include dates, times, decimals, currencies, addresses, emails, and phone numbers. This lets Speech users speak naturally into their device while the service formats the text as expected. The following table shows examples of ITN applied to the text output.

|Recognized speech|Display text|
|---|---|
|that will cost nine hundred dollars|That will cost $900.|
|my phone number is one eight hundred, four five six, eight nine ten|My phone number is 1-800-456-8910.|
|the time is six forty five p m|The time is 6:45 PM.|
|i live on thirty five lexington avenue|I live on 35 Lexington Ave.|
|the answer is six point five|The answer is 6.5.|
|send it to support at help dot com|Send it to support@help.com.|

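
The Speech service performs ITN internally, so there's nothing to configure. Purely to illustrate the idea, here's a minimal Python sketch (not the service's implementation) that converts runs of English number words into digits and rewrites "<number> dollars" as a dollar amount. The function names and word lists are hypothetical. The sketch covers only simple cardinals and currency, not the phone numbers, times, decimals, addresses, or emails shown in the table.

```python
# Toy ITN sketch for illustration only; NOT how the Speech service implements ITN.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
NUMBER_WORDS = set(UNITS) | set(TENS) | {"hundred", "thousand"}


def words_to_number(words: list[str]) -> int:
    """Convert a run of cardinal number words (e.g. ["nine", "hundred"]) to an int."""
    total, current = 0, 0
    for word in words:
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word == "thousand":
            total += current * 1000
            current = 0
        else:
            raise ValueError(f"not a number word: {word}")
    return total + current


def normalize_numbers(text: str) -> str:
    """Replace number-word runs with digits; '<number> dollars' becomes '$<number>'."""
    tokens = text.split()
    out: list[str] = []
    i = 0
    while i < len(tokens):
        if tokens[i] not in NUMBER_WORDS:
            out.append(tokens[i])
            i += 1
            continue
        # Collect the whole run of consecutive number words.
        j = i
        while j < len(tokens) and tokens[j] in NUMBER_WORDS:
            j += 1
        value = words_to_number(tokens[i:j])
        if j < len(tokens) and tokens[j] == "dollars":
            out.append(f"${value}")
            j += 1  # consume the currency word
        else:
            out.append(str(value))
        i = j
    return " ".join(out)
```

For example, `normalize_numbers("that will cost nine hundred dollars")` returns `"that will cost $900"`. The capital letter and final period in the table come from other formatting steps.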
## Capitalization

Speech-to-text models recognize words that should be capitalized to improve readability, accuracy, and grammar. For example, the Speech service automatically capitalizes proper nouns and words at the beginning of a sentence. Some examples are shown in this table.

|Recognized speech|Display text|
|---|---|
|i got an iphone x r|I got an iPhone XR.|
|my name is jennifer smith|My name is Jennifer Smith.|
|i want to visit new york city|I want to visit New York City.|
|i need to service my toyota|I need to service my Toyota.|

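
The service relies on its model to decide casing. The toy Python sketch below illustrates the two behaviors the table shows: sentence-initial capitalization and proper-noun casing. It uses a small hypothetical lexicon of known phrases with greedy longest-match lookup. The lexicon and the `capitalize` function are assumptions for illustration. Unlike a trained model, a fixed lexicon can't generalize to names it has never seen.

```python
# Hypothetical lexicon for illustration; the Speech service uses a trained model instead.
LEXICON = {
    ("i",): "I",
    ("iphone", "x", "r"): "iPhone XR",
    ("jennifer",): "Jennifer",
    ("smith",): "Smith",
    ("new", "york", "city"): "New York City",
    ("toyota",): "Toyota",
}
MAX_PHRASE = max(len(key) for key in LEXICON)


def capitalize(text: str) -> str:
    """Apply lexicon casing (longest match first), then capitalize the sentence start."""
    tokens = text.split()
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first so "new york city" is matched as one unit.
        for n in range(min(MAX_PHRASE, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in LEXICON:
                out.append(LEXICON[phrase])
                i += n
                break
        else:
            # No lexicon entry starts here; keep the word unchanged.
            out.append(tokens[i])
            i += 1
    # Capitalize the first word unless the lexicon already chose its casing,
    # so a sentence starting with "iphone x r" keeps "iPhone XR".
    if out and out[0] == tokens[0]:
        out[0] = out[0][:1].upper() + out[0][1:]
    return " ".join(out)
```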
## Disfluency removal

When speaking, it's common for someone to stutter, repeat words, and say filler words like "uhm" or "uh". Speech-to-text can recognize these disfluencies and remove them from the transcribed text to make it cleaner. This is useful when you transcribe live, unscripted speeches to read later. Some examples are shown in this table.

|Recognized speech|Display text|
|---|---|
|i uh said that we can go to the uhmm movies|I said that we can go to the movies.|
|its its not that big of uhm a deal|It's not that big of a deal.|
|umm i think tomorrow should work|I think tomorrow should work.|

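
Here is a minimal Python sketch of the idea. It assumes a hypothetical fixed list of filler words, drops those fillers, and collapses any word that immediately repeats. The service's actual detection method isn't described here, so treat `remove_disfluencies` as illustrative only. The sketch handles only disfluencies. The apostrophe, capital letter, and final period in the table come from other formatting steps.

```python
# Hypothetical filler list for illustration; not the service's actual vocabulary.
FILLERS = {"uh", "uhm", "uhmm", "um", "umm"}


def remove_disfluencies(text: str) -> str:
    """Drop filler words and collapse immediately repeated words."""
    out: list[str] = []
    for token in text.split():
        if token.lower() in FILLERS:
            continue  # drop filler words such as "uh" and "umm"
        if out and token.lower() == out[-1].lower():
            continue  # drop an immediate repetition such as "its its"
        out.append(token)
    return " ".join(out)
```

Collapsing repeats this naively would also remove intentional repetitions such as "very very". A real system needs context to tell those apart from stutters.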
## Explicit punctuation

With Speech-to-text, you can speak aloud any punctuation you want to include to make your text more legible. This is especially useful when you need complex punctuation or when someone else will read the transcript. You can add punctuation with your voice instead of inserting it manually afterward. Some examples are shown in this table.

|Recognized speech|Display text|
|---|---|
|they entered the room dot dot dot|They entered the room...|
|i heart emoji you period|I <3 you.|
|the options are apple forward slash banana forward slash orange period|The options are apple/banana/orange.|
|are you sure question mark|Are you sure?|

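
To show how spoken punctuation can map to symbols, here's a toy Python sketch built on a hypothetical phrase table. Each entry records the symbol and whether it attaches to the previous word, the next word, or both. This is why "forward slash" produces "apple/banana" with no spaces. The table and the `apply_spoken_punctuation` function are assumptions and don't represent the Speech service's behavior or API. Capitalization is left to a separate step.

```python
# Hypothetical phrase table: spoken phrase -> (symbol, attach to previous word, attach to next word)
SPOKEN_PUNCTUATION = {
    ("dot", "dot", "dot"): ("...", True, False),
    ("question", "mark"): ("?", True, False),
    ("forward", "slash"): ("/", True, True),
    ("period",): (".", True, False),
    ("comma",): (",", True, False),
    ("heart", "emoji"): ("<3", False, False),
}
MAX_PHRASE = max(len(key) for key in SPOKEN_PUNCTUATION)


def apply_spoken_punctuation(text: str) -> str:
    """Replace spoken punctuation phrases with symbols, spacing them correctly."""
    tokens = text.split()
    result = ""
    attach_next = False  # True when the previous symbol should touch the next word
    i = 0
    while i < len(tokens):
        symbol, attach_prev, attach_after = tokens[i], False, False
        step = 1
        # Longest match first, so "dot dot dot" becomes "..." rather than three words.
        for n in range(min(MAX_PHRASE, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in SPOKEN_PUNCTUATION:
                symbol, attach_prev, attach_after = SPOKEN_PUNCTUATION[phrase]
                step = n
                break
        # Insert a space unless either side asks to be glued together.
        if result and not (attach_prev or attach_next):
            result += " "
        result += symbol
        attach_next = attach_after
        i += step
    return result
```

A fixed table converts every occurrence, so the literal word "period" in "a period of time" would also become a dot. A real system has to use context to tell the two apart.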
## Auto punctuation

Speaking every punctuation mark aloud can be tedious, so Speech-to-text can also punctuate your text automatically to improve clarity. This is a great option when you want to transcribe a call or conversation to read later. Some examples are shown in this table.

|Recognized speech|Display text|
|---|---|
|how are you|How are you?|
|we can go to the mall park or beach|We can go to the mall, park, or beach.|

## Profanity filter

You can specify whether to mask, remove, or show profanity in the final transcribed text. Masking replaces profane words with asterisk (*) characters, so you can keep the original sentiment of your text while making it more appropriate for certain situations.
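
The three options can be sketched as a filter applied to the final text. This toy Python version assumes a hypothetical `ProfanityMode` enum and a placeholder blocklist of mild words. It isn't the Speech SDK's API. In mask mode, it replaces each listed word with the same number of asterisks and keeps any trailing punctuation.

```python
from enum import Enum


class ProfanityMode(Enum):
    """Hypothetical names for the three options described above."""
    MASK = "mask"
    REMOVE = "remove"
    SHOW = "show"


# Placeholder blocklist for illustration; real filters use curated, language-specific lists.
BLOCKLIST = {"darn", "heck"}


def filter_profanity(text: str, mode: ProfanityMode) -> str:
    """Mask, remove, or keep blocklisted words according to mode."""
    if mode is ProfanityMode.SHOW:
        return text  # show: leave the text unchanged
    out: list[str] = []
    for token in text.split():
        core = token.strip(".,!?")  # the word without surrounding punctuation
        if core.lower() in BLOCKLIST:
            if mode is ProfanityMode.REMOVE:
                continue  # remove: drop the word entirely
            # mask: same number of asterisks, punctuation kept
            token = token.replace(core, "*" * len(core))
        out.append(token)
    return " ".join(out)
```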