initial draft

eric-urban · eric-urban · commit b1b0aa57eaf5 · 2022-09-09T17:00:56.000-07:00
diff --git a/articles/cognitive-services/Speech-Service/captioning-concepts.md b/articles/cognitive-services/Speech-Service/captioning-concepts.md
@@ -40,6 +40,9 @@ Captioning can accompany real time or pre-recorded speech. Whether you're showin
 
 The Speech service supports output formats such as SRT (SubRip Text) and WebVTT (Web Video Text Tracks). These can be loaded onto most video players such as VLC, automatically adding the captions on to your video.
 
+> [!TIP]
+> The Speech service provides [profanity filter](display-text-format.md#profanity-filter) options. You can specify whether to mask, remove, or show profanity. 
+
 The [SRT](https://docs.fileformat.com/video/srt/) (SubRip Text) timespan output format is `hh:mm:ss,fff`. 
 
 ```srt
@@ -161,68 +164,6 @@ RECOGNIZING: Text=welcome to applied mathematics
 RECOGNIZED: Text=Welcome to applied Mathematics course 201.
 ```
 
-## Profanity filter 
-
-You can specify whether to mask, remove, or show profanity in recognition results. 
-
-> [!NOTE]
-> Microsoft also reserves the right to mask or remove any word that is deemed inappropriate. Such words will not be returned by the Speech service, whether or not you enabled profanity filtering.
-
-The profanity filter options are:
-- `Masked`: Replaces letters in profane words with asterisk (*) characters. This is the default option.
-- `Raw`: Include the profane words verbatim.
-- `Removed`: Removes profane words.
-
-For example, to remove profane words from the speech recognition result, set the profanity filter to `Removed` as shown here:
-
-::: zone pivot="programming-language-csharp"
-```csharp
-speechConfig.SetProfanity(ProfanityOption.Removed);
-```
-::: zone-end
-::: zone pivot="programming-language-cpp"
-```cpp
-speechConfig->SetProfanity(ProfanityOption::Removed);
-```
-::: zone-end
-::: zone pivot="programming-language-go"
-```go
-speechConfig.SetProfanity(common.Removed)
-```
-::: zone-end
-::: zone pivot="programming-language-java"
-```java
-speechConfig.setProfanity(ProfanityOption.Removed);
-```
-::: zone-end
-::: zone pivot="programming-language-javascript"
-```javascript
-speechConfig.setProfanity(sdk.ProfanityOption.Removed);
-```
-::: zone-end
-::: zone pivot="programming-language-objectivec"
-```objective-c
-[self.speechConfig setProfanityOptionTo:SPXSpeechConfigProfanityOption.SPXSpeechConfigProfanityOption_ProfanityRemoved];
-```
-::: zone-end
-::: zone pivot="programming-language-swift"
-```swift
-self.speechConfig!.setProfanityOptionTo(SPXSpeechConfigProfanityOption_ProfanityRemoved)
-```
-::: zone-end
-::: zone pivot="programming-language-python"
-```python
-speech_config.set_profanity(speechsdk.ProfanityOption.Removed)
-```
-::: zone-end
-::: zone pivot="programming-language-cli"
-```console
-spx recognize --file caption.this.mp4 --format any --profanity masked --output vtt file - --output srt file -
-```
-::: zone-end
-
-Profanity filter is applied to the result `Text` and `MaskedNormalizedForm` properties. Profanity filter isn't applied to the result `LexicalForm` and `NormalizedForm` properties. Neither is the filter applied to the word level results.
-
 ## Language identification
 
 If the language in the audio could change, use continuous [language identification](language-identification.md). Language identification is used to identify languages spoken in audio when compared against a list of [supported languages](language-support.md?tabs=language-identification). You provide up to 10 candidate languages, at least one of which is expected be in the audio. The Speech service returns the most likely language in the audio. 
diff --git a/articles/cognitive-services/Speech-Service/display-text-format.md b/articles/cognitive-services/Speech-Service/display-text-format.md
@@ -0,0 +1,148 @@
+---
+title: Display text formatting with speech to text - Speech service
+titleSuffix: Azure Cognitive Services
+description: An overview of key concepts for display text formatting with speech to text.
+services: cognitive-services
+author: eric-urban
+manager: nitinme
+ms.service: cognitive-services
+ms.subservice: speech-service
+ms.topic: conceptual
+ms.date: 09/09/2022
+ms.author: eur
+zone_pivot_groups: programming-languages-speech-sdk-cli
+---
+
+# Display text formatting with speech to text
+
+Speech-to-text offers an array of formatting features to ensure that the transcribed text is clear and legible. Below is an overview of these features and how each one is used to improve the overall clarity of the final text output.
+
+## Standard formatting
+
+### ITN
+
+Inverse Text Normalization (ITN) is a process that converts spoken words into their written form. For example, the spoken word "four" is converted to the written form "4". This process is performed by the speech-to-text service and is not configurable by the user. Supported text formats include dates, times, decimals, currencies, addresses, emails, and phone numbers. You can speak naturally, and the service formats the text as expected. The following table shows examples of ITN applied to the text output.
+
+|Spoken Form|Display Text|
+|---|---|
+|"that will cost nine hundred dollars"|That will cost $900.|
+|"my phone number is one eight hundred, four five six, eight nine ten"|My phone number is 1-800-456-8910.|
+|"the time is six forty five p m"|The time is 6:45 PM.|
+|"i live on thirty five lexington avenue"|I live on 35 Lexington Ave.|
+|"the answer is six point five"|The answer is 6.5.|
+|"send it to support at help dot com"|Send it to support@help.com.|
+
+
+### Capitalization
+
+The speech-to-text model recognizes which words should be capitalized and capitalizes them automatically to make the text more readable. It capitalizes proper nouns and the first word of a sentence, as well as other words where capitalization makes the text more grammatically correct. Some examples are shown in this table.
+
+|Spoken Form|Display Text|
+|---|---|
+|"i got an iphone x r"|I got an iPhone XR.|
+|"my name is jennifer smith"|My name is Jennifer Smith.|
+|"i want to visit new york city"|I want to visit New York City.|
+|"i need to service my toyota"|I need to service my Toyota.|
+
+
+### Disfluency removal
+
+When speaking, it's common for someone to stutter, repeat words, and say filler words like "uhm" or "uh". Speech-to-text can recognize these disfluencies and remove them from the transcribed text so that it reads more cleanly. This is useful when you transcribe live, unscripted speech to read back later. Some examples are shown in this table.
+
+|Spoken Form|Display Text|
+|---|---|
+|"i uh said that we can go to the uhmm movies"|I said that we can go to the movies.|
+|"its its not that big of uhm a deal"|It's not that big of a deal.|
+|"umm i think tomorrow should work"|I think tomorrow should work.|
+
+
+## Configurable formatting options
+
+### Explicit punctuation
+
+When you use speech-to-text, you can speak punctuation aloud to make your text more legible. This is especially useful when you need complex punctuation or want someone else to read the transcribed text, because you can add punctuation with your voice rather than inserting it afterward. Some examples are shown in this table.
+
+|Spoken Form|Display Text|
+|---|---|
+|"they entered the room dot dot dot"|They entered the room...|
+|"i heart emoji you period"|I <3 you.|
+|"the options are apple forward slash banana forward slash orange period"|The options are apple/banana/orange.|
+|"are you sure question mark"|Are you sure?|
+
+
+### Auto punctuation
+
+Speaking every punctuation mark aloud can be tedious, so speech-to-text can also punctuate your text automatically to improve clarity. This is a great option when you want to transcribe a call or conversation to read it later. Some examples are shown in this table.
+
+|Spoken Form|Display Text|
+|---|---|
+|"how are you"|How are you?|
+|"we can go to the mall park or beach"|We can go to the mall, park, or beach.|
+
+### Profanity filter 
+
+You can specify whether to mask, remove, or show profanity in the final transcribed text. Masking replaces the letters in profane words with asterisk (*) characters so that you can keep the original sentiment of your text while making it more appropriate for certain situations.
+
+> [!NOTE]
+> Microsoft also reserves the right to mask or remove any word that is deemed inappropriate. Such words will not be returned by the Speech service, whether or not you enabled profanity filtering.
+
+The profanity filter options are:
+- `Masked`: Replaces letters in profane words with asterisk (*) characters. This is the default option.
+- `Raw`: Includes the profane words verbatim.
+- `Removed`: Removes profane words.
+
+For example, to remove profane words from the speech recognition result, set the profanity filter to `Removed` as shown here:
+
+::: zone pivot="programming-language-csharp"
+```csharp
+speechConfig.SetProfanity(ProfanityOption.Removed);
+```
+::: zone-end
+::: zone pivot="programming-language-cpp"
+```cpp
+speechConfig->SetProfanity(ProfanityOption::Removed);
+```
+::: zone-end
+::: zone pivot="programming-language-go"
+```go
+speechConfig.SetProfanity(common.Removed)
+```
+::: zone-end
+::: zone pivot="programming-language-java"
+```java
+speechConfig.setProfanity(ProfanityOption.Removed);
+```
+::: zone-end
+::: zone pivot="programming-language-javascript"
+```javascript
+speechConfig.setProfanity(sdk.ProfanityOption.Removed);
+```
+::: zone-end
+::: zone pivot="programming-language-objectivec"
+```objective-c
+[self.speechConfig setProfanityOptionTo:SPXSpeechConfigProfanityOption_ProfanityRemoved];
+```
+::: zone-end
+::: zone pivot="programming-language-swift"
+```swift
+self.speechConfig!.setProfanityOptionTo(SPXSpeechConfigProfanityOption_ProfanityRemoved)
+```
+::: zone-end
+::: zone pivot="programming-language-python"
+```python
+speech_config.set_profanity(speechsdk.ProfanityOption.Removed)
+```
+::: zone-end
+::: zone pivot="programming-language-cli"
+```console
+spx recognize --file caption.this.mp4 --format any --profanity masked --output vtt file - --output srt file -
+```
+::: zone-end
+
+The profanity filter is applied to the result `Text` and `MaskedNormalizedForm` properties. It isn't applied to the result `LexicalForm` and `NormalizedForm` properties, or to the word-level results.
+
+
+## Next steps
+
+* [Speech-to-text quickstart](get-started-speech-to-text.md)
+* [Get speech recognition results](get-speech-recognition-results.md)
diff --git a/articles/cognitive-services/Speech-Service/includes/quickstarts/captioning/cli.md b/articles/cognitive-services/Speech-Service/includes/quickstarts/captioning/cli.md
@@ -63,7 +63,7 @@ Here are details about the optional arguments from the previous command:
 - `--output vtt file -` and `--output srt file -`: Outputs WebVTT and SRT captions to standard output. For more information about SRT and WebVTT caption file formats, see [Caption output format](~/articles/cognitive-services/speech-service/captioning-concepts.md#caption-output-format). For more information about the `--output` argument, see [Speech CLI output options](~/articles/cognitive-services/speech-service/spx-output-options.md).
 - `@output.each.detailed`: Outputs event results with text, offset, and duration. For more information, see [Get speech recognition results](~/articles/cognitive-services/speech-service/get-speech-recognition-results.md).
 - `--property SpeechServiceResponse_StablePartialResultThreshold=5`: You can request that the Speech service return fewer `Recognizing` events that are more accurate. In this example, the Speech service must affirm recognition of a word at least five times before returning the partial results to you. For more information, see [Get partial results](~/articles/cognitive-services/speech-service/captioning-concepts.md#get-partial-results) concepts.
-- `--profanity masked`: You can specify whether to mask, remove, or show profanity in recognition results. For more information, see [Profanity filter](~/articles/cognitive-services/speech-service/captioning-concepts.md#profanity-filter) concepts.
+- `--profanity masked`: You can specify whether to mask, remove, or show profanity in recognition results. For more information, see [Profanity filter](~/articles/cognitive-services/speech-service/display-text-format.md#profanity-filter) concepts.
 - `--phrases "Constoso;Jessie;Rehaan"`: You can specify a list of phrases to be recognized, such as Contoso, Jessie, and Rehaan. For more information, see [Improve recognition with phrase list](~/articles/cognitive-services/speech-service/improve-accuracy-phrase-list.md).
 
 ## Clean up resources
diff --git a/articles/cognitive-services/Speech-Service/includes/quickstarts/captioning/usage-arguments.md b/articles/cognitive-services/Speech-Service/includes/quickstarts/captioning/usage-arguments.md
@@ -34,5 +34,5 @@ Output options include:
 - `--output FILE`: Output captions to the specified `file`. This flag is required.
 - `--srt`: Output captions in SRT (SubRip Text) format. The default format is WebVTT (Web Video Text Tracks). For more information about SRT and WebVTT caption file formats, see [Caption output format](~/articles/cognitive-services/speech-service/captioning-concepts.md#caption-output-format).
 - `--quiet`: Suppress console output, except errors.
-- `--profanity OPTION`: Valid values: raw, remove, mask. For more information, see [Profanity filter](~/articles/cognitive-services/speech-service/captioning-concepts.md#profanity-filter) concepts.
+- `--profanity OPTION`: Valid values: raw, remove, mask. For more information, see [Profanity filter](~/articles/cognitive-services/speech-service/display-text-format.md#profanity-filter) concepts.
 - `--threshold NUMBER`: Set stable partial result threshold. The default value with this code example is `3`. For more information, see [Get partial results](~/articles/cognitive-services/speech-service/captioning-concepts.md#get-partial-results) concepts.
diff --git a/articles/cognitive-services/Speech-Service/toc.yml b/articles/cognitive-services/Speech-Service/toc.yml
@@ -58,6 +58,8 @@ items:
       href: how-to-recognize-speech.md
     - name: Get speech recognition results
       href: get-speech-recognition-results.md
+    - name: Display text formatting
+      href: display-text-format.md
     - name: How to use batch transcription
       href: batch-transcription.md
     - name: Improve recognition with Custom Speech