articles/ai-services/content-understanding/capabilities/overview.md
Lines changed: 7 additions & 7 deletions
@@ -45,8 +45,8 @@ The following section details the content extraction capabilities and optional a
 # [Document](#tab/document)
 
 |Content Extraction|Add-on Capabilities|
-|--------|-------------|
-|•**Optical Character Recognition (OCR)**: Extract printed and handwritten text from documents in various file formats, converting it into structured data. </br>|•**Layout**:Extracts layout information such as paragraphs, sections, tables, and more.. </br>•**Barcode**: Identifies and decodes all barcodes in the documents. </br> •**Formula**: Recognizes all identified mathematical equations from the documents. </br> |
+|-------------|-------------|
+|•**Optical Character Recognition (OCR)**: Extract printed and handwritten text from </br> documents in various file formats, converting it into structured data. </br>|•**Layout**:Extracts layout information such as paragraphs, sections, tables, and more.. </br>•**Barcode**: Identifies and decodes all barcodes in the documents. </br> •**Formula**: Recognizes all identified mathematical equations from the documents. </br> |
 
 # [Image](#tab/image)
 > [!NOTE]
@@ -55,14 +55,14 @@ The following section details the content extraction capabilities and optional a
 # [Audio](#tab/audio)
 
 |Content Extraction|Add-on Capabilities|
-|--------|-------------|
-|•**Transcription**:Converts conversational audio into searchable and analyzable text-based transcripts in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level timestamps are available upon request. </br> •**Diarization**: Distinguishes between speakers in a conversation, attributing parts of the transcript to specific speakers. </br> •**Language detection**: Automatically detects the language spoken in the audio to be processed.</br>|•**Speaker role detection**: Identifies speaker roles based on diarization results and replaces generic labels like "Speaker 1" with specific role names, such as "Agent" or "Customer." </br>|
+|-------------|-------------|
+|•**Transcription**:Converts conversational audio into searchable and analyzable text-based transcripts </br> in WebVTT format. Customizable fields can be generated from transcription data. Sentence-level and word-level </br> timestamps are available upon request. </br> •**Diarization**: Distinguishes between speakers in a conversation, attributing parts of the transcript to specific speakers. </br> •**Language detection**: Automatically detects the language spoken in the audio to be processed.</br>|•**Speaker role detection**: Identifies speaker roles based on diarization results and replaces generic </br> labels like "Speaker 1" with specific role names, such as "Agent" or "Customer." </br>|
 
 # [Video](#tab/video)
 
 |Content Extraction|Add-on Capabilities|
-|--------|-------------|
-|•**Transcription**: Converts speech to structured, searchable text via Azure AI Speech, allowing users to specify recognition languages. </br>•**Shot Detection**: Identifies segments of the video aligned with shot boundaries where possible, allowing for precise editing and repackaging of content with breaks exactly on shot boundaries. </br> •**Key Frame Extraction**: Extracts key frames from videos to represent each shot completely, ensuring each shot has enough key frames to enable Field Extraction to work effectively.</br> |**Face Grouping**: Grouped faces appearing in a video to extract one representative face image for each person and provides segments where each one is present. The grouped face data is available as metadata and can be used to generate customized metadata fields.This feature is limited access and involves face identification and grouping; customers need to register for access at Face Recognition. |
+|-------------|-------------|
+|•**Transcription**: Converts speech to structured, searchable text via Azure AI Speech, allowing users to specify recognition languages. </br>•**Shot Detection**: Identifies segments of the video aligned with shot boundaries where possible, allowing for precise editing and </br> repackaging of content with breaks exactly on shot boundaries. </br> •**Key Frame Extraction**: Extracts key frames from videos to represent each shot completely, ensuring each </br> shot has enough key frames to enable Field Extraction to work effectively.</br> |•**Face Grouping**: Grouped faces appearing in a video to extract one representative face image for each person and provides segments where each one is present. </br> The grouped face data is available as metadata and can be used to generate customized metadata fields.</br> This feature is limited access and involves face identification and grouping; customers need to register for access at Face Recognition. |
 
 ----
 ### Field Extraction
@@ -81,7 +81,7 @@ Each modality supports specific generation approaches optimized for that content
 
 |Supported generation methods|
 |--------------|
-|**Extract**: In document, users can extract field values from input content, such as dates from receipts or item details from invoices. |
+|•**Extract**: In document, users can extract field values from input content, such as dates from receipts or item details from invoices. |
 
 :::image type="content" source="../media/capabilities/documentextraction.gif" alt-text="Illustration of Document extraction method workflow.":::