Commit 61005e1

committed
resolve build issues
1 parent 65ce124 commit 61005e1

File tree

5 files changed

+110
-118
lines changed

articles/ai-services/content-understanding/concepts/analyzers-overview.md

Lines changed: 3 additions & 1 deletion
@@ -27,7 +27,9 @@ Analyzers are the core processing units in Content Understanding that define how
2727
* Content extraction configurations - determining what foundational elements to extract.
2828
* Field extraction schemas - specifying how to get the fields (extract/generate/classify) from the content.
2929

30-
Key benefits of analyzers include:
30+
:::image type="content" source="../media/concepts/analyzer-architecture.png" alt-text="Screenshot of analyzer architecture.":::
31+
32+
Key features of analyzers include the following benefits:
3133

3234
* **Consistency**: Analyzers ensure uniform processing across all content by applying the same extraction rules and schemas, delivering reliable and predictable results.
3335

articles/ai-services/content-understanding/concepts/prebuilt-analyzers

Lines changed: 0 additions & 116 deletions
This file was deleted.
Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
1+
---
2+
title: Azure AI Content Understanding Prebuilt analyzers
3+
titleSuffix: Azure AI services
4+
description: Learn about prebuilt analyzers, their scenarios, customization options, billing, and roadmap in Azure AI Content Understanding.
5+
author: laujan
6+
ms.author: additi
7+
manager: nitinme
8+
ms.service: azure-ai-content-understanding
9+
ms.topic: overview
10+
ms.date: 05/19/2025
11+
---
12+
13+
# Prebuilt analyzers in Azure AI Content Understanding
14+
15+
## Overview
16+
17+
Azure AI Content Understanding employs analyzers to derive structured insights from unstructured content, spanning documents, images, audio, and video files. Its prebuilt analyzers are ready-to-use solutions tailored for common content processing tasks, including document ingestion, search indexing, and retrieval-augmented generation (`RAG`).
18+
19+
These analyzers streamline trial experiences and can be adapted by extending their functionality to meet specific workflow requirements. Key offerings include:
20+
21+
* **[Content parsers](#content-parsers-for-search-and-ingestion)** for general search and ingestion scenarios.
22+
* **[Scenario-specific predefined analyzers](#scenario-specific-predefined-analyzers)** for targeted use cases like invoices or call center transcripts.
23+
* **[Inheritance from prebuilt analyzers](#inheriting-and-customizing-prebuilt-analyzers)** to customize configuration and fields.
24+
25+
## Content parsers for search and ingestion
26+
27+
To streamline common content ingestion scenarios, Azure AI Content Understanding offers general-purpose **prebuilt content analyzers**. These analyzers extract text, layout, and metadata from various content types.
28+
29+
30+
| Analyzer | Description | Supported File Types |
31+
|:-------------------------|:-----------------------------------------------------------------------------|:--------------------|
32+
| `prebuilt-documentAnalyzer` | Extracts text, layout, and metadata using `OCR` for images and rendered files. Users can customize prebuilt content analyzers to modify configuration and add/remove fields. | `.pdf`, `.tiff`, `image`, `.docx`, `.rtf`, `.html`, `.md`, `.json`, `.xml`, `.csv`, `.tsv`, and `.txt` |
33+
| `prebuilt-imageAnalyzer` | Generates a descriptive caption of an image; `OCR` isn't used. Users can refine the description or add new fields by creating an analyzer with `baseAnalyzerId=prebuilt-imageAnalyzer`. | image |
34+
| `prebuilt-audioAnalyzer` | Produces a transcript, speaker diarization, and a summary for audio files. Users can add new fields by creating an analyzer with `baseAnalyzerId=prebuilt-audioAnalyzer`. | audio |
35+
| `prebuilt-videoAnalyzer` | Extracts keyframes, transcript, and video segmentation. Segmentation is enabled by default. Users can disable or customize segmentation by creating an analyzer with `baseAnalyzerId=prebuilt-videoAnalyzer` and changing the `segmentationMode` property. | video |
36+
37+
Analyzers are optimized for `RAG` ingestion and search workflows, offering default behaviors suitable for indexing and summarizing large volumes of content.
38+
39+
> [!NOTE]
40+
>
41+
> * Currently, `OCR` is supported for `.pdf`, `.tiff`, and image files. Content elements from such files include span properties and bounding boxes via their source properties.
42+
> * For file types that `OCR` doesn't support, content is extracted digitally. Content elements from these files include span properties that indicate their position in the returned markdown.
43+
> * There are no prebuilt models for `agentic` mode. Instead, users can create an analyzer with `mode=pro`, starting from any document base analyzer, to test `agentic` behavior.
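
As a rough illustration of how the content analyzers in the table above line up with file types, the following Python sketch routes a file name to a prebuilt analyzer ID. The `pick_content_analyzer` helper and its use of the standard `mimetypes` module are assumptions made for illustration and are not part of the service. The sketch sends images to `prebuilt-imageAnalyzer`, although the table also lists `image` as a supported type for `prebuilt-documentAnalyzer` when text extraction is the goal.

```python
import mimetypes
from pathlib import Path

# Extensions the table lists explicitly for prebuilt-documentAnalyzer.
DOCUMENT_EXTENSIONS = {
    ".pdf", ".tiff", ".docx", ".rtf", ".html", ".md",
    ".json", ".xml", ".csv", ".tsv", ".txt",
}

# Media categories from the table, keyed by MIME major type.
MEDIA_ANALYZERS = {
    "image": "prebuilt-imageAnalyzer",
    "audio": "prebuilt-audioAnalyzer",
    "video": "prebuilt-videoAnalyzer",
}


def pick_content_analyzer(file_name: str) -> str:
    """Return a prebuilt content analyzer ID for a file name (hypothetical helper)."""
    suffix = Path(file_name).suffix.lower()
    # Check the explicit document list first, so .tiff goes to the document analyzer.
    if suffix in DOCUMENT_EXTENSIONS:
        return "prebuilt-documentAnalyzer"
    # Fall back to the MIME major type guessed from the extension.
    mime_type, _ = mimetypes.guess_type(file_name)
    major = (mime_type or "").split("/")[0]
    if major in MEDIA_ANALYZERS:
        return MEDIA_ANALYZERS[major]
    raise ValueError(f"No prebuilt content analyzer known for {file_name!r}")
```

Checking the document extensions before the MIME type is a deliberate choice in this sketch: `.tiff` has an image MIME type, but the table assigns it to the document analyzer.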
44+
45+
## Scenario-specific predefined analyzers
46+
47+
In addition to general content analyzers, Azure AI Content Understanding provides **prebuilt analyzers for specific business scenarios**. You can customize them further by setting one as the `baseAnalyzerId` of a new analyzer:
48+
49+
| Analyzer | Description | Supported File Types |
50+
|:--------------------|:----------------------------------------------------------------|:--------------------|
51+
| `prebuilt-callCenter` | Extracts summary, sentiment, topics, and insights from call center transcripts. | audio |
52+
| `prebuilt-invoice` | Extracts structured fields such as `InvoiceId`, `Date`, and `Vendor` from invoices. | `.pdf`, `.tiff`, and `image` |
53+
54+
These analyzers bundle best practices and hidden configurations to deliver accurate extractions for their intended use cases while simplifying deployment by abstracting internal implementation details.
55+
56+
57+
## Inheriting and customizing prebuilt analyzers
58+
59+
With the **`2025-05-01-preview`** API version, you can create a custom analyzer that inherits from any prebuilt analyzer by setting `baseAnalyzerId`. Inheritance lets you modify existing fields, descriptions, types, and methods. You can also customize configuration settings such as `enableFormula` and `segmentationMode`.
60+
61+
***Example***
62+
63+
64+
### Inherit from `prebuilt-documentAnalyzer`
65+
66+
```json
67+
{
68+
"baseAnalyzerId": "prebuilt-documentAnalyzer",
69+
"fields": [
70+
{ "name": "InvoiceId", "type": "string", "method": "regex" },
71+
{ "name": "TotalAmount", "type": "currency", "method": "extractive" }
72+
],
73+
"configuration": {
74+
"enableFormula": true,
75+
"tableFormat": "markdown"
76+
}
77+
}
78+
```
79+
80+
> [!IMPORTANT]
81+
> With the `2025-05-01-preview` API version, modifying a field description overwrites the internal refined description, potentially reducing extraction quality.
82+
> The `baseAnalyzerId` must be a prebuilt analyzer. Custom analyzers can't currently inherit from other custom analyzers.
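
The constraints in this section can be checked before a definition is sent anywhere. The following Python sketch assembles a definition in the same shape as the JSON example above and rejects a base analyzer that isn't one of the prebuilt IDs named in this article. The `build_custom_analyzer` helper is hypothetical, the list of prebuilt IDs is limited to those mentioned here, and the JSON shape mirrors the example rather than a verified service schema.

```python
import json

# Prebuilt analyzer IDs named in this article (the service may offer others).
PREBUILT_ANALYZERS = {
    "prebuilt-documentAnalyzer",
    "prebuilt-imageAnalyzer",
    "prebuilt-audioAnalyzer",
    "prebuilt-videoAnalyzer",
    "prebuilt-callCenter",
    "prebuilt-invoice",
}


def build_custom_analyzer(base_analyzer_id, fields, configuration=None):
    """Build an analyzer definition that inherits from a prebuilt analyzer.

    Hypothetical helper: it only assembles and validates a dictionary.
    """
    # Custom analyzers can't inherit from other custom analyzers.
    if base_analyzer_id not in PREBUILT_ANALYZERS:
        raise ValueError(f"{base_analyzer_id!r} is not a known prebuilt analyzer")
    definition = {"baseAnalyzerId": base_analyzer_id, "fields": list(fields)}
    if configuration:
        definition["configuration"] = dict(configuration)
    return definition


if __name__ == "__main__":
    analyzer = build_custom_analyzer(
        "prebuilt-documentAnalyzer",
        [{"name": "InvoiceId", "type": "string", "method": "regex"}],
        {"enableFormula": True},
    )
    print(json.dumps(analyzer, indent=2))
```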
83+
84+
## Analyzer details and configurations
85+
86+
* **Document Analyzer**: Uses `OCR` for `.pdf`, `.tiff`, and `image` files.
87+
* **Image Analyzer**: Doesn't use `OCR` but generates image descriptions.
88+
* **Audio Analyzer**: Returns transcript and summary extraction.
89+
* **Video Analyzer**: Returns keyframes, transcript, and segmentation.
90+
* **Call Center Analyzer**: Summarizes and extracts insights from audio. Supports audio files.
91+
* **Invoice Analyzer**: Returns structured field extraction from invoices. Supports `.pdf`, `.tiff`, and `image` files.
92+
93+
94+
## Billing and limits
95+
96+
* **Documents**: Billing is calculated per page, slide, or sheet. For `.docx`, `.rtf`, `.html`, `.md`, `.msg`, `.eml`, `.json`, `.xml`, `.csv`, `.tsv`, and `.txt` files, every 3,000 `UTF-16` characters count as one page. Field extraction is billed at a fixed rate per 1,000 pages.
97+
* **Images**: There's no cost for image content extraction. However, generating a description incurs image field extraction charges.
98+
* **Audio/Video**: Billing is calculated on a per-hour basis with 1-minute granularity. Charges apply to both audio/video content extraction and field extraction.
99+
* **Maximum field limit**: Currently, an analyzer supports up to 90 user-defined fields, or 100 fields in total when reserved fields are included.
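
To make the billing units above concrete, the following Python sketch estimates billable pages for text-based files and billable time for audio or video. It's a sketch under stated assumptions: it treats "characters" as UTF-16 code units, rounds a partial page or partial minute up, and bills an empty file as zero pages. None of these rounding rules is confirmed by this article, and the function names are hypothetical.

```python
import math

CHARS_PER_PAGE = 3000  # 3k UTF-16 characters per billed page, from the list above


def utf16_length(text: str) -> int:
    """Count UTF-16 code units; characters outside the BMP take two units."""
    return len(text.encode("utf-16-le")) // 2


def billable_text_pages(text: str) -> int:
    """Estimate billed pages for text-based files (assumes partial pages round up)."""
    return math.ceil(utf16_length(text) / CHARS_PER_PAGE)


def billable_media_minutes(duration_seconds: float) -> int:
    """Estimate billed minutes for audio/video (assumes partial minutes round up)."""
    return math.ceil(duration_seconds / 60)


def billable_media_hours(duration_seconds: float) -> float:
    """Express billed minutes in hours, because rates are quoted per hour."""
    return billable_media_minutes(duration_seconds) / 60


if __name__ == "__main__":
    print(billable_text_pages("a" * 7500))  # 7,500 code units
    print(billable_media_hours(90 * 60))    # a 90-minute recording
```

Counting code units rather than Python characters matters for text with emoji or other supplementary-plane characters, which occupy two UTF-16 units each.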
100+
101+
## Next steps
102+
103+
* [Analyzer templates](analyzer-templates.md)
104+
* [Analyzers overview](analyzers-overview.md)
105+
106+

articles/ai-services/content-understanding/how-to/create-multi-service-resource.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ manager: nitinme
77
ms.service: azure-ai-content-understanding
88
ms.topic: how-to
99
ms.date: 05/19/2025
10-
ms.custom: ignite-2024-understanding-release, references_regions
10+
ms.custom: references_regions
1111
ms.author: lajanuar
1212
---
1313
