Commit b1ae5fc

Document overview/elements, Language/region support, Service limits
1 parent ed2d718 commit b1ae5fc

1 file changed, +38 -25 lines changed

articles/ai-services/content-understanding/document/overview.md

Lines changed: 38 additions & 25 deletions
Original file line number | Diff line number | Diff line change
@@ -18,15 +18,8 @@ ms.date: 05/19/2025
1818
> * Features, approaches, and processes can change or have limited capabilities, before General Availability (GA).
1919
> * For more information, *see* [**Supplemental Terms of Use for Microsoft Azure Previews**](https://azure.microsoft.com/support/legal/preview-supplemental-terms).
2020
21-
Content Understanding is a cloud-based [Azure AI Service](../../what-are-ai-services.md) designed to efficiently extract content and structured fields from documents and forms. It provides a comprehensive suite of APIs and an intuitive UX experience for optimal efficiency.
22-
23-
Content Understanding enables organization to streamline data collection and processing, enhance operational efficiency, optimize data-driven decision making, and empower innovation. With customizable analyzers, Content Understanding allows for easy extraction of content or fields from documents and forms, tailored to specific business needs.
24-
25-
## April updates
26-
27-
* **Invoice prebuilt template**: Extract predefined schemas from various invoice formats. The out-of-the-box schema can be customized by adding or removing fields to suit your specific needs.
28-
29-
* **Generative and classify methods**: Added support for both generative and classification-based methods, enabling you to create generative fields such as summaries or categorize document details into multiple classes using the classify method.
21+
Azure AI Content Understanding delivers advanced document analysis capabilities that empower organizations to transform unstructured content into actionable, structured data.
22+
By leveraging [customizable analyzers](../concepts/prebuilt-analyzers.md), Content Understanding can intelligently extract key information, fields, and relationships from a wide variety of documents and forms.
3023

3124
## Business use cases
3225

@@ -37,33 +30,53 @@ Document analyzers can process complex documents in various formats and template
3730
* **Financial services**: Analyze complex documents like financial reports and asset management reports.
3831
* **Expense management**: Parse receipts and invoices from various retailers to validate expenses across different formats and templates.
3932

40-
4133
## Document analyzer capabilities
4234

43-
:::image type="content" source="../media/document/extraction-overview.png" alt-text="Screenshot of document extraction flow.":::
35+
:::image type="content" source="../media/document/document-capabilities.png" alt-text="Screenshot of document extraction flow.":::
36+
37+
### Content extraction
4438

45-
Content extraction enables the extraction of both printed and handwritten text from forms and documents, delivering business-ready content that is immediately actionable, usable, or adaptable for further development within your organization.
39+
Content extraction forms the foundation of Azure AI Content Understanding's document analysis capabilities, transforming unstructured documents into structured, machine-readable data.
40+
It precisely captures both printed and handwritten text while preserving the document's structure through advanced layout analysis.
4641

47-
### Add-on capabilities
42+
- Content Analysis
43+
  - **Text**: Processes multilingual content, including both machine-printed and handwritten text from hundreds of languages.
44+
  - **Selection marks**: Identifies and extracts selection indicators such as checkboxes, radio buttons, and similar markers.
45+
  - **Barcode detection**: Scans and decodes information from over a dozen types of linear and two-dimensional barcodes.
46+
  - **Mathematical formulas**: Captures and preserves complex mathematical expressions in LaTeX format.
47+
  - **Image elements**: Locates and extracts images, diagrams, and charts along with their related captions and annotations.
48+
- Structure Analysis
49+
  - **Paragraphs**: Detects and categorizes text segments based on their document context and role.
50+
  - **Tabular data**: Recognizes and extracts table structures, including complex formats with spanning cells and multi-page layouts.
51+
  - **Hierarchical sections**: Maps content organization through section headers and nested content relationships.
4852

49-
Enhance your document extraction with optional add-on features, which can incur added costs. These features can be enabled or disabled based on your needs. Currently supported add-ons include:
53+
### Field extraction
5054

51-
* **Layout**: Extracts layout information such as paragraphs, sections, tables, and more.
52-
* **Barcode**: Identifies and decodes all barcodes in the documents.
53-
* **Formula**: Recognizes all identified mathematical equations from the documents.
55+
Field extraction enables the extraction, classification, and generation of structured data from various forms and documents tailored to your specific needs. By converting unstructured document content into structured, actionable information, field extraction streamlines data organization, enhances searchability, and facilitates automated processing workflows. For example, you can efficiently extract customer details, billing addresses, and itemized charges from invoices, or identify contractual parties, renewal dates, and payment terms from legal agreements. To achieve optimal results, you can leverage prebuilt analyzer templates—such as those designed for invoices—or create customized analyzers from scratch, further refining accuracy by labeling additional sample documents.
5456

57+
### Field extraction methods
5558

56-
### Field extraction
59+
Azure AI Content Understanding provides versatile methods for field extraction, enabling precise and tailored processing of document content:
60+
61+
- **Extract**: Define and retrieve specific data fields from your documents, such as transaction dates from receipts or detailed line items from invoices, ensuring targeted and accurate data capture.
62+
63+
- **Classify**: Assign document content to predefined categories, such as the sentiment of a customer call transcript or the type of each item on a hotel receipt.
64+
65+
- **Generate**: Produce new insights or summaries from your documents, such as document summaries or chapter overviews, to improve content accessibility and comprehension.
66+
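The three methods above can be pictured as a per-field setting in an analyzer schema. The following is a minimal sketch only: the dictionary layout, the field names (`TransactionDate`, `ReceiptCategory`, `Summary`), and the `validate_schema` helper are illustrative assumptions, not the service's confirmed request format.

```python
# Hypothetical field schema: one field per extraction method.
# The keys ("fields", "type", "method", "enum") are assumptions made
# for illustration and may not match the real analyzer definition.
FIELD_SCHEMA = {
    "fields": {
        "TransactionDate": {
            "type": "date",
            "method": "extract",
            "description": "Date printed on the receipt.",
        },
        "ReceiptCategory": {
            "type": "string",
            "method": "classify",
            "enum": ["Lodging", "Meals", "Transport"],
        },
        "Summary": {
            "type": "string",
            "method": "generate",
            "description": "One-paragraph summary of the document.",
        },
    }
}

ALLOWED_METHODS = {"extract", "classify", "generate"}


def validate_schema(schema: dict) -> list[str]:
    """Return a list of problems found in a hypothetical field schema."""
    problems = []
    for name, spec in schema.get("fields", {}).items():
        method = spec.get("method")
        if method not in ALLOWED_METHODS:
            problems.append(f"{name}: unknown method {method!r}")
        elif method == "classify" and not spec.get("enum"):
            # A classify field is only meaningful with a set of categories.
            problems.append(f"{name}: classify fields need a list of categories")
    return problems
```

Calling `validate_schema(FIELD_SCHEMA)` returns an empty list for the schema above. A check like this, run before submitting a definition, is one way to catch a classify field that was declared without its categories.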
67+
## Key benefits
68+
69+
Content Understanding delivers powerful document analysis capabilities designed to address critical enterprise and business scenarios such as Retrieval-Augmented Generation (RAG) and Robotic Process Automation (RPA). Key benefits include:
70+
71+
- **Intelligent search enablement:** Transform unstructured documents into structured, searchable data assets, significantly improving information discoverability and accessibility across your organization.
72+
73+
- **Grounded data extraction:** Maintain clear traceability and localization of extracted data, facilitating efficient human-in-the-loop review processes and ensuring transparency and compliance.
5774

58-
Field extraction enables the extraction of structured data from various forms and documents tailored to your specific needs. For instance, you can extract customer names, billing addresses, and line items from invoices; or parties, renewal date, and payment clause from contracts. You can start field extraction right after defining the schema or enhance it by labeling more sample documents to improve extraction quality.
75+
- **Confidence-driven automation:** Leverage built-in confidence scoring to intelligently automate document processing tasks, optimizing resource allocation, reducing operational costs, and enhancing decision-making accuracy.
5976

60-
## Key Benefits
77+
- **Flexible customization:** Easily adapt and tailor document analyzers to align with specific business processes and workflows, enabling precise extraction and classification tailored to your organization's unique requirements.
6178

62-
* **Accuracy and reliability:** Ensure precise data extraction, reducing errors and boosting efficiency.
63-
* **Scalability:** Seamlessly scale out document processing to meet business demands.
64-
* **Customizable:** Adapt document analyzer to fit specific workflows.
65-
* **Grounding source:** Localize extracted data for human review workflows.
66-
* **Confidence scores:** Enhance automation with estimated confidence scores to maximize efficiency and minimize costs.
79+
- **Enhanced accuracy and reliability:** Achieve precise extraction and classification of critical business data, significantly reducing errors and improving operational efficiency across automated workflows.
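
Confidence-driven automation, as described in the list above, usually comes down to a threshold check on each extracted field. The sketch below assumes each field carries a confidence score between 0.0 and 1.0. The `ExtractedField` class, the `route_fields` helper, the sample values, and the 0.85 threshold are all hypothetical and are not part of the service's API.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """Hypothetical container for one extracted field."""
    name: str
    value: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def route_fields(fields, threshold=0.85):
    """Split fields into auto-accepted and human-review lists by confidence."""
    auto_accepted, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            auto_accepted.append(field)
        else:
            needs_review.append(field)
    return auto_accepted, needs_review


# Illustrative sample data, not real service output.
fields = [
    ExtractedField("InvoiceTotal", "1,250.00", 0.97),
    ExtractedField("VendorName", "Contoso", 0.62),
]
auto_accepted, needs_review = route_fields(fields)
```

With the sample data, `InvoiceTotal` is accepted automatically and `VendorName` is queued for human review. The threshold is a business decision: raising it sends more fields to reviewers, and lowering it automates more at the cost of more undetected errors.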
6780

6881
## Input requirements
6982
For detailed information on supported input document formats, refer to our [Service quotas and limits](../service-limits.md) page.
