title: Azure AI Content Understanding classifier overview
2
+
title: Azure AI Content Understanding Classifier Overview
3
3
titleSuffix: Azure AI services
4
4
description: Learn about Azure AI Content Understanding classifier solutions.
5
5
author: PatrickFarley
@@ -16,71 +16,68 @@ ms.custom:
16
16
17
17
> [!IMPORTANT]
18
18
>
19
-
> * The classifier API is only available for documents with the `2025-05-01-preview` release.
20
-
> * Azure AI Content Understanding classifier is available in `2025-05-01-preview` release. Public preview releases provide early access to features that are in active development.
21
-
> * Features, approaches, and processes can change or have limited capabilities, before General Availability (GA).
22
-
> * For more information, *see* [**Supplemental Terms of Use for Microsoft Azure Previews**](https://azure.microsoft.com/support/legal/preview-supplemental-terms).
19
+
> The classifier API is available only for documents with the `2025-05-01-preview` release. The Azure AI Content Understanding classifier is available in the `2025-05-01-preview` release. Public preview releases provide early access to features that are in active development. Features, approaches, and processes can change or have limited capabilities before general availability. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms).
23
20
24
-
Azure AI Content Understanding classifier enables you to detect and identify documents you process within your application. Content Understanding classifier can perform classification of an input file as a whole, or identify multiple documents or multiple instances of a single document within an input file.
21
+
You can use the Azure AI Content Understanding classifier to detect and identify documents that you process within your application. The Content Understanding classifier can perform classification of an input file as a whole. The classifier can also identify multiple documents or multiple instances of a single document within an input file.
25
22
26
23
## Business use cases
27
24
28
-
Classifier can process complex documents in various formats and templates:
29
-
30
-
* **Invoices**: Categorize invoices from multiple vendors to process each category with a different Content Understanding analyzer if needed.
31
-
* **Tax documents**: Categorize multiple tax documents into different types of tax forms such as 1040, 1099, etc.
32
-
* **Contracts**: Long, unstructured contracts can now be categorized to streamline operations to understand different types of agreements and their specific legal implications.
25
+
The classifier can process complex documents in various formats and templates:
33
26
27
+
* **Invoices**: Categorize invoices from multiple vendors to process each category with a different Content Understanding analyzer, if needed.
28
+
* **Tax documents**: Categorize multiple tax documents into different types of tax forms, such as 1040 and 1099.
29
+
* **Contracts**: Categorize long, unstructured contracts to streamline operations to understand different types of agreements and their specific legal implications.
34
30
35
31
## Content Understanding classifier capabilities
36
32
37
-
Content Understanding classifier can analyze a single- or multi-file documents to identify if an input file can be classified into a category as defined. Here are the currently supported scenarios:
38
-
39
-
* A single file containing one document type, such as a loan application form.
40
-
* A single file containing multiple document types. For instance, a loan application package that contains a loan application form, payslip, and bank statement.
41
-
* A single file containing multiple instances of the same document. For instance, a collection of scanned invoices.
42
-
* By default, there's an `$OTHER` class as well, which we utilize for cases where any of the defined categories doesn't seem suitable.
33
+
The Content Understanding classifier can analyze single or multifile documents to identify if an input file can be classified into a category as defined. The following scenarios are supported:
43
34
35
+
* A single file that contains one document type, such as a loan application form.
36
+
* A single file that contains multiple document types. An example is a loan application package that contains a loan application form, pay slip, and bank statement.
37
+
* A single file that contains multiple instances of the same document. An example is a collection of scanned invoices.
38
+
* By default, an `$OTHER` class is used for cases where none of the defined categories seems suitable.
44
39
45
-
### How to use Content Understanding classifier
40
+
### Use the Content Understanding classifier
46
41
47
-
A Content Understanding classifier doesn't require any training dataset. Define up to 50 category name and description and create a classifier. By default, the entire file is treated as a single content object, meaning the file/object is associated to a single category.
42
+
A Content Understanding classifier doesn't require any training dataset. You can define up to 50 category names and descriptions and create a classifier. By default, the entire file is treated as a single content object, which means the file or object is associated to a single category.
48
43
49
-
However, when you have more than one document in a file, the classifier can identify the different document types contained within the input file with splitting capability. The classifier response contains the page ranges for each of the identified document types contained within a file. This response can include multiple instances of the same document type.
44
+
When you have more than one document in a file, the classifier can identify the different document types that are contained within the input file with splitting capability. The classifier response contains the page ranges for each of the identified document types that are contained within a file. This response can include multiple instances of the same document type.
50
45
51
-
When you call the classifier, the `analyze` operation includes a `splitMode` property that gives you granular control over the splitting behavior. You can also specify the page numbers to analyze only certain pages of the input document.
46
+
When you call the classifier, the `analyze` operation includes a `splitMode` property that gives you granular control over the splitting behavior. You can also specify the page numbers to analyze only certain pages of the input document:
52
47
53
-
* To treat the entire input file as a single document for classification set the `splitMode` to `none`. When you do so, the service returns just one category for the entire input file.
54
-
* To classify each page of the input file, set the `splitMode` to `perPage`. The service attempts to classify each page as an individual document.
55
-
* Set the `splitMode` to `auto` and the service identifies the documents and associated page ranges.
48
+
* To treat the entire input file as a single document for classification, set `splitMode` to `none`. When you do so, the service returns one category for the entire input file.
49
+
* To classify each page of the input file, set `splitMode` to `perPage`. The service attempts to classify each page as an individual document.
50
+
* To identify the documents and associated page ranges, set `splitMode` to `auto`.
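
The three `splitMode` values and the 50-category limit described above can be sketched as a small helper that assembles a classifier definition. This is a minimal sketch under stated assumptions, not the service SDK: the function name `build_classifier_definition` and the payload layout (a `categories` map keyed by category name, next to `splitMode`) are hypothetical. Only `splitMode`, its three values, and the category limit come from the text.

```python
import json

# Values named in the text above.
SPLIT_MODES = ("none", "perPage", "auto")
MAX_CATEGORIES = 50


def build_classifier_definition(categories, split_mode="none"):
    """Assemble a classifier definition as a plain dict.

    `categories` maps a category name to its description. The returned
    layout is a hypothetical payload shape, not a confirmed API schema.
    The helper defaults to "none" to mirror the statement that the whole
    file is treated as a single content object by default.
    """
    if split_mode not in SPLIT_MODES:
        raise ValueError(f"splitMode must be one of {SPLIT_MODES}, got {split_mode!r}")
    if len(categories) > MAX_CATEGORIES:
        raise ValueError(f"at most {MAX_CATEGORIES} categories are allowed")
    return {
        "splitMode": split_mode,
        "categories": {
            name: {"description": description}
            for name, description in categories.items()
        },
    }


# A loan application package that should be split into its documents.
definition = build_classifier_definition(
    {
        "Loan application": "A form that requests a loan from a lender.",
        "Pay slip": "A statement of an employee's pay for a period.",
        "Bank statement": "A periodic summary of account transactions.",
    },
    split_mode="auto",
)
print(json.dumps(definition, indent=2))
```

Validating `splitMode` and the category count locally is a design choice of this sketch: it surfaces a mistyped mode or an oversized category list before any request is sent.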
56
51
57
52
### Optional analysis
58
53
59
-
For a complete end to end flow, you may link classifier categories with existing analyzers. For each content object classified to categories with linked analyzers, the service automatically invokes analysis on the content object using the corresponding analyzer. As an example, this linking can be used to create classifiers that identify and analyze only invoices from a PDF that may contain multiple types of forms in a document.
54
+
For a complete end-to-end flow, you can link classifier categories with existing analyzers. For each content object classified to categories with linked analyzers, the service automatically invokes analysis on the content object by using the corresponding analyzer.
60
55
61
-
* Set the `analyzerId` to an existing analyzer to route and perform field extraction from the classified documents or pages.
56
+
For example, you can use this linking to create classifiers that identify and analyze only invoices from a PDF that contains multiple types of forms in a document. Set `analyzerId` to an existing analyzer to route and perform field extraction from the classified documents or pages.
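
The routing described above can be pictured with a small local sketch. The service performs this step itself when a category has a linked analyzer, so the code only illustrates the idea. The segment layout (`category`, `startPageNumber`, `endPageNumber`), the function name `plan_analysis`, and the analyzer ID `my-invoice-analyzer` are hypothetical. Only `analyzerId` and the `$OTHER` class come from the text.

```python
def plan_analysis(segments, categories):
    """Return (analyzer_id, start_page, end_page) for every segment whose
    category has a linked analyzer. Segments without a link are skipped."""
    plan = []
    for segment in segments:
        category = categories.get(segment["category"], {})
        analyzer_id = category.get("analyzerId")
        if analyzer_id is not None:
            plan.append(
                (analyzer_id, segment["startPageNumber"], segment["endPageNumber"])
            )
    return plan


# Hypothetical classifier categories: only "Invoice" links to an analyzer.
categories = {
    "Invoice": {
        "description": "A bill issued by a vendor.",
        "analyzerId": "my-invoice-analyzer",
    },
    "Bank statement": {"description": "A periodic summary of account transactions."},
}

# Hypothetical classification result for a seven-page PDF.
segments = [
    {"category": "Invoice", "startPageNumber": 1, "endPageNumber": 2},
    {"category": "Bank statement", "startPageNumber": 3, "endPageNumber": 5},
    {"category": "Invoice", "startPageNumber": 6, "endPageNumber": 6},
    {"category": "$OTHER", "startPageNumber": 7, "endPageNumber": 7},
]

plan = plan_analysis(segments, categories)
for analyzer_id, start, end in plan:
    print(f"{analyzer_id}: pages {start}-{end}")
```

In this sketch, only the two invoice segments reach the analyzer. The bank statement and the `$OTHER` page are classified but not analyzed, because their categories have no linked analyzer.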
62
57
63
58
### Classifier limits
64
59
65
-
For information on supported input document formats and classifier limits, refer to our [Service quotas and limits](../service-limits.md#classifier) page.
66
-
60
+
For information on supported input document formats and classifier limits, see [Service quotas and limits](../service-limits.md#classifier).
67
61
68
62
### Best practices
69
63
70
-
To improve classification and splitting quality, it's important to give a good category name and description so the model can understand the categories with some context. For more information on category names and descriptions, *see* [Best practices](../concepts/best-practices.md#classifier-category-names-and-descriptions).
64
+
To improve classification and splitting quality, use a good category name and description so that the model can understand the categories with some context. For more information on category names and descriptions, see [Best practices](../concepts/best-practices.md#classifier-category-names-and-descriptions).
71
65
72
66
## Key benefits
73
67
74
-
* **Accuracy and reliability:** Ensure precise document classification, reducing errors and boosting efficiency.
75
-
* **Scalability:** Seamlessly scale out document processing to meet business demands.
76
-
* **Customizable:** Adapt document classifier to fit specific workflows.
68
+
* **Accuracy and reliability**: Ensure precise document classification to reduce errors and boost efficiency.
69
+
* **Scalability**: Scale out document processing to meet business demands.
70
+
* **Customizable**: Adapt the document classifier to fit specific workflows.
77
71
78
72
## Supported languages and regions
79
-
For a detailed list of supported languages and regions, visit our [Language and region support](../language-region-support.md) page.
73
+
74
+
For a list of supported languages and regions, see [Language and region support](../language-region-support.md).
80
75
81
76
## Data privacy and security
82
-
Developers using Content Understanding should review Microsoft's policies on customer data. For more information, visit our [Data, protection, and privacy](https://www.microsoft.com/trust-center/privacy) page.
83
77
84
-
## Next step
85
-
* Try processing your document content using Content Understanding in [Azure AI Foundry](https://aka.ms/cu-landing).
86
-
* Learn to analyze document content [**analyzer templates**](../quickstart/use-ai-foundry.md).
78
+
Developers who use Content Understanding should review Microsoft policies on customer data. For more information, see [Data, protection, and privacy](https://www.microsoft.com/trust-center/privacy).
79
+
80
+
## Related content
81
+
82
+
* Try processing your document content by using Content Understanding in [Azure AI Foundry](https://aka.ms/cu-landing).
83
+
* Learn to analyze document content [analyzer templates](../quickstart/use-ai-foundry.md).