| 1 | +--- |
| 2 | +title: Azure AI Content Understanding classifier overview |
| 3 | +titleSuffix: Azure AI services |
| 4 | +description: Learn about Azure AI Content Understanding classifier solutions. |
| 5 | +author: laujan |
| 6 | +ms.author: lajanuar |
| 7 | +manager: nitinme |
| 8 | +ms.service: azure-ai-content-understanding |
| 9 | +ms.topic: overview |
| 10 | +ms.date: 05/19/2025 |
| 11 | +--- |
| 12 | + |
| 13 | +# Content Understanding classifier |
| 14 | + |
| 15 | +> [!IMPORTANT] |
| 16 | +> |
| 17 | +> * Classifier is only available for documents with the `2025-05-01-preview` release. |
| 18 | +> * Azure AI Content Understanding classifier is available in the `2025-05-01-preview` release. Public preview releases provide early access to features that are in active development. |
| 19 | +> * Features, approaches, and processes can change or have limited capabilities before General Availability (GA). |
| 20 | +> * For more information, *see* [**Supplemental Terms of Use for Microsoft Azure Previews**](https://azure.microsoft.com/support/legal/preview-supplemental-terms). |
| 21 | + |
| 22 | +Azure AI Content Understanding classifier enables you to detect and identify the documents you process within your application. The classifier processes an input file one page at a time to identify the documents it contains. It can also identify multiple documents, or multiple instances of a single document, within one input file. |
| 23 | + |
| 24 | +## Business use cases |
| 25 | + |
| 26 | +Classifier can process complex documents in various formats and templates: |
| 27 | + |
| 28 | +* **Invoices**: Categorize invoices from multiple vendors to process each category with a different Content Understanding analyzer if needed. |
| 29 | +* **Tax documents**: Categorize multiple tax documents into different types of tax forms, such as 1040 and 1099. |
| 30 | +* **Contracts**: Categorize long, unstructured contracts by agreement type to streamline operations and make the legal implications of each type easier to review. |
| 31 | + |
| 32 | + |
| 33 | +## Content Understanding classifier capabilities |
| 34 | + |
| 35 | +Content Understanding classifier can analyze an input file that contains one or more documents and determine whether its content falls into the categories you define. Here are the currently supported scenarios: |
| 36 | + |
| 37 | +* A single file containing one document type, such as a loan application form. |
| 38 | +* A single file containing multiple document types. For instance, a loan application package that contains a loan application form, payslip, and bank statement. |
| 39 | +* A single file containing multiple instances of the same document. For instance, a collection of scanned invoices. |
| 40 | + |
| 41 | +### How to use Content Understanding classifier |
| 42 | + |
| 43 | +Content Understanding classifier doesn't require a training dataset. Define up to 50 categories, each with a name and description, and create a classifier. By default, the entire file is treated as a single content object, meaning the file is associated with a single category. |
| 44 | + |
| 45 | +However, when a file contains more than one document, the classifier's splitting capability can identify the different document types in the input file. The classifier response contains the page ranges for each of the identified document types. This response can include multiple instances of the same document type. |
| 46 | + |
| 47 | +When you call the classifier, the `analyze` operation includes a `splitMode` property that gives you granular control over the splitting behavior. |
| 48 | + |
| 49 | +* To treat the entire input file as a single document for classification, set `splitMode` to `none`. The service then returns just one category for the entire input file. |
| 50 | +* To classify each page of the input file, set `splitMode` to `perPage`. The service attempts to classify each page as an individual document. |
| 51 | +* To have the service identify the documents and their associated page ranges, set `splitMode` to `auto`. |
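The three `splitMode` values can be summarized in a small helper. This is a minimal sketch, not the service's SDK: the function names `choose_split_mode` and `split_options` are hypothetical, and where exactly the `splitMode` property sits in the request is an assumption, so the helper only returns the property itself.

```python
# The three `splitMode` values described in this article.
SPLIT_MODES = ("none", "perPage", "auto")


def choose_split_mode(multiple_documents: bool, one_document_per_page: bool = False) -> str:
    """Pick a splitMode value from what you know about the input file (hypothetical helper)."""
    if not multiple_documents:
        return "none"      # whole file is one document, one category returned
    if one_document_per_page:
        return "perPage"   # every page is classified as its own document
    return "auto"          # service finds the documents and their page ranges


def split_options(split_mode: str) -> dict:
    """Return the `splitMode` property after checking the value (placement in the request is assumed)."""
    if split_mode not in SPLIT_MODES:
        raise ValueError(f"splitMode must be one of {SPLIT_MODES}, got {split_mode!r}")
    return {"splitMode": split_mode}


# A loan application package holds several documents of varying length.
loan_package_options = split_options(choose_split_mode(multiple_documents=True))
```

Validating the value locally catches casing mistakes such as `perpage` before any request is sent.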
| 52 | + |
| 53 | +### Optional analysis |
| 54 | + |
| 55 | +For a complete end-to-end flow, you can link classifier categories with existing analyzers. For each content object classified into a category with a linked analyzer, the service automatically analyzes the content object with the corresponding analyzer. For example, you can use this linking to create a classifier that identifies and analyzes only the invoices in a PDF that contains multiple types of forms. |
| 56 | + |
| 57 | +* Set the `analyzerId` to an existing analyzer to route and perform field extraction from the classified documents or pages. |
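As an illustration of linking, the sketch below builds a set of categories in which only the invoice category carries an `analyzerId`. The dictionary layout (a category name mapped to an object with a `description` field), the `category` helper, and the analyzer name `my-invoice-analyzer` are assumptions made for this example; only the `analyzerId` property name comes from this article.

```python
from typing import Dict, Optional


def category(description: str, analyzer_id: Optional[str] = None) -> Dict[str, str]:
    """Build one category entry; the entry layout is assumed for illustration."""
    entry = {"description": description}
    if analyzer_id is not None:
        # Link the category to an existing analyzer for field extraction.
        entry["analyzerId"] = analyzer_id
    return entry


categories = {
    # Linked: content classified as an invoice is routed to the analyzer.
    "Invoice": category(
        "Bill issued by a vendor listing items, amounts, and payment terms.",
        analyzer_id="my-invoice-analyzer",  # hypothetical analyzer ID
    ),
    # Not linked: content is classified but not analyzed further.
    "Other form": category("Any other form included in the package."),
}

# Names of the categories whose content would also be analyzed.
linked = [name for name, entry in categories.items() if "analyzerId" in entry]
```

Leaving `analyzerId` off a category keeps it classification-only, which matches the invoice-only scenario described above.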
| 58 | + |
| 59 | +### Classifier limits |
| 60 | + |
| 61 | +* The classifier requires at least one category to be defined. The response contains the page ranges for each category of document identified. |
| 62 | + |
| 63 | +* The maximum allowed number of categories is 50. |
| 64 | + |
| 65 | +* The maximum length of an input file is 300 pages. |
| 66 | + |
| 67 | +* For each category name and description, there's a limit of 120 characters combined. |
| 68 | + |
| 69 | +* By default, the classifier also includes an `$other` category, which it assigns to pages that don't fit any of the defined categories. |
| 70 | + |
| 71 | +Unless you specify otherwise, the classifier assigns each page of the input document to one of the defined categories. You can also specify which page numbers of the input document to analyze. |
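The limits above can be checked on the client before a classifier is created. The following is a minimal sketch: `check_limits` is a hypothetical helper, and reading "120 characters combined" as the length of the name plus the length of the description is an assumption.

```python
from typing import Dict, List

MAX_CATEGORIES = 50               # maximum number of categories
MAX_PAGES = 300                   # maximum length of an input file, in pages
MAX_NAME_PLUS_DESCRIPTION = 120   # combined character limit per category


def check_limits(categories: Dict[str, str], page_count: int) -> List[str]:
    """Return a list of limit violations; an empty list means none were found."""
    problems = []
    if not categories:
        problems.append("at least one category is required")
    if len(categories) > MAX_CATEGORIES:
        problems.append(f"{len(categories)} categories exceeds the limit of {MAX_CATEGORIES}")
    for name, description in categories.items():
        # Assumption: "combined" means name length plus description length.
        combined = len(name) + len(description)
        if combined > MAX_NAME_PLUS_DESCRIPTION:
            problems.append(f"category {name!r} uses {combined} characters")
    if page_count > MAX_PAGES:
        problems.append(f"{page_count} pages exceeds the limit of {MAX_PAGES}")
    return problems


ok = check_limits({"Invoice": "Bill issued by a vendor."}, page_count=12)
too_long = check_limits({"Invoice": "x" * 200}, page_count=301)
```

Here `ok` is empty, while `too_long` reports both the oversized description and the page count.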
| 72 | + |
| 73 | +For detailed information on supported input document formats, refer to our [Service quotas and limits](../service-limits.md) page. |
| 74 | + |
| 75 | + |
| 76 | +### Best practices |
| 77 | + |
| 78 | +To improve classification and splitting quality, give each category a clear name and description so that the model has enough context to distinguish the categories. For more information on category names and descriptions, *see* [Best practices](../concepts/best-practices.md#classifier-category-names-and-descriptions). |
| 79 | + |
| 80 | +## Key benefits |
| 81 | + |
| 82 | +* **Accuracy and reliability:** Ensure precise document classification, reducing errors and boosting efficiency. |
| 83 | +* **Scalability:** Seamlessly scale out document processing to meet business demands. |
| 84 | +* **Customizable:** Adapt the document classifier to fit specific workflows. |
| 85 | + |
| 86 | +## Supported languages and regions |
| 87 | +For a detailed list of supported languages and regions, visit our [Language and region support](../language-region-support.md) page. |
| 88 | + |
| 89 | +## Data privacy and security |
| 90 | +Developers using Content Understanding should review Microsoft's policies on customer data. For more information, visit our [Data, protection, and privacy](https://www.microsoft.com/trust-center/privacy) page. |
| 91 | + |
| 92 | +## Next steps |
| 93 | +* Try processing your document content using Content Understanding in [Azure AI Foundry](https://aka.ms/cu-landing). |
| 94 | +* Learn to analyze document content with [**analyzer templates**](../quickstart/use-ai-foundry.md). |