Merge pull request #4712 from laujan/jp-4661-4662-4667-4666-pr-updates

JamesJBarnett · web-flow · commit 6126a665f77c · 2025-05-09T16:45:12.000-07:00
Jp 4661 4662 4667 4666 pr updates
diff --git a/articles/ai-services/content-understanding/concepts/analyzer-templates.md b/articles/ai-services/content-understanding/concepts/analyzer-templates.md
@@ -8,7 +8,6 @@ manager: nitinme
 ms.service: azure-ai-content-understanding
 ms.topic: overview
 ms.date: 05/19/2025
-ms.custom: ignite-2024-understanding-release
 ---
 
 # Analyzer templates offered with Content Understanding
diff --git a/articles/ai-services/content-understanding/concepts/best-practices.md b/articles/ai-services/content-understanding/concepts/best-practices.md
@@ -87,4 +87,12 @@ When you're working with audio and video content, selecting a narrow set of lang
 
 By default, Content Extraction information such as speech transcripts, document text extracted by `OCR`, and video key frames can be accessed directly from the analyzer output for immediate review or custom processing. There's no need to define a field in the schema for these items. Fields can be used when more processing is needed, for example, summarizing transcripts, identifying entities, or extracting specific items from `OCR`. Each field can instruct the system to extract or generate the content you need.
 
+## Classifier category names and descriptions
+
+To improve classification and splitting accuracy, give each category a clear name and a description that provides context. For example:
+
+* Use common titles as category names (for example, *Annual Financial Report* or *SEC Form 10-K*).
+* Describe the semantic definition of the category (for example, "receipts for expense reporting").
+* Describe the common layout of the first page (for example, "two-column form").
+* Mention key content that uniquely identifies the category (for example, "2025" in the upper-right corner).
 
diff --git a/articles/ai-services/content-understanding/concepts/classifiers.md b/articles/ai-services/content-understanding/concepts/classifiers.md
@@ -0,0 +1,94 @@
+---
+title: Azure AI Content Understanding classifier overview
+titleSuffix: Azure AI services
+description: Learn about Azure AI Content Understanding classifier solutions.
+author: laujan
+ms.author: lajanuar
+manager: nitinme
+ms.service: azure-ai-content-understanding
+ms.topic: overview
+ms.date: 05/19/2025
+---
+
+# Content Understanding classifier
+
+> [!IMPORTANT]
+>
+> * Content Understanding classifier is available only for documents, in the `2025-05-01-preview` release. Public preview releases provide early access to features that are in active development.
+> * Features, approaches, and processes can change or have limited capabilities before General Availability (GA).
+> * For more information, *see* [**Supplemental Terms of Use for Microsoft Azure Previews**](https://azure.microsoft.com/support/legal/preview-supplemental-terms).
+
+Azure AI Content Understanding classifier detects and identifies the types of documents your application processes. The classifier evaluates an input file one page at a time, so it can identify multiple documents, or multiple instances of the same document, within a single input file.
+
+## Business use cases
+
+The classifier can process complex documents in various formats and templates:
+
+* **Invoices**: Categorize invoices from multiple vendors so that each category can be processed with a different Content Understanding analyzer if needed.
+* **Tax documents**: Categorize tax documents by form type, such as Form 1040 and Form 1099.
+* **Contracts**: Categorize long, unstructured contracts by agreement type to streamline review of their specific legal implications.
+
+
+## Content Understanding classifier capabilities
+
+Content Understanding classifier analyzes an input file that contains one or more documents and assigns each document to one of the categories you define. The following scenarios are currently supported:
+
+* A single file containing one document type, such as a loan application form.
+* A single file containing multiple document types. For instance, a loan application package that contains a loan application form, payslip, and bank statement.
+* A single file containing multiple instances of the same document. For instance, a collection of scanned invoices.
+
+### How to use Content Understanding classifier
+
+Content Understanding classifier doesn't require a training dataset. To create a classifier, define up to 50 categories, each with a name and a description. By default, the entire file is treated as a single content object, so the whole file is assigned to a single category.
+
+When a file contains more than one document, the classifier's splitting capability can identify each document type within the file. The classifier response contains the page range for each identified document and can include multiple instances of the same document type.
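To illustrate the shape of a split result (a list of categories, each with a page range), the following minimal sketch collapses consecutive pages that share a label into page-range segments. The `Segment` type and `group_pages` function are hypothetical and not part of any Content Understanding SDK. Note that this naive grouping can't separate two adjacent documents of the same type (for example, two invoices back to back), which is one reason to rely on the service's own splitting rather than a rule like this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical representation of one classified document within a file."""
    category: str
    start_page: int  # 1-based, inclusive
    end_page: int    # 1-based, inclusive


def group_pages(page_labels: List[str]) -> List[Segment]:
    """Collapse consecutive pages with the same category into page-range segments."""
    segments: List[Segment] = []
    for page_number, label in enumerate(page_labels, start=1):
        if segments and segments[-1].category == label:
            # Same category as the previous page: extend the current range.
            segments[-1].end_page = page_number
        else:
            # New category: start a new segment on this page.
            segments.append(Segment(label, page_number, page_number))
    return segments


if __name__ == "__main__":
    labels = ["Loan application", "Loan application", "Payslip", "Bank statement", "Bank statement"]
    for segment in group_pages(labels):
        print(f"{segment.category}: pages {segment.start_page}-{segment.end_page}")
```

For the loan application package above, this prints one range per document: pages 1-2, 3-3, and 4-5.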
+
+When you call the classifier, the `analyze` operation includes a `splitMode` property that gives you granular control over the splitting behavior.
+
+* To treat the entire input file as a single document for classification, set `splitMode` to `none`. The service then returns just one category for the entire input file.
+* To classify each page of the input file, set `splitMode` to `perPage`. The service attempts to classify each page as an individual document.
+* To let the service identify the individual documents and their page ranges, set `splitMode` to `auto`.
+
+### Optional analysis
+
+For a complete end-to-end flow, you can link classifier categories to existing analyzers. For each content object classified into a category that has a linked analyzer, the service automatically analyzes that content object with the corresponding analyzer. For example, you can use this linking to create a classifier that identifies and analyzes only the invoices in a PDF that contains multiple types of forms.
+
+* Set a category's `analyzerId` to an existing analyzer to route the classified documents or pages to that analyzer for field extraction.
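The following sketch shows how the `splitMode` values and per-category `analyzerId` links described above might be assembled into a JSON body. The body layout (a `categories` object keyed by category name, plus a top-level `splitMode`), the `Category` type, the `build_classifier_body` function, and the analyzer ID `my-invoice-analyzer` are all assumptions for illustration; check the Content Understanding REST API reference for the actual request shape before sending anything to the service.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# splitMode values described in this article.
SPLIT_MODES = ("none", "perPage", "auto")


@dataclass
class Category:
    name: str
    description: str
    analyzer_id: Optional[str] = None  # hypothetical: ID of an existing analyzer to link


def build_classifier_body(categories: List[Category], split_mode: str = "none") -> dict:
    """Assemble a hypothetical request body; the field layout is an assumption."""
    if split_mode not in SPLIT_MODES:
        raise ValueError(f"splitMode must be one of {SPLIT_MODES}, got {split_mode!r}")
    body_categories = {}
    for category in categories:
        entry = {"description": category.description}
        if category.analyzer_id is not None:
            # Linked analyzer: content classified into this category is analyzed with it.
            entry["analyzerId"] = category.analyzer_id
        body_categories[category.name] = entry
    return {"categories": body_categories, "splitMode": split_mode}


if __name__ == "__main__":
    body = build_classifier_body(
        [
            Category("Invoice", "Vendor invoices for accounts payable", analyzer_id="my-invoice-analyzer"),
            Category("Bank statement", "Monthly bank account statements"),
        ],
        split_mode="auto",
    )
    print(json.dumps(body, indent=2))
```

In this sketch, only the `Invoice` category is linked to an analyzer, so only invoices found in the file would be routed for field extraction; bank statements would be classified but not analyzed further.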
+
+### Classifier limits
+
+* A classifier requires at least one distinct category. The response contains the page ranges for each category of document identified.
+
+* The maximum allowed number of categories is 50.
+
+* The maximum input file length is 300 pages.
+
+* Each category's name and description are limited to 120 characters combined.
+
+* By default, the classifier also includes an `$other` category, which is assigned to pages that don't fit any of the defined categories.
+
+Unless you specify otherwise, the classifier assigns each page of the input document to one of the defined categories. You can also specify which page numbers of the input document to analyze.
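Before creating a classifier, you might check category definitions locally against the limits listed above. The `validate_categories` helper below is a hypothetical sketch, not part of any SDK. It assumes the service counts characters the way Python's `len()` does (Unicode code points), which may not match the service exactly. Using a dictionary keyed by name also keeps category names distinct.

```python
from typing import Dict, List

MAX_CATEGORIES = 50
MAX_NAME_AND_DESCRIPTION_CHARS = 120


def validate_categories(categories: Dict[str, str]) -> List[str]:
    """Return a list of limit violations; an empty list means no problems were found.

    `categories` maps each category name to its description.
    """
    problems: List[str] = []
    if not 1 <= len(categories) <= MAX_CATEGORIES:
        problems.append(f"expected 1 to {MAX_CATEGORIES} categories, got {len(categories)}")
    for name, description in categories.items():
        if name.startswith("$"):
            # Names starting with '$' aren't allowed (the built-in category is `$other`).
            problems.append(f"{name!r}: category names can't start with '$'")
        combined = len(name) + len(description)
        if combined > MAX_NAME_AND_DESCRIPTION_CHARS:
            problems.append(
                f"{name!r}: name and description total {combined} characters "
                f"(limit {MAX_NAME_AND_DESCRIPTION_CHARS})"
            )
    return problems


if __name__ == "__main__":
    print(validate_categories({"Invoice": "Vendor invoices", "$custom": "Anything else"}))
```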
+
+For detailed information on supported input document formats, refer to our [Service quotas and limits](../service-limits.md) page.
+
+
+### Best practices
+
+To improve classification and splitting quality, give each category a clear name and a description that gives the model context. For more information on category names and descriptions, *see* [Best practices](../concepts/best-practices.md#classifier-category-names-and-descriptions).
+
+## Key benefits
+
+* **Accuracy and reliability:** Ensure precise document classification, reducing errors and boosting efficiency.
+* **Scalability:** Seamlessly scale out document processing to meet business demands.
+* **Customizable:** Adapt the document classifier to fit specific workflows.
+
+## Supported languages and regions
+For a detailed list of supported languages and regions, visit our [Language and region support](../language-region-support.md) page.
+
+## Data privacy and security
+Developers using Content Understanding should review Microsoft's policies on customer data. For more information, visit our [Data, protection, and privacy](https://www.microsoft.com/trust-center/privacy) page.
+
+## Next steps
+* Try processing your document content using Content Understanding in [Azure AI Foundry](https://aka.ms/cu-landing).
+* Learn how to analyze document content with [**analyzer templates**](../quickstart/use-ai-foundry.md).
diff --git a/articles/ai-services/content-understanding/document/overview.md b/articles/ai-services/content-understanding/document/overview.md
@@ -8,7 +8,6 @@ manager: nitinme
 ms.service: azure-ai-content-understanding
 ms.topic: overview
 ms.date: 05/19/2025
-ms.custom: ignite-2024-understanding-release
 ---
 
 # Content Understanding document solutions (preview)
diff --git a/articles/ai-services/content-understanding/glossary.md b/articles/ai-services/content-understanding/glossary.md
@@ -26,3 +26,4 @@ ms.author: lajanuar
 | **Span** | A reference indicating the location of an element (for example, field, word) within the extracted Markdown content. A character offset and length represent a span. Different programming languages use various character encodings, which can affect the exact offset and length values for Unicode text. To avoid confusion, spans are only returned if the desired encoding is explicitly specified in the request. Some elements can map to multiple spans if they aren't contiguous in the markdown (for example, page). |
 | **Grounding source** | The specific regions in content where a value was generated. It has different representations depending on the file type: <br>&bullet; **Image** - A polygon in the image, often an axis-aligned rectangle (bounding box). <br>&bullet; **PDF/TIFF** - A polygon on a specific page, often a quadrilateral. <br>&bullet; **Audio** - A start and end time range. <br>&bullet; **Video** - A start and end time range with an optional polygon in each frame, often a bounding box.|
 | **Confidence score** | The level of certainty that the extracted data is accurate. |
+| **Category** | A distinct class within a classifier used to group similar input files based on shared characteristics or features. |
diff --git a/articles/ai-services/content-understanding/image/overview.md b/articles/ai-services/content-understanding/image/overview.md
@@ -8,7 +8,6 @@ manager: nitinme
 ms.service: azure-ai-content-understanding
 ms.topic: how-to
 ms.date: 05/19/2025
-ms.custom: ignite-2024-understanding-release
 ---
 
 # Content Understanding image solutions (preview)
diff --git a/articles/ai-services/content-understanding/overview.md b/articles/ai-services/content-understanding/overview.md
@@ -8,7 +8,6 @@ manager: nitinme
 ms.service: azure-ai-content-understanding
 ms.topic: overview
 ms.date: 05/19/2025
-ms.custom: ignite-2024-understanding-release
 
 #customer intent: As a user, I want to learn more about Content Understanding solutions.
 ---
diff --git a/articles/ai-services/content-understanding/quickstart/use-ai-foundry.md b/articles/ai-services/content-understanding/quickstart/use-ai-foundry.md
@@ -7,7 +7,6 @@ manager: nitinme
 ms.service: azure-ai-content-understanding
 ms.topic: quickstart
 ms.date: 05/19/2025
-ms.custom: ignite-2024-understanding-release
 ---
 
 # Use Content Understanding in the Azure AI Foundry
diff --git a/articles/ai-services/content-understanding/service-limits.md b/articles/ai-services/content-understanding/service-limits.md
@@ -7,7 +7,6 @@ manager: nitinme
 ms.service: azure-ai-content-understanding
 ms.topic: conceptual
 ms.date: 05/19/2025
-ms.custom: ignite-2024-understanding-release
 ms.author: lajanuar
 ---
 
@@ -89,15 +88,33 @@ The following limits apply as of version 2024-12-01-preview.
 
 ### Classification fields
 
+> [!NOTE]
+> This section describes classification fields within the extraction capability, not the separate [Content Understanding classifier](concepts/classifiers.md).
+
 Classification fields can be defined to return either a single category (single-label classification) or multiple categories (multi-label classification).
 
 * **Single-label classification**: Defined using a string field with the `classify` method. It can be a top-level basic field or a subfield within a group or table.
 * **Multi-label classification**: Represented as a list of string fields with the `classify` method. In the [REST API](/rest/api/contentunderstanding/operation-groups?view=rest-contentunderstanding-2024-12-01-preview&preserve-view=true), `method=classify` and `enum` are specified on the inner string field and can only be a top-level field.
 
-*Note: Document analyzers currently don't support classification fields.*
-
 
 ## Training limits
 | File type| Max training data |
 | ---| --- |
 | Document | 1 GB total<br>50k pages/images |
+
+## Classifier limits
+
+The following limits apply as of version 2025-05-01-preview.
+
+### Input file limits (documents only)
+
+| Supported file types | Maximum file size | Maximum length |
+| --- | --- | --- |
+| ✓ `.pdf`<br> ✓ `.tiff`<br> ✓ `.jpg`<br> ✓ `.png`<br> ✓ `.bmp`<br> ✓ `.heif` | ≤ 200 MB | ≤ 300 pages |
+| ✓ `.txt`  | ≤ 1 MB | ≤ 1M characters |
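A client application could pre-check a file against the table above before submitting it. The `check_classifier_input` helper below is a hypothetical sketch. It assumes that "MB" means 2^20 bytes and "1M characters" means 1,000,000 characters, and it accepts only the extensions exactly as listed in the table.

```python
from pathlib import Path
from typing import List

MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes

DOCUMENT_AND_IMAGE_TYPES = {".pdf", ".tiff", ".jpg", ".png", ".bmp", ".heif"}
TEXT_TYPES = {".txt"}


def check_classifier_input(filename: str, size_bytes: int, length: int) -> List[str]:
    """Return limit violations for a classifier input file.

    `length` is the page count for PDF/image files, or the character count for .txt files.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in DOCUMENT_AND_IMAGE_TYPES:
        max_size, max_length, unit = 200 * MB, 300, "pages"
    elif suffix in TEXT_TYPES:
        max_size, max_length, unit = 1 * MB, 1_000_000, "characters"
    else:
        return [f"unsupported file type {suffix!r}"]

    problems: List[str] = []
    if size_bytes > max_size:
        problems.append(f"file size {size_bytes} bytes exceeds {max_size} bytes")
    if length > max_length:
        problems.append(f"length {length} {unit} exceeds {max_length} {unit}")
    return problems


if __name__ == "__main__":
    print(check_classifier_input("loan-package.pdf", 5 * MB, 12))
```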
+
+### Category limits
+
+* **Category name and description**: At most 120 characters combined for each category's name and description.
+* **Category name**: Category names can't start with `$`.
+* **Number of categories**: Minimum of 1 and maximum of 50 categories per classifier.
diff --git a/articles/ai-services/content-understanding/toc.yml b/articles/ai-services/content-understanding/toc.yml
@@ -65,6 +65,9 @@ items:
     - name: Accuracy and confidence
       displayName: accuracy, confidence, analyzers, optimization, fields, scores
       href: concepts/accuracy-confidence.md
+    - name: 🆕 Classifiers
+      displayName: classifiers, classification, categories, splitting, documents, analyzers
+      href: concepts/classifiers.md
     - name: Retrieval-augmented generation (RAG)
       displayName: RAG, retrieval, augmented, generation, knowledge, base, search, index, vector
       href: concepts/retrieval-augmented-generation.md