MicrosoftDocs
diff --git a/‎articles/applied-ai-services/form-recognizer/concept-add-on-capabilities.md
Lines changed: 113 additions & 0 deletions b/‎articles/applied-ai-services/form-recognizer/concept-add-on-capabilities.md
diff --git a/‎articles/applied-ai-services/form-recognizer/concept-business-card.md
Lines changed: 6 additions & 6 deletions b/‎articles/applied-ai-services/form-recognizer/concept-business-card.md
diff --git a/‎articles/applied-ai-services/form-recognizer/concept-composed-models.md
Lines changed: 8 additions & 4 deletions b/‎articles/applied-ai-services/form-recognizer/concept-composed-models.md
diff --git a/‎articles/applied-ai-services/form-recognizer/concept-custom-classifier.md
Lines changed: 135 additions & 0 deletions b/‎articles/applied-ai-services/form-recognizer/concept-custom-classifier.md
diff --git a/‎articles/applied-ai-services/form-recognizer/concept-custom-label-tips.md
Lines changed: 1 addition & 1 deletion b/‎articles/applied-ai-services/form-recognizer/concept-custom-label-tips.md
@@ -0,0 +1,113 @@
+---
+title: Add-on capabilities - Form Recognizer
+titleSuffix: Azure Applied AI Services
+description: Learn how to use the optional add-on capabilities (high-resolution, formula, and font property extraction) in Form Recognizer.
+author: jaep3347
+manager: nitinme
+ms.service: applied-ai-services
+ms.subservice: forms-recognizer
+ms.topic: conceptual
+ms.date: 03/03/2023
+ms.author: lajanuar
+monikerRange: 'form-recog-3.0.0'
+recommendations: false
+---
+<!-- markdownlint-disable MD033 -->
+
+# Azure Form Recognizer add-on capabilities
+
+**This article applies to:** ![Form Recognizer v3.0 checkmark](media/yes-icon.png) **Form Recognizer v3.0**.
+
+> [!NOTE]
+>
+> Add-on capabilities for Form Recognizer Studio are only available within the Read and Layout models for the `2023-02-28-preview` release.
+
+Form Recognizer now supports more sophisticated analysis capabilities. You can enable or disable these optional capabilities depending on the document extraction scenario. Three add-on capabilities are available for the `2023-02-28-preview` release:
+
+* [`ocr.highResolution`](#high-resolution-extraction)
+
+* [`ocr.formula`](#formula-extraction)
+
+* [`ocr.font`](#font-property-extraction)
+
+## High resolution extraction
+
+The task of recognizing small text from large-size documents, like engineering drawings, is a challenge. Often the text is mixed with other graphical elements and has varying fonts, sizes and orientations. Moreover, the text may be broken into separate parts or connected with other symbols. Form Recognizer now supports extracting content from these types of documents with the `ocr.highResolution` capability. You get improved quality of content extraction from A1/A2/A3 documents by enabling this add-on capability.
+
+## Formula extraction
+
+The `ocr.formula` capability extracts all identified formulas, such as mathematical equations, into the `formulas` collection of each page object. Inside `content`, detected formulas are represented as `:formula:`. Each entry in the collection represents one formula and includes the formula type (`inline` or `display`), its LaTeX representation as `value`, and its `polygon` coordinates. Initially, formulas appear at the end of each page.
+
+   > [!NOTE]
+   > The `confidence` score is hard-coded for the `2023-02-28-preview` public preview release.
+
+   ```json
+   "content": ":formula:",
+     "pages": [
+       {
+         "pageNumber": 1,
+         "formulas": [
+           {
+             "kind": "inline",
+             "value": "\\frac { \\partial a } { \\partial b }",
+             "polygon": [...],
+             "span": {...},
+             "confidence": 0.99
+           },
+           {
+             "kind": "display",
+             "value": "y = a \\times b + a \\times c",
+             "polygon": [...],
+             "span": {...},
+             "confidence": 0.99
+           }
+         ]
+       }
+     ]
+   ```
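A minimal Python sketch of reading this structure, assuming the analyze result has already been parsed into a dictionary with a `pages` list shaped like the sample above. The helper name `collect_formulas` and the trimmed sample data are illustrative and not part of the service API.

```python
from collections import defaultdict


def collect_formulas(analyze_result):
    """Group (page number, LaTeX value) pairs by formula kind."""
    grouped = defaultdict(list)
    for page in analyze_result.get("pages", []):
        # Assumption: a page without detected formulas may omit the collection.
        for formula in page.get("formulas", []):
            grouped[formula["kind"]].append((page["pageNumber"], formula["value"]))
    return dict(grouped)


# Trimmed copy of the sample response; polygon and span are left out.
sample = {
    "pages": [
        {
            "pageNumber": 1,
            "formulas": [
                {"kind": "inline", "value": "\\frac { \\partial a } { \\partial b }", "confidence": 0.99},
                {"kind": "display", "value": "y = a \\times b + a \\times c", "confidence": 0.99},
            ],
        }
    ]
}

for kind, items in collect_formulas(sample).items():
    for page_number, latex in items:
        print(f"page {page_number} [{kind}]: {latex}")
```

Grouping by `kind` makes it easy to treat `display` formulas as standalone blocks and `inline` formulas as part of the surrounding text.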
+
+## Font property extraction
+
+The `ocr.font` capability extracts the font properties of the extracted text into the `styles` collection, a top-level object alongside `content`. Each style object specifies a single font property, the text spans it applies to, and its confidence score. The existing style property is extended with more font properties: `similarFontFamily` for the font of the text, `fontStyle` for styles such as italic and normal, `fontWeight` for bold or normal, `color` for the color of the text, and `backgroundColor` for the color of the text bounding box.
+
+   ```json
+   "content": "Foo bar",
+   "styles": [
+       {
+         "similarFontFamily": "Arial, sans-serif",
+         "spans": [ { "offset": 0, "length": 3 } ],
+         "confidence": 0.98
+       },
+       {
+         "similarFontFamily": "Times New Roman, serif",
+         "spans": [ { "offset": 4, "length": 3 } ],
+         "confidence": 0.98
+       },
+       {
+         "fontStyle": "italic",
+         "spans": [ { "offset": 1, "length": 2 } ],
+         "confidence": 0.98
+       },
+       {
+         "fontWeight": "bold",
+         "spans": [ { "offset": 2, "length": 3 } ],
+         "confidence": 0.98
+       },
+       {
+         "color": "#FF0000",
+         "spans": [ { "offset": 4, "length": 2 } ],
+         "confidence": 0.98
+       },
+       {
+         "backgroundColor": "#00FF00",
+         "spans": [ { "offset": 5, "length": 2 } ],
+         "confidence": 0.98
+       }
+     ]
+   ```
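The spans in each style object can be resolved back to text by slicing `content`. The following Python sketch assumes that span offsets and lengths count the same units as Python string indexes, which holds for the ASCII sample above but may not hold for every string index type. The helper name `styled_text` is illustrative.

```python
def styled_text(content, styles):
    """Return (property, value, text) triples for every style span."""
    resolved = []
    for style in styles:
        # Everything except spans and confidence is treated as a font property.
        props = {k: v for k, v in style.items() if k not in ("spans", "confidence")}
        for name, value in props.items():
            for span in style.get("spans", []):
                start = span["offset"]
                end = start + span["length"]
                resolved.append((name, value, content[start:end]))
    return resolved


content = "Foo bar"
styles = [
    {"similarFontFamily": "Arial, sans-serif", "spans": [{"offset": 0, "length": 3}], "confidence": 0.98},
    {"fontWeight": "bold", "spans": [{"offset": 2, "length": 3}], "confidence": 0.98},
    {"color": "#FF0000", "spans": [{"offset": 4, "length": 2}], "confidence": 0.98},
]

for name, value, text in styled_text(content, styles):
    print(f"{name}={value!r} applies to {text!r}")
```

Because each style object carries one property, overlapping spans from different objects can be combined afterward to find text that is, for example, both bold and red.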
+
+## Next steps
+
+> [!div class="nextstepaction"]
+> Learn more:
+> [**Read model**](concept-read.md) [**Layout model**](concept-layout.md).
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: applied-ai-services
 ms.subservice: forms-recognizer
 ms.topic: conceptual
-ms.date: 11/14/2022
+ms.date: 03/03/2023
 ms.author: lajanuar
 recommendations: false
 ---
@@ -23,7 +23,7 @@ recommendations: false
 [!INCLUDE [applies to v2.1](includes/applies-to-v2-1.md)]
 ::: moniker-end
 
-The Form Recognizer business card model combines powerful Optical Character Recognition (OCR) capabilities with deep learning models to analyze and extract key information from business card images. The API analyzes printed business cards; extracts key information such as first name, last name, company name, email address, and phone number;  and returns a structured JSON data representation.
+The Form Recognizer business card model combines powerful Optical Character Recognition (OCR) capabilities with deep learning models to analyze and extract data from business card images. The API analyzes printed business cards; extracts key information such as first name, last name, company name, email address, and phone number; and returns a structured JSON data representation.
 
 ## Business card data extraction
 
@@ -48,7 +48,7 @@ Business cards are a great way to represent a business or a professional. The co
 
 ::: moniker range="form-recog-3.0.0"
 
-The following tools are supported by Form Recognizer v3.0:
+Form Recognizer v3.0 supports the following tools:
 
 | Feature | Resources | Model ID |
 |----------|-------------|-----------|
@@ -58,7 +58,7 @@ The following tools are supported by Form Recognizer v3.0:
 
 ::: moniker range="form-recog-2.1.0"
 
-The following tools are supported by Form Recognizer v2.1:
+Form Recognizer v2.1 supports the following tools:
 
 | Feature | Resources |
 |----------|-------------------------|
@@ -68,7 +68,7 @@ The following tools are supported by Form Recognizer v2.1:
 
 ### Try business card data extraction
 
-See how data, including name, job title, address, email, and company name, is extracted from business cards. You'll need the following resources:
+See how data, including name, job title, address, email, and company name, is extracted from business cards. You need the following resources:
 
 * An Azure subscription—you can [create one for free](https://azure.microsoft.com/free/cognitive-services/)
 
@@ -125,7 +125,7 @@ See how data, including name, job title, address, email, and company name, is ex
 
     :::image type="content" source="media/fott-select-form-type.png" alt-text="Screenshot of the select-form-type dropdown menu.":::
 
-1. Select **Run analysis**. The Form Recognizer Sample Labeling tool will call the Analyze Prebuilt API and analyze the document.
+1. Select **Run analysis**. The Form Recognizer Sample Labeling tool calls the Analyze Prebuilt API and analyzes the document.
 
 1. View the results - see the key-value pairs extracted, line items, highlighted text extracted and tables detected.
 
 
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: applied-ai-services
 ms.subservice: forms-recognizer
 ms.topic: conceptual
-ms.date: 02/28/2023
+ms.date: 03/03/2023
 ms.author: lajanuar
 recommendations: false
 ---
@@ -38,7 +38,11 @@ With composed models, you can assign multiple custom models to a composed model
 
 * For ```Custom neural``` models the best practice is to add all the different variations of a single document type into a single training dataset and train on custom neural model. Model compose is best suited for scenarios when you have documents of different types being submitted for analysis.
 
-* Pricing is the same whether you're using a composed model or selecting a specific model. One model analyzes each document. With composed models, the system performs a classification to check which of the composed custom models should be invoked and invokes the single best model for the document.
+::: moniker-end
+
+::: moniker range="form-recog-3.0.0"
+
+With the introduction of [**custom classifier models**](./concept-custom-classifier.md), you can choose to use [**composed models**](./concept-composed-models.md) or the classifier model as an explicit step before analysis. For a deeper understanding of when to use a classifier or composed model, _see_ [**Custom classifier models**](concept-custom-classifier.md).
 
 ## Compose model limits
 
@@ -57,7 +61,7 @@ With composed models, you can assign multiple custom models to a composed model
 
 * To compose a model trained with a prior version of the API (v2.1 or earlier), train a model with the v3.0 API using the same labeled dataset. That addition ensures that the v2.1 model can be composed with other models.
 
-* Models composed with v2.1 of the API continue to be supported, requiring no updates.
+* Models composed with v2.1 of the API continue to be supported, requiring no updates.
 
 * The limit for maximum number of custom models that can be composed is 100.
 
@@ -90,4 +94,4 @@ Learn to create and compose custom models:
 
 > [!div class="nextstepaction"]
 > [**Build a custom model**](how-to-guides/build-a-custom-model.md)
-> [**Compose custom models**](how-to-guides/compose-custom-models.md)
+> [**Compose custom models**](how-to-guides/compose-custom-models.md)
@@ -0,0 +1,135 @@
+---
+title: Custom classifier model - Form Recognizer
+titleSuffix: Azure Applied AI Services
+description: Use the custom classifier model to train a model to identify and split the documents you process within your application.
+author: vkurpad
+manager: nitinme
+ms.service: applied-ai-services
+ms.subservice: forms-recognizer
+ms.topic: conceptual
+ms.date: 03/03/2023
+ms.author: lajanuar
+ms.custom: references_regions
+monikerRange: 'form-recog-3.0.0'
+recommendations: false
+---
+
+# Custom classifier model
+
+**This article applies to:** ![Form Recognizer v3.0 checkmark](media/yes-icon.png) **Form Recognizer v3.0**.
+
+Custom classifier models are deep-learning-model types that combine layout and language features to accurately detect and identify documents you process within your application. Custom classifier models can classify each page in an input file to identify the document(s) within and can also identify multiple documents or multiple instances of a single document within an input file.
+
+## Model capabilities
+
+Custom classifier models can analyze single- or multi-file documents to identify whether any of the trained document types are contained within an input file. Here are the currently supported scenarios:
+
+* A single file containing one document. For instance, a loan application form.
+
+* A single file containing multiple documents. For instance, a loan application package containing a loan application form, payslip, and bank statement.
+
+* A single file containing multiple instances of the same document. For instance, a collection of scanned invoices.
+
+Training a custom classifier model requires at least two distinct classes and a minimum of five samples per class.
+
+### Compare custom classifier and composed models
+
+A custom classifier model can replace [a composed model](concept-composed-models.md) in some scenarios, but there are a few differences to be aware of:
+
+| Capability | Custom classifier process | Composed model process |
+|--|--|--|
+|Analyze a single document of unknown type belonging to one of the types trained for extraction model processing.| &#9679; Requires multiple calls. </br> &#9679; Call the classifier model to identify the document class. This step allows for a confidence-based check before invoking the extraction model analysis.</br> &#9679; Invoke the extraction model. | &#9679; Requires a single call to a composed model containing the model corresponding to the input document type. |
+|Analyze a single document of unknown type belonging to several types trained for extraction model processing.| &#9679; Requires multiple calls.</br> &#9679; Make a call to the classifier that ignores documents not matching a designated type for extraction.</br> &#9679; Invoke the extraction model. | &#9679; Requires a single call to a composed model. The service selects a custom model within the composed model with the highest match.</br> &#9679; A composed model can't ignore documents.|
+|Analyze a file containing multiple documents of known or unknown type belonging to one of the types trained for extraction model processing.| &#9679; Requires multiple calls. </br> &#9679; Call the classifier model to identify each document in the input file.</br> &#9679; Invoke the extraction model for each identified document. | &#9679; Requires a single call to a composed model.</br> &#9679; The composed model invokes the component model once on the first instance of the document. </br> &#9679; The remaining documents are ignored. |
+
+## Language support
+
+Classifier models currently support only English-language documents.
+
+## Best practices
+
+Custom classifier models require a minimum of five samples per class to train. If the classes are similar, adding extra training samples improves model accuracy.
+
+## Training a model
+
+Custom classifier models are only available in the [v3.0 API](v3-migration-guide.md) starting with API version ```2023-02-28-preview```. [Form Recognizer Studio](https://formrecognizer.appliedai.azure.com/studio) provides a no-code user interface to interactively train a custom classifier.
+
+When using the REST API, if you've organized your documents by folders, you can use the ```azureBlobSource``` property of the request to train a classifier model.
+
+```rest
+POST https://{endpoint}/formrecognizer/documentClassifiers:build?api-version=2023-02-28-preview
+
+{
+  "classifierId": "demo2.1",
+  "description": "",
+  "docTypes": {
+    "car-maint": {
+        "azureBlobSource": {
+            "containerUrl": "SAS URL to container",
+            "prefix": "sample1/car-maint/"
+            }
+    },
+    "cc-auth": {
+        "azureBlobSource": {
+            "containerUrl": "SAS URL to container",
+            "prefix": "sample1/cc-auth/"
+            }
+    },
+    "deed-of-trust": {
+        "azureBlobSource": {
+            "containerUrl": "SAS URL to container",
+            "prefix": "sample1/deed-of-trust/"
+            }
+    }
+  }
+}
+
+```
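When the set of classes is large, the request body can be generated rather than written by hand. The following Python sketch builds the same `azureBlobSource` body from a mapping of document type to folder prefix. The helper name `build_classifier_request` is hypothetical, and the container URL is a placeholder, as in the request above. The sketch only builds the JSON body and does not send the request.

```python
import json


def build_classifier_request(classifier_id, container_url, prefixes, description=""):
    """Build the JSON body for a classifier build request from folder prefixes."""
    doc_types = {
        doc_type: {"azureBlobSource": {"containerUrl": container_url, "prefix": prefix}}
        for doc_type, prefix in prefixes.items()
    }
    return {"classifierId": classifier_id, "description": description, "docTypes": doc_types}


body = build_classifier_request(
    "demo2.1",
    "SAS URL to container",  # placeholder, replace with a real SAS URL
    {
        "car-maint": "sample1/car-maint/",
        "cc-auth": "sample1/cc-auth/",
        "deed-of-trust": "sample1/deed-of-trust/",
    },
)
print(json.dumps(body, indent=2))
```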
+
+Alternatively, if you have a flat list of files or only plan to use a few select files within each folder to train the model, you can use the ```azureBlobFileListSource``` property to train the model. This step requires a ```file list``` in [JSON Lines](https://jsonlines.org/) format. For each class, add a new file with a list of files to be submitted for training.
+
+```rest
+{
+  "classifierId": "demo2",
+  "description": "",
+  "docTypes": {
+    "car-maint": {
+      "azureBlobFileListSource": {
+        "containerUrl": "SAS URL to container",
+        "fileList": "sample1/car-maint.jsonl"
+      }
+    },
+    "cc-auth": {
+      "azureBlobFileListSource": {
+        "containerUrl": "SAS URL to container",
+        "fileList": "sample1/cc-auth.jsonl"
+      }
+    },
+    "deed-of-trust": {
+      "azureBlobFileListSource": {
+        "containerUrl": "SAS URL to container",
+        "fileList": "sample1/deed-of-trust.jsonl"
+      }
+    }
+  }
+}
+
+```
+
+File list `car-maint.jsonl` contains the following files.
+
+```json
+{"file":"sample1/car-maint/Commercial Motor Vehicle - Adatum.pdf"}
+{"file":"sample1/car-maint/Commercial Motor Vehicle - Fincher.pdf"}
+{"file":"sample1/car-maint/Commercial Motor Vehicle - Lamna.pdf"}
+{"file":"sample1/car-maint/Commercial Motor Vehicle - Liberty.pdf"}
+{"file":"sample1/car-maint/Commercial Motor Vehicle - Trey.pdf"}
+```
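A file list like this one can be generated with a few lines of Python. The sketch below assumes you already have the blob paths for one class. The helper name `to_file_list` is illustrative, and uploading the resulting file to the container is left out.

```python
import json


def to_file_list(blob_paths):
    """Serialize blob paths as JSON Lines: one {"file": ...} object per line."""
    return "".join(
        json.dumps({"file": path}, separators=(",", ":")) + "\n" for path in blob_paths
    )


paths = [
    "sample1/car-maint/Commercial Motor Vehicle - Adatum.pdf",
    "sample1/car-maint/Commercial Motor Vehicle - Fincher.pdf",
]
print(to_file_list(paths), end="")
```

Using `json.dumps` for each line keeps file names with quotes or non-ASCII characters correctly escaped, which hand-built strings can get wrong.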
+
+## Next steps
+
+Learn to create custom classifier models:
+
+> [!div class="nextstepaction"]
+> [**Build a custom classifier model**](how-to-guides/build-a-custom-classifier.md)
+> [**Custom models overview**](concept-custom.md)
@@ -21,7 +21,7 @@ This article highlights the best methods for labeling custom model datasets in t
 
 * The following video is the second of two presentations intended to help you build custom models with higher accuracy (the first presentation explores [How to create a balanced data set](concept-custom-label.md#video-custom-label-tips-and-pointers)).
 
-* Here, we'll examine best practices for labeling your selected documents. With semantically relevant and consistent labeling, you should see an improvement in model performance.</br></br>
+* Here, we examine best practices for labeling your selected documents. With semantically relevant and consistent labeling, you should see an improvement in model performance.</br></br>
 
   > [!VIDEO https://www.microsoft.com/en-us/videoplayer/embed/RE5fZKB ]