MicrosoftDocs
diff --git a/articles/applied-ai-services/form-recognizer/concept-analyze-document-response.md b/articles/applied-ai-services/form-recognizer/concept-analyze-document-response.md
Lines changed: 275 additions & 0 deletions
diff --git a/articles/applied-ai-services/form-recognizer/concept-composed-models.md b/articles/applied-ai-services/form-recognizer/concept-composed-models.md
Lines changed: 3 additions & 3 deletions
diff --git a/articles/applied-ai-services/form-recognizer/concept-custom-label-tips.md b/articles/applied-ai-services/form-recognizer/concept-custom-label-tips.md
Lines changed: 59 additions & 0 deletions
diff --git a/articles/applied-ai-services/form-recognizer/concept-custom-label.md b/articles/applied-ai-services/form-recognizer/concept-custom-label.md
Lines changed: 119 additions & 0 deletions
diff --git a/articles/applied-ai-services/form-recognizer/concept-custom-neural.md b/articles/applied-ai-services/form-recognizer/concept-custom-neural.md
Lines changed: 13 additions & 5 deletions
diff --git a/articles/applied-ai-services/form-recognizer/media/bounding-regions.png b/articles/applied-ai-services/form-recognizer/media/bounding-regions.png
8.73 KB
diff --git a/articles/applied-ai-services/form-recognizer/media/key-value-pair.png b/articles/applied-ai-services/form-recognizer/media/key-value-pair.png
11.7 KB
diff --git a/articles/applied-ai-services/form-recognizer/media/lines.png b/articles/applied-ai-services/form-recognizer/media/lines.png
12.2 KB
diff --git a/articles/applied-ai-services/form-recognizer/media/paragraph.png b/articles/applied-ai-services/form-recognizer/media/paragraph.png
37.5 KB
diff --git a/articles/applied-ai-services/form-recognizer/media/selection-marks.png b/articles/applied-ai-services/form-recognizer/media/selection-marks.png
13.3 KB
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: applied-ai-services
 ms.subservice: forms-recognizer
 ms.topic: conceptual
-ms.date: 10/20/2022
+ms.date: 12/15/2022
 ms.author: lajanuar
 recommendations: false
 ---
@@ -41,12 +41,12 @@ With composed models, you can assign multiple custom models to a composed model
 
 ### Composed model compatibility
 
-|Custom model type|Models trained with v2.1 and v2.0| Custom template models v3.0 |Custom neural models v3.0 |Custom neural models 3.0 (GA)|
+|Custom model type|Models trained with v2.1 and v2.0 | Custom template models v3.0 |Custom neural models v3.0 (preview) |Custom neural models 3.0 (GA)|
 |--|--|--|--|--|
 |**Models trained with version 2.1 and v2.0** |Supported|Supported|Not Supported|Not Supported|
 |**Custom template models v3.0** |Supported|Supported|Not Supported|Not Supported|
 |**Custom template models v3.0 (GA)** |Not Supported|Not Supported|Supported|Not Supported|
-|**Custom neural models v3.0**|Not Supported|Not Supported|Supported|Not Supported|
+|**Custom neural models v3.0 (preview)**|Not Supported|Not Supported|Supported|Not Supported|
 |**Custom Neural models v3.0 (GA)**|Not Supported|Not Supported|Not Supported|Supported|
 
 * To compose a model trained with a prior version of the API (v2.1 or earlier), train a model with the v3.0 API using the same labeled dataset. That addition will ensure that the v2.1 model can be composed with other models.
 
@@ -0,0 +1,59 @@
+---
+title: Labeling tips for custom models in the Form Recognizer Studio
+titleSuffix: Azure Applied AI Services
+description: Label tips and tricks for Form Recognizer Studio
+author: laujan
+manager: nitinme
+ms.service: applied-ai-services
+ms.subservice: forms-recognizer
+ms.topic: conceptual
+ms.date: 12/15/2022
+ms.author: vikurpad
+ms.custom: references_regions
+recommendations: false
+---
+
+# Tips for labeling custom model datasets
+
+This article highlights the best methods for labeling custom model datasets in the Form Recognizer Studio. Labeling documents can be time-consuming when you have a large number of labels, long documents, or documents with varying structure. These tips should help you label documents more efficiently.
+
+## Search
+
+The Studio now includes a search box for cases where you know which words you need to label but not where they appear in the document. Search for the word or phrase, then navigate to the matching section of the document to label the occurrence.
+
+## Auto label tables
+
+Tables can be challenging to label when they have many rows or dense text. If the layout table extracts the result you need, use that result and skip the labeling process. When the layout table isn't exactly what you need, you can start by generating the table field from the values that layout extracts: select the table icon on the page, then select the auto label button. You can then edit the values as needed. Auto label currently supports only single-page tables.
+
+## Shift select
+
+When labeling a large span of text, rather than marking each word in the span, hold down the Shift key as you select the words. This approach speeds up labeling and ensures you don't miss any words in the span.
+
+## Region labeling
+
+A second option for labeling larger spans of text is region labeling. When region labeling is used, the OCR results are populated in the value at training time. The only difference between shift select and region labeling is the visual feedback that the shift select approach provides.
+
+## Field subtypes
+
+When creating a field, select the right subtype to minimize post-processing. For instance, select the ```dmy``` option for dates to extract the values in a ```dd-mm-yyyy``` format.
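
To illustrate the kind of post-processing a well-chosen subtype helps avoid, here is a minimal sketch that assumes a date value arrives as a plain string in the ```dd-mm-yyyy``` format mentioned above. The function name `parse_dmy` is hypothetical and isn't part of Form Recognizer; it only shows what a downstream conversion to a date object might look like.

```python
from datetime import date, datetime


def parse_dmy(text: str) -> date:
    """Parse a 'dd-mm-yyyy' string (assumed input format) into a date."""
    # strptime raises ValueError if the text doesn't match the format,
    # which surfaces malformed values instead of silently passing them on.
    return datetime.strptime(text.strip(), "%d-%m-%Y").date()


if __name__ == "__main__":
    print(parse_dmy("15-12-2022").isoformat())
```

When every date in the dataset shares one subtype, a single conversion like this is enough; mixed formats would need format detection first.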
+
+## Batch layout
+
+When creating a project, select the batch layout option to prepare all documents in your dataset for labeling. With this feature, you no longer have to select each document and wait for the layout results before you can start labeling.
+
+## Next steps
+
+* Learn more about custom labeling:
+
+  > [!div class="nextstepaction"]
+  > [Custom labels](concept-custom-label.md)
+
+* Learn more about custom template models:
+
+  > [!div class="nextstepaction"]
+  > [Custom template models](concept-custom-template.md)
+
+* Learn more about custom neural models:
+
+  > [!div class="nextstepaction"]
+  > [Custom neural models](concept-custom-neural.md)
@@ -0,0 +1,119 @@
+---
+title: Best practices for labeling documents in the Form Recognizer Studio
+titleSuffix: Azure Applied AI Services
+description: Label documents in the Studio to create a training dataset. Labeling guidelines aimed at training a model with high accuracy
+author: laujan
+manager: nitinme
+ms.service: applied-ai-services
+ms.subservice: forms-recognizer
+ms.topic: conceptual
+ms.date: 12/15/2022
+ms.author: vikurpad
+ms.custom: references_regions
+monikerRange: 'form-recog-3.0.0'
+recommendations: false
+---
+
+# Best practices: Generating Form Recognizer labeled dataset
+
+Custom models (template and neural) require a labeled dataset of at least five documents to train a model. The quality of the labeled dataset affects the accuracy of the trained model. This guide helps you learn more about generating a model with high accuracy by assembling a diverse dataset and provides best practices for labeling your documents.
+
+## Understand the components of a labeled dataset
+
+A labeled dataset consists of several files:
+
+* You'll provide a set of sample documents (typically PDFs or images). A minimum of five documents is needed to train a model.
+
+* Additionally, the labeling process will generate the following files:
+
+  * A `fields.json` file is created when the first field is added. There's one `fields.json` file for the entire training dataset. Its field list contains each field name with the associated sub fields and types.
+
+  * The Studio runs each of the documents through the [Layout API](concept-layout.md). The layout response for each of the sample files in the dataset is added as `{file}.ocr.json`. The layout response is used to generate the field labels when a specific span of text is labeled.
+
+  * A `{file}.labels.json` file is created or updated when a field is labeled in a document. The label file contains the spans of text and associated polygons from the layout output for each span of text the user adds as a value for a specific field.
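
The file layout described above can be checked with a short script before training. This is a minimal sketch, not part of the Studio or any SDK: the function name `find_missing_label_files` is hypothetical, the set of document extensions is an assumption, and it assumes `{file}` stands for the full document file name including its extension (for example, `invoice.pdf.ocr.json`).

```python
from pathlib import Path

# Assumed set of sample-document extensions; adjust to match your dataset.
DOCUMENT_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def find_missing_label_files(dataset_dir):
    """List companion files the naming convention implies but that are absent."""
    root = Path(dataset_dir)
    missing = []
    # One fields.json is expected for the entire training dataset.
    if not (root / "fields.json").exists():
        missing.append("fields.json")
    for doc in sorted(root.iterdir()):
        # Skip the .json companions and anything that isn't a sample document.
        if doc.suffix.lower() not in DOCUMENT_SUFFIXES:
            continue
        for companion in (f"{doc.name}.ocr.json", f"{doc.name}.labels.json"):
            if not (root / companion).exists():
                missing.append(companion)
    return missing
```

A document that hasn't been labeled yet has no `{file}.labels.json`, so an entry in the returned list isn't necessarily an error; it only marks files worth a second look.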
+
+## Create a balanced dataset
+
+Before you start labeling, it's a good idea to look at a few different samples of the document to identify which samples you want to use in your labeled dataset. A balanced dataset represents all the typical variations you would expect to see for the document. Creating a balanced dataset will result in a model with the highest possible accuracy. A few examples to consider are:
+
+* **Document formats**: If you expect to analyze both digital and scanned documents, add a few examples of each type to the training dataset.
+
+* **Variations (template model)**: Consider splitting the dataset into folders and training a model for each variation. Variations in either structure or layout should be split into different models. You can then compose the individual models into a single [composed model](concept-composed-models.md).
+
+* **Variations (neural models)**: When your dataset has a manageable set of variations, about 15 or fewer, create a single dataset with a few samples of each variation to train a single model. If the number of template variations is larger than 15, train multiple models and [compose](concept-composed-models.md) them together.
+
+* **Tables**: For documents containing tables with a variable number of rows, ensure that the training dataset also represents documents with different numbers of rows.
+
+* **Multi-page tables**: When tables span multiple pages, label a single table. Add documents to the training dataset that represent the expected variations: documents with the table on a single page only, and documents with the table spanning two or more pages with all the rows labeled.
+
+* **Optional fields**: If your dataset contains documents with optional fields, validate that the training dataset has a few documents with the options represented.
+
+## Start by identifying the fields
+
+Take the time to identify each of the fields you plan to label in the dataset. Pay attention to optional fields. Define the fields with the labels that best match the supported types.
+
+Use the following guidelines to define the fields:
+
+* For custom neural models, use semantically relevant names for fields. For example, if the value being extracted is `Effective Date`, name it `effective_date` or `EffectiveDate`, not a generic name like **date1**.
+
+* Ideally, name your fields with Pascal or camel case.
+
+* If a value is part of a visually repeating structure and you only need a single value, label it as a table and extract the required value during post-processing.
+
+* For tabular fields spanning multiple pages, define and label the fields as a single table.
+
+> [!NOTE]
+> Custom neural models share the same labeling format and strategy as custom template models. Currently, custom neural models only support a subset of the field types supported by custom template models.
+
+## Model capabilities
+
+Custom neural models currently only support key-value pairs, structured fields (tables), and selection marks. 
+
+| Model type | Form fields | Selection marks | Tabular fields | Signature | Region |
+|--|--|--|--|--|--|
+| Custom neural | ✔️Supported | ✔️Supported | ✔️Supported | Unsupported | ✔️Supported<sup>1</sup> |
+| Custom template | ✔️Supported| ✔️Supported | ✔️Supported | ✔️Supported | ✔️Supported |
+
+<sup>1</sup> Region labeling implementation differs between template and neural models. For template models, the training process injects synthetic data at training time if no text is found in the region labeled. With neural models, no synthetic text is injected and the recognized text is used as is.
+
+## Tabular fields
+
+Tabular fields (tables) are supported with custom neural models starting with API version ```2022-06-30-preview```. Models trained with API version 2022-06-30-preview or later accept tabular field labels. Documents analyzed with such a model, using API version 2022-06-30-preview or later, produce tabular fields in the ```documents``` section of the ```analyzeResult``` object.
+
+Tabular fields support **cross page tables** by default. To label a table that spans multiple pages, label each row of the table across the different pages in the single table. As a best practice, ensure that your dataset contains a few samples of the expected variations. For example, include both samples where an entire table is on a single page and samples of a table spanning two or more pages.
+
+Tabular fields are also useful when extracting repeating information within a document that isn't recognized as a table. For example, a repeating section of work experiences in a resume can be labeled and extracted as a tabular field.
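
As a rough illustration of reading a tabular field back out of the ```documents``` section, the sketch below walks an analyze response that has already been parsed into a Python dictionary. It assumes the v3.0 response shape in which a tabular field has ```type``` set to ```array```, with rows under ```valueArray``` and each row's cells under ```valueObject```; verify this against your API version. The function name `tabular_rows` and the field name `WorkExperience` are hypothetical.

```python
def tabular_rows(analyze_result, field_name):
    """Collect the rows of one tabular field as a list of {column: text} dicts."""
    rows = []
    for document in analyze_result.get("documents", []):
        field = document.get("fields", {}).get(field_name)
        # Assumption: tabular fields are reported with type "array".
        if not field or field.get("type") != "array":
            continue
        for item in field.get("valueArray", []):
            cells = item.get("valueObject", {})
            # Keep only the recognized text of each cell.
            rows.append({col: cell.get("content") for col, cell in cells.items()})
    return rows


if __name__ == "__main__":
    sample = {
        "documents": [{
            "fields": {
                "WorkExperience": {
                    "type": "array",
                    "valueArray": [{
                        "type": "object",
                        "valueObject": {
                            "Employer": {"type": "string", "content": "Contoso"},
                            "Title": {"type": "string", "content": "Engineer"},
                        },
                    }],
                }
            }
        }]
    }
    print(tabular_rows(sample, "WorkExperience"))
```

Because a cross-page table is labeled as a single table, rows from every page arrive in the same list, so no per-page merging is needed in this sketch.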
+
+## Labeling guidelines
+
+* **Labeling values is required.** Don't include the surrounding text. For example, when labeling a checkbox, name the field to indicate the check box selection, such as ```selectionYes``` and ```selectionNo```, rather than labeling the yes or no text in the document.
+
+* **Don't provide interleaving field values.** The words and/or regions that make up one field's value must either form a consecutive sequence in natural reading order, without interleaving with other fields, or lie in a region that doesn't cover any other fields.
+
+* **Consistent labeling**. If a value appears in multiple contexts within the document, consistently pick the same context across documents to label the value.
+
+* **Visually repeating data**. Tables support visually repeating groups of information, not just explicit tables. Explicit tables are identified in the tables section of the analyzed documents as part of the layout output and don't need to be labeled as tables. Only label a table field if the information is visually repeating and not identified as a table in the layout response. An example would be the repeating work experience section of a resume.
+
+* **Region labeling (custom template)**. Labeling specific regions allows you to define a value when none exists. If the value is optional, ensure that you leave a few sample documents with the region not labeled. When labeling regions, don't include the surrounding text with the label.
+
+## Next steps
+
+* Train a custom model:
+
+  > [!div class="nextstepaction"]
+  > [How to train a model](how-to-guides/build-custom-model-v3.md)
+
+* Learn more about custom template models:
+
+  > [!div class="nextstepaction"]
+  > [Custom template models](concept-custom-template.md)
+
+* Learn more about custom neural models:
+
+  > [!div class="nextstepaction"]
+  > [Custom neural models](concept-custom-neural.md)
+
+* View the REST API:
+
+  > [!div class="nextstepaction"]
+  > [Form Recognizer API v3.0](https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v3-0-preview-2/operations/AnalyzeDocument)
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: applied-ai-services
 ms.subservice: forms-recognizer
 ms.topic: conceptual
-ms.date: 12/02/2022
+ms.date: 12/15/2022
 ms.author: lajanuar
 ms.custom: references_regions
 monikerRange: 'form-recog-3.0.0'
@@ -30,11 +30,13 @@ Custom neural models share the same labeling format and strategy as [custom temp
 
 ## Model capabilities
 
-Custom neural models currently only support key-value pairs and selection marks, future releases will include support for structured fields (tables) and signature.
+Custom neural models currently only support key-value pairs, selection marks, and structured fields (tables). Future releases will include support for signatures.
 
 | Form fields | Selection marks | Tabular fields | Signature | Region |
 |:--:|:--:|:--:|:--:|:--:|
-| Supported | Supported | Supported | Unsupported | Unsupported |
+| Supported | Supported | Supported | Unsupported | Supported <sup>1</sup> |
+
+<sup>1</sup> Region labels in custom neural models use the results from the Layout API for the specified region. This behavior differs from template models, where text is generated at training time if no value is present.
 
 ### Build mode
 
@@ -59,24 +61,30 @@ Tabular fields are also useful when extracting repeating information within a do
 
 ## Supported regions
 
-As of September 16, 2022, Form Recognizer custom neural model training will only be available in the following Azure regions until further notice:
+As of October 18, 2022, Form Recognizer custom neural model training will only be available in the following Azure regions until further notice:
 
 * Australia East
 * Brazil South
 * Canada Central
 * Central India
 * Central US
 * East Asia
+* East US
+* East US2
 * France Central
 * Japan East
 * South Central US
 * Southeast Asia
 * UK South
 * West Europe
 * West US2
+* US Gov Arizona
+* US Gov Virginia
+
+
 
 > [!TIP]
-> You can [copy a model](disaster-recovery.md#copy-api-overview) trained in one of the select regions listed above to **any other region** and use it accordingly.
+> You can [copy a model](disaster-recovery.md#copy-api-overview) trained in one of the select regions listed to **any other region** and use it accordingly.
 >
 > Use the [**REST API**](https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-2022-08-31/operations/CopyDocumentModelTo) or [**Form Recognizer Studio**](https://formrecognizer.appliedai.azure.com/studio/custommodel/projects) to copy a model to another region.
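
The region list above can be encoded as a simple lookup when scripting resource setup. This is a minimal sketch built only from the regions shown in this diff, which reflect availability as of October 18, 2022, and may since have changed. The names `TRAINING_REGIONS` and `supports_neural_training` are hypothetical, and the normalization (dropping spaces and lowercasing) is an assumption made so that both display names and compact names match.

```python
def _normalize(region: str) -> str:
    """Drop whitespace and lowercase, so 'East US2' and 'eastus2' compare equal."""
    return "".join(region.split()).lower()


# Regions listed for custom neural model training in the section above.
TRAINING_REGIONS = frozenset(_normalize(name) for name in (
    "Australia East", "Brazil South", "Canada Central", "Central India",
    "Central US", "East Asia", "East US", "East US2", "France Central",
    "Japan East", "South Central US", "Southeast Asia", "UK South",
    "West Europe", "West US2", "US Gov Arizona", "US Gov Virginia",
))


def supports_neural_training(region: str) -> bool:
    """Return True if the region appears in the list above."""
    return _normalize(region) in TRAINING_REGIONS
```

A region that fails this check can still host a trained model: as the tip notes, a model trained in a listed region can be copied to any other region.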