Commit 1507e63

update main
1 parent e008500 commit 1507e63

6 files changed

+45
-46
lines changed

articles/applied-ai-services/form-recognizer/concept-composed-models.md

Lines changed: 0 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -43,7 +43,6 @@ With composed models, you can assign multiple custom models to a composed model
4343
::: moniker range="form-recog-3.0.0"
4444

4545
With the introduction of [****custom classifier models****](./concept-custom-classifier.md), you can choose to use [**composed models**](./concept-composed-models.md) or the classifier model as an explicit step before analysis. For a deeper understanding of when to use a classifier or composed model, _see_ [**Custom classifier models**](concept-custom-classifier.md).
46-
::: moniker-end
4746

4847
## Compose model limits
4948

articles/applied-ai-services/form-recognizer/concept-custom-label-tips.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ This article highlights the best methods for labeling custom model datasets in t
2121

2222
* The following video is the second of two presentations intended to help you build custom models with higher accuracy (the first presentation explores [How to create a balanced data set](concept-custom-label.md#video-custom-label-tips-and-pointers)).
2323

24-
* Here, we'll examine best practices for labeling your selected documents. With semantically relevant and consistent labeling, you should see an improvement in model performance.</br></br>
24+
* Here, we examine best practices for labeling your selected documents. With semantically relevant and consistent labeling, you should see an improvement in model performance.</br></br>
2525

2626
> [!VIDEO https://www.microsoft.com/en-us/videoplayer/embed/RE5fZKB ]
2727

articles/applied-ai-services/form-recognizer/concept-custom-label.md

Lines changed: 8 additions & 8 deletions
@@ -22,9 +22,9 @@ Custom models (template and neural) require a labeled dataset of at least five d
2222

2323
A labeled dataset consists of several files:
2424

25-
* You'll provide a set of sample documents (typically PDFs or images). A minimum of five documents is needed to train a model.
25+
* You provide a set of sample documents (typically PDFs or images). A minimum of five documents is needed to train a model.
2626

27-
* Additionally, the labeling process will generate the following files:
27+
* Additionally, the labeling process generates the following files:
2828

2929
* A `fields.json` file is created when the first field is added. There's one `fields.json` file for the entire training dataset, the field list contains the field name and associated sub fields and types.
3030

@@ -36,19 +36,19 @@ A labeled dataset consists of several files:
3636

3737
* The following video is the first of two presentations intended to help you build custom models with higher accuracy (The second presentation examines [Best practices for labeling documents](concept-custom-label-tips.md#video-custom-labels-best-practices)).
3838

39-
* Here, we'll explore how to create a balanced data set and select the right documents to label. This process will set you on the path to higher quality models.</br></br>
39+
* Here, we explore how to create a balanced data set and select the right documents to label. This process sets you on the path to higher quality models.</br></br>
4040

4141
> [!VIDEO https://www.microsoft.com/en-us/videoplayer/embed/RWWHru]
4242
4343
## Create a balanced dataset
4444

45-
Before you start labeling, it's a good idea to look at a few different samples of the document to identify which samples you want to use in your labeled dataset. A balanced dataset represents all the typical variations you would expect to see for the document. Creating a balanced dataset will result in a model with the highest possible accuracy. A few examples to consider are:
45+
Before you start labeling, it's a good idea to look at a few different samples of the document to identify which samples you want to use in your labeled dataset. A balanced dataset represents all the typical variations you would expect to see for the document. Creating a balanced dataset results in a model with the highest possible accuracy. A few examples to consider are:
4646

4747
* **Document formats**: If you expect to analyze both digital and scanned documents, add a few examples of each type to the training dataset
4848

4949
* **Variations (template model)**: Consider splitting the dataset into folders and train a model for each of variation. Any variations that include either structure or layout should be split into different models. You can then compose the individual models into a single [composed model](concept-composed-models.md).
5050

51-
* **Variations (Neural models)**: When your dataset has a manageable set of variations, about 15 or fewer, create a single dataset with a few samples of each of the different variations to train a single model. If the number of template variations is larger than 15, you'll train multiple models and [compose](concept-composed-models.md) them together.
51+
* **Variations (Neural models)**: When your dataset has a manageable set of variations, about 15 or fewer, create a single dataset with a few samples of each of the different variations to train a single model. If the number of template variations is larger than 15, you train multiple models and [compose](concept-composed-models.md) them together.
5252

5353
* **Tables**: For documents containing tables with a variable number of rows, ensure that the training dataset also represents documents with different numbers of rows.
5454

@@ -70,12 +70,12 @@ Use the following guidelines to define the fields:
7070

7171
* For tabular fields spanning multiple pages, define and label the fields as a single table.
7272

73-
. [!NOTE]
73+
> [!NOTE]
7474
> Custom neural models share the same labeling format and strategy as custom template models. Currently custom neural models only support a subset of the field types supported by custom template models.
7575
7676
## Model capabilities
7777

78-
Custom neural models currently only support key-value pairs, structured fields (tables), and selection marks.
78+
Custom neural models currently only support key-value pairs, structured fields (tables), and selection marks.
7979

8080
| Model type | Form fields | Selection marks | Tabular fields | Signature | Region |
8181
|--|--|--|--|--|--|
@@ -100,7 +100,7 @@ Tabular fields are also useful when extracting repeating information within a do
100100

101101
* **Consistent labeling**. If a value appears in multiple contexts withing the document, consistently pick the same context across documents to label the value.
102102

103-
* **Visually repeating data**. Tables support visually repeating groups of information not just explicit tables. Explicit tables will be identified in tables section of the analyzed documents as part of the layout output and don't need to be labeled as tables. Only label a table field if the information is visually repeating and not identified as a table as part of the layout response. An example would be the repeating work experience section of a resume.
103+
* **Visually repeating data**. Tables support visually repeating groups of information not just explicit tables. Explicit tables are identified in tables section of the analyzed documents as part of the layout output and don't need to be labeled as tables. Only label a table field if the information is visually repeating and not identified as a table as part of the layout response. An example would be the repeating work experience section of a resume.
104104

105105
* **Region labeling (custom template)**. Labeling specific regions allows you to define a value when none exists. If the value is optional, ensure that you leave a few sample documents with the region not labeled. When labeling regions, don't include the surrounding text with the label.
106106

articles/applied-ai-services/form-recognizer/how-to-guides/build-a-custom-classifier.md

Lines changed: 8 additions & 8 deletions
@@ -17,7 +17,7 @@ recommendations: false
1717

1818
[!INCLUDE [applies to v3.0](../includes/applies-to-v3-0.md)]
1919

20-
Custom classifier models can classify each page in a input file to identify the document(s) within. Classifier models can also identify multiple documents or multiple instances of a single document in the input file. Form Recognizer custom models require as few as five training documents per document class to get started. to get started training a custom classifier model you need at least **five documents** for each class and **two classes** of documents.
20+
Custom classifier models can classify each page in an input file to identify the document(s) within. Classifier models can also identify multiple documents or multiple instances of a single document in the input file. Form Recognizer custom models require as few as five training documents per document class to get started. To get started training a custom classifier model, you need at least **five documents** for each class and **two classes** of documents.
2121

2222
## Custom classifier model input requirements
2323

@@ -41,7 +41,7 @@ Once you've put together the set of forms or documents for training, you need to
4141

4242
The Form Recognizer Studio provides and orchestrates all the API calls required to complete your dataset and train your model.
4343

44-
1. Start by navigating to the [Form Recognizer Studio](https://formrecognizer.appliedai.azure.com/studio). The first time you use the Studio, you'll need to [initialize your subscription, resource group, and resource](../quickstarts/try-v3-form-recognizer-studio.md). Then, follow the [prerequisites for custom projects](../quickstarts/try-v3-form-recognizer-studio.md#additional-prerequisites-for-custom-projects) to configure the Studio to access your training dataset.
44+
1. Start by navigating to the [Form Recognizer Studio](https://formrecognizer.appliedai.azure.com/studio). The first time you use the Studio, you need to [initialize your subscription, resource group, and resource](../quickstarts/try-v3-form-recognizer-studio.md). Then, follow the [prerequisites for custom projects](../quickstarts/try-v3-form-recognizer-studio.md#additional-prerequisites-for-custom-projects) to configure the Studio to access your training dataset.
4545

4646
1. In the Studio, select the **Custom classifier models** tile, on the custom models section of the page and select the **Create a project** button.
4747

@@ -60,7 +60,7 @@ The Form Recognizer Studio provides and orchestrates all the API calls required
6060
6161
:::image type="content" source="../media/how-to/studio-select-storage.png" alt-text="Screenshot showing how to select the Form Recognizer resource.":::
6262

63-
1. Training a custom classifiers requires the output from the Layout model for each document in your dataset. Run layout on all documents as an optional step to speed up the model training process.
63+
1. Training a custom classifier requires the output from the Layout model for each document in your dataset. Run layout on all documents as an optional step to speed up the model training process.
6464

6565
1. Finally, review your project settings and select **Create Project** to create a new project. You should now be in the labeling window and see the files in your dataset listed.
6666

@@ -70,15 +70,15 @@ In your project, you only need to label each document with the appropriate class
7070

7171
:::image type="content" source="../media/how-to/studio-create-label.png" alt-text="Screenshot showing elect the Form Recognizer resource.":::
7272

73-
You'll see the files you uploaded to storage in the file list, ready to be labeled. You have a few options to label your dataset.
73+
You see the files you uploaded to storage in the file list, ready to be labeled. You have a few options to label your dataset.
7474

75-
1. If the documents are organized in folders, the Studio will prompt you to use the folder names as labels. This will simplify your labeling down to a single click.
75+
1. If the documents are organized in folders, the Studio prompts you to use the folder names as labels. This step simplifies your labeling down to a single select.
7676

77-
1. To assign a label to a document, click on the add label selection mark to assign a label.
77+
1. To assign a label to a document, select on the add label selection mark to assign a label.
7878

79-
1. Control click to multi-select documents to assign a label
79+
1. Control select to multi-select documents to assign a label
8080

81-
You should now have all the documents in your dataset labeled. If you look at the storage account, you'll find a *.ocr.json* files that correspond to each document in your training dataset and a new **class-name.jsonl** file for each class labeled. This training dataset will be submitted to train the model.
81+
You should now have all the documents in your dataset labeled. If you look at the storage account, you find *.ocr.json* files that correspond to each document in your training dataset and a new **class-name.jsonl** file for each class labeled. This training dataset is submitted to train the model.
8282

8383
## Train your model
8484

articles/applied-ai-services/form-recognizer/how-to-guides/compose-custom-models.md

Lines changed: 5 additions & 5 deletions
@@ -108,7 +108,7 @@ See [Form Recognizer Studio: labeling as tables](../quickstarts/try-v3-form-reco
108108

109109
Training with labels leads to better performance in some scenarios. To train with labels, you need to have special label information files (*\<filename\>.pdf.labels.json*) in your blob storage container alongside the training documents.
110110

111-
Label files contain key-value associations that a user has entered manually. They're needed for labeled data training, but not every source file needs to have a corresponding label file. Source files without labels will be treated as ordinary training documents. We recommend five or more labeled files for reliable training. You can use a UI tool like [Form Recognizer Studio](https://formrecognizer.appliedai.azure.com/studio/customform/projects) to generate these files.
111+
Label files contain key-value associations that a user has entered manually. They're needed for labeled data training, but not every source file needs to have a corresponding label file. Source files without labels are treated as ordinary training documents. We recommend five or more labeled files for reliable training. You can use a UI tool like [Form Recognizer Studio](https://formrecognizer.appliedai.azure.com/studio/customform/projects) to generate these files.
112112

113113
Once you have your label files, you can include them with by calling the training method with the *useLabelFile* parameter set to `true`.
114114

@@ -347,9 +347,9 @@ Form Recognizer uses the [Layout](../concept-layout.md) API to learn the expecte
347347
> [!NOTE]
348348
> **Model Compose is only available for custom models trained *with* labels.** Attempting to compose unlabeled models will produce an error.
349349
350-
With the Model Compose operation, you can assign up to 200 trained custom models to a single model ID. When you call Analyze with the composed model ID, Form Recognizer will first classify the form you submitted, choose the best matching assigned model, and then return results for that model. This operation is useful when incoming forms may belong to one of several templates.
350+
With the Model Compose operation, you can assign up to 200 trained custom models to a single model ID. When you call Analyze with the composed model ID, Form Recognizer classifies the form you submitted first, chooses the best matching assigned model, and then returns results for that model. This operation is useful when incoming forms may belong to one of several templates.
351351

352-
Using the Form Recognizer Sample Labeling tool, the REST API, or the Client-library SDKs, follow the steps below to set up a composed model:
352+
Using the Form Recognizer Sample Labeling tool, the REST API, or the Client-library SDKs, follow the steps to set up a composed model:
353353

354354
1. [**Gather your custom model IDs**](#gather-your-custom-model-ids)
355355
1. [**Compose your custom models**](#compose-your-custom-models)
@@ -413,7 +413,7 @@ Using the **REST API**, you can make a [**Compose Custom Model**](https://westu
413413

414414
### [**Client-library SDKs**](#tab/sdks)
415415

416-
Use the programming language code of your choice to create a composed model that will be called with a single model ID. Below are links to code samples that demonstrate how to create a composed model from existing custom models:
416+
Use the programming language code of your choice to create a composed model that is called with a single model ID. The following links are code samples that demonstrate how to create a composed model from existing custom models:
417417

418418
* [**C#/.NET**](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/formrecognizer/Azure.AI.FormRecognizer/samples/Sample_ModelCompose.md).
419419

@@ -438,7 +438,7 @@ Use the programming language code of your choice to create a composed model that
438438

439439
1. Select the **Run Analysis** button.
440440

441-
1. The tool applies tags in bounding boxes and report the confidence percentage for each tag.
441+
1. The tool applies tags in bounding boxes and reports the confidence percentage for each tag.
442442

443443
:::image type="content" source="../media/analyze.png" alt-text="Screenshot: Form Recognizer tool analyze-a-custom-form window.":::
444444
