articles/applied-ai-services/form-recognizer/concept-composed-models.md
Lines changed: 0 additions & 1 deletion
@@ -43,7 +43,6 @@ With composed models, you can assign multiple custom models to a composed model
 ::: moniker range="form-recog-3.0.0"

 With the introduction of [**custom classifier models**](./concept-custom-classifier.md), you can choose to use [**composed models**](./concept-composed-models.md) or the classifier model as an explicit step before analysis. For a deeper understanding of when to use a classifier or composed model, _see_ [**Custom classifier models**](concept-custom-classifier.md).
articles/applied-ai-services/form-recognizer/concept-custom-label-tips.md
Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ This article highlights the best methods for labeling custom model datasets in t
 * The following video is the second of two presentations intended to help you build custom models with higher accuracy (the first presentation explores [How to create a balanced data set](concept-custom-label.md#video-custom-label-tips-and-pointers)).

-* Here, we'll examine best practices for labeling your selected documents. With semantically relevant and consistent labeling, you should see an improvement in model performance.</br></br>
+* Here, we examine best practices for labeling your selected documents. With semantically relevant and consistent labeling, you should see an improvement in model performance.</br></br>
articles/applied-ai-services/form-recognizer/concept-custom-label.md
Lines changed: 8 additions & 8 deletions
@@ -22,9 +22,9 @@ Custom models (template and neural) require a labeled dataset of at least five d
 A labeled dataset consists of several files:

-* You'll provide a set of sample documents (typically PDFs or images). A minimum of five documents is needed to train a model.
+* You provide a set of sample documents (typically PDFs or images). A minimum of five documents is needed to train a model.

-* Additionally, the labeling process will generate the following files:
+* Additionally, the labeling process generates the following files:

 * A `fields.json` file is created when the first field is added. There's one `fields.json` file for the entire training dataset, the field list contains the field name and associated sub fields and types.
@@ -36,19 +36,19 @@ A labeled dataset consists of several files:
 * The following video is the first of two presentations intended to help you build custom models with higher accuracy (The second presentation examines [Best practices for labeling documents](concept-custom-label-tips.md#video-custom-labels-best-practices)).

-* Here, we'll explore how to create a balanced data set and select the right documents to label. This process will set you on the path to higher quality models.</br></br>
+* Here, we explore how to create a balanced data set and select the right documents to label. This process sets you on the path to higher quality models.</br></br>

-Before you start labeling, it's a good idea to look at a few different samples of the document to identify which samples you want to use in your labeled dataset. A balanced dataset represents all the typical variations you would expect to see for the document. Creating a balanced dataset will result in a model with the highest possible accuracy. A few examples to consider are:
+Before you start labeling, it's a good idea to look at a few different samples of the document to identify which samples you want to use in your labeled dataset. A balanced dataset represents all the typical variations you would expect to see for the document. Creating a balanced dataset results in a model with the highest possible accuracy. A few examples to consider are:

 * **Document formats**: If you expect to analyze both digital and scanned documents, add a few examples of each type to the training dataset

 * **Variations (template model)**: Consider splitting the dataset into folders and training a model for each variation. Any variations that include either structure or layout should be split into different models. You can then compose the individual models into a single [composed model](concept-composed-models.md).

-* **Variations (Neural models)**: When your dataset has a manageable set of variations, about 15 or fewer, create a single dataset with a few samples of each of the different variations to train a single model. If the number of template variations is larger than 15, you'll train multiple models and [compose](concept-composed-models.md) them together.
+* **Variations (Neural models)**: When your dataset has a manageable set of variations, about 15 or fewer, create a single dataset with a few samples of each of the different variations to train a single model. If the number of template variations is larger than 15, you train multiple models and [compose](concept-composed-models.md) them together.

 * **Tables**: For documents containing tables with a variable number of rows, ensure that the training dataset also represents documents with different numbers of rows.
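
The neural-model guidance in this hunk (one dataset for about 15 or fewer variations, several composed models beyond that) can be sketched as a small planning helper. This is only an illustration and not part of the changed files: the function name `plan_neural_models`, the hard cutoff of 15, and the simple consecutive grouping are all assumptions, since the article itself only says "about 15".

```python
from typing import List

# Assumption: the article says "about 15 or fewer", so 15 is used as a hard limit here.
MAX_VARIATIONS_PER_MODEL = 15


def plan_neural_models(variations: List[str],
                       limit: int = MAX_VARIATIONS_PER_MODEL) -> List[List[str]]:
    """Group document variations into training datasets, one list per model.

    One group means a single neural model is enough; several groups mean
    the resulting models would be composed together afterwards.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Consecutive slices of at most `limit` variations each.
    return [variations[i:i + limit] for i in range(0, len(variations), limit)]


if __name__ == "__main__":
    groups = plan_neural_models([f"layout-{n}" for n in range(1, 21)])
    print(f"{len(groups)} model(s), group sizes: {[len(g) for g in groups]}")
```

How variations are grouped in practice is a judgment call (for example, keeping similar layouts together); the slicing above is just the simplest way to respect the limit.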
@@ -70,12 +70,12 @@ Use the following guidelines to define the fields:
 * For tabular fields spanning multiple pages, define and label the fields as a single table.

-.[!NOTE]
+>[!NOTE]
 > Custom neural models share the same labeling format and strategy as custom template models. Currently custom neural models only support a subset of the field types supported by custom template models.

 ## Model capabilities

-Custom neural models currently only support key-value pairs, structured fields (tables), and selection marks.
+Custom neural models currently only support key-value pairs, structured fields (tables), and selection marks.

 | Model type | Form fields | Selection marks | Tabular fields | Signature | Region |
 |--|--|--|--|--|--|
@@ -100,7 +100,7 @@ Tabular fields are also useful when extracting repeating information within a do
 * **Consistent labeling**. If a value appears in multiple contexts within the document, consistently pick the same context across documents to label the value.

-* **Visually repeating data**. Tables support visually repeating groups of information not just explicit tables. Explicit tables will be identified in tables section of the analyzed documents as part of the layout output and don't need to be labeled as tables. Only label a table field if the information is visually repeating and not identified as a table as part of the layout response. An example would be the repeating work experience section of a resume.
+* **Visually repeating data**. Tables support visually repeating groups of information not just explicit tables. Explicit tables are identified in tables section of the analyzed documents as part of the layout output and don't need to be labeled as tables. Only label a table field if the information is visually repeating and not identified as a table as part of the layout response. An example would be the repeating work experience section of a resume.

 * **Region labeling (custom template)**. Labeling specific regions allows you to define a value when none exists. If the value is optional, ensure that you leave a few sample documents with the region not labeled. When labeling regions, don't include the surrounding text with the label.
articles/applied-ai-services/form-recognizer/how-to-guides/build-a-custom-classifier.md
Lines changed: 8 additions & 8 deletions
@@ -17,7 +17,7 @@ recommendations: false
 [!INCLUDE [applies to v3.0](../includes/applies-to-v3-0.md)]

-Custom classifier models can classify each page in a input file to identify the document(s) within. Classifier models can also identify multiple documents or multiple instances of a single document in the input file. Form Recognizer custom models require as few as five training documents per document class to get started. to get started training a custom classifier model you need at least **five documents** for each class and **two classes** of documents.
+Custom classifier models can classify each page in an input file to identify the document(s) within. Classifier models can also identify multiple documents or multiple instances of a single document in the input file. Form Recognizer custom models require as few as five training documents per document class to get started. To get started training a custom classifier model, you need at least **five documents** for each class and **two classes** of documents.

 ## Custom classifier model input requirements
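
The minimums stated in this hunk (at least two classes, at least five documents per class) are easy to check before starting a training run. The following is a minimal sketch under stated assumptions: `check_classifier_dataset` and the dictionary layout (class name mapped to a list of document names) are hypothetical and are not part of Form Recognizer or its SDKs.

```python
from typing import Dict, List

MIN_CLASSES = 2          # from the text: at least two classes of documents
MIN_DOCS_PER_CLASS = 5   # from the text: at least five documents per class


def check_classifier_dataset(docs_by_class: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems; an empty list means the stated minimums are met."""
    problems = []
    if len(docs_by_class) < MIN_CLASSES:
        problems.append(
            f"need at least {MIN_CLASSES} classes, found {len(docs_by_class)}")
    # Sort class names so the report order is stable.
    for name, docs in sorted(docs_by_class.items()):
        if len(docs) < MIN_DOCS_PER_CLASS:
            problems.append(
                f"class '{name}' has {len(docs)} documents, "
                f"needs at least {MIN_DOCS_PER_CLASS}")
    return problems


if __name__ == "__main__":
    dataset = {
        "invoice": [f"invoice-{n}.pdf" for n in range(5)],
        "receipt": [f"receipt-{n}.pdf" for n in range(3)],
    }
    for problem in check_classifier_dataset(dataset):
        print(problem)
```

Returning a list of messages rather than raising on the first failure lets a caller report every shortfall in one pass.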
@@ -41,7 +41,7 @@ Once you've put together the set of forms or documents for training, you need to
 The Form Recognizer Studio provides and orchestrates all the API calls required to complete your dataset and train your model.

-1. Start by navigating to the [Form Recognizer Studio](https://formrecognizer.appliedai.azure.com/studio). The first time you use the Studio, you'll need to [initialize your subscription, resource group, and resource](../quickstarts/try-v3-form-recognizer-studio.md). Then, follow the [prerequisites for custom projects](../quickstarts/try-v3-form-recognizer-studio.md#additional-prerequisites-for-custom-projects) to configure the Studio to access your training dataset.
+1. Start by navigating to the [Form Recognizer Studio](https://formrecognizer.appliedai.azure.com/studio). The first time you use the Studio, you need to [initialize your subscription, resource group, and resource](../quickstarts/try-v3-form-recognizer-studio.md). Then, follow the [prerequisites for custom projects](../quickstarts/try-v3-form-recognizer-studio.md#additional-prerequisites-for-custom-projects) to configure the Studio to access your training dataset.

 1. In the Studio, select the **Custom classifier models** tile, on the custom models section of the page and select the **Create a project** button.
@@ -60,7 +60,7 @@ The Form Recognizer Studio provides and orchestrates all the API calls required
 :::image type="content" source="../media/how-to/studio-select-storage.png" alt-text="Screenshot showing how to select the Form Recognizer resource.":::

-1. Training a custom classifiers requires the output from the Layout model for each document in your dataset. Run layout on all documents as an optional step to speed up the model training process.
+1. Training a custom classifier requires the output from the Layout model for each document in your dataset. Run layout on all documents as an optional step to speed up the model training process.

 1. Finally, review your project settings and select **Create Project** to create a new project. You should now be in the labeling window and see the files in your dataset listed.
@@ -70,15 +70,15 @@ In your project, you only need to label each document with the appropriate class
 :::image type="content" source="../media/how-to/studio-create-label.png" alt-text="Screenshot showing elect the Form Recognizer resource.":::

-You'll see the files you uploaded to storage in the file list, ready to be labeled. You have a few options to label your dataset.
+You see the files you uploaded to storage in the file list, ready to be labeled. You have a few options to label your dataset.

-1. If the documents are organized in folders, the Studio will prompt you to use the folder names as labels. This will simplify your labeling down to a single click.
+1. If the documents are organized in folders, the Studio prompts you to use the folder names as labels. This step simplifies your labeling down to a single select.

-1. To assign a label to a document, click on the add label selection mark to assign a label.
+1. To assign a label to a document, select on the add label selection mark to assign a label.

-1. Control click to multi-select documents to assign a label
+1. Control select to multi-select documents to assign a label

-You should now have all the documents in your dataset labeled. If you look at the storage account, you'll find a*.ocr.json* files that correspond to each document in your training dataset and a new **class-name.jsonl** file for each class labeled. This training dataset will be submitted to train the model.
+You should now have all the documents in your dataset labeled. If you look at the storage account, you find *.ocr.json* files that correspond to each document in your training dataset and a new **class-name.jsonl** file for each class labeled. This training dataset is submitted to train the model.
articles/applied-ai-services/form-recognizer/how-to-guides/compose-custom-models.md
Lines changed: 5 additions & 5 deletions
@@ -108,7 +108,7 @@ See [Form Recognizer Studio: labeling as tables](../quickstarts/try-v3-form-reco
 Training with labels leads to better performance in some scenarios. To train with labels, you need to have special label information files (*\<filename\>.pdf.labels.json*) in your blob storage container alongside the training documents.

-Label files contain key-value associations that a user has entered manually. They're needed for labeled data training, but not every source file needs to have a corresponding label file. Source files without labels will be treated as ordinary training documents. We recommend five or more labeled files for reliable training. You can use a UI tool like [Form Recognizer Studio](https://formrecognizer.appliedai.azure.com/studio/customform/projects) to generate these files.
+Label files contain key-value associations that a user has entered manually. They're needed for labeled data training, but not every source file needs to have a corresponding label file. Source files without labels are treated as ordinary training documents. We recommend five or more labeled files for reliable training. You can use a UI tool like [Form Recognizer Studio](https://formrecognizer.appliedai.azure.com/studio/customform/projects) to generate these files.

 Once you have your label files, you can include them by calling the training method with the *useLabelFile* parameter set to `true`.
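
The pairing rule described in this hunk (a label file named after its source document, with unlabeled sources still allowed) can be checked against a container listing before training. This is a rough sketch and not an SDK call: `split_by_labels` and `has_enough_labels` are hypothetical helpers, the input is assumed to be a plain list of blob names, and every name that doesn't end in `.json` is assumed to be a source document.

```python
from typing import Dict, Iterable, List

LABEL_SUFFIX = ".labels.json"   # from the text: <filename>.pdf.labels.json
RECOMMENDED_LABELED = 5         # from the text: five or more labeled files


def split_by_labels(blob_names: Iterable[str]) -> Dict[str, List[str]]:
    """Split source documents into those with and without a label file."""
    names = set(blob_names)
    # Assumption: anything that is not a JSON file is a source document.
    sources = sorted(n for n in names if not n.endswith(".json"))
    labeled = [s for s in sources if s + LABEL_SUFFIX in names]
    unlabeled = [s for s in sources if s + LABEL_SUFFIX not in names]
    return {"labeled": labeled, "unlabeled": unlabeled}


def has_enough_labels(blob_names: Iterable[str]) -> bool:
    """True when the recommended number of labeled documents is present."""
    return len(split_by_labels(blob_names)["labeled"]) >= RECOMMENDED_LABELED


if __name__ == "__main__":
    listing = ["a.pdf", "a.pdf.labels.json", "b.pdf", "fields.json"]
    print(split_by_labels(listing))
    print(has_enough_labels(listing))
```

Unlabeled sources are reported rather than rejected, because the text says they are still treated as ordinary training documents.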
@@ -347,9 +347,9 @@ Form Recognizer uses the [Layout](../concept-layout.md) API to learn the expecte
 > [!NOTE]
 > **Model Compose is only available for custom models trained *with* labels.** Attempting to compose unlabeled models will produce an error.

-With the Model Compose operation, you can assign up to 200 trained custom models to a single model ID. When you call Analyze with the composed model ID, Form Recognizer will first classify the form you submitted, choose the best matching assigned model, and then return results for that model. This operation is useful when incoming forms may belong to one of several templates.
+With the Model Compose operation, you can assign up to 200 trained custom models to a single model ID. When you call Analyze with the composed model ID, Form Recognizer classifies the form you submitted first, chooses the best matching assigned model, and then returns results for that model. This operation is useful when incoming forms may belong to one of several templates.

-Using the Form Recognizer Sample Labeling tool, the REST API, or the Client-library SDKs, follow the steps below to set up a composed model:
+Using the Form Recognizer Sample Labeling tool, the REST API, or the Client-library SDKs, follow the steps to set up a composed model:

 1. [**Gather your custom model IDs**](#gather-your-custom-model-ids)
 1. [**Compose your custom models**](#compose-your-custom-models)
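
The two steps in this hunk (gather the model IDs, then compose them, with at most 200 models per composed model) can be sketched as a helper that validates the gathered IDs and prepares a request body. Treat this as an assumption-laden illustration: `build_compose_request` is hypothetical, and the `modelIds` and `modelName` field names are assumed here and should be verified against the Compose Custom Model REST reference for the API version in use.

```python
from typing import Dict, List

MAX_COMPONENT_MODELS = 200  # from the text: up to 200 models per composed model


def build_compose_request(model_ids: List[str], model_name: str) -> Dict[str, object]:
    """Validate gathered model IDs and return a compose request body."""
    # Drop duplicate IDs while keeping the original order.
    unique_ids = list(dict.fromkeys(model_ids))
    if not unique_ids:
        raise ValueError("at least one model ID is required")
    if len(unique_ids) > MAX_COMPONENT_MODELS:
        raise ValueError(
            f"{len(unique_ids)} model IDs given, limit is {MAX_COMPONENT_MODELS}")
    # Assumed field names; check the REST reference before sending this body.
    return {"modelIds": unique_ids, "modelName": model_name}


if __name__ == "__main__":
    body = build_compose_request(["model-a", "model-b", "model-a"], "my-composed-model")
    print(body)
```

The helper only builds and validates the body; sending it, authenticating, and polling for completion are left to the REST call or SDK method you choose.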
@@ -413,7 +413,7 @@ Using the **REST API**, you can make a [**Compose Custom Model**](https://westu
 ### [**Client-library SDKs**](#tab/sdks)

-Use the programming language code of your choice to create a composed model that will be called with a single model ID. Below are links to code samples that demonstrate how to create a composed model from existing custom models:
+Use the programming language code of your choice to create a composed model that is called with a single model ID. The following links are code samples that demonstrate how to create a composed model from existing custom models: