Commit 92967c7

Merge pull request #188502 from MicrosoftDocs/release-preview2-form-recognizer
Release preview2 form recognizer--scheduled release at 10AM of 2/15
2 parents a0d0e1b + 254dec4

File tree

82 files changed: +1867 −534 lines


.openpublishing.redirection.json

Lines changed: 5 additions & 0 deletions
@@ -44154,6 +44154,11 @@
       "redirect_url": "/azure/azure-monitor/agents/azure-monitor-agent-manage",
       "redirect_document_id": true
     },
+    {
+      "source_path_from_root": "/articles/applied-ai-services/form-recognizer/managed-identity-byos.md",
+      "redirect_url": "/azure/applied-ai-services/form-recognizer/managed-identities",
+      "redirect_document_id": false
+    },
     {
       "source_path_from_root": "/articles/azure/virtual-desktop/azure-advisor.md",
       "redirect_url": "/azure/advisor/advisor-overview",

articles/applied-ai-services/form-recognizer/api-v2-0/includes/csharp-v3-0-0.md

Lines changed: 1 addition & 1 deletion
@@ -382,7 +382,7 @@ Submodel Form Type: form-63c013e3-1cab-43eb-84b0-f4b20cb9214c
 
 ## Analyze forms with a custom model
 
-This section demonstrates how to extract key/value information and other content from your custom form types, using models you trained with your own forms.
+This section demonstrates how to extract key/value information and other content from your custom template types, using models you trained with your own forms.
 
 > [!IMPORTANT]
 > In order to implement this scenario, you must have already trained a model so you can pass its ID into the method below.

articles/applied-ai-services/form-recognizer/api-v2-0/includes/java-v3-0-0.md

Lines changed: 2 additions & 2 deletions
@@ -294,7 +294,7 @@ The model found field 'field-6' with label: VAT ID
 
 ## Analyze forms with a custom model
 
-This section demonstrates how to extract key/value information and other content from your custom form types, using models you trained with your own forms.
+This section demonstrates how to extract key/value information and other content from your custom template types, using models you trained with your own forms.
 
 > [!IMPORTANT]
 > In order to implement this scenario, you must have already trained a model so you can pass its ID into the method below. See the [Train a model](#train-a-model-without-labels) section.
@@ -314,7 +314,7 @@ The returned value is a collection of **RecognizedForm** objects: one for each p
 
 ```console
 Analyze PDF form...
------------ Recognized custom form info for page 0 -----------
+----------- Recognized custom template info for page 0 -----------
 Form type: form-0
 Field 'field-0' has label 'Address:' with a confidence score of 0.91.
 Field 'field-1' has label 'Invoice For:' with a confidence score of 1.00.

articles/applied-ai-services/form-recognizer/api-v2-0/includes/javascript-v3-0-0.md

Lines changed: 1 addition & 1 deletion
@@ -260,7 +260,7 @@ Document errors: undefined
 
 ## Analyze forms with a custom model
 
-This section demonstrates how to extract key/value information and other content from your custom form types, using models you trained with your own forms.
+This section demonstrates how to extract key/value information and other content from your custom template types, using models you trained with your own forms.
 
 > [!IMPORTANT]
 > In order to implement this scenario, you must have already trained a model so you can pass its ID into the method below. See the [Train a model](#train-a-model-without-labels) section.

articles/applied-ai-services/form-recognizer/api-v2-0/includes/python-v3-0-0.md

Lines changed: 1 addition & 1 deletion
@@ -268,7 +268,7 @@ Document errors: []
 
 ## Analyze forms with a custom model
 
-This section demonstrates how to extract key/value information and other content from your custom form types, using models you trained with your own forms.
+This section demonstrates how to extract key/value information and other content from your custom template types, using models you trained with your own forms.
 
 > [!IMPORTANT]
 > In order to implement this scenario, you must have already trained a model so you can pass its ID into the method below. See the [Train a model](#train-a-model-without-labels) section.
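
The "Analyze forms with a custom model" sections changed above all describe the same scenario: pass a trained model's ID to the analyze call and read back fields with labels and confidence scores. As a rough local illustration (editor's sketch, not part of the docs change — real values come from the SDK's custom form analysis call, e.g. `begin_recognize_custom_forms` in the Python client), the console output shape these quickstarts print can be reproduced like this:

```python
# Hypothetical sketch: format recognized fields in the console style the
# quickstarts show. Field names, labels, and scores here are illustrative.

def format_custom_form(page_index, fields):
    """fields: list of (field_name, label, confidence) tuples."""
    lines = [f"----------- Recognized custom template info for page {page_index} -----------"]
    for name, label, confidence in fields:
        lines.append(
            f"Field '{name}' has label '{label}' with a confidence score of {confidence:.2f}."
        )
    return "\n".join(lines)

print(format_custom_form(0, [("field-0", "Address:", 0.91), ("field-1", "Invoice For:", 1.0)]))
```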

articles/applied-ai-services/form-recognizer/build-training-data-set.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ Follow these additional tips to further optimize your data set for training.
 
 ## Upload your training data
 
-When you've put together the set of form documents that you'll use for training, you need to upload it to an Azure blob storage container. If you don't know how to create an Azure storage account with a container, following the [Azure Storage quickstart for Azure portal](../../storage/blobs/storage-quickstart-blobs-portal.md). Use the standard performance tier.
+When you've put together the set of form documents that you'll use for training, you need to upload it to an Azure blob storage container. If you don't know how to create an Azure storage account with a container, follow the [Azure Storage quickstart for Azure portal](../../storage/blobs/storage-quickstart-blobs-portal.md). Use the standard performance tier.
 
 If you want to use manually labeled data, you'll also have to upload the *.labels.json* and *.ocr.json* files that correspond to your training documents. You can use the [Sample Labeling tool](label-tool.md) (or your own UI) to generate these files.
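
The article hunk above notes that manually labeled training data needs matching *.labels.json* and *.ocr.json* files alongside each document. A small pre-upload check can catch missing companions early; this is an editor's sketch, not part of the docs — it assumes the labeling tool's `<document>.labels.json` / `<document>.ocr.json` naming convention:

```python
# Hypothetical pre-upload check: verify each form document in a local folder
# has the *.labels.json and *.ocr.json companions before uploading to blob storage.
import os

def find_missing_label_files(folder):
    """Return the companion file names that are missing for each form document."""
    missing = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith((".pdf", ".jpg", ".jpeg", ".png", ".tiff")):
            for suffix in (".labels.json", ".ocr.json"):
                companion = name + suffix
                if not os.path.exists(os.path.join(folder, companion)):
                    missing.append(companion)
    return missing
```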

articles/applied-ai-services/form-recognizer/compose-custom-models-preview.md

Lines changed: 276 additions & 0 deletions
Large diffs are not rendered by default.

articles/applied-ai-services/form-recognizer/compose-custom-models.md

Lines changed: 10 additions & 20 deletions
@@ -1,35 +1,33 @@
 ---
-title: "How to guide: use custom and composed models"
+title: "How to guide: create and compose custom models with Form Recognizer v2.1"
 titleSuffix: Azure Applied AI Services
-description: Learn how to create, use, and manage Form Recognizer custom and composed models
+description: Learn how to create, compose use, and manage custom models with Form Recognizer v2.1
 author: laujan
 manager: nitinme
 ms.service: applied-ai-services
 ms.subservice: forms-recognizer
 ms.topic: how-to
-ms.date: 11/02/2021
+ms.date: 02/15/2022
 ms.author: lajanuar
 recommendations: false
-ms.custom: ignite-fall-2021
 ---
 
-# Use custom and composed models
+# Compose custom models v2.1
+
+> [!NOTE]
+> This how-to guide references Form Recognizer v2.1 (GA). To try Form Recognizer v3.0 (preview), see [Compose custom models v3.0 (preview)](compose-custom-models-preview.md).
 
 Form Recognizer uses advanced machine-learning technology to detect and extract information from document images and return the extracted data in a structured JSON output. With Form Recognizer, you can train standalone custom models or combine custom models to create composed models.
 
 * **Custom models**. Form Recognizer custom models enable you to analyze and extract data from forms and documents specific to your business. Custom models are trained for your distinct data and use cases.
 
 * **Composed models**. A composed model is created by taking a collection of custom models and assigning them to a single model that encompasses your form types. When a document is submitted to a composed model, the service performs a classification step to decide which custom model accurately represents the form presented for analysis.
 
-***Model configuration window in Form Recognizer Studio***
-
-:::image type="content" source="media/studio/composed-model.png" alt-text="Screenshot: model configuration window in Form Recognizer Studio.":::
-
 In this article, you'll learn how to create Form Recognizer custom and composed models using our [Form Recognizer Sample Labeling tool](label-tool.md), [REST APIs](quickstarts/client-library.md?branch=main&pivots=programming-language-rest-api#train-a-custom-model), or [client-library SDKs](quickstarts/client-library.md?branch=main&pivots=programming-language-csharp#train-a-custom-model).
 
 ## Sample Labeling tool
 
-You can see how data is extracted from custom forms by trying our Sample Labeling tool. You'll need the following:
+You can see how data is extracted from custom forms by trying our Sample Labeling tool. You'll need the following resources:
 
 * An Azure subscription—you can [create one for free](https://azure.microsoft.com/free/cognitive-services/)
 
@@ -72,15 +70,7 @@ to an Azure blob storage container. If you don't know how to create an Azure sto
 
 ## Train your custom model
 
-You can [train your model](./quickstarts/try-sdk-rest-api.md#train-a-custom-model) with or without labeled data sets. Unlabeled datasets rely solely on the [Layout API](https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-1/operations/AnalyzeLayoutAsync) to detect and identify key information without added human input. Labeled datasets also rely on the Layout API, but supplementary human input is included such as your specific labels and field locations. To use both labeled and unlabeled data, start with at least five completed forms of the same type for the labeled training data and then add unlabeled data to the required data set.
-
-### Train without labels
-
-Form Recognizer uses unsupervised learning to understand the layout and relationships between fields and entries in your forms. When you submit your input forms, the algorithm clusters the forms by type, discovers what keys and tables are present, and associates values to keys and entries to tables. Training without labels doesn't require manual data labeling or intensive coding and maintenance, and we recommend you try this method first.
-
-See [Build a training data set](./build-training-data-set.md) for tips on how to collect your training documents.
-
-### Train with labels
+You [train your model](./quickstarts/try-sdk-rest-api.md#train-a-custom-model) with labeled data sets. Labeled datasets rely on the prebuilt-layout API, but supplementary human input is included such as your specific labels and field locations. Start with at least five completed forms of the same type for your labeled training data.
 
 When you train with labeled data, the model uses supervised learning to extract values of interest, using the labeled forms you provide. Labeled data results in better-performing models and can produce models that work with complex forms or forms containing values without keys.
 
@@ -220,4 +210,4 @@ Learn more about the Form Recognizer client library by exploring our API referen
 
 > [!div class="nextstepaction"]
 > [Form Recognizer API reference](https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-1/operations/AnalyzeWithCustomForm)
->
+>
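
The article above keeps the explanation that a composed model runs a classification step and routes each submitted document to the matching custom model. That routing behavior happens inside the service; as a conceptual editor's sketch only (form types and model IDs below are invented, and creation is done via the SDK's compose operation in practice), the dispatch idea looks like this:

```python
# Conceptual sketch: a composed model behaves like a router that picks the
# custom model matching the classified form type of the submitted document.

def route_to_custom_model(classified_form_type, composition):
    """composition: mapping of form type -> custom model ID (invented values)."""
    model_id = composition.get(classified_form_type)
    if model_id is None:
        raise KeyError(f"No model in the composition handles '{classified_form_type}'.")
    return model_id
```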
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+---
+title: Interpret and improve model accuracy and analysis confidence scores
+titleSuffix: Azure Applied AI Services
+description: Best practices to interpret the accuracy score from the train model operation and the confidence score from analysis operations.
+author: laujan
+manager: nitinme
+ms.service: applied-ai-services
+ms.subservice: forms-recognizer
+ms.topic: conceptual
+ms.date: 02/15/2022
+ms.author: vikurpad
+---
+
+# Interpret and improve accuracy and confidence for custom models
+
+> [!NOTE]
+>
+> * **Custom models do not provide accuracy scores during training**.
+> * Confidence scores for structured fields such as tables are currently unavailable.
+
+Custom models generate an estimated accuracy score when trained. Documents analyzed with a custom model produce a confidence score for extracted fields. In this article, you'll learn to interpret accuracy and confidence scores and best practices for using those scores to improve accuracy and confidence results.
+
+## Accuracy scores
+
+The output of a `build` (v3.0) or `train` (v2.1) custom model operation includes the estimated accuracy score. This score represents the model's ability to accurately predict the labeled value on a visually similar document.
+The accuracy value range is a percentage between 0% (low) and 100% (high). The estimated accuracy is calculated by running a few different combinations of the training data to predict the labeled values.
+
+**Form Recognizer Studio** </br>
+**Trained custom model (invoice)**
+
+:::image type="content" source="media/accuracy-confidence/accuracy-studio-results.png" alt-text="Trained custom model accuracy scores":::
+
+## Confidence scores
+
+Form Recognizer analysis results return an estimated confidence for predicted words, key-value pairs, selection marks, regions, and signatures. Currently, not all document fields return a confidence score.
+
+Confidence indicates an estimated probability between 0 and 1 that the prediction is correct. For example, a confidence value of 0.95 (95%) indicates that the prediction is likely correct 19 out of 20 times. For scenarios where accuracy is critical, confidence may be used to determine whether to automatically accept the prediction or flag it for human review.
+
+**Form Recognizer Studio** </br>
+**Analyzed invoice prebuilt-invoice model**
+
+:::image type="content" source="media/accuracy-confidence/confidence-scores.png" alt-text="confidence scores from Form Recognizer Studio":::
+
+## Interpret accuracy and confidence scores
+
+The following table demonstrates how to interpret both the accuracy and confidence scores to measure your custom model's performance.
+
+| Accuracy | Confidence | Result |
+|--|--|--|
+| High| High | <ul><li>The model is performing well with the labeled keys and document formats. </li><li>You have a balanced training dataset</li></ul> |
+| High | Low | <ul><li>The analyzed document appears different from the training dataset.</li><li>The model would benefit from retraining with at least five more labeled documents. </li><li>These results could also indicate a format variation between the training dataset and the analyzed document. </br>Consider adding a new model.</li></ul> |
+| Low | High | <ul><li>This result is most unlikely.</li><li>For low accuracy scores, add more labeled data or split visually distinct documents into multiple models.</li></ul> |
+| Low | Low| <ul><li>Add more labeled data.</li><li>Split visually distinct documents into multiple models.</li></ul>|
+
+## Ensure high model accuracy
+
+The accuracy of your model is affected by variances in the visual structure of your documents. Reported accuracy scores can be inconsistent when the analyzed documents differ from documents used in training. Keep in mind that a document set can look similar when viewed by humans but appear dissimilar to an AI model. Below, is a list of the best practices for training models with the highest accuracy. Following these guidelines should produce a model with higher accuracy and confidence scores during analysis and reduce the number of documents flagged for human review.
+
+* Ensure that all variations of a document are included in the training dataset. Variations include different formats, for example, digital versus scanned PDFs.
+
+* If you expect the model to analyze both types of PDF documents, add at least five samples of each type to the training dataset.
+
+* Separate visually distinct document types to train different models.
+  * As a general rule, if you remove all user entered values and the documents look similar, you need to add more training data to the existing model.
+  * If the documents are dissimilar, split your training data into different folders and train a model for each variation. You can then [compose](compose-custom-models.md#create-a-composed-model) the different variations into a single model.
+
+* Make sure that you don't have any extraneous labels.
+
+* For signature and region labeling, don't include the surrounding text.
+
+## Next step
+
+> [!div class="nextstepaction"]
+> [Learn to create custom models ](quickstarts/try-v3-form-recognizer-studio.md#custom-models)
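
The new article above suggests using confidence to decide whether to accept a prediction automatically or flag it for human review. A minimal sketch of that triage, assuming a simplified field shape (name mapped to a value/confidence pair rather than the SDK's own field objects), with a threshold you tune per scenario:

```python
# Minimal sketch of the review guidance: accept a predicted field automatically
# only when its confidence clears the threshold; otherwise queue it for review.

def triage_fields(fields, threshold=0.9):
    """fields: mapping of field name -> (value, confidence in [0, 1])."""
    accepted, needs_review = {}, {}
    for name, (value, confidence) in fields.items():
        target = accepted if confidence >= threshold else needs_review
        target[name] = value
    return accepted, needs_review
```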

articles/applied-ai-services/form-recognizer/concept-business-card.md

Lines changed: 4 additions & 4 deletions
@@ -24,21 +24,21 @@ The business card model combines powerful Optical Character Recognition (OCR) ca
 
 ## Development options
 
-The following resources are supported by Form Recognizer v2.1:
+The following tools are supported by Form Recognizer v2.1:
 
 | Feature | Resources |
 |----------|-------------------------|
 |**Business card model**| <ul><li>[**Form Recognizer labeling tool**](https://fott-2-1.azurewebsites.net/prebuilts-analyze)</li><li>[**REST API**](quickstarts/try-sdk-rest-api.md?pivots=programming-language-rest-api#analyze-business-cards)</li><li>[**Client-library SDK**](quickstarts/try-sdk-rest-api.md)</li><li>[**Form Recognizer Docker container**](containers/form-recognizer-container-install-run.md?tabs=business-card#run-the-container-with-the-docker-compose-up-command)</li></ul>|
 
-The following resources are supported by Form Recognizer v3.0:
+The following tools are supported by Form Recognizer v3.0:
 
 | Feature | Resources | Model ID |
 |----------|-------------|-----------|
 |**Business card model**| <ul><li>[**Form Recognizer Studio**](https://formrecognizer.appliedai.azure.com)</li><li>[**REST API**](https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v3-0-preview-1/operations/AnalyzeDocument)</li><li>[**C# SDK**](quickstarts/try-v3-csharp-sdk.md)</li><li>[**Python SDK**](quickstarts/try-v3-python-sdk.md)</li><li>[**Java SDK**](quickstarts/try-v3-java-sdk.md)</li><li>[**JavaScript SDK**](quickstarts/try-v3-javascript-sdk.md)</li></ul>|**prebuilt-businessCard**|
 
 ### Try Form Recognizer
 
-See how data, including name, job title, address, email, and company name, is extracted from business cards using the Form Recognizer Studio or our Sample Labeling tool. You'll need the following:
+See how data, including name, job title, address, email, and company name, is extracted from business cards using the Form Recognizer Studio or our Sample Labeling tool. You'll need the following resources:
 
 * An Azure subscription—you can [create one for free](https://azure.microsoft.com/free/cognitive-services/)
 
@@ -125,7 +125,7 @@ You will need a business card document. You can use our [sample business card do
 
 * Follow our [**Form Recognizer v3.0 migration guide**](v3-migration-guide.md) to learn how to use the preview version in your applications and workflows.
 
-* Explore our [**REST API (preview)**](https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v3-0-preview-1/operations/AnalyzeDocument) to learn more about the preview version and new capabilities.
+* Explore our [**REST API (preview)**](https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v3-0-preview-2/operations/AnalyzeDocument) to learn more about the preview version and new capabilities.
 
 ## Next steps
