Commit 3583c75

Merge pull request #268550 from vkurpad/main
Update to Feb2024 release docs
2 parents fc36bca + 0199f8f commit 3583c75

File tree: 7 files changed (+50, -67 lines)

articles/ai-services/document-intelligence/concept-accuracy-confidence.md

Lines changed: 12 additions & 8 deletions
@@ -18,11 +18,11 @@ ms.author: lajanuar
  > [!NOTE]
  >
- > * **Custom neural models do not provide accuracy scores during training**.
- > * Confidence scores for structured fields such as tables are currently unavailable.
+ > * **Custom neural models** do not provide accuracy scores during training.
+ > * Confidence scores for tables, table rows, and table cells are available starting with the **2024-02-29-preview** API version for **custom models**.

- Custom models generate an estimated accuracy score when trained. Documents analyzed with a custom model produce a confidence score for extracted fields. In this article, learn to interpret accuracy and confidence scores and best practices for using those scores to improve accuracy and confidence results.
+ Custom template models generate an estimated accuracy score when trained. Documents analyzed with a custom model produce a confidence score for extracted fields. In this article, learn how to interpret accuracy and confidence scores and best practices for using those scores to improve your results.

  ## Accuracy scores

@@ -38,21 +38,25 @@ The accuracy value range is a percentage between 0% (low) and 100% (high). The e
  > [!NOTE]
  >
- > * **Table cell confidence scores are now included with the 2024-02-29-preview API version**.
+ > * **Table, row, and cell confidence scores are now included with the 2024-02-29-preview API version**.
  > * Confidence scores for table cells from custom models are added to the API starting with the 2024-02-29-preview API.

  Document Intelligence analysis results return an estimated confidence for predicted words, key-value pairs, selection marks, regions, and signatures. Currently, not all document fields return a confidence score.

  Field confidence indicates an estimated probability between 0 and 1 that the prediction is correct. For example, a confidence value of 0.95 (95%) indicates that the prediction is likely correct 19 out of 20 times. For scenarios where accuracy is critical, confidence can be used to determine whether to automatically accept the prediction or flag it for human review.
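As a minimal illustration of that accept-or-flag decision, the following sketch routes a single extracted field by its confidence score; the 0.95 threshold and the `field` dictionary shape are assumptions to adapt to your own pipeline, not part of the service response contract:

```python
# Minimal sketch: route an extracted field based on its confidence score.
# The `field` dict shape (with "content" and "confidence") mirrors a parsed
# REST analyze result; treat it as an assumption and adjust to your parser.

REVIEW_THRESHOLD = 0.95  # assumed threshold; tune per scenario and per field


def route_field(name: str, field: dict) -> str:
    """Return 'auto-accept' or 'human-review' for one extracted field."""
    confidence = field.get("confidence", 0.0)
    if confidence >= REVIEW_THRESHOLD:
        return "auto-accept"
    return "human-review"


# Example usage with a hypothetical field taken from documents[0].fields:
print(route_field("InvoiceTotal", {"content": "$120.00", "confidence": 0.97}))
```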
- Confidence scores have two data points: the field level confidence score and the text extraction confidence score. In addition to the field confidence of position and span, the text extraction confidence in the ```pages``` section of the response is the model's confidence in the text extraction (OCR) process. The two confidence scores should be combined to generate one overall confidence score.
-
  **Document Intelligence Studio** </br>
  **Analyzed invoice prebuilt-invoice model**

  :::image type="content" source="media/accuracy-confidence/confidence-scores.png" alt-text="confidence scores from Document Intelligence Studio":::

- ## Interpret accuracy and confidence scores
+ ## Interpret accuracy and confidence scores for custom models
+
+ When interpreting the confidence score from a custom model, you should consider all the confidence scores returned from the model. Let's start with a list of all the confidence scores:
+
+ 1. **Document type confidence score**: The document type confidence is an indicator of how closely the analyzed document resembles documents in the training dataset. When the document type confidence is low, it's indicative of template or structural variations in the analyzed document. To improve the document type confidence, label a document with that specific variation and add it to your training dataset. Once the model is retrained, it should be better equipped to handle that class of variations.
+ 2. **Field level confidence**: Each labeled field extracted has an associated confidence score. This score reflects the model's confidence in the position of the value extracted. While evaluating this confidence, you should also look at the underlying extraction confidence to generate a comprehensive confidence for the extracted result. Evaluate the OCR results for text extraction or selection marks, depending on the field type, to generate a composite confidence score for the field (see the sketch after this list).
+ 3. **Word confidence score**: Each word extracted within the document has an associated confidence score. The score represents the confidence of the transcription. The pages array contains an array of words, and each word has an associated span and confidence. Spans from the custom field extracted values match the spans of the extracted words.
+ 4. **Selection mark confidence score**: The pages array also contains an array of selection marks, and each selection mark has a confidence score representing the confidence of the selection mark and selection state detection. When a labeled field is a selection mark, the custom field selection confidence combined with the selection mark confidence is an accurate representation of the overall confidence that the field was extracted correctly.
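The following is a minimal sketch of combining the field-level and word-level scores described above into a composite confidence, assuming the analyze result has been parsed into a Python dictionary shaped like the REST JSON (`documents[].fields`, `pages[].words`); the multiplication rule is one reasonable choice rather than a documented formula:

```python
# Sketch: combine a custom field's confidence with the OCR confidence of the
# words whose spans fall inside the field's spans. The result shape follows
# the REST analyze result; adjust key names if your SDK differs.


def span_contains(outer: dict, inner: dict) -> bool:
    """True if the word span lies inside the field span (offset/length)."""
    return (inner["offset"] >= outer["offset"]
            and inner["offset"] + inner["length"] <= outer["offset"] + outer["length"])


def composite_confidence(field: dict, pages: list) -> float:
    """Multiply field confidence by the lowest confidence of its words."""
    word_confidences = [
        word["confidence"]
        for page in pages
        for word in page.get("words", [])
        for field_span in field.get("spans", [])
        if span_contains(field_span, word["span"])
    ]
    ocr_confidence = min(word_confidences, default=1.0)
    return field.get("confidence", 0.0) * ocr_confidence


# Example usage with a parsed analyze result:
# result = json.loads(response_body)["analyzeResult"]
# field = result["documents"][0]["fields"]["PurchaseOrderNumber"]
# print(composite_confidence(field, result["pages"]))
```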
  The following table demonstrates how to interpret both the accuracy and confidence scores to measure your custom model's performance.

@@ -65,7 +69,7 @@ The following table demonstrates how to interpret both the accuracy and confiden
  ## Table, row, and cell confidence

- With the addition of table, row and cell confidence with the ```2024-02-29-preview``` API, here are some common questions that should help with interpreting the scores:
+ With the addition of table, row, and cell confidence in the ```2024-02-29-preview``` API, here are some common questions that should help with interpreting the table, row, and cell scores:

  **Q:** Is it possible to see a high confidence score for cells, but a low confidence score for the row?<br>
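As a point of reference for the questions above, here is a sketch of reading the table, row, and cell scores from a custom model's tabular field; it assumes the field is returned as an array of objects (`valueArray` items with `valueObject` cells), each carrying its own `confidence`, which should be verified against the 2024-02-29-preview response for your model:

```python
# Sketch: read table, row, and cell confidence from a custom model's tabular
# field. The valueArray/valueObject layout and the placement of "confidence"
# at each level are assumptions to verify against your own analyze result.


def table_confidences(table_field: dict) -> dict:
    summary = {
        "table": table_field.get("confidence"),   # table-level confidence
        "rows": [],
    }
    for row in table_field.get("valueArray", []):
        cells = {
            column: cell.get("confidence")         # cell-level confidence
            for column, cell in row.get("valueObject", {}).items()
        }
        summary["rows"].append({
            "row": row.get("confidence"),          # row-level confidence
            "cells": cells,
        })
    return summary


# Example usage:
# table = result["documents"][0]["fields"]["LineItems"]
# print(table_confidences(table))
```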

articles/ai-services/document-intelligence/concept-custom-classifier.md

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ With custom models, you need to maintain access to the training dataset to updat
  > [!IMPORTANT]
  >
- > Incremental trainiing is only supported with models trained with the same API version. If you are trying to extend a model, use the API version the original model was trained with to extend the model.
+ > Incremental training is only supported with API version **2024-02-29-preview** or later, and only with models trained with the same API version. If you're extending a model, use the API version the original model was trained with.

  Incremental training requires that you provide the original model ID as the `baseClassifierId`. See [incremental training](concept-incremental-classifier.md) to learn more about how to use incremental training.
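For context, a minimal sketch of an incremental build request that passes the original classifier ID as `baseClassifierId`; the endpoint path, `docTypes` payload, and container URL are illustrative assumptions, so confirm the exact request shape in the build-classifier reference:

```python
# Sketch: extend an existing classifier by passing its ID as baseClassifierId.
# Endpoint path, payload shape, and the sample container URL are assumptions
# for illustration; confirm them against the 2024-02-29-preview reference.
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
key = "<your-key>"  # placeholder

body = {
    "classifierId": "myclassifier-v2",
    "baseClassifierId": "myclassifier-v1",   # original classifier to extend
    "docTypes": {
        "new-doc-type": {
            "azureBlobSource": {"containerUrl": "<sas-url-to-labeled-samples>"}
        }
    },
}

response = requests.post(
    f"{endpoint}/documentintelligence/documentClassifiers:build",
    params={"api-version": "2024-02-29-preview"},
    headers={"Ocp-Apim-Subscription-Key": key},
    json=body,
)
response.raise_for_status()
# The build runs asynchronously; poll the Operation-Location header for status.
print(response.headers.get("Operation-Location"))
```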

articles/ai-services/document-intelligence/concept-custom-neural.md

Lines changed: 18 additions & 18 deletions
@@ -65,7 +65,24 @@ Neural models support documents that have the same information, but different pa
  *See* our [Language Support—custom models](language-support-custom.md) page for a complete list of supported languages.

- ## Tabular fields
+ ## Overlapping fields
+
+ With the release of API versions **2024-02-29-preview** and later, custom neural models support overlapping fields.
+
+ To use overlapping fields, your dataset needs to contain at least one sample with the expected overlap. To label an overlap, use **region labeling** to designate each of the spans of content (with the overlap) for each field. Labeling an overlap with field selection (highlighting a value) fails in the Studio, because region labeling is the only supported labeling tool for indicating field overlaps. Overlap support includes:
+
+ * Complete overlap. The same set of tokens is labeled for two different fields.
+ * Partial overlap. Some tokens belong to both fields, but other tokens are only part of one field or the other.
+
+ Overlapping fields have some limits:
+
+ * Any token or word can be labeled as at most two fields.
+ * Overlapping fields in a table can't span table rows.
+ * Overlapping fields can only be recognized if at least one sample in the dataset contains overlapping labels for those fields.
+
+ To use overlapping fields, label your dataset with the overlaps and train the model with API version ```2024-02-29-preview``` or later.
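As an illustration of the complete and partial overlap cases defined above, here is a sketch that classifies how two extracted fields overlap using the `spans` (offset and length) returned for each field; the field names in the usage example are hypothetical:

```python
# Sketch: classify how two extracted fields overlap, using the spans
# (offset/length) returned for each field. Field names are hypothetical.


def to_offsets(field: dict) -> set:
    """Expand a field's spans into the set of character offsets it covers."""
    return {
        offset
        for span in field.get("spans", [])
        for offset in range(span["offset"], span["offset"] + span["length"])
    }


def overlap_kind(field_a: dict, field_b: dict) -> str:
    a, b = to_offsets(field_a), to_offsets(field_b)
    if not a & b:
        return "no overlap"
    if a == b:
        return "complete overlap"   # same set of tokens labeled for both fields
    return "partial overlap"        # some tokens shared, some unique to one field


# Example usage with two hypothetical fields from documents[0].fields:
# print(overlap_kind(fields["BillingAddress"], fields["ShippingAddress"]))
```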
+ ## Tabular fields with table, row, and cell confidence

  With the release of API versions **2022-06-30-preview** and later, custom neural models will support tabular fields (tables):

@@ -92,23 +109,6 @@ Tabular fields provide **table, row and cell confidence** starting with the ```2
  See [confidence and accuracy scores](concept-accuracy-confidence.md) to learn more about table, row, and cell confidence.

- ## Overlapping fields
-
- With the release of API versions **2024-02-29-preview** and later, custom neural models will support overlapping fields:
-
- To use the overlapping fields, your dataset needs to contain at least one sample with the expected overlap. To label an overlap, use region labeling to designate each of the spans of content (with the overlap) for each field. Overlap support includes:
-
- * Complete overlap. The same set of tokens are labeled for two different fields.
- * Partial overlap. Some tokens belong to both fields, but there are tokens that are only part of one field or the other.
-
- Overlapping fields have some limits:
-
- * Any token or word can only be labeled as two fields.
- * overlapping fields in a table can't span table rows.
- * Overlapping fields can only be recognized if at least one sample in the dataset contains overlapping labels for those fields.
-
- To use overlapping fields, label your dataset with the overlaps and train the model with the API version ```2024-02-29-preview``` or later.

  ## Supported regions

articles/ai-services/document-intelligence/concept-custom.md

Lines changed: 5 additions & 26 deletions
@@ -51,7 +51,7 @@ To create a custom extraction model, label a dataset of documents with the value
  > [!IMPORTANT]
  >
- > Starting with version 3.1—2024-02-29-preview API, custom neural models now support overlapping fields and table, row and cell level confidence.
+ > Starting with the version 4.0 (2024-02-29-preview) API, custom neural models support **overlapping fields** and **table, row, and cell level confidence**.
  >

  The custom neural (custom document) model uses deep learning models and a base model trained on a large collection of documents. This model is then fine-tuned or adapted to your data when you train the model with a labeled dataset. Custom neural models support structured, semi-structured, and unstructured documents to extract fields. Custom neural models currently support English-language documents. When you're choosing between the two model types, start with a neural model to determine if it meets your functional needs. See [neural models](concept-custom-neural.md) to learn more about custom document models.

@@ -219,10 +219,10 @@ For a detailed walkthrough to create your first custom extraction model, *see* [
  This table compares the supported data extraction areas:

- |Model| Form fields | Selection marks | Structured fields (Tables) | Signature | Region labeling |
- |--|:--:|:--:|:--:|:--:|:--:|
- |Custom template| ✔ | ✔ | ✔ | ✔ | ✔ |
- |Custom neural| ✔ | ✔ | ✔ | **n/a** | * |
+ |Model| Form fields | Selection marks | Structured fields (Tables) | Signature | Region labeling | Overlapping fields |
+ |--|:--:|:--:|:--:|:--:|:--:|:--:|
+ |Custom template| ✔ | ✔ | ✔ | ✔ | ✔ | **n/a** |
+ |Custom neural| ✔ | ✔ | ✔ | **n/a** | * | ✔ (2024-02-29-preview) |

  **Table symbols**:<br>
  ✔—Supported<br>

@@ -268,27 +268,6 @@ The following table describes the features available with the associated tools a
  *See* our [Language Support—custom models](language-support-custom.md) page for a complete list of supported languages.

- ### Try signature detection
-
- * **Custom model v4.0, v3.1 and v3.0 APIs** supports signature detection for custom forms. When you train custom models, you can specify certain fields as signatures. When a document is analyzed with your custom model, it indicates whether a signature was detected or not.
- * [Document Intelligence v3.1 migration guide](v3-1-migration-guide.md): This guide shows you how to use the v3.0 version in your applications and workflows.
- * [REST API](/rest/api/aiservices/document-models/analyze-document?view=rest-aiservices-2023-07-31&preserve-view=true&tabs=HTTP): This API shows you more about the v3.0 version and new capabilities.
-
- 1. Build your training dataset.
-
- 1. Go to [Document Intelligence Studio](https://formrecognizer.appliedai.azure.com/studio). Under **Custom models**, select **Custom form**.
-
-    :::image type="content" source="media/label-tool/select-custom-form.png" alt-text="Screenshot that shows selecting the Document Intelligence Studio Custom form page.":::
-
- 1. Follow the workflow to create a new project:
-
-    * Follow the **Custom model** input requirements.
-
-    * Label your documents. For signature fields, use **Region** labeling for better accuracy.
-
-    :::image type="content" source="media/label-tool/signature-label-region-too.png" alt-text="Screenshot that shows the Label signature field.":::
-
- After your training set is labeled, you can train your custom model and use it to analyze documents. The signature fields specify whether a signature was detected or not.

  ## Next steps

articles/ai-services/document-intelligence/concept-model-overview.md

Lines changed: 5 additions & 2 deletions
@@ -45,7 +45,7 @@ ms.author: lajanuar
  The following table shows the available models for each current preview and stable API:

- |**Model Type**| **Model**|&bullet; [2024-02-29-preview](/rest/api/aiservices/document-models/build-model?view=rest-aiservices-2024-02-29-preview&preserve-view=true&branch=docintelligence&tabs=HTTP) <br>&bullet [2023-10-31-preview](/rest/api/aiservices/operation-groups?view=rest-aiservices-2024-02-29-preview&preserve-view=true)|[2023-07-31 (GA)](/rest/api/aiservices/document-models/analyze-document?view=rest-aiservices-2023-07-31&preserve-view=true&tabs=HTTP)|[2022-08-31 (GA)](https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-2022-08-31/operations/AnalyzeDocument)|[v2.1 (GA)](https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-1/operations/AnalyzeBusinessCardAsync)|
+ |**Model Type**| **Model**|&bullet; [2024-02-29-preview](/rest/api/aiservices/document-models/build-model?view=rest-aiservices-2024-02-29-preview&preserve-view=true&branch=docintelligence&tabs=HTTP) <br> &bullet; [2023-10-31-preview](/rest/api/aiservices/operation-groups?view=rest-aiservices-2024-02-29-preview&preserve-view=true)|[2023-07-31 (GA)](/rest/api/aiservices/document-models/analyze-document?view=rest-aiservices-2023-07-31&preserve-view=true&tabs=HTTP)|[2022-08-31 (GA)](https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-2022-08-31/operations/AnalyzeDocument)|[v2.1 (GA)](https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-1/operations/AnalyzeBusinessCardAsync)|
  |----------------|-----------|---|--|---|---|
  |Document analysis models|[Read](concept-read.md) | ✔️| ✔️| ✔️| n/a|
  |Document analysis models|[Layout](concept-layout.md) | ✔️| ✔️| ✔️| ✔️|

@@ -61,11 +61,14 @@ The following table shows the available models for each current preview and stab
  |Prebuilt models|[US 1098-T Tax](concept-tax-document.md) | ✔️| ✔️| n/a| n/a|
  |Prebuilt models|[US 1099 Tax](concept-tax-document.md) | ✔️| n/a| n/a| n/a|
  |Prebuilt models|[US W2 Tax](concept-tax-document.md) | ✔️| ✔️| ✔️| n/a|
- |Prebuilt models|[Add-on capabilities](concept-add-on-capabilities.md) | ✔️| ✔️| n/a| n/a|
+ |Prebuilt models|[US Mortgage 1003 URLA](concept-mortgage-documents.md) | ✔️| n/a| n/a| n/a|
+ |Prebuilt models|[US Mortgage 1008](concept-mortgage-documents.md) | ✔️| n/a| n/a| n/a|
+ |Prebuilt models|[US Mortgage closing disclosure](concept-mortgage-documents.md) | ✔️| n/a| n/a| n/a|
  |Custom models|[Custom classifier](concept-custom-classifier.md) | ✔️| ✔️| n/a| n/a|
  |Custom models|[Custom neural](concept-custom-neural.md) | ✔️| ✔️| ✔️| n/a|
  |Custom models|[Custom template](concept-custom-template.md) | ✔️| ✔️| ✔️| ✔️|
  |Custom models|[Custom composed](concept-composed-models.md) | ✔️| ✔️| ✔️| ✔️|
+ |All models|[Add-on capabilities](concept-add-on-capabilities.md) | ✔️| ✔️| n/a| n/a|

  |**Add-on Capability**| **Add-On/Free**|&bullet; [2024-02-29-preview](/rest/api/aiservices/document-models/build-model?view=rest-aiservices-2024-02-29-preview&preserve-view=true&branch=docintelligence&tabs=HTTP) <br>&bullet [2023-10-31-preview](/rest/api/aiservices/operation-groups?view=rest-aiservices-2024-02-29-preview&preserve-view=true|[`2023-07-31` (GA)](/rest/api/aiservices/document-models/analyze-document?view=rest-aiservices-2023-07-31&preserve-view=true&tabs=HTTP)|[`2022-08-31` (GA)](https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-2022-08-31/operations/AnalyzeDocument)|[v2.1 (GA)](https://westus.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-1/operations/AnalyzeBusinessCardAsync)|
  |----------------|-----------|---|--|---|---|

articles/ai-services/document-intelligence/concept-mortgage-documents.md

Lines changed: 5 additions & 8 deletions
@@ -1,7 +1,7 @@
  ---
- title: Document Intelligence US mortgage document
+ title: Document Intelligence US mortgage documents
  titleSuffix: Azure AI services
- description: Use Document Intelligence mortgage model to analyze and extract key fields from mortgage documents.
+ description: Use Document Intelligence prebuilt models to analyze and extract key fields from mortgage documents.
  author: laujan
  manager: nitinme
  ms.service: azure-ai-document-intelligence

@@ -17,21 +17,18 @@ monikerRange: '>=doc-intel-4.0.0'
  <!-- markdownlint-disable MD049 -->
  <!-- markdownlint-disable MD001 -->

- # Document Intelligence mortgage documents model
+ # Document Intelligence mortgage document models

  **This content applies to:** ![checkmark](media/yes-icon.png) **v4.0 (preview)** ![checkmark](media/yes-icon.png)

- The Document Intelligence Mortgage model uses powerful Optical Character Recognition (OCR) capabilities to analyze and extract key fields from mortgage documents. Mortgage documents can be of various formats and quality including. The API analyzes document text from mortgage documents and returns a structured JSON data representation. The model currently supports English-language document formats.
+ The Document Intelligence mortgage models use powerful Optical Character Recognition (OCR) capabilities and deep learning models to analyze and extract key fields from mortgage documents. Mortgage documents can be of various formats and quality. The API analyzes mortgage documents and returns a structured JSON data representation. The models currently support English-language documents only.

  **Supported document types:**

  * 1003 Uniform Residential Loan Application (URLA)
  * Form 1008
  * Mortgage closing disclosure
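For context, a minimal sketch of submitting a document to one of the mortgage prebuilt models through the REST API; the model ID `prebuilt-mortgage.us.1003`, the endpoint path, and the polling flow are assumptions for illustration, so confirm them in the v4.0 (2024-02-29-preview) reference or Document Intelligence Studio:

```python
# Sketch: analyze a mortgage document with a prebuilt model via REST.
# The model ID, endpoint path, and polling flow are assumptions for
# illustration; confirm against the 2024-02-29-preview reference.
import time
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
key = "<your-key>"  # placeholder
model_id = "prebuilt-mortgage.us.1003"  # assumed ID for the 1003 (URLA) model

submit = requests.post(
    f"{endpoint}/documentintelligence/documentModels/{model_id}:analyze",
    params={"api-version": "2024-02-29-preview"},
    headers={"Ocp-Apim-Subscription-Key": key},
    json={"urlSource": "https://example.com/sample-urla.pdf"},  # placeholder URL
)
submit.raise_for_status()
operation_url = submit.headers["Operation-Location"]  # async operation to poll

while True:
    poll = requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": key})
    result = poll.json()
    if result.get("status") in ("succeeded", "failed"):
        break
    time.sleep(2)

# Extracted fields (with confidence scores) live under analyzeResult.documents.
documents = result.get("analyzeResult", {}).get("documents", [])
if documents:
    print(documents[0].get("fields"))
```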

- ## Automated mortgage documents processing
-
- Automated mortgage card processing is the process of extracting key fields from bank cards. Historically, bank card analysis process is achieved manually and, hence, very time consuming. Accurate extraction of key data from bank cards s is typically the first and one of the most critical steps in the contract automation process.

  ## Development options

@@ -48,7 +45,7 @@ Document Intelligence v4.0 (2024-02-29-preview) supports the following tools, ap
  [!INCLUDE [input requirements](./includes/input-requirements.md)]

- ## Try mortgage document data extraction
+ ## Try mortgage documents data extraction

  To see how data extraction works for the mortgage documents service, you need the following resources:
