articles/ai-services/document-intelligence/concept-accuracy-confidence.md
+12 -8 (12 additions & 8 deletions)
Original file line number
Diff line number
Diff line change
@@ -18,11 +18,11 @@ ms.author: lajanuar
18
18
19
19
> [!NOTE]
20
20
>
21
-
> * **Custom neural models do not provide accuracy scores during training**.
22
-
> * Confidence scores for structured fields such as tables are currently unavailable.
21
+
> * **Custom neural models** do not provide accuracy scores during training.
22
+
> * Confidence scores for tables, table rows and table cells are available starting with the **2024-02-29-preview** API version for **custom models**.
23
23
24
24
25
-
Custom models generate an estimated accuracy score when trained. Documents analyzed with a custom model produce a confidence score for extracted fields. In this article, learn to interpret accuracy and confidence scores and best practices for using those scores to improve accuracy and confidence results.
25
+
Custom template models generate an estimated accuracy score when trained. Documents analyzed with a custom model produce a confidence score for extracted fields. In this article, learn to interpret accuracy and confidence scores and best practices for using those scores to improve accuracy and confidence results.
26
26
27
27
## Accuracy scores
28
28
@@ -38,21 +38,25 @@ The accuracy value range is a percentage between 0% (low) and 100% (high). The e
38
38
39
39
> [!NOTE]
40
40
>
41
-
> * **Table cell confidence scores are now included with the 2024-02-29-preview API version**.
41
+
> * **Table, row and cell confidence scores are now included with the 2024-02-29-preview API version**.
42
42
> * Confidence scores for table cells from custom models are added to the API starting with the 2024-02-29-preview API.
43
43
44
44
Document Intelligence analysis results return an estimated confidence for predicted words, key-value pairs, selection marks, regions, and signatures. Currently, not all document fields return a confidence score.
45
45
46
46
Field confidence indicates an estimated probability between 0 and 1 that the prediction is correct. For example, a confidence value of 0.95 (95%) indicates that the prediction is likely correct 19 out of 20 times. For scenarios where accuracy is critical, confidence can be used to determine whether to automatically accept the prediction or flag it for human review.
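The accept-or-review decision described above can be sketched as a simple threshold check. This is a minimal illustration only, not part of the Document Intelligence SDK: the `route_field` function, the `0.95` threshold, and the sample field names and values are all hypothetical.

```python
from typing import Optional


def route_field(confidence: Optional[float], threshold: float = 0.95) -> str:
    """Return 'accept' or 'review' for one extracted field (hypothetical helper)."""
    # Not every field returns a confidence score; send those to review.
    if confidence is None or confidence < threshold:
        return "review"
    return "accept"


# Hypothetical field confidences taken from an analysis result.
fields = {"InvoiceTotal": 0.98, "VendorName": 0.71, "DueDate": None}
decisions = {name: route_field(conf) for name, conf in fields.items()}
print(decisions)
```

The right threshold depends on how costly an incorrect value is in your scenario, so treat `0.95` as a placeholder to tune against your own data.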
47
47
48
-
Confidence scores have two data points: the field level confidence score and the text extraction confidence score. In addition to the field confidence of position and span, the text extraction confidence in the ```pages``` section of the response is the model's confidence in the text extraction (OCR) process. The two confidence scores should be combined to generate one overall confidence score.
49
-
50
48
**Document Intelligence Studio** </br>
51
49
**Analyzed invoice prebuilt-invoice model**
52
50
53
51
:::image type="content" source="media/accuracy-confidence/confidence-scores.png" alt-text="confidence scores from Document Intelligence Studio":::
54
52
55
-
## Interpret accuracy and confidence scores
53
+
## Interpret accuracy and confidence scores for custom models
54
+
55
+
When interpreting the confidence score from a custom model, you should consider all the confidence scores returned from the model. Let's start with a list of all the confidence scores.
56
+
1. **Document type confidence score**: The document type confidence indicates how closely the analyzed document resembles documents in the training dataset. A low document type confidence indicates template or structural variations in the analyzed document. To improve the document type confidence, label a document with that specific variation and add it to your training dataset. Once the model is retrained, it should be better equipped to handle that class of variations.
57
+
2. **Field level confidence**: Each labeled field extracted has an associated confidence score. This score reflects the model's confidence in the position of the extracted value. When evaluating the field confidence, also look at the underlying extraction confidence to generate a comprehensive confidence for the extracted result. Evaluate the OCR results for text extraction or selection marks, depending on the field type, to generate a composite confidence score for the field.
58
+
3. **Word confidence score**: Each word extracted within the document has an associated confidence score. The score represents the confidence of the transcription. The `pages` array contains an array of words; each word has an associated span and confidence. Spans from the custom field extracted values match the spans of the extracted words.
59
+
4. **Selection mark confidence score**: The `pages` array also contains an array of selection marks; each selection mark has a confidence score representing the confidence of the selection mark and selection state detection. When a labeled field is a selection mark, the custom field selection confidence combined with the selection mark confidence is an accurate representation of the overall confidence that the field was extracted correctly.
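One possible way to build a composite score for a text field is to match the field's spans to the word spans in the `pages` array and combine the two confidences. The sketch below assumes the result is available as plain dictionaries with `span` objects holding `offset` and `length`. The combination rule (field confidence multiplied by the lowest matching word confidence) is an assumption chosen for illustration; the article doesn't prescribe a formula. The helper names and sample values are hypothetical.

```python
def words_in_spans(pages, spans):
    """Collect words whose span lies entirely inside one of the field's spans."""
    matched = []
    for page in pages:
        for word in page.get("words", []):
            start = word["span"]["offset"]
            end = start + word["span"]["length"]
            for s in spans:
                if s["offset"] <= start and end <= s["offset"] + s["length"]:
                    matched.append(word)
                    break
    return matched


def composite_confidence(field, pages):
    """Assumed rule: field confidence times the lowest matching word confidence."""
    words = words_in_spans(pages, field.get("spans", []))
    if not words:
        # No matching OCR words; fall back to the field confidence alone.
        return field["confidence"]
    return field["confidence"] * min(w["confidence"] for w in words)


# Hypothetical sample shaped like an analysis result.
pages = [{"words": [
    {"content": "Contoso", "span": {"offset": 0, "length": 7}, "confidence": 0.99},
    {"content": "Ltd", "span": {"offset": 8, "length": 3}, "confidence": 0.80},
    {"content": "Invoice", "span": {"offset": 12, "length": 7}, "confidence": 0.95},
]}]
field = {"confidence": 0.90, "spans": [{"offset": 0, "length": 11}]}
score = composite_confidence(field, pages)
print(round(score, 2))
```

Taking the minimum word confidence is a conservative choice: a single poorly transcribed word lowers the score for the whole field. An average or product over the words are equally plausible alternatives.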
56
60
57
61
The following table demonstrates how to interpret both the accuracy and confidence scores to measure your custom model's performance.
58
62
@@ -65,7 +69,7 @@ The following table demonstrates how to interpret both the accuracy and confiden
65
69
66
70
## Table, row, and cell confidence
67
71
68
-
With the addition of table, row and cell confidence with the ```2024-02-29-preview``` API, here are some common questions that should help with interpreting the scores:
72
+
With the addition of table, row and cell confidence with the ```2024-02-29-preview``` API, here are some common questions that should help with interpreting the table, row and cell scores:
69
73
70
74
**Q:** Is it possible to see a high confidence score for cells, but a low confidence score for the row?<br>
articles/ai-services/document-intelligence/concept-custom-classifier.md
+1 -1 (1 addition & 1 deletion)
Original file line number
Diff line number
Diff line change
@@ -66,7 +66,7 @@ With custom models, you need to maintain access to the training dataset to updat
66
66
67
67
> [!IMPORTANT]
68
68
>
69
-
> Incremental trainiing is only supported with models trained with the same API version. If you are trying to extend a model, use the API version the original model was trained with to extend the model.
69
+
> Incremental training is only supported with models trained with the same API version. If you are trying to extend a model, use the API version the original model was trained with to extend the model. Incremental training is only supported with API version **2024-02-29-preview** or later.
70
70
71
71
Incremental training requires that you provide the original model ID as the `baseClassifierId`. See [incremental training](concept-incremental-classifier.md) to learn more about how to use incremental training.
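As a rough sketch, a request body for incremental training could be assembled as follows. Only `baseClassifierId` and the `2024-02-29-preview` API version come from the text above. The other property names (`classifierId`, `docTypes`, `azureBlobSource`, `containerUrl`, `prefix`), the helper function, and the sample IDs and URL are assumptions for illustration; check them against the REST API reference before use. No request is sent.

```python
import json

# Per the note above, use the API version the original classifier was trained with.
API_VERSION = "2024-02-29-preview"


def build_incremental_request(classifier_id, base_classifier_id, doc_types):
    """Assemble a build request body that extends an existing classifier."""
    if not base_classifier_id:
        raise ValueError("incremental training requires the original model ID")
    return {
        "classifierId": classifier_id,            # assumed property name
        "baseClassifierId": base_classifier_id,   # named in the article
        "docTypes": doc_types,                    # assumed property name
    }


# Hypothetical IDs and storage location.
body = build_incremental_request(
    "invoices-v2",
    "invoices-v1",
    {"receipt": {"azureBlobSource": {
        "containerUrl": "https://example.invalid/training-data",
        "prefix": "receipts/",
    }}},
)
print(json.dumps(body, indent=2))
```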
articles/ai-services/document-intelligence/concept-custom-neural.md
+18 -18 (18 additions & 18 deletions)
Original file line number
Diff line number
Diff line change
@@ -65,7 +65,24 @@ Neural models support documents that have the same information, but different pa
65
65
66
66
*See* our [Language Support—custom models](language-support-custom.md) page for a complete list of supported languages.
67
67
68
-
## Tabular fields
68
+
## Overlapping fields
69
+
70
+
With the release of API versions **2024-02-29-preview** and later, custom neural models will support overlapping fields:
71
+
72
+
To use the overlapping fields, your dataset needs to contain at least one sample with the expected overlap. To label an overlap, use **region labeling** to designate each of the spans of content (with the overlap) for each field. Labeling an overlap with field selection (highlighting a value) will fail in the studio as region labeling is the only supported labeling tool for indicating field overlaps. Overlap support includes:
73
+
74
+
* Complete overlap. The same set of tokens are labeled for two different fields.
75
+
* Partial overlap. Some tokens belong to both fields, but there are tokens that are only part of one field or the other.
76
+
77
+
Overlapping fields have some limits:
78
+
79
+
* Any token or word can only be labeled as two fields.
80
+
* Overlapping fields in a table can't span table rows.
81
+
* Overlapping fields can only be recognized if at least one sample in the dataset contains overlapping labels for those fields.
82
+
83
+
To use overlapping fields, label your dataset with the overlaps and train the model with the API version ```2024-02-29-preview``` or later.
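Before training, it can help to check a labeled sample against the overlap rules listed above. The sketch below models each field's label as a set of token indexes, which is a simplification chosen for illustration and not the Studio's label file format. The function names and sample labels are hypothetical.

```python
from collections import Counter


def classify_overlap(a: set, b: set) -> str:
    """Classify two fields' token sets as 'none', 'partial', or 'complete' overlap."""
    if not (a & b):
        return "none"
    # Complete overlap: the same set of tokens is labeled for both fields.
    return "complete" if a == b else "partial"


def tokens_over_limit(labels: dict, limit: int = 2) -> list:
    """Return token indexes that are labeled as more than `limit` fields."""
    counts = Counter(token for tokens in labels.values() for token in tokens)
    return sorted(token for token, n in counts.items() if n > limit)


# Hypothetical labels: field name -> indexes of the tokens it covers.
labels = {"FullName": {0, 1, 2}, "LastName": {2}, "Signer": {0, 1, 2}}

print(classify_overlap(labels["FullName"], labels["Signer"]))    # complete
print(classify_overlap(labels["FullName"], labels["LastName"]))  # partial
print(tokens_over_limit(labels))  # token 2 is labeled as three fields
```

In this sample, token `2` belongs to three fields, which exceeds the two-field limit, so the labels would need to be revised before training.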
84
+
85
+
## Tabular fields add table, row, and cell confidence
69
86
70
87
With the release of API versions **2022-06-30-preview** and later, custom neural models will support tabular fields (tables):
71
88
@@ -92,23 +109,6 @@ Tabular fields provide **table, row and cell confidence** starting with the ```2
92
109
93
110
See [confidence and accuracy scores](concept-accuracy-confidence.md) to learn more about table, row, and cell confidence.
94
111
95
-
## Overlapping fields
96
-
97
-
With the release of API versions **2024-02-29-preview** and later, custom neural models will support overlapping fields:
98
-
99
-
To use the overlapping fields, your dataset needs to contain at least one sample with the expected overlap. To label an overlap, use region labeling to designate each of the spans of content (with the overlap) for each field. Overlap support includes:
100
-
101
-
* Complete overlap. The same set of tokens are labeled for two different fields.
102
-
* Partial overlap. Some tokens belong to both fields, but there are tokens that are only part of one field or the other.
103
-
104
-
Overlapping fields have some limits:
105
-
106
-
* Any token or word can only be labeled as two fields.
107
-
* overlapping fields in a table can't span table rows.
108
-
* Overlapping fields can only be recognized if at least one sample in the dataset contains overlapping labels for those fields.
109
-
110
-
To use overlapping fields, label your dataset with the overlaps and train the model with the API version ```2024-02-29-preview``` or later.
articles/ai-services/document-intelligence/concept-custom.md
+5 -26 (5 additions & 26 deletions)
Original file line number
Diff line number
Diff line change
@@ -51,7 +51,7 @@ To create a custom extraction model, label a dataset of documents with the value
51
51
52
52
> [!IMPORTANT]
53
53
>
54
-
> Starting with version 3.1—2024-02-29-preview API, custom neural models now support overlapping fields and table, row and cell level confidence.
54
+
> Starting with version 4.0 — 2024-02-29-preview API, custom neural models now support **overlapping fields** and **table, row and cell level confidence**.
55
55
>
56
56
57
57
The custom neural (custom document) model uses deep learning models and a base model trained on a large collection of documents. This model is then fine-tuned or adapted to your data when you train the model with a labeled dataset. Custom neural models support structured, semi-structured, and unstructured documents to extract fields. Custom neural models currently support English-language documents. When you're choosing between the two model types, start with a neural model to determine if it meets your functional needs. See [neural models](concept-custom-neural.md) to learn more about custom document models.
@@ -219,10 +219,10 @@ For a detailed walkthrough to create your first custom extraction model, *see* [
219
219
220
220
This table compares the supported data extraction areas:
221
221
222
-
|Model| Form fields | Selection marks | Structured fields (Tables) | Signature | Region labeling |
223
-
|--|:--:|:--:|:--:|:--:|:--:|
224
-
|Custom template| ✔ | ✔ | ✔ | ✔ | ✔ |
225
-
|Custom neural| ✔| ✔ | ✔ |**n/a**|*****|
222
+
|Model| Form fields | Selection marks | Structured fields (Tables) | Signature | Region labeling | Overlapping fields |
@@ -268,27 +268,6 @@ The following table describes the features available with the associated tools a
268
268
269
269
*See* our [Language Support—custom models](language-support-custom.md) page for a complete list of supported languages.
270
270
271
-
### Try signature detection
272
-
273
-
***Custom model v4.0, v3.1 and v3.0 APIs** supports signature detection for custom forms. When you train custom models, you can specify certain fields as signatures. When a document is analyzed with your custom model, it indicates whether a signature was detected or not.
274
-
*[Document Intelligence v3.1 migration guide](v3-1-migration-guide.md): This guide shows you how to use the v3.0 version in your applications and workflows.
275
-
*[REST API](/rest/api/aiservices/document-models/analyze-document?view=rest-aiservices-2023-07-31&preserve-view=true&tabs=HTTP): This API shows you more about the v3.0 version and new capabilities.
276
-
277
-
1. Build your training dataset.
278
-
279
-
1. Go to [Document Intelligence Studio](https://formrecognizer.appliedai.azure.com/studio). Under **Custom models**, select **Custom form**.
280
-
281
-
:::image type="content" source="media/label-tool/select-custom-form.png" alt-text="Screenshot that shows selecting the Document Intelligence Studio Custom form page.":::
282
-
283
-
1. Follow the workflow to create a new project:
284
-
285
-
* Follow the **Custom model** input requirements.
286
-
287
-
* Label your documents. For signature fields, use **Region** labeling for better accuracy.
288
-
289
-
:::image type="content" source="media/label-tool/signature-label-region-too.png" alt-text="Screenshot that shows the Label signature field.":::
290
-
291
-
After your training set is labeled, you can train your custom model and use it to analyze documents. The signature fields specify whether a signature was detected or not.
The Document Intelligence Mortgage model uses powerful Optical Character Recognition (OCR) capabilities to analyze and extract key fields from mortgage documents. Mortgage documents can be of various formats and quality including. The API analyzes document text from mortgage documents and returns a structured JSON data representation. The model currently supports English-language document formats.
24
+
The Document Intelligence Mortgage models use powerful Optical Character Recognition (OCR) capabilities and deep learning models to analyze and extract key fields from mortgage documents. Mortgage documents can be of various formats and quality. The API analyzes mortgage documents and returns a structured JSON data representation. The models currently support English-language documents only.
25
25
26
26
**Supported document types:**
27
27
28
28
* Form 1003 Uniform Residential Loan Application (URLA)
29
29
* Form 1008
30
30
* Mortgage closing disclosure
31
31
32
-
## Automated mortgage documents processing
33
-
34
-
Automated mortgage card processing is the process of extracting key fields from bank cards. Historically, bank card analysis process is achieved manually and, hence, very time consuming. Accurate extraction of key data from bank cards s is typically the first and one of the most critical steps in the contract automation process.
35
32
36
33
## Development options
37
34
@@ -48,7 +45,7 @@ Document Intelligence v4.0 (2024-02-29-preview) supports the following tools, ap