articles/ai-services/content-understanding/document/enrichments.md
23 additions & 23 deletions
@@ -1,7 +1,7 @@
 ---
-title: "Document analysis with confidence, grounding and in-context learning"
+title: "Document analysis with confidence, grounding, and in-context learning"
 titleSuffix: Azure AI services
-description: Learn about Azure AI Content Understanding's value add-ons to improve model extraction quality and performance
+description: Learn about Azure AI Content Understanding's value add-ons that improve model extraction quality and performance
 author: PatrickFarley
 ms.author: additi
 manager: nitinme
@@ -13,7 +13,7 @@ ms.custom:
 ---
 
 
-# Improve document output quality with confidence, grounding and in-context learning
+# Improve document output quality with confidence, grounding, and in-context learning
 
 Intelligent document processing, whether for unstructured documents like contracts and statements of work, or structured documents like invoices and insurance forms, is done for critical information for RAG, search, agentic workflows, and any downstream applications or automation. Extracting this data reliably, at scale, requires more capabilities than just text extraction. Intelligent document processing requires information like what was extracted, why it was extracted, and how reliably it was extracted.
 
@@ -22,7 +22,7 @@ Most enterprises face the following challenges when handling a variety of docume
 - Need to **automate workflows**, but only when the extraction is meeting an accuracy threshold that is critical for the business application. You need to know how confident/accurate the model is in its predictions.
 - Need a way to **correct the model without retraining from scratch** (ideally by providing a few labeled examples) when it gets something wrong or encounters a new format.
 
-To address these enterprise needs, Azure AI Content Understanding supports the following features for post-processing your extracted output.
+To address these needs, Azure AI Content Understanding supports the following features for post-processing your extracted output.
 
 | Feature | Purpose | Value |
 |--------|---------|-------|
@@ -51,14 +51,14 @@ In enterprise workflows, accuracy is not enough; you also need traceability. Whe
 You want to extract the *termination clause* from a contract. The model returns:
 
 - **Extracted text**: "Either party may terminate this agreement with 60 days’ notice."
 Span indicates the element's logical position using character offset and length, while source gives its visual position with page number and bounding box coordinates.
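A minimal sketch of how the span described in this hunk could be used. It assumes the analyzer returns the document text as one string and reports each span as a character offset and length counted in the same units Python uses to index `str`. Both are assumptions, and `Span` and `text_for_span` are hypothetical names rather than part of any SDK. The sketch slices the extracted clause back out of the full text so a reviewer can check the grounding:

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical container for a span: logical position in the document text."""
    offset: int  # index of the first character of the element
    length: int  # number of characters in the element


def text_for_span(content: str, span: Span) -> str:
    """Return the slice of the document text that the span points at."""
    return content[span.offset : span.offset + span.length]


# Illustrative document text; only the quoted clause comes from the article.
clause = "Either party may terminate this agreement with 60 days’ notice."
content = "Section 9. Termination. " + clause + " Section 10. Notices."

# In practice the span would come from the analyzer output; here it is computed.
span = Span(offset=content.index(clause), length=len(clause))
print(text_for_span(content, span))
```

The `source` value (page number plus bounding box) would be handled separately, for example to highlight the region in a document viewer; its format is not shown in this hunk, so it is left out of the sketch.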
@@ -68,7 +68,7 @@ With this grounding data, your legal team can verify the extraction by jumping d
 
 ## Confidence scoring: Automate with control
 
-Every extraction field type comes with a confidence score between 0 and 1, indicating how certain the model is about the result. This gives you a tunable point to automate high-confidence results and flag lower-confidence outputs for human reviews.
+Every extraction field type comes with a confidence score between 0 and 1, indicating how certain the model is about the result. This number gives you a tunable point to automate high-confidence results and flag lower-confidence outputs for human reviews.
 
 ### Why confidence score matters
 
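A minimal sketch of the tunable threshold described in this hunk, assuming each extracted field is available as a dictionary with a `confidence` value between 0 and 1. The field names, the confidence values, and the default threshold of 0.8 are all hypothetical and chosen only for illustration; `route_fields` is not a service API:

```python
def route_fields(
    fields: dict[str, dict], threshold: float = 0.8
) -> tuple[dict[str, dict], dict[str, dict]]:
    """Split fields into (automate, review) by comparing confidence to a threshold."""
    automate: dict[str, dict] = {}
    review: dict[str, dict] = {}
    for name, field in fields.items():
        confidence = field.get("confidence")
        # A missing score is treated as low confidence and sent to a human.
        if confidence is not None and confidence >= threshold:
            automate[name] = field
        else:
            review[name] = field
    return automate, review


# Hypothetical extraction result for a scanned utility bill.
extracted = {
    "BillingAddress": {"value": "123 Main St", "confidence": 0.95},
    "AmountDue": {"value": "120.00", "confidence": 0.55},
}

automate, review = route_fields(extracted)
print("post automatically:", sorted(automate))
print("send to human review:", sorted(review))
```

Raising the threshold sends more fields to review and lowering it automates more, so the value is best tuned against how costly an error is for the downstream application.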
@@ -81,15 +81,15 @@ Confidence score let you design intelligent workflows, such as:
 
 You're processing scanned utility bills to extract billing address and amount due. For a document:
 In this case, your automation pipeline can post the billing address directly to your downstream application while routing the amount due to a human for verification. By using confidence scores, you reduce manual effort while maintaining accuracy.
 
 
 ## In-context learning: Teach the model by giving examples
 
-If the context for all the fields is clearly provided in the testing document, a zero-shot document extraction call should be sufficient. In-context learning allows you to providing additional labeled examples in Foundry to guide the model’s behavior without the need for retraining or fine-tuning. The model uses these examples to adapt to new formats, naming conventions, or extraction rules by correcting itself.
+If the context for all the fields is clearly provided in the testing document, a zero-shot document extraction call should be sufficient. In-context learning allows you to provide extra labeled examples in Foundry to guide the model’s behavior without the need for retraining or fine-tuning. The model uses these examples to adapt to new formats, naming conventions, or extraction rules by correcting itself.
 
 To enhance the model quality:
 - For datasets with minimal template variations, you can add just a single labeled example.
@@ -105,16 +105,16 @@ In-context learning helps:
 - Rapidly onboard new templates without labeling data within a single analyzer.
 - Add samples only when dealing with lower confidence scores or incomplete/partial extraction.
 
-To add a label sample, you can upload a sample under **Label data**, and select **Auto label**. Auto label will predict all the fields out of the box.
+To add a label sample, go to a document extraction result page in the Azure AI Foundry portal and select the **Label data** tab. Upload a sample, and select the **Auto label** button. Auto label predicts all the fields out of the box.
 
-:::image type="content" source="../media/document/in-context-learning.png" alt-text="Screenshot of auto labelling an invoice sample.":::
+:::image type="content" source="../media/document/in-context-learning.png" lightbox="../media/document/in-context-learning.png" alt-text="Screenshot of auto labeling an invoice sample.":::
 
-Then you can edit the fields by selecting the correct values. Once you save it, it will show with the **corrected** tag for all the extracted fields that were corrected.
+Then you can edit the fields by selecting the correct values. Once you save it, it shows with the **corrected** tag for all the extracted fields that were corrected.
 
-:::image type="content" source="../media/document/label-corrected.png" alt-text="Screenshot of corrected labels":::
+:::image type="content" source="../media/document/label-corrected.png" lightbox="../media/document/label-corrected.png" alt-text="Screenshot of corrected labels.":::
 
 > [!NOTE]
-> Labelled samples can be added in Foundry UX. Once samples are added, you need to build the analyzer again so that samples can take effect. This will not improve any OCR corrections or generative fields output. (Method == `Generate` or `Classify`)
+> Labeled samples can be added in the Azure AI Foundry portal. Once samples are added, you need to build the analyzer again so that samples can take effect. This will not improve any OCR corrections or generative fields output. (Method == `Generate` or `Classify`)
 
 ### Example
 
@@ -124,15 +124,15 @@ The model will now refer to this pattern to correctly extract the value in futur
 
 ## A complete workflow
 
-For building an intelligent document automation pipeline, these capabilities will help you reliably extract and scale the application. For example: If you want to process procurement contracts, you'll extract:
+For building an intelligent document automation pipeline, these capabilities help you reliably extract and scale the application. For example if you want to process procurement contracts, you extract:
 - Vendor name
 - Start and end dates
 - Cancellation clause
 
-To ensure quality and trust, which is critical for enterprise-scale document understanding.:
+To ensure quality and trust, which is critical for enterprise-scale document understanding:
 - **Grounding** gives your team full traceability to every field.
 - **Confidence scores** helps you automate, as human review is needed only when threshold is low.
-- **In-context learning** lets your model adapt to new contract templates or handling edge cases using just a few labelled examples.
+- **In-context learning** lets your model adapt to new contract templates or handling edge cases using just a few labeled examples.