Commit f321cea

authored
Merge pull request #6407 from Additi/patch-85
Create add-ons page for confidence, grounding and in context learning
2 parents 3afcef4 + f077f94 commit f321cea

File tree

5 files changed

+140
-0
lines changed

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
1+
---
2+
title: "Document analysis with confidence, grounding, and in-context learning"
3+
titleSuffix: Azure AI services
4+
description: Learn about Azure AI Content Understanding's value add-ons that improve model extraction quality and performance
5+
author: PatrickFarley
6+
ms.author: admaheshwari
7+
manager: nitinme
8+
ms.date: 08/11/2025
9+
ms.service: azure-ai-content-understanding
10+
ms.topic: overview
11+
ms.custom:
12+
- build-2025
13+
---
14+
15+
16+
# Improve document output quality with confidence, grounding, and in-context learning
17+
18+
Intelligent document processing, whether for unstructured documents like contracts and statements of work or structured documents like invoices and insurance forms, extracts critical information for RAG, search, agentic workflows, and other downstream applications or automation. Extracting this data reliably at scale requires more than text extraction. It also requires knowing what was extracted, why it was extracted, and how reliably it was extracted.
19+
20+
Most enterprises face the following challenges when handling a variety of documents at scale:
21+
- Need to **validate the sources** of extracted data against the original document. For example, if the model pulls out a payment term or contract clause, you must know exactly where in the document it came from.
22+
- Need to **automate workflows**, but only when the extraction meets the accuracy threshold that the business application requires. You need to know how confident the model is in each prediction.
23+
- Need a way to **correct the model without retraining from scratch** (ideally by providing a few labeled examples) when it gets something wrong or encounters a new format.
24+
25+
To address these needs, Azure AI Content Understanding supports the following features for post-processing your extracted output.
26+
27+
| Feature | Purpose | Value |
28+
|--------|---------|-------|
29+
| **Grounding** | Provides references/citations for every extracted output to the original document content | Ensures traceability, compliance, and user trust |
30+
| **Confidence scoring** | Quantifies the model’s certainty in each prediction through confidence scores | Drives automation with quality controls |
31+
| **In-context learning** | Teaches the model new patterns through labeled examples and corrections to incorrectly predicted values, improving overall accuracy and extraction quality | Rapidly adapts to new formats or edge cases |
32+
33+
> [!NOTE]
34+
> These features are available only for the extractive field type (Method == `Extract`).
35+
36+
Learn more about these features below.
37+
38+
## Grounding: Trace every result to its source
39+
40+
Grounding ensures that every extracted field, answer, or classification has a reference to its original location in the document. This reference includes source information (page number and spatial coordinates) and spans (offset and length).
41+
42+
### Why grounding matters
43+
44+
In enterprise workflows, accuracy is not enough; you also need traceability. When a model extracts a customer name or a termination clause, you must be able to validate where that information came from. Grounding is critical for:
45+
- Maintaining clear traceability and localization for any extracted output, such as clauses, financial figures, tables, and insurance IDs.
46+
- Ensuring transparency for internal compliance checks.
47+
- Enabling efficient human-in-the-loop validation against the actual reference source.
48+
49+
### Example
50+
51+
You want to extract the *termination clause* from a contract. The model returns:
52+
53+
- **Extracted text**: "Either party may terminate this agreement with 60 days’ notice."
54+
- "spans": [ <br>
55+
{ <br>
56+
"offset": 343, <br>
57+
"length": 102 <br>
58+
} <br>
59+
] <br>
60+
- **Source**: <br>
61+
Page: 3 <br>
62+
Coordinates: ({x1},{y1},{x2},{y2},{x3},{y3},{x4},{y4})
63+
64+
A span gives the element's logical position as a character offset and length, while the source gives its visual position as a page number and bounding-box coordinates.
65+
66+
With this grounding data, your legal team can verify the extraction by jumping directly to the source paragraph in the PDF. This eliminates guesswork and builds trust in the application output.
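
The relationship between a span and its source coordinates can be sketched in Python. This is a minimal illustration, not the service's response schema: the `Grounding` class, its field names, and the helper functions are hypothetical. It also assumes that span offsets index into the document's extracted text and that the eight coordinates describe the four corners of a quadrilateral.

```python
from dataclasses import dataclass


@dataclass
class Grounding:
    """Hypothetical container for one field's grounding data."""
    offset: int      # character offset into the extracted text (assumed)
    length: int      # number of characters the span covers
    page: int        # page number from the source information
    polygon: tuple   # (x1, y1, x2, y2, x3, y3, x4, y4)


def span_text(content: str, g: Grounding) -> str:
    """Return the characters that the span points at."""
    return content[g.offset:g.offset + g.length]


def bounding_box(g: Grounding) -> tuple:
    """Collapse the four corner points into (min_x, min_y, max_x, max_y)."""
    xs = g.polygon[0::2]
    ys = g.polygon[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up sample data, for illustration only.
content = "Section 9. Either party may terminate this agreement."
g = Grounding(offset=11, length=42, page=3,
              polygon=(1.0, 2.0, 5.0, 2.0, 5.0, 2.5, 1.0, 2.5))

print(span_text(content, g))
print(g.page, bounding_box(g))
```

A review tool could use the span to highlight the text and the page plus bounding box to draw an overlay on the rendered PDF.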
67+
68+
69+
## Confidence scoring: Automate with control
70+
71+
Every extractive field comes with a confidence score between 0 and 1, indicating how certain the model is about the result. This score gives you a tunable threshold: automate high-confidence results and flag lower-confidence outputs for human review.
72+
73+
### Why confidence scores matter
74+
75+
Confidence scores let you design intelligent workflows, such as:
76+
- Auto-approving extractions when confidence is above a defined threshold to intelligently automate document processing tasks.
77+
- Optimizing resource allocation and reducing operational costs by reserving human-in-the-loop review for critical aspects.
78+
- Rejecting or flagging extractions below a certain threshold for manual intervention, enhancing decision-making accuracy.
79+
80+
### Example
81+
82+
You're processing scanned utility bills to extract the billing address and the amount due. For one document, the model returns:
83+
84+
- **Billing address**: "1234 Market St., San Francisco, CA" → Confidence: 0.96
85+
- **Amount due**: "$128.74" → Confidence: 0.52
86+
87+
In this case, your automation pipeline can post the billing address directly to your downstream application while routing the amount due to a human for verification. By using confidence scores, you reduce manual effort while maintaining accuracy.
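
The routing decision above can be sketched as a small Python function. The `route_by_confidence` helper, the field names, and the 0.8 threshold are assumptions chosen for illustration. In practice, you would pick a threshold that matches the accuracy your application requires.

```python
def route_by_confidence(fields, threshold=0.8):
    """Split fields into auto-approved and needs-review groups.

    `fields` maps a field name to a (value, confidence) pair.
    The default threshold is an illustrative assumption.
    """
    auto_approved, needs_review = {}, {}
    for name, (value, confidence) in fields.items():
        target = auto_approved if confidence >= threshold else needs_review
        target[name] = value
    return auto_approved, needs_review


# Values taken from the utility bill example above.
fields = {
    "BillingAddress": ("1234 Market St., San Francisco, CA", 0.96),
    "AmountDue": ("$128.74", 0.52),
}

auto_approved, needs_review = route_by_confidence(fields)
print("Post downstream:", auto_approved)
print("Send to reviewer:", needs_review)
```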
88+
89+
90+
## In-context learning: Teach the model by giving examples
91+
92+
If the document you're analyzing clearly provides the context for every field, a zero-shot document extraction call should be sufficient. When it doesn't, in-context learning lets you provide labeled examples in the Azure AI Foundry portal to guide the model’s behavior without retraining or fine-tuning. The model uses these examples to adapt to new formats, naming conventions, or extraction rules by correcting itself.
93+
94+
To improve model quality:
95+
- For datasets with minimal template variations, you can add just a single labeled example.
96+
- For more complex variations, add one sample per template to cover all the scenarios.
97+
98+
99+
### Why in-context learning matters
100+
101+
To manage diverse layout changes across different versions, templates, languages, or regions, help the model learn by adding examples.
102+
103+
In-context learning helps you:
104+
- Provide the model with examples that clarify the meaning of each field, which improves accuracy.
105+
- Rapidly onboard new templates within a single analyzer, without labeling a large dataset.
106+
- Limit labeling effort by adding samples only where confidence scores are low or extraction is incomplete.
107+
108+
To add a labeled sample, go to a document extraction result page in the Azure AI Foundry portal and select the **Label data** tab. Upload a sample, and select the **Auto label** button. Auto label predicts all the fields out of the box.
109+
110+
:::image type="content" source="../media/document/in-context-learning.png" lightbox="../media/document/in-context-learning.png" alt-text="Screenshot of auto labeling an invoice sample.":::
111+
112+
Then you can edit the fields by selecting the correct values. After you save, every field that you corrected is shown with the **corrected** tag.
113+
114+
:::image type="content" source="../media/document/label-corrected.png" lightbox="../media/document/label-corrected.png" alt-text="Screenshot of corrected labels.":::
115+
116+
> [!NOTE]
117+
> Labeled samples can be added in the Azure AI Foundry portal. After you add samples, you need to build the analyzer again for the samples to take effect. Labeled samples don't correct OCR errors or improve the output of generative fields (Method == `Generate` or `Classify`).
118+
119+
### Example
120+
121+
You start receiving invoices from a new vendor that uses the label "Invoice Total" instead of "Amount Due." The model keeps missing the correct value. Instead of retraining, you can add a labeled example of the new vendor's invoice.
122+
123+
The model then refers to this pattern to extract the value correctly from similar documents in the future, even though the format wasn’t part of the original training data.
124+
125+
## A complete workflow
126+
127+
When you build an intelligent document automation pipeline, these capabilities help you extract data reliably and scale the application. For example, to process procurement contracts, you might extract:
128+
- Vendor name
129+
- Start and end dates
130+
- Cancellation clause
131+
132+
To ensure quality and trust, which is critical for enterprise-scale document understanding:
133+
- **Grounding** gives your team full traceability to every field.
134+
- **Confidence scores** help you automate, because human review is needed only when a score falls below your threshold.
135+
- **In-context learning** lets your model adapt to new contract templates or handle edge cases using just a few labeled examples.
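
One way to combine these capabilities is sketched below in Python. The `ExtractedField` class, the `build_review_queue` helper, the 0.85 threshold, and the sample contract values are all hypothetical. The sketch shows how confidence can decide which fields need review while grounding tells the reviewer where to look. Fields that keep landing in the queue are natural candidates for a labeled sample.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """Hypothetical record for one extracted contract field."""
    name: str
    value: str
    confidence: float  # score between 0 and 1
    page: int          # page number from the grounding source


def build_review_queue(fields, threshold=0.85):
    """Return (field name, page) pairs for fields below the threshold,
    ordered so the least confident field is reviewed first."""
    low = [f for f in fields if f.confidence < threshold]
    low.sort(key=lambda f: f.confidence)
    return [(f.name, f.page) for f in low]


# Made-up sample data, for illustration only.
contract = [
    ExtractedField("VendorName", "Contoso Ltd.", 0.97, 1),
    ExtractedField("EndDate", "2026-06-30", 0.91, 2),
    ExtractedField("CancellationClause", "30 days written notice", 0.61, 7),
]

review_queue = build_review_queue(contract)
print(review_queue)
```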
136+
137+
138+

articles/ai-services/content-understanding/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -54,6 +54,8 @@ items:
5454
- name: Markdown 🆕
5555
displayName: document, text, images, video, audio, visual, structured, content, field, extraction
5656
href: document/markdown.md
57+
- name: Enrichments
58+
href: document/enrichments.md
5759
- name: Image
5860
displayName: image, OCR, optical character recognition, text, extraction, analysis, detection, recognition, model
5961
href: image/overview.md
-158 KB
Binary file not shown.
