Commit e111c9d

author
Aditi
committed
Merge branch 'patch-85' of https://github.com/Additi/azure-ai-docs-pr into patch-85
2 parents 88d627a + 854d7f8 commit e111c9d

1 file changed

+23
-23
lines changed

articles/ai-services/content-understanding/document/enrichments.md

Lines changed: 23 additions & 23 deletions
@@ -1,7 +1,7 @@
11
---
2-
title: "Document analysis with confidence, grounding and in-context learning"
2+
title: "Document analysis with confidence, grounding, and in-context learning"
33
titleSuffix: Azure AI services
4-
description: Learn about Azure AI Content Understanding's value add-ons to improve model extraction quality and performance
4+
description: Learn about Azure AI Content Understanding's value add-ons that improve model extraction quality and performance
55
author: PatrickFarley
66
ms.author: additi
77
manager: nitinme
@@ -13,7 +13,7 @@ ms.custom:
1313
---
1414

1515

16-
# Improve document output quality with confidence, grounding and in-context learning
16+
# Improve document output quality with confidence, grounding, and in-context learning
1717

1818
Intelligent document processing, whether for unstructured documents like contracts and statements of work or structured documents like invoices and insurance forms, extracts critical information for RAG, search, agentic workflows, and other downstream applications or automation. Extracting this data reliably and at scale requires more than text extraction. You also need to know what was extracted, why it was extracted, and how reliably it was extracted.
1919

@@ -22,7 +22,7 @@ Most enterprises face the following challenges when handling a variety of docume
2222
- Need to **automate workflows**, but only when the extraction is meeting an accuracy threshold that is critical for the business application. You need to know how confident/accurate the model is in its predictions.
2323
- Need a way to **correct the model without retraining from scratch** (ideally by providing a few labeled examples) when it gets something wrong or encounters a new format.
2424

25-
To address these enterprise needs, Azure AI Content Understanding supports the following features for post-processing your extracted output.
25+
To address these needs, Azure AI Content Understanding supports the following features for post-processing your extracted output.
2626

2727
| Feature | Purpose | Value |
2828
|--------|---------|-------|
@@ -51,14 +51,14 @@ In enterprise workflows, accuracy is not enough; you also need traceability. Whe
5151
You want to extract the *termination clause* from a contract. The model returns:
5252

5353
- **Extracted text**: "Either party may terminate this agreement with 60 days’ notice."
54-
- "spans": [
55-
{
56-
"offset": 343,
57-
"length": 102
58-
}
59-
],
60-
- **Source**:
61-
Page: 3
54+
- "spans": [ <br>
55+
{ <br>
56+
"offset": 343, <br>
57+
"length": 102 <br>
58+
} <br>
59+
] <br>
60+
- **Source**: <br>
61+
Page: 3 <br>
6262
Coordinates: ({x1},{y1},{x2},{y2},{x3},{y3},{x4},{y4})
6363

6464
Span indicates the element's logical position using character offset and length, while source gives its visual position with page number and bounding box coordinates.
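
A minimal sketch of how a span can be mapped back to the extracted text, assuming the analyzer output exposes the document content as a single string and that `offset` and `length` count characters in that string. The helper name `text_for_span` and the sample content are hypothetical and used only for illustration.

```python
def text_for_span(content: str, span: dict) -> str:
    """Return the slice of `content` covered by a span with `offset` and `length`."""
    start = span["offset"]
    return content[start:start + span["length"]]


# Hypothetical document content; a real analyzer result would supply this string.
content = "Section 9. Either party may terminate this agreement with 60 days' notice."
clause = "Either party may terminate this agreement with 60 days' notice."

# Build a span for the clause the same way the output describes it: offset plus length.
span = {"offset": content.index(clause), "length": len(clause)}

print(text_for_span(content, span))
```

If the service counts offsets in a unit other than Python string characters, the slice would need to be adjusted accordingly.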
@@ -68,7 +68,7 @@ With this grounding data, your legal team can verify the extraction by jumping d
6868

6969
## Confidence scoring: Automate with control
7070

71-
Every extraction field type comes with a confidence score between 0 and 1, indicating how certain the model is about the result. This gives you a tunable point to automate high-confidence results and flag lower-confidence outputs for human reviews.
71+
Every extraction field type comes with a confidence score between 0 and 1, indicating how certain the model is about the result. This number gives you a tunable point to automate high-confidence results and flag lower-confidence outputs for human reviews.
7272

7373
### Why confidence score matters
7474

@@ -81,15 +81,15 @@ Confidence score let you design intelligent workflows, such as:
8181

8282
You're processing scanned utility bills to extract billing address and amount due. For a document:
8383

84-
- **Billing address**: "1234 Market St, San Francisco, CA" → Confidence: 0.96
84+
- **Billing address**: "1234 Market St., San Francisco, CA" → Confidence: 0.96
8585
- **Amount due**: "$128.74" → Confidence: 0.52
8686

8787
In this case, your automation pipeline can post the billing address directly to your downstream application while routing the amount due to a human for verification. By using confidence scores, you reduce manual effort while maintaining accuracy.
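
The routing described above can be sketched as follows. This is an illustration only: the field names, the dictionary shape, and the `0.80` threshold are assumptions, not the service's response schema or a recommended value.

```python
REVIEW_THRESHOLD = 0.80  # hypothetical cutoff; tune it per field and business risk


def route_fields(fields: dict, threshold: float = REVIEW_THRESHOLD):
    """Split extracted fields into auto-approved values and values needing human review."""
    auto, review = {}, {}
    for name, field in fields.items():
        # Fields at or above the threshold flow straight through; the rest are flagged.
        target = auto if field["confidence"] >= threshold else review
        target[name] = field["value"]
    return auto, review


# Values taken from the utility bill example; the structure itself is assumed.
fields = {
    "BillingAddress": {"value": "1234 Market St., San Francisco, CA", "confidence": 0.96},
    "AmountDue": {"value": "$128.74", "confidence": 0.52},
}

auto, review = route_fields(fields)
print("Post automatically:", auto)
print("Send to reviewer:", review)
```

Keeping the threshold as a parameter lets each field, or each downstream application, use its own cutoff.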
8888

8989

9090
## In-context learning: Teach the model by giving examples
9191

92-
If the context for all the fields is clearly provided in the testing document, a zero-shot document extraction call should be sufficient. In-context learning allows you to providing additional labeled examples in Foundry to guide the model’s behavior without the need for retraining or fine-tuning. The model uses these examples to adapt to new formats, naming conventions, or extraction rules by correcting itself.
92+
If the context for all the fields is clearly provided in the testing document, a zero-shot document extraction call should be sufficient. In-context learning allows you to provide extra labeled examples in Foundry to guide the model’s behavior without the need for retraining or fine-tuning. The model uses these examples to adapt to new formats, naming conventions, or extraction rules by correcting itself.
9393

9494
To enhance the model quality:
9595
- For datasets with minimal template variations, you can add just a single labeled example.
@@ -105,16 +105,16 @@ In-context learning helps:
105105
- Rapidly onboard new templates without labeling data within a single analyzer.
106106
- Add samples only when dealing with lower confidence scores or incomplete/partial extraction.
107107

108-
To add a label sample, you can upload a sample under **Label data**, and select **Auto label**. Auto label will predict all the fields out of the box.
108+
To add a label sample, go to a document extraction result page in the Azure AI Foundry portal and select the **Label data** tab. Upload a sample, and select the **Auto label** button. Auto label predicts all the fields out of the box.
109109

110-
:::image type="content" source="../media/document/in-context-learning.png" alt-text="Screenshot of auto labelling an invoice sample.":::
110+
:::image type="content" source="../media/document/in-context-learning.png" lightbox="../media/document/in-context-learning.png" alt-text="Screenshot of auto labeling an invoice sample.":::
111111

112-
Then you can edit the fields by selecting the correct values. Once you save it, it will show with the **corrected** tag for all the extracted fields that were corrected.
112+
Then you can edit the fields by selecting the correct values. Once you save it, it shows with the **corrected** tag for all the extracted fields that were corrected.
113113

114-
:::image type="content" source="../media/document/label-corrected.png" alt-text="Screenshot of corrected labels":::
114+
:::image type="content" source="../media/document/label-corrected.png" lightbox="../media/document/label-corrected.png" alt-text="Screenshot of corrected labels.":::
115115

116116
> [!NOTE]
117-
> Labelled samples can be added in Foundry UX. Once samples are added, you need to build the analyzer again so that samples can take effect. This will not improve any OCR corrections or generative fields output. (Method == `Generate` or `Classify`)
117+
> Labeled samples can be added in the Azure AI Foundry portal. Once samples are added, you need to build the analyzer again so that samples can take effect. This will not improve any OCR corrections or generative fields output. (Method == `Generate` or `Classify`)
118118
119119
### Example
120120

@@ -124,15 +124,15 @@ The model will now refer to this pattern to correctly extract the value in futur
124124

125125
## A complete workflow
126126

127-
For building an intelligent document automation pipeline, these capabilities will help you reliably extract and scale the application. For example: If you want to process procurement contracts, you'll extract:
127+
To build an intelligent document automation pipeline, these capabilities help you extract data reliably and scale the application. For example, if you want to process procurement contracts, you extract:
128128
- Vendor name
129129
- Start and end dates
130130
- Cancellation clause
131131

132-
To ensure quality and trust, which is critical for enterprise-scale document understanding.:
132+
To ensure quality and trust, which is critical for enterprise-scale document understanding:
133133
- **Grounding** gives your team full traceability to every field.
134134
- **Confidence scores** help you automate, because human review is needed only when a score falls below your threshold.
135-
- **In-context learning** lets your model adapt to new contract templates or handling edge cases using just a few labelled examples.
135+
- **In-context learning** lets your model adapt to new contract templates or handle edge cases using just a few labeled examples.
136136

137137

138138
