
Commit 7c6daaa

confidence and accuracy
1 parent ffb99bd commit 7c6daaa

File tree: 3 files changed (+44, −46 lines changed)

Lines changed: 44 additions & 46 deletions
@@ -1,61 +1,59 @@
  ---
- title: Understanding Confidence Scores in Azure AI Content Understanding.
+ title: Understand and improve confidence scores in Azure AI Content Understanding.
  titleSuffix: Azure AI services
- description: Best practices to interpret and improve Azure AI Content Understanding accuracy and confidence scores.
+ description: Tips for interpreting and improving Azure AI Content Understanding accuracy and confidence scores.
  author: laujan
  ms.author: admaheshwari
  manager: nitinme
  ms.service: azure-ai-content-understanding
  ms.topic: overview
- ms.date: 02/20/2025
+ ms.date: 02/28/2025
  ---

- # Interpret and improve accuracy and confidence scores
+ # Interpret and improve confidence and accuracy scores

- A confidence score indicates probability by measuring the degree of statistical certainty that the extracted result is detected correctly. The estimated accuracy is calculated by running a few different combinations of the training data to predict the labeled values. In this article, we share how to interpret accuracy and confidence scores and best practices for using those scores to improve accuracy and confidence results.
+ > [!NOTE]
+ >
+ > Azure AI Content Understanding is currently available in preview.
+ >
+ > While the service is in active development, confidence scores are only available for the document modality.

+ Confidence scores quantify the probability that a result is accurately detected by gauging the degree of statistical certainty. The estimated accuracy is derived from evaluating various combinations of the training data to forecast the labeled values. In this article, we share how to interpret accuracy and confidence scores and best practices for using those scores to improve both accuracy and confidence results.
+
+ Confidence scores are essential as they indicate the model's level of certainty in its predictions. These scores enable users to assess the reliability of extracted data, guiding whether a human review is necessary. Additionally, confidence scores are instrumental in streamlining workflows and enhancing efficiency by minimizing the need for manual validation.

- Understanding Confidence Scores
- What are confidence scores?
- Confidence scores represent the probability that the extracted result is correct. For example, a confidence score of 0.95 (95%) suggests that the prediction is likely correct 19 out of 20 times. These scores are derived from various factors, including the quality of the input document, the similarity between the training data and the document being analyzed, and the model's ability to recognize patterns and features in the document.
- Why are confidence scores important?
- Confidence scores are important because they provide a measure of the model's certainty in its predictions. They help users make informed decisions about the reliability of the extracted data and determine whether human review is necessary. Confidence scores also play a crucial role in automating workflows and improving efficiency by reducing the need for manual validation.
- Supported fields
  Confidence scores are supported for extractive fields, including text, tables for documents and speech transcription. The specific fields supported may vary depending on the model and the use case.

- JSON output for documents
- "fields": {
-     "ClientProjectManager": {
-         "type": "string",
-         "valueString": "Nestor Wilke",
-         "spans": [
-             {
-                 "offset": 4345,
-                 "length": 12
-             }
-         ],
-         "confidence": 0.964,
-         "source": "D(2,3.5486,8.3139,4.2943,8.3139,4.2943,8.4479,3.5486,8.4479)"
-     },
- What are thresholds for confidence scores?
- Thresholds for confidence scores are predefined values that determine whether a prediction is considered reliable or requires further review. These thresholds can be set across different modalities to ensure consistent and accurate results. Setting appropriate thresholds is important because it helps balance the trade-off between automation and accuracy. By setting the right thresholds, users can ensure that only high-confidence predictions are automated, while low-confidence predictions are flagged for human review. This helps improve the overall accuracy and reliability of the predictions
-
- Improving Confidence Scores
- What are some common challenges with confidence scores?
- Common challenges with confidence scores include low-quality input documents, variability in document types, complexity of the documents, and limitations of the model in recognizing certain types of content or features.
- Human in the Loop (HITL)
- What is Human in the Loop (HITL)?
- Human in the Loop (HITL) is a process that involves human intervention in the model's predictions to validate and correct the results. HITL helps improve the accuracy and reliability of the predictions by incorporating human expertise and judgment. HITL helps identify and correct errors, improve the model's performance, and enhance the overall quality of the predictions by human experts intervening only when the confidence scores are below a certain threshold.
- It can improved accuracy and reliability of the predictions, reduced errors, and enhanced overall quality of the results.
-
- How can customers access confidence score in CU?
- For every field extraction, confidence score is listed as part of the field extraction output. You can also check confidence score as part of your JSON output under "confidence"
+ ## Supported fields
+
+ Confidence scores are supported for various extractive fields, including text, tables, and images. The specific fields supported may vary depending on the model and the use case.
+
+ ## Confidence scores
+
+ Confidence scores are listed for every field as part of the field extraction output:
+
+ :::image type="content" source="../media/confidence-accuracy/field-extraction-score.png" alt-text="Screenshot of field extraction scores from Azure AI Foundry.":::
+
+ Confidence scores are also part of the extraction output JSON file:
+
+ :::image type="content" source="../media/confidence-accuracy/json-output.png" alt-text="Screenshot of field extraction JSON output.":::

- Tips to improve confidence score
- 1. Correcting an expected output so that the model can understand the definition better. Example: Here we can see the confidence score is 12%, to improve confidence score, we can go to label data, select auto label which will give us predicted field labels. Now we can correct our definition and it will show corrected field label. Test the analyzer again for better confidence score. Here it jumped to 98%. Confidence improvement will vary as per the complexity and nature of document.
-
- 2. Adding more samples and label them for different variation and templates the model may expect.
- 3. Add documents that contains various input values for the schema you want to extract.
- 4. Improve the quality of your input documents.
- 5. Incorporate human in the loop for lower confidence results.
- Note: Confidence score is only available for document modality in the preview. For other modalities it will be added soon.
+ ## Improving accuracy results
+
+ Common challenges with confidence scores include the quality of input documents, diversity in document types, complexity of the documents, and limitations of the model in recognizing certain types of content or features. These limitations underscore the need for continuous improvements and adaptations in the modeling process to enhance reliability and accuracy. Here are some tips:
+
+ * **Establish appropriate thresholds**. Setting thresholds can enhance the accuracy and reliability of predictions. These thresholds are predefined values that determine whether a prediction is considered reliable or requires further review. Establishing the right thresholds ensures that only high-confidence predictions are automated, while low-confidence predictions are flagged for human review. This approach helps increase the overall accuracy and reliability of predictions.
+
+ * **Incorporate human review into workflows**. Human in the Loop (`HITL`) is a process where human intervention is introduced to validate and correct the model's predictions. Utilizing human expertise and judgment enhances the accuracy and reliability of predictions. `HITL` allows for the identification and correction of errors, improves the model's performance, and elevates the overall quality of predictions by involving human experts only when confidence scores fall below a specified threshold.
+
+ * **Include diverse input values for the schema you aim to extract**. To enrich the dataset and account for different variations and templates the model might encounter, use forms with unique values in each field and add labeled samples.
+
+ * **Improve the quality of your input documents**. Clear, well-structured forms with consistent formatting typically result in higher confidence scores.
+
+ ## Related content
+
+ * [Best practices for Content Understanding](best-practices.md)
+
+ * [Document Intelligence accuracy and confidence scores](../../document-intelligence/concept/accuracy-confidence.md)
+
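
The screenshots added above replace the old article's inline JSON sample. For reference, here is that removed snippet completed into valid JSON (the closing braces and outer object are filled in here; sibling fields are omitted), showing where `confidence` appears for each extracted field:

```json
{
  "fields": {
    "ClientProjectManager": {
      "type": "string",
      "valueString": "Nestor Wilke",
      "spans": [
        {
          "offset": 4345,
          "length": 12
        }
      ],
      "confidence": 0.964,
      "source": "D(2,3.5486,8.3139,4.2943,8.3139,4.2943,8.4479,3.5486,8.4479)"
    }
  }
}
```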
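The "Establish appropriate thresholds" and "Incorporate human review into workflows" bullets describe one routing pattern: automate high-confidence fields and queue the rest for review. Here is a minimal sketch of that pattern, assuming the result shape in the sample output above; the 0.85 threshold and the `route_fields` helper are illustrative choices, not part of the service API:

```python
# Confidence-based routing sketch (hypothetical threshold and helper name).
# Assumes the analyzer result has the "fields" -> {name: {"confidence": ...}}
# shape shown in the sample JSON output above.

CONFIDENCE_THRESHOLD = 0.85  # illustrative; tune per field and document type


def route_fields(analyze_result: dict) -> tuple[dict, dict]:
    """Split extracted fields into auto-accepted and human-review buckets."""
    accepted: dict = {}
    needs_review: dict = {}
    for name, field in analyze_result.get("fields", {}).items():
        if field.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD:
            accepted[name] = field  # high confidence: safe to automate
        else:
            needs_review[name] = field  # low confidence: flag for HITL review
    return accepted, needs_review


# With the sample above, "ClientProjectManager" (confidence 0.964) is
# auto-accepted; any field scoring below 0.85 would be queued for review.
```

Per-field thresholds often work better than one global value, since fields such as handwritten names or dense tables tend to score systematically lower than printed text.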
The two remaining changed files are binary images (321 KB and 74.2 KB) and have no text diff.
