|
1 | 1 | ---
|
2 |
| -title: Understanding Confidence Scores in Azure AI Content Understanding. |
| 2 | +title: Understand and improve confidence scores in Azure AI Content Understanding. |
3 | 3 | titleSuffix: Azure AI services
|
4 |
| -description: Best practices to interpret and improve Azure AI Content Understanding accuracy and confidence scores. |
| 4 | +description: Tips for interpreting and improving Azure AI Content Understanding accuracy and confidence scores. |
5 | 5 | author: laujan
|
6 | 6 | ms.author: admaheshwari
|
7 | 7 | manager: nitinme
|
8 | 8 | ms.service: azure-ai-content-understanding
|
9 | 9 | ms.topic: overview
|
10 |
| -ms.date: 02/20/2025 |
| 10 | +ms.date: 02/28/2025 |
11 | 11 | ---
|
12 | 12 |
|
13 |
| -# Interpret and improve accuracy and confidence scores |
| 13 | +# Interpret and improve confidence and accuracy scores |
14 | 14 |
|
15 |
| -A confidence score indicates probability by measuring the degree of statistical certainty that the extracted result is detected correctly. The estimated accuracy is calculated by running a few different combinations of the training data to predict the labeled values. In this article, we share how to interpret accuracy and confidence scores and best practices for using those scores to improve accuracy and confidence results. |
| 15 | +> [!NOTE] |
| 16 | +> |
| 17 | +> Azure AI Content Understanding is currently available in preview. |
| 18 | +> |
| 19 | +> While the service is in active development, confidence scores are only available for the document modality. |
16 | 20 |
|
| 21 | +A confidence score estimates the probability that an extracted result is correct, expressed as a degree of statistical certainty. The estimated accuracy is derived from evaluating various combinations of the training data to predict the labeled values. This article explains how to interpret accuracy and confidence scores and shares best practices for using those scores to improve both accuracy and confidence results. |
| 22 | + |
| 23 | +Confidence scores are essential as they indicate the model's level of certainty in its predictions. These scores enable users to assess the reliability of extracted data, guiding whether a human review is necessary. Additionally, confidence scores are instrumental in streamlining workflows and enhancing efficiency by minimizing the need for manual validation. |
17 | 24 |
|
18 |
| -Understanding Confidence Scores |
19 |
| -What are confidence scores? |
20 |
| -Confidence scores represent the probability that the extracted result is correct. For example, a confidence score of 0.95 (95%) suggests that the prediction is likely correct 19 out of 20 times. These scores are derived from various factors, including the quality of the input document, the similarity between the training data and the document being analyzed, and the model's ability to recognize patterns and features in the document. |
21 |
| -Why are confidence scores important? |
22 |
| -Confidence scores are important because they provide a measure of the model's certainty in its predictions. They help users make informed decisions about the reliability of the extracted data and determine whether human review is necessary. Confidence scores also play a crucial role in automating workflows and improving efficiency by reducing the need for manual validation. |
23 |
| -Supported fields |
24 | 25 | Confidence scores are supported for extractive fields, including text, tables for documents and speech transcription. The specific fields supported may vary depending on the model and the use case.
|
25 | 26 |
|
26 |
| -JSON output for documents |
27 |
| -"fields": { |
28 |
| - "ClientProjectManager": { |
29 |
| - "type": "string", |
30 |
| - "valueString": "Nestor Wilke", |
31 |
| - "spans": [ |
32 |
| - { |
33 |
| - "offset": 4345, |
34 |
| - "length": 12 |
35 |
| - } |
36 |
| - ], |
37 |
| - "confidence": 0.964, |
38 |
| - "source": "D(2,3.5486,8.3139,4.2943,8.3139,4.2943,8.4479,3.5486,8.4479)" |
39 |
| - }, |
40 |
| -What are thresholds for confidence scores? |
41 |
| -Thresholds for confidence scores are predefined values that determine whether a prediction is considered reliable or requires further review. These thresholds can be set across different modalities to ensure consistent and accurate results. Setting appropriate thresholds is important because it helps balance the trade-off between automation and accuracy. By setting the right thresholds, users can ensure that only high-confidence predictions are automated, while low-confidence predictions are flagged for human review. This helps improve the overall accuracy and reliability of the predictions |
42 |
| - |
43 |
| -Improving Confidence Scores |
44 |
| -What are some common challenges with confidence scores? |
45 |
| -Common challenges with confidence scores include low-quality input documents, variability in document types, complexity of the documents, and limitations of the model in recognizing certain types of content or features. |
46 |
| -Human in the Loop (HITL) |
47 |
| -What is Human in the Loop (HITL)? |
48 |
| -Human in the Loop (HITL) is a process that involves human intervention in the model's predictions to validate and correct the results. HITL helps improve the accuracy and reliability of the predictions by incorporating human expertise and judgment. HITL helps identify and correct errors, improve the model's performance, and enhance the overall quality of the predictions by human experts intervening only when the confidence scores are below a certain threshold. |
49 |
| -It can improved accuracy and reliability of the predictions, reduced errors, and enhanced overall quality of the results. |
50 |
| - |
51 |
| -How can customers access confidence score in CU? |
52 |
| -For every field extraction, confidence score is listed as part of the field extraction output. You can also check confidence score as part of your JSON output under "confidence" |
| 27 | +## Supported fields |
| 28 | + |
| 29 | +Confidence scores are supported for various extractive fields, including text, tables, and images. The specific fields supported may vary depending on the model and the use case. |
| 30 | + |
| 31 | +## Confidence scores |
| 32 | + |
| 33 | +Confidence scores are listed for every field as part of the field extraction output: |
| 34 | + |
| 35 | + :::image type="content" source="../media/confidence-accuracy/field-extraction-score.png" alt-text="Screenshot of field extraction scores from Azure AI Foundry."::: |
| 36 | + |
| 37 | +Confidence scores are also part of the extraction output JSON file: |
| 38 | + |
| 39 | + :::image type="content" source="../media/confidence-accuracy/json-output.png" alt-text="Screenshot of field extraction JSON output."::: |
53 | 40 |
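
As a rough illustration of reading these scores programmatically, the sketch below collects the `confidence` value of each field from a parsed JSON result. The sample mirrors the field shape shown in the earlier revision of this article (`fields` → field name → `confidence`). Where `fields` sits inside a full service response is an assumption here, and `field_confidences` is a hypothetical helper, not part of any SDK.

```python
import json

# Sample shaped like the field output from the earlier revision of this article.
# Assumption: a real response may nest "fields" deeper than shown here.
sample_output = """
{
  "fields": {
    "ClientProjectManager": {
      "type": "string",
      "valueString": "Nestor Wilke",
      "spans": [{"offset": 4345, "length": 12}],
      "confidence": 0.964
    }
  }
}
"""


def field_confidences(result: dict) -> dict:
    """Map each field name to its confidence score (None when the score is absent)."""
    return {
        name: field.get("confidence")
        for name, field in result.get("fields", {}).items()
    }


scores = field_confidences(json.loads(sample_output))
print(scores)
```

Returning `None` for a missing score, rather than raising, lets the caller decide how to treat fields that carry no confidence value.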
|
54 |
| - Tips to improve confidence score |
55 |
| -1. Correcting an expected output so that the model can understand the definition better. Example: Here we can see the confidence score is 12%, to improve confidence score, we can go to label data, select auto label which will give us predicted field labels. Now we can correct our definition and it will show corrected field label. Test the analyzer again for better confidence score. Here it jumped to 98%. Confidence improvement will vary as per the complexity and nature of document. |
56 |
| - |
57 |
| -2. Adding more samples and label them for different variation and templates the model may expect. |
58 |
| -3. Add documents that contains various input values for the schema you want to extract. |
59 |
| -4. Improve the quality of your input documents. |
60 |
| -5. Incorporate human in the loop for lower confidence results. |
61 |
| -Note: Confidence score is only available for document modality in the preview. For other modalities it will be added soon. |
| 41 | +## Improving accuracy results |
| 42 | + |
| 43 | +Common challenges with confidence scores include the quality of input documents, diversity in document types, complexity of the documents, and limitations in the model's ability to recognize certain types of content or features. These limitations underscore the need for continuous improvements and adaptations in the modeling process to enhance reliability and accuracy. Here are some tips: |
| 44 | + |
| 45 | +* **Establish appropriate thresholds**. Setting thresholds can enhance the accuracy and reliability of predictions. These thresholds are predefined values that determine whether a prediction is considered reliable or requires further review. Establishing the right thresholds ensures that only high-confidence predictions are automated, while low-confidence predictions are flagged for human review. This approach helps increase the overall accuracy and reliability of predictions. |
| 46 | + |
| 47 | +* **Incorporate human review into workflows**. Human in the Loop (`HITL`) is a process where human intervention is introduced to validate and correct the model's predictions. Utilizing human expertise and judgment enhances the accuracy and reliability of predictions. `HITL` allows for the identification and correction of errors, improves the model's performance, and elevates the overall quality of predictions by involving human experts only when confidence scores fall below a specified threshold. |
| 48 | + |
| 49 | +* **Include diverse input values for the schema you aim to extract**. To enrich the dataset and account for different variations and templates the model might encounter, use forms with unique values in each field and add labeled samples. |
| 50 | + |
| 51 | +* **Improve the quality of your input documents**. Clear, well-structured forms with consistent formatting typically result in higher confidence scores. |
| 52 | + |
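
The threshold and human-review tips above can be sketched as a small routing step. This is a minimal illustration, not a service feature: `REVIEW_THRESHOLD`, `route_fields`, and the `ContractTotal` field are hypothetical, and the 0.80 cutoff is an arbitrary example value that would need tuning for a given workload.

```python
# Hypothetical cutoff: fields scoring below this value go to a human reviewer.
REVIEW_THRESHOLD = 0.80


def route_fields(fields: dict, threshold: float = REVIEW_THRESHOLD):
    """Split extracted fields into auto-accepted and needs-review groups."""
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        confidence = field.get("confidence")
        # A missing score is treated as low confidence, so it is reviewed.
        if confidence is not None and confidence >= threshold:
            accepted[name] = field
        else:
            needs_review[name] = field
    return accepted, needs_review


fields = {
    "ClientProjectManager": {
        "type": "string",
        "valueString": "Nestor Wilke",
        "confidence": 0.964,
    },
    # Hypothetical low-confidence field, included for illustration only.
    "ContractTotal": {
        "type": "string",
        "valueString": "1,250.00",
        "confidence": 0.12,
    },
}

accepted, needs_review = route_fields(fields)
print("Automated:", sorted(accepted))
print("Human review:", sorted(needs_review))
```

A single global threshold is the simplest choice. Stricter per-field cutoffs may suit fields where errors are costly, at the price of sending more results to review.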
| 53 | +## Related content |
| 54 | + |
| 55 | +* [Best practices for Content Understanding](best-practices.md) |
| 56 | + |
| 57 | +* [Document Intelligence accuracy and confidence scores](../../document-intelligence/concept/accuracy-confidence.md) |
| 58 | + |
| 59 | + |