ms.subservice: mlops
ms.reviewer: scottpolly
reviewer: s-polly
ms.topic: how-to
ms.date: 08/25/2025
ms.custom:
- devplatv2
- ignite-2023
In this article, you learn about the metrics used when monitoring and evaluating generative AI models in Azure Machine Learning, and the recommended practices for using generative AI model monitoring.
> [!IMPORTANT]
> Monitoring is currently in public preview. This preview is provided without a service-level agreement, and isn't recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
Model monitoring tracks model performance in production and aims to understand it from both data science and operational perspectives. To implement monitoring, Azure Machine Learning uses monitoring signals acquired through data analysis on streamed data. Each monitoring signal has one or more metrics. You can set thresholds for these metrics in order to receive alerts via Azure Machine Learning or Azure Monitor about model or data anomalies.
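The threshold-and-alert pattern described above can be sketched generically. This is an illustrative sketch only, not the Azure Machine Learning SDK: the `MetricThreshold` class and `check_signal` function are hypothetical names used to show the idea of comparing aggregated signal metrics against configured thresholds.

```python
from dataclasses import dataclass

@dataclass
class MetricThreshold:
    """Hypothetical threshold on one metric of a monitoring signal."""
    metric_name: str
    minimum: float  # alert when the aggregated metric falls below this value

def check_signal(metrics: dict[str, float],
                 thresholds: list[MetricThreshold]) -> list[str]:
    """Return an alert message for every metric that breaches its threshold."""
    alerts = []
    for t in thresholds:
        value = metrics.get(t.metric_name)
        if value is not None and value < t.minimum:
            alerts.append(f"{t.metric_name}={value:.2f} below threshold {t.minimum:.2f}")
    return alerts

# Example: aggregated scores from one monitoring run over streamed production data.
run_metrics = {"coherence": 3.1, "relevance": 4.4}
print(check_signal(run_metrics, [MetricThreshold("coherence", 4.0),
                                 MetricThreshold("relevance", 4.0)]))
# → ['coherence=3.10 below threshold 4.00']
```

In the actual service, this comparison runs inside Azure Machine Learning, and breaches surface as alerts through Azure Machine Learning or Azure Monitor rather than return values.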
## Coherence

Coherence evaluates how well the language model can produce output that flows smoothly, reads naturally, and resembles human-like language. How well does the bot communicate its messages in a brief and clear way, using simple and appropriate language and avoiding unnecessary or confusing information? How easy is it for the user to understand and follow the bot responses, and how well do they match the user's needs and expectations?

**Use it when:** You would like to test the readability and user-friendliness of your model's generated responses in real-world applications.

**How to read it:** If the model's answers are highly coherent, it indicates that the AI system generates seamless, well-structured text with smooth transitions. Consistent context throughout the text enhances readability and understanding. Low coherence means that the quality of the sentences in a model's predicted answer is poor, and they don't fit together naturally. The generated text might lack a logical flow, and the sentences might appear disjointed, making it challenging for readers to understand the overall context or intended message. Answers are scored on their clarity, brevity, appropriate language, and ability to match defined user needs and expectations.

**Scale:**

- 1 = "incoherent": suggests that the quality of the sentences in a model's predicted answer is poor, and they don't fit together naturally. The generated text might lack a logical flow, and the sentences might appear disjointed, making it challenging for readers to understand the overall context or intended message.
- 5 = "perfectly coherent": suggests that the AI system generates seamless, well-structured text with smooth transitions and consistent context throughout the text that enhances readability and understanding.
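Given per-response coherence ratings on this 1-5 scale, a deployment's health can be summarized before comparing against an alert threshold. The sketch below is illustrative (the score values and the `passing` cutoff of 4 are assumptions, not values prescribed by the service):

```python
def summarize_coherence(scores: list[int], passing: int = 4) -> dict:
    """Summarize per-response coherence ratings on the 1-5 scale above."""
    if not scores:
        raise ValueError("no scores to summarize")
    mean = sum(scores) / len(scores)
    # Fraction of responses rated at or above the assumed passing cutoff.
    pass_rate = sum(s >= passing for s in scores) / len(scores)
    return {"mean": round(mean, 2), "pass_rate": round(pass_rate, 2)}

# Example: five judged responses, two of which read as disjointed (scores 2 and 1).
print(summarize_coherence([5, 4, 2, 5, 1]))
# → {'mean': 3.4, 'pass_rate': 0.6}
```

A mean drifting toward 1 or a falling pass rate would indicate the disjointed, hard-to-follow output described for the low end of the scale.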