
Commit fa6e558

Merge pull request #6738 from s-polly/stp_pf_freshness
prompt flow freshness
2 parents: 7644101 + 98f6720

4 files changed, +87 -87 lines changed


articles/machine-learning/prompt-flow/concept-model-monitoring-generative-ai-evaluation-metrics.md

Lines changed: 4 additions & 4 deletions
@@ -10,7 +10,7 @@ ms.subservice: mlops
 ms.reviewer: scottpolly
 reviewer: s-polly
 ms.topic: how-to
-ms.date: 09/06/2023
+ms.date: 08/25/2025
 ms.custom:
 - devplatv2
 - ignite-2023
@@ -22,7 +22,7 @@ ms.custom:
 In this article, you learn about the metrics used when monitoring and evaluating generative AI models in Azure Machine Learning, and the recommended practices for using generative AI model monitoring.

 > [!IMPORTANT]
-> Monitoring is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
+> Monitoring is currently in public preview. This preview is provided without a service-level agreement, and isn't recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
 > For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).

 Model monitoring tracks model performance in production and aims to understand it from both data science and operational perspectives. To implement monitoring, Azure Machine Learning uses monitoring signals acquired through data analysis on streamed data. Each monitoring signal has one or more metrics. You can set thresholds for these metrics in order to receive alerts via Azure Machine Learning or Azure Monitor about model or data anomalies.
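
The context paragraph above describes the signal → metric → threshold → alert flow. As a rough illustration of that relationship only, here is a minimal, hypothetical Python sketch; it does not use the Azure Machine Learning monitoring SDK, and the metric names and threshold values are assumptions for illustration.

```python
# Hypothetical sketch of the signal -> metric -> threshold -> alert flow described above.
# This is NOT the Azure Machine Learning monitoring SDK; names and values are illustrative.
from dataclasses import dataclass
from statistics import mean


@dataclass
class MetricThreshold:
    metric: str      # for example, "coherence"
    minimum: float   # alert if the aggregated score drops below this value


def evaluate_signal(scores_by_metric: dict[str, list[float]],
                    thresholds: list[MetricThreshold]) -> list[str]:
    """Aggregate per-instance scores for a monitoring signal and report violated thresholds."""
    alerts = []
    for t in thresholds:
        scores = scores_by_metric.get(t.metric, [])
        aggregate = mean(scores) if scores else 0.0
        if aggregate < t.minimum:
            alerts.append(f"{t.metric}: aggregate {aggregate:.2f} is below the threshold {t.minimum}")
    return alerts


# Example: coherence scored per instance on a 1-5 scale; alert if the average falls below 4.
print(evaluate_signal({"coherence": [5.0, 4.0, 2.0, 3.0]},
                      [MetricThreshold("coherence", 4.0)]))
```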
@@ -46,9 +46,9 @@ The relevance metric measures the extent to which the model's generated response
 ## Coherence
 Coherence evaluates how well the language model can produce output that flows smoothly, reads naturally, and resembles human-like language. How well does the bot communicate its messages in a brief and clear way, using simple and appropriate language and avoiding unnecessary or confusing information? How easy is it for the user to understand and follow the bot responses, and how well do they match the user's needs and expectations?
 - **Use it when:** You would like to test the readability and user-friendliness of your model's generated responses in real-world applications.
-- **How to read it:** If the model's answers are highly coherent, it indicates that the AI system generates seamless, well-structured text with smooth transitions. Consistent context throughout the text enhances readability and understanding. Low coherence means that the quality of the sentences in a model's predicted answer is poor, and they don't fit together naturally. The generated text may lack a logical flow, and the sentences may appear disjointed, making it challenging for readers to understand the overall context or intended message. Answers are scored in their clarity, brevity, appropriate language, and ability to match defined user needs and expectations
+- **How to read it:** If the model's answers are highly coherent, it indicates that the AI system generates seamless, well-structured text with smooth transitions. Consistent context throughout the text enhances readability and understanding. Low coherence means that the quality of the sentences in a model's predicted answer is poor, and they don't fit together naturally. The generated text might lack a logical flow, and the sentences might appear disjointed, making it challenging for readers to understand the overall context or intended message. Answers are scored in their clarity, brevity, appropriate language, and ability to match defined user needs and expectations
 - **Scale:**
-  - 1 = "incoherent": suggests that the quality of the sentences in a model's predicted answer is poor, and they don't fit together naturally. The generated text may lack a logical flow, and the sentences may appear disjointed, making it challenging for readers to understand the overall context or intended message.
+  - 1 = "incoherent": suggests that the quality of the sentences in a model's predicted answer is poor, and they don't fit together naturally. The generated text might lack a logical flow, and the sentences might appear disjointed, making it challenging for readers to understand the overall context or intended message.
   - 5 = "perfectly coherent": suggests that the AI system generates seamless, well-structured text with smooth transitions and consistent context throughout the text that enhances readability and understanding.

 ## Fluency
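
Following the 1-5 coherence scale in the hunk above (1 = incoherent, 5 = perfectly coherent), one way to read scores in aggregate is to track the share of responses that reach an acceptable per-instance score. A small hypothetical sketch; the cutoff of 4 is an assumption for illustration, not a documented default.

```python
# Hypothetical helper for reading 1-5 coherence scores in aggregate.
# The acceptable cutoff of 4 is an illustrative assumption, not a documented default.
def coherence_pass_rate(scores: list[int], acceptable: int = 4) -> float:
    """Return the fraction of responses whose coherence score meets the cutoff."""
    if not scores:
        return 0.0
    return sum(1 for score in scores if score >= acceptable) / len(scores)


print(coherence_pass_rate([5, 4, 2, 5, 3]))  # 0.6 -> 60% of responses read as coherent
```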
