
Commit 4b66a66

Merge pull request #1450 from lgayhardt/eval3

AI Studio Eval: Fix screenshots

2 parents: 10351b0 + 4e97561

8 files changed: 10 additions, 8 deletions

articles/ai-studio/how-to/evaluate-generative-ai-app.md

Lines changed: 8 additions & 6 deletions
```diff
@@ -33,9 +33,9 @@ An evaluation run allows you to generate metric outputs for each data row in you
 
 ### From the evaluate page
 
-From the collapsible left menu, select **Evaluation** > **+ New evaluation**.
+From the collapsible left menu, select **Evaluation** > **+ Create a new evaluation**.
 
-:::image type="content" source="../media/evaluations/evaluate/evaluation-list-new-evaluation.png" alt-text="Screenshot of the button to create a new evaluation." lightbox="../media/evaluations/evaluate/evaluation-list-new-evaluation.png":::
+:::image type="content" source="../media/evaluations/evaluate/create-new-evaluation.png" alt-text="Screenshot of the button to create a new evaluation." lightbox="../media/evaluations/evaluate/create-new-evaluation.png":::
 
 ### From the model catalog page
 
```

```diff
@@ -45,9 +45,9 @@ From the collapsible left menu, select **Model catalog** > go to specific model
 
 ### From the flow page
 
-From the collapsible left menu, select **Prompt flow** > **Evaluate** > **Built-in evaluation**.
+From the collapsible left menu, select **Prompt flow** > **Evaluate** > **Automated evaluation**.
 
-:::image type="content" source="../media/evaluations/evaluate/new-evaluation-flow-page.png" alt-text="Screenshot of how to select builtin evaluation." lightbox="../media/evaluations/evaluate/new-evaluation-flow-page.png":::
+:::image type="content" source="../media/evaluations/evaluate/automated-evaluation.png" alt-text="Screenshot of how to select builtin evaluation." lightbox="../media/evaluations/evaluate/automated-evaluation.png":::
 
 #### Evaluation target
 
```

```diff
@@ -97,6 +97,8 @@ We support three types of metrics curated by Microsoft to facilitate a comprehen
 - AI quality (NLP): These NLP metrics are mathematical based, and they also evaluate the overall quality of the generated content. They often require ground truth data, but they don't require model deployment as judge.
 - Risk and safety metrics: These metrics focus on identifying potential content risks and ensuring the safety of the generated content.
 
+:::image type="content" source="../media/evaluations/evaluate/select-metric-category.png" alt-text="Screenshot of the Choose what you'd like to evaluate with AI quality and safety selected." lightbox="../media/evaluations/evaluate/select-metric-category.png":::
+
 You can refer to the table for the complete list of metrics we offer support for in each scenario. For more in-depth information on each metric definition and how it's calculated, see [Evaluation and monitoring metrics](../concepts/evaluation-metrics-built-in.md).
 
 | AI quality (AI assisted) | AI quality (NLP) | Risk and safety metrics |
```
```diff
@@ -105,11 +107,11 @@ You can refer to the table for the complete list of metrics we offer support for
 
 When running AI assisted quality evaluation, you must specify a GPT model for the calculation process. Choose an Azure OpenAI connection and a deployment with either GPT-3.5, GPT-4, or the Davinci model for our calculations.
 
-:::image type="content" source="../media/evaluations/evaluate/select-metrics-ai-quality-nlp.png" alt-text="Screenshot of the AI quality (AI assisted) with groundedness, relevance, and coherence metrics selected when creating a new evaluation." lightbox="../media/evaluations/evaluate/select-metrics-ai-quality-nlp.png":::
+:::image type="content" source="../media/evaluations/evaluate/select-metrics-ai-quality-ai-assisted.png" alt-text="Screenshot of the AI quality (AI assisted) with groundedness, relevance, and coherence metrics selected when creating a new evaluation." lightbox="../media/evaluations/evaluate/select-metrics-ai-quality-ai-assisted.png":::
 
 AI Quality (NLP) metrics are mathematically based measurements that assess your application's performance. They often require ground truth data for calculation. ROUGE is a family of metrics. You can select the ROUGE type to calculate the scores. Various types of ROUGE metrics offer ways to evaluate the quality of text generation. ROUGE-N measures the overlap of n-grams between the candidate and reference texts.
 
-:::image type="content" source="../media/evaluations/evaluate/select-metrics-ai-quality-ai-assisted.png" alt-text="Screenshot of the AI quality (NLP) with groundedness, relevance, and coherence metrics selected when creating a new evaluation." lightbox="../media/evaluations/evaluate/select-metrics-ai-quality-ai-assisted.png":::
+:::image type="content" source="../media/evaluations/evaluate/select-metrics-ai-quality-nlp.png" alt-text="Screenshot of the AI quality (NLP) with groundedness, relevance, and coherence metrics selected when creating a new evaluation." lightbox="../media/evaluations/evaluate/select-metrics-ai-quality-nlp.png":::
 
 For risk and safety metrics, you don't need to provide a connection and deployment. The Azure AI Studio safety evaluations back-end service provisions a GPT-4 model that can generate content risk severity scores and reasoning to enable you to evaluate your application for content harms.
 
```
articles/ai-studio/how-to/evaluate-results.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -58,11 +58,11 @@ We break down the aggregate views with different types of your metrics by AI Qua
 - For AI Quality (NLP) metrics, we show histogram of the metric distribution between 0 and 1. We aggregate by calculating an average across all the scores for each metric.
 :::image type="content" source="../media/evaluations/view-results/ai-quality-nlp-chart.png" alt-text="Screenshot of AI Quality (NLP) dashboard tab." lightbox="../media/evaluations/view-results/ai-quality-nlp-chart.png":::
 - For custom metrics, you can select **Add custom chart**, to create a custom chart with your chosen metrics or to view a metric against selected input parameters.
-:::image type="content" source="../media/evaluations/view-results/custom-chart-pop-up.png" alt-text="Screenshot of create custom chart pop up." lightbox="../media/evaluations/view-results/custom-chart-pop-up.png":::
+:::image type="content" source="../media/evaluations/view-results/custom-chart-pop-up.png" alt-text="Screenshot of create custom chart pop-up." lightbox="../media/evaluations/view-results/custom-chart-pop-up.png":::
 
 You can also customize existing charts for built-in metrics by changing the chart type.
 
-:::image type="content" source="../media/evaluations/view-results/custom-chart-pop-up.png" alt-text="Screenshot of changing chart type." lightbox="../media/evaluations/view-results/custom-chart-pop-up.png":::
+:::image type="content" source="../media/evaluations/view-results/change-chart-type.png" alt-text="Screenshot of changing chart type." lightbox="../media/evaluations/view-results/change-chart-type.png":::
 
 ### Detailed metrics result table
 
```
Six binary screenshot files changed (sizes as listed in the diff viewer: 196 KB, 264 KB, not shown, −672 KB, not shown, 165 KB).
