Commit fd48a30

Merge pull request #46096 from dotnet/main
Merge main into live
2 parents 4d51f31 + 19286ea commit fd48a30

15 files changed (+149, -232 lines)

.openpublishing.redirection.ai.json

Lines changed: 4 additions & 0 deletions
@@ -55,6 +55,10 @@
 {
 "source_path_from_root": "/docs/ai/quickstarts/quickstart-openai-summarize-text.md",
 "redirect_url": "/dotnet/ai/quickstarts/prompt-model"
+},
+{
+"source_path_from_root": "/docs/ai/tutorials/llm-eval.md",
+"redirect_url": "/dotnet/ai/quickstarts/evaluate-ai-response"
 }
 ]
 }

docs/ai/conceptual/evaluation-libraries.md

Lines changed: 22 additions & 9 deletions
@@ -2,7 +2,7 @@
 title: The Microsoft.Extensions.AI.Evaluation libraries
 description: Learn about the Microsoft.Extensions.AI.Evaluation libraries, which simplify the process of evaluating the quality and accuracy of responses generated by AI models in .NET intelligent apps.
 ms.topic: concept-article
-ms.date: 03/18/2025
+ms.date: 05/09/2025
 ---
 # The Microsoft.Extensions.AI.Evaluation libraries (Preview)

@@ -11,7 +11,8 @@ The Microsoft.Extensions.AI.Evaluation libraries (currently in preview) simplify
 The evaluation libraries, which are built on top of the [Microsoft.Extensions.AI abstractions](../microsoft-extensions-ai.md), are composed of the following NuGet packages:

 - [📦 Microsoft.Extensions.AI.Evaluation](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation) – Defines the core abstractions and types for supporting evaluation.
-- [📦 Microsoft.Extensions.AI.Evaluation.Quality](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation.Quality) – Contains evaluators that assess the quality of LLM responses in an app according to metrics such as relevance, fluency, coherence, and truthfulness.
+- [📦 Microsoft.Extensions.AI.Evaluation.Quality](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation.Quality) – Contains evaluators that assess the quality of LLM responses in an app according to metrics such as relevance and completeness. These evaluators use the LLM directly to perform evaluations.
+- [📦 Microsoft.Extensions.AI.Evaluation.Safety](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation.Safety) – Contains evaluators, such as the `ProtectedMaterialEvaluator` and `ContentHarmEvaluator`, that use the [Azure AI Foundry](/azure/ai-foundry/) Evaluation service to perform evaluations.
 - [📦 Microsoft.Extensions.AI.Evaluation.Reporting](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation.Reporting) – Contains support for caching LLM responses, storing the results of evaluations, and generating reports from that data.
 - [📦 Microsoft.Extensions.AI.Evaluation.Reporting.Azure](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation.Reporting.Azure) - Supports the reporting library with an implementation for caching LLM responses and storing the evaluation results in an [Azure Storage](/azure/storage/common/storage-introduction) container.
 - [📦 Microsoft.Extensions.AI.Evaluation.Console](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation.Console) – A command-line tool for generating reports and managing evaluation data.
@@ -24,13 +25,25 @@ The libraries are designed to integrate smoothly with existing .NET apps, allowi

 The evaluation libraries were built in collaboration with data science researchers from Microsoft and GitHub, and were tested on popular Microsoft Copilot experiences. The following table shows the built-in evaluators.

-| Metric | Description | Evaluator type |
-|------------------------------------|----------------------------------------------|----------------|
-| Relevance, truth, and completeness | How effectively a response addresses a query | <xref:Microsoft.Extensions.AI.Evaluation.Quality.RelevanceTruthAndCompletenessEvaluator> |
-| Fluency | Grammatical accuracy, vocabulary range, sentence complexity, and overall readability| <xref:Microsoft.Extensions.AI.Evaluation.Quality.FluencyEvaluator> |
-| Coherence | The logical and orderly presentation of ideas | <xref:Microsoft.Extensions.AI.Evaluation.Quality.CoherenceEvaluator> |
-| Equivalence | The similarity between the generated text and its ground truth with respect to a query | <xref:Microsoft.Extensions.AI.Evaluation.Quality.EquivalenceEvaluator> |
-| Groundedness | How well a generated response aligns with the given context | <xref:Microsoft.Extensions.AI.Evaluation.Quality.GroundednessEvaluator> |
+| Metric | Description | Evaluator type |
+|--------------|--------------------------------------------------------|----------------|
+| Relevance | Evaluates how relevant a response is to a query | `RelevanceEvaluator` <!-- <xref:Microsoft.Extensions.AI.Evaluation.Quality.RelevanceEvaluator> --> |
+| Completeness | Evaluates how comprehensive and accurate a response is | `CompletenessEvaluator` <!-- <xref:Microsoft.Extensions.AI.Evaluation.Quality.CompletenessEvaluator> --> |
+| Retrieval | Evaluates performance in retrieving information for additional context | `RetrievalEvaluator` <!-- <xref:Microsoft.Extensions.AI.Evaluation.Quality.RetrievalEvaluator> --> |
+| Fluency | Evaluates grammatical accuracy, vocabulary range, sentence complexity, and overall readability| <xref:Microsoft.Extensions.AI.Evaluation.Quality.FluencyEvaluator> |
+| Coherence | Evaluates the logical and orderly presentation of ideas | <xref:Microsoft.Extensions.AI.Evaluation.Quality.CoherenceEvaluator> |
+| Equivalence | Evaluates the similarity between the generated text and its ground truth with respect to a query | <xref:Microsoft.Extensions.AI.Evaluation.Quality.EquivalenceEvaluator> |
+| Groundedness | Evaluates how well a generated response aligns with the given context | <xref:Microsoft.Extensions.AI.Evaluation.Quality.GroundednessEvaluator><br />`GroundednessProEvaluator` |
+| Protected material | Evaluates response for the presence of protected material | `ProtectedMaterialEvaluator` |
+| Ungrounded human attributes | Evaluates a response for the presence of content that indicates ungrounded inference of human attributes | `UngroundedAttributesEvaluator` |
+| Hate content | Evaluates a response for the presence of content that's hateful or unfair | `HateAndUnfairnessEvaluator`|
+| Self-harm content | Evaluates a response for the presence of content that indicates self harm | `SelfHarmEvaluator`|
+| Violent content | Evaluates a response for the presence of violent content | `ViolenceEvaluator`|
+| Sexual content | Evaluates a response for the presence of sexual content | `SexualEvaluator`|
+| Code vulnerability content | Evaluates a response for the presence of vulnerable code | `CodeVulnerabilityEvaluator` |
+| Indirect attack content | Evaluates a response for the presence of indirect attacks, such as manipulated content, intrusion, and information gathering | `IndirectAttackEvaluator` |
+
+† In addition, the `ContentHarmEvaluator` provides single-shot evaluation for the four metrics supported by `HateAndUnfairnessEvaluator`, `SelfHarmEvaluator`, `ViolenceEvaluator`, and `SexualEvaluator`.

 You can also customize to add your own evaluations by implementing the <xref:Microsoft.Extensions.AI.Evaluation.IEvaluator> interface or extending the base classes such as <xref:Microsoft.Extensions.AI.Evaluation.Quality.ChatConversationEvaluator> and <xref:Microsoft.Extensions.AI.Evaluation.Quality.SingleNumericMetricEvaluator>.

docs/ai/snippets/microsoft-extensions-ai/ConsoleAI.AddMessages/Program.cs

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@
 client.GetStreamingResponseAsync(history))
 {
 Console.Write(update);
+updates.Add(update);
 }
 Console.WriteLine();
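
The one-line change in this hunk keeps each streamed update as well as printing it, presumably so the full reply can be added back to the chat history afterwards (the snippet's folder name, `ConsoleAI.AddMessages`, suggests as much). The pattern can be sketched without any AI packages. In this minimal, self-contained sketch, `FakeStreamAsync` is a hypothetical stand-in for `client.GetStreamingResponseAsync(history)`, and plain strings stand in for the real update type; neither comes from the commit.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class StreamingDemo
{
    // Hypothetical stand-in for client.GetStreamingResponseAsync(history):
    // yields a reply in small pieces.
    public static async IAsyncEnumerable<string> FakeStreamAsync()
    {
        foreach (string piece in new[] { "Hello", ", ", "world" })
        {
            await Task.Yield(); // simulate asynchronous arrival
            yield return piece;
        }
    }

    // Prints each piece as it arrives and also keeps it, mirroring the
    // Console.Write(update) / updates.Add(update) pair in the hunk above.
    public static async Task<List<string>> CollectAsync()
    {
        List<string> updates = new();
        await foreach (string update in FakeStreamAsync())
        {
            Console.Write(update);
            updates.Add(update);
        }
        Console.WriteLine();
        return updates;
    }

    public static async Task Main()
    {
        List<string> updates = await CollectAsync();
        // Without updates.Add(update), this list would stay empty and there
        // would be nothing to hand to later code once the stream ends.
        Console.WriteLine($"{updates.Count} updates: {string.Concat(updates)}");
    }
}
```

Collecting inside the same loop that prints means the stream is enumerated only once, which matters when each enumeration would trigger a new model call.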

docs/ai/toc.yml

Lines changed: 0 additions & 2 deletions
@@ -85,8 +85,6 @@ items:
 href: quickstarts/evaluate-ai-response.md
 - name: "Tutorial: Evaluate a response with response caching and reporting"
 href: tutorials/evaluate-with-reporting.md
-- name: "Tutorial: Evaluate LLM prompt completions"
-href: tutorials/llm-eval.md
 - name: Resources
 items:
 - name: API reference

docs/ai/tutorials/evaluate-with-reporting.md

Lines changed: 22 additions & 22 deletions
@@ -1,14 +1,14 @@
 ---
 title: Tutorial - Evaluate a model's response
 description: Create an MSTest app and add a custom evaluator to evaluate the AI chat response of a language model, and learn how to use the caching and reporting features of Microsoft.Extensions.AI.Evaluation.
-ms.date: 03/14/2025
+ms.date: 05/09/2025
 ms.topic: tutorial
 ms.custom: devx-track-dotnet-ai
 ---

 # Tutorial: Evaluate a model's response with response caching and reporting

-In this tutorial, you create an MSTest app to evaluate the chat response of an OpenAI model. The test app uses the [Microsoft.Extensions.AI.Evaluation](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation) libraries to perform the evaluations, cache the model responses, and create reports. The tutorial uses both a [built-in evaluator](xref:Microsoft.Extensions.AI.Evaluation.Quality.RelevanceTruthAndCompletenessEvaluator) and a custom evaluator.
+In this tutorial, you create an MSTest app to evaluate the chat response of an OpenAI model. The test app uses the [Microsoft.Extensions.AI.Evaluation](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation) libraries to perform the evaluations, cache the model responses, and create reports. The tutorial uses both built-in and custom evaluators.

 ## Prerequisites

@@ -25,32 +25,32 @@ Complete the following steps to create an MSTest project that connects to the `g

 1. In a terminal window, navigate to the directory where you want to create your app, and create a new MSTest app with the `dotnet new` command:

-```dotnetcli
-dotnet new mstest -o TestAIWithReporting
-```
+```dotnetcli
+dotnet new mstest -o TestAIWithReporting
+```

 1. Navigate to the `TestAIWithReporting` directory, and add the necessary packages to your app:

-```dotnetcli
-dotnet add package Azure.AI.OpenAI
-dotnet add package Azure.Identity
-dotnet add package Microsoft.Extensions.AI.Abstractions --prerelease
-dotnet add package Microsoft.Extensions.AI.Evaluation --prerelease
-dotnet add package Microsoft.Extensions.AI.Evaluation.Quality --prerelease
-dotnet add package Microsoft.Extensions.AI.Evaluation.Reporting --prerelease
-dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
-dotnet add package Microsoft.Extensions.Configuration
-dotnet add package Microsoft.Extensions.Configuration.UserSecrets
-```
+```dotnetcli
+dotnet add package Azure.AI.OpenAI
+dotnet add package Azure.Identity
+dotnet add package Microsoft.Extensions.AI.Abstractions --prerelease
+dotnet add package Microsoft.Extensions.AI.Evaluation --prerelease
+dotnet add package Microsoft.Extensions.AI.Evaluation.Quality --prerelease
+dotnet add package Microsoft.Extensions.AI.Evaluation.Reporting --prerelease
+dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
+dotnet add package Microsoft.Extensions.Configuration
+dotnet add package Microsoft.Extensions.Configuration.UserSecrets
+```

 1. Run the following commands to add [app secrets](/aspnet/core/security/app-secrets) for your Azure OpenAI endpoint, model name, and tenant ID:

-```bash
-dotnet user-secrets init
-dotnet user-secrets set AZURE_OPENAI_ENDPOINT <your-azure-openai-endpoint>
-dotnet user-secrets set AZURE_OPENAI_GPT_NAME gpt-4o
-dotnet user-secrets set AZURE_TENANT_ID <your-tenant-id>
-```
+```bash
+dotnet user-secrets init
+dotnet user-secrets set AZURE_OPENAI_ENDPOINT <your-azure-openai-endpoint>
+dotnet user-secrets set AZURE_OPENAI_GPT_NAME gpt-4o
+dotnet user-secrets set AZURE_TENANT_ID <your-tenant-id>
+```

 (Depending on your environment, the tenant ID might not be needed. In that case, remove it from the code that instantiates the <xref:Azure.Identity.DefaultAzureCredential>.)
