docs/ai/conceptual/evaluation-libraries.md (2 additions, 2 deletions)
@@ -2,7 +2,7 @@
  title: The Microsoft.Extensions.AI.Evaluation libraries
  description: Learn about the Microsoft.Extensions.AI.Evaluation libraries, which simplify the process of evaluating the quality and accuracy of responses generated by AI models in .NET intelligent apps.
  ms.topic: concept-article
- ms.date: 02/19/2025
+ ms.date: 03/18/2025
  ---
  # The Microsoft.Extensions.AI.Evaluation libraries (Preview)
@@ -44,7 +44,7 @@ The library contains support for storing evaluation results and generating reports

  :::image type="content" source="../media/ai-extensions/pipeline-report.jpg" lightbox="../media/ai-extensions/pipeline-report.jpg" alt-text="Screenshot of an AI evaluation report in an Azure DevOps pipeline.":::

- The `dotnet aieval` tool, which ships as part of the `Microsoft.Extensions.AI.Evaluation.Console` package, also includes functionality for generating reports and managing the stored evaluation data and cached responses.
+ The `dotnet aieval` tool, which ships as part of the `Microsoft.Extensions.AI.Evaluation.Console` package, includes functionality for generating reports and managing the stored evaluation data and cached responses. For more information, see [Generate a report](../tutorials/evaluate-with-reporting.md#generate-a-report).
(Second changed file in the diff: the model-evaluation quickstart article; its file name isn't shown on this page.)

- In this quickstart, you create an MSTest app to evaluate the chat response of a model. The test app uses the [Microsoft.Extensions.AI.Evaluation](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation) libraries.
+ In this quickstart, you create an MSTest app to evaluate the chat response of an OpenAI model. The test app uses the [Microsoft.Extensions.AI.Evaluation](https://www.nuget.org/packages/Microsoft.Extensions.AI.Evaluation) libraries.

  > [!NOTE]
  > This quickstart demonstrates the simplest usage of the evaluation API. Notably, it doesn't demonstrate use of the [response caching](../conceptual/evaluation-libraries.md#cached-responses) and [reporting](../conceptual/evaluation-libraries.md#reporting) functionality, which are important if you're authoring unit tests that run as part of an "offline" evaluation pipeline. The scenario shown in this quickstart is suitable for use cases such as "online" evaluation of AI responses within production code and logging scores to telemetry, where caching and reporting aren't relevant. For a tutorial that demonstrates the caching and reporting functionality, see [Tutorial: Evaluate a model's response with response caching and reporting](../tutorials/evaluate-with-reporting.md).
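For orientation, the "online" pattern that this note describes can be shown in a short sketch. The following is an illustration, not the quickstart's code; it assumes the preview API shape of `CoherenceEvaluator` and `NumericMetric` from the `Microsoft.Extensions.AI.Evaluation.Quality` package, and the method parameters (`chatClient`, `messages`, `response`, `logger`) stand in for objects that production code would already have on hand.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;
using Microsoft.Extensions.Logging;

public static class OnlineEvaluation
{
    // Scores an already-produced response and logs the score to telemetry,
    // rather than asserting on it the way an offline unit test would.
    public static async Task ScoreCoherenceAsync(
        IChatClient chatClient,      // the "judge" client the evaluator talks to
        IList<ChatMessage> messages, // the conversation that produced the response
        ChatResponse response,       // the response being scored
        ILogger logger)
    {
        var chatConfig = new ChatConfiguration(chatClient);
        IEvaluator evaluator = new CoherenceEvaluator();

        EvaluationResult result =
            await evaluator.EvaluateAsync(messages, response, chatConfig);

        NumericMetric coherence =
            result.Get<NumericMetric>(CoherenceEvaluator.CoherenceMetricName);
        logger.LogInformation("Coherence score: {Score}", coherence.Value);
    }
}
```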
  ## Prerequisites

- - [Install .NET 8.0](https://dotnet.microsoft.com/download) or a later version
- - [Install Ollama](https://ollama.com/) locally on your machine
+ - [.NET 8 or a later version](https://dotnet.microsoft.com/download)
  - [Visual Studio Code](https://code.visualstudio.com/) (optional)

- ## Run the local AI model
+ ## Configure the AI service

- Complete the following steps to configure and run a local AI model on your device. For this quickstart, you'll use the general-purpose `phi3:mini` model, which is a small but capable generative AI created by Microsoft.
-
- 1. Open a terminal window and verify that Ollama is available on your device:
-
-    ```bash
-    ollama
-    ```
-
-    If Ollama is available, it displays a list of available commands.
-
- 1. Start Ollama:
-
-    ```bash
-    ollama serve
-    ```
-
-    If Ollama starts successfully, it displays server log output.
-
- 1. Pull the `phi3:mini` model from the Ollama registry and wait for it to download:
-
-    ```bash
-    ollama pull phi3:mini
-    ```
-
- 1. After the download completes, run the model:
-
-    ```bash
-    ollama run phi3:mini
-    ```
-
-    Ollama starts the `phi3:mini` model and provides a prompt for you to interact with it.
+ To provision an Azure OpenAI service and model using the Azure portal, complete the steps in the [Create and deploy an Azure OpenAI Service resource](/azure/ai-services/openai/how-to/create-resource?pivots=web-portal) article. In the "Deploy a model" step, select the `gpt-4o` model.

  ## Create the test app
@@ -66,21 +35,32 @@ Complete the following steps to create an MSTest project that connects to your l…
  1. Navigate to the `TestAI` directory, and add the necessary packages to your app:
- 1. Open the new app in your editor of choice, such as Visual Studio Code.
+ 1. Run the following commands to add [app secrets](/aspnet/core/security/app-secrets) for your Azure OpenAI endpoint, model name, and tenant ID:
-    ```dotnetcli
-    code .
+    ```bash
+    dotnet user-secrets init
+    dotnet user-secrets set AZURE_OPENAI_ENDPOINT <your-azure-openai-endpoint>
+    dotnet user-secrets set AZURE_OPENAI_GPT_NAME gpt-4o
+    dotnet user-secrets set AZURE_TENANT_ID <your-tenant-id>
     ```
+
+    (Depending on your environment, the tenant ID might not be needed. In that case, remove it from the code that instantiates the <xref:Azure.Identity.DefaultAzureCredential>.)
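Picking up the tenant-ID parenthetical above, the credential construction it alludes to might look like the following sketch. This is an assumption, not the quickstart's code: it presumes the project references `Azure.Identity` and `Microsoft.Extensions.Configuration.UserSecrets`, and that the secret keys match the `dotnet user-secrets set` commands exactly.

```csharp
using Azure.Identity;
using Microsoft.Extensions.Configuration;

internal sealed class CredentialHelper
{
    // Builds a DefaultAzureCredential, honoring the optional AZURE_TENANT_ID
    // user secret. If no tenant ID was stored, the hint is simply omitted.
    public static DefaultAzureCredential Create()
    {
        IConfigurationRoot config = new ConfigurationBuilder()
            .AddUserSecrets<CredentialHelper>()
            .Build();

        string? tenantId = config["AZURE_TENANT_ID"];
        return tenantId is null
            ? new DefaultAzureCredential()
            : new DefaultAzureCredential(
                new DefaultAzureCredentialOptions { TenantId = tenantId });
    }
}
```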
+
+ 1. Open the new app in your editor of choice.

  ## Add the test app code

- 1. Rename the file *Test1.cs* to *MyTests.cs*, and then open the file and rename the class to `MyTests`.
+ 1. Rename the *Test1.cs* file to *MyTests.cs*, and then open the file and rename the class to `MyTests`.
  1. Add the private <xref:Microsoft.Extensions.AI.Evaluation.ChatConfiguration> and chat message and response members to the `MyTests` class. The `s_messages` field is a list that contains two <xref:Microsoft.Extensions.AI.ChatMessage> objects—one instructs the behavior of the chat bot, and the other is the question from the user.
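The members that this step describes might look roughly like the sketch below. The field names follow the `s_` convention the step mentions, but the system prompt and question strings are placeholders, since the article's actual text isn't shown in this diff.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public sealed class MyTests
{
    // Shared configuration wrapping the IChatClient that the evaluator uses.
    private static ChatConfiguration? s_chatConfiguration;

    // Two messages: one sets the chat bot's behavior, one is the user's question.
    private static readonly IList<ChatMessage> s_messages =
    [
        new ChatMessage(ChatRole.System,
            "You're an AI assistant that answers questions concisely."), // placeholder
        new ChatMessage(ChatRole.User,
            "What's the tallest mountain in the world?")                 // placeholder
    ];

    // The response under evaluation, fetched once during initialization.
    private static ChatResponse? s_response;
}
```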
@@ -95,7 +75,7 @@ Complete the following steps to create an MSTest project that connects to your l…
  - Sets the <xref:Microsoft.Extensions.AI.ChatOptions>, including the <xref:Microsoft.Extensions.AI.ChatOptions.Temperature> and the <xref:Microsoft.Extensions.AI.ChatOptions.ResponseFormat>.
  - Fetches the response to be evaluated by calling <xref:Microsoft.Extensions.AI.IChatClient.GetResponseAsync(System.Collections.Generic.IEnumerable{Microsoft.Extensions.AI.ChatMessage},Microsoft.Extensions.AI.ChatOptions,System.Threading.CancellationToken)>, and stores it in a static variable.
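Taken together, those bullets suggest an MSTest class-initialization method along these lines. This sketch assumes the fields from the earlier step, the `GetAzureOpenAIChatConfiguration` helper added in the next step, and arbitrary option values; the article's actual listing may differ.

```csharp
// Inside the MyTests class, after the fields from the earlier step:
[ClassInitialize]
public static async Task InitializeAsync(TestContext _)
{
    s_chatConfiguration = GetAzureOpenAIChatConfiguration();

    var chatOptions = new ChatOptions
    {
        Temperature = 0.0f,                      // keep answers as repeatable as possible
        ResponseFormat = ChatResponseFormat.Text // request plain-text output
    };

    // Fetch the response to evaluate once; the test methods then score it.
    s_response = await s_chatConfiguration.ChatClient.GetResponseAsync(
        s_messages, chatOptions);
}
```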
- 1. Add the `GetOllamaChatConfiguration` method, which creates the <xref:Microsoft.Extensions.AI.IChatClient> that the evaluator uses to communicate with the model.
+ 1. Add the `GetAzureOpenAIChatConfiguration` method, which creates the <xref:Microsoft.Extensions.AI.IChatClient> that the evaluator uses to communicate with the model.
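A plausible shape for that helper, sketched under stated assumptions: the endpoint and deployment name come from the user secrets set earlier, the `Azure.AI.OpenAI` package provides `AzureOpenAIClient`, the `CredentialHelper` from the earlier sketch supplies the credential, and the `AsIChatClient()` extension comes from `Microsoft.Extensions.AI.OpenAI` (earlier previews named it `AsChatClient()`). The article's exact code may differ.

```csharp
// Usings for MyTests.cs:
using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.Configuration;

// Inside the MyTests class:
private static ChatConfiguration GetAzureOpenAIChatConfiguration()
{
    // Read the values stored with `dotnet user-secrets set`.
    IConfigurationRoot config = new ConfigurationBuilder()
        .AddUserSecrets<MyTests>()
        .Build();
    string endpoint = config["AZURE_OPENAI_ENDPOINT"]!;
    string deployment = config["AZURE_OPENAI_GPT_NAME"]!;

    // Wrap the Azure OpenAI chat client in the IChatClient abstraction
    // that the evaluators consume.
    IChatClient client =
        new AzureOpenAIClient(new Uri(endpoint), CredentialHelper.Create())
            .GetChatClient(deployment)
            .AsIChatClient();

    return new ChatConfiguration(client);
}
```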
@@ -116,4 +96,5 @@ Run the test using your preferred test workflow, for example, by using the CLI c…
  ## Next steps

- Next, try evaluating against different models to see if the results change. Then, check out the extensive examples in the [dotnet/ai-samples repo](https://github.com/dotnet/ai-samples/blob/main/src/microsoft-extensions-ai-evaluation/api/) to see how to invoke multiple evaluators, add additional context, invoke a custom evaluator, attach diagnostics, or change the default interpretation of metrics.
+ - Evaluate the responses from different OpenAI models.
+ - Add response caching and reporting to your evaluation code. For more information, see [Tutorial: Evaluate a model's response with response caching and reporting](../tutorials/evaluate-with-reporting.md).