Prepare the data according to the [input data requirements for built-in evaluators](./evaluate-sdk.md#data-requirements-for-built-in-evaluators). For example, in text evaluation, prepare a `"./evaluate_test_data.jsonl"` file that contains single-turn data inputs like this:
```json
{"query":"What is the capital of France?","response":"Paris."}
{"query":"What atoms compose water?","response":"Hydrogen and oxygen."}
{"query":"What color is my shirt?","response":"Blue."}
```
or contains conversation data like this:
```json
{"conversation":
    {
        "messages": [
            {
                "content": "Which tent is the most waterproof?",
                "role": "user"
            },
            {
                "content": "The Alpine Explorer Tent is the most waterproof",
                "role": "assistant",
                "context": "From our product list, the Alpine Explorer Tent is the most waterproof. The Adventure Dining Table has a higher weight."
            },
            {
                "content": "How much does it cost?",
                "role": "user"
            },
            {
                "content": "The Alpine Explorer Tent is $120.",
                "role": "assistant",
                "context": null
            }
        ]
    }
}
```
To learn more about input data formats for evaluating GenAI applications, see [single-turn data](./evaluate-sdk.md#single-turn-support-for-text), [conversation data](./evaluate-sdk.md#conversation-support-for-text), and [conversation data for images and multi-modalities](./evaluate-sdk.md#conversation-support-for-images-and-multi-modal-text-and-image).
To learn more about input data formats for evaluating agents, see [evaluating Azure AI agents](./agent-evaluate-sdk.md#evaluate-azure-ai-agents) and [evaluating other agents](./agent-evaluate-sdk.md#evaluating-other-agents).
We provide two ways to register your data in your Azure AI project, which is required for evaluations in the cloud:
- Uploading new datasets to your Project:

    - **From SDK**: Upload new data from your local directory to your Azure AI project in the SDK, and fetch the dataset ID as a result (see the sketch after this list).
    - **From UI**: Alternatively, you can upload new data or update existing data versions by following the UI walkthrough under the **Data** tab of your Azure AI project.

- Specifying existing datasets uploaded to your Project:

    - **From SDK**: If you already know the dataset name you created, construct the dataset ID in this format: `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>/data/<dataset-name>/versions/<version-number>`
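As a minimal sketch of the SDK upload path, assuming the `azure-ai-projects` preview package; the endpoint variable and the dataset name and version are placeholders, and the exact `datasets.upload_file` signature can differ between preview releases:

```python
import os

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

# Connect to your Azure AI project (the endpoint value is a placeholder).
project_client = AIProjectClient(
    endpoint=os.environ["PROJECT_ENDPOINT"],
    credential=DefaultAzureCredential(),
)

# Upload the local JSONL file as a new dataset version, and keep its ID
# to reference when submitting the cloud evaluation.
dataset = project_client.datasets.upload_file(
    name="evaluate-test-data",  # placeholder dataset name
    version="1",                # placeholder version
    file_path="./evaluate_test_data.jsonl",
)
data_id = dataset.id
```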
After logging your custom evaluator to your Azure AI project, you can view it in your [Evaluator library](../evaluate-generative-ai-app.md#view-and-manage-the-evaluators-in-the-evaluator-library) under the **Evaluation** tab of your Azure AI project.
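As a sketch of the registration step referenced above, assuming a recent `azure-ai-ml` version that exposes `MLClient.evaluators` and a locally saved evaluator flow; the identifiers, path, and evaluator name are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model

# Connect to the Azure AI project using workspace-style identifiers (placeholders).
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<project-name>",
)

# Register the locally saved evaluator as a model asset so it appears
# in the evaluator library under the Evaluation tab.
custom_evaluator = Model(
    path="./friendliness_evaluator",  # placeholder path to the saved flow
    name="FriendlinessEvaluator",
    description="Measures the friendliness of a response.",
)
registered_evaluator = ml_client.evaluators.create_or_update(custom_evaluator)
```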
## Submit a cloud evaluation
Putting the previous code together, you can now submit a cloud evaluation with the Azure AI Projects SDK client library via a Python API. See the following example, which specifies an NLP evaluator (F1 score), AI-assisted quality and safety evaluators (Relevance and Violence), and a custom evaluator (Friendliness) with their [evaluator IDs](#specifying-evaluators-from-evaluator-library):
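The following is a minimal sketch assuming the `azure-ai-projects` preview package, reusing `project_client` and `data_id` from the earlier steps; the model class and enum member names (`Evaluation`, `InputDataset`, `EvaluatorConfiguration`, `EvaluatorIds`), the deployment name, and the Friendliness evaluator ID are assumptions that can vary between preview releases:

```python
import os

from azure.ai.projects.models import (
    Evaluation,
    InputDataset,
    EvaluatorConfiguration,
    EvaluatorIds,
)

# ID of the custom evaluator registered earlier (placeholder).
friendliness_evaluator_id = "<friendliness-evaluator-id>"

# Assemble the evaluation: the registered dataset plus a map of evaluators.
evaluation = Evaluation(
    display_name="Cloud evaluation",
    description="Evaluation of single-turn test data",
    data=InputDataset(id=data_id),
    evaluators={
        # NLP evaluator: F1 score requires no model deployment.
        "f1_score": EvaluatorConfiguration(id=EvaluatorIds.F1_SCORE.value),
        # AI-assisted quality evaluator, judged by a model deployment (placeholder name).
        "relevance": EvaluatorConfiguration(
            id=EvaluatorIds.RELEVANCE.value,
            init_params={"deployment_name": "gpt-4o"},
        ),
        # AI-assisted safety evaluator, scoped to the project.
        "violence": EvaluatorConfiguration(
            id=EvaluatorIds.VIOLENCE.value,
            init_params={"azure_ai_project": os.environ["PROJECT_ENDPOINT"]},
        ),
        # Custom evaluator from the evaluator library.
        "friendliness": EvaluatorConfiguration(id=friendliness_evaluator_id),
    },
)

# Submit the evaluation to run in the cloud.
evaluation_response = project_client.evaluations.create(evaluation)
print(f"Evaluation created: {evaluation_response.name}")
```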
Following the URI, you're redirected to the Foundry portal to view your evaluation results in your Azure AI project and debug your application. Using the reason fields and pass/fail results, you can better assess the quality and safety performance of your applications. You can run and compare multiple runs to test for regressions or improvements.
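As a sketch of how you might retrieve that URI programmatically, assuming the preview `evaluations.get` operation and that the portal link is surfaced in the response's `properties` bag (the `AiStudioEvaluationUri` property name is an assumption that can vary between preview versions):

```python
import time

# Poll until the cloud evaluation run finishes.
evaluation = project_client.evaluations.get(evaluation_response.name)
while evaluation.status not in ("Completed", "Failed"):
    time.sleep(30)
    evaluation = project_client.evaluations.get(evaluation_response.name)

# Portal link to the results (property name is an assumption).
print(evaluation.properties.get("AiStudioEvaluationUri"))
```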