@@ -51,7 +50,10 @@ Built-in composite evaluators are composed of individual evaluators.
 -`ContentSafetyEvaluator` combines all the safety evaluators for a single output of combined metrics for question and answer pairs
 -`ContentSafetyChatEvaluator` combines all the safety evaluators for a single output of combined metrics for chat messages following the OpenAI message protocol that can be found [here](https://platform.openai.com/docs/api-reference/messages/object#messages/object-content).

-### Required data input for built-in evaluators
+> [!TIP]
+> For more information about inputs and outputs, see the [Prompt flow Python reference documentation](https://microsoft.github.io/promptflow/reference/python-library-reference/promptflow-evals/promptflow.evals.evaluators.html).
+
+### Data requirements for built-in evaluators
 We require question and answer pairs in `.jsonl` format with the required inputs, and column mapping for evaluating datasets, as follows:
+After logging your custom evaluator to your AI project, you can view it in your [Evaluator library](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/evaluate-generative-ai-app#view-and-manage-the-evaluators-in-the-evaluator-library) under the Evaluation tab in AI Studio.
 ### Prompt-based evaluators
 To build your own prompt-based large language model evaluator, you can create a custom evaluator based on a **Prompty** file. A Prompty is a file with the `.prompty` extension for developing prompt templates. The Prompty asset is a markdown file with modified front matter. The front matter is in YAML format and contains metadata fields that define the model configuration and expected inputs of the Prompty. Consider an example `apology.prompty` file that looks like the following:

@@ -252,7 +285,23 @@ Here is the result:
 ```JSON
 {"apology": 0}
 ```
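The Prompty layout described earlier (YAML front matter followed by the prompt body) can be illustrated with a small standard-library sketch that separates the two parts. This is only an illustration: the sample text is hypothetical and is not the real `apology.prompty`, the helper name `split_front_matter` is invented here, and the sketch assumes the front matter is delimited by `---` lines. It does not parse the YAML itself.

```python
# Hypothetical Prompty text for illustration only; not the real apology.prompty.
SAMPLE_PROMPTY = """---
name: Apology Evaluator
inputs:
  question:
    type: string
---
system:
You grade whether the answer contains an apology.
"""


def split_front_matter(text: str) -> tuple[str, str]:
    """Return (front_matter, body) for text that starts with a '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("text does not start with a front matter delimiter")
    # Look for the closing delimiter after the opening one.
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            front = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:])
            return front, body
    raise ValueError("front matter is not closed")


front_matter, body = split_front_matter(SAMPLE_PROMPTY)
print(front_matter)
print(body)
```

Keeping the split separate from YAML parsing means the sketch needs no third-party packages; in practice the SDK loads the Prompty file for you.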
-
+#### Log your custom prompt-based evaluator to your AI Studio project
+After logging your custom evaluator to your AI project, you can view it in your [Evaluator library](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/evaluate-generative-ai-app#view-and-manage-the-evaluators-in-the-evaluator-library) under the Evaluation tab in AI Studio.
 ## Evaluate on test dataset using `evaluate()`
 After you spot-check your built-in or custom evaluators on a single row of data, you can combine multiple evaluators with the `evaluate()` API on an entire test dataset. To ensure that `evaluate()` can correctly parse the data, you must specify column mapping to map the columns from the dataset to the keywords that the evaluators accept. In this case, we specify the data mapping for `ground_truth`.
 ```python
@@ -312,7 +361,9 @@ The evaluator outputs results in a dictionary which contains aggregate `metrics`
 'outputs.relevance.gpt_relevance': 5}],
 'traces': {}}
 ```
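To make the idea of column mapping concrete, here is a minimal sketch of how a mapping from evaluator keywords to dataset columns could be resolved for one row. It is not the SDK's implementation. It assumes references are written as `${data.<column>}`, and the column name `truth`, the helper `apply_column_mapping`, and the sample row are all hypothetical.

```python
import re

# Matches references of the assumed form ${data.<column>}.
_DATA_REF = re.compile(r"^\$\{data\.([A-Za-z_][A-Za-z0-9_]*)\}$")


def apply_column_mapping(row: dict, mapping: dict) -> dict:
    """Build evaluator keyword arguments from one dataset row."""
    kwargs = {}
    for evaluator_key, reference in mapping.items():
        match = _DATA_REF.match(reference)
        if match is None:
            raise ValueError(f"unsupported reference: {reference!r}")
        column = match.group(1)
        if column not in row:
            raise KeyError(f"dataset row has no column {column!r}")
        kwargs[evaluator_key] = row[column]
    return kwargs


# Hypothetical row whose ground-truth column is named "truth".
row = {"question": "What is 2 + 2?", "answer": "4", "truth": "4"}
mapping = {"ground_truth": "${data.truth}"}
mapped = apply_column_mapping(row, mapping)
print(mapped)
```

Failing loudly on a missing column is a deliberate choice in this sketch: a silent default would hide a misnamed column until the metrics look wrong.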
-### Supported data formats for `evaluate()`
+### Requirements for `evaluate()`
+The `evaluate()` API has requirements for the data format it accepts and for evaluator parameter key names. Meeting them ensures that the charts in your AI Studio evaluation results display properly.
+#### Data format
 The `evaluate()` API only accepts data in the JSONLines format. For all built-in evaluators except `ChatEvaluator` and `ContentSafetyChatEvaluator`, `evaluate()` requires data in the following format with the required input fields. See the [previous section on data requirements for built-in evaluators](#data-requirements-for-built-in-evaluators).
 ```json
 {
@@ -360,7 +411,7 @@ To `evaluate()` with either the `ChatEvaluator` or `ContentSafetyChatEvaluator`,
 result = evaluate(
     data="data.jsonl",
     evaluators={
-        "chatevaluator": chat_evaluator
+        "chat": chat_evaluator
     },
     # column mapping for messages
     evaluator_config={
@@ -370,7 +421,36 @@ result = evaluate(
     }
 )
 ```
-
+#### Evaluator parameter format
+When passing in your built-in evaluators, it is important to specify the right keyword mapping in the `evaluators` parameter list. The following is the keyword mapping required for the results from your built-in evaluators to show up in the UI when logged to Azure AI Studio.
+Here's an example of setting the `evaluators` parameters:
+```python
+result = evaluate(
+    data="data.jsonl",
+    evaluators={
+        "sexual": sexual_evaluator,
+        "self_harm": self_harm_evaluator,
+        "hate_unfairness": hate_unfairness_evaluator,
+        "violence": violence_evaluator
+    }
+)
+```
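Because a mistyped key (for example `"chatevaluator"` instead of `"chat"`) only becomes visible later in the UI, a small pre-flight check can help. The sketch below is a hypothetical helper, not part of the SDK. Its set of names contains only the keys that appear in the examples in this document, so treat it as incomplete and extend it from the keyword mapping before relying on it.

```python
# Key names taken only from the examples in this document; the complete
# keyword mapping is not reproduced here, so this set is an assumption.
KNOWN_EVALUATOR_KEYS = {
    "chat",
    "sexual",
    "self_harm",
    "hate_unfairness",
    "violence",
}


def unexpected_evaluator_keys(evaluators: dict) -> list[str]:
    """Return the keys in `evaluators` that are not in the known set, sorted."""
    return sorted(key for key in evaluators if key not in KNOWN_EVALUATOR_KEYS)


# Placeholder objects stand in for real evaluator instances.
candidate = {"chatevaluator": object(), "violence": object()}
problems = unexpected_evaluator_keys(candidate)
print(problems)
```

Custom evaluators can use any key, so a non-empty result is a prompt to double-check rather than an error.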
 ## Evaluate on a target

 If you have a list of queries that you'd like to run and then evaluate, `evaluate()` also supports a `target` parameter, which sends queries to an application to collect answers and then runs your evaluators on the resulting questions and answers.
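The flow described above (send each query to the application, collect the answer, then score the pair) can be sketched without the SDK. Everything in this block is a stand-in: `hypothetical_app`, `answer_length`, and `run_target_then_evaluate` are invented names, and the loop only imitates the idea behind `target`; it does not call `evaluate()` or reproduce its output structure.

```python
def hypothetical_app(question: str) -> dict:
    """Stand-in for an application; a real target would call your app."""
    return {"answer": f"Echo: {question}"}


def answer_length(question: str, answer: str) -> dict:
    """Toy evaluator that reports the answer length in characters."""
    return {"answer_length": len(answer)}


def run_target_then_evaluate(queries, target, evaluators):
    """Call the target for each query, then run every evaluator on the result."""
    results = []
    for query in queries:
        row = {"question": query}
        row.update(target(query))  # collect the application's answer
        scores = {name: fn(**row) for name, fn in evaluators.items()}
        results.append({**row, "scores": scores})
    return results


results = run_target_then_evaluate(
    queries=["hi"],
    target=hypothetical_app,
    evaluators={"length": answer_length},
)
print(results)
```

Returning a dictionary from the target keeps the answer addressable by name, which is what lets the evaluators receive it as a keyword argument in this sketch.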
@@ -399,4 +479,6 @@ result = evaluate(
 ## Related content

 -[Get started building a chat app using the prompt flow SDK](../../quickstarts/get-started-code.md)