Commit c27d2c7

minor update
1 parent f2d3b09 commit c27d2c7

1 file changed: +4 -3 lines changed
articles/ai-studio/how-to/develop/evaluate-sdk.md

Lines changed: 4 additions & 3 deletions
@@ -61,7 +61,7 @@ Built-in evaluators can accept *either* query and response pairs or a list of co
 | Evaluator | `query` | `response` | `context` | `ground_truth` | `conversation` |
 |----------------|---------------|---------------|---------------|---------------|-----------|
 |`GroundednessEvaluator` | Optional: String | Required: String | Required: String | N/A | Supported for text |
-| `GroundednessProEvaluator` | Required: String | Required: String | Required: String | N/A | Supported for text |
+| `GroundednessProEvaluator` | Required: String | Required: String | Required: String | N/A | Supported for text |
 | `RetrievalEvaluator` | Required: String | N/A | Required: String | N/A | Supported for text |
 | `RelevanceEvaluator` | Required: String | Required: String | N/A | N/A | Supported for text |
 | `CoherenceEvaluator` | Required: String | Required: String | N/A | N/A |Supported for text |
@@ -79,7 +79,7 @@ Built-in evaluators can accept *either* query and response pairs or a list of co
 | `IndirectAttackEvaluator` | Required: String | Required: String | Required: String | N/A |Supported for text |
 | `ProtectedMaterialEvaluator` | Required: String | Required: String | N/A | N/A |Supported for text and image |
 | `QAEvaluator` | Required: String | Required: String | Required: String | Required: String | Not supported |
-| `ContentSafetyEvaluator` | Required: String | Required: String | N/A | N/A | Supported for text and image |
+| `ContentSafetyEvaluator` | Required: String | Required: String | N/A | N/A | Supported for text and image |
 
 - Query: the query sent in to the generative AI application
 - Response: the response to the query generated by the generative AI application
@@ -295,9 +295,10 @@ The result of the AI-assisted quality evaluators for a query and response pair i
 - `{metric_name}_label` provides a binary label.
 - `{metric_name}_reason` explains why a certain score or label was given for each data point.
 
+#### Comparing quality and custom evaluators
 For NLP evaluators, only a score is given in the `{metric_name}` key.
 
-Like 6 other AI-assisted evaluators, `GroundednessEvaluator` is a prompt-based evaluator that outputs a score on a 5-point scale (the higher the score, the more grounded the result is). On the other hand, `GroundednessProEvaluator` invokes our backend evaluation service powered by Azure AI Content Safety and outputs `True` if all content is grounded, or `False` if any ungrounded content is detected.
+Like 6 other AI-assisted evaluators, `GroundednessEvaluator` is a prompt-based evaluator that outputs a score on a 5-point scale (the higher the score, the more grounded the result is). On the other hand, `GroundednessProEvaluator` (preview) invokes our backend evaluation service powered by Azure AI Content Safety and outputs `True` if all content is grounded, or `False` if any ungrounded content is detected.
 
 We open-source the prompts of our quality evaluators except for `GroundednessProEvaluator` (powered by Azure AI Content Safety) for transparency. These prompts serve as instructions for a language model to perform their evaluation task, which requires a human-friendly definition of the metric and its associated scoring rubrics (what the 5 levels of quality mean for the metric). We highly recommend that users customize the definitions and grading rubrics to their scenario specifics. See details in [Custom Evaluators](#custom-evaluators).
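
The paragraph changed in the last hunk contrasts two output shapes: a 5-point score from `GroundednessEvaluator` and a `True`/`False` verdict from `GroundednessProEvaluator`. The following is a minimal sketch of how calling code might normalize both into one pass/fail view. It assumes results arrive as plain dictionaries that follow the `{metric_name}`, `{metric_name}_label`, and `{metric_name}_reason` key convention quoted in the hunk. The function name `summarize_result`, the metric name `groundedness_pro`, and the pass threshold of 3 are illustrative assumptions and are not taken from the SDK or from this commit.

```python
from typing import Any, Dict, Mapping


def summarize_result(
    result: Mapping[str, Any],
    metric_name: str,
    pass_threshold: int = 3,  # assumed cut-off on the 5-point scale, not an SDK default
) -> Dict[str, Any]:
    """Normalize one evaluator result into a pass/fail summary.

    Hypothetical helper: it only relies on the key naming convention
    described in the documentation text, not on any SDK call.
    """
    score = result.get(metric_name)
    label = result.get(f"{metric_name}_label")
    reason = result.get(f"{metric_name}_reason")

    if isinstance(label, bool):
        # Boolean-style evaluators: the label itself is the verdict.
        passed = label
    elif isinstance(score, (int, float)) and not isinstance(score, bool):
        # Score-style evaluators: compare the score to the assumed threshold.
        passed = score >= pass_threshold
    else:
        raise ValueError(f"no usable score or boolean label for {metric_name!r}")

    return {"metric": metric_name, "passed": passed, "score": score, "reason": reason}


# Example inputs are made up for illustration; they are not real evaluator output.
prompt_based = {"groundedness": 4, "groundedness_reason": "Claims match the context."}
service_based = {"groundedness_pro_label": False, "groundedness_pro_reason": "One claim is unsupported."}

print(summarize_result(prompt_based, "groundedness"))
print(summarize_result(service_based, "groundedness_pro"))
```

Preferring the boolean label when one is present keeps the service-backed verdict authoritative, and the threshold applies only when a numeric score is all that is available.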
