Commit 74471c4

sync evaluate-sdk file with Main branch
1 parent 548d634 commit 74471c4

1 file changed: +10 -2 lines changed

articles/ai-foundry/how-to/develop/evaluate-sdk.md

Lines changed: 10 additions & 2 deletions
@@ -67,6 +67,7 @@ Built-in evaluators can accept *either* query and response pairs or a list of co
 | `RelevanceEvaluator` | Required: String | Required: String | N/A | N/A | Supported for text |
 | `CoherenceEvaluator` | Required: String | Required: String | N/A | N/A |Supported for text |
 | `FluencyEvaluator` | N/A | Required: String | N/A | N/A |Supported for text |
+|`ResponseCompletenessEvaluator` | N/A | Required: String | N/A | Required: String |Not supported |
 | `SimilarityEvaluator` | Required: String | Required: String | N/A | Required: String |Not supported |
 |`F1ScoreEvaluator` | N/A | Required: String | N/A | Required: String |Not supported |
 | `RougeScoreEvaluator` | N/A | Required: String | N/A | Required: String | Not supported |
@@ -294,10 +295,17 @@ For
 
 The result of the AI-assisted quality evaluators for a query and response pair is a dictionary containing:
 
-- `{metric_name}` provides a numerical score.
-- `{metric_name}_label` provides a binary label.
+- `{metric_name}` provides a numerical score, on a likert scale (integer 1 to 5) or a float between 0-1.
+- `{metric_name}_label` provides a binary label (if the metric outputs a binary score naturally).
 - `{metric_name}_reason` explains why a certain score or label was given for each data point.
 
+To further improve intelligibility, all evaluators accept a binary threshold and output two new keys. For the binarization threshold, a default is set and user can override it. The two new keys are:
+
+- `{metric_name}_result` a "pass" or "fail" string based on a binarization threshold.
+- `{metric_name}_threshold` a numerical binarization threshold set by default or by the user
+
+
+
 #### Comparing quality and custom evaluators
 
 For NLP evaluators, only a score is given in the `{metric_name}` key.
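
Reading aid for the second hunk: a minimal Python sketch of how the `{metric_name}_result` and `{metric_name}_threshold` keys described in the added lines could be derived from a score. This is not the SDK's implementation. The helper name `add_pass_fail`, the default threshold of 3, and the assumption that a higher score is better are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Union

Number = Union[int, float]


def add_pass_fail(
    result: Dict[str, object],
    metric_name: str,
    threshold: Number = 3,  # hypothetical default, not taken from the SDK
) -> Dict[str, object]:
    """Return a copy of `result` with `<metric>_result` and `<metric>_threshold` added.

    Assumes a higher score is better, so a score at or above the threshold passes.
    """
    score = result[metric_name]
    out = dict(result)  # leave the caller's dictionary untouched
    out[f"{metric_name}_result"] = "pass" if score >= threshold else "fail"
    out[f"{metric_name}_threshold"] = threshold
    return out


# A Likert-style score (integer 1 to 5) checked against the assumed default threshold.
example = add_pass_fail(
    {"relevance": 4, "relevance_reason": "Addresses the query directly."},
    "relevance",
)

# A float score in the 0-1 range with a caller-supplied threshold.
example_float = add_pass_fail({"similarity": 0.42}, "similarity", threshold=0.5)
```

Under these assumptions, `example` gains `relevance_result` and `relevance_threshold` alongside the original keys, and overriding `threshold` changes only the pass/fail cut-off, not the score itself.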
