The result of the AI-assisted quality evaluators for a query and response pair is a dictionary containing:
- `{metric_name}` provides a numerical score, on a Likert scale (integer 1 to 5) or a float between 0 and 1.
- `{metric_name}_label` provides a binary label (if the metric naturally outputs a binary score).
- `{metric_name}_reason` explains why a certain score or label was given for each data point.
To further improve intelligibility, all evaluators accept a binarization threshold and output two additional keys. A default threshold is set, which the user can override. The two additional keys are:

- `{metric_name}_result` is a "pass" or "fail" string based on the binarization threshold.
- `{metric_name}_threshold` is the numerical binarization threshold, set by default or overridden by the user.

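Taken together, the output for a single metric might look like the following sketch. The metric name `relevance` and all values here are illustrative assumptions, not output from a real evaluator run:

```python
# Hypothetical output dictionary for one AI-assisted quality metric
# ("relevance"); keys follow the pattern described above, values are made up.
result = {
    "relevance": 4.0,                                          # numerical score (Likert 1-5 here)
    "relevance_reason": "The response directly addresses the query.",
    "relevance_result": "pass",                                # "pass"/"fail" after binarization
    "relevance_threshold": 3,                                  # default, user-overridable
}

# Simple consistency check mirroring how the "pass"/"fail" result
# is derived from the score and the binarization threshold:
passed = result["relevance"] >= result["relevance_threshold"]
assert passed == (result["relevance_result"] == "pass")
```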
#### Comparing quality and custom evaluators
For NLP evaluators, only a score is given in the `{metric_name}` key.
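By contrast with the quality evaluators above, an NLP evaluator's output might look like this sketch, with a single score key and none of the label, reason, result, or threshold keys (the metric name `bleu` and the value are illustrative assumptions):

```python
# Hypothetical NLP evaluator output: only the numerical score key is present.
nlp_result = {"bleu": 0.27}

# No *_label, *_reason, *_result, or *_threshold keys for this metric name.
assert set(nlp_result) == {"bleu"}
```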