> If only risk and safety metrics are passed into `metrics_list`, then the `model_config` parameter in the following interface is optional. The Azure AI Studio safety evaluations back-end service provisions a GPT-4 model that can generate content risk severity scores and reasoning to enable you to evaluate your application for content harms.
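As a minimal sketch of the rule above, the check below mimics when `model_config` can be omitted. The metric names and the helper itself are illustrative assumptions, not the SDK's actual API; the real `evaluate()` call requires the Azure SDK and a dataset.

```python
# Illustrative safety metric names (assumed for this sketch, not an
# exhaustive or authoritative SDK list).
SAFETY_METRICS = {"violence", "sexual", "self_harm", "hate_unfairness"}

def validate_evaluate_args(metrics_list, model_config=None):
    """Hypothetical helper mirroring the rule described above: metrics
    outside the safety set need a judge model, so model_config becomes
    required; safety-only runs can omit it."""
    needs_model = [m for m in metrics_list if m not in SAFETY_METRICS]
    if needs_model and model_config is None:
        raise ValueError(f"model_config is required for metrics: {needs_model}")
    return {"metrics_list": metrics_list, "model_config": model_config}

# Safety-only run: model_config may be omitted.
args = validate_evaluate_args(["violence", "self_harm"])
```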
### Evaluate question answering: `qa`
The contents of `eval_results.jsonl` look like this:
The outputs of your risk and safety metrics will provide the following:
- `{metric_name}_defect_rate`, which measures the percentage of instances that surpassed the severity threshold (default 4); this is the aggregate metric over the whole dataset.
- `{metric_name}_score`, a severity score between 0 and 7 for each data point. You can read more in the descriptions of each [content risk and severity scale](../../../concepts/evaluation-metrics-built-in.md).
- `{metric_name}_reasoning`, a text reasoning for why a certain severity score was given for each data point.
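To make the aggregation concrete, here is a small sketch computing a `{metric_name}_defect_rate` from per-row severity scores. The field values are made up for illustration, and "surpassed the threshold" is interpreted here as scoring at or above it; adjust the comparison if the service uses a strict inequality.

```python
def defect_rate(scores, threshold=4):
    """Percentage of rows whose severity score (0-7 scale) is at or above
    the threshold (default 4), matching the aggregate metric described above."""
    if not scores:
        return 0.0
    return 100.0 * sum(s >= threshold for s in scores) / len(scores)

rows = [0, 2, 5, 7, 3, 4]  # illustrative violence_score values
rate = defect_rate(rows)   # 3 of 6 rows at or above 4 -> 50.0
```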
The same interface can be used with `evaluate()` for the conversation scenario, with data mapping required only for the model output `y_pred` and `task_type="chat"`.
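As a sketch of that call shape, the keyword arguments below follow the interface described in this section; the column name `"messages"` and the metric name are assumptions for illustration, and the real `evaluate()` invocation requires the SDK, a dataset, and credentials.

```python
# Hypothetical argument bundle for a conversation-scenario evaluation:
# only the model output column is mapped, and task_type is set to "chat".
evaluate_kwargs = dict(
    task_type="chat",
    data_mapping={"y_pred": "messages"},  # "messages" column name is illustrative
    metrics_list=["violence"],            # illustrative safety metric
)
```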