The event name MUST be `gen_ai.evaluation.result`.

This event captures the result of evaluating GenAI output for quality, accuracy, or other characteristics. This event SHOULD be parented to the GenAI operation span being evaluated when possible. When the span ID is not available, it SHOULD set `gen_ai.response.id` instead.

| Attribute | Type | Description | Examples | Requirement Level | Stability |
|---|---|---|---|---|---|
| [`gen_ai.evaluation.name`](/docs/registry/attributes/gen-ai.md) | string | The name of the evaluation metric used for the GenAI response. | `Relevance`; `IntentResolution` | `Required` | |
| [`error.type`](/docs/registry/attributes/error.md) | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error | |
| [`gen_ai.evaluation.score.label`](/docs/registry/attributes/gen-ai.md) | string | Human readable label for evaluation. [2] | `relevant`; `not_relevant`; `correct`; `incorrect`; `pass`; `fail` | `Conditionally Required` if applicable | |
| [`gen_ai.evaluation.score.value`](/docs/registry/attributes/gen-ai.md) | double | The evaluation score returned by the evaluator. | `4.0` | `Conditionally Required` if applicable | |
| [`gen_ai.evaluation.explanation`](/docs/registry/attributes/gen-ai.md) | string | A free-form explanation for the assigned score provided by the evaluator. | `The response is factually accurate but lacks sufficient detail to fully address the question.` | `Recommended` | |
| [`gen_ai.response.id`](/docs/registry/attributes/gen-ai.md) | string | The unique identifier for the completion. [3] | `chatcmpl-123` | `Recommended` when available | |

**[1] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI Evaluation provider or the client library,
the canonical name of the exception that occurred, or another low-cardinality error identifier.
Instrumentations SHOULD document the list of errors they report.

**[2] `gen_ai.evaluation.score.label`:** This attribute provides a human-readable interpretation of the evaluation score produced by an evaluator. For example, a score value of 1 could mean "relevant" in one evaluation system and "not relevant" in another, depending on the scoring range and evaluator. The label SHOULD have low cardinality. Possible values depend on the evaluation metric and evaluator used; implementations SHOULD document the possible values.

**[3] `gen_ai.response.id`:** The unique identifier assigned to the specific
completion being evaluated. This attribute helps correlate the evaluation
event with the corresponding operation when the span ID is not available.
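
As an illustration of the requirement levels in the table above, the following is a minimal sketch of how an instrumentation might assemble the attribute set for one evaluation result. It uses only the Python standard library and does not call any OpenTelemetry API; `EvaluationResult` and `build_event_attributes` are hypothetical names introduced for this example, not part of the convention.

```python
from dataclasses import dataclass
from typing import Optional

# Event name mandated by the convention.
EVENT_NAME = "gen_ai.evaluation.result"


@dataclass
class EvaluationResult:
    """Hypothetical container for one evaluator verdict (not an OpenTelemetry type)."""

    name: str                              # Required, e.g. "Relevance"
    score_value: Optional[float] = None    # Conditionally Required if applicable
    score_label: Optional[str] = None      # Conditionally Required if applicable
    explanation: Optional[str] = None      # Recommended
    response_id: Optional[str] = None      # Recommended when available
    error_type: Optional[str] = None       # Conditionally Required on error


def build_event_attributes(result: EvaluationResult) -> dict:
    """Map an EvaluationResult onto the attribute names listed in the table."""
    if not result.name:
        raise ValueError("gen_ai.evaluation.name is required")

    score = None if result.score_value is None else float(result.score_value)
    optional = {
        "error.type": result.error_type,
        "gen_ai.evaluation.score.label": result.score_label,
        "gen_ai.evaluation.score.value": score,
        "gen_ai.evaluation.explanation": result.explanation,
        "gen_ai.response.id": result.response_id,
    }

    attributes = {"gen_ai.evaluation.name": result.name}
    # Attributes that do not apply are omitted rather than set to a null value.
    attributes.update({k: v for k, v in optional.items() if v is not None})
    return attributes
```

A caller would pass the returned dictionary, together with `EVENT_NAME`, to whatever event or log emission API the instrumentation uses. Setting `response_id` matters most when the event cannot be parented to the evaluated span.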
---
`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | |
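
One way an instrumentation could pick an `error.type` value is sketched below. The order of preference (provider error code first, then the exception's canonical name, then `_OTHER`) is an assumption made for this example; the convention lists these sources but does not rank them. `resolve_error_type` and `canonical_name` are hypothetical helpers.

```python
from typing import Optional

# Well-known fallback value from the table above.
OTHER = "_OTHER"


def canonical_name(exc: BaseException) -> str:
    """Return a low-cardinality name for an exception: its qualified class name."""
    cls = type(exc)
    if cls.__module__ == "builtins":
        return cls.__qualname__
    return f"{cls.__module__}.{cls.__qualname__}"


def resolve_error_type(
    exc: Optional[BaseException] = None,
    provider_code: Optional[object] = None,
) -> str:
    """Choose an error.type value (assumed preference order, see lead-in)."""
    if provider_code is not None:
        # Error code reported by the evaluation provider or client library.
        return str(provider_code)
    if exc is not None:
        return canonical_name(exc)
    # No custom value is defined for this failure.
    return OTHER
```

The exception message is deliberately not used, since free-form messages would make the attribute high-cardinality.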