Commit ebbf315

Gen AI Evaluation Result (#2563)
1 parent 5666960 commit ebbf315

5 files changed: +147 -28 lines changed
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+change_type: enhancement
+component: gen-ai
+note: |
+  Introducing `Evaluation Event` in GenAI Semantic Conventions to represent and capture evaluation results.
+
+issues: [2563]

docs/gen-ai/gen-ai-events.md

Lines changed: 48 additions & 0 deletions
@@ -9,6 +9,7 @@ linkTitle: Events
<!-- toc -->

- [Event: `event.gen_ai.client.inference.operation.details`](#event-eventgen_aiclientinferenceoperationdetails)
+- [Event: `event.gen_ai.evaluation.result`](#event-eventgen_aievaluationresult)

<!-- tocstop -->

@@ -209,4 +210,51 @@ section for more details.
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

+## Event: `event.gen_ai.evaluation.result`
+
+<!-- semconv event.gen_ai.evaluation.result -->
+<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
+<!-- see templates/registry/markdown/snippet.md.j2 -->
+<!-- prettier-ignore-start -->
+<!-- markdownlint-capture -->
+<!-- markdownlint-disable -->
+
+**Status:** ![Development](https://img.shields.io/badge/-development-blue)
+
+The event name MUST be `gen_ai.evaluation.result`.
+
+This event captures the result of evaluating GenAI output for quality, accuracy, or other characteristics. This event SHOULD be parented to the GenAI operation span being evaluated when possible, or SHOULD set `gen_ai.response.id` when the span id is not available.
+
+| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
+|---|---|---|---|---|---|
+| [`gen_ai.evaluation.name`](/docs/registry/attributes/gen-ai.md) | string | The name of the evaluation metric used for the GenAI response. | `Relevance`; `IntentResolution` | `Required` | ![Development](https://img.shields.io/badge/-development-blue) |
+| [`error.type`](/docs/registry/attributes/error.md) | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+| [`gen_ai.evaluation.score.label`](/docs/registry/attributes/gen-ai.md) | string | Human-readable label for the evaluation. [2] | `relevant`; `not_relevant`; `correct`; `incorrect`; `pass`; `fail` | `Conditionally Required` if applicable | ![Development](https://img.shields.io/badge/-development-blue) |
+| [`gen_ai.evaluation.score.value`](/docs/registry/attributes/gen-ai.md) | double | The evaluation score returned by the evaluator. | `4.0` | `Conditionally Required` if applicable | ![Development](https://img.shields.io/badge/-development-blue) |
+| [`gen_ai.evaluation.explanation`](/docs/registry/attributes/gen-ai.md) | string | A free-form explanation for the assigned score provided by the evaluator. | `The response is factually accurate but lacks sufficient detail to fully address the question.` | `Recommended` | ![Development](https://img.shields.io/badge/-development-blue) |
+| [`gen_ai.response.id`](/docs/registry/attributes/gen-ai.md) | string | The unique identifier for the completion. [3] | `chatcmpl-123` | `Recommended` when available | ![Development](https://img.shields.io/badge/-development-blue) |
+
+**[1] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI Evaluation provider or the client library, the canonical name of the exception that occurred, or another low-cardinality error identifier. Instrumentations SHOULD document the list of errors they report.
+
+**[2] `gen_ai.evaluation.score.label`:** This attribute provides a human-readable interpretation of the evaluation score produced by an evaluator. For example, a score value of 1 could mean "relevant" in one evaluation system and "not relevant" in another, depending on the scoring range and evaluator. The label SHOULD have low cardinality. Possible values depend on the evaluation metric and evaluator used; implementations SHOULD document the possible values.
+
+**[3] `gen_ai.response.id`:** The unique identifier assigned to the specific completion being evaluated. This attribute helps correlate the evaluation event with the corresponding operation when the span id is not available.
+
+---
+
+`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
+
+| Value | Description | Stability |
+|---|---|---|
+| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+
+<!-- markdownlint-restore -->
+<!-- prettier-ignore-end -->
+<!-- END AUTOGENERATED TEXT -->
+<!-- endsemconv -->
+
[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
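For context, below is a minimal sketch (not part of this commit) of how an instrumentation might emit the new `gen_ai.evaluation.result` event using the experimental `opentelemetry._events` API from recent opentelemetry-python releases. The instrumentation name, helper function, and evaluator values are illustrative; only the event name and attribute keys come from the definition above, and the API is underscore-prefixed because it is not yet stable.

```python
# Illustrative sketch only: emit a gen_ai.evaluation.result event via the
# experimental opentelemetry._events API. Attribute keys mirror the table
# above; the instrumentation name and values are made up. Without an SDK
# (EventLoggerProvider) configured, emit() is a no-op.
from opentelemetry._events import Event, get_event_logger

event_logger = get_event_logger("example.genai.evaluator")  # hypothetical name


def report_evaluation(response_id: str, score: float, label: str, explanation: str) -> None:
    """Record one evaluation result for a GenAI completion."""
    event_logger.emit(
        Event(
            name="gen_ai.evaluation.result",
            attributes={
                "gen_ai.evaluation.name": "Relevance",
                "gen_ai.evaluation.score.value": score,
                "gen_ai.evaluation.score.label": label,
                "gen_ai.evaluation.explanation": explanation,
                # Correlates the event with the completion when it is not
                # emitted under the span of the operation being evaluated.
                "gen_ai.response.id": response_id,
            },
        )
    )


report_evaluation(
    "chatcmpl-123",
    4.0,
    "relevant",
    "The response is factually accurate but lacks sufficient detail.",
)
```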
