Merged
Commits
32 commits
- `cdb0938` Update events.yaml (singankit, Jul 24, 2025)
- `01b9ac4` Update registry.yaml (singankit, Jul 24, 2025)
- `1f8e684` Update gen-ai-events.md (singankit, Jul 24, 2025)
- `485a76e` Update gen-ai.md (singankit, Jul 24, 2025)
- `53f120d` Adding gen_ai.evaluation.ouptut.metadata attribute (singankit, Jul 25, 2025)
- `a04553f` Updating metadata attribute (singankit, Jul 28, 2025)
- `5338dee` Updating md files (singankit, Jul 29, 2025)
- `9ecd14e` Adding evaluation event header in docs (singankit, Jul 29, 2025)
- `f78bdd3` Span to capture evaluation result instead of events (singankit, Aug 1, 2025)
- `90e4b08` Updating changelog (singankit, Aug 5, 2025)
- `cdc4b0a` Fixing yamllint issues (singankit, Aug 6, 2025)
- `f334fee` Review comments updates (singankit, Aug 6, 2025)
- `7fc2e36` Evaluation result as event (singankit, Aug 13, 2025)
- `ffada83` Updating changelog (singankit, Aug 13, 2025)
- `633e801` Review comments feedback (singankit, Aug 19, 2025)
- `bc827e7` Updating docs (singankit, Aug 19, 2025)
- `f2fdb68` Updating event description (singankit, Aug 19, 2025)
- `8289d09` Updating evaluation event description (singankit, Aug 19, 2025)
- `8716e21` Updating description in md file (singankit, Aug 19, 2025)
- `8ec9658` Merge remote-tracking branch 'origin/main' into users/singankit/gen_a… (singankit, Aug 20, 2025)
- `9801b06` Updating docs and runnign checks (singankit, Aug 20, 2025)
- `48b8c9e` Regenerating docs (singankit, Aug 20, 2025)
- `2c6dd4e` Review comments reasoning to explanation (singankit, Aug 20, 2025)
- `b81da2c` Updating recommendation level for score.value and score.label (singankit, Aug 20, 2025)
- `bd8f4ee` Review comment and yamllint fix (singankit, Aug 20, 2025)
- `aaa6367` Doc review comments (singankit, Aug 21, 2025)
- `ac0a67e` Removing token usage attribute from evaluation result (singankit, Aug 26, 2025)
- `2a5f787` Merge main (singankit, Aug 26, 2025)
- `890b9aa` Rebase from main and updating md files (singankit, Aug 26, 2025)
- `ac667ef` Reviw comments for response_id attribute on evaluation result (singankit, Aug 26, 2025)
- `5096a55` Update event docs (singankit, Aug 26, 2025)
- `f6c4408` Updating doc content (singankit, Aug 26, 2025)
@@ -0,0 +1,6 @@
change_type: enhancement
component: gen-ai
note: |
  Introducing `Evaluation Event` in GenAI Semantic Conventions to represent and capture evaluation results.
issues: [2563]
docs/gen-ai/gen-ai-events.md (48 changes: 48 additions, 0 deletions)
@@ -9,6 +9,7 @@ linkTitle: Events
<!-- toc -->

- [Event: `event.gen_ai.client.inference.operation.details`](#event-eventgen_aiclientinferenceoperationdetails)
- [Event: `event.gen_ai.evaluation.result`](#event-eventgen_aievaluationresult)

<!-- tocstop -->

@@ -209,4 +210,51 @@ section for more details.
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

## Event: `event.gen_ai.evaluation.result`

<!-- semconv event.gen_ai.evaluation.result -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

**Status:** ![Development](https://img.shields.io/badge/-development-blue)

The event name MUST be `gen_ai.evaluation.result`.

This event captures the result of evaluating GenAI output for quality, accuracy, or other characteristics. This event SHOULD be parented to the GenAI operation span being evaluated when possible, or SHOULD set `gen_ai.response.id` when the span id is not available.

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`gen_ai.evaluation.name`](/docs/registry/attributes/gen-ai.md) | string | The name of the evaluation metric used for the GenAI response. | `Relevance`; `IntentResolution` | `Required` | ![Development](https://img.shields.io/badge/-development-blue) |
| [`error.type`](/docs/registry/attributes/error.md) | string | Describes a class of error the operation ended with. [1] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`gen_ai.evaluation.score.label`](/docs/registry/attributes/gen-ai.md) | string | Human readable label for evaluation. [2] | `relevant`; `not_relevant`; `correct`; `incorrect`; `pass`; `fail` | `Conditionally Required` if applicable | ![Development](https://img.shields.io/badge/-development-blue) |
| [`gen_ai.evaluation.score.value`](/docs/registry/attributes/gen-ai.md) | double | The evaluation score returned by the evaluator. | `4.0` | `Conditionally Required` if applicable | ![Development](https://img.shields.io/badge/-development-blue) |
| [`gen_ai.evaluation.explanation`](/docs/registry/attributes/gen-ai.md) | string | A free-form explanation for the assigned score provided by the evaluator. | `The response is factually accurate but lacks sufficient detail to fully address the question.` | `Recommended` | ![Development](https://img.shields.io/badge/-development-blue) |
| [`gen_ai.response.id`](/docs/registry/attributes/gen-ai.md) | string | The unique identifier for the completion. [3] | `chatcmpl-123` | `Recommended` when available | ![Development](https://img.shields.io/badge/-development-blue) |

**[1] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI Evaluation provider or the client library,
the canonical name of the exception that occurred, or another low-cardinality error identifier.
Instrumentations SHOULD document the list of errors they report.

**[2] `gen_ai.evaluation.score.label`:** This attribute provides a human-readable interpretation of the evaluation score produced by an evaluator. For example, a score value of 1 could mean "relevant" in one evaluation system and "not relevant" in another, depending on the scoring range and evaluator. The label SHOULD have low cardinality. Possible values depend on the evaluation metric and evaluator used; implementations SHOULD document the possible values.

**[3] `gen_ai.response.id`:** The unique identifier assigned to the specific
completion being evaluated. This attribute helps correlate the evaluation
event with the corresponding operation when span id is not available.

---

`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status