Merged
Changes from 12 commits
Commits
32 commits
cdb0938
Update events.yaml
singankit Jul 24, 2025
01b9ac4
Update registry.yaml
singankit Jul 24, 2025
1f8e684
Update gen-ai-events.md
singankit Jul 24, 2025
485a76e
Update gen-ai.md
singankit Jul 24, 2025
53f120d
Adding gen_ai.evaluation.ouptut.metadata attribute
singankit Jul 25, 2025
a04553f
Updating metadata attribute
singankit Jul 28, 2025
5338dee
Updating md files
singankit Jul 29, 2025
9ecd14e
Adding evaluation event header in docs
singankit Jul 29, 2025
f78bdd3
Span to capture evaluation result instead of events
singankit Aug 1, 2025
90e4b08
Updating changelog
singankit Aug 5, 2025
cdc4b0a
Fixing yamllint issues
singankit Aug 6, 2025
f334fee
Review comments updates
singankit Aug 6, 2025
7fc2e36
Evaluation result as event
singankit Aug 13, 2025
ffada83
Updating changelog
singankit Aug 13, 2025
633e801
Review comments feedback
singankit Aug 19, 2025
bc827e7
Updating docs
singankit Aug 19, 2025
f2fdb68
Updating event description
singankit Aug 19, 2025
8289d09
Updating evaluation event description
singankit Aug 19, 2025
8716e21
Updating description in md file
singankit Aug 19, 2025
8ec9658
Merge remote-tracking branch 'origin/main' into users/singankit/gen_a…
singankit Aug 20, 2025
9801b06
Updating docs and runnign checks
singankit Aug 20, 2025
48b8c9e
Regenerating docs
singankit Aug 20, 2025
2c6dd4e
Review comments reasoning to explanation
singankit Aug 20, 2025
b81da2c
Updating recommendation level for score.value and score.label
singankit Aug 20, 2025
bd8f4ee
Review comment and yamllint fix
singankit Aug 20, 2025
aaa6367
Doc review comments
singankit Aug 21, 2025
ac0a67e
Removing token usage attribute from evaluation result
singankit Aug 26, 2025
2a5f787
Merge main
singankit Aug 26, 2025
890b9aa
Rebase from main and updating md files
singankit Aug 26, 2025
ac667ef
Reviw comments for response_id attribute on evaluation result
singankit Aug 26, 2025
5096a55
Update event docs
singankit Aug 26, 2025
f6c4408
Updating doc content
singankit Aug 26, 2025
@@ -0,0 +1,6 @@
change_type: enhancement
component: gen-ai
note: |
  Introduce an evaluation span in the GenAI semantic conventions to represent and capture evaluation results.

issues: [2563]
82 changes: 82 additions & 0 deletions docs/gen-ai/gen-ai-spans.md
@@ -12,6 +12,7 @@ linkTitle: Spans
- [Inference](#inference)
- [Embeddings](#embeddings)
- [Execute tool span](#execute-tool-span)
- [Evaluation span](#evaluation-span)
- [Capturing inputs and outputs](#capturing-inputs-and-outputs)

<!-- tocstop -->
@@ -351,6 +352,87 @@ Datastore: A tool used by the agent to access and query structured or unstructur
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->

### Evaluation span

<!-- semconv span.gen_ai.evaluation.result.internal -->
<!-- NOTE: THIS TEXT IS AUTOGENERATED. DO NOT EDIT BY HAND. -->
<!-- see templates/registry/markdown/snippet.md.j2 -->
<!-- prettier-ignore-start -->
<!-- markdownlint-capture -->
<!-- markdownlint-disable -->

**Status:** ![Development](https://img.shields.io/badge/-development-blue)

This span captures the process and the result of evaluating GenAI output for quality, accuracy, or other characteristics.

`gen_ai.operation.name` SHOULD be `evaluation`.

**Span name** SHOULD be `evaluation {gen_ai.evaluation.name}`.

**Span kind** SHOULD be `INTERNAL`.

**Span status** SHOULD follow the [Recording Errors](/docs/general/recording-errors.md) document.
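
The naming rules above can be illustrated with a minimal Python sketch. The helper `evaluation_span_identity` and the dictionary it returns are hypothetical and not part of any OpenTelemetry API; a real instrumentation would pass these values to its tracer when it starts the span.

```python
# Hypothetical helper: derives the span name, span kind, and the
# gen_ai.operation.name value recommended for an evaluation span.
EVALUATION_OPERATION = "evaluation"


def evaluation_span_identity(evaluation_name: str) -> dict:
    """Return the name, kind, and identifying attributes of an evaluation span."""
    if not evaluation_name:
        # gen_ai.evaluation.name is Required, so refuse to build a span without it.
        raise ValueError("gen_ai.evaluation.name is required")
    return {
        # Span name follows the pattern `evaluation {gen_ai.evaluation.name}`.
        "name": f"{EVALUATION_OPERATION} {evaluation_name}",
        "kind": "INTERNAL",
        "attributes": {
            "gen_ai.operation.name": EVALUATION_OPERATION,
            "gen_ai.evaluation.name": evaluation_name,
        },
    }


identity = evaluation_span_identity("Relevance")
print(identity["name"])  # evaluation Relevance
```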

| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
|---|---|---|---|---|---|
| [`gen_ai.evaluation.name`](/docs/registry/attributes/gen-ai.md) | string | The name of the evaluation used for the GenAI response. | `Relevance`; `IntentResolution` | `Required` | ![Development](https://img.shields.io/badge/-development-blue) |
| [`gen_ai.operation.name`](/docs/registry/attributes/gen-ai.md) | string | The name of the operation being performed. [1] | `chat`; `generate_content`; `text_completion` | `Required` | ![Development](https://img.shields.io/badge/-development-blue) |
| [`error.type`](/docs/registry/attributes/error.md) | string | Describes a class of error the operation ended with. [2] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`gen_ai.evaluation.score`](/docs/registry/attributes/gen-ai.md) | double | The evaluation score returned by the evaluator. | `4.0` | `Conditionally Required` if evaluation completed successfully | ![Development](https://img.shields.io/badge/-development-blue) |
| [`gen_ai.request.model`](/docs/registry/attributes/gen-ai.md) | string | The name of the GenAI model a request is being made to. [3] | `gpt-4` | `Conditionally Required` if available. | ![Development](https://img.shields.io/badge/-development-blue) |
| [`server.port`](/docs/registry/attributes/server.md) | int | GenAI server port. [4] | `80`; `8080`; `443` | `Conditionally Required` if `server.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`gen_ai.evaluation.label`](/docs/registry/attributes/gen-ai.md) | string | Human-readable label for the evaluation result. | `relevant`; `not_relevant`; `correct`; `incorrect`; `pass`; `fail` | `Recommended` | ![Development](https://img.shields.io/badge/-development-blue) |
| [`gen_ai.evaluation.reasoning`](/docs/registry/attributes/gen-ai.md) | string | Free-form reasoning provided by the evaluator for the assigned score. | `The response is factually accurate but lacks sufficient detail to fully address the question.` | `Recommended` | ![Development](https://img.shields.io/badge/-development-blue) |
| [`gen_ai.metadata`](/docs/registry/attributes/gen-ai.md) | string | Metadata associated with the GenAI operation. [5] | `{"evaluator_version": "1.2.0"}` | `Recommended` | ![Development](https://img.shields.io/badge/-development-blue) |
| [`gen_ai.usage.input_tokens`](/docs/registry/attributes/gen-ai.md) | int | The number of tokens used in the GenAI input (prompt). [6] | `100` | `Recommended` if evaluation was performed by a model | ![Development](https://img.shields.io/badge/-development-blue) |
| [`gen_ai.usage.output_tokens`](/docs/registry/attributes/gen-ai.md) | int | The number of tokens used in the GenAI response (completion). [7] | `180` | `Recommended` if evaluation was performed by a model | ![Development](https://img.shields.io/badge/-development-blue) |
| [`server.address`](/docs/registry/attributes/server.md) | string | GenAI server address. [8] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |

**[1] `gen_ai.operation.name`:** If one of the predefined values applies, but a specific system uses a different name, it's RECOMMENDED to document it in the semantic conventions for that GenAI system and to use the system-specific name in the instrumentation. If a different name is not documented, instrumentation libraries SHOULD use the applicable predefined value.

**[2] `error.type`:** The `error.type` SHOULD match the error code returned by the Generative AI evaluation provider or the client library,
the canonical name of the exception that occurred, or another low-cardinality error identifier.
Instrumentations SHOULD document the list of errors they report.

**[3] `gen_ai.request.model`:** The name of the GenAI model a request is being made to. If the model is supplied by a vendor, then the value must be the exact name of the model requested. If the model is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned.

**[4] `server.port`:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available.

**[5] `gen_ai.metadata`:** Metadata associated with the evaluation, recorded as a string (for example, a JSON-encoded object).

**[6] `gen_ai.usage.input_tokens`:** The total number of input tokens used by the model during the evaluation.

**[7] `gen_ai.usage.output_tokens`:** The total number of output tokens used by the model during the evaluation.

**[8] `server.address`:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available.
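
A minimal sketch of how an instrumentation might assemble these attributes while honoring the conditional requirement levels. The function `evaluation_attributes` and its parameters are hypothetical. The sketch assumes metadata arrives as a dictionary and JSON-encodes it because `gen_ai.metadata` is a string, and it chooses to omit `gen_ai.evaluation.score` whenever `error.type` is set, which the table permits but does not mandate.

```python
import json
from typing import Any, Dict, Optional


def evaluation_attributes(
    evaluation_name: str,
    *,
    score: Optional[float] = None,
    label: Optional[str] = None,
    reasoning: Optional[str] = None,
    error_type: Optional[str] = None,
    request_model: Optional[str] = None,
    server_address: Optional[str] = None,
    server_port: Optional[int] = None,
    input_tokens: Optional[int] = None,
    output_tokens: Optional[int] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Collect evaluation span attributes (hypothetical helper)."""
    attrs: Dict[str, Any] = {
        # Required attributes.
        "gen_ai.operation.name": "evaluation",
        "gen_ai.evaluation.name": evaluation_name,
    }
    if error_type is not None:
        # Conditionally Required when the operation ended in an error.
        attrs["error.type"] = error_type
    elif score is not None:
        # Conditionally Required when the evaluation completed successfully.
        attrs["gen_ai.evaluation.score"] = float(score)
    if request_model is not None:
        # Conditionally Required if available.
        attrs["gen_ai.request.model"] = request_model
    if server_address is not None:
        attrs["server.address"] = server_address
        if server_port is not None:
            # Conditionally Required if server.address is set.
            attrs["server.port"] = server_port
    if metadata is not None:
        # gen_ai.metadata is a string attribute, so structured data is JSON-encoded.
        attrs["gen_ai.metadata"] = json.dumps(metadata)
    recommended = {
        "gen_ai.evaluation.label": label,
        "gen_ai.evaluation.reasoning": reasoning,
        # Token counts are Recommended when a model performed the evaluation;
        # callers pass None when no model was involved.
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
    attrs.update({key: value for key, value in recommended.items() if value is not None})
    return attrs


ok = evaluation_attributes(
    "Relevance", score=4.0, label="relevant", metadata={"evaluator_version": "1.2.0"}
)
failed = evaluation_attributes("Relevance", score=4.0, error_type="timeout")
```

Dropping `None` values keeps unset Recommended attributes off the span instead of recording empty placeholders.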

---

`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
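
One way to pick a low-cardinality `error.type` value, following note [2] and the `_OTHER` fallback above, is sketched below. The helper `evaluation_error_type`, the preference order, and the mapping of Python's `TimeoutError` to `timeout` are illustrative assumptions, not behavior defined by the convention.

```python
from typing import Optional


def evaluation_error_type(
    exc: Optional[BaseException] = None, status_code: Optional[int] = None
) -> str:
    """Pick a low-cardinality error.type value for a failed evaluation (hypothetical helper)."""
    if status_code is not None:
        # Prefer the error code returned by the evaluation provider, e.g. "500".
        return str(status_code)
    if isinstance(exc, TimeoutError):
        return "timeout"
    if exc is not None:
        # Otherwise use the exception class name, qualified by module unless built in.
        cls = type(exc)
        if cls.__module__ == "builtins":
            return cls.__qualname__
        return f"{cls.__module__}.{cls.__qualname__}"
    # Nothing more specific is known: use the well-known fallback value.
    return "_OTHER"


print(evaluation_error_type(status_code=500))    # 500
print(evaluation_error_type(TimeoutError()))     # timeout
print(evaluation_error_type(ValueError("bad")))  # ValueError
```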

---

`gen_ai.operation.name` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| `chat` | Chat completion operation such as [OpenAI Chat API](https://platform.openai.com/docs/api-reference/chat) | ![Development](https://img.shields.io/badge/-development-blue) |
| `create_agent` | Create GenAI agent | ![Development](https://img.shields.io/badge/-development-blue) |
| `embeddings` | Embeddings operation such as [OpenAI Create embeddings API](https://platform.openai.com/docs/api-reference/embeddings/create) | ![Development](https://img.shields.io/badge/-development-blue) |
| `execute_tool` | Execute a tool | ![Development](https://img.shields.io/badge/-development-blue) |
| `generate_content` | Multimodal content generation operation such as [Gemini Generate Content](https://ai.google.dev/api/generate-content) | ![Development](https://img.shields.io/badge/-development-blue) |
| `invoke_agent` | Invoke GenAI agent | ![Development](https://img.shields.io/badge/-development-blue) |
| `text_completion` | Text completions operation such as [OpenAI Completions API (Legacy)](https://platform.openai.com/docs/api-reference/completions) | ![Development](https://img.shields.io/badge/-development-blue) |
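
The selection rule in note [1] can be sketched as a small function. `choose_operation_name` and its parameters are hypothetical; the sketch assumes the caller already knows which predefined value (if any) applies and whether the system-specific name is documented.

```python
from typing import Optional

# Well-known gen_ai.operation.name values from the table above.
WELL_KNOWN_OPERATION_NAMES = frozenset({
    "chat", "create_agent", "embeddings", "execute_tool",
    "generate_content", "invoke_agent", "text_completion",
})


def choose_operation_name(
    predefined: Optional[str],
    system_name: Optional[str] = None,
    system_name_documented: bool = False,
) -> str:
    """Prefer a documented system-specific name, else the predefined value (hypothetical helper)."""
    if predefined is not None and predefined not in WELL_KNOWN_OPERATION_NAMES:
        raise ValueError(f"{predefined!r} is not a well-known operation name")
    if system_name is not None and (system_name_documented or predefined is None):
        # Either the system-specific name is documented, or no predefined
        # value applies and a custom value MAY be used.
        return system_name
    if predefined is not None:
        return predefined
    raise ValueError("no operation name available")


print(choose_operation_name("chat"))                                       # chat
print(choose_operation_name("chat", "converse", system_name_documented=True))  # converse
print(choose_operation_name("chat", "converse"))                           # chat
print(choose_operation_name(None, "evaluation"))                           # evaluation
```

The last call shows a value outside the well-known list being used as a custom value, as the table's introduction allows.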

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->
<!-- END AUTOGENERATED TEXT -->
<!-- endsemconv -->
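
The `server.address` and `server.port` attributes of the evaluation span can often be derived from the evaluator's endpoint URL. The sketch below assumes the URL already identifies the server behind any intermediaries, as notes [4] and [8] require, and assumes default ports of 80 and 443 for `http` and `https`. The helper `server_attributes` is hypothetical, and Unix domain socket addresses such as `/tmp/my.sock` are out of its scope.

```python
from urllib.parse import urlsplit

# Assumed default ports when the endpoint URL omits one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def server_attributes(endpoint: str) -> dict:
    """Derive server.address and server.port from an evaluator endpoint URL (hypothetical helper)."""
    parts = urlsplit(endpoint)
    if not parts.hostname:
        return {}
    attrs = {"server.address": parts.hostname}
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    if port is not None:
        # server.port is Conditionally Required once server.address is set.
        attrs["server.port"] = port
    return attrs


print(server_attributes("https://example.com/v1/evaluate"))
# {'server.address': 'example.com', 'server.port': 443}
print(server_attributes("http://10.1.2.80:8080/score"))
# {'server.address': '10.1.2.80', 'server.port': 8080}
```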

## Capturing inputs and outputs

User inputs and model responses may be recorded as events parented to the GenAI operation span. See [Semantic Conventions for GenAI events](./gen-ai-events.md) for details.