diff --git a/util/opentelemetry-util-genai-dev/README.traceloop_translator.md b/util/opentelemetry-util-genai-dev/README.traceloop_translator.md
new file mode 100644
index 0000000000..34cb49d8ae
--- /dev/null
+++ b/util/opentelemetry-util-genai-dev/README.traceloop_translator.md
@@ -0,0 +1,90 @@
+# Traceloop -> GenAI Semantic Convention Translator Emitter
+
+This optional emitter promotes legacy `traceloop.*` attributes attached to an `LLMInvocation` into
+Semantic Convention (or forward-looking custom `gen_ai.*`) attributes **before** the standard
+Semantic Convention span emitter runs. It does **not** create its own span.
+
+## Why Use It?
+If you have upstream code (or the Traceloop compat emitter) producing `traceloop.*` keys but you
+want downstream dashboards/tools to rely on GenAI semantic conventions, enabling this translator
+lets you transition without rewriting upstream code immediately.
+
+## What It Does
+At `on_start` of an `LLMInvocation` it scans `invocation.attributes` for keys beginning with
+`traceloop.` and (non-destructively) adds corresponding keys:
+
+| Traceloop Key (prefixed or raw) | Added Key | Notes |
+|---------------------------------|---------------------------|-------|
+| `traceloop.workflow.name` / `workflow.name` | `gen_ai.workflow.name` | Custom (not yet in spec) |
+| `traceloop.entity.name` / `entity.name` | `gen_ai.agent.name` | Approximates entity as agent name |
+| `traceloop.entity.path` / `entity.path` | `gen_ai.workflow.path` | Custom placeholder |
+| `traceloop.callback.name` / `callback.name` | `gen_ai.callback.name` | Also sets `gen_ai.operation.source` if absent |
+| `traceloop.callback.id` / `callback.id` | `gen_ai.callback.id` | Custom |
+| `traceloop.entity.input` / `entity.input` | `gen_ai.input.messages` | Serialized form already present |
+| `traceloop.entity.output` / `entity.output` | `gen_ai.output.messages` | Serialized form already present |
+
+Existing `gen_ai.*` keys are never overwritten.
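In code, the promotion described by this table reduces to a prefix strip plus a non-overwriting copy. A minimal sketch (a hypothetical standalone `promote` helper, not the emitter's actual API; the `gen_ai.operation.source` side effect of `callback.name` is omitted):

```python
# Minimal sketch of the promotion rules above (hypothetical helper; the real
# logic lives in the TraceloopTranslatorEmitter class).
_MAPPING = {
    "workflow.name": "gen_ai.workflow.name",
    "entity.name": "gen_ai.agent.name",
    "entity.path": "gen_ai.workflow.path",
    "callback.name": "gen_ai.callback.name",
    "callback.id": "gen_ai.callback.id",
    "entity.input": "gen_ai.input.messages",
    "entity.output": "gen_ai.output.messages",
}


def promote(attrs: dict) -> dict:
    """Copy traceloop.* (or raw) keys to gen_ai.* keys without overwriting."""
    for key in list(attrs):  # snapshot: we add keys while iterating
        raw = key[len("traceloop."):] if key.startswith("traceloop.") else key
        target = _MAPPING.get(raw)
        if target is not None:
            attrs.setdefault(target, attrs[key])  # existing gen_ai.* keys win
    return attrs


attrs = promote({
    "traceloop.entity.name": "ChatLLM",
    "gen_ai.agent.name": "AlreadySet",  # pre-existing key is preserved
    "workflow.name": "main_flow",       # raw (unprefixed) variant also mapped
})
```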
+
+## Enabling
+Fast path (no entry point needed):
+
+```bash
+export OTEL_GENAI_ENABLE_TRACELOOP_TRANSLATOR=1
+export OTEL_INSTRUMENTATION_GENAI_EMITTERS=span,traceloop_compat
+
+# Optional: remove the original traceloop.* keys after promotion
+export OTEL_GENAI_TRACELOOP_TRANSLATOR_STRIP_LEGACY=1
+```
+
+The flag auto-prepends the translator before the semantic span emitter. You can still add
+`traceloop_translator` explicitly once an entry point is created.
+
+You can also load this emitter the same way as other extra emitters. There are two common patterns:
+
+### 1. Via `OTEL_INSTRUMENTATION_GENAI_EMITTERS` with an extra token
+If your emitter loading logic supports extra entry-point-based names directly (this depends on branch state), add the translator token (e.g. `traceloop_translator`). Example:
+
+```bash
+export OTEL_INSTRUMENTATION_GENAI_EMITTERS=span,traceloop_translator,traceloop_compat
+```
+
+Ordering matters: the spec requests placement `before=semconv_span`, but if an environment override reorders span emitters you can enforce the order explicitly (see the next section).
+
+### 2. Using the Category Override Environment Variable
+If your build supports category overrides (as implemented in `configuration.py`), you can prepend the translator:
+
+```bash
+export OTEL_INSTRUMENTATION_GENAI_EMITTERS=span,traceloop_compat
+export OTEL_INSTRUMENTATION_GENAI_EMITTERS_SPAN=prepend:TraceloopTranslator
+```
+
+The override ensures the translator emitter runs before the semantic span emitter regardless of the default resolution order.
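For test harnesses or notebooks, the same flags can be set from Python instead of the shell, provided this happens before the telemetry handler (and thus the emitter pipeline) is first constructed. A sketch using the flag names documented above:

```python
import os

# Equivalent of the shell exports above; must run before the telemetry
# handler is first created, since the pipeline reads these at build time.
os.environ["OTEL_GENAI_ENABLE_TRACELOOP_TRANSLATOR"] = "1"
os.environ["OTEL_INSTRUMENTATION_GENAI_EMITTERS"] = "span,traceloop_compat"
# Opt-in: drop the legacy traceloop.* keys once promoted.
os.environ["OTEL_GENAI_TRACELOOP_TRANSLATOR_STRIP_LEGACY"] = "1"
```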
+
+## Example
+Minimal Python snippet (assuming emitters are loaded via entry points and the translator is installed):
+
+```python
+from opentelemetry.util.genai.handler import get_telemetry_handler
+from opentelemetry.util.genai.types import LLMInvocation, InputMessage, OutputMessage, Text
+
+inv = LLMInvocation(
+    request_model="gpt-4",
+    input_messages=[InputMessage(role="user", parts=[Text("Hello")])],
+    attributes={
+        "traceloop.entity.name": "ChatLLM",
+        "traceloop.workflow.name": "user_flow",
+        "traceloop.callback.name": "root_chain",
+        "traceloop.entity.input": "[{'role':'user','content':'Hello'}]",
+    },
+)
+handler = get_telemetry_handler()
+handler.start_llm(inv)
+inv.output_messages = [OutputMessage(role="assistant", parts=[Text("Hi")], finish_reason="stop")]
+handler.stop_llm(inv)
+# Result: final semantic span contains gen_ai.agent.name, gen_ai.workflow.name, gen_ai.input.messages, etc.
+```
+
+## Non-Goals
+- It does not remove or rename the original `traceloop.*` attributes unless the opt-in `OTEL_GENAI_TRACELOOP_TRANSLATOR_STRIP_LEGACY` flag is set.
+- It does not attempt deep semantic inference; the mappings are intentionally conservative.
+- It does not serialize messages itself; it relies on upstream emitters to have placed serialized content already.
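On the last point: the serialized payload is copied through verbatim under the provisional key, never parsed or re-serialized. A toy illustration of that pass-through (plain dicts, not the real emitter):

```python
# The translator copies the serialized string as-is under the provisional
# gen_ai.* key; it never parses or re-serializes the payload.
attrs = {"traceloop.entity.input": "[{'role':'user','content':'Hello'}]"}
attrs.setdefault("gen_ai.input.messages", attrs["traceloop.entity.input"])
# Both keys now reference the very same string object.
same_object = attrs["gen_ai.input.messages"] is attrs["traceloop.entity.input"]
```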
diff --git a/util/opentelemetry-util-genai-dev/README.translator.md b/util/opentelemetry-util-genai-dev/README.translator.md
new file mode 100644
index 0000000000..46b59dc6d9
--- /dev/null
+++ b/util/opentelemetry-util-genai-dev/README.translator.md
@@ -0,0 +1,41 @@
+# Translator
+
+## Automatic Span Processing (Recommended)
+
+Add `TraceloopSpanProcessor` to your TracerProvider to automatically transform all matching spans:
+
+```python
+from opentelemetry import trace
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.util.genai.processors import TraceloopSpanProcessor
+
+# Set up tracer provider
+provider = TracerProvider()
+
+# Add processor - transforms all matching spans automatically
+processor = TraceloopSpanProcessor(
+    attribute_transformations={
+        "remove": ["debug_info"],
+        "rename": {"model_ver": "llm.model.version"},
+        "add": {"service.name": "my-llm"}
+    },
+    name_transformations={"chat *": "llm.openai.chat"},
+    traceloop_attributes={
+        "traceloop.entity.name": "MyLLMEntity"
+    }
+)
+provider.add_span_processor(processor)
+trace.set_tracer_provider(provider)
+
+```
+
+## Transformation Rules
+
+### Attributes
+- **Remove**: `"remove": ["field1", "field2"]`
+- **Rename**: `"rename": {"old_name": "new_name"}`
+- **Add**: `"add": {"key": "value"}`
+
+### Span Names
+- **Direct**: `"old name": "new name"`
+- **Pattern**: `"chat *": "llm.chat"` (wildcard matching)
\ No newline at end of file
diff --git a/util/opentelemetry-util-genai-dev/examples/traceloop_span_transformation/traceloop_rules_example.py b/util/opentelemetry-util-genai-dev/examples/traceloop_span_transformation/traceloop_rules_example.py
new file mode 100644
index 0000000000..5432f55a10
--- /dev/null
+++ b/util/opentelemetry-util-genai-dev/examples/traceloop_span_transformation/traceloop_rules_example.py
@@ -0,0 +1,59 @@
+#!/usr/bin/env python3
+
+from __future__ import annotations
+
+from opentelemetry import trace
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import (
+    SimpleSpanProcessor,
+    ConsoleSpanExporter,
+)
+
+from opentelemetry.util.genai.handler import get_telemetry_handler
+from opentelemetry.util.genai.types import (
+    LLMInvocation,
+    InputMessage,
+    OutputMessage,
+    Text,
+)
+
+
+def run_example():
+    provider = TracerProvider()
+    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
+    trace.set_tracer_provider(provider)
+
+    # Build a telemetry handler (singleton) – emitters are chosen via env vars
+    handler = get_telemetry_handler(tracer_provider=provider)
+
+    # Include a few illustrative Traceloop-style attributes.
+    # These will be mapped/prefixed automatically by the Traceloop compat emitter.
+    invocation = LLMInvocation(
+        request_model="gpt-4",
+        input_messages=[InputMessage(role="user", parts=[Text("Hello")])],
+        attributes={
+            "custom.attribute": "value",  # arbitrary user attribute
+            "traceloop.entity.name": "ChatLLM",
+            "traceloop.workflow.name": "main_flow",
+            "traceloop.entity.path": "root/branch/leaf",
+            "traceloop.entity.input": "Hi"
+        },
+    )
+
+    handler.start_llm(invocation)
+    # Simulate model output
+    invocation.output_messages = [
+        OutputMessage(
+            role="assistant", parts=[Text("Hi there!")], finish_reason="stop"
+        )
+    ]
+    handler.stop_llm(invocation)
+
+    print("\nInvocation complete. Check exporter output above for:"
+          "\n * SemanticConvention span containing promoted gen_ai.* keys"
+          "\n * Traceloop compat span (legacy format)"
+          "\nIf translator emitter enabled, attributes like gen_ai.agent.name should be present.\n")
+
+
+if __name__ == "__main__":
+    run_example()
diff --git a/util/opentelemetry-util-genai-dev/src/opentelemetry/util/genai/emitters/configuration.py b/util/opentelemetry-util-genai-dev/src/opentelemetry/util/genai/emitters/configuration.py
index d66d45c00a..62ca98577a 100644
--- a/util/opentelemetry-util-genai-dev/src/opentelemetry/util/genai/emitters/configuration.py
+++ b/util/opentelemetry-util-genai-dev/src/opentelemetry/util/genai/emitters/configuration.py
@@ -1,6 +1,7 @@
 from __future__ import annotations
 
 import logging
+import os
 from dataclasses import dataclass
 from types import MethodType
 from typing import Any, Dict, Iterable, List, Sequence
@@ -99,6 +100,25 @@ def _register(spec: EmitterSpec) -> None:
         target.append(spec)
         spec_registry[spec.name] = spec
 
+    if os.getenv("OTEL_GENAI_ENABLE_TRACELOOP_TRANSLATOR"):
+        try:
+            from .traceloop_translator import (
+                TraceloopTranslatorEmitter,  # type: ignore
+            )
+
+            _register(
+                EmitterSpec(
+                    name="TraceloopTranslator",
+                    category=_CATEGORY_SPAN,
+                    factory=lambda ctx: TraceloopTranslatorEmitter(),
+                    mode="prepend",  # ensure it runs before semantic span emitter
+                )
+            )
+        except Exception:  # pragma: no cover - defensive
+            _logger.exception(
+                "Failed to initialize TraceloopTranslator emitter despite flag set"
+            )
+
     if settings.enable_span and not settings.only_traceloop_compat:
         _register(
             EmitterSpec(
diff --git a/util/opentelemetry-util-genai-dev/src/opentelemetry/util/genai/emitters/span.py b/util/opentelemetry-util-genai-dev/src/opentelemetry/util/genai/emitters/span.py
index 6130405e8b..674bae968c 100644
--- a/util/opentelemetry-util-genai-dev/src/opentelemetry/util/genai/emitters/span.py
+++ b/util/opentelemetry-util-genai-dev/src/opentelemetry/util/genai/emitters/span.py
@@ -254,10 +254,16 @@ def on_start(
         elif isinstance(invocation, EmbeddingInvocation):
             self._start_embedding(invocation)
         else:
-            # Use operation field for span name (defaults to "chat")
-            operation = getattr(invocation, "operation", "chat")
-            model_name = invocation.request_model
-            span_name = f"{operation} {model_name}"
+            # Use override if processor supplied one; else operation+model
+            override = getattr(invocation, "attributes", {}).get(
+                "gen_ai.override.span_name"
+            )
+            if override:
+                span_name = str(override)
+            else:
+                operation = getattr(invocation, "operation", "chat")
+                model_name = invocation.request_model
+                span_name = f"{operation} {model_name}"
         cm = self._tracer.start_as_current_span(
             span_name, kind=SpanKind.CLIENT, end_on_exit=False
         )
diff --git a/util/opentelemetry-util-genai-dev/src/opentelemetry/util/genai/emitters/traceloop_translator.py b/util/opentelemetry-util-genai-dev/src/opentelemetry/util/genai/emitters/traceloop_translator.py
new file mode 100644
index 0000000000..c861ecfd22
--- /dev/null
+++ b/util/opentelemetry-util-genai-dev/src/opentelemetry/util/genai/emitters/traceloop_translator.py
@@ -0,0 +1,120 @@
+"""Traceloop -> GenAI Semantic Convention translation emitter.
+
+This emitter runs early in the span category chain and *mutates* the invocation
+attributes in-place, translating a subset of legacy ``traceloop.*`` attributes
+into semantic convention (``gen_ai.*``) or structured invocation fields so that
+subsequent emitters (e.g. the primary semconv span emitter) naturally record
+the standardized form.
+
+It intentionally does NOT emit its own span. It simply rewrites data.
+
+If both the original TraceloopCompatEmitter and this translator are enabled,
+the pipeline order should be: translator -> semconv span -> traceloop compat span.
+The translator only promotes data; it does not delete the legacy attributes by
+default (set ``OTEL_GENAI_TRACELOOP_TRANSLATOR_STRIP_LEGACY=1`` to strip them).
+"""
+
+from __future__ import annotations
+
+import os
+from typing import Dict
+
+from .spec import EmitterFactoryContext, EmitterSpec
+from ..interfaces import EmitterMeta
+from ..types import LLMInvocation
+
+# Mapping from traceloop attribute key (without prefix) to either:
+# - a gen_ai semantic convention attribute key
+# - a special handler function name (prefixed with "@") for structured placement.
+_TRACELOOP_TO_SEMCONV: Dict[str, str] = {
+    "workflow.name": "gen_ai.workflow.name",  # custom (not in spec yet)
+    "entity.name": "gen_ai.agent.name",  # approximate: treat entity as agent name
+    "entity.path": "gen_ai.workflow.path",  # custom placeholder (maps from traceloop.entity.path or entity.path)
+    # callback metadata (custom placeholders until standardized)
+    "callback.name": "gen_ai.callback.name",
+    "callback.id": "gen_ai.callback.id",
+    # span.kind is redundant (SpanKind already encodes it); omitted
+}
+
+# Input/output content attributes – when present we map them to message serialization
+# helpers by copying into invocation.attributes under semconv-like provisional keys.
+_CONTENT_MAPPING = {
+    "entity.input": "gen_ai.input.messages",
+    "entity.output": "gen_ai.output.messages",
+}
+
+
+_STRIP_FLAG = "OTEL_GENAI_TRACELOOP_TRANSLATOR_STRIP_LEGACY"
+
+
+class TraceloopTranslatorEmitter(EmitterMeta):
+    role = "span"
+    name = "traceloop_translator"
+
+    def __init__(self) -> None:  # no tracer needed – we do not create spans
+        pass
+
+    def handles(self, obj: object) -> bool:  # only care about LLM invocations
+        return isinstance(obj, LLMInvocation)
+
+    def on_start(self, invocation: LLMInvocation) -> None:  # mutate attributes
+        attrs = getattr(invocation, "attributes", None)
+        if not attrs:
+            return
+        strip_legacy = bool(os.getenv(_STRIP_FLAG))
+        for key in list(attrs.keys()):
+            value = attrs.get(key)
+            is_prefixed = False
+            if key.startswith("traceloop."):
+                raw_key = key[len("traceloop.") :]
+                is_prefixed = True
+            elif key in _TRACELOOP_TO_SEMCONV or key in _CONTENT_MAPPING:
+                raw_key = key
+            else:
+                continue
+
+            # Content mapping
+            if raw_key in _CONTENT_MAPPING:
+                target = _CONTENT_MAPPING[raw_key]
+                attrs.setdefault(target, value)
+            else:
+                mapped = _TRACELOOP_TO_SEMCONV.get(raw_key)
+                if mapped:
+                    attrs.setdefault(mapped, value)
+                if raw_key == "callback.name" and isinstance(value, str):
+                    attrs.setdefault("gen_ai.operation.source", value)
+
+            # Optionally remove legacy prefixed variant after promotion
+            if strip_legacy and is_prefixed:
+                try:
+                    attrs.pop(key, None)
+                except Exception:  # pragma: no cover - defensive
+                    pass
+
+    # No-op finish & error hooks – translation is only needed once.
+    def on_end(self, invocation: LLMInvocation) -> None:  # pragma: no cover - trivial
+        return
+
+    def on_error(self, error, invocation: LLMInvocation) -> None:  # pragma: no cover - trivial
+        return
+
+
+def traceloop_translator_emitters() -> list[EmitterSpec]:
+    def _factory(ctx: EmitterFactoryContext) -> TraceloopTranslatorEmitter:
+        return TraceloopTranslatorEmitter()
+
+    return [
+        EmitterSpec(
+            name="TraceloopTranslator",
+            category="span",
+            factory=_factory,
+            mode="prepend",  # ensure earliest so promotion happens before SemanticConvSpan is added
+            after=(),
+        )
+    ]
+
+
+__all__ = [
+    "TraceloopTranslatorEmitter",
+    "traceloop_translator_emitters",
+]
diff --git a/util/opentelemetry-util-genai-dev/tests/test_traceloop_translator_emitter.py b/util/opentelemetry-util-genai-dev/tests/test_traceloop_translator_emitter.py
new file mode 100644
index 0000000000..2e04fbd1a5
--- /dev/null
+++ b/util/opentelemetry-util-genai-dev/tests/test_traceloop_translator_emitter.py
@@ -0,0 +1,137 @@
+# Copyright The OpenTelemetry Authors
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+import os
+import importlib
+
+import pytest
+
+from opentelemetry.util.genai.handler import TelemetryHandler
+from opentelemetry.util.genai.types import LLMInvocation, InputMessage, OutputMessage, Text
+
+
+@pytest.fixture(autouse=True)
+def clear_env(monkeypatch):
+    # Ensure flags start unset each test
+    for key in [
+        "OTEL_GENAI_ENABLE_TRACELOOP_TRANSLATOR",
+        "OTEL_GENAI_TRACELOOP_TRANSLATOR_STRIP_LEGACY",
+        "OTEL_INSTRUMENTATION_GENAI_EMITTERS",
+    ]:
+        monkeypatch.delenv(key, raising=False)
+    yield
+    for key in [
+        "OTEL_GENAI_ENABLE_TRACELOOP_TRANSLATOR",
+        "OTEL_GENAI_TRACELOOP_TRANSLATOR_STRIP_LEGACY",
+        "OTEL_INSTRUMENTATION_GENAI_EMITTERS",
+    ]:
+        monkeypatch.delenv(key, raising=False)
+
+
+def _fresh_handler():
+    # Force re-parse of env + pipeline rebuild by reloading config-dependent modules
+    import opentelemetry.util.genai.emitters.configuration as cfg
+    importlib.reload(cfg)
+    import opentelemetry.util.genai.handler as handler_mod
+    importlib.reload(handler_mod)
+    return handler_mod.TelemetryHandler()
+
+
+def test_translator_promotes_prefixed(monkeypatch):
+    monkeypatch.setenv("OTEL_GENAI_ENABLE_TRACELOOP_TRANSLATOR", "1")
+    # Ensure standard span + compat so we can observe merged attributes
+    monkeypatch.setenv("OTEL_INSTRUMENTATION_GENAI_EMITTERS", "span,traceloop_compat")
+
+    handler = _fresh_handler()
+    inv = LLMInvocation(
+        request_model="gpt-4",
+        input_messages=[InputMessage(role="user", parts=[Text("hi")])],
+        attributes={
+            "traceloop.workflow.name": "main_flow",
+            "traceloop.entity.name": "AgentX",
+            "traceloop.entity.path": "root/branch/leaf",
+            "traceloop.callback.name": "root_chain",
+            "traceloop.callback.id": "cb-123",
+        },
+    )
+    handler.start_llm(inv)
+    # Translator runs on start; attributes should be promoted now
+    assert inv.attributes.get("gen_ai.workflow.name") == "main_flow"
+    assert inv.attributes.get("gen_ai.agent.name") == "AgentX"
+    assert inv.attributes.get("gen_ai.workflow.path") == "root/branch/leaf"
+    assert inv.attributes.get("gen_ai.callback.name") == "root_chain"
+    assert inv.attributes.get("gen_ai.callback.id") == "cb-123"
+    # Original keys retained by default
+    assert "traceloop.entity.path" in inv.attributes
+
+
+def test_translator_promotes_raw(monkeypatch):
+    monkeypatch.setenv("OTEL_GENAI_ENABLE_TRACELOOP_TRANSLATOR", "1")
+    monkeypatch.setenv("OTEL_INSTRUMENTATION_GENAI_EMITTERS", "span")
+    handler = _fresh_handler()
+    inv = LLMInvocation(
+        request_model="gpt-4",
+        input_messages=[],
+        attributes={
+            "workflow.name": "flow_raw",
+            "entity.name": "AgentRaw",
+            "entity.path": "a/b/c",
+        },
+    )
+    handler.start_llm(inv)
+    assert inv.attributes.get("gen_ai.workflow.name") == "flow_raw"
+    assert inv.attributes.get("gen_ai.agent.name") == "AgentRaw"
+    assert inv.attributes.get("gen_ai.workflow.path") == "a/b/c"
+
+
+def test_translator_does_not_overwrite(monkeypatch):
+    monkeypatch.setenv("OTEL_GENAI_ENABLE_TRACELOOP_TRANSLATOR", "1")
+    monkeypatch.setenv("OTEL_INSTRUMENTATION_GENAI_EMITTERS", "span")
+    handler = _fresh_handler()
+    inv = LLMInvocation(
+        request_model="gpt-4",
+        input_messages=[],
+        attributes={
+            "traceloop.workflow.name": "legacy_name",
+            "gen_ai.workflow.name": "canonical_name",
+        },
+    )
+    handler.start_llm(inv)
+    # Existing canonical value preserved
+    assert inv.attributes.get("gen_ai.workflow.name") == "canonical_name"
+
+
+def test_translator_strip_legacy(monkeypatch):
+    monkeypatch.setenv("OTEL_GENAI_ENABLE_TRACELOOP_TRANSLATOR", "1")
+    monkeypatch.setenv("OTEL_GENAI_TRACELOOP_TRANSLATOR_STRIP_LEGACY", "1")
+    monkeypatch.setenv("OTEL_INSTRUMENTATION_GENAI_EMITTERS", "span")
+    handler = _fresh_handler()
+    inv = LLMInvocation(
+        request_model="gpt-4",
+        input_messages=[],
+        attributes={
+            "traceloop.entity.path": "strip/me",
+        },
+    )
+    handler.start_llm(inv)
+    assert inv.attributes.get("gen_ai.workflow.path") == "strip/me"
+    # Legacy removed
+    assert "traceloop.entity.path" not in inv.attributes
+
+
+def test_callback_sets_operation_source(monkeypatch):
+    monkeypatch.setenv("OTEL_GENAI_ENABLE_TRACELOOP_TRANSLATOR", "1")
+    monkeypatch.setenv("OTEL_INSTRUMENTATION_GENAI_EMITTERS", "span")
+    handler = _fresh_handler()
+    inv = LLMInvocation(
+        request_model="gpt-4",
+        input_messages=[],
+        attributes={
+            "traceloop.callback.name": "chain_node",
+        },
+    )
+    handler.start_llm(inv)
+    assert inv.attributes.get("gen_ai.callback.name") == "chain_node"
+    assert inv.attributes.get("gen_ai.operation.source") == "chain_node"