@tcatling tcatling commented Nov 11, 2025

This is a first draft of enabling OpenTelemetry integration with Inspect spans, using Inspect hooks to propagate those to model calls (or across other system boundaries). OpenTelemetry is the dominant standard for distributed tracing; see the screenshots below for examples of how it can be useful. We actually use AWS X-Ray within AISI, but the two are compatible and OTel is an open standard.

This creates one OTel 'trace' per sample, and each Inspect span within that becomes an OTel 'span'. Looking at the output in Jaeger, you get something like the following:

[screenshot: Jaeger UI listing one trace per sample]

These are all traces.

Clicking into a trace gives you more detailed info:

[screenshot: Jaeger trace detail showing nested spans for a single sample]

The solver that produced this has a few custom Inspect spans, which you can see translated into OTel spans above:

    from inspect_ai.solver import Generate, TaskState, solver
    from inspect_ai.util import span

    # Define a custom solver that creates nested spans
    @solver
    def custom_solver():
        async def solve(state: TaskState, generate: Generate) -> TaskState:
            # Create a custom span to demonstrate nesting
            async with span("custom_processing", type="processing"):
                # Add some metadata
                state.metadata["processed"] = True

                # Create another nested span
                async with span("validation", type="validation"):
                    state.metadata["validated"] = True

            return state

        return solve

Trace visualisation is also useful for surfacing errors:

[screenshot: Jaeger trace detail with an exception recorded on a span]

From:

    # Define a solver that intentionally raises an exception
    @solver
    def error_solver():
        async def solve(state: TaskState, generate: Generate) -> TaskState:
            async with span("before_error", type="processing"):
                state.metadata["before"] = True

            # This span will have an exception recorded
            async with span("error_span", type="processing"):
                state.metadata["about_to_fail"] = True
                raise ValueError("Intentional test error!")

            # This won't be reached
            async with span("after_error", type="processing"):
                state.metadata["after"] = True

            return state

        return solve

I've validated that this does successfully inject OTel headers into httpx requests. These look like traceparent=00-6e94d6fe73078499fe0ab315cb7ea7d0-95d43b4b053d0c23-01, which breaks down as:

  00-6e94d6fe73078499fe0ab315cb7ea7d0-95d43b4b053d0c23-01
  │  │                                │                │
  │  │                                │                └─ flags: 01 (sampled/recorded)
  │  │                                └─ parent_span_id: 95d43b4b053d0c23
  │  └─ trace_id: 6e94d6fe73078499fe0ab315cb7ea7d0
  └─ version: 00 (W3C Trace Context v1.0)
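For illustration, here is a minimal sketch (not this PR's implementation) of how a traceparent header gets injected into an outgoing httpx request using the standard OpenTelemetry propagation API; the function name, tracer name, and URL are placeholders:

    import httpx
    from opentelemetry import trace
    from opentelemetry.propagate import inject

    tracer = trace.get_tracer("example")

    async def call_model(client: httpx.AsyncClient) -> httpx.Response:
        # Whichever span is current when inject() runs becomes the
        # parent_span_id in the emitted traceparent header.
        with tracer.start_as_current_span("model_call"):
            headers: dict[str, str] = {}
            inject(headers)  # writes 'traceparent' (and 'tracestate' if set)
            return await client.post("https://example.com/v1/chat", headers=headers)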

This allows trace info to be propagated across system boundaries, so, assuming I'm collecting the emitted data from both systems, I can view correlated activity in a single place.

I'm quite excited about this because it links span-level Inspect info with network-level activity from a platform point of view. For example, it will make it far easier to separate and group network activity from different agents. In the future it would be interesting to think about propagating this trace info into sandboxes.

I'm sure there are loads of things I've missed in this implementation, but please let me know if you think this is a direction worth pursuing. All feedback very welcome.

Setup Notes

A common pattern with tracing is to have a local process (or sidecar, etc.) acting as a collector, which ships trace data elsewhere. For example, to produce the screenshots above, I ran the following Docker Compose file:

version: '3.8'

services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    container_name: jaeger
    ports:
      - "16686:16686"  # Jaeger UI
      - "14250:14250"  # Jaeger gRPC
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    restart: unless-stopped

  otel-collector:
    image: otel/opentelemetry-collector:latest
    container_name: otel-collector
    command: ["--config=/etc/otel-collector-config.yml"]
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    ports:
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP HTTP receiver
      - "13133:13133" # health_check extension
    depends_on:
      - jaeger
    restart: unless-stopped
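The compose file mounts an otel-collector-config.yml (not shown above). A minimal sketch of what that config could look like, assuming a recent collector release that forwards traces to Jaeger over OTLP, is:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

extensions:
  health_check:

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]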

I then configured Inspect (before running the eval) via:

    configure_opentelemetry(
        enabled=True,
        service_name="inspect_ai_test",
        exporter="otlp",
        endpoint="http://localhost:4317",
    )


These requests go to localhost (very normal for a trace collector), so they should be fast. You can see we're also using a BatchSpanProcessor, so I think the performance implications of this should be minimal.
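For reference, the configuration above corresponds roughly to the following standard OpenTelemetry SDK setup (a sketch, not the actual code in this PR):

    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor

    # Export spans in batches over OTLP/gRPC to the local collector.
    provider = TracerProvider(resource=Resource.create({"service.name": "inspect_ai_test"}))
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
    )
    trace.set_tracer_provider(provider)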

'Recording' (sending to a collector) is actually independent of trace ID generation and propagation; I fully expect most users will never care about this and will not enable recording. However, within AISI (and I think probably other places with model proxies and centralised platforms like METR hawk) it would still be super valuable to have a trace ID injected into our platform systems (where we are recording) which can be correlated with eval logs.
