AgentOS provides opt-in OpenTelemetry (OTEL) spans, metrics, and log correlation/export hooks. AgentOS itself does not start an OTEL SDK. Your host application owns exporters, sampling, and context propagation.
- Overview
- Opt-In Policy
- Enable via AgentOS Config (Recommended)
- Enable via Environment Variables
- Host Setup (Node.js Example)
- What AgentOS Emits
- Logging (Pino + OTEL Logs)
- Privacy & Cardinality
- Performance Notes
- SOTA Techniques (TypeScript Agentic AI)
Defaults (all OFF):
- Manual AgentOS spans
- AgentOS metrics
- Trace IDs in streamed responses
- Log correlation (`trace_id`, `span_id`)
- OTEL LogRecord export
When enabled, AgentOS emits:
- Spans around turns and tool-result handling
- Metrics for turn/tool counters + histograms
- Optional: trace correlation in logs and streamed response metadata
- Optional: OTEL LogRecords (exported by your host, via OTLP)
There are two layers:
- Host OTEL SDK (required for export)
  - In Node: `@opentelemetry/sdk-node` + exporters/instrumentations.
  - In browsers: the web OTEL SDK (if you choose to export from the client).
- AgentOS instrumentation toggles (controls what AgentOS emits)
  - Env flags (global defaults)
  - `AgentOSConfig.observability` (per-agent control)
Precedence:
- `AgentOSConfig.observability.enabled = false` hard-disables all AgentOS observability helpers (even if env is set).
- Otherwise, config fields override env fields, and env provides defaults.
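The precedence rules above can be sketched as a small resolver. This is a minimal illustration, not AgentOS's actual implementation: `resolveObservability`, `envFlag`, the result shape, and the accepted truthy strings are assumptions; only the env variable names and config field names come from this page. It also assumes the master switch sets defaults for tracing, metrics, and log trace IDs only, leaving response trace IDs and OTEL log export off unless set explicitly.

```typescript
// Hypothetical shapes for illustration; not the real AgentOS type definitions.
interface ObservabilityConfig {
  enabled?: boolean;
  tracing?: { enabled?: boolean; includeTraceInResponses?: boolean };
  metrics?: { enabled?: boolean };
  logging?: { includeTraceIds?: boolean; exportToOtel?: boolean };
}

interface ResolvedObservability {
  tracing: boolean;
  metrics: boolean;
  traceIdsInResponses: boolean;
  logTraceIds: boolean;
  otelLogs: boolean;
}

type Env = Record<string, string | undefined>;

// Parse a boolean env flag; returns undefined when the variable is unset.
// The accepted truthy strings are an assumption.
function envFlag(env: Env, name: string): boolean | undefined {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') return undefined;
  return ['1', 'true', 'yes', 'on'].includes(raw.trim().toLowerCase());
}

function allOff(): ResolvedObservability {
  return {
    tracing: false,
    metrics: false,
    traceIdsInResponses: false,
    logTraceIds: false,
    otelLogs: false,
  };
}

function resolveObservability(
  config: ObservabilityConfig | undefined,
  env: Env,
): ResolvedObservability {
  // Rule 1: an explicit `enabled: false` in config hard-disables everything,
  // even if env flags are set.
  if (config?.enabled === false) return allOff();

  // Master switch: config wins over env; the default is OFF.
  const master =
    config?.enabled ?? envFlag(env, 'AGENTOS_OBSERVABILITY_ENABLED') ?? false;

  // Rule 2: config field > env flag > fallback default.
  const pick = (fromConfig: boolean | undefined, envName: string, fallback: boolean): boolean =>
    fromConfig ?? envFlag(env, envName) ?? fallback;

  return {
    tracing: pick(config?.tracing?.enabled, 'AGENTOS_TRACING_ENABLED', master),
    metrics: pick(config?.metrics?.enabled, 'AGENTOS_METRICS_ENABLED', master),
    traceIdsInResponses: pick(
      config?.tracing?.includeTraceInResponses,
      'AGENTOS_TRACE_IDS_IN_RESPONSES',
      false,
    ),
    logTraceIds: pick(config?.logging?.includeTraceIds, 'AGENTOS_LOG_TRACE_IDS', master),
    otelLogs: pick(config?.logging?.exportToOtel, 'AGENTOS_OTEL_LOGS_ENABLED', false),
  };
}
```

Under these assumptions, with `AGENTOS_OBSERVABILITY_ENABLED=true` in the environment, passing `{ enabled: false }` still turns every signal off, while `{ metrics: { enabled: true } }` overrides `AGENTOS_METRICS_ENABLED=false`.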
import { AgentOS } from '@framers/agentos';

const agentos = new AgentOS();
await agentos.initialize({
  // ...your normal config...
  observability: {
    // Master switch: when true, defaults to enabling tracing/metrics + log correlation.
    // Keep explicit per-signal toggles if you want a tighter blast radius.
    // enabled: true,
    tracing: {
      enabled: true,
      includeTraceInResponses: true, // adds metadata.trace to select streamed chunks
    },
    metrics: {
      enabled: true,
    },
    logging: {
      includeTraceIds: true, // adds trace_id/span_id to pino log meta
      exportToOtel: false, // keep OFF unless you want OTLP log export
      // otelLoggerName: '@framers/agentos',
    },
  },
});

# Master switch: enables tracing + metrics + log trace_id/span_id injection defaults
AGENTOS_OBSERVABILITY_ENABLED=true
# Optional fine-grained toggles
AGENTOS_TRACING_ENABLED=true
AGENTOS_METRICS_ENABLED=true
AGENTOS_TRACE_IDS_IN_RESPONSES=true
AGENTOS_LOG_TRACE_IDS=true
# Optional: emit OTEL LogRecords (still requires a host SDK + logs exporter)
AGENTOS_OTEL_LOGS_ENABLED=true
# Names (advanced; usually keep defaults)
AGENTOS_OTEL_TRACER_NAME=@framers/agentos
AGENTOS_OTEL_METER_NAME=@framers/agentos
AGENTOS_OTEL_LOGGER_NAME=@framers/agentos

AgentOS only uses OTEL APIs; your host must install/start an OTEL SDK to export anything.
Typical Node setup:
- Install dependencies:
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node
- Configure env (OTLP/HTTP collector example):
OTEL_SERVICE_NAME=my-agent-host
OTEL_TRACES_EXPORTER=otlp
OTEL_METRICS_EXPORTER=otlp
# OTEL_LOGS_EXPORTER=otlp # keep explicit opt-in for log export
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
# Optional sampling
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1
- Start the SDK early (before most imports) so auto-instrumentation can patch libraries:
// otel.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

// Exporters, sampler, and service name are picked up from the OTEL_* env vars above.
const sdk = new NodeSDK({
  instrumentations: [getNodeAutoInstrumentations()],
});

export async function startOtel(): Promise<void> {
  await sdk.start();
}

// Flushes pending telemetry; call this during graceful shutdown.
export async function shutdownOtel(): Promise<void> {
  await sdk.shutdown();
}

In this monorepo, the backend bootstrap lives at backend/src/observability/otel.ts.
When tracing is enabled, AgentOS emits spans such as:
- `agentos.turn`
- `agentos.gmi.get_or_create`
- `agentos.gmi.process_turn_stream`
- `agentos.tool_result`
- `agentos.gmi.handle_tool_result`
- `agentos.conversation.save` (stage-tagged)
When metrics are enabled, AgentOS records:
- `agentos.turns` (counter)
- `agentos.turn.duration_ms` (histogram)
- `agentos.turn.tokens.total|prompt|completion` (histograms; only when usage is available)
- `agentos.turn.cost.usd` (histogram; only when cost is available)
- `agentos.tool_results` (counter)
- `agentos.tool_result.duration_ms` (histogram)
When enabled, AgentOS attaches trace metadata to select streamed chunks:
{
  "metadata": {
    "trace": {
      "traceId": "...",
      "spanId": "...",
      "traceparent": "00-...-...-01"
    }
  }
}

AgentOS uses pino for structured logs. When `includeTraceIds` is enabled and an active span exists, AgentOS adds `trace_id` and `span_id` to log metadata, which makes correlation easy in any log backend.
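As a rough illustration of the injection rule just described, the sketch below adds the two fields only when the toggle is on and a valid active span exists. The `withTraceIds` helper and `ActiveSpanContext` type are hypothetical, not AgentOS internals; the `traceId`/`spanId` field names follow OTEL's span context, and treating an all-zero trace ID as "no active span" is an assumption borrowed from OTEL's notion of an invalid span context.

```typescript
// Minimal view of an active span's context (field names follow OTEL's SpanContext).
interface ActiveSpanContext {
  traceId: string;
  spanId: string;
}

type LogMeta = Record<string, unknown>;

// Hypothetical helper: returns the log metadata, enriched with trace_id/span_id
// only when correlation is enabled and there is a usable active span.
function withTraceIds(
  meta: LogMeta,
  includeTraceIds: boolean,
  active?: ActiveSpanContext,
): LogMeta {
  if (!includeTraceIds || !active) return meta;
  // An all-zero trace ID marks an invalid context; skip injection in that case.
  if (/^0+$/.test(active.traceId)) return meta;
  return { ...meta, trace_id: active.traceId, span_id: active.spanId };
}
```

In a pino-based host, a helper like this could be wired in through pino's `mixin` option, so every log line picks up the IDs without call sites passing them explicitly.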
If you want logs to flow through the OTLP pipeline (instead of, or in addition to, stdout shipping):
- Enable AgentOS OTEL log emission:
  - `AGENTOS_OTEL_LOGS_ENABLED=true`, or
  - `observability.logging.exportToOtel = true`
- Enable a host logs exporter (Node example):
OTEL_LOGS_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

Recommendation:
- Keep OTEL log export OFF by default.
- Use stdout logs + trace correlation for most deployments.
- Turn on OTEL log export when you explicitly want one unified OTLP pipeline for traces/metrics/logs.
Defaults are conservative:
- Prompts, model outputs, and tool arguments are not recorded by default.
- Prefer safe metadata only (durations, status, tool names, model/provider ids, token usage, cost).
Cardinality guidance:
- Avoid labeling metrics/spans with high-cardinality values (user ids, conversation ids, raw URLs, prompt text).
- Keep attributes stable and low-cardinality (e.g. `status`, `tool_name`, `persona_id`).
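One way to apply the cardinality guidance in host code is an allowlist filter that runs before attributes reach a span or metric. The sketch below is an assumption-laden example, not part of AgentOS: `safeAttributes`, `ALLOWED_KEYS`, and the 64-character cap are hypothetical, and the allowlist simply reuses the example keys named above.

```typescript
type AttrValue = string | number | boolean;

// Hypothetical allowlist built from the low-cardinality examples in this section.
const ALLOWED_KEYS = new Set<string>(['status', 'tool_name', 'persona_id']);

// Assumed cap so a stray long string cannot inflate attribute size.
const MAX_VALUE_LENGTH = 64;

// Keep only allowlisted keys with primitive values; everything else
// (user ids, conversation ids, raw URLs, prompt text) is dropped.
function safeAttributes(raw: Record<string, unknown>): Record<string, AttrValue> {
  const out: Record<string, AttrValue> = {};
  for (const [key, value] of Object.entries(raw)) {
    if (!ALLOWED_KEYS.has(key)) continue;
    if (typeof value === 'string') {
      out[key] = value.slice(0, MAX_VALUE_LENGTH);
    } else if (typeof value === 'number' || typeof value === 'boolean') {
      out[key] = value;
    }
    // Objects, arrays, null, and undefined are skipped rather than stringified.
  }
  return out;
}
```

An allowlist is used here instead of a denylist so that newly introduced fields stay out of telemetry until someone deliberately adds them.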
- AgentOS observability helpers are a safe no-op when disabled.
- Traces/metrics are typically low overhead when sampling is enabled.
- OTEL log export can add noticeable CPU/network overhead at `debug` volume; use it intentionally.
Patterns that work well in practice:
- Structured event stream: emit agent lifecycle events (turn started, tool called, tool returned, policy decision, final output) as strongly-typed records (AgentOS already streams chunks; persist them if you need audits).
- W3C context propagation: propagate `traceparent` across inbound HTTP, SSE/WebSocket streaming, and tool calls; use OTEL context managers (AsyncLocalStorage) in Node.
- GenAI semantic conventions: add `gen_ai.*` attributes/events to spans when instrumenting model calls, tool calls, and token usage; keep raw content behind explicit opt-in and redaction.
- Redaction and data classification: treat prompt/tool args/output as sensitive by default; add allowlists + hashing for debugging without content exfiltration.
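To make the `traceparent` propagation pattern concrete, here is a minimal sketch that validates a W3C `traceparent` value (for example, one read from a streamed chunk's `metadata.trace`) and re-emits it as a header for an outbound tool call. The function names and the `TraceContext` shape are hypothetical. The parser is deliberately simplified: it accepts only the four-field form, and in a real host the OTEL propagator API would normally handle this.

```typescript
interface TraceContext {
  traceId: string; // 32 lowercase hex chars
  spanId: string; // 16 lowercase hex chars
  sampled: boolean; // lowest bit of the trace-flags byte
}

// version - trace-id - parent-id - trace-flags, all lowercase hex.
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns undefined for anything that is not a valid traceparent value.
function parseTraceparent(header: string): TraceContext | undefined {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return undefined;
  const [, version, traceId, spanId, flags] = match;
  if (version === 'ff') return undefined; // version ff is forbidden by the spec
  // All-zero trace or span IDs are invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 0x01 };
}

// Serialize using version 00 of the format.
function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? '01' : '00'}`;
}

// Hypothetical helper: headers for an outbound tool call. An invalid incoming
// value yields no header, so the callee starts a fresh trace instead.
function toolCallHeaders(incomingTraceparent: string): Record<string, string> {
  const ctx = parseTraceparent(incomingTraceparent);
  return ctx ? { traceparent: formatTraceparent(ctx) } : {};
}
```

Dropping invalid values rather than forwarding them keeps a malformed client header from corrupting downstream traces.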
Common library choices:
- Telemetry: `@opentelemetry/sdk-node`, `@opentelemetry/auto-instrumentations-node`
- Logging: `pino` (+ `@opentelemetry/instrumentation-pino` if you want automatic injection everywhere)
- Agent/LLM observability layers: Langfuse, Helicone, Sentry AI monitoring, OpenLIT, OpenLLMetry-js