Commit c1a4015

tcdent, devin-ai-integration[bot], travis@agentops.ai, dot-agi, and claude authored
Refactor Agents SDK instrumentation. (#854)
* Fix: Improve serialization of completions/responses in Agents SDK instrumentation Co-Authored-By: [email protected] <[email protected]>
* Fix: Improve serialization of completions/responses in Agents SDK instrumentation Co-Authored-By: [email protected] <[email protected]>
* Tests for completions.
* Separate OpenAI tests into `completion` and `responses`
* Refactor completions and responses unit tests.
* agents SDK test using semantic conventions.
* semantic conventions in openai completions and responses tests
* Exporter refactor and generalization. standardization and simplification of version of values into semantic types.
* Continued refactor of Agents instrumentor. Usurp third-party implementation.
* Semantic conventions for messages.
* Tools for generating real test data from OpenAI Agents.
* support tool calls and set of responses. missing import
* reasoning tokens, semantic conventions, and implementation in OpenAI agent responses.
* populate agents SDK tests with fixture data. Simplify fixture data generation tooling. increased test coverage
* Add chat completion support to openai_agents. Cleanup OpenAI agents instrumentation.
* Agents instrumentor cleanup.
* Cleanup.
* Cleanup init.
* absolute import.
* fix breaking error.
* Correct naming
* rename
* Refactor completions to always use semantic conventions.
* More robust output
* use openai_agents tracing api to gather span data.
* Agents associates spans with a parent span and exports.
* OpenAi responses instrumentor.
* Delete examples/agents-examples/basic/hello_world.py
* pass strings to serialize and return them early.
* deduplication and better hierarchy. simplification of tests. separation of concerns.
* Notes and working documents that should not make it into main.
* more descriptive debug messaging in OpenAI Agents instrumentor
* pertinent testing information in claude.md.
* better version determination for the library.
* Test for generation tokens as well.
* Cleanup attribute formatting to use modular function format with specific responsibilities. Spans are now nested and started/ended at the correct time. Tests generate fixture data from the live API for OpenAI agents.
* Remove duplicated model export from processor.
* nest all spans under the parent_trace root span and open and close the root span only after execution is complete
* clean up common attributes parsing helpers.
* Simplify processor.
* Cleanup exporter.
* Cleanup instrumentor
* Cleanup attributes
* Update README and SPANS definition. Add example with tool usage.
* Fix tool usage example.
* Get completion data on outputs.
* Delete notes
* Fix tests for attributes. Remove debug statements.
* Implement tests for OpenAi agents.
* Better naming for spans.
* Openai Response type parsing improvements.
* Cleanup exporter imports and naming.
* Handoff agent example.
* Cleanup imports on common.
* Disable openai completions/responses tests. TODO probably delete these.
* Disable openai responses instrumentor; it is handled inside openai_agents exclusively for now.
* Add note about enabling chat.completions api instead of responses.
* Move exporter convention notes to README
* Update tests.
* Disable openai responses instrumentation test.
* Skip `parse` serialization tests.
* Cleanup openai responses instrumentation and tests; will be included in a separate PR.
* Resolve type checking errors.
* get correct library version
* remove debug statements and import LIBRARY_VERSION
* Log deeplink to trace on AgentOps dashboard. (#879)
* Log deeplink to trace on AgentOps dashboard.
* Test coverage, type checking.
* Get app_url from config.
* Don't format trace_id in the URL as a UUID, just a hex string.

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Pratyush Shukla <[email protected]>
Co-authored-by: Claude <[email protected]>
1 parent a59b664 commit c1a4015

53 files changed: +5387 -1642 lines changed

agentops/helpers/serialization.py

Lines changed: 57 additions & 1 deletion
@@ -72,8 +72,64 @@ def serialize_uuid(obj: UUID) -> str:
     return str(obj)


+def model_to_dict(obj: Any) -> dict:
+    """Convert a model object to a dictionary safely.
+
+    Handles various model types including:
+    - Pydantic models (model_dump/dict methods)
+    - Dictionary-like objects
+    - API response objects with parse method
+    - Objects with __dict__ attribute
+
+    Args:
+        obj: The model object to convert to dictionary
+
+    Returns:
+        Dictionary representation of the object, or empty dict if conversion fails
+    """
+    if obj is None:
+        return {}
+    if isinstance(obj, dict):
+        return obj
+    if hasattr(obj, "model_dump"):  # Pydantic v2
+        return obj.model_dump()
+    elif hasattr(obj, "dict"):  # Pydantic v1
+        return obj.dict()
+    # TODO this is causing recursion on nested objects.
+    # elif hasattr(obj, "parse"):  # Raw API response
+    #     return model_to_dict(obj.parse())
+    else:
+        # Try to use __dict__ as fallback
+        try:
+            return obj.__dict__
+        except:
+            return {}
+
+
 def safe_serialize(obj: Any) -> Any:
-    """Safely serialize an object to JSON-compatible format"""
+    """Safely serialize an object to JSON-compatible format
+
+    This function handles complex objects by:
+    1. Returning strings untouched (even if they contain JSON)
+    2. Converting models to dictionaries
+    3. Using custom JSON encoder to handle special types
+    4. Falling back to string representation only when necessary
+
+    Args:
+        obj: The object to serialize
+
+    Returns:
+        If obj is a string, returns the original string untouched.
+        Otherwise, returns a JSON string representation of the object.
+    """
+    # Return strings untouched
+    if isinstance(obj, str):
+        return obj
+
+    # Convert any model objects to dictionaries
+    if hasattr(obj, "model_dump") or hasattr(obj, "dict") or hasattr(obj, "parse"):
+        obj = model_to_dict(obj)
+
     try:
         return json.dumps(obj, cls=AgentOpsJSONEncoder)
     except (TypeError, ValueError) as e:
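
The behavior described in the `safe_serialize` docstring can be tried out with a small standalone sketch. It mirrors the order of checks in the hunk above but is not the AgentOps implementation: `json.dumps(..., default=str)` stands in for `AgentOpsJSONEncoder`, the `str(obj)` fallback is an assumption based on the docstring (the hunk is cut off at the `except` clause), and `FakeModel` is a hypothetical stand-in for a Pydantic v2 model.

```python
import json
from typing import Any
from uuid import UUID


def model_to_dict(obj: Any) -> dict:
    """Follow the same conversion order as the diff above."""
    if obj is None:
        return {}
    if isinstance(obj, dict):
        return obj
    if hasattr(obj, "model_dump"):  # Pydantic v2 style
        return obj.model_dump()
    if hasattr(obj, "dict"):  # Pydantic v1 style
        return obj.dict()
    return getattr(obj, "__dict__", {})


def safe_serialize(obj: Any) -> Any:
    """Strings pass through untouched; everything else becomes a JSON string."""
    if isinstance(obj, str):
        return obj
    if hasattr(obj, "model_dump") or hasattr(obj, "dict"):
        obj = model_to_dict(obj)
    try:
        # default=str is a stand-in for AgentOpsJSONEncoder (assumption).
        return json.dumps(obj, default=str)
    except (TypeError, ValueError):
        # Assumed fallback: plain string representation.
        return str(obj)


class FakeModel:
    """Hypothetical stand-in for a Pydantic v2 model."""

    def __init__(self, name: str, run_id: UUID):
        self.name = name
        self.run_id = run_id

    def model_dump(self) -> dict:
        return {"name": self.name, "run_id": self.run_id}


raw_json = '{"already": "json"}'
model = FakeModel("planner", UUID(int=1))

print(safe_serialize(raw_json))  # the original string, not re-encoded
print(safe_serialize(model))     # a JSON string built from model_dump()
```

Returning strings early matters because a second `json.dumps` pass would wrap an already-serialized payload in another layer of quoting.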
Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
# OpenTelemetry Implementation Notes

This document outlines best practices and implementation details for OpenTelemetry in AgentOps instrumentations.

## Key Concepts

### Context Propagation

OpenTelemetry relies on proper context propagation to maintain parent-child relationships between spans. This is essential for:

- Creating accurate trace waterfalls in visualizations
- Ensuring all spans from the same logical operation share a trace ID
- Allowing proper querying and filtering of related operations

### Core Patterns

When implementing instrumentations that need to maintain context across different execution contexts:

1. **Store span contexts in dictionaries:**
   ```python
   import weakref

   # Use weakref dictionaries to avoid memory leaks
   self._span_contexts = weakref.WeakKeyDictionary()
   self._trace_root_contexts = weakref.WeakKeyDictionary()
   ```

2. **Create spans with explicit parent contexts:**
   ```python
   from opentelemetry import trace

   tracer = trace.get_tracer(__name__)
   parent_context = self._get_parent_context(trace_obj)
   # start_as_current_span is a method of a tracer, not of the trace module
   with tracer.start_as_current_span(
       name=span_name,
       context=parent_context,
       kind=trace.SpanKind.CLIENT,
       attributes=attributes,
   ) as span:
       # Span operations here
       # Store the span's context for future reference
       context = trace.set_span_in_context(span)
       self._span_contexts[span_obj] = context
   ```

3. **Implement helper methods to retrieve appropriate parent contexts:**
   ```python
   from opentelemetry import context as context_api

   def _get_parent_context(self, trace_obj):
       # Try to get the trace's root context if it exists
       if trace_obj in self._trace_root_contexts:
           return self._trace_root_contexts[trace_obj]

       # Otherwise, use the current context
       return context_api.get_current()
   ```

4. **Debug trace continuity:**
   ```python
   import logging
   from opentelemetry import trace
   from opentelemetry.trace import format_trace_id

   current_span = trace.get_current_span()
   span_context = current_span.get_span_context()
   trace_id = format_trace_id(span_context.trace_id)
   logging.debug(f"Current span trace ID: {trace_id}")
   ```
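
Pattern 1 depends on how `weakref.WeakKeyDictionary` behaves once the key object is no longer referenced elsewhere. The following standard-library sketch shows that behavior in isolation; `TraceObj` is a hypothetical stand-in for whatever SDK object is used as the key.

```python
import gc
import weakref


class TraceObj:
    """Hypothetical stand-in for an SDK trace or span object used as a key."""

    def __init__(self, trace_id: str):
        self.trace_id = trace_id


span_contexts = weakref.WeakKeyDictionary()

trace_obj = TraceObj("trace-1")
span_contexts[trace_obj] = {"note": "context for trace-1"}
size_while_alive = len(span_contexts)

# Dropping the last strong reference lets the entry disappear on its own.
del trace_obj
gc.collect()
size_after_release = len(span_contexts)

print(size_while_alive, size_after_release)
```

Keys must be hashable and support weak references, so instances of ordinary classes work while built-in values such as `str`, `int`, or `tuple` do not.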

## Common Pitfalls

1. **Naming conflicts:** Avoid using `trace` as a parameter name when you're also importing the OpenTelemetry `trace` module
   ```python
   # Bad
   def on_trace_start(self, trace):
       # The parameter shadows the imported trace module inside this function
       ...

   # Good
   def on_trace_start(self, trace_obj):
       # No conflicts with OpenTelemetry's trace module
       ...
   ```

2. **Missing parent contexts:** Always pass the parent context explicitly when one is available; do not rely on the current context alone

3. **Memory leaks:** Use `weakref.WeakKeyDictionary()` for storing spans to allow garbage collection

4. **Lost context:** When calling async or callback functions, be sure to preserve and pass the context
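
Pitfall 4 can be reproduced without OpenTelemetry. The Python API keeps its current context in `contextvars` by default, so a plain `ContextVar` is used here as a stand-in for the current-span slot; the variable and function names are hypothetical. The sketch shows a callback on another thread losing the value, and the same callback keeping it when run inside a copied context.

```python
import contextvars
import threading

# Stand-in for the "current span" slot; the name is hypothetical.
current_span_name = contextvars.ContextVar("current_span_name", default=None)


def read_span_name(results: list) -> None:
    results.append(current_span_name.get())


current_span_name.set("parent-span")

# On standard CPython builds a new thread starts with an empty context,
# so the callback typically sees the default value instead of "parent-span".
lost = []
worker = threading.Thread(target=read_span_name, args=(lost,))
worker.start()
worker.join()

# Copying the context and running the callback inside it preserves the value.
kept = []
ctx = contextvars.copy_context()
worker = threading.Thread(target=ctx.run, args=(read_span_name, kept))
worker.start()
worker.join()

print(lost, kept)
```

With OpenTelemetry itself, the corresponding move is to capture `context.get_current()` before the handoff and wrap the callback body in `context.attach()` and `context.detach()`.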
78+
79+
## Testing Context Propagation
80+
81+
To verify proper context propagation:
82+
83+
1. Enable debug logging for trace IDs
84+
2. Run a simple end-to-end test that generates multiple spans
85+
3. Verify all spans share the same trace ID
86+
4. Check that parent-child relationships are correctly established
87+
88+
```python
89+
# Example debug logging
90+
logging.debug(f"Span {span.name} has trace ID: {format_trace_id(span.get_span_context().trace_id)}")
91+
```
92+
93+
## Timestamp Handling in OpenTelemetry
94+
95+
When working with OpenTelemetry spans and timestamps:
96+
97+
1. **Automatic Timestamp Tracking:** OpenTelemetry automatically tracks timestamps for spans. When a span is created with `tracer.start_span()` or `tracer.start_as_current_span()`, the start time is captured automatically. When `span.end()` is called, the end time is recorded.
98+
99+
2. **No Manual Timestamp Setting Required:** The standard instrumentation pattern does not require manually setting timestamp attributes on spans. Instead, OpenTelemetry handles this internally through the SpanProcessor and Exporter classes.
100+
101+
3. **Timestamp Representation:** In the OpenTelemetry data model, timestamps are stored as nanoseconds since the Unix epoch (January 1, 1970).
102+
103+
4. **Serialization Responsibility:** The serialization of timestamps from OTel spans to output formats like JSON is handled by the Exporter components. If timestamps aren't appearing correctly in output APIs, the issue is likely in the API exporter, not in the span creation code.
104+
105+
5. **Debugging Timestamps:** To debug timestamp issues, verify that spans are properly starting and ending, rather than manually setting timestamp attributes:
106+
107+
```python
108+
# Good pattern - timestamps handled by OpenTelemetry automatically
109+
with tracer.start_as_current_span("my_operation") as span:
110+
# Do work
111+
pass # span.end() is called automatically
112+
```
113+
114+
Note: If timestamps are missing in API output (e.g., empty "start_time" fields), focus on fixes in the exporter and serialization layer, not by manually tracking timestamps in instrumentation code.
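
Since timestamps are plain integers counting nanoseconds since the Unix epoch, an exporter-side conversion is mostly integer arithmetic. The sketch below is one way such a conversion could look using only the standard library; `ns_to_iso` and `duration_ms` are hypothetical helper names, not part of OpenTelemetry or AgentOps.

```python
from datetime import datetime, timezone


def ns_to_iso(timestamp_ns: int) -> str:
    """Format nanoseconds since the Unix epoch as an ISO 8601 UTC string."""
    seconds, nanos = divmod(timestamp_ns, 1_000_000_000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Append the sub-second part manually to keep full nanosecond precision.
    return f"{moment.strftime('%Y-%m-%dT%H:%M:%S')}.{nanos:09d}Z"


def duration_ms(start_ns: int, end_ns: int) -> float:
    """Span duration in milliseconds from two nanosecond timestamps."""
    return (end_ns - start_ns) / 1_000_000


start_ns = 1_700_000_000_000_000_000
end_ns = start_ns + 250_000_000

print(ns_to_iso(start_ns))
print(duration_ms(start_ns, end_ns))
```

Splitting with `divmod` before building the `datetime` avoids the float rounding that would come from dividing the nanosecond value directly.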

## Attributes in OpenTelemetry

When working with span attributes in OpenTelemetry:

1. **Root Attributes Node:** The root `attributes` object in the API output JSON should always be empty. This is by design. All attribute data should be stored in the `span_attributes` object.

2. **Span Attributes:** The `span_attributes` object is where all user-defined and semantic attribute data should be stored. This allows for a structured, hierarchical representation of attributes.

3. **Structure Difference:** While the root `attributes` appears as an empty object in the API output, this is normal and expected. Do not attempt to populate this object directly or duplicate data from `span_attributes` into it.

4. **Setting Attributes:** Always set span attributes using the semantic conventions defined in the `agentops/semconv` module:

```python
from agentops.semconv import agent

# Good pattern - using semantic conventions
span.set_attribute(agent.AGENT_NAME, "My Agent")
```

agentops/instrumentation/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -68,8 +68,8 @@ def get_instance(self) -> BaseInstrumentor:
         provider_import_name="crewai",
     ),
     InstrumentorLoader(
-        module_name="opentelemetry.instrumentation.agents",
-        class_name="AgentsInstrumentor",
+        module_name="agentops.instrumentation.openai_agents",
+        class_name="OpenAIAgentsInstrumentor",
         provider_import_name="agents",
     ),
 ]
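
The hunk only swaps two strings, which works because the loader resolves the instrumentor class by name at runtime. The sketch below shows one way such a loader could be written; `LoaderSketch` and its `should_activate` property are hypothetical simplifications, not the actual `InstrumentorLoader`, and standard-library module names are used so it runs without AgentOps installed.

```python
import importlib
from dataclasses import dataclass
from typing import Any


@dataclass
class LoaderSketch:
    """Hypothetical, simplified stand-in for InstrumentorLoader."""

    module_name: str
    class_name: str
    provider_import_name: str

    @property
    def should_activate(self) -> bool:
        # Only instrument when the provider package can be imported.
        try:
            importlib.import_module(self.provider_import_name)
            return True
        except ImportError:
            return False

    def get_instance(self) -> Any:
        # Resolve the class lazily from its dotted module path and name.
        module = importlib.import_module(self.module_name)
        return getattr(module, self.class_name)()


# Standard-library names are used so the sketch runs anywhere.
loader = LoaderSketch(
    module_name="collections",
    class_name="OrderedDict",
    provider_import_name="json",
)
missing = LoaderSketch("collections", "OrderedDict", "not_a_real_provider_pkg")

print(loader.should_activate, type(loader.get_instance()).__name__)
print(missing.should_activate)
```

Under this design, pointing `module_name` at `agentops.instrumentation.openai_agents` is enough to replace the third-party instrumentor with the in-house one.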
Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
# OpenAI Agents SDK Instrumentation

This module provides automatic instrumentation for the OpenAI Agents SDK, adding telemetry that follows OpenTelemetry semantic conventions for Generative AI systems.

## Architecture Overview

The OpenAI Agents SDK instrumentor works by:

1. Intercepting the Agents SDK's trace processor interface to capture Agent, Function, Generation, and other span types
2. Monkey-patching the Agents SDK `Runner` class to capture the full execution lifecycle, including streaming operations
3. Converting all captured data to OpenTelemetry spans and metrics following semantic conventions

The instrumentation is organized into several key components:

1. **Instrumentor (`instrumentor.py`)**: The entry point that patches the Agents SDK and configures trace capture
2. **Processor (`processor.py`)**: Receives events from the SDK and prepares them for export
3. **Exporter (`exporter.py`)**: Converts SDK spans to OpenTelemetry spans and exports them
4. **Attributes Module (`attributes/`)**: Specialized modules for extracting and formatting span attributes

## Attribute Processing Modules

The attribute modules extract and format OpenTelemetry-compatible attributes from span data:

- **Common (`attributes/common.py`)**: Core attribute extraction functions for all span types and utility functions
- **Completion (`attributes/completion.py`)**: Handles different completion content formats (Chat Completions API, Response API, Agents SDK)
- **Model (`attributes/model.py`)**: Extracts model information and parameters
- **Tokens (`attributes/tokens.py`)**: Processes token usage data and metrics
- **Response (`attributes/response.py`)**: Handles interpretation of Response API objects

Each getter function in these modules is focused on a single responsibility and does not modify global state. Functions are designed to be composable, allowing different attribute types to be combined as needed in the exporter.

## Span Types

The instrumentor captures the following span types:

- **Trace**: The root span representing an entire agent workflow execution
  - Created using `get_base_trace_attributes()` to initialize with standard fields
  - Captures workflow name, trace ID, and workflow-level metadata

- **Agent**: Represents an agent's execution lifecycle
  - Processed using `get_agent_span_attributes()` with `AGENT_SPAN_ATTRIBUTES` mapping
  - Uses `SpanKind.CONSUMER` to indicate an agent receiving a request
  - Captures agent name, input, output, tools, and other metadata

- **Function**: Represents a tool/function call
  - Processed using `get_function_span_attributes()` with `FUNCTION_SPAN_ATTRIBUTES` mapping
  - Uses `SpanKind.CLIENT` to indicate an outbound call to a function
  - Captures function name, input arguments, output results, and from_agent information

- **Generation**: Captures details of model generation
  - Processed using `get_generation_span_attributes()` with `GENERATION_SPAN_ATTRIBUTES` mapping
  - Uses `SpanKind.CLIENT` to indicate an outbound call to an LLM
  - Captures model name, configuration, usage statistics, and response content

- **Response**: Lightweight span for tracking model response data
  - Processed using `get_response_span_attributes()` with `RESPONSE_SPAN_ATTRIBUTES` mapping
  - Extracts response content and metadata from different API formats

- **Handoff**: Represents control transfer between agents
  - Processed using `get_handoff_span_attributes()` with `HANDOFF_SPAN_ATTRIBUTES` mapping
  - Tracks from_agent and to_agent information

## Span Lifecycle Management

The exporter (`exporter.py`) handles the full span lifecycle:

1. **Start Events**:
   - Create spans but DO NOT END them
   - Store span references in tracking dictionaries
   - Use OpenTelemetry's `start_span` to control when spans end
   - Leave status as UNSET to indicate in-progress

2. **End Events**:
   - Look up existing span by ID in tracking dictionaries
   - If found and not ended:
     - Update span with all final attributes
     - Set status to OK or ERROR based on task outcome
     - End the span manually
   - If not found or already ended:
     - Create a new complete span with all data
     - End it immediately

3. **Error Handling**:
   - Check if spans are already ended before attempting updates
   - Provide informative log messages about span lifecycle
   - Properly clean up tracking resources

This approach is essential because:
- Agents SDK sends separate start and end events for each task
- We need to maintain a single span for the entire task lifecycle to get accurate timing
- Final data (outputs, token usage, etc.) is only available at the end event
- We want to avoid creating duplicate spans for the same task
- Spans must be properly created and ended to avoid leaks

The span lifecycle management ensures spans have:
- Accurate start and end times (preserving the actual task duration)
- Complete attribute data from both start and end events
- Proper status reflecting task completion
- All final outputs, errors, and metrics
- Clean resource management with no memory leaks
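
The start and end handling described in this section can be sketched with a stand-in span class. Nothing below is the real exporter or the OpenTelemetry API: `SketchSpan` and `LifecycleTracker` are hypothetical names, and the point is only to show one span per task, reused at the end event, with a fallback when no start event was seen.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SketchSpan:
    """Minimal stand-in for an OpenTelemetry span (not the real API)."""

    name: str
    start_ns: int = field(default_factory=time.time_ns)
    end_ns: Optional[int] = None
    status: str = "UNSET"
    attributes: dict = field(default_factory=dict)

    def end(self) -> None:
        self.end_ns = time.time_ns()


class LifecycleTracker:
    def __init__(self) -> None:
        self.active: Dict[str, SketchSpan] = {}
        self.finished: List[SketchSpan] = []

    def on_start(self, span_id: str, name: str) -> None:
        # Create the span but do not end it; status stays UNSET.
        self.active[span_id] = SketchSpan(name)

    def on_end(self, span_id: str, name: str, attributes: dict, error: bool = False) -> None:
        # Reuse the tracked span, or fall back to a new complete span.
        span = self.active.pop(span_id, None) or SketchSpan(name)
        span.attributes.update(attributes)
        span.status = "ERROR" if error else "OK"
        span.end()
        self.finished.append(span)


tracker = LifecycleTracker()
tracker.on_start("s1", "agent")
tracker.on_end("s1", "agent", {"output": "done"})
tracker.on_end("s2", "orphan", {"output": "late"}, error=True)

print([(s.name, s.status) for s in tracker.finished], len(tracker.active))
```

Popping the entry at the end event is what keeps the tracking dictionary from growing, which is the cleanup concern raised under Error Handling.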

## Key Design Patterns

### Semantic Conventions

All attribute names follow the OpenTelemetry semantic conventions defined in `agentops.semconv`:

```python
# Using constants from semconv module
attributes[CoreAttributes.TRACE_ID] = trace_id
attributes[WorkflowAttributes.WORKFLOW_NAME] = trace.name
attributes[SpanAttributes.LLM_SYSTEM] = "openai"
attributes[MessageAttributes.COMPLETION_CONTENT.format(i=0)] = content
```

### Target → Source Attribute Mapping

We use a consistent pattern for attribute extraction with typed mapping dictionaries:

```python
# Attribute mapping example
AGENT_SPAN_ATTRIBUTES: AttributeMap = {
    # target_attribute: source_attribute
    AgentAttributes.AGENT_NAME: "name",
    WorkflowAttributes.WORKFLOW_INPUT: "input",
    WorkflowAttributes.FINAL_OUTPUT: "output",
    # ...
}
```
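
A mapping like this is only useful together with a function that walks it. The sketch below shows one plausible shape for that step; `extract_attributes` is a hypothetical helper, and the string keys are illustrative placeholders rather than the actual `agentops.semconv` values.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Target attribute name -> source attribute name on the span data object.
AttributeMap = Dict[str, str]

# Illustrative placeholder keys, not the real agentops.semconv constants.
AGENT_SPAN_ATTRIBUTES_SKETCH: AttributeMap = {
    "agent.name": "name",
    "workflow.input": "input",
    "workflow.final_output": "output",
}


def extract_attributes(span_data: Any, mapping: AttributeMap) -> Dict[str, Any]:
    """Copy mapped fields from an object or dict, skipping missing values."""
    attributes: Dict[str, Any] = {}
    for target, source in mapping.items():
        if isinstance(span_data, dict):
            value = span_data.get(source)
        else:
            value = getattr(span_data, source, None)
        if value is not None:
            attributes[target] = value
    return attributes


span_data = SimpleNamespace(name="triage", input="Where is my order?", output=None)
print(extract_attributes(span_data, AGENT_SPAN_ATTRIBUTES_SKETCH))
```

Keeping the mapping as data means a new span type needs only a new dictionary, not a new extraction routine.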
130+
131+
### Structured Attribute Handling
132+
133+
- Always use MessageAttributes semantic conventions for content and tool calls
134+
- For chat completions, use MessageAttributes.COMPLETION_CONTENT.format(i=0)
135+
- For tool calls, use MessageAttributes.TOOL_CALL_NAME.format(i=0, j=0), etc.
136+
- Never try to combine or aggregate contents into a single attribute
137+
- Each message component should have its own properly formatted attribute
138+
- This ensures proper display in OpenTelemetry backends and dashboards
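
The indexed-attribute rule above can be illustrated with plain format strings. The template values below are assumptions made for the sketch (the real ones are defined on `MessageAttributes` in `agentops.semconv`), and `flatten_completions` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Hypothetical templates; the real values live in agentops.semconv.MessageAttributes.
COMPLETION_CONTENT = "gen_ai.completion.{i}.content"
TOOL_CALL_NAME = "gen_ai.completion.{i}.tool_calls.{j}.name"
TOOL_CALL_ARGUMENTS = "gen_ai.completion.{i}.tool_calls.{j}.arguments"


def flatten_completions(choices: List[Dict[str, Any]]) -> Dict[str, str]:
    """Give every message component its own indexed attribute."""
    attributes: Dict[str, str] = {}
    for i, choice in enumerate(choices):
        if choice.get("content") is not None:
            attributes[COMPLETION_CONTENT.format(i=i)] = choice["content"]
        for j, call in enumerate(choice.get("tool_calls", [])):
            attributes[TOOL_CALL_NAME.format(i=i, j=j)] = call["name"]
            # Arguments stay in their raw string form; the JSON is not parsed.
            attributes[TOOL_CALL_ARGUMENTS.format(i=i, j=j)] = call["arguments"]
    return attributes


choices = [
    {
        "content": "Checking the weather.",
        "tool_calls": [{"name": "get_weather", "arguments": '{"city": "Paris"}'}],
    }
]
for key, value in flatten_completions(choices).items():
    print(key, "=", value)
```

Each value lands under its own key, so nothing is aggregated into a single serialized blob.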
139+
140+
### Serialization Rules
141+
142+
1. We do not serialize data structures arbitrarily; everything has a semantic convention
143+
2. Span attributes should use semantic conventions and avoid complex serialized structures
144+
3. Keep all string data in its original form - do not parse JSON within strings
145+
4. If a function has JSON attributes for its arguments, do not parse that JSON - keep as string
146+
5. If a completion or response body text/content contains JSON, keep it as a string
147+
7. Function arguments and tool call arguments should remain in their raw string form
148+
149+
### Critical Notes for Attribute Handling
150+
151+
- NEVER manually set the root completion attributes (`SpanAttributes.LLM_COMPLETIONS` or "gen_ai.completion")
152+
- Let OpenTelemetry backend derive these values from the detailed attributes
153+
- Setting root completion attributes creates duplication and inconsistency
154+
- Tests should verify attribute existence using MessageAttributes constants
155+
- Do not check for the presence of SpanAttributes.LLM_COMPLETIONS
156+
- Verify individual content/tool attributes instead of root attributes
