Merged

Changes from all commits — 73 commits
b248d24
Fix: Improve serialization of completions/responses in Agents SDK ins…
devin-ai-integration[bot] Mar 14, 2025
30eb11e
Fix: Improve serialization of completions/responses in Agents SDK ins…
devin-ai-integration[bot] Mar 14, 2025
d6c2f8a
Tests for completions.
tcdent Mar 15, 2025
9283b83
Separate OpenAI tests into `completion` and `responses`
tcdent Mar 15, 2025
770b37a
Refactor completions and responses unit tests.
tcdent Mar 15, 2025
29a115f
agents SDK test using semantic conventions.
tcdent Mar 15, 2025
a67deb7
semantic conventions in openai completions and responses tests
tcdent Mar 15, 2025
6f1e77a
Exporter refactor and generalization. standardization and simplificat…
tcdent Mar 15, 2025
5b4e940
Continued refactor of Agents instrumentor. Usurp third-party implemen…
tcdent Mar 15, 2025
0169502
Semantic conventions for messages.
tcdent Mar 15, 2025
960a01f
Tools for generating real test data from OpenAI Agents.
tcdent Mar 15, 2025
124a469
support tool calls and set of responses. missing import
tcdent Mar 15, 2025
ce5b122
reasoning tokens, semantic conventions, and implementation in OpenAI …
tcdent Mar 15, 2025
039978b
populate agents SDK tests with fixture data. Simplify fixture data ge…
tcdent Mar 15, 2025
1fa5fb6
Add chat completion support to openai_agents. Cleanup OpenAI agents i…
tcdent Mar 15, 2025
72ab339
Agents instrumentor cleanup.
tcdent Mar 15, 2025
d206b67
Cleanup.
tcdent Mar 15, 2025
4661fa5
Cleanup init.
tcdent Mar 15, 2025
e44a509
absolute import.
tcdent Mar 15, 2025
cf73879
Merge branch 'main' into serialization-fix-test
dot-agi Mar 15, 2025
913d18b
fix breaking error.
tcdent Mar 15, 2025
d5ac88d
Correct naming
tcdent Mar 15, 2025
734b15d
rename
tcdent Mar 15, 2025
9e8c845
Refactor completions to always use semantic conventions.
tcdent Mar 15, 2025
c6e9bff
More robust output
tcdent Mar 15, 2025
4c17725
use openai_agents tracing api to gather span data.
tcdent Mar 16, 2025
1e140cf
Agents associates spans with a parent span and exports.
tcdent Mar 16, 2025
6d268ec
OpenAi responses instrumentor.
tcdent Mar 16, 2025
cccfef8
Merge branch 'main' into serialization-fix-test
dot-agi Mar 16, 2025
91fea4f
Delete examples/agents-examples/basic/hello_world.py
tcdent Mar 16, 2025
8c9ec5c
pass strings to serialize and return them early.
tcdent Mar 16, 2025
11cc97d
deduplication and better hierarchy. simplification of tests. separati…
tcdent Mar 16, 2025
f01d6dd
Notes and working documents that should not make it into main.
tcdent Mar 16, 2025
1ac8077
Merge main into serialization-fix-test-drafts
tcdent Mar 18, 2025
59a4fc7
more descriptive debug messaging in OpenAI Agents instrumentor
tcdent Mar 18, 2025
1ad9fd7
pertinent testing information in claude.md.
tcdent Mar 18, 2025
c4cb26e
better version determination for the library.
tcdent Mar 18, 2025
c60e29a
Test for generation tokens as well.
tcdent Mar 18, 2025
1d2e4f7
Cleanup attribute formatting to use modular function format with spec…
tcdent Mar 19, 2025
32d7e88
Remove duplicated model export from processor.
tcdent Mar 20, 2025
4256384
nest all spans under the parent_trace root span and open and close th…
tcdent Mar 20, 2025
016172a
clean up common attributes parsing helpers.
tcdent Mar 20, 2025
be9448a
Simplify processor.
tcdent Mar 20, 2025
60392a0
Cleanup exporter.
tcdent Mar 20, 2025
99cd3c5
Cleanup instrumentor
tcdent Mar 20, 2025
62f3bf5
Cleanup attributes
tcdent Mar 20, 2025
8f0f44d
Update README and SPANS definition. Add example with tool usage.
tcdent Mar 20, 2025
cd9954d
Fix tool usage example.
tcdent Mar 21, 2025
d4fe0e8
Get completion data on outputs.
tcdent Mar 21, 2025
8bee74e
Delete notes
tcdent Mar 21, 2025
830f504
Fix tests for attributes. Rewmove debug statements.
tcdent Mar 21, 2025
9bfda9f
Implement tests for OpenAi agents.
tcdent Mar 21, 2025
bd30017
Merge branch 'main' into serialization-fix-test
dot-agi Mar 21, 2025
14c9837
Better naming for spans.
tcdent Mar 21, 2025
1465821
Openai Response type parsing improvements.
tcdent Mar 21, 2025
89d9683
Cleanup exporter imports and naming.
tcdent Mar 21, 2025
9f13810
Handoff agent example.
tcdent Mar 21, 2025
c98bbbf
Cleanup imports on common.
tcdent Mar 21, 2025
6afe3fc
Disable openai completions/responses tests. TODO probably delete these.
tcdent Mar 21, 2025
f325529
Disable openai responses intrumentor; it is handled inside openai_age…
tcdent Mar 21, 2025
7fb5725
Add note about enabling chat.completions api instead of responses.
tcdent Mar 21, 2025
80e30e8
Move exporter convention notes to README
tcdent Mar 21, 2025
3f1a793
Update tests.
tcdent Mar 21, 2025
314cb88
Disable openai responses instrumentation test.
tcdent Mar 21, 2025
528e5b3
Skip `parse` serialization tests.
tcdent Mar 21, 2025
bb71461
Cleanup openai responses instrumention and tests; will be included in…
tcdent Mar 24, 2025
9e3208f
Resolve type checking errors.
tcdent Mar 24, 2025
c91d78c
get correct library version
dot-agi Mar 24, 2025
7c29e96
remove debug statements and import LIBRARY_VERSION
dot-agi Mar 24, 2025
ef7dc3e
Merge branch 'main' into serialization-fix-test
tcdent Mar 24, 2025
cc14de9
Log deeplink to trace on AgentOps dashboard. (#879)
tcdent Mar 24, 2025
be94e41
Merge branch 'main' into serialization-fix-test
tcdent Mar 24, 2025
c8a6531
Merge branch 'main' into serialization-fix-test
dot-agi Mar 24, 2025
58 changes: 57 additions & 1 deletion agentops/helpers/serialization.py
@@ -72,8 +72,64 @@
return str(obj)


def model_to_dict(obj: Any) -> dict:
"""Convert a model object to a dictionary safely.

Handles various model types including:
- Pydantic models (model_dump/dict methods)
- Dictionary-like objects
- API response objects with parse method
- Objects with __dict__ attribute

Args:
obj: The model object to convert to dictionary

Returns:
Dictionary representation of the object, or empty dict if conversion fails
"""
if obj is None:
return {}
if isinstance(obj, dict):
return obj
if hasattr(obj, "model_dump"): # Pydantic v2
return obj.model_dump()
elif hasattr(obj, "dict"): # Pydantic v1
return obj.dict()
# TODO this is causing recursion on nested objects.
# elif hasattr(obj, "parse"): # Raw API response
# return model_to_dict(obj.parse())
else:
# Try to use __dict__ as fallback
try:
return obj.__dict__
except:
return {}

Codecov (codecov/patch) check warning: added lines #L105 - L106 in agentops/helpers/serialization.py were not covered by tests.


def safe_serialize(obj: Any) -> Any:
"""Safely serialize an object to JSON-compatible format"""
"""Safely serialize an object to JSON-compatible format

This function handles complex objects by:
1. Returning strings untouched (even if they contain JSON)
2. Converting models to dictionaries
3. Using custom JSON encoder to handle special types
4. Falling back to string representation only when necessary

Args:
obj: The object to serialize

Returns:
If obj is a string, returns the original string untouched.
Otherwise, returns a JSON string representation of the object.
"""
# Return strings untouched
if isinstance(obj, str):
return obj

# Convert any model objects to dictionaries
if hasattr(obj, "model_dump") or hasattr(obj, "dict") or hasattr(obj, "parse"):
obj = model_to_dict(obj)

try:
return json.dumps(obj, cls=AgentOpsJSONEncoder)
except (TypeError, ValueError) as e:
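
For context, a minimal usage sketch of the new helpers (the `FakeUsage` class below is a hypothetical stand-in for a Pydantic-style model):

```python
from agentops.helpers.serialization import model_to_dict, safe_serialize

# Strings pass through untouched, even when they contain JSON.
assert safe_serialize('{"already": "json"}') == '{"already": "json"}'

class FakeUsage:
    """Hypothetical stand-in for a Pydantic v2 model."""
    def model_dump(self):
        return {"prompt_tokens": 12, "completion_tokens": 3}

# Model-like objects are converted to dicts, then JSON-encoded.
print(model_to_dict(FakeUsage()))   # {'prompt_tokens': 12, 'completion_tokens': 3}
print(safe_serialize(FakeUsage()))  # '{"prompt_tokens": 12, "completion_tokens": 3}'
```
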
133 changes: 133 additions & 0 deletions agentops/instrumentation/OpenTelemetry.md
@@ -0,0 +1,133 @@
# OpenTelemetry Implementation Notes

This document outlines best practices and implementation details for OpenTelemetry in AgentOps instrumentations.

## Key Concepts

### Context Propagation

OpenTelemetry relies on proper context propagation to maintain parent-child relationships between spans. This is essential for:

- Creating accurate trace waterfalls in visualizations
- Ensuring all spans from the same logical operation share a trace ID
- Allowing proper querying and filtering of related operations

### Core Patterns

When implementing instrumentations that need to maintain context across different execution contexts:

1. **Store span contexts in dictionaries:**
```python
# Use weakref dictionaries to avoid memory leaks
self._span_contexts = weakref.WeakKeyDictionary()
self._trace_root_contexts = weakref.WeakKeyDictionary()
```

2. **Create spans with explicit parent contexts:**
```python
parent_context = self._get_parent_context(trace_obj)
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span(
name=span_name,
context=parent_context,
kind=trace.SpanKind.CLIENT,
attributes=attributes,
) as span:
# Span operations here
# Store the span's context for future reference
context = trace.set_span_in_context(span)
self._span_contexts[span_obj] = context
```

3. **Implement helper methods to retrieve appropriate parent contexts:**
```python
def _get_parent_context(self, trace_obj):
# Try to get the trace's root context if it exists
if trace_obj in self._trace_root_contexts:
return self._trace_root_contexts[trace_obj]

# Otherwise, use the current context
return context_api.get_current()
```

4. **Debug trace continuity:**
```python
current_span = trace.get_current_span()
span_context = current_span.get_span_context()
trace_id = format_trace_id(span_context.trace_id)
logging.debug(f"Current span trace ID: {trace_id}")
```

## Common Pitfalls

1. **Naming conflicts:** Avoid using `trace` as a parameter name when you're also importing the OpenTelemetry `trace` module
```python
# Bad
def on_trace_start(self, trace):
# This will cause conflicts with the imported trace module

# Good
def on_trace_start(self, trace_obj):
# No conflicts with OpenTelemetry's trace module
```

2. **Missing parent contexts:** Always provide an explicit parent context when one is available; don't rely on the current context alone

3. **Memory leaks:** Use `weakref.WeakKeyDictionary()` for storing spans so they can be garbage collected

4. **Lost context:** When calling async or callback functions, preserve and pass the context explicitly (see the sketch below)
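
A minimal sketch of pitfall 4, assuming a callback-style API (the `schedule_callback` helper is hypothetical):

```python
from opentelemetry import context as context_api, trace

tracer = trace.get_tracer(__name__)

def start_operation(schedule_callback):
    with tracer.start_as_current_span("parent_operation"):
        # Capture the context while the parent span is current...
        captured_ctx = context_api.get_current()

        def on_done(result):
            # ...and re-attach it inside the callback so any spans created here
            # share the same trace ID and parent as "parent_operation".
            token = context_api.attach(captured_ctx)
            try:
                with tracer.start_as_current_span("handle_result"):
                    pass  # process `result` here
            finally:
                context_api.detach(token)

        schedule_callback(on_done)
```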

## Testing Context Propagation

To verify proper context propagation:

1. Enable debug logging for trace IDs
2. Run a simple end-to-end test that generates multiple spans
3. Verify all spans share the same trace ID
4. Check that parent-child relationships are correctly established

```python
# Example debug logging
logging.debug(f"Span {span.name} has trace ID: {format_trace_id(span.get_span_context().trace_id)}")
```

## Timestamp Handling in OpenTelemetry

When working with OpenTelemetry spans and timestamps:

1. **Automatic Timestamp Tracking:** OpenTelemetry automatically tracks timestamps for spans. When a span is created with `tracer.start_span()` or `tracer.start_as_current_span()`, the start time is captured automatically. When `span.end()` is called, the end time is recorded.

2. **No Manual Timestamp Setting Required:** The standard instrumentation pattern does not require manually setting timestamp attributes on spans. Instead, OpenTelemetry handles this internally through the SpanProcessor and Exporter classes.

3. **Timestamp Representation:** In the OpenTelemetry data model, timestamps are stored as nanoseconds since the Unix epoch (January 1, 1970).

4. **Serialization Responsibility:** The serialization of timestamps from OTel spans to output formats like JSON is handled by the Exporter components. If timestamps aren't appearing correctly in output APIs, the issue is likely in the API exporter, not in the span creation code.

5. **Debugging Timestamps:** To debug timestamp issues, verify that spans are properly starting and ending, rather than manually setting timestamp attributes:

```python
# Good pattern - timestamps handled by OpenTelemetry automatically
with tracer.start_as_current_span("my_operation") as span:
# Do work
pass # span.end() is called automatically
```

Note: If timestamps are missing in API output (e.g., empty "start_time" fields), focus on fixes in the exporter and serialization layer, not by manually tracking timestamps in instrumentation code.

## Attributes in OpenTelemetry

When working with span attributes in OpenTelemetry:

1. **Root Attributes Node:** The root `attributes` object in the API output JSON should always be empty. This is by design. All attribute data should be stored in the `span_attributes` object.

2. **Span Attributes:** The `span_attributes` object is where all user-defined and semantic attribute data should be stored. This allows for a structured, hierarchical representation of attributes.

3. **Structure Difference:** While the root `attributes` appears as an empty object in the API output, this is normal and expected. Do not attempt to populate this object directly or duplicate data from `span_attributes` into it.

4. **Setting Attributes:** Always set span attributes using the semantic conventions defined in the `agentops/semconv` module:

```python
from agentops.semconv import agent

# Good pattern - using semantic conventions
span.set_attribute(agent.AGENT_NAME, "My Agent")
```
4 changes: 2 additions & 2 deletions agentops/instrumentation/__init__.py
@@ -68,8 +68,8 @@ def get_instance(self) -> BaseInstrumentor:
provider_import_name="crewai",
),
InstrumentorLoader(
module_name="opentelemetry.instrumentation.agents",
class_name="AgentsInstrumentor",
module_name="agentops.instrumentation.openai_agents",
class_name="OpenAIAgentsInstrumentor",
provider_import_name="agents",
),
]
156 changes: 156 additions & 0 deletions agentops/instrumentation/openai_agents/README.md
@@ -0,0 +1,156 @@
# OpenAI Agents SDK Instrumentation

This module provides automatic instrumentation for the OpenAI Agents SDK, adding telemetry that follows OpenTelemetry semantic conventions for Generative AI systems.

## Architecture Overview

The OpenAI Agents SDK instrumentor works by:

1. Intercepting the Agents SDK's trace processor interface to capture Agent, Function, Generation, and other span types
2. Monkey-patching the Agents SDK `Runner` class to capture the full execution lifecycle, including streaming operations
3. Converting all captured data to OpenTelemetry spans and metrics following semantic conventions

The instrumentation is organized into several key components:

1. **Instrumentor (`instrumentor.py`)**: The entry point that patches the Agents SDK and configures trace capture (a minimal activation sketch follows this list)
2. **Processor (`processor.py`)**: Receives events from the SDK and prepares them for export
3. **Exporter (`exporter.py`)**: Converts SDK spans to OpenTelemetry spans and exports them
4. **Attributes Module (`attributes/`)**: Specialized modules for extracting and formatting span attributes
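
For reference, activating the instrumentor by hand would look roughly like this; AgentOps normally does it automatically through the instrumentor loader in `agentops/instrumentation/__init__.py`:

```python
from agentops.instrumentation.openai_agents import OpenAIAgentsInstrumentor

# Patches the Agents SDK Runner and registers the trace processor/exporter.
OpenAIAgentsInstrumentor().instrument()
```
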

## Attribute Processing Modules

The attribute modules extract and format OpenTelemetry-compatible attributes from span data:

- **Common (`attributes/common.py`)**: Core attribute extraction functions for all span types and utility functions
- **Completion (`attributes/completion.py`)**: Handles different completion content formats (Chat Completions API, Response API, Agents SDK)
- **Model (`attributes/model.py`)**: Extracts model information and parameters
- **Tokens (`attributes/tokens.py`)**: Processes token usage data and metrics
- **Response (`attributes/response.py`)**: Handles interpretation of Response API objects

Each getter function in these modules is focused on a single responsibility and does not modify global state. Functions are designed to be composable, allowing different attribute types to be combined as needed in the exporter.
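
For example, a sketch of how an exporter might compose these getters (the module paths and function signatures here are assumptions based on the descriptions above):

```python
# Hypothetical composition of attribute getters; names and signatures are assumed.
from agentops.instrumentation.openai_agents.attributes import common, model, tokens

def build_generation_attributes(span_data) -> dict:
    attributes: dict = {}
    # Each getter returns a plain dict keyed by semantic-convention constants,
    # so the results can be merged without touching global state.
    attributes.update(common.get_common_attributes(span_data))
    attributes.update(model.get_model_attributes(span_data))
    attributes.update(tokens.get_token_attributes(span_data))
    return attributes
```
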

## Span Types

The instrumentor captures the following span types:

- **Trace**: The root span representing an entire agent workflow execution
- Created using `get_base_trace_attributes()` to initialize with standard fields
- Captures workflow name, trace ID, and workflow-level metadata

- **Agent**: Represents an agent's execution lifecycle
- Processed using `get_agent_span_attributes()` with `AGENT_SPAN_ATTRIBUTES` mapping
- Uses `SpanKind.CONSUMER` to indicate an agent receiving a request
- Captures agent name, input, output, tools, and other metadata

- **Function**: Represents a tool/function call
- Processed using `get_function_span_attributes()` with `FUNCTION_SPAN_ATTRIBUTES` mapping
- Uses `SpanKind.CLIENT` to indicate an outbound call to a function
- Captures function name, input arguments, output results, and from_agent information

- **Generation**: Captures details of model generation
- Processed using `get_generation_span_attributes()` with `GENERATION_SPAN_ATTRIBUTES` mapping
- Uses `SpanKind.CLIENT` to indicate an outbound call to an LLM
- Captures model name, configuration, usage statistics, and response content

- **Response**: Lightweight span for tracking model response data
- Processed using `get_response_span_attributes()` with `RESPONSE_SPAN_ATTRIBUTES` mapping
- Extracts response content and metadata from different API formats

- **Handoff**: Represents control transfer between agents
- Processed using `get_handoff_span_attributes()` with `HANDOFF_SPAN_ATTRIBUTES` mapping
- Tracks from_agent and to_agent information

## Span Lifecycle Management

The exporter (`exporter.py`) handles the full span lifecycle (a condensed sketch appears at the end of this section):

1. **Start Events**:
- Create spans but DO NOT END them
- Store span references in tracking dictionaries
- Use OpenTelemetry's start_span to control when spans end
- Leave status as UNSET to indicate in-progress

2. **End Events**:
- Look up existing span by ID in tracking dictionaries
- If found and not ended:
- Update span with all final attributes
- Set status to OK or ERROR based on task outcome
- End the span manually
- If not found or already ended:
- Create a new complete span with all data
- End it immediately

3. **Error Handling**:
- Check if spans are already ended before attempting updates
- Provide informative log messages about span lifecycle
- Properly clean up tracking resources

This approach is essential because:
- Agents SDK sends separate start and end events for each task
- We need to maintain a single span for the entire task lifecycle to get accurate timing
- Final data (outputs, token usage, etc.) is only available at the end event
- We want to avoid creating duplicate spans for the same task
- Spans must be properly created and ended to avoid leaks

The span lifecycle management ensures spans have:
- Accurate start and end times (preserving the actual task duration)
- Complete attribute data from both start and end events
- Proper status reflecting task completion
- All final outputs, errors, and metrics
- Clean resource management with no memory leaks
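
A condensed sketch of this start/end pattern (the event-handler names and `task_id` key are stand-ins, not the exporter's real interface):

```python
from typing import Optional
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)
_active_spans: dict = {}  # task_id -> in-progress span

def on_task_start(task_id: str, attributes: dict) -> None:
    # Start the span but do NOT end it; status stays UNSET while in progress.
    _active_spans[task_id] = tracer.start_span(name=f"task.{task_id}", attributes=attributes)

def on_task_end(task_id: str, final_attributes: dict, error: Optional[Exception] = None) -> None:
    span = _active_spans.pop(task_id, None)
    if span is None:
        # Start event was missed or the span already ended:
        # create a complete span from the end-event data and finish it immediately.
        span = tracer.start_span(name=f"task.{task_id}", attributes=final_attributes)
    else:
        # Update the existing span with the final data (outputs, token usage, ...).
        for key, value in final_attributes.items():
            span.set_attribute(key, value)
    span.set_status(Status(StatusCode.ERROR) if error else Status(StatusCode.OK))
    span.end()
```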

## Key Design Patterns

### Semantic Conventions

All attribute names follow the OpenTelemetry semantic conventions defined in `agentops.semconv`:

```python
# Using constants from semconv module
attributes[CoreAttributes.TRACE_ID] = trace_id
attributes[WorkflowAttributes.WORKFLOW_NAME] = trace.name
attributes[SpanAttributes.LLM_SYSTEM] = "openai"
attributes[MessageAttributes.COMPLETION_CONTENT.format(i=0)] = content
```

### Target → Source Attribute Mapping

We use a consistent pattern for attribute extraction with typed mapping dictionaries:

```python
# Attribute mapping example
AGENT_SPAN_ATTRIBUTES: AttributeMap = {
# target_attribute: source_attribute
AgentAttributes.AGENT_NAME: "name",
WorkflowAttributes.WORKFLOW_INPUT: "input",
WorkflowAttributes.FINAL_OUTPUT: "output",
# ...
}
```

### Structured Attribute Handling

- Always use MessageAttributes semantic conventions for content and tool calls
- For chat completions, use MessageAttributes.COMPLETION_CONTENT.format(i=0)
- For tool calls, use MessageAttributes.TOOL_CALL_NAME.format(i=0, j=0), etc.
- Never try to combine or aggregate contents into a single attribute
- Each message component should have its own properly formatted attribute
- This ensures proper display in OpenTelemetry backends and dashboards (a short example follows this list)
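
A short example of the indexed pattern (`TOOL_CALL_ARGUMENTS` is an assumed constant; the other names appear elsewhere in this README):

```python
from opentelemetry import trace
from agentops.semconv import MessageAttributes

span = trace.get_current_span()

# One attribute per message component, indexed through the convention's format slots.
span.set_attribute(MessageAttributes.COMPLETION_CONTENT.format(i=0), "The capital of France is Paris.")
span.set_attribute(MessageAttributes.TOOL_CALL_NAME.format(i=0, j=0), "get_weather")
span.set_attribute(MessageAttributes.TOOL_CALL_ARGUMENTS.format(i=0, j=0), '{"city": "Paris"}')  # assumed constant
```
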

### Serialization Rules

1. We do not serialize data structures arbitrarily; everything has a semantic convention
2. Span attributes should use semantic conventions and avoid complex serialized structures
3. Keep all string data in its original form - do not parse JSON within strings
4. If a function has JSON attributes for its arguments, do not parse that JSON - keep as string
5. If a completion or response body text/content contains JSON, keep it as a string
6. Function arguments and tool call arguments should remain in their raw string form

### Critical Notes for Attribute Handling

- NEVER manually set the root completion attributes (`SpanAttributes.LLM_COMPLETIONS` or "gen_ai.completion")
- Let OpenTelemetry backend derive these values from the detailed attributes
- Setting root completion attributes creates duplication and inconsistency
- Tests should verify attribute existence using MessageAttributes constants
- Do not check for the presence of SpanAttributes.LLM_COMPLETIONS
- Verify individual content/tool attributes instead of root attributes (see the test sketch below)
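
For instance (a sketch only; the `exported_span` fixture is hypothetical):

```python
from agentops.semconv import MessageAttributes, SpanAttributes

def test_generation_span_attributes(exported_span):  # hypothetical fixture
    attrs = exported_span.attributes

    # Verify the indexed message attributes rather than a root completion blob.
    assert MessageAttributes.COMPLETION_CONTENT.format(i=0) in attrs

    # The root completion attribute should not have been set manually.
    assert SpanAttributes.LLM_COMPLETIONS not in attrs
```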