Refactor Agents SDK instrumentation. #854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
73 commits
b248d24
Fix: Improve serialization of completions/responses in Agents SDK ins…
devin-ai-integration[bot] 30eb11e
Fix: Improve serialization of completions/responses in Agents SDK ins…
devin-ai-integration[bot] d6c2f8a
Tests for completions.
tcdent 9283b83
Separate OpenAI tests into `completion` and `responses`
tcdent 770b37a
Refactor completions and responses unit tests.
tcdent 29a115f
agents SDK test using semantic conventions.
tcdent a67deb7
semantic conventions in openai completions and responses tests
tcdent 6f1e77a
Exporter refactor and generalization. standardization and simplificat…
tcdent 5b4e940
Continued refactor of Agents instrumentor. Usurp third-party implemen…
tcdent 0169502
Semantic conventions for messages.
tcdent 960a01f
Tools for generating real test data from OpenAI Agents.
tcdent 124a469
support tool calls and set of responses. missing import
tcdent ce5b122
reasoning tokens, semantic conventions, and implementation in OpenAI …
tcdent 039978b
populate agents SDK tests with fixture data. Simplify fixture data ge…
tcdent 1fa5fb6
Add chat completion support to openai_agents. Cleanup OpenAI agents i…
tcdent 72ab339
Agents instrumentor cleanup.
tcdent d206b67
Cleanup.
tcdent 4661fa5
Cleanup init.
tcdent e44a509
absolute import.
tcdent cf73879
Merge branch 'main' into serialization-fix-test
dot-agi 913d18b
fix breaking error.
tcdent d5ac88d
Correct naming
tcdent 734b15d
rename
tcdent 9e8c845
Refactor completions to always use semantic conventions.
tcdent c6e9bff
More robust output
tcdent 4c17725
use openai_agents tracing api to gather span data.
tcdent 1e140cf
Agents associates spans with a parent span and exports.
tcdent 6d268ec
OpenAi responses instrumentor.
tcdent cccfef8
Merge branch 'main' into serialization-fix-test
dot-agi 91fea4f
Delete examples/agents-examples/basic/hello_world.py
tcdent 8c9ec5c
pass strings to serialize and return them early.
tcdent 11cc97d
deduplication and better hierarchy. simplification of tests. separati…
tcdent f01d6dd
Notes and working documents that should not make it into main.
tcdent 1ac8077
Merge main into serialization-fix-test-drafts
tcdent 59a4fc7
more descriptive debug messaging in OpenAI Agents instrumentor
tcdent 1ad9fd7
pertinent testing information in claude.md.
tcdent c4cb26e
better version determination for the library.
tcdent c60e29a
Test for generation tokens as well.
tcdent 1d2e4f7
Cleanup attribute formatting to use modular function format with spec…
tcdent 32d7e88
Remove duplicated model export from processor.
tcdent 4256384
nest all spans under the parent_trace root span and open and close th…
tcdent 016172a
clean up common attributes parsing helpers.
tcdent be9448a
Simplify processor.
tcdent 60392a0
Cleanup exporter.
tcdent 99cd3c5
Cleanup instrumentor
tcdent 62f3bf5
Cleanup attributes
tcdent 8f0f44d
Update README and SPANS definition. Add example with tool usage.
tcdent cd9954d
Fix tool usage example.
tcdent d4fe0e8
Get completion data on outputs.
tcdent 8bee74e
Delete notes
tcdent 830f504
Fix tests for attributes. Rewmove debug statements.
tcdent 9bfda9f
Implement tests for OpenAi agents.
tcdent bd30017
Merge branch 'main' into serialization-fix-test
dot-agi 14c9837
Better naming for spans.
tcdent 1465821
Openai Response type parsing improvements.
tcdent 89d9683
Cleanup exporter imports and naming.
tcdent 9f13810
Handoff agent example.
tcdent c98bbbf
Cleanup imports on common.
tcdent 6afe3fc
Disable openai completions/responses tests. TODO probably delete these.
tcdent f325529
Disable openai responses intrumentor; it is handled inside openai_age…
tcdent 7fb5725
Add note about enabling chat.completions api instead of responses.
tcdent 80e30e8
Move exporter convention notes to README
tcdent 3f1a793
Update tests.
tcdent 314cb88
Disable openai responses instrumentation test.
tcdent 528e5b3
Skip `parse` serialization tests.
tcdent bb71461
Cleanup openai responses instrumention and tests; will be included in…
tcdent 9e3208f
Resolve type checking errors.
tcdent c91d78c
get correct library version
dot-agi 7c29e96
remove debug statements and import LIBRARY_VERSION
dot-agi ef7dc3e
Merge branch 'main' into serialization-fix-test
tcdent cc14de9
Log deeplink to trace on AgentOps dashboard. (#879)
tcdent be94e41
Merge branch 'main' into serialization-fix-test
tcdent c8a6531
Merge branch 'main' into serialization-fix-test
dot-agi
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,133 @@ | ||
| # OpenTelemetry Implementation Notes | ||
|
|
||
| This document outlines best practices and implementation details for OpenTelemetry in AgentOps instrumentations. | ||
|
|
||
| ## Key Concepts | ||
|
|
||
| ### Context Propagation | ||
|
|
||
| OpenTelemetry relies on proper context propagation to maintain parent-child relationships between spans. This is essential for: | ||
|
|
||
| - Creating accurate trace waterfalls in visualizations | ||
| - Ensuring all spans from the same logical operation share a trace ID | ||
| - Allowing proper querying and filtering of related operations | ||
|
|
||
| ### Core Patterns | ||
|
|
||
| When implementing instrumentations that need to maintain context across different execution contexts: | ||
|
|
||
| 1. **Store span contexts in dictionaries:** | ||
| ```python | ||
| # Use weakref dictionaries to avoid memory leaks | ||
| self._span_contexts = weakref.WeakKeyDictionary() | ||
| self._trace_root_contexts = weakref.WeakKeyDictionary() | ||
| ``` | ||
|
|
||
| 2. **Create spans with explicit parent contexts:** | ||
| ```python | ||
| # `tracer` is an OpenTelemetry Tracer, e.g. trace.get_tracer(__name__) | ||
| parent_context = self._get_parent_context(trace_obj) | ||
| with tracer.start_as_current_span( | ||
|     name=span_name, | ||
|     context=parent_context, | ||
|     kind=trace.SpanKind.CLIENT, | ||
|     attributes=attributes, | ||
| ) as span: | ||
|     # Span operations here | ||
|     # Store the span's context for future reference | ||
|     context = trace.set_span_in_context(span) | ||
|     self._span_contexts[span_obj] = context | ||
| ``` | ||
|
|
||
| 3. **Implement helper methods to retrieve appropriate parent contexts:** | ||
| ```python | ||
| # Assumes: from opentelemetry import context as context_api | ||
| def _get_parent_context(self, trace_obj): | ||
|     # Try to get the trace's root context if it exists | ||
|     if trace_obj in self._trace_root_contexts: | ||
|         return self._trace_root_contexts[trace_obj] | ||
|
|
||
|     # Otherwise, use the current context | ||
|     return context_api.get_current() | ||
| ``` | ||
|
|
||
| 4. **Debug trace continuity:** | ||
| ```python | ||
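| import logging | ||
| from opentelemetry import trace | ||
| from opentelemetry.trace import format_trace_id | ||
|
|
||
| current_span = trace.get_current_span() | ||
| span_context = current_span.get_span_context() | ||
| trace_id = format_trace_id(span_context.trace_id) | ||
| logging.debug(f"Current span trace ID: {trace_id}") | ||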
| ``` | ||
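Patterns 1 and 3 above can be exercised without OpenTelemetry installed. The following is a minimal sketch using only the standard library. `TraceObj` and `ContextRegistry` are hypothetical stand-ins (not AgentOps or Agents SDK classes), and plain strings stand in for OpenTelemetry `Context` objects. It shows the root-context lookup with a fallback, and that a `WeakKeyDictionary` entry disappears once its key object is garbage collected.

```python
import gc
import weakref


class TraceObj:
    """Hypothetical stand-in for an SDK trace object (must be weak-referenceable)."""

    def __init__(self, name):
        self.name = name


class ContextRegistry:
    """Hypothetical helper mirroring the dictionaries used in the patterns above."""

    def __init__(self):
        # Keys are held weakly, so entries vanish when the trace object is collected.
        self._trace_root_contexts = weakref.WeakKeyDictionary()

    def remember_root(self, trace_obj, context):
        self._trace_root_contexts[trace_obj] = context

    def get_parent_context(self, trace_obj, current_context):
        # Prefer the stored root context; otherwise fall back to the current one.
        return self._trace_root_contexts.get(trace_obj, current_context)

    def __len__(self):
        return len(self._trace_root_contexts)


registry = ContextRegistry()
known = TraceObj("workflow")
unknown = TraceObj("other")
registry.remember_root(known, "root-context")

found = registry.get_parent_context(known, "current-context")
fallback = registry.get_parent_context(unknown, "current-context")

# Dropping the last strong reference lets the entry be removed.
del known
gc.collect()
remaining = len(registry)
```

One consequence of this design is that a stored value must not hold a strong reference back to its key, otherwise the entry is never collected.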
|
|
||
| ## Common Pitfalls | ||
|
|
||
| 1. **Naming conflicts:** Avoid using `trace` as a parameter name when you are also importing the OpenTelemetry `trace` module; the parameter shadows the module inside the function. | ||
| ```python | ||
| # Bad | ||
| def on_trace_start(self, trace): | ||
|     # The parameter shadows the imported trace module inside this method | ||
|     ... | ||
|
|
||
| # Good | ||
| def on_trace_start(self, trace_obj): | ||
|     # No conflicts with OpenTelemetry's trace module | ||
|     ... | ||
| ``` | ||
|
|
||
| 2. **Missing parent contexts:** Always explicitly provide parent contexts when available; do not rely on the current context alone. | ||
|
|
||
| 3. **Memory leaks:** Use `weakref.WeakKeyDictionary()` for storing spans to allow garbage collection | ||
|
|
||
| 4. **Lost context:** When calling async or callback functions, be sure to preserve and pass the context | ||
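Pitfall 4 can be illustrated with the standard library alone. OpenTelemetry's Python context is built on `contextvars` by default, so the same mechanics apply. In this sketch the `current_trace_id` variable and the `read_trace_id` callback are hypothetical stand-ins for the active span context and an SDK callback. The snapshot taken with `contextvars.copy_context()` is handed explicitly to the worker thread, which is the "preserve and pass the context" step.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the active trace context.
current_trace_id = contextvars.ContextVar("current_trace_id", default="no-trace")


def read_trace_id():
    """Hypothetical callback that depends on the active context."""
    return current_trace_id.get()


current_trace_id.set("trace-123")

# Snapshot the caller's context before handing work to another thread.
snapshot = contextvars.copy_context()

with ThreadPoolExecutor(max_workers=1) as pool:
    # snapshot.run executes the callback inside the captured context.
    propagated = pool.submit(snapshot.run, read_trace_id).result()
```

Submitting `read_trace_id` directly, without `snapshot.run`, would typically see the default value, because worker threads start with their own context. With OpenTelemetry itself, the equivalent step is to capture the context in the caller and re-activate it in the callback with `opentelemetry.context.attach()` and `detach()`.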
|
|
||
| ## Testing Context Propagation | ||
|
|
||
| To verify proper context propagation: | ||
|
|
||
| 1. Enable debug logging for trace IDs | ||
| 2. Run a simple end-to-end test that generates multiple spans | ||
| 3. Verify all spans share the same trace ID | ||
| 4. Check that parent-child relationships are correctly established | ||
|
|
||
| ```python | ||
| # Example debug logging | ||
| logging.debug(f"Span {span.name} has trace ID: {format_trace_id(span.get_span_context().trace_id)}") | ||
| ``` | ||
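Steps 3 and 4 of the checklist can also be automated. The sketch below is a standard-library illustration, not part of the AgentOps test suite: `SpanRecord` is a hypothetical, simplified view of an exported span, and `check_trace_continuity` is a hypothetical helper. Trace IDs are rendered as 32 lowercase hex characters, the conventional textual form of an OpenTelemetry trace ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpanRecord:
    """Hypothetical, simplified view of an exported span."""

    name: str
    trace_id: int
    span_id: int
    parent_id: Optional[int] = None


def check_trace_continuity(spans):
    """Return a list of problems; an empty list means the trace is consistent."""
    problems = []
    trace_ids = {span.trace_id for span in spans}
    if len(trace_ids) > 1:
        formatted = sorted(format(tid, "032x") for tid in trace_ids)
        problems.append(f"multiple trace IDs: {formatted}")

    known_ids = {span.span_id for span in spans}
    roots = [span for span in spans if span.parent_id is None]
    if len(roots) != 1:
        problems.append(f"expected exactly one root span, found {len(roots)}")
    for span in spans:
        if span.parent_id is not None and span.parent_id not in known_ids:
            problems.append(f"span {span.name!r} has unknown parent {span.parent_id:016x}")
    return problems


good = [
    SpanRecord("workflow", trace_id=0xABC, span_id=1),
    SpanRecord("agent", trace_id=0xABC, span_id=2, parent_id=1),
    SpanRecord("generation", trace_id=0xABC, span_id=3, parent_id=2),
]
# Same trace, but the last span carries a different trace ID and a dangling parent.
broken = good[:2] + [SpanRecord("generation", trace_id=0xDEF, span_id=3, parent_id=9)]

good_problems = check_trace_continuity(good)
broken_problems = check_trace_continuity(broken)
```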
|
|
||
| ## Timestamp Handling in OpenTelemetry | ||
|
|
||
| When working with OpenTelemetry spans and timestamps: | ||
|
|
||
| 1. **Automatic Timestamp Tracking:** OpenTelemetry automatically tracks timestamps for spans. When a span is created with `tracer.start_span()` or `tracer.start_as_current_span()`, the start time is captured automatically. When `span.end()` is called, the end time is recorded. | ||
|
|
||
| 2. **No Manual Timestamp Setting Required:** The standard instrumentation pattern does not require manually setting timestamp attributes on spans. The SDK records the start and end times on the span itself, and the SpanProcessor and Exporter classes read them from the finished span. | ||
|
|
||
| 3. **Timestamp Representation:** In the OpenTelemetry data model, timestamps are stored as nanoseconds since the Unix epoch (January 1, 1970). | ||
|
|
||
| 4. **Serialization Responsibility:** The serialization of timestamps from OTel spans to output formats like JSON is handled by the Exporter components. If timestamps aren't appearing correctly in output APIs, the issue is likely in the API exporter, not in the span creation code. | ||
|
|
||
| 5. **Debugging Timestamps:** To debug timestamp issues, verify that spans are properly starting and ending, rather than manually setting timestamp attributes: | ||
|
|
||
| ```python | ||
| # Good pattern - timestamps handled by OpenTelemetry automatically | ||
| with tracer.start_as_current_span("my_operation") as span: | ||
|     # Do work | ||
|     pass  # span.end() is called automatically when the block exits | ||
| ``` | ||
|
|
||
| Note: If timestamps are missing in API output (e.g., empty "start_time" fields), focus fixes on the exporter and serialization layer rather than manually tracking timestamps in instrumentation code. | ||
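Since timestamps are stored as integer nanoseconds since the Unix epoch, an exporter has to convert them when it serializes a span. The following is a minimal standard-library sketch of that conversion; `ns_to_iso8601` and `duration_ms` are hypothetical helper names, and the ISO 8601 output format is an assumption rather than the format the AgentOps API expects.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def ns_to_iso8601(timestamp_ns):
    """Convert nanoseconds since the Unix epoch to an ISO 8601 UTC string."""
    # Split with integer arithmetic; float division would lose nanosecond precision.
    seconds, nanos = divmod(timestamp_ns, NANOS_PER_SECOND)
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{whole.strftime('%Y-%m-%dT%H:%M:%S')}.{nanos:09d}Z"


def duration_ms(start_ns, end_ns):
    """Span duration in milliseconds from start and end timestamps."""
    return (end_ns - start_ns) / 1_000_000


start = 1_700_000_000_123_456_789
end = start + 250_000_000  # 250 ms later
start_iso = ns_to_iso8601(start)
elapsed = duration_ms(start, end)
```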
|
|
||
| ## Attributes in OpenTelemetry | ||
|
|
||
| When working with span attributes in OpenTelemetry: | ||
|
|
||
| 1. **Root Attributes Node:** The root `attributes` object in the API output JSON should always be empty. This is by design. All attribute data should be stored in the `span_attributes` object. | ||
|
|
||
| 2. **Span Attributes:** The `span_attributes` object is where all user-defined and semantic attribute data should be stored. This allows for a structured, hierarchical representation of attributes. | ||
|
|
||
| 3. **Structure Difference:** While the root `attributes` appears as an empty object in the API output, this is normal and expected. Do not attempt to populate this object directly or duplicate data from `span_attributes` into it. | ||
|
|
||
| 4. **Setting Attributes:** Always set span attributes using the semantic conventions defined in the `agentops/semconv` module: | ||
|
|
||
| ```python | ||
| from agentops.semconv import agent | ||
|
|
||
| # Good pattern - using semantic conventions | ||
| span.set_attribute(agent.AGENT_NAME, "My Agent") | ||
| ``` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,156 @@ | ||
| # OpenAI Agents SDK Instrumentation | ||
|
|
||
| This module provides automatic instrumentation for the OpenAI Agents SDK, adding telemetry that follows OpenTelemetry semantic conventions for Generative AI systems. | ||
|
|
||
| ## Architecture Overview | ||
|
|
||
| The OpenAI Agents SDK instrumentor works by: | ||
|
|
||
| 1. Intercepting the Agents SDK's trace processor interface to capture Agent, Function, Generation, and other span types | ||
| 2. Monkey-patching the Agents SDK `Runner` class to capture the full execution lifecycle, including streaming operations | ||
| 3. Converting all captured data to OpenTelemetry spans and metrics following semantic conventions | ||
|
|
||
| The instrumentation is organized into several key components: | ||
|
|
||
| 1. **Instrumentor (`instrumentor.py`)**: The entry point that patches the Agents SDK and configures trace capture | ||
| 2. **Processor (`processor.py`)**: Receives events from the SDK and prepares them for export | ||
| 3. **Exporter (`exporter.py`)**: Converts SDK spans to OpenTelemetry spans and exports them | ||
| 4. **Attributes Module (`attributes/`)**: Specialized modules for extracting and formatting span attributes | ||
|
|
||
| ## Attribute Processing Modules | ||
|
|
||
| The attribute modules extract and format OpenTelemetry-compatible attributes from span data: | ||
|
|
||
| - **Common (`attributes/common.py`)**: Core attribute extraction functions for all span types and utility functions | ||
| - **Completion (`attributes/completion.py`)**: Handles different completion content formats (Chat Completions API, Response API, Agents SDK) | ||
| - **Model (`attributes/model.py`)**: Extracts model information and parameters | ||
| - **Tokens (`attributes/tokens.py`)**: Processes token usage data and metrics | ||
| - **Response (`attributes/response.py`)**: Handles interpretation of Response API objects | ||
|
|
||
| Each getter function in these modules is focused on a single responsibility and does not modify global state. Functions are designed to be composable, allowing different attribute types to be combined as needed in the exporter. | ||
|
|
||
| ## Span Types | ||
|
|
||
| The instrumentor captures the following span types: | ||
|
|
||
| - **Trace**: The root span representing an entire agent workflow execution | ||
|   - Created using `get_base_trace_attributes()` to initialize with standard fields | ||
|   - Captures workflow name, trace ID, and workflow-level metadata | ||
|
|
||
| - **Agent**: Represents an agent's execution lifecycle | ||
|   - Processed using `get_agent_span_attributes()` with `AGENT_SPAN_ATTRIBUTES` mapping | ||
|   - Uses `SpanKind.CONSUMER` to indicate an agent receiving a request | ||
|   - Captures agent name, input, output, tools, and other metadata | ||
|
|
||
| - **Function**: Represents a tool/function call | ||
|   - Processed using `get_function_span_attributes()` with `FUNCTION_SPAN_ATTRIBUTES` mapping | ||
|   - Uses `SpanKind.CLIENT` to indicate an outbound call to a function | ||
|   - Captures function name, input arguments, output results, and `from_agent` information | ||
|
|
||
| - **Generation**: Captures details of model generation | ||
|   - Processed using `get_generation_span_attributes()` with `GENERATION_SPAN_ATTRIBUTES` mapping | ||
|   - Uses `SpanKind.CLIENT` to indicate an outbound call to an LLM | ||
|   - Captures model name, configuration, usage statistics, and response content | ||
|
|
||
| - **Response**: Lightweight span for tracking model response data | ||
|   - Processed using `get_response_span_attributes()` with `RESPONSE_SPAN_ATTRIBUTES` mapping | ||
|   - Extracts response content and metadata from different API formats | ||
|
|
||
| - **Handoff**: Represents control transfer between agents | ||
|   - Processed using `get_handoff_span_attributes()` with `HANDOFF_SPAN_ATTRIBUTES` mapping | ||
|   - Tracks `from_agent` and `to_agent` information | ||
|
|
||
| ## Span Lifecycle Management | ||
|
|
||
| The exporter (`exporter.py`) handles the full span lifecycle: | ||
|
|
||
| 1. **Start Events**: | ||
|    - Create spans but DO NOT END them | ||
|    - Store span references in tracking dictionaries | ||
|    - Use OpenTelemetry's `start_span` to control when spans end | ||
|    - Leave status as `UNSET` to indicate in-progress | ||
|
|
||
| 2. **End Events**: | ||
|    - Look up existing span by ID in tracking dictionaries | ||
|    - If found and not ended: | ||
|      - Update span with all final attributes | ||
|      - Set status to `OK` or `ERROR` based on task outcome | ||
|      - End the span manually | ||
|    - If not found or already ended: | ||
|      - Create a new complete span with all data | ||
|      - End it immediately | ||
|
|
||
| 3. **Error Handling**: | ||
|    - Check if spans are already ended before attempting updates | ||
|    - Provide informative log messages about span lifecycle | ||
|    - Properly clean up tracking resources | ||
|
|
||
| This approach is essential because: | ||
| - Agents SDK sends separate start and end events for each task | ||
| - We need to maintain a single span for the entire task lifecycle to get accurate timing | ||
| - Final data (outputs, token usage, etc.) is only available at the end event | ||
| - We want to avoid creating duplicate spans for the same task | ||
| - Spans must be properly created and ended to avoid leaks | ||
|
|
||
| The span lifecycle management ensures spans have: | ||
| - Accurate start and end times (preserving the actual task duration) | ||
| - Complete attribute data from both start and end events | ||
| - Proper status reflecting task completion | ||
| - All final outputs, errors, and metrics | ||
| - Clean resource management with no memory leaks | ||
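The start/end handling described in this section can be sketched with the standard library. This is an illustration under stated assumptions, not the code in `exporter.py`: `FakeSpan` is a hypothetical stand-in for an OpenTelemetry span, `LifecycleTracker` is a hypothetical name, and the attribute keys are invented for the example.

```python
import time


class FakeSpan:
    """Hypothetical stand-in for an OpenTelemetry span; records times and status."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.status = "UNSET"  # in progress until an end event arrives
        self.start_ns = time.time_ns()
        self.end_ns = None

    def end(self):
        self.end_ns = time.time_ns()


class LifecycleTracker:
    """Hypothetical sketch of the start/end handling described above."""

    def __init__(self):
        self._active = {}  # span_id -> FakeSpan that has started but not ended
        self.finished = []

    def on_start(self, span_id, name, attributes):
        span = FakeSpan(name)
        span.attributes.update(attributes)
        self._active[span_id] = span  # deliberately not ended here

    def on_end(self, span_id, name, attributes, error=None):
        # Reuse the span opened at start; otherwise create a complete one now.
        span = self._active.pop(span_id, None) or FakeSpan(name)
        span.attributes.update(attributes)
        span.status = "ERROR" if error else "OK"
        span.end()
        self.finished.append(span)
        return span

    @property
    def active_count(self):
        return len(self._active)


tracker = LifecycleTracker()
tracker.on_start("s1", "agent", {"agent.name": "triage"})
first = tracker.on_end("s1", "agent", {"output": "done"})
# An end event with no matching start still yields one complete span.
orphan = tracker.on_end("s2", "function", {"output": "42"}, error="boom")
```

Popping the entry in `on_end` is what keeps the tracking dictionary from growing: each span ID is removed at the same moment its span is ended.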
|
|
||
| ## Key Design Patterns | ||
|
|
||
| ### Semantic Conventions | ||
|
|
||
| All attribute names follow the OpenTelemetry semantic conventions defined in `agentops.semconv`: | ||
|
|
||
| ```python | ||
| # Using constants from semconv module | ||
| attributes[CoreAttributes.TRACE_ID] = trace_id | ||
| attributes[WorkflowAttributes.WORKFLOW_NAME] = trace.name | ||
| attributes[SpanAttributes.LLM_SYSTEM] = "openai" | ||
| attributes[MessageAttributes.COMPLETION_CONTENT.format(i=0)] = content | ||
| ``` | ||
|
|
||
| ### Target → Source Attribute Mapping | ||
|
|
||
| We use a consistent pattern for attribute extraction with typed mapping dictionaries: | ||
|
|
||
| ```python | ||
| # Attribute mapping example | ||
| AGENT_SPAN_ATTRIBUTES: AttributeMap = { | ||
| # target_attribute: source_attribute | ||
| AgentAttributes.AGENT_NAME: "name", | ||
| WorkflowAttributes.WORKFLOW_INPUT: "input", | ||
| WorkflowAttributes.FINAL_OUTPUT: "output", | ||
| # ... | ||
| } | ||
| ``` | ||
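A mapping of this shape can be applied by a small generic helper. The sketch below is an assumption about how such a helper might look, not the implementation in `attributes/common.py`: `extract_attributes` is a hypothetical name, the `AttributeMap` alias is assumed to be a plain `dict`, and the string keys stand in for the semconv constants.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Hypothetical alias; AttributeMap is assumed to be a dict of this shape.
AttributeMap = Dict[str, str]

# Hypothetical attribute names standing in for the semconv constants.
AGENT_SPAN_ATTRIBUTES: AttributeMap = {
    # target_attribute: source_attribute
    "agent.name": "name",
    "workflow.input": "input",
    "workflow.final_output": "output",
}


def extract_attributes(source: Any, mapping: AttributeMap) -> Dict[str, Any]:
    """Copy mapped fields from an object or dict, skipping missing or None values."""
    attributes = {}
    for target, source_key in mapping.items():
        if isinstance(source, dict):
            value = source.get(source_key)
        else:
            value = getattr(source, source_key, None)
        if value is not None:
            attributes[target] = value
    return attributes


span_data = SimpleNamespace(name="triage", input="hello", output=None)
result = extract_attributes(span_data, AGENT_SPAN_ATTRIBUTES)
```

Keeping the mapping as data means a new span type only needs a new dictionary, while the extraction logic stays in one place.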
|
|
||
| ### Structured Attribute Handling | ||
|
|
||
| - Always use `MessageAttributes` semantic conventions for content and tool calls | ||
|   - For chat completions, use `MessageAttributes.COMPLETION_CONTENT.format(i=0)` | ||
|   - For tool calls, use `MessageAttributes.TOOL_CALL_NAME.format(i=0, j=0)`, etc. | ||
| - Never try to combine or aggregate contents into a single attribute | ||
| - Each message component should have its own properly formatted attribute | ||
| - This ensures proper display in OpenTelemetry backends and dashboards | ||
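The indexed-attribute pattern above can be sketched as follows. The template strings here are hypothetical placeholders chosen for illustration; the real values are defined by the `MessageAttributes` constants in `agentops.semconv`. `flatten_completions` is likewise a hypothetical helper name.

```python
# Hypothetical templates; the real constants live in MessageAttributes.
COMPLETION_CONTENT = "gen_ai.completion.{i}.content"
TOOL_CALL_NAME = "gen_ai.completion.{i}.tool_calls.{j}.name"
TOOL_CALL_ARGUMENTS = "gen_ai.completion.{i}.tool_calls.{j}.arguments"


def flatten_completions(choices):
    """Emit one attribute per message component instead of one aggregated blob."""
    attributes = {}
    for i, choice in enumerate(choices):
        if choice.get("content") is not None:
            attributes[COMPLETION_CONTENT.format(i=i)] = choice["content"]
        for j, call in enumerate(choice.get("tool_calls", [])):
            attributes[TOOL_CALL_NAME.format(i=i, j=j)] = call["name"]
            # Arguments stay in their raw string form.
            attributes[TOOL_CALL_ARGUMENTS.format(i=i, j=j)] = call["arguments"]
    return attributes


choices = [
    {
        "content": "Checking the weather.",
        "tool_calls": [{"name": "get_weather", "arguments": '{"city": "Paris"}'}],
    },
]
flat = flatten_completions(choices)
```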
|
|
||
| ### Serialization Rules | ||
|
|
||
| 1. We do not serialize data structures arbitrarily; everything has a semantic convention | ||
| 2. Span attributes should use semantic conventions and avoid complex serialized structures | ||
| 3. Keep all string data in its original form - do not parse JSON within strings | ||
| 4. If a function has JSON attributes for its arguments, do not parse that JSON - keep as string | ||
| 5. If a completion or response body text/content contains JSON, keep it as a string | ||
| 6. Function arguments and tool call arguments should remain in their raw string form | ||
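The "keep strings as strings" rules can be expressed as an early return in a serialization helper, in the spirit of the commit "pass strings to serialize and return them early". This is a minimal sketch: `safe_serialize` is a hypothetical name and signature, and the `default=str` fallback for non-JSON types is an assumption.

```python
import json


def safe_serialize(value):
    """Return strings untouched; JSON-encode everything else."""
    if isinstance(value, str):
        # Early return: never parse or re-encode an existing string.
        return value
    return json.dumps(value, default=str)


raw_arguments = '{"city":  "Paris"}'  # raw tool-call arguments, odd spacing included
kept = safe_serialize(raw_arguments)
encoded = safe_serialize({"city": "Paris"})
```

Because the raw string is returned as-is, even its unusual spacing survives, which is the point of rules 3 to 6: the attribute records what the model or tool actually produced.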
|
|
||
| ### Critical Notes for Attribute Handling | ||
|
|
||
| - NEVER manually set the root completion attributes (`SpanAttributes.LLM_COMPLETIONS` or "gen_ai.completion") | ||
|   - Let the OpenTelemetry backend derive these values from the detailed attributes | ||
|   - Setting root completion attributes creates duplication and inconsistency | ||
| - Tests should verify attribute existence using `MessageAttributes` constants | ||
|   - Do not check for the presence of `SpanAttributes.LLM_COMPLETIONS` | ||
|   - Verify individual content/tool attributes instead of root attributes |