# Refactor Agents SDK instrumentation. #854
**Merged**
## Commits (73)

- `b248d24` Fix: Improve serialization of completions/responses in Agents SDK ins… (devin-ai-integration[bot])
- `30eb11e` Fix: Improve serialization of completions/responses in Agents SDK ins… (devin-ai-integration[bot])
- `d6c2f8a` Tests for completions. (tcdent)
- `9283b83` Separate OpenAI tests into `completion` and `responses` (tcdent)
- `770b37a` Refactor completions and responses unit tests. (tcdent)
- `29a115f` agents SDK test using semantic conventions. (tcdent)
- `a67deb7` semantic conventions in openai completions and responses tests (tcdent)
- `6f1e77a` Exporter refactor and generalization. standardization and simplificat… (tcdent)
- `5b4e940` Continued refactor of Agents instrumentor. Usurp third-party implemen… (tcdent)
- `0169502` Semantic conventions for messages. (tcdent)
- `960a01f` Tools for generating real test data from OpenAI Agents. (tcdent)
- `124a469` support tool calls and set of responses. missing import (tcdent)
- `ce5b122` reasoning tokens, semantic conventions, and implementation in OpenAI … (tcdent)
- `039978b` populate agents SDK tests with fixture data. Simplify fixture data ge… (tcdent)
- `1fa5fb6` Add chat completion support to openai_agents. Cleanup OpenAI agents i… (tcdent)
- `72ab339` Agents instrumentor cleanup. (tcdent)
- `d206b67` Cleanup. (tcdent)
- `4661fa5` Cleanup init. (tcdent)
- `e44a509` absolute import. (tcdent)
- `cf73879` Merge branch 'main' into serialization-fix-test (dot-agi)
- `913d18b` fix breaking error. (tcdent)
- `d5ac88d` Correct naming (tcdent)
- `734b15d` rename (tcdent)
- `9e8c845` Refactor completions to always use semantic conventions. (tcdent)
- `c6e9bff` More robust output (tcdent)
- `4c17725` use openai_agents tracing api to gather span data. (tcdent)
- `1e140cf` Agents associates spans with a parent span and exports. (tcdent)
- `6d268ec` OpenAi responses instrumentor. (tcdent)
- `cccfef8` Merge branch 'main' into serialization-fix-test (dot-agi)
- `91fea4f` Delete examples/agents-examples/basic/hello_world.py (tcdent)
- `8c9ec5c` pass strings to serialize and return them early. (tcdent)
- `11cc97d` deduplication and better hierarchy. simplification of tests. separati… (tcdent)
- `f01d6dd` Notes and working documents that should not make it into main. (tcdent)
- `1ac8077` Merge main into serialization-fix-test-drafts (tcdent)
- `59a4fc7` more descriptive debug messaging in OpenAI Agents instrumentor (tcdent)
- `1ad9fd7` pertinent testing information in claude.md. (tcdent)
- `c4cb26e` better version determination for the library. (tcdent)
- `c60e29a` Test for generation tokens as well. (tcdent)
- `1d2e4f7` Cleanup attribute formatting to use modular function format with spec… (tcdent)
- `32d7e88` Remove duplicated model export from processor. (tcdent)
- `4256384` nest all spans under the parent_trace root span and open and close th… (tcdent)
- `016172a` clean up common attributes parsing helpers. (tcdent)
- `be9448a` Simplify processor. (tcdent)
- `60392a0` Cleanup exporter. (tcdent)
- `99cd3c5` Cleanup instrumentor (tcdent)
- `62f3bf5` Cleanup attributes (tcdent)
- `8f0f44d` Update README and SPANS definition. Add example with tool usage. (tcdent)
- `cd9954d` Fix tool usage example. (tcdent)
- `d4fe0e8` Get completion data on outputs. (tcdent)
- `8bee74e` Delete notes (tcdent)
- `830f504` Fix tests for attributes. Rewmove debug statements. (tcdent)
- `9bfda9f` Implement tests for OpenAi agents. (tcdent)
- `bd30017` Merge branch 'main' into serialization-fix-test (dot-agi)
- `14c9837` Better naming for spans. (tcdent)
- `1465821` Openai Response type parsing improvements. (tcdent)
- `89d9683` Cleanup exporter imports and naming. (tcdent)
- `9f13810` Handoff agent example. (tcdent)
- `c98bbbf` Cleanup imports on common. (tcdent)
- `6afe3fc` Disable openai completions/responses tests. TODO probably delete these. (tcdent)
- `f325529` Disable openai responses intrumentor; it is handled inside openai_age… (tcdent)
- `7fb5725` Add note about enabling chat.completions api instead of responses. (tcdent)
- `80e30e8` Move exporter convention notes to README (tcdent)
- `3f1a793` Update tests. (tcdent)
- `314cb88` Disable openai responses instrumentation test. (tcdent)
- `528e5b3` Skip `parse` serialization tests. (tcdent)
- `bb71461` Cleanup openai responses instrumention and tests; will be included in… (tcdent)
- `9e3208f` Resolve type checking errors. (tcdent)
- `c91d78c` get correct library version (dot-agi)
- `7c29e96` remove debug statements and import LIBRARY_VERSION (dot-agi)
- `ef7dc3e` Merge branch 'main' into serialization-fix-test (tcdent)
- `cc14de9` Log deeplink to trace on AgentOps dashboard. (#879) (tcdent)
- `be94e41` Merge branch 'main' into serialization-fix-test (tcdent)
- `c8a6531` Merge branch 'main' into serialization-fix-test (dot-agi)
### New file: shared OpenAI instrumentation utilities, `agentops.instrumentation.openai` (+116 lines)

```python
"""
AgentOps instrumentation utilities for OpenAI

This module provides shared utilities for instrumenting various OpenAI products and APIs.
It centralizes common functions and behaviors to ensure consistent instrumentation
across all OpenAI-related components.

IMPORTANT DISTINCTION BETWEEN OPENAI API FORMATS:
1. OpenAI Completions API - The traditional API format using prompt_tokens/completion_tokens
2. OpenAI Response API - The newer format used by the Agents SDK using input_tokens/output_tokens
3. Agents SDK - The framework that uses Response API format

This module implements utilities that handle both formats consistently.
"""

import logging
from typing import Any, Dict, List, Optional, Union

# Import span attributes from semconv
from agentops.semconv import SpanAttributes

# Logger
logger = logging.getLogger(__name__)


def get_value(data: Dict[str, Any], keys: Union[str, List[str]]) -> Optional[Any]:
    """
    Get a value from a dictionary using a key or a prioritized list of keys.

    Args:
        data: Source dictionary
        keys: A single key or a list of keys in priority order

    Returns:
        The value if found, or None if not found
    """
    if isinstance(keys, str):
        return data.get(keys)

    for key in keys:
        if key in data:
            return data[key]

    return None


def process_token_usage(usage: Dict[str, Any], attributes: Dict[str, Any]) -> None:
    """
    Process token usage metrics from any OpenAI API response and add them to span attributes.

    This function maps token usage fields from various API formats to standardized
    attribute names according to OpenTelemetry semantic conventions:

    - OpenAI ChatCompletion API uses: prompt_tokens, completion_tokens, total_tokens
    - OpenAI Response API uses: input_tokens, output_tokens, total_tokens

    Both formats are mapped to the standardized OTel attributes.

    Args:
        usage: Dictionary containing token usage metrics from an OpenAI API
        attributes: The span attributes dictionary where the metrics will be added
    """
    if not usage or not isinstance(usage, dict):
        return

    # Mapping for standard usage metrics (target attribute -> source field(s))
    token_mapping = {
        SpanAttributes.LLM_USAGE_TOTAL_TOKENS: "total_tokens",
        SpanAttributes.LLM_USAGE_PROMPT_TOKENS: ["prompt_tokens", "input_tokens"],
        SpanAttributes.LLM_USAGE_COMPLETION_TOKENS: ["completion_tokens", "output_tokens"],
    }

    # Apply the mapping for all token usage fields
    for target_attr, source_keys in token_mapping.items():
        value = get_value(usage, source_keys)
        if value is not None:
            attributes[target_attr] = value

    # Process output_tokens_details if present
    if "output_tokens_details" in usage and isinstance(usage["output_tokens_details"], dict):
        process_token_details(usage["output_tokens_details"], attributes)


def process_token_details(details: Dict[str, Any], attributes: Dict[str, Any]) -> None:
    """
    Process detailed token metrics from OpenAI API responses and add them to span attributes.

    This function maps token detail fields (like reasoning_tokens) to standardized attribute
    names according to semantic conventions, ensuring consistent telemetry across the system.

    Args:
        details: Dictionary containing token detail metrics from an OpenAI API
        attributes: The span attributes dictionary where the metrics will be added
    """
    if not details or not isinstance(details, dict):
        return

    # Token details attribute mapping (target attribute -> source key)
    token_details_mapping = {
        f"{SpanAttributes.LLM_USAGE_TOTAL_TOKENS}.reasoning": "reasoning_tokens",
        # Add more mappings here as OpenAI introduces new token detail types
    }

    # Process all token detail fields
    for detail_key, detail_value in details.items():
        # First check whether there is an explicit mapping for this key
        mapped = False
        for target_attr, source_key in token_details_mapping.items():
            if source_key == detail_key:
                attributes[target_attr] = detail_value
                mapped = True
                break

        # For unknown token details, fall back to a generic naming format
        if not mapped:
            attributes[f"{SpanAttributes.LLM_USAGE_TOTAL_TOKENS}.{detail_key}"] = detail_value
```
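The mapping logic above can be exercised standalone. In this sketch, plain attribute-name strings stand in for the `SpanAttributes` constants from `agentops.semconv` (the real constant values may differ), and `normalize_usage` is a name invented here for illustration:

```python
from typing import Any, Dict, List, Optional, Union

# Plain-string stand-ins for agentops.semconv SpanAttributes (illustrative only)
LLM_USAGE_PROMPT_TOKENS = "gen_ai.usage.prompt_tokens"
LLM_USAGE_COMPLETION_TOKENS = "gen_ai.usage.completion_tokens"
LLM_USAGE_TOTAL_TOKENS = "gen_ai.usage.total_tokens"


def get_value(data: Dict[str, Any], keys: Union[str, List[str]]) -> Optional[Any]:
    """Look up a single key, or the first match from a priority-ordered list."""
    if isinstance(keys, str):
        return data.get(keys)
    for key in keys:
        if key in data:
            return data[key]
    return None


def normalize_usage(usage: Dict[str, Any]) -> Dict[str, Any]:
    """Map either API format's usage fields onto standardized attribute names."""
    attributes: Dict[str, Any] = {}
    token_mapping = {
        LLM_USAGE_TOTAL_TOKENS: "total_tokens",
        LLM_USAGE_PROMPT_TOKENS: ["prompt_tokens", "input_tokens"],
        LLM_USAGE_COMPLETION_TOKENS: ["completion_tokens", "output_tokens"],
    }
    for target_attr, source_keys in token_mapping.items():
        value = get_value(usage, source_keys)
        if value is not None:
            attributes[target_attr] = value
    return attributes


# Chat Completion and Response API usage blocks normalize to identical attributes:
chat_attrs = normalize_usage({"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15})
resp_attrs = normalize_usage({"input_tokens": 10, "output_tokens": 5, "total_tokens": 15})
```

The priority-list form of `get_value` is what lets one target attribute absorb both `prompt_tokens` and `input_tokens` without format-specific branches.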
### New file: OpenAI Agents SDK instrumentation README (+126 lines)

# OpenAI Agents SDK Instrumentation

This module provides automatic instrumentation for the OpenAI Agents SDK, adding telemetry that follows OpenTelemetry semantic conventions for Generative AI systems.

## Architecture Overview

The OpenAI Agents SDK instrumentor works by:

1. Intercepting the Agents SDK's trace processor interface to capture Agent, Function, Generation, and other span types
2. Monkey-patching the Agents SDK `Runner` class to capture the full execution lifecycle, including streaming operations
3. Converting all captured data to OpenTelemetry spans and metrics following semantic conventions
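A minimal sketch of the monkey-patching step (point 2 above). The `Runner` stand-in and the `instrument` helper are hypothetical names for illustration; the real instrumentor wraps the actual Agents SDK `Runner` and emits OpenTelemetry spans and metrics rather than appending to a list:

```python
import functools
import time


class Runner:
    """Stand-in for the Agents SDK Runner class (hypothetical)."""

    @classmethod
    def run_sync(cls, agent: str, prompt: str) -> str:
        return f"{agent} answered: {prompt}"


captured = []  # the real code records OTel spans/metrics instead of list entries


def instrument(cls, method_name: str) -> None:
    """Replace cls.method_name with a wrapper that times each call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return original(*args, **kwargs)
        finally:
            captured.append({"method": method_name, "duration_s": time.monotonic() - start})

    setattr(cls, method_name, wrapper)


instrument(Runner, "run_sync")
result = Runner.run_sync("triage-agent", "What is 2+2?")
# captured now holds one timing record for the run
```

Wrapping at the `Runner` boundary is what gives the instrumentor visibility into the full execution lifecycle, including runs whose spans the SDK's own trace processors would only report piecemeal.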
## Span Types

The instrumentor captures the following span types:

- **Trace**: The root span representing an entire agent workflow execution
  - Implementation: `_export_trace()` method in `exporter.py`
  - Creates a span with the trace name, ID, and workflow metadata

- **Agent**: Represents an agent's execution lifecycle
  - Implementation: `_process_agent_span()` method in `exporter.py`
  - Uses `SpanKind.CONSUMER` to indicate an agent receiving a request
  - Captures agent name, input, output, tools, and other metadata

- **Function**: Represents a tool/function call
  - Implementation: `_process_function_span()` method in `exporter.py`
  - Uses `SpanKind.CLIENT` to indicate an outbound call to a function
  - Captures function name, input arguments, output results, and error information

- **Generation**: Captures details of model generation
  - Implementation: `_process_generation_span()` method in `exporter.py`
  - Uses `SpanKind.CLIENT` to indicate an outbound call to an LLM
  - Captures model name, configuration, usage statistics, and response content

- **Response**: Lightweight span for tracking model response IDs
  - Implementation: Handled within the `_process_response_api()` and `_process_completions()` methods
  - Extracts response IDs and metadata from both Chat Completion API and Response API formats

- **Handoff**: Represents control transfer between agents
  - Implementation: Captured through the `AgentAttributes.HANDOFFS` attribute
  - Maps the Agents SDK's "handoffs" field to the standardized attribute name
## Metrics

The instrumentor collects the following metrics:

- **Agent Runs**: Number of agent runs
  - Implementation: `_agent_run_counter` in `instrumentor.py`
  - Incremented at the start of each agent run with metadata about the agent and run configuration

- **Agent Turns**: Number of agent turns
  - Implementation: Inferred from raw-response processing
  - Each raw response represents one turn in the conversation

- **Agent Execution Time**: Time taken for agent execution
  - Implementation: `_agent_execution_time_histogram` in `instrumentor.py`
  - Measured from the start of an agent run to its completion

- **Token Usage**: Number of input and output tokens used
  - Implementation: `_agent_token_usage_histogram` in `instrumentor.py`
  - Records prompt and completion tokens separately with appropriate labels
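The run counter and execution-time histogram can be approximated with plain-Python stand-ins. This hedged sketch uses dicts in place of real OpenTelemetry instruments, and `record_run` is a name invented here for illustration:

```python
import time
from collections import defaultdict

agent_run_counter = defaultdict(int)  # stand-in for _agent_run_counter
execution_times = defaultdict(list)   # stand-in for _agent_execution_time_histogram


def record_run(agent_name: str, fn, *args, **kwargs):
    """Count a run and record its execution time, keyed by agent name."""
    agent_run_counter[agent_name] += 1  # incremented at the start of the run
    start = time.monotonic()
    try:
        return fn(*args, **kwargs)
    finally:
        # recorded even if the run raises, so failed runs still produce a sample
        execution_times[agent_name].append(time.monotonic() - start)


record_run("demo-agent", lambda: sum(range(1000)))
# agent_run_counter["demo-agent"] is now 1, with one duration sample recorded
```

The `try`/`finally` placement mirrors the described behavior: the counter increments at run start, while the duration sample is only complete when the run finishes.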
## Key Design Patterns

### Target → Source Mapping Pattern

We use a consistent pattern for attribute mapping: dictionary keys are the target attribute names (what we want in the final span), and values are the source field names (where the data comes from):

```python
_CONFIG_MAPPING = {
    # Target semantic convention → source field
    <SemanticConvention>: Union[str, list[str]],
    # ...
}
```

This pattern makes mappings easy to maintain and apply consistently.
### Multi-API Format Support

The instrumentor handles both OpenAI API formats:

1. **Chat Completion API**: Traditional format with a "choices" array and prompt_tokens/completion_tokens
2. **Response API**: Newer format with an "output" array and input_tokens/output_tokens

The implementation detects which format is in use and processes it accordingly.
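A hedged sketch of how such detection can work, keyed off the structural difference named above; the exact criteria used in `exporter.py` may differ, and `detect_format` is a name invented here:

```python
def detect_format(response: dict) -> str:
    """Guess which OpenAI API format a response payload uses."""
    if "choices" in response:
        return "chat_completion"  # Chat Completion API: "choices" array
    if "output" in response:
        return "response_api"     # Response API: "output" array
    return "unknown"


chat_kind = detect_format({"choices": [], "usage": {"prompt_tokens": 1}})
resp_kind = detect_format({"output": [], "usage": {"input_tokens": 1}})
```

Dispatching on payload shape rather than on caller-supplied flags keeps the exporter agnostic about which API produced the data it receives.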
### Streaming Operation Tracking

When instrumenting streaming operations, we:

1. Track active streaming operations using unique IDs
2. Flush spans properly to ensure metrics are recorded
3. Create separate spans for token usage metrics to avoid premature span closure

### Response API Content Extraction

The Response API nests content as:

```
output → message → content → [items] → text
```

Extracting the actual text requires special handling:

```python
# From _process_response_api in exporter.py
if isinstance(content_items, list):
    # Combine text from all text items
    texts = []
    for content_item in content_items:
        if content_item.get("type") == "output_text" and "text" in content_item:
            texts.append(content_item["text"])

    # Join texts (even if empty)
    attributes[f"{prefix}.content"] = " ".join(texts)
```
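The snippet above can be made self-contained. This sketch walks a hypothetical Response API payload shaped as described (`output → message → content → [items] → text`); `extract_output_text` and the sample payload are invented here for illustration:

```python
def extract_output_text(response: dict) -> str:
    """Collect and join output_text items from a Response API-style payload."""
    texts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        content_items = item.get("content")
        if isinstance(content_items, list):
            for content_item in content_items:
                if content_item.get("type") == "output_text" and "text" in content_item:
                    texts.append(content_item["text"])
    # Join texts (even if empty), mirroring the exporter's behavior
    return " ".join(texts)


sample = {  # hypothetical payload for illustration
    "output": [
        {
            "type": "message",
            "content": [
                {"type": "output_text", "text": "Hello"},
                {"type": "output_text", "text": "world"},
            ],
        }
    ]
}
```

Note that non-`output_text` items (tool calls, refusals, and so on) are skipped rather than raising, so the extractor degrades gracefully on content shapes it does not recognize.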
## TODO

- Add support for additional semantic conventions
  - `gen_ai` doesn't have conventions for response data beyond `role` and `content`
  - We're shoehorning `responses` into `completions` since the spec doesn't yet define a convention for them
### New file: OpenAI Agents SDK instrumentor package init (+39 lines)

```python
"""
AgentOps Instrumentor for OpenAI Agents SDK

This module provides automatic instrumentation for the OpenAI Agents SDK when AgentOps is
imported. It implements a clean, maintainable implementation that follows semantic conventions.

IMPORTANT DISTINCTION BETWEEN OPENAI API FORMATS:
1. OpenAI Completions API - The traditional API format using prompt_tokens/completion_tokens
2. OpenAI Response API - The newer format used by the Agents SDK using input_tokens/output_tokens
3. Agents SDK - The framework that uses Response API format

The Agents SDK uses the Response API format, which we handle using shared utilities from
agentops.instrumentation.openai.
"""
from typing import Optional
import importlib.metadata

from agentops.logging import logger


def get_version() -> Optional[str]:
    """Get the version of the Agents SDK, or None if the package is not installed."""
    try:
        return importlib.metadata.version("agents")
    except importlib.metadata.PackageNotFoundError:
        logger.debug("`agents` package not found; unable to determine installed version.")
        return None


LIBRARY_NAME = "agents-sdk"
LIBRARY_VERSION: Optional[str] = get_version()  # Actual OpenAI Agents SDK version

# Import after defining constants to avoid circular imports
from .instrumentor import AgentsInstrumentor

__all__ = [
    "LIBRARY_NAME",
    "LIBRARY_VERSION",
    "AgentsInstrumentor",
]
```

Two fixes relative to the draft: the docstring on `get_version` now matches its actual `None` return (it previously claimed `'unknown'`), and the undefined `SDK_VERSION` name has been dropped from `__all__`.
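The version-lookup fallback in `get_version` generalizes to any distribution name. A minimal sketch with the package name parameterized (`get_package_version` is a name invented here):

```python
import importlib.metadata
from typing import Optional


def get_package_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


# A distribution name that should not exist resolves to None rather than raising:
missing = get_package_version("definitely-not-a-real-package-xyz")
```

Catching `PackageNotFoundError` (rather than letting it propagate) is what allows the instrumentor to load even when the target SDK is absent.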