Commit c721898

chore(llmobs): [MLOB-3622] reuse tool call tracker for other agentic frameworks (#14277)
This PR updates the Pydantic AI, CrewAI, and LangChain integrations to support tool call tracking, which is used to link LLM and Tool spans in the LLM Obs service. Currently, this works for OpenAI Chat Completions + tools from the aforementioned integrations, but we can add support for other LLM / tool spans in the near future.

This PR is based on the ToolCallTracker, which was already being used to link LLM --> Tool --> LLM spans in the OpenAI Agents integration. The ToolCallTracker contains the logic for:

1. registering tool call tracking information when we detect that an LLM has chosen to run a particular tool (`on_llm_tool_choice`)
2. linking the LLM span that generated the tool call with the Tool span itself when it is executed (`on_tool_call`)
3. linking the Tool span to any downstream LLM spans that use the tool result as an input (`on_tool_call_output_used`)

This PR also adds support for ReAct tool invocations made via the OpenAI Chat API, which are tool calls formatted via the content string rather than a normal tool call struct.

## Important Notes

- I have removed Agent --> Tool --> Agent span links for Pydantic AI, since these span links were mainly for visualization purposes; the new LLM --> Tool --> LLM links should be used instead.
- I left out tests for CrewAI x OpenAI span linking because I was running into the same issue as with LiteLLM: patching OpenAI works when running a single test on its own, but fails to patch OpenAI correctly when running the entire suite of tests together. To reduce flakiness, I have opted for manual testing instead.

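The three ToolCallTracker steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: only the three hook names come from the description, while the `Span` stand-in, the `ToolCallTrackerSketch` class, the method signatures, the use of a tool call ID as the lookup key, and the link relation strings are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    """Hypothetical stand-in for an LLM Obs span (not the ddtrace span type)."""
    span_id: str
    kind: str  # "llm" or "tool"
    # Each link is (source span_id, relation); the relation names are made up here.
    links: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class _PendingToolCall:
    tool_name: str
    llm_span_id: str                    # LLM span that chose the tool
    tool_span_id: Optional[str] = None  # filled in once the tool actually runs


class ToolCallTrackerSketch:
    """Illustrative tracker keyed by tool call ID (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._calls: Dict[str, _PendingToolCall] = {}

    def on_llm_tool_choice(self, tool_call_id: str, tool_name: str, llm_span: Span) -> None:
        # Step 1: remember which LLM span decided to call the tool.
        self._calls[tool_call_id] = _PendingToolCall(tool_name, llm_span.span_id)

    def on_tool_call(self, tool_call_id: str, tool_span: Span) -> None:
        # Step 2: link the choosing LLM span to the tool span that executes the call.
        call = self._calls.get(tool_call_id)
        if call is None:
            return  # unknown tool call, nothing to link
        call.tool_span_id = tool_span.span_id
        tool_span.links.append((call.llm_span_id, "llm_output_to_tool_input"))

    def on_tool_call_output_used(self, tool_call_id: str, llm_span: Span) -> None:
        # Step 3: link the tool span to the downstream LLM span that consumes its result.
        call = self._calls.pop(tool_call_id, None)
        if call is None or call.tool_span_id is None:
            return
        llm_span.links.append((call.tool_span_id, "tool_output_to_llm_input"))


tracker = ToolCallTrackerSketch()
first_llm = Span("llm-1", "llm")
tool = Span("tool-1", "tool")
second_llm = Span("llm-2", "llm")

tracker.on_llm_tool_choice("call_abc", "calculate_square", first_llm)
tracker.on_tool_call("call_abc", tool)
tracker.on_tool_call_output_used("call_abc", second_llm)

print(tool.links)        # link from the first LLM span to the tool span
print(second_llm.links)  # link from the tool span to the second LLM span
```

Keeping the pending call in a dictionary until its output is consumed is one simple way to get the LLM --> Tool --> LLM chain; the real tracker may store and expire this state differently.
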
## Manual Testing

### Pydantic AI

```
from pydantic_ai import Agent

def calculate_square(x: int) -> int:
    return x * x

agent = Agent(
    'openai:gpt-4o',
    system_prompt='Use one of the tools provided to calculate the mathematical operation.',
    tools=[calculate_square],
)

result = agent.run_sync('What is the square of 5?')
print(result.output)
```

[resulting trace](https://app.datadoghq.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZjjHzgXMlRnHgAAABhBWmpqSHpnWEFBQ0J3dk1ja25PekFBQUEAAAAkZjE5OGUzMWYtNjJhMS00YzhkLTkzYTUtYTE2YzE3YWM0NGQwAAAECg%22%7D%5D&spanId=11240522994332064438&start=1756156882647&end=1756157782647&paused=false)

### Crew AI

```
from crewai import Agent, Task, Crew, LLM
from crewai.tools import tool

llm = LLM(
    model="gpt-4o",
)

@tool("calculate_sum")
def calculate_sum(min: int, max: int) -> int:
    """Calculate the sum of all numbers between min and max."""
    return sum(range(min, max+1))

calculator = Agent(
    role='Mathematical Calculator',
    goal='Perform accurate mathematical calculations',
    backstory='You are an expert mathematician who can solve complex calculations with precision.',
    llm=llm,
    verbose=True,
    tools=[calculate_sum],
)

calculation_task = Task(
    description='Calculate the sum of all numbers from 1 to 100',
    agent=calculator,
    expected_output='The sum of all numbers from 1 to 100',
    guardrails="hello",
)

crew = Crew(
    agents=[calculator],
    tasks=[calculation_task],
)

result = crew.kickoff()
```

[resulting trace](https://app.datadoghq.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZjjMje1G3UMxAAAABhBWmpqTWplMUFBRFVCbnhNVk5VZkFBQUEAAAAkZjE5OGUzMzItNWVjZS00MzYzLTlkZTAtMWZhYzA3MzEzMzhmAAACwQ%22%7D%5D&spanId=11094600582484716815&start=1756157503005&end=1756158403005&paused=false)

I also tried running this same app with streaming enabled to verify that plain text tool calls within the streamed response were captured properly and spans were linked as expected. Here is the resulting [trace](https://app.datadoghq.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZjjMBA1pthr5QAAABhBWmpqTUJBMUFBQjU2a1B0RTR6X0FBQUEAAAAkZjE5OGUzMzAtMzg0OS00N2RjLWEwZmYtMGRkOWJlN2UzOGY5AAADEQ%22%7D%5D&spanId=13023585481275335674&start=1756157395103&end=1756158295103&paused=false).

### OpenAI Agents

```
from agents import Agent, function_tool
from agents.run import Runner
import asyncio

@function_tool
def math_help() -> str:
    """
    Answers questions about math.
    """
    return "The answer to the question is 42"

math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Please use the tools to find the answer.",
    model="o3-mini",
    tools=[math_help],
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[math_tutor_agent],
)

async def main():
    result = await Runner.run(triage_agent, "what is the sum of the numbers between 1 and 100?", max_turns=3)
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```

[resulting trace](https://app.datadoghq.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=false&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZjjZACIFuYb9wAAABhBWmpqWkFDSUFBQTJRZUZWTk9zakFBQUEAAAAkZjE5OGUzNjQtMzM1NS00NjIxLWE3NmQtMTJjNTlmYTlmYTc5AAABKw%22%7D%5D&spanId=18255958774617952305&start=1756161171215&end=1756162071215&paused=false)

### LangGraph

```
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
import os
from langchain_core.tools import tool

@tool
def add(a: int, b: int) -> int:
    """Adds two numbers together"""
    return a + b

model = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.5,
    api_key=os.getenv("OPENAI_API_KEY", "<not-a-real-key>"),
)

agent = create_react_agent(
    model=model,
    tools=[add],
    prompt="You are a helpful assistant who talks with a Boston accent but is also very nice. You speak in full sentences with at least 15 words. Please use the tools to answer the question.",
    name="not_your_average_bostonian",
)

agent.invoke({"messages": [{"role": "user", "content": "What is 2 + 2?"}]})
```

[resulting trace](https://app.datadoghq.com/llm/traces?query=%40ml_app%3Anicole-test%20%40event_type%3Aspan%20%40parent_id%3Aundefined&agg_m=count&agg_m_source=base&agg_t=count&fromUser=true&llmPanels=%5B%7B%22t%22%3A%22sampleDetailPanel%22%2C%22rEID%22%3A%22AwAAAZjjbAqZtOj3JQAAABhBWmpqYkFxWkFBQTlySDExekN0bUFBQUEAAAAkMDE5OGUzNmMtMTYwNC00Y2MzLWFlMWUtNGU0MTE4YWY3OWYxAAABlQ%22%7D%5D&spanId=10602121188312712730&start=1756161275437&end=1756162175437&paused=false)

## Checklist

- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
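
The description mentions ReAct tool invocations, where the tool call is encoded in the message content string rather than in a tool call struct. The sketch below shows one way such a string could be parsed. It assumes the common `Action:` / `Action Input:` ReAct layout with a JSON object as the input; the exact format handled by this commit is not shown here, and `parse_react_tool_call` is a hypothetical helper, not a ddtrace function.

```python
import json
import re
from typing import Any, Dict, Optional, Tuple

# Assumed layout: "Action: <tool name>" on one line, then "Action Input: <JSON object>".
_REACT_CALL = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def parse_react_tool_call(content: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (tool_name, arguments) if the content looks like a ReAct tool call."""
    match = _REACT_CALL.search(content)
    if match is None:
        return None
    name, raw_args = match.group(1), match.group(2)
    # Ignore anything the model wrote after the input, such as an "Observation:" section.
    raw_args = raw_args.split("\nObservation:", 1)[0].strip()
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError:
        return None  # input was not valid JSON, so treat it as plain text
    if not isinstance(args, dict):
        return None
    return name, args


reply = 'Thought: I should add the numbers.\nAction: calculate_sum\nAction Input: {"min": 1, "max": 100}'
print(parse_react_tool_call(reply))
print(parse_react_tool_call("The sum is 5050."))
```

A parsed name and argument dictionary like this could then be registered the same way as a structured tool call, so that the later Tool span can be matched back to the LLM span that produced the text.
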
1 parent 4b9e06a commit c721898

40 files changed: +1157 -464 lines changed

.riot/requirements/c215040.txt renamed to .riot/requirements/1010ab9.txt

Lines changed: 17 additions & 14 deletions
@@ -2,14 +2,14 @@
 # This file is autogenerated by pip-compile with Python 3.13
 # by the following command:
 #
-# pip-compile --allow-unsafe --no-annotate .riot/requirements/c215040.in
+# pip-compile --allow-unsafe --no-annotate .riot/requirements/1010ab9.in
 #
 annotated-types==0.7.0
 anyio==4.10.0
 attrs==25.3.0
 certifi==2025.8.3
-charset-normalizer==3.4.2
-coverage[toml]==7.10.2
+charset-normalizer==3.4.3
+coverage[toml]==7.10.4
 distro==1.9.0
 h11==0.16.0
 httpcore==1.0.9
@@ -20,17 +20,19 @@ iniconfig==2.1.0
 jiter==0.10.0
 jsonpatch==1.33
 jsonpointer==3.0.0
-langchain-core==0.3.72
-langchain-openai==0.3.28
-langgraph==0.6.3
+langchain==0.3.27
+langchain-core==0.3.74
+langchain-openai==0.3.30
+langchain-text-splitters==0.3.9
+langgraph==0.6.5
 langgraph-checkpoint==2.1.1
-langgraph-prebuilt==0.6.3
-langgraph-sdk==0.2.0
-langsmith==0.4.10
+langgraph-prebuilt==0.6.4
+langgraph-sdk==0.2.2
+langsmith==0.4.14
 mock==5.2.0
-openai==1.98.0
+openai==1.100.2
 opentracing==2.4.0
-orjson==3.11.1
+orjson==3.11.2
 ormsgpack==1.10.0
 packaging==25.0
 pluggy==1.6.0
@@ -43,15 +45,16 @@ pytest-cov==6.2.1
 pytest-mock==3.14.1
 pyyaml==6.0.2
 regex==2025.7.34
-requests==2.32.4
+requests==2.32.5
 requests-toolbelt==1.0.0
 sniffio==1.3.1
 sortedcontainers==2.4.0
+sqlalchemy==2.0.43
 tenacity==9.1.2
-tiktoken==0.9.0
+tiktoken==0.11.0
 tqdm==4.67.1
 typing-extensions==4.14.1
 typing-inspection==0.4.1
 urllib3==2.5.0
 xxhash==3.5.0
-zstandard==0.23.0
+zstandard==0.24.0

.riot/requirements/7fc9a52.txt renamed to .riot/requirements/10f2f3e.txt

Lines changed: 18 additions & 14 deletions
@@ -2,15 +2,16 @@
 # This file is autogenerated by pip-compile with Python 3.10
 # by the following command:
 #
-# pip-compile --allow-unsafe --no-annotate .riot/requirements/7fc9a52.in
+# pip-compile --allow-unsafe --no-annotate .riot/requirements/10f2f3e.in
 #
 annotated-types==0.7.0
 anyio==4.10.0
+async-timeout==4.0.3
 attrs==25.3.0
 backports-asyncio-runner==1.2.0
 certifi==2025.8.3
-charset-normalizer==3.4.2
-coverage[toml]==7.10.2
+charset-normalizer==3.4.3
+coverage[toml]==7.10.4
 distro==1.9.0
 exceptiongroup==1.3.0
 h11==0.16.0
@@ -22,17 +23,19 @@ iniconfig==2.1.0
 jiter==0.10.0
 jsonpatch==1.33
 jsonpointer==3.0.0
-langchain-core==0.3.72
-langchain-openai==0.3.28
-langgraph==0.6.3
+langchain==0.3.27
+langchain-core==0.3.74
+langchain-openai==0.3.30
+langchain-text-splitters==0.3.9
+langgraph==0.6.5
 langgraph-checkpoint==2.1.1
-langgraph-prebuilt==0.6.3
-langgraph-sdk==0.2.0
-langsmith==0.4.10
+langgraph-prebuilt==0.6.4
+langgraph-sdk==0.2.2
+langsmith==0.4.14
 mock==5.2.0
-openai==1.98.0
+openai==1.100.2
 opentracing==2.4.0
-orjson==3.11.1
+orjson==3.11.2
 ormsgpack==1.10.0
 packaging==25.0
 pluggy==1.6.0
@@ -45,16 +48,17 @@ pytest-cov==6.2.1
 pytest-mock==3.14.1
 pyyaml==6.0.2
 regex==2025.7.34
-requests==2.32.4
+requests==2.32.5
 requests-toolbelt==1.0.0
 sniffio==1.3.1
 sortedcontainers==2.4.0
+sqlalchemy==2.0.43
 tenacity==9.1.2
-tiktoken==0.9.0
+tiktoken==0.11.0
 tomli==2.2.1
 tqdm==4.67.1
 typing-extensions==4.14.1
 typing-inspection==0.4.1
 urllib3==2.5.0
 xxhash==3.5.0
-zstandard==0.23.0
+zstandard==0.24.0

.riot/requirements/1014e75.txt renamed to .riot/requirements/11e37fa.txt

Lines changed: 14 additions & 11 deletions
@@ -2,14 +2,14 @@
 # This file is autogenerated by pip-compile with Python 3.13
 # by the following command:
 #
-# pip-compile --allow-unsafe --no-annotate .riot/requirements/1014e75.in
+# pip-compile --allow-unsafe --no-annotate .riot/requirements/11e37fa.in
 #
 annotated-types==0.7.0
 anyio==4.10.0
 attrs==25.3.0
 certifi==2025.8.3
-charset-normalizer==3.4.2
-coverage[toml]==7.10.2
+charset-normalizer==3.4.3
+coverage[toml]==7.10.4
 distro==1.9.0
 h11==0.16.0
 httpcore==1.0.9
@@ -20,17 +20,19 @@ iniconfig==2.1.0
 jiter==0.10.0
 jsonpatch==1.33
 jsonpointer==3.0.0
-langchain-core==0.3.72
-langchain-openai==0.3.28
+langchain==0.3.27
+langchain-core==0.3.74
+langchain-openai==0.3.30
+langchain-text-splitters==0.3.9
 langgraph==0.3.22
 langgraph-checkpoint==2.1.1
 langgraph-prebuilt==0.1.8
 langgraph-sdk==0.1.74
-langsmith==0.4.10
+langsmith==0.4.14
 mock==5.2.0
-openai==1.98.0
+openai==1.100.2
 opentracing==2.4.0
-orjson==3.11.1
+orjson==3.11.2
 ormsgpack==1.10.0
 packaging==25.0
 pluggy==1.6.0
@@ -43,15 +45,16 @@ pytest-cov==6.2.1
 pytest-mock==3.14.1
 pyyaml==6.0.2
 regex==2025.7.34
-requests==2.32.4
+requests==2.32.5
 requests-toolbelt==1.0.0
 sniffio==1.3.1
 sortedcontainers==2.4.0
+sqlalchemy==2.0.43
 tenacity==9.1.2
-tiktoken==0.9.0
+tiktoken==0.11.0
 tqdm==4.67.1
 typing-extensions==4.14.1
 typing-inspection==0.4.1
 urllib3==2.5.0
 xxhash==3.5.0
-zstandard==0.23.0
+zstandard==0.24.0

.riot/requirements/60408b5.txt renamed to .riot/requirements/148cc44.txt

Lines changed: 15 additions & 11 deletions
@@ -2,15 +2,16 @@
 # This file is autogenerated by pip-compile with Python 3.10
 # by the following command:
 #
-# pip-compile --allow-unsafe --no-annotate .riot/requirements/60408b5.in
+# pip-compile --allow-unsafe --no-annotate .riot/requirements/148cc44.in
 #
 annotated-types==0.7.0
 anyio==4.10.0
+async-timeout==4.0.3
 attrs==25.3.0
 backports-asyncio-runner==1.2.0
 certifi==2025.8.3
-charset-normalizer==3.4.2
-coverage[toml]==7.10.2
+charset-normalizer==3.4.3
+coverage[toml]==7.10.4
 distro==1.9.0
 exceptiongroup==1.3.0
 h11==0.16.0
@@ -22,16 +23,18 @@ iniconfig==2.1.0
 jiter==0.10.0
 jsonpatch==1.33
 jsonpointer==3.0.0
-langchain-core==0.3.72
-langchain-openai==0.3.28
+langchain==0.3.27
+langchain-core==0.3.74
+langchain-openai==0.3.30
+langchain-text-splitters==0.3.9
 langgraph==0.2.23
 langgraph-checkpoint==1.0.12
-langsmith==0.4.10
+langsmith==0.4.14
 mock==5.2.0
 msgpack==1.1.1
-openai==1.98.0
+openai==1.100.2
 opentracing==2.4.0
-orjson==3.11.1
+orjson==3.11.2
 packaging==25.0
 pluggy==1.6.0
 pydantic==2.11.7
@@ -43,15 +46,16 @@ pytest-cov==6.2.1
 pytest-mock==3.14.1
 pyyaml==6.0.2
 regex==2025.7.34
-requests==2.32.4
+requests==2.32.5
 requests-toolbelt==1.0.0
 sniffio==1.3.1
 sortedcontainers==2.4.0
+sqlalchemy==2.0.43
 tenacity==9.1.2
-tiktoken==0.9.0
+tiktoken==0.11.0
 tomli==2.2.1
 tqdm==4.67.1
 typing-extensions==4.14.1
 typing-inspection==0.4.1
 urllib3==2.5.0
-zstandard==0.23.0
+zstandard==0.24.0

.riot/requirements/3fed352.txt renamed to .riot/requirements/14bb28e.txt

Lines changed: 14 additions & 11 deletions
@@ -2,14 +2,14 @@
 # This file is autogenerated by pip-compile with Python 3.11
 # by the following command:
 #
-# pip-compile --allow-unsafe --no-annotate .riot/requirements/3fed352.in
+# pip-compile --allow-unsafe --no-annotate .riot/requirements/14bb28e.in
 #
 annotated-types==0.7.0
 anyio==4.10.0
 attrs==25.3.0
 certifi==2025.8.3
-charset-normalizer==3.4.2
-coverage[toml]==7.10.2
+charset-normalizer==3.4.3
+coverage[toml]==7.10.4
 distro==1.9.0
 h11==0.16.0
 httpcore==1.0.9
@@ -20,16 +20,18 @@ iniconfig==2.1.0
 jiter==0.10.0
 jsonpatch==1.33
 jsonpointer==3.0.0
-langchain-core==0.3.72
-langchain-openai==0.3.28
+langchain==0.3.27
+langchain-core==0.3.74
+langchain-openai==0.3.30
+langchain-text-splitters==0.3.9
 langgraph==0.2.23
 langgraph-checkpoint==1.0.12
-langsmith==0.4.10
+langsmith==0.4.14
 mock==5.2.0
 msgpack==1.1.1
-openai==1.98.0
+openai==1.100.2
 opentracing==2.4.0
-orjson==3.11.1
+orjson==3.11.2
 packaging==25.0
 pluggy==1.6.0
 pydantic==2.11.7
@@ -41,14 +43,15 @@ pytest-cov==6.2.1
 pytest-mock==3.14.1
 pyyaml==6.0.2
 regex==2025.7.34
-requests==2.32.4
+requests==2.32.5
 requests-toolbelt==1.0.0
 sniffio==1.3.1
 sortedcontainers==2.4.0
+sqlalchemy==2.0.43
 tenacity==9.1.2
-tiktoken==0.9.0
+tiktoken==0.11.0
 tqdm==4.67.1
 typing-extensions==4.14.1
 typing-inspection==0.4.1
 urllib3==2.5.0
-zstandard==0.23.0
+zstandard==0.24.0

.riot/requirements/5858798.txt renamed to .riot/requirements/153586b.txt

Lines changed: 14 additions & 11 deletions
@@ -2,14 +2,14 @@
 # This file is autogenerated by pip-compile with Python 3.11
 # by the following command:
 #
-# pip-compile --allow-unsafe --no-annotate .riot/requirements/5858798.in
+# pip-compile --allow-unsafe --no-annotate .riot/requirements/153586b.in
 #
 annotated-types==0.7.0
 anyio==4.10.0
 attrs==25.3.0
 certifi==2025.8.3
-charset-normalizer==3.4.2
-coverage[toml]==7.10.2
+charset-normalizer==3.4.3
+coverage[toml]==7.10.4
 distro==1.9.0
 h11==0.16.0
 httpcore==1.0.9
@@ -20,17 +20,19 @@ iniconfig==2.1.0
 jiter==0.10.0
 jsonpatch==1.33
 jsonpointer==3.0.0
-langchain-core==0.3.72
-langchain-openai==0.3.28
+langchain==0.3.27
+langchain-core==0.3.74
+langchain-openai==0.3.30
+langchain-text-splitters==0.3.9
 langgraph==0.3.22
 langgraph-checkpoint==2.1.1
 langgraph-prebuilt==0.1.8
 langgraph-sdk==0.1.74
-langsmith==0.4.10
+langsmith==0.4.14
 mock==5.2.0
-openai==1.98.0
+openai==1.100.2
 opentracing==2.4.0
-orjson==3.11.1
+orjson==3.11.2
 ormsgpack==1.10.0
 packaging==25.0
 pluggy==1.6.0
@@ -43,15 +45,16 @@ pytest-cov==6.2.1
 pytest-mock==3.14.1
 pyyaml==6.0.2
 regex==2025.7.34
-requests==2.32.4
+requests==2.32.5
 requests-toolbelt==1.0.0
 sniffio==1.3.1
 sortedcontainers==2.4.0
+sqlalchemy==2.0.43
 tenacity==9.1.2
-tiktoken==0.9.0
+tiktoken==0.11.0
 tqdm==4.67.1
 typing-extensions==4.14.1
 typing-inspection==0.4.1
 urllib3==2.5.0
 xxhash==3.5.0
-zstandard==0.23.0
+zstandard==0.24.0
