Commit ed239d0

Add documentation for Langgraph and Swarm examples from testing.
1 parent fb45477 commit ed239d0

2 files changed, 269 additions, 0 deletions


docs/testing/Examples/langgraph.md

Lines changed: 149 additions & 0 deletions

---
title: LangGraph
---

# Intro

LangGraph is a [library](https://github.com/langchain-ai/langgraph) for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. In this example, we build a weather agent that answers queries about the weather using tool calling.
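
The tests on this page expect the tool output to contain "60 degrees and foggy" for San Francisco and "90 degrees and sunny" for New York City, which suggests that the agent's `_find_weather` tool returns canned answers. The following is a minimal sketch of such a deterministic tool, assuming a single `city` string parameter. `find_weather` is a hypothetical stand-in, not the linked agent code:

```python
def find_weather(city: str) -> str:
    """Hypothetical weather tool: returns canned answers so test outcomes are deterministic."""
    # Accept both the abbreviation and the full name for San Francisco.
    if city.strip().lower() in {"sf", "san francisco"}:
        return "It's 60 degrees and foggy."
    # Every other city gets the same fixed answer.
    return "It's 90 degrees and sunny."


print(find_weather("San Francisco"))  # It's 60 degrees and foggy.
```

Fixed tool outputs keep the assertions stable across runs, so a failing test points at the model's behavior (which tool it called, with which arguments, and what it answered) rather than at changing weather data.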

## Agent code

You can view the agent code [here](https://github.com/invariantlabs-ai/testing/blob/main/sample_tests/langgraph/weather_agent/weather_agent.py).

The agent can be invoked as follows:

```python
from langchain_core.messages import HumanMessage

from .weather_agent import WeatherAgent

invocation_response = WeatherAgent().get_graph().invoke(
    {"messages": [HumanMessage(content="what is the weather in sf")]},
    config={"configurable": {"thread_id": 42}},
)
```
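
The `thread_id` under `configurable` identifies a conversation thread. When a LangGraph graph is compiled with a checkpointer, invocations that reuse a `thread_id` continue the same conversation, so the returned state also includes the earlier messages. Without a checkpointer, the `thread_id` has no effect. The following framework-free toy models that behavior. `ToyCheckpointedGraph` is hypothetical and only echoes the input instead of calling a model or tools:

```python
from collections import defaultdict


class ToyCheckpointedGraph:
    """Hypothetical stand-in for a graph compiled with a checkpointer:
    state is stored per thread_id and grows with each invocation."""

    def __init__(self):
        # One message list per thread_id, created on first use.
        self._threads = defaultdict(list)

    def invoke(self, user_message, thread_id):
        history = self._threads[thread_id]
        history.append({"role": "user", "content": user_message})
        # A real agent would call the model and tools here; the toy just echoes.
        history.append({"role": "assistant", "content": f"answer to: {user_message}"})
        # Return a copy of the whole thread, like a state dict with a "messages" key.
        return {"messages": list(history)}


graph = ToyCheckpointedGraph()
graph.invoke("what is the weather in sf", thread_id=41)
state = graph.invoke("what is the weather in nyc", thread_id=41)
print(len(state["messages"]))  # 4
```

Both exchanges land in thread 41, so the second call returns four messages. A call with a different `thread_id` would start from an empty history.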

## Running example tests

You can run the example tests discussed on this page with the following command from the root of the repository:

```bash
poetry run invariant test sample_tests/langgraph/weather_agent/test_weather_agent.py --push --dataset_name langgraph_weather_agent
```

!!! note

    To run the example without sending the results to the Explorer UI, omit the `--push` flag. The parts of the trace that fail are still highlighted in the terminal.

## Unit tests

### Test 1: Weather in San Francisco

<div class='tiles'>
<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/langgraph_weather_agent-1733695457/t/1" class='tile'>
<span class='tile-title'>Open in Explorer →</span>
<span class='tile-description'>See this example in the Invariant Explorer</span>
</a>
</div>

```python
# Assumes the test module imports HumanMessage, TraceFactory, assert_true and F,
# and defines a `weather_agent` pytest fixture.
def test_weather_agent_with_only_sf(weather_agent):
    """Test the weather agent with San Francisco."""
    invocation_response = weather_agent.invoke(
        {"messages": [HumanMessage(content="what is the weather in sf")]},
        config={"configurable": {"thread_id": 42}},
    )

    trace = TraceFactory.from_langgraph(invocation_response)

    with trace.as_context():
        find_weather_tool_calls = trace.tool_calls(name="_find_weather")
        assert_true(F.len(find_weather_tool_calls) == 1)
        assert_true(
            find_weather_tool_calls[0]["function"]["arguments"].contains(
                "San Francisco"
            )
        )

        find_weather_tool_outputs = trace.messages(role="tool")
        assert_true(F.len(find_weather_tool_outputs) == 1)
        assert_true(
            find_weather_tool_outputs[0]["content"].contains("60 degrees and foggy")
        )

        assert_true(trace.messages(-1)["content"].contains("60 degrees and foggy"))
```

We first use the `tool_calls()` method to retrieve all tool calls named `_find_weather` and assert that there is exactly one such call. We also verify that the arguments passed to the tool call include `San Francisco`.

Next, we use the `messages()` method with the `role="tool"` filter to check the output of the `_find_weather` tool call, ensuring that it contains the expected answer.

Finally, we confirm that the last message also includes the expected answer.
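
To make these three checks concrete without the Invariant runtime, the following is a plain-Python approximation over a hypothetical trace in OpenAI-style message format. The helpers `tool_calls` and `messages` are illustrative stand-ins for the trace methods, not the Invariant API. The argument key `city` and the message contents are assumptions:

```python
# Hypothetical trace: a plain list of OpenAI-style message dicts.
TRACE = [
    {"role": "user", "content": "what is the weather in sf"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "_find_weather", "arguments": {"city": "San Francisco"}},
            }
        ],
    },
    {"role": "tool", "content": "It's 60 degrees and foggy."},
    {"role": "assistant", "content": "It's 60 degrees and foggy in San Francisco."},
]


def tool_calls(trace, name):
    """Collect every tool call with the given function name, across all messages."""
    return [
        call
        for message in trace
        for call in (message.get("tool_calls") or [])
        if call["function"]["name"] == name
    ]


def messages(trace, role):
    """Return the messages with the given role."""
    return [message for message in trace if message["role"] == role]


def check_single_city_trace(trace):
    """Plain-Python version of the checks in Test 1."""
    calls = tool_calls(trace, "_find_weather")
    outputs = messages(trace, "tool")
    return (
        len(calls) == 1
        and "San Francisco" in str(calls[0]["function"]["arguments"])
        and len(outputs) == 1
        and "60 degrees and foggy" in outputs[0]["content"]
        and "60 degrees and foggy" in trace[-1]["content"]
    )


print(check_single_city_trace(TRACE))  # True
```

Plain checks like these only report pass or fail. The Invariant versions, run inside `trace.as_context()`, also track which part of the trace a failing check refers to, which is what gets highlighted in the terminal and the Explorer.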

### Test 2: Weather in San Francisco and New York City

<div class='tiles'>
<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/langgraph_weather_agent-1733695457/t/2" class='tile'>
<span class='tile-title'>Open in Explorer →</span>
<span class='tile-description'>See this example in the Invariant Explorer</span>
</a>
</div>

```python
def test_weather_agent_with_sf_and_nyc(weather_agent):
    """Test the weather agent with San Francisco and New York City."""
    _ = weather_agent.invoke(
        {"messages": [HumanMessage(content="what is the weather in sf")]},
        config={"configurable": {"thread_id": 41}},
    )
    # Same thread_id as above: this call continues the San Francisco conversation.
    invocation_response = weather_agent.invoke(
        {"messages": [HumanMessage(content="what is the weather in nyc")]},
        config={"configurable": {"thread_id": 41}},
    )

    trace = TraceFactory.from_langgraph(invocation_response)

    with trace.as_context():
        find_weather_tool_calls = trace.tool_calls(name="_find_weather")
        assert_true(len(find_weather_tool_calls) == 2)
        find_weather_tool_call_args = str(
            F.map(lambda x: x["function"]["arguments"], find_weather_tool_calls)
        )
        assert_true(
            "San Francisco" in find_weather_tool_call_args
            and "New York City" in find_weather_tool_call_args
        )

        find_weather_tool_outputs = trace.messages(role="tool")
        assert_true(F.len(find_weather_tool_outputs) == 2)
        assert_true(
            find_weather_tool_outputs[0]["content"].contains("60 degrees and foggy")
        )
        assert_true(
            find_weather_tool_outputs[1]["content"].contains("90 degrees and sunny")
        )

        assistant_response_messages = F.filter(
            lambda m: m.get("tool_calls") is None, trace.messages(role="assistant")
        )
        assert_true(len(assistant_response_messages) == 2)
        assert_true(
            assistant_response_messages[0]["content"].contains(
                "weather in San Francisco is"
            )
        )
        assert_true(
            assistant_response_messages[1]["content"].contains(
                "weather in New York City is"
            )
        )
```
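
In this test, the agent is invoked twice with the same `thread_id` (41), so the trace built from the second response also contains the San Francisco exchange. This is why two `_find_weather` calls are expected. We use `F.map` to extract the arguments from the list of tool calls, convert the result to a string, and assert that both cities appear in it.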
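
In plain Python, the same extraction is a `map` followed by `str`. The tool calls below are hypothetical, and `city` is an assumed argument name:

```python
find_weather_tool_calls = [
    {"function": {"name": "_find_weather", "arguments": {"city": "San Francisco"}}},
    {"function": {"name": "_find_weather", "arguments": {"city": "New York City"}}},
]

# Plain-Python counterpart of F.map(...): pull out each call's arguments.
arguments = list(map(lambda call: call["function"]["arguments"], find_weather_tool_calls))

# str() flattens the list into one string, so one substring test covers every call.
arguments_text = str(arguments)
print(arguments_text)  # [{'city': 'San Francisco'}, {'city': 'New York City'}]
print("San Francisco" in arguments_text and "New York City" in arguments_text)  # True
```

The substring check is deliberately loose. It does not care which call carried which city or under which key, so it tolerates small differences in how the model phrases its tool arguments.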

There are two kinds of messages with `role="assistant"`: those that make tool calls and those that carry the final response back to the caller. We use `F.filter` to keep only the assistant messages whose `tool_calls` is `None`, that is, the final responses. Finally, we assert that these response messages contain the results of the weather queries.
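
The same filtering step can be written in plain Python over a hypothetical list of assistant messages:

```python
assistant_messages = [
    {"role": "assistant", "content": "", "tool_calls": [{"function": {"name": "_find_weather"}}]},
    {"role": "assistant", "content": "The weather in San Francisco is 60 degrees and foggy."},
    {"role": "assistant", "content": "", "tool_calls": [{"function": {"name": "_find_weather"}}]},
    {"role": "assistant", "content": "The weather in New York City is 90 degrees and sunny."},
]

# Keep only final responses: messages that made no tool calls.
# dict.get returns None both when the key is missing and when it is set to None.
final_responses = list(filter(lambda m: m.get("tool_calls") is None, assistant_messages))

for message in final_responses:
    print(message["content"])
```

The predicate assumes that final responses have `tool_calls` missing or set to `None`. If a trace converter stored an empty list instead, `is None` would drop those messages, and `not m.get("tool_calls")` would be the more permissive check.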

## Conclusion

We have seen how to write unit tests for specific test cases when building an agent with the LangGraph library.

docs/testing/Examples/swarm.md

Lines changed: 120 additions & 0 deletions

---
title: OpenAI Swarm
---

# Intro

OpenAI has introduced [Swarm](https://github.com/openai/swarm), a framework for building and managing multi-agent systems. In this example, we build a capital finder agent that uses tool calling to find the capital of a given country.

## Agent code

You can view the agent code [here](https://github.com/invariantlabs-ai/testing/blob/main/sample_tests/swarm/capital_finder_agent/capital_finder_agent.py).

The agent can be invoked as follows:

```python
from invariant.testing import SwarmWrapper
from swarm import Swarm

from .capital_finder_agent import create_agent

swarm_wrapper = SwarmWrapper(Swarm())
agent = create_agent()
messages = [{"role": "user", "content": "What is the capital of France?"}]
response = swarm_wrapper.run(
    agent=agent,
    messages=messages,
)
```

`SwarmWrapper` is a lightweight wrapper around the `Swarm` class. Its `run(...)` method returns the current Swarm response together with the history of all messages exchanged. The tests below convert this result into a trace with `SwarmWrapper.to_invariant_trace`.
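
Swarm's own `Swarm.run(...)` returns a `Response` whose `messages` field holds only the messages produced during that run. A wrapper that reports the full exchange therefore has to combine them with the input messages. The following toy sketch shows that idea with a fake run function. `ToyResponse`, `fake_run`, and `HistoryWrapper` are hypothetical and are not the actual `SwarmWrapper` implementation:

```python
from dataclasses import dataclass, field


@dataclass
class ToyResponse:
    """Mimics the shape of Swarm's Response: only the messages produced by this run."""
    messages: list = field(default_factory=list)


def fake_run(messages):
    # Hypothetical stand-in for Swarm.run: ignores the input and answers with a fixed message.
    return ToyResponse(messages=[{"role": "assistant", "content": "The capital of France is Paris."}])


class HistoryWrapper:
    """Hypothetical wrapper that returns the run's response plus the full history."""

    def __init__(self, run_fn):
        self.run_fn = run_fn

    def run(self, messages):
        response = self.run_fn(messages)
        # Input messages first, then everything the run produced.
        history = list(messages) + list(response.messages)
        return {"response": response, "history": history}


wrapper = HistoryWrapper(fake_run)
result = wrapper.run([{"role": "user", "content": "What is the capital of France?"}])
print([m["role"] for m in result["history"]])  # ['user', 'assistant']
```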

## Running example tests

You can run the example tests discussed on this page with the following command from the root of the repository:

```bash
poetry run invariant test sample_tests/swarm/capital_finder_agent/test_capital_finder_agent.py --push --dataset_name swarm_capital_finder_agent
```

!!! note

    To run the example without sending the results to the Explorer UI, omit the `--push` flag. The parts of the trace that fail are still highlighted in the terminal.

## Unit tests

### Test 1: Capital is correctly returned by the agent

<div class='tiles'>
<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/swarm_capital_finder_agent-1733695570/t/1" class='tile'>
<span class='tile-title'>Open in Explorer →</span>
<span class='tile-description'>See this example in the Invariant Explorer</span>
</a>
</div>

```python
# Assumes the test module imports create_agent, SwarmWrapper, assert_true,
# assert_equals and F, and defines a `swarm_wrapper` pytest fixture.
def test_capital_finder_agent_when_capital_found(swarm_wrapper):
    """Test the capital finder agent when the capital is found."""
    agent = create_agent()
    messages = [{"role": "user", "content": "What is the capital of France?"}]
    response = swarm_wrapper.run(
        agent=agent,
        messages=messages,
    )
    trace = SwarmWrapper.to_invariant_trace(response)

    with trace.as_context():
        get_capital_tool_calls = trace.tool_calls(name="get_capital")
        assert_true(F.len(get_capital_tool_calls) == 1)
        assert_equals(
            "France", get_capital_tool_calls[0]["function"]["arguments"]["country_name"]
        )

        assert_true(trace.messages(-1)["content"].contains("Paris"))
```

We first use the `tool_calls()` method to retrieve all tool calls named `get_capital` and assert that there is exactly one such call. We also assert that the `country_name` argument passed to the tool call is `France`. Finally, we verify that the last message contains `Paris`, the expected answer.
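
In the raw OpenAI chat format that Swarm messages follow, `function.arguments` is a JSON-encoded string. Indexing `["arguments"]["country_name"]` in the test therefore suggests that the trace exposes the arguments as a dictionary. The following plain-Python sketch shows that decoding step. `parse_arguments` is a hypothetical helper, not part of Invariant or Swarm:

```python
import json

# Hypothetical raw tool call, with arguments as a JSON string.
raw_tool_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_capital", "arguments": '{"country_name": "France"}'},
}


def parse_arguments(tool_call):
    """Return a copy of the tool call with its JSON arguments decoded to a dict."""
    function = dict(tool_call["function"])
    if isinstance(function["arguments"], str):
        function["arguments"] = json.loads(function["arguments"])
    return {**tool_call, "function": function}


parsed = parse_arguments(raw_tool_call)
print(parsed["function"]["arguments"]["country_name"])  # France
```

The helper copies the nested `function` dict rather than mutating it, so the original message stays unchanged.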

### Test 2: Capital is not found by the agent

<div class='tiles'>
<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/swarm_capital_finder_agent-1733695570/t/2" class='tile'>
<span class='tile-title'>Open in Explorer →</span>
<span class='tile-description'>See this example in the Invariant Explorer</span>
</a>
</div>

```python
def test_capital_finder_agent_when_capital_not_found(swarm_wrapper):
    """Test the capital finder agent when the capital is not found."""
    agent = create_agent()
    messages = [{"role": "user", "content": "What is the capital of Spain?"}]
    response = swarm_wrapper.run(
        agent=agent,
        messages=messages,
    )
    trace = SwarmWrapper.to_invariant_trace(response)

    with trace.as_context():
        get_capital_tool_calls = trace.tool_calls(name="get_capital")
        assert_true(F.len(get_capital_tool_calls) == 1)
        assert_equals(
            "Spain", get_capital_tool_calls[0]["function"]["arguments"]["country_name"]
        )

        tool_outputs = trace.tool_outputs(tool_name="get_capital")
        assert_true(F.len(tool_outputs) == 1)
        assert_true(tool_outputs[0]["content"].contains("not_found"))

        assert_false(trace.messages(-1)["content"].contains("Madrid"))
```

We use the `tool_calls()` method to retrieve all calls named `get_capital`, asserting that there is exactly one such call and that its `country_name` argument is `Spain`.

Next, we use the `tool_outputs()` method to check the outputs of the `get_capital` calls. This confirms that the call returned `not_found`, because the agent's local dictionary of country-to-capital mappings does not include `Spain`.

Finally, we verify that the last message does not contain `Madrid`, which is consistent with `Spain` being absent from the agent's limited mapping.
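
In Swarm, tools are plain Python functions listed in an agent's `functions`, and their parameter names become the tool-call argument names. A tool with a `country_name` parameter is therefore consistent with the tests above. The following is a minimal sketch of such a lookup tool, assuming a local dictionary and a literal `not_found` fallback. The `CAPITALS` contents are hypothetical, beyond France mapping to Paris and Spain being absent, as the tests require:

```python
# Hypothetical local mapping: the real agent's dictionary may differ,
# but the tests require France -> Paris and no entry for Spain.
CAPITALS = {"France": "Paris"}


def get_capital(country_name: str) -> str:
    """Return the capital of a known country, or "not_found" otherwise."""
    return CAPITALS.get(country_name, "not_found")


print(get_capital("France"))  # Paris
print(get_capital("Spain"))   # not_found
```

Returning a sentinel string instead of raising lets the model see the failure as an ordinary tool output. Test 2 then checks that the model does not answer from its own knowledge anyway.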

## Conclusion

We have seen how to write unit tests for specific test cases when building an agent with the Swarm framework.
