Commit 40e7a7c

Merge pull request #2 from invariantlabs-ai/swarm-langgraph-docs
Add documentation for Langgraph and Swarm examples from testing.
2 parents 19e1790 + 746b75f commit 40e7a7c

2 files changed

+304
-0
lines changed

docs/testing/Examples/langgraph.md

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
---
title: LangGraph
---

# LangGraph Agents

<div class="subtitle">
Write tests for your <code>langgraph</code> applications.
</div>

LangGraph is a [library](https://github.com/langchain-ai/langgraph) for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. In this example, we build a weather agent that helps us answer queries about the weather by using tool calling.

## Setup
To use `langgraph`, you need to install the corresponding package:

```bash
pip install langgraph
```

## Agent code

You can view the agent code [here](https://github.com/invariantlabs-ai/testing/blob/main/sample_tests/langgraph/weather_agent/weather_agent.py).

This can be invoked as:

```python
from langchain_core.messages import HumanMessage

from .weather_agent import WeatherAgent

invocation_response = WeatherAgent().get_graph().invoke(
    {"messages": [HumanMessage(content="what is the weather in sf")]},
    config={"configurable": {"thread_id": 42}},
)
```


## Running example tests

You can run the example tests discussed here by running the following command in the root of the repository:

```bash
poetry run invariant test sample_tests/langgraph/weather_agent/test_weather_agent.py --push --dataset_name langgraph_weather_agent
```

!!! note

    To run the example without sending the results to the Explorer UI, omit the `--push` flag. The parts of the trace that fail are still highlighted in the terminal.

## Unit tests

We can now use `testing` to assess the correctness of our agent. We will write two tests, each verifying a different property of the agent's behavior:

1. The agent can correctly answer a query about the weather in San Francisco.

2. The agent can correctly answer queries when asked about both the weather in San Francisco and New York City.

To do so, we use `TraceFactory` to create a trace from the invocation response and then use the corresponding `Trace` methods to examine the resulting runtime trace.

### Test 1:

<div class='tiles'>
<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/langgraph_weather_agent-1733695457/t/1" class='tile'>
    <span class='tile-title'>Open in Explorer →</span>
    <span class='tile-description'>See this example in the Invariant Explorer</span>
</a>
</div>

```python
def test_weather_agent_with_only_sf(weather_agent):
    """Test the weather agent with San Francisco."""
    invocation_response = weather_agent.invoke(
        {"messages": [HumanMessage(content="what is the weather in sf")]},
        config={"configurable": {"thread_id": 42}},
    )

    trace = TraceFactory.from_langgraph(invocation_response)

    with trace.as_context():
        find_weather_tool_calls = trace.tool_calls(name="_find_weather")
        assert_true(F.len(find_weather_tool_calls) == 1)
        assert_true(
            find_weather_tool_calls[0]["function"]["arguments"].contains(
                "San francisco"
            )
        )

        find_weather_tool_outputs = trace.messages(role="tool")
        assert_true(F.len(find_weather_tool_outputs) == 1)
        assert_true(
            find_weather_tool_outputs[0]["content"].contains("60 degrees and foggy")
        )

        assert_true(trace.messages(-1)["content"].contains("60 degrees and foggy"))
```

We first use the `tool_calls()` method to retrieve all tool calls where the name is `_find_weather`, and we assert that there is exactly one such call. We also verify that the arguments passed to the tool call contain the city name, San Francisco.

Next, we use the `messages()` method with the `role="tool"` filter to check the output of the `_find_weather` tool call, ensuring that the content of this output contains our desired answer.

Finally, we confirm that the last message also includes our desired answer.
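
The three checks described above can be illustrated without the `testing` library. The following is only a sketch in plain Python over a hand-written trace: the message layout (OpenAI-style chat messages), the `city` argument name, and the helper `tool_calls_named` are assumptions made for illustration and are not part of the Invariant API.

```python
import json

# Hypothetical, hand-written trace in an OpenAI-style chat-message layout.
# The "city" argument name is an assumption, not taken from the real agent.
messages = [
    {"role": "user", "content": "what is the weather in sf"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "type": "function",
                "function": {
                    "name": "_find_weather",
                    "arguments": json.dumps({"city": "San Francisco"}),
                },
            }
        ],
    },
    {"role": "tool", "content": "It's 60 degrees and foggy."},
    {
        "role": "assistant",
        "content": "The weather in San Francisco is 60 degrees and foggy.",
    },
]


def tool_calls_named(trace, name):
    """Collect every tool call in the trace with the given function name."""
    return [
        call
        for message in trace
        for call in message.get("tool_calls") or []
        if call["function"]["name"] == name
    ]


calls = tool_calls_named(messages, "_find_weather")
tool_outputs = [m for m in messages if m["role"] == "tool"]

# 1. Exactly one call to the tool, and it mentions the city.
assert len(calls) == 1
assert "San Francisco" in calls[0]["function"]["arguments"]

# 2. Exactly one tool output, containing the desired answer.
assert len(tool_outputs) == 1
assert "60 degrees and foggy" in tool_outputs[0]["content"]

# 3. The last message repeats the answer to the user.
assert "60 degrees and foggy" in messages[-1]["content"]
```

Unlike this sketch, the assertions in the test run inside `trace.as_context()` and use `assert_true`, which is what lets failing parts of the trace be highlighted.
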
103+
104+
### Test 2:
105+
106+
<div class='tiles'>
107+
<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/langgraph_weather_agent-1733695457/t/2" class='tile'>
108+
<span class='tile-title'>Open in Explorer →</span>
109+
<span class='tile-description'>See this example in the Invariant Explorer</span>
110+
</a>
111+
</div>
112+
113+
```python
114+
def test_weather_agent_with_sf_and_nyc(weather_agent):
115+
"""Test the weather agent with San Francisco and New York City."""
116+
_ = weather_agent.invoke(
117+
{"messages": [HumanMessage(content="what is the weather in sf")]},
118+
config={"configurable": {"thread_id": 41}},
119+
)
120+
invocation_response = weather_agent.invoke(
121+
{"messages": [HumanMessage(content="what is the weather in nyc")]},
122+
config={"configurable": {"thread_id": 41}},
123+
)
124+
125+
trace = TraceFactory.from_langgraph(invocation_response)
126+
127+
with trace.as_context():
128+
find_weather_tool_calls = trace.tool_calls(name="_find_weather")
129+
assert_true(len(find_weather_tool_calls) == 2)
130+
find_weather_tool_call_args = str(
131+
F.map(lambda x: x.argument(), find_weather_tool_calls)
132+
)
133+
assert_true(
134+
"San Francisco" in find_weather_tool_call_args
135+
and "New York City" in find_weather_tool_call_args
136+
)
137+
138+
find_weather_tool_outputs = trace.messages(role="tool")
139+
assert_true(F.len(find_weather_tool_outputs) == 2)
140+
assert_true(
141+
find_weather_tool_outputs[0]["content"].contains("60 degrees and foggy")
142+
)
143+
assert_true(
144+
find_weather_tool_outputs[1]["content"].contains("90 degrees and sunny")
145+
)
146+
147+
assistant_response_messages = F.filter(
148+
lambda m: m.get("tool_calls") is None, trace.messages(role="assistant")
149+
)
150+
assert_true(len(assistant_response_messages) == 2)
151+
assert_true(
152+
assistant_response_messages[0]["content"].contains(
153+
"weather in San Francisco is"
154+
)
155+
)
156+
assert_true(
157+
assistant_response_messages[1]["content"].contains(
158+
"weather in New York City is"
159+
)
160+
)
161+
```

In this test, we use `F.map` to extract the arguments of the tool calls from the list of tool calls. We then assert that both of the queried cities are present in the arguments list.

There are two types of messages with `role="assistant"`: those where tool calls are made and those corresponding to the final response back to the caller. We use `F.filter` to keep only the assistant messages whose `tool_calls` is `None`, that is, the final responses. Finally, we assert that these response messages contain the results of the weather queries.
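
To make the map and filter steps concrete, here is a rough plain-Python equivalent that uses the built-in `map` and `filter` on a hand-written two-turn trace. The message layout, the `city` argument name, and the helpers `assistant` and `weather_call` are hypothetical; the real test operates on the trace returned by `TraceFactory.from_langgraph`.

```python
def assistant(content, tool_calls=None):
    """Build an assistant message; tool_calls stays None for a final response."""
    return {"role": "assistant", "content": content, "tool_calls": tool_calls}


def weather_call(city):
    """Build a call to _find_weather (the `city` argument name is assumed)."""
    return {"function": {"name": "_find_weather", "arguments": {"city": city}}}


# Hypothetical trace of the two-turn conversation on a single thread.
messages = [
    {"role": "user", "content": "what is the weather in sf"},
    assistant("", [weather_call("San Francisco")]),
    {"role": "tool", "content": "60 degrees and foggy"},
    assistant("The weather in San Francisco is 60 degrees and foggy."),
    {"role": "user", "content": "what is the weather in nyc"},
    assistant("", [weather_call("New York City")]),
    {"role": "tool", "content": "90 degrees and sunny"},
    assistant("The weather in New York City is 90 degrees and sunny."),
]

assistant_messages = [m for m in messages if m["role"] == "assistant"]

# map: pull the arguments out of every tool call, then stringify them so that
# simple substring checks work.
tool_calls = [call for m in assistant_messages for call in m["tool_calls"] or []]
call_args = str(list(map(lambda call: call["function"]["arguments"], tool_calls)))

# filter: keep only assistant messages that made no tool call (final responses).
final_responses = list(
    filter(lambda m: m.get("tool_calls") is None, assistant_messages)
)

assert "San Francisco" in call_args and "New York City" in call_args
assert len(final_responses) == 2
assert "weather in San Francisco is" in final_responses[0]["content"]
assert "weather in New York City is" in final_responses[1]["content"]
```
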

## Conclusion

We have seen how to write unit tests for specific test cases when building an agent with the LangGraph library.

docs/testing/Examples/swarm.md

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
---
title: OpenAI Swarm
---

# Swarm Agents

<div class="subtitle">
Test your OpenAI <code>swarm</code> agents.
</div>

OpenAI has introduced [Swarm](https://github.com/openai/swarm), a framework for building and managing multi-agent systems. In this example, we build a capital finder agent that uses tool calling to answer queries about finding the capital of a given country.

## Setup
To use `Swarm`, you need to install the corresponding package:

```bash
pip install openai-swarm
```

## Agent code
You can view the agent code [here](sample_tests/swarm/capital_finder_agent/capital_finder_agent.py).

This can be invoked as:

```python
from invariant.wrappers.swarm_wrapper import SwarmWrapper
from swarm import Swarm

from .capital_finder_agent import create_agent

swarm_wrapper = SwarmWrapper(Swarm())
agent = create_agent()
messages = [{"role": "user", "content": "What is the capital of France?"}]
response = swarm_wrapper.run(
    agent=agent,
    messages=messages,
)
```

`SwarmWrapper` is a lightweight wrapper around the `Swarm` class. The response of its `run(...)` method includes the current Swarm response along with the history of all messages exchanged.
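
The idea of a wrapper that returns both the latest response and the full message history can be sketched as follows. This is not the actual `SwarmWrapper` implementation: `HistoryWrapper`, `FakeClient`, and `FakeResponse` are hypothetical names, the returned dictionary shape is invented for illustration, and the sketch assumes the wrapped client returns only the messages produced by the current run.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Stand-in for a client response: only the messages produced by this run."""

    messages: list = field(default_factory=list)


class FakeClient:
    """Stand-in for the wrapped client, so the sketch runs without an API key."""

    def run(self, agent, messages):
        return FakeResponse(messages=[{"role": "assistant", "content": "Paris"}])


class HistoryWrapper:
    """Hypothetical wrapper: delegates to the client and keeps the full history."""

    def __init__(self, client):
        self.client = client

    def run(self, agent, messages):
        response = self.client.run(agent=agent, messages=messages)
        # Input messages plus the newly produced ones give the whole conversation.
        history = list(messages) + list(response.messages)
        return {"response": response, "history": history}


result = HistoryWrapper(FakeClient()).run(
    agent=None,
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
assert [m["role"] for m in result["history"]] == ["user", "assistant"]
```

Keeping the input messages alongside the new ones is what makes it possible to later build a complete trace from a single return value.
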

## Running example tests

You can run the example tests discussed here by running the following command in the root of the repository:

```bash
poetry run invariant test sample_tests/swarm/capital_finder_agent/test_capital_finder_agent.py --push --dataset_name swarm_capital_finder_agent
```

!!! note

    To run the example without sending the results to the Explorer UI, omit the `--push` flag. The parts of the trace that fail are still highlighted in the terminal.

## Unit tests

We can now use `testing` to assess the correctness of our agent. We will write two tests, each verifying a different property of the agent's behavior:

1. The agent can correctly answer a query about the capital of France.
2. The agent responds correctly when the capital of a given country cannot be determined.

### Test 1: Capital is correctly returned by the Agent

<div class='tiles'>
<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/swarm_capital_finder_agent-1733695570/t/1" class='tile'>
    <span class='tile-title'>Open in Explorer →</span>
    <span class='tile-description'>See this example in the Invariant Explorer</span>
</a>
</div>

```python
def test_capital_finder_agent_when_capital_found(swarm_wrapper):
    """Test the capital finder agent when the capital is found."""
    agent = create_agent()
    messages = [{"role": "user", "content": "What is the capital of France?"}]
    response = swarm_wrapper.run(
        agent=agent,
        messages=messages,
    )
    trace = SwarmWrapper.to_invariant_trace(response)

    with trace.as_context():
        get_capital_tool_calls = trace.tool_calls(name="get_capital")
        assert_true(F.len(get_capital_tool_calls) == 1)
        assert_equals(
            "France", get_capital_tool_calls[0]["function"]["arguments"]["country_name"]
        )

        assert_true(trace.messages(-1)["content"].contains("paris"))
```

We first use the `tool_calls()` method to retrieve all tool calls where the name is `get_capital`. Then, we assert that there is exactly one such tool call. We also assert that the argument `country_name` passed to the tool call is `France`. Additionally, we verify that the last message contains `Paris`, our desired answer.

### Test 2: Capital is not found by the Agent

<div class='tiles'>
<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/swarm_capital_finder_agent-1733695570/t/2" class='tile'>
    <span class='tile-title'>Open in Explorer →</span>
    <span class='tile-description'>See this example in the Invariant Explorer</span>
</a>
</div>

```python
def test_capital_finder_agent_when_capital_not_found(swarm_wrapper):
    """Test the capital finder agent when the capital is not found."""
    agent = create_agent()
    messages = [{"role": "user", "content": "What is the capital of Spain?"}]
    response = swarm_wrapper.run(
        agent=agent,
        messages=messages,
    )
    trace = SwarmWrapper.to_invariant_trace(response)

    with trace.as_context():
        get_capital_tool_calls = trace.tool_calls(name="get_capital")
        assert_true(F.len(get_capital_tool_calls) == 1)
        assert_equals(
            "Spain", get_capital_tool_calls[0]["function"]["arguments"]["country_name"]
        )

        tool_outputs = trace.tool_outputs(tool_name="get_capital")
        assert_true(F.len(tool_outputs) == 1)
        assert_true(tool_outputs[0]["content"].contains("not_found"))

        assert_false(trace.messages(-1)["content"].contains("Madrid"))
```

We use the `tool_calls()` method to retrieve all calls with the name `get_capital`, asserting that there is exactly one such call and that the argument `country_name` is `Spain`.

Next, we use the `tool_outputs()` method to check the outputs for `get_capital` calls, confirming that the call returned `not_found`, as the agent's local dictionary of country-to-capital mappings does not include `Spain`.

Finally, we verify that the last message does not contain `Madrid`, consistent with the absence of `Spain` in the agent's limited mapping.
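
The behavior this test relies on can be pictured with a small lookup tool. The sketch below is an assumption about how such a tool might look, not the code of the linked agent: the `CAPITALS` dictionary and its single entry are illustrative, and only the tool name `get_capital`, the `country_name` argument, and the `not_found` result come from the test above.

```python
# Hypothetical mapping; the real agent's dictionary may contain other entries.
CAPITALS = {"France": "Paris"}


def get_capital(country_name: str) -> str:
    """Return the capital of the given country, or "not_found" if it is unknown."""
    return CAPITALS.get(country_name, "not_found")


# A known country resolves to its capital; an unknown one yields the sentinel.
assert get_capital("France") == "Paris"
assert get_capital("Spain") == "not_found"
```

Returning a sentinel string instead of raising lets the agent see the failure as a normal tool output, which is what the `tool_outputs()` assertion inspects.
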

## Conclusion

We have seen how to write unit tests for specific test cases when building an agent with the Swarm framework.