
Commit 894f66c

improve swarm example
1 parent ef583e1 commit 894f66c

3 files changed

+105
-17
lines changed


docs/testing/Examples/computer-use.md

Lines changed: 12 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -9,7 +9,7 @@ Test your Computer Use agent with <code>testing</code>
99
</div>
1010

1111
Anthropic has recently announced a [Computer Use Agent](https://docs.anthropic.com/en/docs/build-with-claude/computer-use), an AI Agent capable
12-
of interacting with a computer desktop environment. For this example, we prompt the agent to act as a QA engineer with the knowledge about the documentation of
12+
of interacting with a computer desktop environment. For this example, we prompt the agent to act as a QA engineer with the knowledge about the documentation of
1313
the Invariant SDK and the Invariant Explorer UI, and we ask it to perform tasks related to testing the agent.
1414

1515
## Running the example
@@ -22,7 +22,7 @@ poetry run invariant test sample_tests/demos/computer_use_agent.py --push --data
2222

2323
!!! note
2424

25-
If you want to run the example without sending the results to the Explorer UI, you can always run without the `--push` flag. You will still see the parts of the trace that fail
25+
If you want to run the example without sending the results to the Explorer UI, you can always run without the `--push` flag. You will still see the parts of the trace that fail
2626
as highlighted in the terminal.
2727

2828
## Global assertions
@@ -41,6 +41,7 @@ def does_not_click_on_firefox_menu(trace: Trace):
4141
```
4242

4343
Next, we can make sure that tool outputs do not contain `ModuleNotFoundError`, which typically indicates coding mistakes that the agent made.
44+
4445
```python
4546
def does_not_make_python_error(trace: Trace):
4647
"""Agent should not produce code that results in ModuleNotFoundError."""
@@ -49,6 +50,7 @@ def does_not_make_python_error(trace: Trace):
4950
```
5051

5152
We also noticed that the agent often overwrites the existing files using the `create` command. We can add a check for that:
53+
5254
```python
5355
def does_not_make_file_edit_errors(trace: Trace):
5456
"""Given a trace, assert that the agent does not make a file edit error."""
@@ -81,10 +83,10 @@ def test_annotation():
8183
with trace.as_context():
8284
trace.run_assertions(global_asserts)
8385
assert_true(trace.messages(0)["content"].contains("nice nice"))
84-
86+
8587
expect_true(max(F.frequency(
8688
F.filter(
87-
lambda x: "http" in x.value,
89+
lambda x: "http" in x.value,
8890
F.map(lambda tc: tc["function"]["arguments"]["text"], trace.tool_calls({"arguments.action": "type", "name": "computer"}))
8991
)
9092
).values()) <= 1)
@@ -115,7 +117,7 @@ def test_firefox_menu():
115117
trace.run_assertions(global_asserts)
116118
```
117119

118-
### Task 3: Empty dataset and upload traces using SDK
120+
### Task 3: Empty dataset and upload traces using SDK
119121

120122
<div class='tiles'>
121123
<a href="https://explorer.invariantlabs.ai/u/mbalunovic/computer_use_agent-1733382354/t/3" class='tile'>
@@ -130,7 +132,7 @@ contains `create_request_and_push_trace` string.
130132

131133
```python
132134
def test_food_dataset():
133-
trace = run_agent("""create an empty dataset "chats-about-food", then use sdk to push 4 different traces
135+
trace = run_agent("""create an empty dataset "chats-about-food", then use sdk to push 4 different traces
134136
to it and then finally use sdk to update the metadata of the dataset to have "weather="snowy day" and "mood"="great"
135137
after that go to the UI and verify that there are 4 traces and metadata is good""")
136138
with trace.as_context():
@@ -154,7 +156,7 @@ Here, we would like to assert that the dataset created using the SDK actually ap
154156

155157
```python
156158
def test_anthropic():
157-
trace = run_agent("""use https://github.com/anthropics/anthropic-sdk-python to generate some traces and upload them
159+
trace = run_agent("""use https://github.com/anthropics/anthropic-sdk-python to generate some traces and upload them
158160
to the explorer using invariant sdk. your ANTHROPIC_API_KEY is already set up with a valid key""")
159161
with trace.as_context():
160162
trace.run_assertions(global_asserts)
@@ -176,6 +178,7 @@ First, we have a simple assertion that checks whether the agent imports `anthrop
176178
using `contains_any` function.
177179

178180
For this, we need two things:
181+
179182
1. Extract the dataset name from the tool output using a regex: `Dataset: (\w+)`, for instance `dataset_name` is `claude_examples`
180183
2. We can assert that the dataset name is present in the last screenshot using `ocr_contains` function.
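
The two steps above can be sketched with the standard library alone. This is a minimal illustration, not the code from the example test: the tool output and the screenshot text are made-up stand-ins, `extract_dataset_name` is a hypothetical helper, and a plain substring check stands in for the OCR-based `ocr_contains` matcher.

```python
import re
from typing import Optional

# Hypothetical tool output; in the real test this text comes from the agent's trace.
tool_output = "Uploading traces...\nDataset: claude_examples\nDone."


def extract_dataset_name(text: str) -> Optional[str]:
    """Return the first dataset name found after 'Dataset: ', or None."""
    match = re.search(r"Dataset: (\w+)", text)
    return match.group(1) if match else None


dataset_name = extract_dataset_name(tool_output)

# Made-up stand-in for the text that OCR would recover from the last screenshot.
screenshot_text = "Datasets  claude_examples  4 traces"

# Step 2: the extracted name should appear in the screenshot text.
found = dataset_name is not None and dataset_name in screenshot_text
```

Because `\w+` stops at the first character that is not a letter, digit, or underscore, a dataset name containing a hyphen would be cut short by this pattern.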
181184

@@ -188,14 +191,13 @@ For this, we need two things:
188191
</a>
189192
</div>
190193

191-
192194
In this test, we use the agent to create a FastAPI application with an endpoint that counts the number of words in a string.
193195
First, we assert that the agent does not run any bash command that results in a "Permission denied" error.
194196
Then, in the second part, we assert that the agent edits the same file in two different tool calls.
195197

196198
```python
197199
def test_code_agent_fastapi():
198-
trace = run_agent("""use fastapi to create a count_words api that receives a string and counts
200+
trace = run_agent("""use fastapi to create a count_words api that receives a string and counts
199201
the number of words in it, then write a small client that tests it with a couple of different inputs""")
200202

201203
with trace.as_context():
@@ -225,7 +227,7 @@ In the second part, we use `F.map` to get the `file_text` argument from the `str
225227
</div>
226228

227229
In this test, we ask the agent to write a function `compute_fibonacci(n)` that computes the n-th Fibonacci number and test it on a few inputs.
228-
We then assert that executing the code `print(compute_fibonacci(12))` results in the `144` being present in the standard output (note that this assertion requires
230+
We then assert that executing the code `print(compute_fibonacci(12))` results in the `144` being present in the standard output (note that this assertion requires
229231
Docker to be installed).
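
As a rough illustration of what this assertion checks, the sketch below runs a hand-written `compute_fibonacci` in-process and captures its standard output. This simplifies the real test, which executes the agent's own code in Docker; the implementation shown is only one possible answer the agent might produce.

```python
import contextlib
import io


def compute_fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number, with F(1) = F(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# Capture whatever the snippet prints, as a sandboxed run would.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print(compute_fibonacci(12))
stdout_text = buffer.getvalue()

# The check mirrors the test: "144" must appear in the captured output.
output_ok = "144" in stdout_text
```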
230232

231233
```python
Binary file changed (260 KB); no preview available.

docs/testing/Examples/swarm.md

Lines changed: 93 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -2,25 +2,95 @@
22
title: OpenAI Swarm
33
---
44

5-
# Swarm Agents
5+
# Testing Swarm Agents
66

77
<div class="subtitle">
88
Test your OpenAI <code>swarm</code> agents.
99
</div>
1010

11-
OpenAI has introduced [Swarm](https://github.com/openai/swarm), a framework for building and managing multi-agent systems. In this example, we build a capital finder agent that uses tool calling to answer queries about finding the capital of a given country.
11+
OpenAI's [Swarm](https://github.com/openai/swarm) is a powerful framework for building and managing multi-agent systems. In this guide, we will build a capital finder agent that uses tool calling to answer queries about finding the capital of a given country.
12+
13+
## Quickstart
14+
15+
To get started quickly with testing your Swarm agent, you can use the following code snippet. For a more in-depth explanation, please refer to the rest of the guide.
16+
17+
```python
18+
# swarm dependency
19+
from swarm import Agent, Swarm
20+
21+
# for assertions
22+
from invariant.testing import assert_true
23+
from invariant.testing import functional as F
24+
25+
# for swarm testing support
26+
from invariant.testing.wrappers.swarm_wrapper import SwarmWrapper
27+
28+
29+
def get_capital(country_name: str) -> str:
30+
    """Get the capital of a country."""
31+
    pass
32+
33+
34+
agent = Agent(
35+
    name="An agent that helps users learn about the capitals of countries",
36+
    instructions="...",
37+
    functions=[get_capital],
38+
)
39+
40+
41+
def test_agent():
42+
    # prepare test input
43+
    messages = [{"role": "user", "content": "What is the capital of France?"}]
44+
45+
    # prepare swarm
46+
    swarm = Swarm()
47+
    # use Invariant's Swarm wrapper to auto-extract the agent trace
48+
    swarm = SwarmWrapper(swarm)
49+
50+
    # run agent
51+
    response = swarm.run(
52+
        agent=agent,
53+
        messages=messages,
54+
    )
55+
56+
    # make assertions about trace
57+
    trace = SwarmWrapper.to_invariant_trace(response)
58+
59+
    with trace.as_context():
60+
        get_capital_tool_calls = trace.tool_calls(name="get_capital")
61+
62+
        # should have one 'get_capital' tool call
63+
        assert_true(F.len(get_capital_tool_calls) == 1)
64+
65+
```
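
To show what the final assertion in the snippet above amounts to, here is a sketch that needs neither Swarm nor Invariant. It assumes the message history follows the OpenAI chat format with `tool_calls` on assistant messages; the sample messages and the `tool_calls_named` helper are hypothetical and only illustrate counting calls by name.

```python
import json

# Hypothetical message history in the OpenAI chat format.
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_capital",
                    "arguments": json.dumps({"country_name": "France"}),
                },
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "Paris"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]


def tool_calls_named(history, name):
    """Collect every tool call with the given function name."""
    return [
        call
        for message in history
        for call in (message.get("tool_calls") or [])
        if call["function"]["name"] == name
    ]


get_capital_calls = tool_calls_named(messages, "get_capital")

# Equivalent in spirit to asserting that exactly one 'get_capital' call was made.
exactly_one_call = len(get_capital_calls) == 1
```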
66+
67+
Run with
68+
69+
```bash
70+
invariant test test.py
71+
```
72+
73+
When running with `--push` you will also be able to inspect your test results in [Explorer](https://explorer.invariantlabs.ai).
74+
75+
<img src="/testing/Examples/swarm-explorer.png"
76+
alt="OpenAI Swarm agent testing"
77+
style="width: 200% !important;">
78+
79+
<center>OpenAI Swarm agent testing</center>
1280

1381
## Setup
82+
1483
To use `Swarm`, you need to install the corresponding package:
1584

1685
```bash
1786
pip install openai-swarm
1887
```
1988

2089
## Agent code
21-
You can view the agent code [here](https://github.com/invariantlabs-ai/invariant/blob/main/invariant/testing/sample_tests/swarm/capital_finder_agent/capital_finder_agent.py)
2290

23-
This can be invoked as:
91+
You can view the full code of the example agent [here](https://github.com/invariantlabs-ai/invariant/blob/main/invariant/testing/sample_tests/swarm/capital_finder_agent/capital_finder_agent.py).
92+
93+
The agent can be invoked as follows.
2494

2595
```python
2696
from invariant.wrappers.swarm_wrapper import SwarmWrapper
@@ -37,14 +107,14 @@ response = swarm_wrapper.run(
37107
)
38108
```
39109

40-
SwarmWrapper is a lightweight wrapper around the Swarm class. The response of its `run(...)` method includes the current Swarm response along with the history of all messages exchanged.
110+
SwarmWrapper is a lightweight wrapper around the Swarm class. The response of its `run(...)` method includes the current Swarm response along with the history of all messages exchanged. This makes it easy to extract the full trace of the agent's execution.
41111

42112
## Running example tests
43113

44114
You can run the example tests discussed in this notebook by running the following command in the root of the repository:
45115

46116
```bash
47-
poetry run invariant test sample_tests/swarm/capital_finder_agent/test_capital_finder_agent.py --push --dataset_name swarm_capital_finder_agent
117+
invariant test sample_tests/swarm/capital_finder_agent/test_capital_finder_agent.py --push --dataset_name swarm_capital_finder_agent
48118
```
49119

50120
!!! note
@@ -133,4 +203,20 @@ Finally, we verify that the last message does not contain `Madrid`, consistent w
133203

134204
## Conclusion
135205

136-
We have seen how to to write unit tests for specific test cases when building an agent with the Swarm framework.
206+
This guide has walked you through testing an OpenAI Swarm agent using Invariant. We have seen how to write tests for an agent that finds the capital of a country. We have also seen how to use the `SwarmWrapper` to extract the trace of the agent's execution.
207+
208+
If you want to continue exploring, you can read some of the following chapters next.
209+
210+
<div class='tiles'>
211+
212+
<a href="/testing/Writing_Tests/Matchers" class='tile primary'>
213+
<span class='tile-title'>Matchers →</span>
214+
<span class='tile-description'>Learn more about Matchers to write assertions</span>
215+
</a>
216+
217+
<a href="/testing/Writing_Tests/parameterized-tests/" class='tile primary'>
218+
<span class='tile-title'>Parameterized Tests →</span>
219+
<span class='tile-description'>Learn how to parameterize your tests for more robust testing</span>
220+
</a>
221+
222+
</div>
