File: docs/testing/Examples/computer-use.md (12 additions, 10 deletions)
@@ -9,7 +9,7 @@ Test your Computer Use agent with <code>testing</code>
 </div>
 
 Anthropic has recently announced a [Computer Use Agent](https://docs.anthropic.com/en/docs/build-with-claude/computer-use), an AI Agent capable
-of interacting with a computer desktop environment. For this example, we prompt the agent to act as a QA engineer with the knowledge about the documentation of
+of interacting with a computer desktop environment. For this example, we prompt the agent to act as a QA engineer with the knowledge about the documentation of
 the Invariant SDK and the Invariant Explorer UI, and we ask it to perform tasks related to testing the agent.
 
 ## Running the example
@@ -22,7 +22,7 @@ poetry run invariant test sample_tests/demos/computer_use_agent.py --push --data
 
 !!! note
 
-    If you want to run the example without sending the results to the Explorer UI, you can always run without the `--push` flag. You will still see the parts of the trace that fail
+    If you want to run the example without sending the results to the Explorer UI, you can always run without the `--push` flag. You will still see the parts of the trace that fail
-    trace = run_agent("""create an empty dataset "chats-about-food", then use sdk to push 4 different traces
+    trace = run_agent("""create an empty dataset "chats-about-food", then use sdk to push 4 different traces
         to it and then finally use sdk to update the metadata of the dataset to have "weather"="snowy day" and "mood"="great"
         after that go to the UI and verify that there are 4 traces and metadata is good""")
     with trace.as_context():
@@ -154,7 +156,7 @@ Here, we would like to assert that the dataset created using the SDK actually ap
 
 ```python
 def test_anthropic():
-    trace = run_agent("""use https://github.com/anthropics/anthropic-sdk-python to generate some traces and upload them
+    trace = run_agent("""use https://github.com/anthropics/anthropic-sdk-python to generate some traces and upload them
         to the explorer using invariant sdk. your ANTHROPIC_API_KEY is already set up with a valid key""")
     with trace.as_context():
         trace.run_assertions(global_asserts)
@@ -176,6 +178,7 @@ First, we have a simple assertion that checks whether the agent imports `anthrop
 using `contains_any` function.
 
 For this, we need two things:
+
 1. Extract the dataset name from the tool output using a regex: `Dataset: (\w+)`, for instance `dataset_name` is `claude_examples`
 2. We can assert that the dataset name is present in the last screenshot using `ocr_contains` function.
 
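The two steps listed above can be sketched in plain Python. This is only an illustration under assumptions: the sample tool output, the screenshot text, and the `extract_dataset_name` helper are hypothetical, and the screenshot check is simulated with a substring test on already-extracted text rather than with Invariant's `ocr_contains`.

```python
import re
from typing import Optional

# Regex quoted in the text above: captures the word that follows "Dataset: ".
DATASET_PATTERN = re.compile(r"Dataset: (\w+)")


def extract_dataset_name(tool_output: str) -> Optional[str]:
    """Return the first dataset name found in the tool output, or None."""
    match = DATASET_PATTERN.search(tool_output)
    return match.group(1) if match else None


# Hypothetical tool output; real agent output will look different.
sample_output = "Pushed 3 traces.\nDataset: claude_examples\nDone."
dataset_name = extract_dataset_name(sample_output)

# Stand-in for text read from the last screenshot (assumed, not real OCR).
screenshot_text = "Datasets > claude_examples > 3 traces"

# Step 2, simulated: the extracted name should appear in the screenshot text.
name_visible = dataset_name is not None and dataset_name in screenshot_text
print(dataset_name, name_visible)
```

Note that `\w+` stops at the first non-word character, so a name such as `chats-about-food` would be captured only up to `chats`. Widening the character class (for example `[\w-]+`) is one option if dataset names may contain hyphens.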
@@ -188,14 +191,13 @@ For this, we need two things:
 </a>
 </div>
 
-
 In this test, we use the agent to create a FastAPI application with an endpoint that counts the number of words in a string.
 First, we assert that the agent does not run any bash command that results in a "Permission denied" error.
 Then, in the second part, we assert that the agent edits the same file in two different tool calls.
 
 ```python
 def test_code_agent_fastapi():
-    trace = run_agent("""use fastapi to create a count_words api that receives a string and counts
+    trace = run_agent("""use fastapi to create a count_words api that receives a string and counts
         the number of words in it, then write a small client that tests it with a couple of different inputs""")
 
     with trace.as_context():
@@ -225,7 +227,7 @@ In the second part, we use `F.map` to get the `file_text` argument from the `str
 </div>
 
 In this test, we ask the agent to write a function `compute_fibonacci(n)` that computes the n-th Fibonacci number and test it on a few inputs.
-We then assert that executing the code `print(compute_fibonacci(12))` results in the `144` being present in the standard output (note that this assertion requires
+We then assert that executing the code `print(compute_fibonacci(12))` results in the `144` being present in the standard output (note that this assertion requires
File: docs/testing/Examples/swarm.md (93 additions, 7 deletions)
@@ -2,25 +2,95 @@
 title: OpenAI Swarm
 ---
 
-# Swarm Agents
+# Testing Swarm Agents
 
 <div class="subtitle">
 Test your OpenAI <code>swarm</code> agents.
 </div>
 
-OpenAI has introduced [Swarm](https://github.com/openai/swarm), a framework for building and managing multi-agent systems. In this example, we build a capital finder agent that uses tool calling to answer queries about finding the capital of a given country.
+OpenAI's [Swarm](https://github.com/openai/swarm) is a powerful framework for building and managing multi-agent systems. In this guide, we will build a capital finder agent that uses tool calling to answer queries about finding the capital of a given country.
+
+## Quickstart
+
+To get started quickly with testing your Swarm agent, you can use the following code snippet. For a more in-depth explanation, please refer to the rest of the guide.
+
+```python
+# swarm dependency
+from swarm import Agent, Swarm
+
+# for assertions
+from invariant.testing import assert_true
+from invariant.testing import functional as F
+
+# for swarm testing support
+from invariant.testing.wrappers.swarm_wrapper import SwarmWrapper
+
+
+def get_capital(country_name: str) -> str:
+    """Get the capital of a country."""
+    pass
+
+
+agent = Agent(
+    name="An agent that helps users learn about the capitals of countries",
+    instructions="...",
+    functions=[get_capital],
+)
+
+
+def test_agent():
+    # prepare test input
+    messages = [{"role": "user", "content": "What is the capital of France?"}]
+
+    # prepare swarm
+    swarm = Swarm()
+    # use Invariant's Swarm wrapper to auto-extract the agent trace
+When running with `--push` you will also be able to inspect your test results in [Explorer](https://explorer.invariantlabs.ai).
+
+<img src="/testing/Examples/swarm-explorer.png"
+    alt="OpenAI Swarm agent testing"
+    style="width: 200% !important;">
+
+<center>OpenAI Swarm agent testing</center>
 
 ## Setup
+
 To use `Swarm`, you need to install the corresponding package:
 
 ```bash
 pip install openai-swarm
 ```
 
 ## Agent code
-You can view the agent code [here](https://github.com/invariantlabs-ai/invariant/blob/main/invariant/testing/sample_tests/swarm/capital_finder_agent/capital_finder_agent.py)
 
-This can be invoked as:
+You can view the full code of the example agent [here](https://github.com/invariantlabs-ai/invariant/blob/main/invariant/testing/sample_tests/swarm/capital_finder_agent/capital_finder_agent.py)
+
+The agent can be invoked as follows.
 
 ```python
 from invariant.wrappers.swarm_wrapper import SwarmWrapper
-SwarmWrapper is a lightweight wrapper around the Swarm class. The response of its `run(...)` method includes the current Swarm response along with the history of all messages exchanged.
+SwarmWrapper is a lightweight wrapper around the Swarm class. The response of its `run(...)` method includes the current Swarm response along with the history of all messages exchanged. This makes it easy to extract the full trace of the agent's execution.
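
The wrapper pattern described here can be illustrated without Swarm itself. The sketch below is a stand-in under stated assumptions, not the real `SwarmWrapper`: `FakeClient`, `HistoryWrapper`, and the dictionary-shaped response are hypothetical names chosen for illustration. It shows only the idea of returning the latest response together with the accumulated message history.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]
Response = Dict[str, List[Message]]


class FakeClient:
    """Hypothetical stand-in for an agent client; this is not the Swarm API."""

    def run(self, messages: List[Message]) -> Response:
        # Pretend the agent answered the last user message.
        question = messages[-1]["content"]
        reply = {"role": "assistant", "content": f"Echo: {question}"}
        return {"messages": [reply]}


class HistoryWrapper:
    """Wraps a client and keeps every message exchanged so far."""

    def __init__(self, client: FakeClient) -> None:
        self.client = client
        self.history: List[Message] = []

    def run(self, new_messages: List[Message]) -> Tuple[Response, List[Message]]:
        # Callers pass only the new messages for this turn.
        response = self.client.run(new_messages)
        self.history.extend(new_messages)
        self.history.extend(response["messages"])
        # Return the current response plus a copy of the full history.
        return response, list(self.history)


wrapper = HistoryWrapper(FakeClient())
response, history = wrapper.run(
    [{"role": "user", "content": "What is the capital of France?"}]
)
print(history)
```

With the full history returned from every call, a test can inspect the whole exchange rather than only the final reply.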
 
 ## Running example tests
 
 You can run the example tests discussed in this notebook by running the following command in the root of the repository:
 
 ```bash
-poetry run invariant test sample_tests/swarm/capital_finder_agent/test_capital_finder_agent.py --push --dataset_name swarm_capital_finder_agent
+invariant test sample_tests/swarm/capital_finder_agent/test_capital_finder_agent.py --push --dataset_name swarm_capital_finder_agent
 ```
 
 !!! note
@@ -133,4 +203,20 @@ Finally, we verify that the last message does not contain `Madrid`, consistent w
 
 ## Conclusion
 
-We have seen how to to write unit tests for specific test cases when building an agent with the Swarm framework.
+This guide has walked you through testing an OpenAI Swarm agent using Invariant. We have seen how to write tests for an agent that finds the capital of a country. We have also seen how to use the `SwarmWrapper` to extract the trace of the agent's execution.
+
+If you want to continue exploring, you can read some of the following chapters next.