Commit 88dffe0

tweaks
1 parent 3c10d0b commit 88dffe0

6 files changed (+53 additions, -4 deletions)


docs/testing/Writing_Tests/Matchers.md

Lines changed: 4 additions & 2 deletions
```diff
@@ -1,11 +1,13 @@
 # Matchers
 
-<div class='subtitle'>Use matchers for fuzzy and LLM-based checks</div>
+<div class='subtitle'>Test with custom checkers and LLM-based evaluation</div>
 
-Not all agentic behavior can be specified with precise, traditional checking methods. Instead, more often than not, we expect AI models to generalize and thus respond slightly differently to different inputs.
+Not all agentic behavior can be specified with precise, traditional checking methods. Instead, more often than not, we expect AI models to generalize and thus respond slightly differently every time we invoke them.
 
 To accommodate this, `testing` includes several different `Matcher` implementations, that allow you to write tests that rely on fuzzy, similarity-based or property-based conditions.
 
+Beyond that, `Matcher` is also a simple base class that you can subclass to write your own custom matchers if the provided ones do not cover your needs (e.g., checks for custom properties).
+
 ## `IsSimilar`
 
 TODO
```
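The new paragraph in `Matchers.md` describes `Matcher` as a base class for custom checks. The following is a minimal, self-contained sketch of that pattern. It does not use `invariant.testing`: the `Matcher` base class, its `matches` method, and the `MentionsTemperature` property matcher are hypothetical stand-ins, and the library's actual interface may differ.

```python
import re
from abc import ABC, abstractmethod


class Matcher(ABC):
    """Hypothetical stand-in for a matcher base class (NOT the invariant.testing API)."""

    @abstractmethod
    def matches(self, actual: str) -> bool:
        """Return True if `actual` satisfies this matcher's condition."""


class MentionsTemperature(Matcher):
    """Hypothetical property-based matcher: passes if the text contains a
    temperature such as '75°F' or '-3.5 °C'."""

    # Optional sign, digits, optional decimals, optional space, degree sign, F or C.
    _pattern = re.compile(r"-?\d+(?:\.\d+)?\s?°[FC]")

    def matches(self, actual: str) -> bool:
        return self._pattern.search(actual) is not None


matcher = MentionsTemperature()
print(matcher.matches("The weather in Paris is 75°F and sunny."))  # True
print(matcher.matches("It is sunny in Paris."))  # False
```

Keeping each condition in a small class with a single `matches` method makes it reusable across tests and easy to check on its own, independent of any agent run.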
Lines changed: 47 additions & 0 deletions
```diff
@@ -0,0 +1,47 @@
+# Parameterized Tests
+
+<div class='subtitle'>Use parameterized tests to test multiple scenarios</div>
+
+In some cases, an agent capability should generalize to multiple scenarios. For example, a weather agent should be able to answer questions about the weather in different cities.
+
+In `testing`, instead of writing a separate test for each city, you can write a single parameterized test that runs once per scenario. This helps verify that your agent's behavior is robust and generalizes across inputs.
+
+```python
+from invariant.testing import Trace, assert_equals
+import pytest
+
+@pytest.mark.parametrize(
+    ("city",),
+    [
+        ("Paris",),
+        ("London",),
+        ("New York",),
+    ]
+)
+def test_check_weather_in(city: str):
+    # create a Trace object from your agent trajectory
+    trace = Trace(
+        trace=[
+            {"role": "user", "content": f"What is the weather like in {city}?"},
+            {"role": "agent", "content": f"The weather in {city} is 75°F and sunny."},
+        ]
+    )
+
+    # make assertions about the agent's behavior
+    with trace.as_context():
+        # extract the locations mentioned in the agent's response
+        locations = trace.messages()[-1]["content"].extract("locations")
+
+        # assert that the agent responded about the given city
+        assert_equals(
+            1, len(locations), "The agent should respond about one location only"
+        )
+
+        assert_equals(city, locations[0], "The agent should respond about " + city)
+```
+
+### Visualization
+
+When pushing parameterized test results to Explorer (`invariant test --push`), each test instance is listed separately:
+
+<img src="../../assets/parameterized_tests.png"/>
```
108 KB
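To try the parameterization mechanics without an agent or the `invariant` package, the sketch below replaces the `Trace` with a hypothetical stub, `fake_weather_agent`, and the LLM-based `.extract("locations")` call with a naive substring check against a hypothetical `KNOWN_CITIES` list. Only `pytest.mark.parametrize` is real API here; running `pytest` on the file should collect one test instance per city.

```python
import pytest

# Hypothetical list of cities the naive extractor below knows about.
KNOWN_CITIES = ["Paris", "London", "New York", "Berlin"]


def fake_weather_agent(question: str) -> str:
    """Hypothetical stub standing in for a real agent call."""
    for city in KNOWN_CITIES:
        if city in question:
            return f"The weather in {city} is 75°F and sunny."
    return "Sorry, I don't know that city."


def extract_locations(text: str) -> list[str]:
    """Naive stand-in for LLM-based extraction: substring match against KNOWN_CITIES."""
    return [city for city in KNOWN_CITIES if city in text]


@pytest.mark.parametrize("city", ["Paris", "London", "New York"])
def test_check_weather_in(city: str) -> None:
    reply = fake_weather_agent(f"What is the weather like in {city}?")
    locations = extract_locations(reply)

    # Same two checks as the Trace-based test: exactly one location, and it is `city`.
    assert len(locations) == 1, "The agent should respond about one location only"
    assert locations[0] == city, f"The agent should respond about {city}"
```

Passing the argument name as the string `"city"` with plain values is equivalent to the tuple form `("city",)` with one-element tuples used in the commit's example; the string form is simply shorter for a single parameter.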

docs/testing/index.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -73,7 +73,7 @@ ________________________________________________________________________________
 # },
 # ]
 ```
-The test result provides information about which assertion failed but also [localizes the assertion failure precisely](Writing_Tests/1_Traces.ipynb) in the provided list of agent messages.
+The test result provides information about which assertion failed but also [localizes the assertion failure precisely](./Writing_Tests/tests.md) in the provided list of agent messages.
 
 **Visual Test Viewer (Explorer):**
 
@@ -92,7 +92,7 @@ Like the terminal output, the Explorer highlights the relevant ranges, but does
 * Comprehensive [`Trace` API](Writing_Tests/1_Traces.ipynb) for easily navigating and checking agent traces.
 * [Assertions library](Writing_Tests/2_Assertions.md) to check agent behavior, including fuzzy checkers such as _Levenshtein distance_, _semantic similarity_ and _LLM-as-a-judge_ pipelines.
 * Full [`pytest` compatibility](Running_Tests/PyTest_Compatibility.md) for easy integration with existing test and CI/CD pipelines.
-* Parameterized tests for [testing multiple scenarios](Writing_Tests/3_Parameterized_Tests.md) with a single test function.
+* Parameterized tests for [testing multiple scenarios](Writing_Tests/parameterized-tests) with a single test function.
 * [Visual test viewer](Writing_Tests/4_Visual_Test_Viewer.md) for exploring large traces and debugging test failures.
 
 ## Next Steps
````
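The feature list in `index.md` names _Levenshtein distance_ as one of the fuzzy checkers. As a rough illustration of what such a check computes (not the library's implementation), here is a plain-Python edit distance plus a hypothetical `is_close` helper that tolerates small differences; the 20% threshold is an arbitrary assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the current prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def is_close(actual: str, expected: str, max_ratio: float = 0.2) -> bool:
    """Hypothetical fuzzy check: pass if the edit distance is at most
    `max_ratio` times the length of the longer string."""
    longest = max(len(actual), len(expected), 1)
    return levenshtein(actual, expected) / longest <= max_ratio


print(levenshtein("kitten", "sitting"))  # 3
print(is_close("75°F and sunny", "75 °F and sunny"))  # True
```

Normalizing by the longer string's length keeps the tolerance proportional, so one stray space in a long answer passes while a different word in a short answer does not.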
