Merge pull request #2 from invariantlabs-ai/swarm-langgraph-docs

lbeurerkellner · web-flow · commit 40e7a7cc790c · 2024-12-10T18:00:33.000+01:00
Add documentation for Langgraph and Swarm examples from testing.
diff --git a/docs/testing/Examples/langgraph.md b/docs/testing/Examples/langgraph.md
@@ -0,0 +1,168 @@
+---
+title: LangGraph
+---
+
+# LangGraph Agents
+
+<div class="subtitle">
+Write tests for your <code>langgraph</code> applications.
+</div>
+
+LangGraph is a [library](https://github.com/langchain-ai/langgraph) for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. In this example, we build a weather agent that helps us answer queries about the weather by using tool calling.
+
+## Setup
+To use `langgraph`, you need to install the corresponding package:
+
+```bash
+pip install langgraph
+```
+
+## Agent code
+
+You can view the agent code [here](https://github.com/invariantlabs-ai/testing/blob/main/sample_tests/langgraph/weather_agent/weather_agent.py).
+
+This can be invoked as:
+
+```python
+from langchain_core.messages import HumanMessage
+
+from .weather_agent import WeatherAgent
+
+invocation_response = WeatherAgent().get_graph().invoke(
+    {"messages": [HumanMessage(content="what is the weather in sf")]},
+    config={"configurable": {"thread_id": 42}},
+)
+```
+
+
+## Running example tests
+
+You can run the example tests discussed on this page by running the following command in the root of the repository:
+
+```bash
+poetry run invariant test sample_tests/langgraph/weather_agent/test_weather_agent.py --push --dataset_name langgraph_weather_agent
+```
+
+!!! note
+
+    If you want to run the example without sending the results to the Explorer UI, omit the `--push` flag. The failing parts of the trace are still
+    highlighted in the terminal.
+
+## Unit tests
+
+We can now use `testing` to assess the correctness of our agent. We will write two tests to verify different properties of the agent's behavior. Specifically, we want to verify that:
+
+1. The agent can correctly answer a query about the weather in San Francisco.
+
+2. The agent can correctly answer queries when asked about both the weather in San Francisco and New York City.
+
+For this, we will use `TraceFactory` to create traces from the invocation response and then use the corresponding `Trace` methods to examine the resulting runtime traces.
+
+### Test 1: Weather query for San Francisco
+
+<div class='tiles'>
+<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/langgraph_weather_agent-1733695457/t/1" class='tile'>
+    <span class='tile-title'>Open in Explorer →</span>
+    <span class='tile-description'>See this example in the Invariant Explorer</span>
+</a>
+</div>
+
+```python
+def test_weather_agent_with_only_sf(weather_agent):
+    """Test the weather agent with San Francisco."""
+    invocation_response = weather_agent.invoke(
+        {"messages": [HumanMessage(content="what is the weather in sf")]},
+        config={"configurable": {"thread_id": 42}},
+    )
+
+    trace = TraceFactory.from_langgraph(invocation_response)
+
+    with trace.as_context():
+        find_weather_tool_calls = trace.tool_calls(name="_find_weather")
+        assert_true(F.len(find_weather_tool_calls) == 1)
+        assert_true(
+            find_weather_tool_calls[0]["function"]["arguments"].contains(
+                "San francisco"
+            )
+        )
+
+        find_weather_tool_outputs = trace.messages(role="tool")
+        assert_true(F.len(find_weather_tool_outputs) == 1)
+        assert_true(
+            find_weather_tool_outputs[0]["content"].contains("60 degrees and foggy")
+        )
+
+        assert_true(trace.messages(-1)["content"].contains("60 degrees and foggy"))
+```
+
+We first use the `tool_calls()` method to retrieve all tool calls where the name is `_find_weather`, and we assert that there is exactly one such call. We also verify that the argument passed to the tool call includes `San Francisco`.
+
+Next, we use the `messages()` method with the `role="tool"` filter to check the output of the `_find_weather` tool call, ensuring that the content of this output contains our desired answer.
+
+Finally, we confirm that the last message also includes our desired answer.
+
+### Test 2: Weather queries for San Francisco and New York City
+
+<div class='tiles'>
+<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/langgraph_weather_agent-1733695457/t/2" class='tile'>
+    <span class='tile-title'>Open in Explorer →</span>
+    <span class='tile-description'>See this example in the Invariant Explorer</span>
+</a>
+</div>
+
+```python
+def test_weather_agent_with_sf_and_nyc(weather_agent):
+    """Test the weather agent with San Francisco and New York City."""
+    _ = weather_agent.invoke(
+        {"messages": [HumanMessage(content="what is the weather in sf")]},
+        config={"configurable": {"thread_id": 41}},
+    )
+    invocation_response = weather_agent.invoke(
+        {"messages": [HumanMessage(content="what is the weather in nyc")]},
+        config={"configurable": {"thread_id": 41}},
+    )
+
+    trace = TraceFactory.from_langgraph(invocation_response)
+
+    with trace.as_context():
+        find_weather_tool_calls = trace.tool_calls(name="_find_weather")
+        assert_true(len(find_weather_tool_calls) == 2)
+        find_weather_tool_call_args = str(
+            F.map(lambda x: x.argument(), find_weather_tool_calls)
+        )
+        assert_true(
+            "San Francisco" in find_weather_tool_call_args
+            and "New York City" in find_weather_tool_call_args
+        )
+
+        find_weather_tool_outputs = trace.messages(role="tool")
+        assert_true(F.len(find_weather_tool_outputs) == 2)
+        assert_true(
+            find_weather_tool_outputs[0]["content"].contains("60 degrees and foggy")
+        )
+        assert_true(
+            find_weather_tool_outputs[1]["content"].contains("90 degrees and sunny")
+        )
+
+        assistant_response_messages = F.filter(
+            lambda m: m.get("tool_calls") is None, trace.messages(role="assistant")
+        )
+        assert_true(len(assistant_response_messages) == 2)
+        assert_true(
+            assistant_response_messages[0]["content"].contains(
+                "weather in San Francisco is"
+            )
+        )
+        assert_true(
+            assistant_response_messages[1]["content"].contains(
+                "weather in New York City is"
+            )
+        )
+```
+In this test, we invoke the agent twice on the same `thread_id`, so the second response contains the messages of both turns. We use `F.map` to extract the arguments from the list of tool calls, and then assert that both cities appear in the extracted arguments.
+
+There are two types of messages with `role="assistant"`: those where tool calls are made and those corresponding to the final response back to the caller. We use `F.filter` to keep only the messages where `role="assistant"` and `tool_calls` is `None`, which are the final responses. Finally, we assert that these response messages contain the results of the weather queries.
+
+## Conclusion
+
+We have seen how to write unit tests for specific test cases when building an agent with the LangGraph library.
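The selection logic that Test 2 of the LangGraph page describes (collecting tool calls by name, then keeping only assistant messages without tool calls) can be sketched in plain Python. This is an illustration only: it uses built-in lists and dicts rather than `TraceFactory`, `F.map`, or `F.filter`, and the message shape, the `_tool_call` helper, and the `city` argument key are assumptions, not output captured from the real agent.

```python
import json


def _tool_call(city):
    """Build a hypothetical tool call; the "city" key is an assumed argument name."""
    return {
        "type": "function",
        "function": {"name": "_find_weather", "arguments": json.dumps({"city": city})},
    }


# Hand-written stand-in for the messages of a two-turn conversation on one thread.
messages = [
    {"role": "user", "content": "what is the weather in sf"},
    {"role": "assistant", "content": "", "tool_calls": [_tool_call("San Francisco")]},
    {"role": "tool", "content": "60 degrees and foggy"},
    {"role": "assistant", "content": "The weather in San Francisco is 60 degrees and foggy."},
    {"role": "user", "content": "what is the weather in nyc"},
    {"role": "assistant", "content": "", "tool_calls": [_tool_call("New York City")]},
    {"role": "tool", "content": "90 degrees and sunny"},
    {"role": "assistant", "content": "The weather in New York City is 90 degrees and sunny."},
]


def tool_calls_named(msgs, name):
    """Collect every tool call whose function name matches, in message order."""
    return [
        call
        for msg in msgs
        for call in (msg.get("tool_calls") or [])
        if call["function"]["name"] == name
    ]


def final_responses(msgs):
    """Keep assistant messages without tool calls, i.e. the replies to the user."""
    return [
        msg
        for msg in msgs
        if msg["role"] == "assistant" and msg.get("tool_calls") is None
    ]


calls = tool_calls_named(messages, "_find_weather")
call_args = str([call["function"]["arguments"] for call in calls])
replies = final_responses(messages)

# The same properties Test 2 checks, expressed on the stand-in data.
assert len(calls) == 2
assert "San Francisco" in call_args and "New York City" in call_args
assert len(replies) == 2
```

The point of the second helper is that an assistant message carrying `tool_calls` is an intermediate step, so counting all assistant messages would give four here rather than the two final replies.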
diff --git a/docs/testing/Examples/swarm.md b/docs/testing/Examples/swarm.md
@@ -0,0 +1,136 @@
+---
+title: OpenAI Swarm
+---
+
+# Swarm Agents
+
+<div class="subtitle">
+Test your OpenAI <code>swarm</code> agents.
+</div>
+
+OpenAI has introduced [Swarm](https://github.com/openai/swarm), a framework for building and managing multi-agent systems. In this example, we build a capital finder agent that uses tool calling to answer queries about the capital of a given country.
+
+## Setup
+To use `Swarm`, you need to install the corresponding package:
+
+```bash
+pip install openai-swarm
+```
+
+## Agent code
+You can view the agent code [here](https://github.com/invariantlabs-ai/testing/blob/main/sample_tests/swarm/capital_finder_agent/capital_finder_agent.py).
+
+This can be invoked as:
+
+```python
+from invariant.wrappers.swarm_wrapper import SwarmWrapper
+from swarm import Swarm
+
+from .capital_finder_agent import create_agent
+
+swarm_wrapper = SwarmWrapper(Swarm())
+agent = create_agent()
+messages = [{"role": "user", "content": "What is the capital of France?"}]
+response = swarm_wrapper.run(
+    agent=agent,
+    messages=messages,
+)
+```
+
+`SwarmWrapper` is a lightweight wrapper around the `Swarm` class. The response of its `run(...)` method includes the current Swarm response along with the history of all messages exchanged.
+
+## Running example tests
+
+You can run the example tests discussed on this page by running the following command in the root of the repository:
+
+```bash
+poetry run invariant test sample_tests/swarm/capital_finder_agent/test_capital_finder_agent.py --push --dataset_name swarm_capital_finder_agent
+```
+
+!!! note
+
+    If you want to run the example without sending the results to the Explorer UI, omit the `--push` flag. The failing parts of the trace are still
+    highlighted in the terminal.
+
+## Unit tests
+
+We can now use `testing` to assess the correctness of our agent. We will write two tests to verify different properties of the agent's behavior. Specifically, we want to verify that:
+
+1. The agent can correctly answer a query about the capital of France.
+2. The agent correctly handles the case where a capital cannot be determined.
+
+### Test 1: Capital is correctly returned by the Agent
+
+<div class='tiles'>
+<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/swarm_capital_finder_agent-1733695570/t/1" class='tile'>
+    <span class='tile-title'>Open in Explorer →</span>
+    <span class='tile-description'>See this example in the Invariant Explorer</span>
+</a>
+</div>
+
+```python
+def test_capital_finder_agent_when_capital_found(swarm_wrapper):
+    """Test the capital finder agent when the capital is found."""
+    agent = create_agent()
+    messages = [{"role": "user", "content": "What is the capital of France?"}]
+    response = swarm_wrapper.run(
+        agent=agent,
+        messages=messages,
+    )
+    trace = SwarmWrapper.to_invariant_trace(response)
+
+    with trace.as_context():
+        get_capital_tool_calls = trace.tool_calls(name="get_capital")
+        assert_true(F.len(get_capital_tool_calls) == 1)
+        assert_equals(
+            "France", get_capital_tool_calls[0]["function"]["arguments"]["country_name"]
+        )
+
+        assert_true(trace.messages(-1)["content"].contains("paris"))
+```
+
+We first use the `tool_calls()` method to retrieve all tool calls where the name is `get_capital`. Then, we assert that there is exactly one such tool call. We also assert that the argument `country_name` passed to the tool call is `France`. Additionally, we verify that the last message contains `Paris`, our desired answer.
+
+### Test 2: Capital is not found by the Agent
+
+<div class='tiles'>
+<a target="_blank" href="https://explorer.invariantlabs.ai/u/hemang1729/swarm_capital_finder_agent-1733695570/t/2" class='tile'>
+    <span class='tile-title'>Open in Explorer →</span>
+    <span class='tile-description'>See this example in the Invariant Explorer</span>
+</a>
+</div>
+
+```python
+def test_capital_finder_agent_when_capital_not_found(swarm_wrapper):
+    """Test the capital finder agent when the capital is not found."""
+    agent = create_agent()
+    messages = [{"role": "user", "content": "What is the capital of Spain?"}]
+    response = swarm_wrapper.run(
+        agent=agent,
+        messages=messages,
+    )
+    trace = SwarmWrapper.to_invariant_trace(response)
+
+    with trace.as_context():
+        get_capital_tool_calls = trace.tool_calls(name="get_capital")
+        assert_true(F.len(get_capital_tool_calls) == 1)
+        assert_equals(
+            "Spain", get_capital_tool_calls[0]["function"]["arguments"]["country_name"]
+        )
+
+        tool_outputs = trace.tool_outputs(tool_name="get_capital")
+        assert_true(F.len(tool_outputs) == 1)
+        assert_true(tool_outputs[0]["content"].contains("not_found"))
+
+        assert_false(trace.messages(-1)["content"].contains("Madrid"))
+```
+
+We use the `tool_calls()` method to retrieve all calls with the name `get_capital`, asserting that there is exactly one such call and that the argument `country_name` is `Spain`.
+
+Next, we use the `tool_outputs()` method to check the outputs for `get_capital` calls, confirming that the call returned `not_found`, as the agent's local dictionary of country-to-capital mappings does not include `Spain`.
+
+Finally, we verify that the last message does not contain `Madrid`, consistent with the absence of `Spain` in the agent's limited mapping.
+
+## Conclusion
+
+We have seen how to write unit tests for specific test cases when building an agent with the Swarm framework.
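Test 2 of the Swarm page relies on the tool returning `not_found` for a country missing from the agent's local mapping. A minimal sketch of such a tool is shown below. The function name `get_capital` and the parameter `country_name` come from the tests above, but the body and the contents of `CAPITALS` are assumptions for illustration; the actual agent code lives in `capital_finder_agent.py` and may differ.

```python
# Hypothetical mapping: the real dictionary is not shown on the page. The tests
# only tell us that France resolves to Paris and that Spain is absent.
CAPITALS = {"France": "Paris"}

# Sentinel string the tool output is checked against in Test 2.
NOT_FOUND = "not_found"


def get_capital(country_name: str) -> str:
    """Return the capital of a known country, or the sentinel for unknown ones."""
    return CAPITALS.get(country_name, NOT_FOUND)


found = get_capital("France")
missing = get_capital("Spain")

# Mirrors the two cases the tests distinguish.
assert found == "Paris"
assert missing == NOT_FOUND
```

Returning a sentinel string instead of raising keeps the failure visible in the tool output, which is what the `tool_outputs[0]["content"].contains("not_found")` assertion inspects.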