-
Notifications
You must be signed in to change notification settings - Fork 207
Open
Description
I'd like to be able to build a standalone agent using the tau2 domain tools, then evaluate the standalone agent using tau2.
A few changes I can think of that would be involved:
- Separate the tools out into something like an MCP server.
- Remote agent invocation: Similar to Improving A2A Agent Integration for tau2-bench #111, it could be invoked over A2A.
- Offline evaluation: Ability to convert the remote agent's state/messages into a format that tau2 can evaluate after a run.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels