Tau2 Green Agent

A green agent for the Tau2 benchmark on the AgentBeats platform. Evaluates purple agents on customer service tasks across multiple domains (airline, retail, telecom) using simulated users and real tool environments.

How it works

The green agent runs tau2 evaluations via the A2A protocol:

1. Receives an evaluation request with a purple agent URL and config (domain, number of tasks, etc.)
2. For each task, creates a simulated user and orchestrates a multi-turn conversation between the user, the purple agent, and the domain environment (tools, databases, policies)
3. Evaluates whether the purple agent completed the task successfully
4. Returns pass rate, per-task rewards, and timing metrics
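
The final step can be pictured as a small aggregation over per-task results. The sketch below is illustrative only and is not taken from src/agent.py: the `TaskResult` type, the `summarize` helper, the output keys, and the rule that a task passes when its reward reaches 1.0 are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task record; not the green agent's real type."""
    task_id: str
    reward: float   # reward assigned to the task by the evaluation
    seconds: float  # wall-clock time spent on the task


def summarize(results: list[TaskResult], pass_threshold: float = 1.0) -> dict:
    """Aggregate per-task results into pass rate, rewards, and total time.

    Assumption: a task counts as passed when reward >= pass_threshold.
    """
    if not results:
        return {"pass_rate": 0.0, "rewards": {}, "total_seconds": 0.0}
    passed = sum(1 for r in results if r.reward >= pass_threshold)
    return {
        "pass_rate": passed / len(results),
        "rewards": {r.task_id: r.reward for r in results},
        "total_seconds": sum(r.seconds for r in results),
    }


example = summarize([
    TaskResult("0", 1.0, 12.5),
    TaskResult("1", 0.0, 30.0),
    TaskResult("2", 1.0, 7.5),
    TaskResult("3", 1.0, 10.0),
])
print(example["pass_rate"])  # 3 of 4 tasks passed -> 0.75
```

The actual response format returned by the green agent may differ; check src/agent.py for the real field names.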

Project Structure

src/
├─ server.py      # A2A server setup and agent card
├─ executor.py    # A2A request handling
├─ agent.py       # Tau2 evaluation logic and RemoteA2AAgent wrapper
└─ messenger.py   # A2A messaging utilities
amber/
├─ amber-scenario.json5         # Amber scenario (green + purple + gateway)
├─ amber-manifest-green.json5   # Green agent manifest
├─ amber-manifest-purple.json5  # Purple agent manifest
├─ sample.env                   # Environment variable template
└─ README.md                    # Amber compile and run instructions
tests/
└─ test_agent.py  # A2A conformance tests
setup.sh          # Downloads tau2-bench data for local development
test_run.py       # Example evaluation request script
Dockerfile        # Docker image (includes tau2-bench data)

Running Locally

# Clone tau2-bench data
bash setup.sh
export TAU2_DATA_DIR=$PWD/tau2-bench/data

# Install dependencies
uv sync

# Set API key for the UserSimulator LLM
export OPENAI_API_KEY=sk-...
# Or for Gemini:
# export GEMINI_API_KEY=...

# Start the green agent
uv run src/server.py

The server starts on port 9009. You'll need a purple agent running separately (e.g. from agent-template) to send evaluation requests to.

Running with Docker

The Docker image bundles tau2-bench data, so no setup script is needed.

docker build -t tau2-green .
docker run -p 9009:9009 -e OPENAI_API_KEY=sk-... tau2-green

Running with Amber

See amber/README.md for instructions on compiling and running the full scenario (green agent + purple agent + gateway) using the Amber CLI.

Configuration

The following config parameters can be passed in the evaluation request (or via Amber's assessment_config):

Parameter	Required	Default	Description
`domain`	yes	`airline`	`airline`, `retail`, `telecom`, or `mock`
`num_tasks`	no	all	Limit number of tasks to run
`task_ids`	no	all	Specific task IDs to run
`max_steps`	no	`200`	Max orchestrator steps per task
`user_llm`	no	`openai/gpt-4o-mini`	LLM for the UserSimulator (litellm format)
`user_llm_args`	no	`{"temperature": 0.0}`	LLM arguments for the UserSimulator

To run the full benchmark, submit one evaluation per domain.
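
As a rough sketch of what "one evaluation per domain" could look like on the client side, the snippet below builds one config per domain from the parameters in the table above. The parameter names and defaults come from the table; the `build_config` and `build_requests` helpers, the envelope keys `purple_agent_url` and `config`, and the purple agent URL are hypothetical and only serve the illustration. See test_run.py for the request shape the green agent actually expects.

```python
import copy
import json

# Values taken from the configuration table.
ALLOWED_DOMAINS = {"airline", "retail", "telecom", "mock"}
OPTIONAL_KEYS = {"num_tasks", "task_ids", "max_steps", "user_llm", "user_llm_args"}
DEFAULTS = {
    "max_steps": 200,
    "user_llm": "openai/gpt-4o-mini",
    "user_llm_args": {"temperature": 0.0},
}


def build_config(domain: str, **overrides) -> dict:
    """Return a config dict for one domain, applying the documented defaults."""
    if domain not in ALLOWED_DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    unknown = set(overrides) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    # Deep-copy the defaults so callers cannot mutate the shared dict.
    return {"domain": domain, **copy.deepcopy(DEFAULTS), **overrides}


def build_requests(purple_url: str,
                   domains=("airline", "retail", "telecom"),
                   **overrides) -> list[dict]:
    """Build one request body per domain (envelope keys are hypothetical)."""
    return [
        {"purple_agent_url": purple_url, "config": build_config(d, **overrides)}
        for d in domains
    ]


# Hypothetical purple agent address; replace with your own.
requests_to_send = build_requests("http://localhost:9019", num_tasks=5)
print(json.dumps(requests_to_send[0], indent=2))
```

Omitting `num_tasks` and `task_ids` leaves them out of the config entirely, which matches the table's default of running all tasks.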

Testing

uv sync --extra test
uv run pytest --agent-url http://localhost:9009

Publishing

The CI workflow builds, tests, and publishes to GitHub Container Registry on push to main or version tags:

ghcr.io/rdi-foundation/tau2-agentbeats:latest
ghcr.io/rdi-foundation/tau2-agentbeats:1.0.0
