Commit 6ca22fb

committed
readme
1 parent 298112d commit 6ca22fb

1 file changed

+61
-53
lines changed

README.md

Lines changed: 61 additions & 53 deletions
Original file line numberDiff line numberDiff line change
@@ -1,90 +1,98 @@
1-
# A2A Agent Template
1+
# Tau2 Green Agent
22

3-
A minimal template for building [A2A (Agent-to-Agent)](https://a2a-protocol.org/latest/) green agents compatible with the [AgentBeats](https://agentbeats.dev) platform.
3+
A green agent for the [Tau2 benchmark](https://github.com/sierra-research/tau2-bench) on the [AgentBeats](https://agentbeats.dev) platform. Evaluates purple agents on customer service tasks across multiple domains (airline, retail, telecom) using simulated users and real tool environments.
4+
5+
## How it works
6+
7+
The green agent runs tau2 evaluations via the [A2A protocol](https://a2a-protocol.org/latest/):
8+
9+
1. Receives an evaluation request with a purple agent URL and config (domain, number of tasks, etc.)
10+
2. For each task, creates a simulated user and orchestrates a multi-turn conversation between the user, the purple agent, and the domain environment (tools, databases, policies)
11+
3. Evaluates whether the purple agent completed the task successfully
12+
4. Returns pass rate, per-task rewards, and timing metrics
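
The four steps above can be sketched as a small driver loop. This is an illustration only, not the code in `src/agent.py`: `run_evaluation`, `EvalSummary`, the `run_task` callable, and the `pass_threshold` of 1.0 are hypothetical names and assumptions, and the real multi-turn orchestration between the simulated user, the purple agent, and the domain environment is stubbed out behind `run_task`.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalSummary:
    """Hypothetical result container: pass rate, per-task rewards, timings."""
    pass_rate: float
    rewards: dict[str, float]
    seconds: dict[str, float]


def run_evaluation(
    task_ids: list[str],
    run_task: Callable[[str], float],
    pass_threshold: float = 1.0,  # assumption: a task passes at full reward
) -> EvalSummary:
    """Run each task once and aggregate the results.

    `run_task` stands in for steps 2 and 3: it would drive the conversation
    for one task and return that task's reward.
    """
    rewards: dict[str, float] = {}
    seconds: dict[str, float] = {}
    for task_id in task_ids:
        start = time.perf_counter()
        rewards[task_id] = run_task(task_id)
        seconds[task_id] = time.perf_counter() - start

    passed = sum(1 for reward in rewards.values() if reward >= pass_threshold)
    pass_rate = passed / len(rewards) if rewards else 0.0
    return EvalSummary(pass_rate=pass_rate, rewards=rewards, seconds=seconds)


# Usage with a fake task runner: three of four tasks succeed.
fake_rewards = {"t1": 1.0, "t2": 1.0, "t3": 1.0, "t4": 0.0}
summary = run_evaluation(list(fake_rewards), fake_rewards.__getitem__)
print(summary.pass_rate)
```

Keeping the per-task runner behind a plain callable makes the aggregation easy to test without a live purple agent.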
413

514
## Project Structure
615

716
```
817
src/
9-
├─ server.py # Server setup and agent card configuration
18+
├─ server.py # A2A server setup and agent card
1019
├─ executor.py # A2A request handling
11-
├─ agent.py # Your agent implementation goes here
20+
├─ agent.py # Tau2 evaluation logic and RemoteA2AAgent wrapper
1221
└─ messenger.py # A2A messaging utilities
22+
amber/
23+
├─ amber-scenario.json5 # Amber scenario (green + purple + gateway)
24+
├─ amber-manifest-green.json5 # Green agent manifest
25+
├─ amber-manifest-purple.json5 # Purple agent manifest
26+
├─ sample.env # Environment variable template
27+
└─ README.md # Amber compile and run instructions
1328
tests/
14-
└─ test_agent.py # Agent tests
15-
Dockerfile # Docker configuration
16-
pyproject.toml # Python dependencies
17-
amber-manifest.json5 # Amber manifest
18-
.github/
19-
└─ workflows/
20-
└─ test-and-publish.yml # CI workflow
29+
└─ test_agent.py # A2A conformance tests
30+
setup.sh # Downloads tau2-bench data for local development
31+
test_run.py # Example evaluation request script
32+
Dockerfile # Docker image (includes tau2-bench data)
2133
```
2234

23-
## Getting Started
24-
25-
1. **Create your repository** - Click "Use this template" to create your own repository from this template
26-
27-
2. **Implement your agent** - Add your agent logic to [`src/agent.py`](src/agent.py)
28-
29-
3. **Configure your agent card** - Fill in your agent's metadata (name, skills, description) in [`src/server.py`](src/server.py)
30-
31-
4. **Fill out your [Amber](https://github.com/RDI-Foundation/amber) manifest** - Update [`amber-manifest.json5`](amber-manifest.json5) to use your agent in Amber scenarios
32-
33-
5. **Write your tests** - Add custom tests for your agent in [`tests/test_agent.py`](tests/test_agent.py)
34-
35-
For a concrete example of implementing a green agent using this template, see this [draft PR](https://github.com/RDI-Foundation/green-agent-template/pull/3).
36-
3735
## Running Locally
3836

3937
```bash
38+
# Clone tau2-bench data
39+
bash setup.sh
40+
export TAU2_DATA_DIR=$PWD/tau2-bench/data
41+
4042
# Install dependencies
4143
uv sync
4244

43-
# Run the server
45+
# Set API key for the UserSimulator LLM
46+
export OPENAI_API_KEY=sk-...
47+
# Or for Gemini:
48+
# export GEMINI_API_KEY=...
49+
50+
# Start the green agent
4451
uv run src/server.py
4552
```
4653

54+
The server starts on port 9009. To run an evaluation, you also need a purple agent running separately (e.g. from [agent-template](https://github.com/RDI-Foundation/agent-template)); the evaluation request points the green agent at that purple agent's URL.
55+
4756
## Running with Docker
4857

49-
```bash
50-
# Build the image
51-
docker build -t my-agent .
58+
The Docker image bundles tau2-bench data, so no setup script is needed.
5259

53-
# Run the container
54-
docker run -p 9009:9009 my-agent
60+
```bash
61+
docker build -t tau2-green .
62+
docker run -p 8081:8081 -e OPENAI_API_KEY=sk-... tau2-green
5563
```
5664

57-
## Testing
65+
## Running with Amber
5866

59-
Run A2A conformance tests against your agent.
67+
See [amber/README.md](amber/README.md) for instructions on compiling and running the full scenario (green agent + purple agent + gateway) using the Amber CLI.
6068

61-
```bash
62-
# Install test dependencies
63-
uv sync --extra test
69+
## Configuration
70+
71+
The following config parameters can be passed in the evaluation request (or via Amber's `assessment_config`):
6472

65-
# Start your agent (uv or docker; see above)
73+
| Parameter | Required | Default | Description |
74+
|-----------|----------|---------|-------------|
75+
| `domain` | yes | `airline` | `airline`, `retail`, `telecom`, or `mock` |
76+
| `num_tasks` | no | all | Limit number of tasks to run |
77+
| `task_ids` | no | all | Specific task IDs to run |
78+
| `max_steps` | no | `200` | Max orchestrator steps per task |
79+
| `user_llm` | no | `openai/gpt-4o-mini` | LLM for the UserSimulator (litellm format) |
80+
| `user_llm_args` | no | `{"temperature": 0.0}` | LLM arguments for the UserSimulator |
6681

67-
# Run tests against your running agent URL
82+
To run the full benchmark, submit one evaluation per domain.
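
As a rough sketch of how the parameters in the table might be assembled, the helper below builds one config dictionary per domain. `build_config` and `VALID_DOMAINS` are hypothetical names, not part of this repository, and the defaults are copied from the table. The surrounding request envelope (for example, how the purple agent URL is attached) is not specified here, so it is left out.

```python
from typing import Any, Optional

# Domain names as listed in the Configuration table.
VALID_DOMAINS = ("airline", "retail", "telecom", "mock")


def build_config(
    domain: str,
    num_tasks: Optional[int] = None,
    task_ids: Optional[list[str]] = None,
    max_steps: int = 200,
    user_llm: str = "openai/gpt-4o-mini",
    user_llm_args: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a config dict using the parameter names from the table."""
    if domain not in VALID_DOMAINS:
        raise ValueError(f"unknown domain {domain!r}; expected one of {VALID_DOMAINS}")

    config: dict[str, Any] = {
        "domain": domain,
        "max_steps": max_steps,
        "user_llm": user_llm,
        "user_llm_args": {"temperature": 0.0} if user_llm_args is None else user_llm_args,
    }
    # Omit the optional filters entirely so the "all tasks" default applies.
    if num_tasks is not None:
        config["num_tasks"] = num_tasks
    if task_ids is not None:
        config["task_ids"] = list(task_ids)
    return config


# One config per domain, matching "submit one evaluation per domain".
full_benchmark = [build_config(d) for d in ("airline", "retail", "telecom")]
```

Leaving `num_tasks` and `task_ids` out of the dictionary, rather than sending `null`, is a design choice of this sketch; whether the agent accepts explicit nulls is not stated in this README.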
83+
84+
## Testing
85+
86+
```bash
87+
uv sync --extra test
6888
uv run pytest --agent-url http://localhost:9009
6989
```
7090

7191
## Publishing
7292

73-
The repository includes a GitHub Actions workflow that automatically builds, tests, and publishes a Docker image of your agent to GitHub Container Registry.
74-
75-
If your agent needs API keys or other secrets, add them in Settings → Secrets and variables → Actions → Repository secrets. They'll be available as environment variables during CI tests.
93+
The CI workflow builds, tests, and publishes a Docker image to GitHub Container Registry on each push to `main` (tagged `latest`) and on each version tag (e.g. `v1.0.0` is published as `1.0.0`):
7694

77-
- **Push to `main`** → publishes `latest` tag:
7895
```
79-
ghcr.io/<your-username>/<your-repo-name>:latest
80-
```
81-
82-
- **Create a git tag** (e.g. `git tag v1.0.0 && git push origin v1.0.0`) → publishes version tags:
96+
ghcr.io/rdi-foundation/tau2-agentbeats:latest
97+
ghcr.io/rdi-foundation/tau2-agentbeats:1.0.0
8398
```
84-
ghcr.io/<your-username>/<your-repo-name>:1.0.0
85-
ghcr.io/<your-username>/<your-repo-name>:1
86-
```
87-
88-
Once the workflow completes, find your Docker image in the Packages section (right sidebar of your repository). Configure the package visibility in package settings.
89-
90-
> **Note:** Organization repositories may need package write permissions enabled manually (Settings → Actions → General). Version tags must follow [semantic versioning](https://semver.org/) (e.g., `v1.0.0`).
