
Commit 5797239

update green agent id
1 parent 6a05d55 commit 5797239

2 files changed: +41 additions, -12 deletions

README.md

Lines changed: 40 additions & 11 deletions
@@ -1,26 +1,55 @@
 # NetArena Agentbeats Leaderboard
 
-This repository hosts the leaderboard for the NetArena, a set of dynamically generated benchmarks designed to evaluate AI agents on network operation tasks. There are three green agents corresponding to each of the main task categories: data center querying/planning (`malt`), troubleshooting routing configuration (`route`), and K8s networking policy (`k8s`).
+Benchmarks for evaluating AI agents on network operation tasks:
+- **malt** — Data center querying/planning
+- **route** — Routing configuration troubleshooting
+- **k8s** — K8s networking policy
 
-## Making a submission
+## Making a Submission
 
-**Prerequisites**: Your purple agent must support text completions.
+**Prerequisite**: Your purple agent must support text completions.
 
-To make a submission, fork this repo and enable workflows under the Actions tab. Then modify the scenario file of the green agent you wish to submit against with the appropriate configuration and push your changes.
+### Step 1: Fork & Enable Workflows
 
-- Data Center Planning (`malt`): `malt_scenario.toml`
-- Routing Configuration (`route`): `route_scenario.toml`
-- K8s Configuration (`k8s`): `k8s_scenario.toml`
+1. Fork this repo
+2. Go to your fork → **Actions** tab → click **"I understand my workflows, go ahead and enable them"**
 
-Details on benchmark-specific configuration are in the provided scenario files. Once your scenario TOML is pushed, the assessment runs automatically through a GitHub workflow and opens a PR on the main repo with the final evaluation results. When your results are merged, your submission is included on the leaderboard.
+### Step 2: Add Secrets
 
-### Secrets
+Go to **Settings → Secrets and variables → Actions → New repository secret**
+
+Add your LLM API keys (e.g., `OPENAI_API_KEY`, `AZURE_API_KEY`)
+
+### Step 3: Edit Scenario File
+
+Choose your benchmark:
+
+| Benchmark | File |
+|-----------|------|
+| Data Center Planning | `malt_scenario.toml` |
+| Routing Configuration | `route_scenario.toml` |
+| K8s Configuration | `k8s_scenario.toml` |
+
+Edit the `[[participants]]` section:
 
-Secrets can be set as environment variables via GitHub secrets. To expose them to your purple agent, use `${GITHUB_SECRET_NAME}` syntax within the corresponding `scenario.toml` like so:
 ```toml
-env = {API_KEY = "${GITHUB_SECRET_NAME}"}
+[[participants]]
+agentbeats_id = "your-agent-id-here"
+name = "routing_operator"
+env = { AZURE_API_KEY = "${AZURE_API_KEY}", AZURE_API_BASE = "${AZURE_API_BASE}" }
 ```
 
+Reference secrets using `${SECRET_NAME}` syntax — they'll be injected as environment variables.
+
+### Step 4: Push
+
+```bash
+git add route_scenario.toml
+git commit -m "Submit routing benchmark"
+git push
+```
+
+The workflow triggers automatically and opens a PR with your results.
+
 ## Scoring
 
 For each benchmark, agents are primarily evaluated on the following metrics:
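The `${SECRET_NAME}` substitution described in the README's Secrets section can be sketched as follows. This is a minimal illustration only, assuming the workflow exports GitHub secrets as environment variables and expands the placeholders before launching the purple agent; the function name and regex here are hypothetical, not the repo's actual implementation:

```python
import os
import re

def expand_env_placeholders(text: str) -> str:
    """Replace ${NAME} placeholders with values from the environment.

    Hypothetical sketch of how a CI workflow could inject GitHub
    secrets (exported as environment variables) into a scenario TOML.
    """
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        value = os.environ.get(name)
        if value is None:
            raise KeyError(f"secret {name!r} is not set in the environment")
        return value

    # Match ${IDENTIFIER} and substitute the environment value
    return re.sub(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", lookup, text)

# Example: expanding an env line like the one in the scenario file
os.environ["AZURE_API_KEY"] = "sk-example"  # stands in for a GitHub secret
line = 'env = { AZURE_API_KEY = "${AZURE_API_KEY}" }'
print(expand_env_placeholders(line))
# → env = { AZURE_API_KEY = "sk-example" }
```

Raising on a missing variable (rather than substituting an empty string) surfaces a forgotten repository secret as an early, explicit failure instead of a confusing authentication error later in the run.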

route_scenario.toml

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [green_agent]
-agentbeats_id = "019ba4c5-f257-7e70-8f4c-43562373532b"
+agentbeats_id = "019ba8d8-c1d1-7923-b6c7-c5020e1c6cbe"
 env = { LOG_LEVEL = "INFO" } # trigger
 
 [[participants]]
