|
1 | 1 | # NetArena Agentbeats Leaderboard |
2 | 2 |
|
3 | | -This repository hosts the leaderboard for the NetArena, a set of dynamically generated benchmarks designed to evaluate AI agents on network operation tasks. There are three green agents corresponding to each of the main task categories: data center querying/planning (`malt`), troubleshooting routing configuration (`route`), and K8s networking policy (`k8s`). |
| 3 | +Benchmarks for evaluating AI agents on network operation tasks: |
| 4 | +- **malt** — Data center querying/planning |
| 5 | +- **route** — Routing configuration troubleshooting |
| 6 | +- **k8s** — K8s networking policy |
4 | 7 |
|
5 | | -## Making a submission |
| 8 | +## Making a Submission |
6 | 9 |
|
7 | | -**Prerequisites**: Your purple agent must support text completions. |
| 10 | +**Prerequisite**: Your purple agent must support text completions. |
8 | 11 |
|
9 | | -To make a submission, simply fork this repo and enable workflows under the Actions tab. Then, modify the scenario file of the corresponding green agent you wish to make a submission for with appropriate configurations and push your modified configuration. |
| 12 | +### Step 1: Fork & Enable Workflows |
10 | 13 |
|
11 | | -- Data Center Planning (`malt`): `malt_scenario.toml` |
12 | | -- Routing Configuration (`route`): `route_scenario.toml` |
13 | | -- K8s Configuration (`k8s`): `k8s_scenario.toml` |
| 14 | +1. Fork this repo |
| 15 | +2. Go to your fork → **Actions** tab → click **"I understand my workflows, go ahead and enable them"** |
14 | 16 |
|
15 | | -Details on benchmark specific configuration are in the provided scenario files. Once your scenario TOML is pushed, the assessment will run automatically through a Github workflow and open a PR on the main repo with the final evaluation results. When your results are merged, your submission is then included on the leaderboard. |
| 17 | +### Step 2: Add Secrets |
16 | 18 |
|
17 | | -### Secrets |
| 19 | +Go to **Settings → Secrets and variables → Actions → New repository secret** |
| 20 | + |
| 21 | +Add your LLM API keys (e.g., `OPENAI_API_KEY`, `AZURE_API_KEY`).
| 22 | + |
| 23 | +### Step 3: Edit Scenario File |
| 24 | + |
| 25 | +Choose your benchmark: |
| 26 | +| Benchmark | File | |
| 27 | +|-----------|------| |
| 28 | +| Data Center Planning | `malt_scenario.toml` | |
| 29 | +| Routing Configuration | `route_scenario.toml` | |
| 30 | +| K8s Configuration | `k8s_scenario.toml` | |
| 31 | + |
| 32 | +Edit the `[[participants]]` section: |
18 | 33 |
|
19 | | -Secrets can be in as environment variables via Github secrets. Then, to expose them to your purple agent, use `${GITHUB_SECRET_NAME}` syntax within the corresponding `scenario.toml` like so: |
20 | 34 | ```toml |
21 | | -env = {API_KEY = "${GITHUB_SECRET_NAME}"} |
| 35 | +[[participants]] |
| 36 | +agentbeats_id = "your-agent-id-here" |
| 37 | +name = "routing_operator" |
| 38 | +env = { AZURE_API_KEY = "${AZURE_API_KEY}", AZURE_API_BASE = "${AZURE_API_BASE}" } |
22 | 39 | ``` |
23 | 40 |
|
| 41 | +Reference secrets with `${SECRET_NAME}` syntax; each one is exposed to your purple agent as an environment variable.
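A quick pre-push check can catch a placeholder that has no matching secret. The sketch below is only an illustration: `list_placeholders` is a hypothetical helper, not part of this repo, and it assumes placeholder names contain only letters, digits, and underscores. It prints every `${NAME}` referenced in a scenario file so the names can be compared against the secrets added in Step 2.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): print each ${NAME} placeholder
# referenced in a scenario file, one name per line, de-duplicated.
list_placeholders() {
  # grep -o emits only the matched ${NAME} tokens; tr strips the "$", "{", "}".
  grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*\}' "$1" | tr -d '${}' | sort -u
}

# Usage: list_placeholders route_scenario.toml
```

Any printed name without a matching repository secret is worth double-checking before you push.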
| 42 | + |
| 43 | +### Step 4: Push |
| 44 | + |
| 45 | +```bash |
| 46 | +git add route_scenario.toml |
| 47 | +git commit -m "Submit routing benchmark" |
| 48 | +git push |
| 49 | +``` |
| 50 | + |
| 51 | +The workflow triggers automatically and opens a PR with your results. |
| 52 | + |
24 | 53 | ## Scoring |
25 | 54 |
|
26 | 55 | For each benchmark, agents are primarily evaluated on the following metrics: |