2929</video >
3030
3131AgentLab is a framework for developing and evaluating agents on a variety of
32- [ benchmarks] ( #🎯 -supported-benchmarks ) supported by
32+ [ benchmarks] ( #-supported-benchmarks ) supported by
3333[ BrowserGym] ( https://github.com/ServiceNow/BrowserGym ) .
3434
3535AgentLab Features:
36- * Easy large scale parallel [ agent experiments] ( #🚀 -launch-experiments ) using [ ray] ( https://www.ray.io/ )
36+ * Easy large scale parallel [ agent experiments] ( #-launch-experiments ) using [ ray] ( https://www.ray.io/ )
3737* Building blocks for making agents over BrowserGym
3838* Unified LLM API for OpenRouter, OpenAI, Azure, or self-hosted models using TGI.
3939* Preferred way for running benchmarks like WebArena
4040* Various [ reproducibility features] ( #reproducibility-features )
4141* Unified LeaderBoard (soon)
4242
4343## 🎯 Supported Benchmarks
44+
4445| Benchmark | Setup <br > Link | # Task <br > Template| Seed <br > Diversity | Max <br > Step | Multi-tab | Hosted Method | BrowserGym <br > Leaderboard |
4546| -----------| ------------| ---------| ----------------| -----------| -----------| ---------------| ----------------------|
4647| [ WebArena] ( https://webarena.dev/ ) | [ setup] ( https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/webarena/README.md ) | 812 | None | 30 | yes | self hosted (docker) | soon |
@@ -53,14 +54,15 @@ AgentLab Features:
5354| [ GAIA] ( https://huggingface.co/spaces/gaia-benchmark/leaderboard ) (soon) | - | - | None | - | - | live web | soon |
5455| [ Mind2Web-live] ( https://huggingface.co/datasets/iMeanAI/Mind2Web-Live ) (soon) | - | - | None | - | - | live web | soon |
5556| [ MiniWoB] ( https://miniwob.farama.org/index.html ) | [ setup] ( https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/miniwob/README.md ) | 125 | Medium | 10 | no | self hosted (static files) | soon |
56- ## 🛠️ Setup agentlab
57+
58+ ## 🛠️ Setup AgentLab
5759
5860``` bash
5961pip install agentlab
6062```
6163
6264Make sure to prepare the required benchmark according to instructions provided in the [ setup
63- column] ( #🎯 -supported-benchmarks ) .
65+ column] ( #-supported-benchmarks ) .
6466
6567``` bash
6668export AGENTLAB_EXP_ROOT=< root directory of experiment results> # defaults to $HOME/agentlab_results
@@ -85,6 +87,7 @@ export AZURE_OPENAI_ENDPOINT=<your endpoint> # if using azure models
8587</details >
8688
8789## 🤖 UI-Assistant
90+
8891Use an assistant to work for you (at your own cost and risk).
8992
9093``` bash