55/> </a >
66
77&nbsp;&nbsp;|&nbsp;&nbsp;
8- [🎯 Benchmarks](#🎯-supported-benchmarks) &nbsp;&nbsp;|&nbsp;&nbsp;
9- [🛠️ Setup](#🛠️-setup-agentlab) &nbsp;&nbsp;|&nbsp;&nbsp;
8+ [🎯 Benchmarks](#supported-benchmarks) &nbsp;&nbsp;|&nbsp;&nbsp;
9+ [🛠️ Setup](#setup-agentlab) &nbsp;&nbsp;|&nbsp;&nbsp;
1010[🤖 Assistant](#ui-assistant) &nbsp;&nbsp;|&nbsp;&nbsp;
11- [🚀 Launch Experiments](#🚀-launch-experiments) &nbsp;&nbsp;|&nbsp;&nbsp;
12- [🔍 Analyse Results](#🔍-analyse-results) &nbsp;&nbsp;|&nbsp;&nbsp;
11+ [🚀 Launch Experiments](#launch-experiments) &nbsp;&nbsp;|&nbsp;&nbsp;
12+ [🔍 Analyse Results](#analyse-results) &nbsp;&nbsp;|&nbsp;&nbsp;
1313[🤖 Make Your Own Agent](#implement-a-new-agent) &nbsp;&nbsp;|&nbsp;&nbsp;
14- [↻ Reproducibility](#↻-reproducibility) &nbsp;&nbsp;|&nbsp;&nbsp;
14+ [↻ Reproducibility](#reproducibility) &nbsp;&nbsp;|&nbsp;&nbsp;
1515
1616[![PyPI - License](https://img.shields.io/pypi/l/agentlab?style=flat-square)]([https://opensource.org/licenses/MIT](http://www.apache.org/licenses/LICENSE-2.0))
1717[![PyPI - Downloads](https://img.shields.io/pypi/dm/agentlab?style=flat-square)](https://pypistats.org/packages/agentlab)
@@ -36,7 +36,7 @@ AgentLab Features:
3636* Various [reproducibility features](#reproducibility-features)
3737* Unified LeaderBoard (soon)
3838
39- ## 🎯 Supported Benchmarks
39+ ## Supported Benchmarks
4040| Benchmark | Setup <br> Link | # Task <br> Template | Seed <br> Diversity | Max <br> Step | Multi-tab | Hosted Method | BrowserGym <br> Leaderboard |
4141|-----------|------------|---------|----------------|-----------|-----------|---------------|----------------------|
4242| [WebArena](https://webarena.dev/) | [setup](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/webarena/README.md) | 812 | None | 30 | yes | self hosted (docker) | soon |
@@ -49,7 +49,7 @@ AgentLab Features:
4949| [GAIA](https://huggingface.co/spaces/gaia-benchmark/leaderboard) (soon) | - | - | None | - | - | live web | soon |
5050| [Mind2Web-live](https://huggingface.co/datasets/iMeanAI/Mind2Web-Live) (soon) | - | - | None | - | - | live web | soon |
5151| [MiniWoB](https://miniwob.farama.org/index.html) | [setup](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/miniwob/README.md) | 125 | Medium | 10 | no | self hosted (static files) | soon |
52- ## 🛠️ Setup agentlab
52+ ## Setup agentlab
5353
5454```bash
5555pip install agentlab
@@ -93,7 +93,7 @@ Try your own agent:
9393agentlab-assistant --agent_config="module.path.to.your.AgentArgs"
9494```
9595
96- ## 🚀 Launch experiments
96+ ## Launch experiments
9797
9898```python
9999# Import your agent configuration extending bgym.AgentArgs class
@@ -160,7 +160,7 @@ each agent. AgentLab currently does not support evaluations across multiple inst
160160either create a quick script to handle this or submit a PR to AgentLab. For a smoother parallel
161161experience, consider using benchmarks like WorkArena instead.
162162
163- ## 🔍 Analyse Results
163+ ## Analyse Results
164164
165165### Loading Results
166166
@@ -211,7 +211,7 @@ For a better integration with the tools, make sure to implement most functions i
211211If you think your agent should be included directly in AgentLab, let us know and it can be added in
212212agentlab/agents/ with the name of your agent.
213213
214- ## ↻ Reproducibility
214+ ## Reproducibility
215215Several factors can influence reproducibility of results in the context of evaluating agents on
216216dynamic benchmarks.
217217