
Commit 71ec578

recursix, gasse and TLSDC authored
Refine ux again (#150)
* yet another way to kill timedout jobs
* Improve timeout handling in task polling logic
* Add method to override max_steps in Study class
* add support for tab visibility in observation flags and update related components
* fix tests
* black
* Improve timeout handling in task polling logic
* yet another way to kill timedout jobs (#108)
* Add method to override max_steps in Study class
* add support for tab visibility in observation flags and update related components
* fix tests
* black
* black
* Fix sorting bug. improve directory content retrieval with summary statistics
* fix test
* black
* tmp
* add error report, add cum cost to summary and ray backend by default
* displaying exp names in ray dashboard (#123)
* displaying exp names in ray dashboard
* fixing tests
* enabling chat o_0 (#124)
* sequential studies
* little bug
* more flexible requirement
* imrove readme
* Enhance agent configuration and logging in study setup
  - Updated `get_vision_agent` to append "_vision" to agent names.
  - Modified `BaseMessage.__str__` to include a no-warning option for logging.
  - Improved `make_study` function to accept single agent args and benchmark types.
  - Added detailed docstrings for better clarity on parameters and functionality.
  - Introduced `avg_step_timeout` and `demo_mode` attributes in the Study class.
* get_text was added by mistake
* Update README and Jupyter notebook with improved documentation and result analysis instructions
* Update requirements to include Jupyter support for black
* Update README.md
* Fix formatting and improve clarity in README.md
* Update README.md to enhance visuals and improve navigation
* Add badges to README.md for PyPI, GitHub stars, and CI status
* Add video demonstration to AgentXray section in README.md
* test video
* xray video test
* Update AgentXray section in README.md with new asset link
* minor
* fix setup link ... again
* remove upper case letter before getting the benchmark
* minor
* Update ReproducibilityAgent link in README.md for better accessibility

---------

Co-authored-by: Maxime Gasse <[email protected]>
Co-authored-by: Thibault LSDC <[email protected]>
1 parent 6db6878 commit 71ec578


2 files changed, +32 -23 lines changed


README.md

Lines changed: 30 additions & 21 deletions
@@ -1,5 +1,6 @@
 
 
+
 <a href="https://github.com/user-attachments/assets/c2bc0b80-89da-4afb-9120-2feb018df19d"> <img
 src="https://github.com/user-attachments/assets/c2bc0b80-89da-4afb-9120-2feb018df19d" width="800"
 /> </a>
@@ -13,30 +14,40 @@
 [🤖 Make Your Own Agent](#-implement-a-new-agent) &nbsp;&nbsp;|&nbsp;&nbsp;
 [↻ Reproducibility](#-reproducibility) &nbsp;&nbsp;|&nbsp;&nbsp;
 
+[![pypi](https://badge.fury.io/py/agentlab.svg)](https://pypi.org/project/agentlab/)
 [![PyPI - License](https://img.shields.io/pypi/l/agentlab?style=flat-square)]([https://opensource.org/licenses/MIT](http://www.apache.org/licenses/LICENSE-2.0))
 [![PyPI - Downloads](https://img.shields.io/pypi/dm/agentlab?style=flat-square)](https://pypistats.org/packages/agentlab)
 [![GitHub star chart](https://img.shields.io/github/stars/ServiceNow/AgentLab?style=flat-square)](https://star-history.com/#ServiceNow/AgentLab)
+[![Code Format](https://github.com/ServiceNow/AgentLab/actions/workflows/code_format.yml/badge.svg)](https://github.com/ServiceNow/AgentLab/actions/workflows/code_format.yml)
+[![Tests](https://github.com/ServiceNow/AgentLab/actions/workflows/unit_tests.yml/badge.svg)](https://github.com/ServiceNow/AgentLab/actions/workflows/unit_tests.yml)
+
+
+[🛠️ Setup](#%EF%B8%8F-setup-agentlab) &nbsp;&nbsp;|&nbsp;&nbsp;
+[🤖 Assistant](#-ui-assistant) &nbsp;&nbsp;|&nbsp;&nbsp;
+[🚀 Launch Experiments](#-launch-experiments) &nbsp;&nbsp;|&nbsp;&nbsp;
+[🔍 Analyse Results](#-analyse-results) &nbsp;&nbsp;|&nbsp;&nbsp;
+&nbsp;&nbsp;|&nbsp;&nbsp;
+[🤖 Build Your Agent](#-implement-a-new-agent) &nbsp;&nbsp;|&nbsp;&nbsp;
+[↻ Reproducibility](#-reproducibility)
 
 
-<video controls style="max-width: 700px;">
-<source src="https://github.com/ServiceNow/BrowserGym/assets/26232819/e0bfc788-cc8e-44f1-b8c3-0d1114108b85" type="video/mp4">
-Your browser does not support the video tag.
-</video>
 
+https://github.com/ServiceNow/BrowserGym/assets/26232819/e0bfc788-cc8e-44f1-b8c3-0d1114108b85
 
 AgentLab is a framework for developing and evaluating agents on a variety of
-[benchmarks](#🎯-supported-benchmarks) supported by
+[benchmarks](#-supported-benchmarks) supported by
 [BrowserGym](https://github.com/ServiceNow/BrowserGym).
 
 AgentLab Features:
-* Easy large scale parallel [agent experiments](#🚀-launch-experiments) using [ray](https://www.ray.io/)
+* Easy large scale parallel [agent experiments](#-launch-experiments) using [ray](https://www.ray.io/)
 * Building blocks for making agents over BrowserGym
 * Unified LLM API for OpenRouter, OpenAI, Azure, or self hosted using TGI.
 * Prefered way for running benchmarks like WebArena
 * Various [reproducibility features](#reproducibility-features)
 * Unified LeaderBoard (soon)
 
 ## 🎯 Supported Benchmarks
+
 | Benchmark | Setup <br> Link | # Task <br> Template| Seed <br> Diversity | Max <br> Step | Multi-tab | Hosted Method | BrowserGym <br> Leaderboard |
 |-----------|------------|---------|----------------|-----------|-----------|---------------|----------------------|
 | [WebArena](https://webarena.dev/) | [setup](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/webarena/README.md) | 812 | None | 30 | yes | self hosted (docker) | soon |
@@ -45,10 +56,11 @@ AgentLab Features:
 | [WorkArena](https://github.com/ServiceNow/WorkArena) L3 | [setup](https://github.com/ServiceNow/WorkArena?tab=readme-ov-file#getting-started) | 341 | High | 50 | no | demo instance | soon |
 | [WebLinx](https://mcgill-nlp.github.io/weblinx/) | - | 31586 | None | 1 | no | self hosted (dataset) | soon |
 | [VisualWebArena](https://github.com/web-arena-x/visualwebarena) | [setup](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/visualwebarena/README.md) | 910 | None | 30 | yes | self hosted (docker) | soon |
-| [Assistant Bench](https://assistantbench.github.io/) | [setup](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/assistantbench/README.md) | 214 | None | 30 | yes | live web | soon |
+| [AssistantBench](https://assistantbench.github.io/) | [setup](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/assistantbench/README.md) | 214 | None | 30 | yes | live web | soon |
 | [GAIA](https://huggingface.co/spaces/gaia-benchmark/leaderboard) (soon) | - | - | None | - | - | live web | soon |
 | [Mind2Web-live](https://huggingface.co/datasets/iMeanAI/Mind2Web-Live) (soon) | - | - | None | - | - | live web | soon |
 | [MiniWoB](https://miniwob.farama.org/index.html) | [setup](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/miniwob/README.md) | 125 | Medium | 10 | no | self hosted (static files) | soon |
+
 ## 🛠️ Setup
 
 ```bash
@@ -61,7 +73,7 @@ playwright install
 ```
 
 Make sure to prepare the required benchmark according to instructions provided in the [setup
-column](#🎯-supported-benchmarks).
+column](#-supported-benchmarks).
 
 ```bash
 export AGENTLAB_EXP_ROOT=<root directory of experiment results> # defaults to $HOME/agentlab_results
@@ -86,6 +98,7 @@ export AZURE_OPENAI_ENDPOINT=<your endpoint> # if using azure models
 </details>
 
 ## 🤖 UI-Assistant
+
 Use an assistant to work for you (at your own cost and risk).
 
 ```bash
@@ -178,23 +191,15 @@ result_df = inspect_results.load_result_df("path/to/your/study")
 
 
 ### AgentXray
-Inspect the behaviour of your agent using xray. You can load previous or ongoing experiments. The refresh mechanism is currently a bit clunky, but you can refresh the page, refresh the experiment directory list and select again your experiment to see an updated version of your currently running experiments.
 
+https://github.com/user-attachments/assets/06c4dac0-b78f-45b7-9405-003da4af6b37
 
+In a terminal, execute:
 ```bash
 agentlab-xray
 ```
 
-**⚠️ Note**: Gradio is still in developement and unexpected behavior have been frequently noticed. Version 5.5 seems to work properly so far. If you're not sure that the proper information is displaying, refresh the page and select your experiment again.
-
-
-<video controls style="max-width: 800px;">
-<source src="https://github.com/user-attachments/assets/06c4dac0-b78f-45b7-9405-003da4af6b37" type="video/mp4">
-Your browser does not support the video tag.
-</video>
-
-
-You will be able to select the recent experiments in the directory `AGENTLAB_EXP_ROOT` and visualize
+You can load previous or ongoing experiments in the directory `AGENTLAB_EXP_ROOT` and visualize
 the results in a gradio interface.
 
 In the following order, select:
@@ -206,14 +211,18 @@ In the following order, select:
 Once this is selected, you can see the trace of your agent on the given task. Click on the profiling
 image to select a step and observe the action taken by the agent.
 
+
+**⚠️ Note**: Gradio is still in developement and unexpected behavior have been frequently noticed. Version 5.5 seems to work properly so far. If you're not sure that the proper information is displaying, refresh the page and select your experiment again.
+
+
 ## 🤖 Implement a new Agent
 
 Get inspiration from the `MostBasicAgent` in
 [agentlab/agents/most_basic_agent/most_basic_agent.py](src/agentlab/agents/most_basic_agent/most_basic_agent.py).
 For a better integration with the tools, make sure to implement most functions in the
 [AgentArgs](src/agentlab/agents/agent_args.py#L5) API and the extended `bgym.AbstractAgentArgs`.
 
-If you think your agent should be included directly in AgenLab, let use know and it can be added in
+If you think your agent should be included directly in AgenLab, let us know and it can be added in
 agentlab/agents/ with the name of your agent.
 
 ## ↻ Reproducibility
@@ -243,7 +252,7 @@ dynamic benchmarks.
 * **Reproduced results in the leaderboard**. For agents that are repdocudibile, we encourage users
 to try to reproduce the results and upload them to the leaderboard. There is a special column
 containing information about all reproduced results of an agent on a benchmark.
-* **ReproducibilityAgent**: You can run this agent on an existing study and it will try to re-run
+* **ReproducibilityAgent**: [You can run this agent](src/agentlab/agents/generic_agent/reproducibility_agent.py) on an existing study and it will try to re-run
 the same actions on the same task seeds. A vsiual diff of the two prompts will be displayed in the
 AgentInfo HTML tab of AgentXray. You will be able to inspect on some tasks what kind of changes
 between to two executions. **Note**: this is a beta feature and will need some adaptation for your
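Several README hunks only change Markdown anchor targets, for example `#🎯-supported-benchmarks` becoming `#-supported-benchmarks`, because GitHub drops emoji when it turns a heading into an anchor id. The sketch below approximates that rule so such links can be checked offline. It is an assumption modelled on the behaviour of the `github-slugger` package (lowercase, keep letters, digits, combining marks, `-`, `_` and spaces, then replace spaces with `-`), not GitHub's actual implementation, and it ignores the `-1`, `-2` suffixes added for duplicate headings. The helper names `github_style_slug` and `anchor_href` are hypothetical.

```python
import unicodedata
from urllib.parse import quote


def github_style_slug(heading: str) -> str:
    """Approximate the anchor id GitHub derives from a Markdown heading (assumption)."""
    kept = []
    for ch in heading.lower():
        category = unicodedata.category(ch)  # e.g. "Lu", "Nd", "Mn", "So"
        # Keep letters (L*), numbers (N*), combining marks (M*), '-', '_' and spaces.
        # Emoji (category "So") and punctuation (categories "P*") are dropped.
        if ch in "-_ " or category[0] in ("L", "N", "M"):
            kept.append(ch)
    return "".join(kept).replace(" ", "-")


def anchor_href(heading: str) -> str:
    """Build a '#...' link target, percent-encoding any non-ASCII characters left over."""
    return "#" + quote(github_style_slug(heading))


if __name__ == "__main__":
    headings = [
        "🎯 Supported Benchmarks",
        "🚀 Launch Experiments",
        "↻ Reproducibility",
        "\U0001F6E0\uFE0F Setup",  # the wrench emoji followed by variation selector U+FE0F
    ]
    for title in headings:
        print(f"{title!r:32} -> {anchor_href(title)}")
```

Under this approximation, `🛠️ Setup` maps to `#%EF%B8%8F-setup`: the emoji itself is dropped but its invisible variation selector (U+FE0F, a combining mark) survives, which would explain the `%EF%B8%8F` prefix in the Setup link. By the same reasoning, the target `#%EF%B8%8F-setup-agentlab` would presumably only resolve against a heading such as `🛠️ Setup AgentLab`.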

src/agentlab/experiments/study.py

Lines changed: 2 additions & 2 deletions
@@ -76,7 +76,7 @@ def make_study(
         agent_args = [agent_args]
 
     if isinstance(benchmark, str):
-        benchmark = bgym.DEFAULT_BENCHMARKS[benchmark]()
+        benchmark = bgym.DEFAULT_BENCHMARKS[benchmark.lower()]()
 
     if "webarena" in benchmark.name and len(agent_args) > 1:
         logger.warning(
@@ -220,7 +220,7 @@ def __post_init__(self):
         """Initialize the study. Set the uuid, and generate the exp_args_list."""
         self.uuid = uuid.uuid4()
         if isinstance(self.benchmark, str):
-            self.benchmark = bgym.DEFAULT_BENCHMARKS[self.benchmark]()
+            self.benchmark = bgym.DEFAULT_BENCHMARKS[self.benchmark.lower()]()
         if isinstance(self.dir, str):
             self.dir = Path(self.dir)
         self.make_exp_args_list()
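Both `study.py` hunks make the same change: a benchmark given as a string is lowercased before it is used as a key into `bgym.DEFAULT_BENCHMARKS`, so a name typed as `"MiniWoB"` resolves like `"miniwob"`. The self-contained sketch below reproduces that pattern with a stand-in registry. `REGISTRY`, `FakeBenchmark` and `resolve_benchmark` are hypothetical illustrative names, not part of AgentLab or BrowserGym, and the registry keys are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class FakeBenchmark:
    """Hypothetical stand-in for a BrowserGym benchmark object."""

    name: str


# Hypothetical registry mapping lowercase names to benchmark factories,
# playing the role that bgym.DEFAULT_BENCHMARKS plays in the diff.
REGISTRY: Dict[str, Callable[[], FakeBenchmark]] = {
    "miniwob": lambda: FakeBenchmark(name="miniwob"),
    "webarena": lambda: FakeBenchmark(name="webarena"),
}


def resolve_benchmark(benchmark: Union[str, FakeBenchmark]) -> FakeBenchmark:
    """Return a benchmark object, looking string names up case-insensitively."""
    if isinstance(benchmark, str):
        # Normalise the user-supplied name before the lookup, as the commit does.
        return REGISTRY[benchmark.lower()]()
    # Already a benchmark object: pass it through unchanged.
    return benchmark


if __name__ == "__main__":
    print(resolve_benchmark("MiniWoB").name)
    print(resolve_benchmark("WebArena").name)
```

Lowercasing the query only helps if every key in the registry is itself lowercase. An unknown name still raises `KeyError`, which keeps typos loud instead of silently falling back to a default benchmark.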
