-* Easy large scale parallel agent experiments using [ray](https://www.ray.io/)
-* Building blocks for making agents
-* Unified LLM api for OpenRouter, OpenAI, Azure, or self hosted using TGI.
+* Easy large scale parallel [agent experiments](#🚀-launch-experiments) using [ray](https://www.ray.io/)
+* Building blocks for making agents over BrowserGym
+* Unified LLM API for OpenRouter, OpenAI, Azure, or self hosted using TGI.
 * Preferred way for running benchmarks like WebArena
-* Various Reproducibility features
+* Various [reproducibility features](#reproducibility-features)
 * Unified LeaderBoard (soon)
 
 ## 🎯 Supported Benchmarks
@@ -131,8 +136,7 @@ experiments.
 ### Debugging
 
 For debugging, run experiments with `n_jobs=1` and use VSCode's debug mode. This allows you to pause
-execution at breakpoints. To prevent the debugger from stopping on errors while running multiple
-experiments in VSCode, set `enable_debug = False` in `ExpArgs`.
+execution at breakpoints.
 
 ### About Parallel Jobs
 
@@ -155,14 +159,28 @@ each agent. AgentLab currently does not support evaluations across multiple inst
 either create a quick script to handle this or submit a PR to AgentLab. For a smoother parallel
 experience, consider using benchmarks like WorkArena instead.
 
+## 🔍 Analyse Results
+
+### Loading Results
+
+The class [`ExpResult`](https://github.com/ServiceNow/BrowserGym/blob/da26a5849d99d9a3169d7b1fde79f909c55c9ba7/browsergym/experiments/src/browsergym/experiments/loop.py#L595) provides a lazy loader for all the information of a specific experiment. You can use [`yield_all_exp_results`](https://github.com/ServiceNow/BrowserGym/blob/da26a5849d99d9a3169d7b1fde79f909c55c9ba7/browsergym/experiments/src/browsergym/experiments/loop.py#L872) to recursively find all results in a directory. Finally, [`load_result_df`](https://github.com/ServiceNow/AgentLab/blob/be1998c5fad5bda47ba50497ec3899aae03e85ec/src/agentlab/analyze/inspect_results.py#L119C5-L119C19) gathers all the summary information in a single dataframe. See [`inspect_results.ipynb`](src/agentlab/analyze/inspect_results.ipynb) for example usage.
+Inspect the behaviour of your agent using xray. You can load previous or ongoing experiments. The refresh mechanism is currently a bit clunky, but you can refresh the page, refresh the experiment directory list, and select your experiment again to see an updated view of your currently running experiments.
 
-## 🔍 AgentXray
-While your experiments are running, you can inspect the results using:
 
 ```bash
 agentlab-xray
 ```
 
+**⚠️ Note**: Gradio is still in development and unexpected behavior has frequently been observed. Version 5.5 seems to work properly so far. If you're not sure that the proper information is displayed, refresh the page and select your experiment again.
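
The "Loading Results" text added in this change describes three pieces: a lazy per-experiment loader, a recursive search over a results directory, and a step that gathers summaries into one table. The sketch below illustrates that pattern using only the Python standard library. It is not the BrowserGym or AgentLab implementation: `LazyExpResult`, `yield_results`, `gather_summaries`, the `summary_info.json` file name, and the `cum_reward` key are all assumptions made for illustration.

```python
import json
import tempfile
from functools import cached_property
from pathlib import Path


class LazyExpResult:
    """Hypothetical stand-in for a lazy per-experiment loader (not the BrowserGym class)."""

    def __init__(self, exp_dir):
        self.exp_dir = Path(exp_dir)

    @cached_property
    def summary_info(self):
        # Read from disk only on first access; the value is then cached on the instance.
        with open(self.exp_dir / "summary_info.json") as f:
            return json.load(f)


def yield_results(root):
    """Recursively yield a LazyExpResult for every directory holding a summary file."""
    for summary_path in sorted(Path(root).rglob("summary_info.json")):
        yield LazyExpResult(summary_path.parent)


def gather_summaries(root):
    """Collect every summary into a list of flat records, one per experiment."""
    return [
        {"exp_dir": r.exp_dir.relative_to(root).as_posix(), **r.summary_info}
        for r in yield_results(root)
    ]


def demo():
    """Build a tiny fake results tree (assumed layout) and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, reward in [("study/exp_a", 1.0), ("study/exp_b", 0.0)]:
            exp_dir = root / name
            exp_dir.mkdir(parents=True)
            (exp_dir / "summary_info.json").write_text(json.dumps({"cum_reward": reward}))
        return gather_summaries(root)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Deferring the file read until `summary_info` is accessed keeps directory scans cheap, since only the experiments that are actually inspected are loaded from disk. For the real API, refer to the linked `ExpResult`, `yield_all_exp_results`, and `load_result_df` sources and the `inspect_results.ipynb` notebook.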