
Commit 3e4b72e

Update README and experiment script for clarity and consistency
1 parent 450105d commit 3e4b72e


2 files changed: +10 -7 lines changed


tutorials/2_eval_on_miniwob/experiment.py

Lines changed: 1 addition & 1 deletion
@@ -11,8 +11,8 @@
 from agentlab.benchmarks.setup_benchmark import ensure_benchmark
 from agentlab.experiments.study import Study
 
-project_dir = Path(__file__).parents[2]
 # This ensures MiniWob assets are downloaded and sets the MINIWOB_URL .env in the project dir.
+project_dir = Path(__file__).parents[2]
 ensure_benchmark("miniwob", project_root=project_dir)
 load_dotenv(project_dir.joinpath(".env"), override=False) # load .env variables
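
Review note: the hunk above only moves the `project_dir` assignment below the comment, so behavior is unchanged. For readers unsure what `Path(__file__).parents[2]` resolves to, here is a minimal sketch. The checkout location `/home/user/AgentLab` is a hypothetical path chosen for illustration, and `PurePosixPath` is used so the snippet runs the same way on any OS.

```python
from pathlib import PurePosixPath

# Hypothetical checkout location, used only for illustration.
script = PurePosixPath(
    "/home/user/AgentLab/tutorials/2_eval_on_miniwob/experiment.py"
)

# parents[0] is the directory containing the script,
# parents[1] is one level above that, and so on.
project_dir = script.parents[2]

# The script loads its .env file from the project directory.
env_file = project_dir.joinpath(".env")

print(project_dir)  # /home/user/AgentLab
print(env_file)     # /home/user/AgentLab/.env
```

With `override=False`, `load_dotenv` leaves variables that are already set in the environment untouched, so a `MINIWOB_URL` exported in the shell would take precedence over the value written to `.env`.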

Lines changed: 9 additions & 6 deletions
@@ -1,6 +1,6 @@
 ### Launch tutorial experiment
 * Activate the python env `source .venv/bin/activate`
-* Run the following command from **AgentLab** directory `python tutorials/2_eval_on_miniwob/experiment.py`.
+* Run the following command from **AgentLab** directory [`python tutorials/2_eval_on_miniwob/experiment.py`](./experiment.py).
 * This should launch your agent on 4 miniwob tasks in parallel and save results to `$HOME/agentlab_results`.


@@ -10,17 +10,20 @@
 * Navigate around XRay to visualize the agent's behavior

 ### (optional) Load your results in a notebook
-Optionally load the results in `inspect_results.ipynb`.
+Optionally load the results in [inspect_results.ipynb](./inspect_results.ipynb).
 This will help understand the content of results

 ### Change the experiment
-go back to `experiment.py` and uncomment the line
+go back to [experiment.py](./experiment.py) and uncomment the lines
 ```python
 benchmark = DEFAULT_BENCHMARKS["miniwob"]() # 125 tasks
-benchmark = benchmark.subset_from_glob(column="task_name", glob="*enter*")
+benchmark = benchmark.subset_from_glob(column="task_name", glob="*enter*") # filter only 7 tasks
 ```
 This will change the benchmark from `"miniwob_tiny_test"` to all the `"*enter*"` tasks in the full miniwob.

-Run the experiment again. This will launch on 7 tasks with 5 seeds each.
+Run the experiment again. This will launch on 7 tasks with 5 seeds each (35 experiments)

-Finally refresh XRay to be able to load the new experiment.
+Finally refresh XRay to be able to load the new experiment.
+
+### Exercise 1
+What are the failures and why?
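
Review note: the README change states that the `"*enter*"` glob keeps 7 tasks and that 5 seeds each gives 35 experiments. The sketch below illustrates that kind of filtering and counting with the standard-library `fnmatch` module. It is not the implementation of `subset_from_glob`, which may behave differently; the helper `subset_by_glob` and the short task list are hypothetical stand-ins, not the real MiniWob task list.

```python
from fnmatch import fnmatchcase

# Hypothetical, shortened task list used only for illustration.
task_names = [
    "miniwob.click-button",
    "miniwob.enter-text",
    "miniwob.enter-date",
    "miniwob.use-slider",
]


def subset_by_glob(names, pattern):
    """Keep only the names matching a shell-style glob (hypothetical helper)."""
    return [name for name in names if fnmatchcase(name, pattern)]


selected = subset_by_glob(task_names, "*enter*")

# One run per (task, seed) pair, mirroring "N tasks with 5 seeds each".
n_seeds = 5
runs = [(name, seed) for name in selected for seed in range(n_seeds)]

print(selected)   # ['miniwob.enter-text', 'miniwob.enter-date']
print(len(runs))  # 10, i.e. 2 tasks x 5 seeds
```

The same arithmetic applied to the figures in the README gives 7 tasks x 5 seeds = 35 experiments.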
