2 files changed: +10 -7 lines changed
tutorials/2_eval_on_miniwob

  from agentlab.benchmarks.setup_benchmark import ensure_benchmark
  from agentlab.experiments.study import Study

- project_dir = Path(__file__).parents[2]
  # This ensures MiniWob assets are downloaded and sets the MINIWOB_URL .env in the project dir.
+ project_dir = Path(__file__).parents[2]
  ensure_benchmark("miniwob", project_root=project_dir)
  load_dotenv(project_dir.joinpath(".env"), override=False)  # load .env variables

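Note on `parents[2]`: the sketch below shows how that index resolves to the project root. It assumes the changed Python file is `tutorials/2_eval_on_miniwob/experiment.py` (the script the README refers to) and uses a hypothetical checkout path, `/home/user/AgentLab`, with `PurePosixPath` so it runs the same way on any OS.

```python
from pathlib import PurePosixPath

# Hypothetical checkout location; only tutorials/2_eval_on_miniwob comes from the diff.
script = PurePosixPath("/home/user/AgentLab/tutorials/2_eval_on_miniwob/experiment.py")

# parents[0] is the script's folder, parents[1] is tutorials/, parents[2] is the repo root.
for depth in range(3):
    print(depth, script.parents[depth])

project_dir = script.parents[2]
env_file = project_dir.joinpath(".env")  # the file later passed to load_dotenv
print(env_file)
```

If the script were moved one directory deeper or shallower, the index would have to change with it.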
  ### Launch tutorial experiment
  * Activate the python env `source .venv/bin/activate`
- * Run the following command from **AgentLab** directory `python tutorials/2_eval_on_miniwob/experiment.py`.
+ * Run the following command from **AgentLab** directory [`python tutorials/2_eval_on_miniwob/experiment.py`](./experiment.py).
  * This should launch your agent on 4 miniwob tasks in parallel and save results to `$HOME/agentlab_results`.


  * Navigate around XRay to visualize the agent's behavior

  ### (optional) Load your results in a notebook
- Optionally load the results in `inspect_results.ipynb`.
+ Optionally load the results in [inspect_results.ipynb](./inspect_results.ipynb).
  This will help understand the content of results

  ### Change the experiment
- go back to `experiment.py` and uncomment the line
+ go back to [experiment.py](./experiment.py) and uncomment the lines
  ```python
  benchmark = DEFAULT_BENCHMARKS["miniwob"]()  # 125 tasks
- benchmark = benchmark.subset_from_glob(column="task_name", glob="*enter*")
+ benchmark = benchmark.subset_from_glob(column="task_name", glob="*enter*")  # filter only 7 tasks
  ```
  This will change the benchmark from `"miniwob_tiny_test"` to all the `"*enter*"` tasks in the full miniwob.

- Run the experiment again. This will launch on 7 tasks with 5 seeds each.
+ Run the experiment again. This will launch on 7 tasks with 5 seeds each (35 experiments)

- Finally refresh XRay to be able to load the new experiment.
+ Finally refresh XRay to be able to load the new experiment.
+
+ ### Exercise 1
+ What are the failures and why?
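
Note on the "Change the experiment" section: the sketch below illustrates how a `"*enter*"` glob narrows a task list and how the number of experiments follows from tasks times seeds. It is not AgentLab's implementation. The task names are hypothetical, and it assumes `subset_from_glob` uses shell-style wildcards, imitated here with the standard library's `fnmatchcase`.

```python
from fnmatch import fnmatchcase
from itertools import product

# Hypothetical task names, for illustration only (not the real MiniWoB list).
task_names = [
    "miniwob.enter-text",
    "miniwob.click-button",
    "miniwob.enter-date",
    "miniwob.use-slider",
]

# Assumption: the glob behaves like a shell-style wildcard pattern.
selected = [name for name in task_names if fnmatchcase(name, "*enter*")]
print(selected)

# One experiment per (task, seed) pair.
seeds = range(5)
runs = list(product(selected, seeds))
print(len(runs))  # 2 tasks x 5 seeds = 10 in this toy list

# With the README's figures: 7 tasks x 5 seeds.
print(7 * 5)
```

The same pairing explains the README's figure: 7 matching tasks with 5 seeds each give 35 experiments.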