Commit f0e4275
WebArena Verified (#377)
* init commit for webarena verified
* upd Makefile
* adding the basic files
* update dependencies
* start adding integration with wa_verified
* upd readme
* use custom backend for webarena_verified
* pass the wa instance to the evaluator
* pass the wa instance to the evaluator
* cleanup evaluator
* remove custom webarena verified instance
* update requirements to latest wav code
* use simpler and cleaner wav eval
* enable tracing
* fix wav
* update to new webarena verified version
* update task name template to webarena_verified.templateID.taskID
* fix config
* fix csv file
* add webarena_verified backend
* fix wav tasks
* do not check reachable if url is todo
* fix tmp trace creation, update goal to prompt model to satisfy wav return format
* create webarena_verified action space with special submit function to match the benchmark expected agent response format
* look for extra header file path in environment variable
* undo special action set for webarena_verified
* remove wav actions
* load extra context headers for webarena(+lite)
* update README
* update requirements
* update makefile and readme
* update readme
* update requirements
* update readme
* update test
* black formatter
* upd makefile
* update to new webarena_verified dataset version
* small debug
* add massage of shopping_admin tasks
* assume all endpoints are running
* update to latest version before the public release
* update instructions to fetch latest version before the public release
* exponential backoff
* update README
* compare json with the one in the library
* update install instructions
* update makefile
* update pypi deployment with webarena-verified
* fix assets directory
* fix task id template
* remove task json file, use the one from the webarena-verified library. Update task template to include revision number
* remove metadata and create it dynamically
* do not hardcode revision number
* fix
* run black formatter
* fix format?
* always create the metadata file
* version-bump-dev
* Remove git dependency and add instructions to install from source
* version-bump-dev 0.14.3.dev3
* add webarena-verified package as a dependency
* version-bump-dev 0.14.3.dev4
* add webarena-verified in the dev requirements.txt
* update gitignore
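
One of the entries above sets the task name template to `webarena_verified.templateID.taskID`. As a rough illustration of that naming scheme only, the sketch below splits such a name into its parts. The helper `parse_task_name` and the `TaskName` type are hypothetical and not part of this commit, the IDs are assumed to be integers, and later entries mention adding a revision number to the template, which this sketch does not cover.

```python
from typing import NamedTuple


class TaskName(NamedTuple):
    """Hypothetical container for the parts of a task name."""

    benchmark: str
    template_id: int
    task_id: int


def parse_task_name(name: str) -> TaskName:
    """Split 'webarena_verified.<templateID>.<taskID>' into its parts.

    Assumption: both IDs are integers. The commit message does not say so.
    """
    parts = name.split(".")
    if len(parts) != 3 or parts[0] != "webarena_verified":
        raise ValueError(f"unexpected task name: {name!r}")
    return TaskName(parts[0], int(parts[1]), int(parts[2]))


if __name__ == "__main__":
    print(parse_task_name("webarena_verified.12.345"))
```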
---------
Co-authored-by: Nicolas Gontier <nicolas.gontier@servicenow.com>
Co-authored-by: Aman Jaiswal <amanjaiswal73892@gmail.com>
1 parent 2fe88fd commit f0e4275
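
The commit message lists "exponential backoff" without further detail. For readers unfamiliar with the technique, here is a minimal generic sketch of retrying a call with exponentially growing delays. It is not the code from this commit: the function `retry_with_backoff`, its parameters, and the jitter amount are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, doubling the wait after each failure.

    Hypothetical helper: the name, defaults, and jitter are illustrative.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt and is capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Up to 10% random jitter so concurrent clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry schedule can be checked in tests without actually waiting.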
File tree
32 files changed, +1512 -29 lines changed
- .github/workflows
- browsergym
- assistantbench
- src/browsergym/assistantbench/evaluation/evaluate_utils
- core/src/browsergym/core
- experiments
- src/browsergym/experiments
- benchmark
- metadata
- miniwob
- visualwebarena
- webarena_verified
- src/browsergym/webarena_verified
- webarenalite
- src/browsergym/webarenalite
- webarena
- src/browsergym/webarena
- dev
- docs/src
- tests/experiments