Collaborator

amanjaiswal73892 commented Sep 9, 2025

Add WebArena Lite from VisualAgentBench


This pull request introduces the new "WebArena Lite" benchmark to the BrowserGym project. It adds a new package (browsergym/webarenalite) with its own configuration, tasks, and dependencies, and integrates this benchmark into the experiment and environment selection logic. The key changes are grouped below:

WebArena Lite Benchmark Integration

  • Added a new editable package, browsergym/webarenalite, to the project installation process in the Makefile, ensuring it is installed with other BrowserGym components.
  • Registered the new "webarena_lite" benchmark in the experiment configuration, allowing it to be selected and used like existing benchmarks.
  • Updated environment selection logic to import browsergym.webarenalite when a task name starts with "webarena", ensuring tasks are correctly registered and available.
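A minimal sketch of prefix-based lazy importing, as described in the last bullet. The prefix table and the helper names (`PREFIX_TO_MODULE`, `module_for_task`, `ensure_registered`) are hypothetical and the actual check in BrowserGym may be written differently; the longest-prefix rule is an assumption added so that a shorter prefix such as "webarena" does not shadow a longer one.

```python
import importlib

# Hypothetical prefix -> module table; the real prefixes live in the BrowserGym code.
PREFIX_TO_MODULE = {
    "webarena_lite": "browsergym.webarenalite",
    "webarena": "browsergym.webarena",
}


def module_for_task(task_name):
    """Return the module name for a task, preferring the longest matching prefix."""
    matches = [prefix for prefix in PREFIX_TO_MODULE if task_name.startswith(prefix)]
    if not matches:
        return None
    return PREFIX_TO_MODULE[max(matches, key=len)]


def ensure_registered(task_name):
    """Import the benchmark package so its tasks register themselves on import."""
    module_name = module_for_task(task_name)
    if module_name is not None:
        importlib.import_module(module_name)
```

Importing lazily keeps benchmark-specific dependencies optional: a package is only loaded when one of its tasks is actually requested.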

New Package: browsergym/webarenalite

  • Added the package definition (pyproject.toml) and dependencies (requirements.txt) for browsergym-webarenalite, specifying its requirements and build configuration.
  • Implemented the initialization logic in __init__.py to register all WebArena Lite tasks using task IDs from the new config, and ensured necessary NLTK resources are available at runtime.
  • Defined the full list of task IDs for WebArena Lite in config.py, providing a central reference for all tasks included in the benchmark.
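The registration pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: the task IDs are made up, the plain-dict `registry` stands in for the real environment registration, and the `webarena_lite.<id>` naming is hypothetical. The NLTK helper uses `nltk.data.find` and `nltk.download`, and is only defined here, not called.

```python
# Stand-in for config.py; the real list is defined in the PR, these IDs are made up.
TASK_IDS = [7, 33, 44]


def register_lite_tasks(task_ids, registry):
    """Register one entry per task ID (a dict stands in for the real registry)."""
    for task_id in task_ids:
        env_id = f"webarena_lite.{task_id}"  # hypothetical naming scheme
        registry[env_id] = {"task_id": task_id}
    return registry


def ensure_nltk_resource(resource_path, package):
    """Download an NLTK resource only if it is not already available locally."""
    import nltk  # imported lazily so the sketch loads without NLTK installed

    try:
        nltk.data.find(resource_path)
    except LookupError:
        nltk.download(package, quiet=True)


registry = register_lite_tasks(TASK_IDS, {})
```

A call such as `ensure_nltk_resource("tokenizers/punkt", "punkt")` would fetch the tokenizer data on first use; which resources the package actually needs is not stated in this description.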

Benchmark Metadata

  • Added a comprehensive metadata CSV file (webarenalite.csv) listing all tasks, their properties, and dependencies for the WebArena Lite benchmark.
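As a rough illustration of how such a metadata file could be consumed, the sketch below parses a made-up sample with the standard `csv` module. The column names (`task_name`, `task_id`, `depends_on`) and the rows are assumptions for the example and may not match the real header of webarenalite.csv.

```python
import csv
import io

# Made-up sample; column names are assumptions, not the real file's header.
SAMPLE_CSV = """task_name,task_id,depends_on
webarena_lite.7,7,
webarena_lite.33,33,webarena_lite.7
"""


def load_metadata(text):
    """Parse metadata rows into a dict keyed by task name."""
    rows = csv.DictReader(io.StringIO(text))
    return {row["task_name"]: row for row in rows}


def dependencies(metadata, task_name):
    """Return the task names this task depends on (empty list if none)."""
    raw = metadata[task_name]["depends_on"]
    return raw.split() if raw else []


metadata = load_metadata(SAMPLE_CSV)
```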

Description by Korbit AI

What change is being made?

Add WebArena Lite as a new benchmark to the BrowserGym project, including configuration, setup, and evaluation logic for tasks, as well as the necessary dependencies.

Why are these changes being made?

The addition of WebArena Lite expands the benchmarks available within the BrowserGym ecosystem, enabling testing of lightweight tasks that require simpler environments and possibly fewer computational resources. This new benchmark aims to provide broader test coverage and flexibility for benchmarking web-based AI models. The integration includes configuration files, task registration, and evaluation logic tailored for the WebArena Lite environment, mimicking successful patterns from similar benchmark implementations like WebArena and VisualAgentBench.

korbit-ai bot commented Sep 9, 2025

Korbit doesn't automatically review large (3000+ lines changed) pull requests such as this one. If you want me to review anyway, use /korbit-review.

recursix previously approved these changes Sep 9, 2025
amanjaiswal73892 merged commit 86c0566 into main Sep 10, 2025
12 of 13 checks passed
amanjaiswal73892 deleted the webarenalite branch September 10, 2025 18:05
fbaltor commented Sep 28, 2025

Hey, doesn't this package need to be added as a dependency to browsergym/pyproject.toml?

@amanjaiswal73892
Collaborator Author

> Hey, doesn't this package need to be added as a dependency to browsergym/pyproject.toml?

It does indeed, and it will be included before the next release :)

layahaasini pushed a commit to layahaasini/BrowserGym that referenced this pull request Nov 21, 2025
* Add webarenalite

* use old_task_id for webarena lite

* include only the lite metadata in webarenalite metadata

* Add code acknowledgement

* black

* update test

* remove unused import
5 participants