Add WebArena Lite #368

amanjaiswal73892 · 2025-09-09T18:28:33Z

This pull request introduces the new "WebArena Lite" benchmark to the BrowserGym project. It adds a new package (browsergym/webarenalite) with its own configuration, tasks, and dependencies, and integrates this benchmark into the experiment and environment selection logic. The key changes are grouped below:

WebArena Lite Benchmark Integration

Added a new editable package, browsergym/webarenalite, to the project installation process in the Makefile, ensuring it is installed with other BrowserGym components.
Registered the new "webarena_lite" benchmark in the experiment configuration, allowing it to be selected and used like existing benchmarks.
Updated environment selection logic to import browsergym.webarenalite when a task name starts with "webarena", ensuring tasks are correctly registered and available.
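The prefix-based import described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `PREFIX_TO_MODULE` table, the function names, and the `webarenalite.<id>` task-name format are assumptions. It checks longer prefixes first, since a plain `startswith("webarena")` test would also match WebArena Lite task names.

```python
import importlib
from typing import Optional

# Hypothetical prefix -> module table; the actual BrowserGym logic may differ.
PREFIX_TO_MODULE = {
    "webarenalite": "browsergym.webarenalite",
    "webarena": "browsergym.webarena",
    "miniwob": "browsergym.miniwob",
}


def module_for_task(task_name: str) -> Optional[str]:
    """Return the module that registers `task_name`, or None if unknown."""
    # Longest prefix first, so "webarenalite.*" is not captured by "webarena".
    for prefix in sorted(PREFIX_TO_MODULE, key=len, reverse=True):
        if task_name.startswith(prefix):
            return PREFIX_TO_MODULE[prefix]
    return None


def ensure_registered(task_name: str) -> None:
    """Import the owning package so its tasks get registered as a side effect."""
    module = module_for_task(task_name)
    if module is not None:
        importlib.import_module(module)
```

Calling `ensure_registered("webarenalite.12")` would import `browsergym.webarenalite` only if that package is installed; `module_for_task` alone has no side effects and can be checked in isolation.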

New Package: browsergym/webarenalite

Added the package definition (pyproject.toml) and dependencies (requirements.txt) for browsergym-webarenalite, specifying its requirements and build configuration.
Implemented the initialization logic in __init__.py to register all WebArena Lite tasks using task IDs from the new config, and ensured necessary NLTK resources are available at runtime.
Defined the full list of task IDs for WebArena Lite in config.py, providing a central reference for all tasks included in the benchmark.
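The registration flow in `__init__.py` might look something like the sketch below. Everything here is a stand-in chosen for illustration: `TASK_IDS`, `REGISTRY`, `register_task`, and the `webarenalite.<id>` naming are hypothetical, and the real package registers Gym environments rather than entries in a dict. The PR does not say which NLTK resource is needed, so `punkt` is also an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: the real IDs live in config.py, and the real
# registry is the Gym environment registry, not a plain dict.
TASK_IDS: List[int] = [3, 7, 42]
REGISTRY: Dict[str, Callable[[], dict]] = {}


def register_task(env_id: str, factory: Callable[[], dict]) -> None:
    """Add one task factory to the registry, rejecting duplicates."""
    if env_id in REGISTRY:
        raise ValueError(f"task already registered: {env_id}")
    REGISTRY[env_id] = factory


def register_all(task_ids: List[int], prefix: str = "webarenalite") -> List[str]:
    """Register one entry per task ID and return the generated names."""
    names = []
    for task_id in task_ids:
        env_id = f"{prefix}.{task_id}"
        # Bind task_id as a default argument so each factory keeps its own ID.
        register_task(env_id, lambda task_id=task_id: {"task_id": task_id})
        names.append(env_id)
    return names


def ensure_nltk_resource(path: str = "tokenizers/punkt", package: str = "punkt") -> None:
    """Download an NLTK resource only if it is not already available."""
    import nltk  # imported lazily so the rest of the sketch runs without NLTK

    try:
        nltk.data.find(path)
    except LookupError:
        nltk.download(package, quiet=True)
```

Keeping the ID list in one place means the registration loop and the metadata can both be validated against the same source of truth.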

Benchmark Metadata

Added a comprehensive metadata CSV file (webarenalite.csv) listing all tasks, their properties, and dependencies for the WebArena Lite benchmark.

Description by Korbit AI

What change is being made?

Add WebArena Lite as a new benchmark to the BrowserGym project, including configuration, setup, and evaluation logic for tasks, as well as the necessary dependencies.

Why are these changes being made?

The addition of WebArena Lite expands the benchmarks available within the BrowserGym ecosystem, enabling testing of lightweight tasks that require simpler environments and possibly fewer computational resources. This new benchmark aims to provide broader test coverage and flexibility for benchmarking web-based AI models. The integration includes configuration files, task registration, and evaluation logic tailored for the WebArena Lite environment, mimicking successful patterns from similar benchmark implementations like WebArena and VisualAgentBench.

korbit-ai · 2025-09-09T18:28:38Z

Korbit doesn't automatically review large (3000+ lines changed) pull requests such as this one. If you want me to review anyway, use /korbit-review.

browsergym/webarenalite/src/browsergym/webarenalite/evaluators.py

fbaltor · 2025-09-28T22:51:09Z

Hey, doesn't this package need to be added as a dependency to browsergym/pyproject.toml?

amanjaiswal73892 · 2025-09-30T15:54:19Z

> Hey, doesn't this package need to be added as a dependency to browsergym/pyproject.toml?

It does indeed, and it will be included before the next release :)

* Add webarenalite
* use old_task_id for webarena lite
* include only the lite metadata in webarenalite metadata
* Add code acknowledgement
* black
* update test
* remove unused import

amanjaiswal73892 added 4 commits September 8, 2025 23:44

Add webarenalite

748bfe8

use old_task_id for webarena lite

c66bb1d

include only the lite metadata in webarenalite metadata

d2ce90e

Add code acknowledgement

7031046

amanjaiswal73892 requested review from jardinetsouffleton and recursix September 9, 2025 18:30

recursix previously approved these changes Sep 9, 2025

amanjaiswal73892 added 2 commits September 9, 2025 19:11

black

8cb519d

update test

8b47787

amanjaiswal73892 dismissed recursix’s stale review via 8b47787 September 9, 2025 19:12

remove unused import

aefcea8

jardinetsouffleton approved these changes Sep 10, 2025

amanjaiswal73892 merged commit 86c0566 into main Sep 10, 2025
12 of 13 checks passed

amanjaiswal73892 deleted the webarenalite branch September 10, 2025 18:05

fbaltor mentioned this pull request Sep 22, 2025

Add new experiments/tasks ServiceNow/AgentLab#297

Open

5 participants