Some changes in preparation for agentlab's new ToolUseAgent and the new APIs #340

TLSDC · 2025-04-23T18:46:08Z

Description by Korbit AI

What change is being made?

Upgrade BrowserGym to version 0.14.0, introduce support for new agent types and APIs, add a new scrolling function, and modify environment configuration to allow raw page output.

Why are these changes being made?

These updates are made to support the upcoming ToolUseAgent and new APIs in agentlab by enhancing action description conversion, updating dependencies to the latest version, and providing flexibility in the environment's observation output to accommodate raw page data when required. Additionally, code cleanup and optimizations in requirements and imports have been performed for better maintainability.

Is this description stale? Ask me to generate a new description by commenting /korbit-generate-pr-description

korbit-ai · 2025-04-23T18:46:12Z

Korbit doesn't automatically review large (3000+ lines changed) pull requests such as this one. If you want me to review anyway, use /korbit-review.

Copilot

Pull Request Overview

This PR prepares the codebase for integration with agentlab’s new ToolUseAgent and APIs by removing legacy benchmark utilities and introducing new tooling methods and environment observation options.

Removed legacy benchmark files and associated imports.
Added a new parameter and observation space branch for using raw page output in the browser environment.
Extended high-level action functionality with a to_tool_description method for API integration.

Reviewed Changes

Copilot reviewed 14 out of 16 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
browsergym/experiments/src/browsergym/experiments/benchmark/*	Deprecated benchmark code removed to support the new API integration.
browsergym/experiments/src/bgym/init.py	Removed benchmark import references.
browsergym/core/src/browsergym/core/env.py	Added use_raw_page_output support with updated observation space.
browsergym/core/src/browsergym/core/action/highlevel.py	Introduced to_tool_description method to translate actions to tool descriptors.
browsergym/core/src/browsergym/core/action/base.py	Added a placeholder for to_tool_descriptor; note the naming inconsistency with highlevel.py.

Files not reviewed (2)

browsergym/experiments/src/browsergym/experiments/benchmark/metadata/assistantbench.csv: Language not supported
browsergym/experiments/src/browsergym/experiments/benchmark/metadata/miniwob.csv: Language not supported

Comments suppressed due to low confidence (3)

browsergym/core/src/browsergym/core/action/base.py:37

There is an inconsistency between 'to_tool_descriptor' in base.py and the implemented 'to_tool_description' in highlevel.py. Consider aligning the naming convention across these classes.

def to_tool_descriptor(self) -> list[Any]:

browsergym/core/src/browsergym/core/env.py:561

The function _get_obs() uses 'copy.deepcopy' but there is no corresponding 'import copy' in the file. Please add 'import copy' at the top.

copy.deepcopy(self.chat.messages)

browsergym/core/src/browsergym/core/env.py:572

The code calls 'time.time()' but there is no visible import for the 'time' module. Ensure that 'import time' is added to avoid runtime errors.

np.asarray([time.time() - self.start_time])

browsergym/core/src/browsergym/core/action/highlevel.py

Co-authored-by: Copilot <[email protected]>

recursix · 2025-04-24T17:41:19Z

browsergym/core/src/browsergym/core/action/highlevel.py

+            for param_name, param in signature.parameters.items():
+                param_type = "string"  # Default to string if type is not specified
+                if param.annotation != inspect.Parameter.empty:
+                    if param.annotation == str:


This if/else is just a mapping. This could be done more concisely with a dict.

type_map = { str: "string", float: "number", int: "number", bool: "boolean", dict: "object", list: "array" }

recursix · 2025-04-24T17:47:05Z

browsergym/core/src/browsergym/core/action/highlevel.py

        # return the constructed python code
        return python_code

+    def to_tool_description(self, api="openai") -> list[dict]:


Seems like a good candidate for an easy unit test. But perhaps we could isolate it into a separate function first. Perhaps tool = get_tool(tool_name, action).

…Gym into tlsdc/tool_use_agents

browsergym/experiments/src/browsergym/experiments/loop.py

…r browser launch args in env.py; clean up utils.py

…onsistency

recursix · 2025-05-20T14:57:29Z

browsergym/core/src/browsergym/core/action/highlevel.py

+        tools = []
+        for tool_name, action in self.action_set.items():
+            # Parse the signature to extract parameter names and types
+            parameters = {"type": "object", "properties": {}, "required": []}


Isolate from here into a function get_tool(name, action) -> tool and unit test it.

…istency; update browser launch args comments in env.py

…d functionality; add main execution block for direct testing

korbit-ai

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Category	Issue	Status
	Unsafe Raw Page Object Exposure ▹ view
	Observation extraction logic lacks clear structure ▹ view
	Replace type mapping chain with dictionary ▹ view
	Uncached Function Introspection ▹ view
	Missing API Validation in Tool Description Generation ▹ view

Files scanned

File Path	Reviewed
browsergym/core/src/browsergym/core/init.py	✅
browsergym/experiments/src/bgym/init.py	✅
browsergym/core/src/browsergym/core/action/base.py	✅
browsergym/experiments/src/browsergym/experiments/benchmark/configs.py	✅
browsergym/experiments/src/browsergym/experiments/benchmark/utils.py	✅
browsergym/core/src/browsergym/core/action/functions.py	✅
browsergym/core/src/browsergym/core/action/highlevel.py	✅
browsergym/experiments/src/browsergym/experiments/loop.py	✅
browsergym/core/src/browsergym/core/env.py	✅

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X

korbit-ai · 2025-05-20T16:10:48Z

browsergym/core/src/browsergym/core/env.py

-
+        if self.use_raw_page_output:
+            obs = {
+                "page": self.page,


Unsafe Raw Page Object Exposure

Tell me more

What is the issue?

Direct exposure of the raw Playwright page object in the observation space when use_raw_page_output=True.

Why this matters

Exposing the raw browser page object gives unrestricted access to potentially sensitive page data and browser control methods that could be exploited maliciously.

Suggested change ∙ Feature Preview

Instead of exposing the raw page object, create a restricted interface that only exposes necessary page properties and methods:

"page": { "url": self.page.url, "title": self.page.title(), # Add other needed properties with proper sanitization }

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-05-20T16:10:48Z

browsergym/core/src/browsergym/core/env.py

    def _get_obs(self):
-
+        if self.use_raw_page_output:
+            obs = {
+                "page": self.page,
+                "chat_messages": tuple(copy.deepcopy(self.chat.messages)),
+                "goal": _try_to_extract_legacy_goal(self.goal_object),  # legacy goal, deprecated
+                "goal_object": tuple(
+                    copy.deepcopy(self.goal_object)
+                ),  # new goal format, list of messages openai style
+                "open_pages_urls": tuple(page.url for page in self.context.pages),
+                "open_pages_titles": tuple(page.title() for page in self.context.pages),
+                "active_page_index": np.asarray([self.context.pages.index(self.page)]),
+                "url": self.page.url,  # redundant with "open_pages_urls" and "active_page_index"
+                "last_action": self.last_action,
+                "last_action_error": self.last_action_error,
+                "elapsed_time": np.asarray([time.time() - self.start_time]),
+            }
+            return obs
        for retries_left in reversed(range(EXTRACT_OBS_MAX_TRIES)):
            try:


Observation extraction logic lacks clear structure

Tell me more

What is the issue?

The _get_obs method has a complex structure with two very different code paths and duplicated field extraction logic.

Why this matters

The method's structure makes it hard to understand what fields are common between the two observation types and obscures the relationship between the observation_space definition and actual observation creation.

Suggested change ∙ Feature Preview

Split the method into smaller, focused functions:

def _get_obs(self): base_obs = self._get_base_observation() if self.use_raw_page_output: return base_obs return {**base_obs, **self._get_extended_observation()} def _get_base_observation(self): return { "chat_messages": tuple(copy.deepcopy(self.chat.messages)), # ... other common fields } def _get_extended_observation(self): return self._extract_with_retries()

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-05-20T16:10:49Z

browsergym/core/src/browsergym/core/action/highlevel.py

+                param_type = "string"  # Default to string if type is not specified
+                if param.annotation != inspect.Parameter.empty:
+                    if param.annotation is str:
+                        param_type = "string"
+                    elif param.annotation is float or param.annotation is int:
+                        param_type = "number"
+                    elif param.annotation is bool:
+                        param_type = "boolean"
+                    elif param.annotation is dict:
+                        param_type = "object"
+                    elif param.annotation is list:
+                        param_type = "array"


Replace type mapping chain with dictionary

Tell me more

What is the issue?

Long type mapping chain using if-elif statements could be replaced with a more readable mapping dictionary.

Why this matters

The current nested if-else structure makes the code harder to scan and maintain. Future type additions would require extending the if-else chain.

Suggested change ∙ Feature Preview

TYPE_MAPPING = { str: "string", float: "number", int: "number", bool: "boolean", dict: "object", list: "array" } param_type = "string" # Default to string if type is not specified if param.annotation != inspect.Parameter.empty: param_type = TYPE_MAPPING.get(param.annotation, "string")

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-05-20T16:10:49Z

browsergym/core/src/browsergym/core/action/highlevel.py

+            parameters = {"type": "object", "properties": {}, "required": []}
+            signature = inspect.signature(globals()[tool_name])
+            for param_name, param in signature.parameters.items():


Uncached Function Introspection

Tell me more

What is the issue?

The to_tool_description method performs introspection inside a loop for each action, which is an expensive operation that could be cached.

Why this matters

Repeatedly calling inspect.signature() for the same functions across multiple calls creates unnecessary overhead, especially if this method is called frequently.

Suggested change ∙ Feature Preview

Cache the function signatures and parameter information during initialization or as a class variable to avoid repeated introspection:

def __init__(self, ...): # existing code self._cached_signatures = {} for tool_name in self.action_set: self._cached_signatures[tool_name] = { 'signature': inspect.signature(globals()[tool_name]), 'parameters': self._extract_parameters(globals()[tool_name]) }

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

korbit-ai · 2025-05-20T16:10:49Z

browsergym/core/src/browsergym/core/action/highlevel.py

        # return the constructed python code
        return python_code

+    def to_tool_description(self, api="openai") -> list[dict]:


Missing API Validation in Tool Description Generation

Tell me more

What is the issue?

The to_tool_description method defaults to 'openai' API but lacks validation for unsupported API values.

Why this matters

Invalid API values could lead to unexpected behavior as the method will use 'parameters' as the default schema key for any unrecognized API value.

Suggested change ∙ Feature Preview

Add validation for supported API values:

def to_tool_description(self, api="openai") -> list[dict]: supported_apis = {"openai", "anthropic"} if api not in supported_apis: raise ValueError(f"Unsupported API: {api}. Must be one of {supported_apis}") schema_keys = { "openai": "parameters", "anthropic": "input_schema", } schema = schema_keys[api] # Will now always be valid

Provide feedback to improve future suggestions

_{💬 Looking for more details? Reply to this comment to chat with Korbit.}

…ew APIs (ServiceNow#340) * added openai/anthropic compatible tool json descriptions * added raw page as possible output to reduce overhead * moving the benchmark section out of bgym (to agentlab) * Update browsergym/core/src/browsergym/core/action/highlevel.py Co-authored-by: Copilot <[email protected]> * adding benchmark stuff back in * backtracking * backtracking everything * brought back the 'eval' bc we need it actually * fixing broken imports * version bump * switching eval func to globals() * Update loop.py * Update browsergym/experiments/src/browsergym/experiments/loop.py * Add scroll_at function and update references in highlevel.py; refactor browser launch args in env.py; clean up utils.py * Replace 'scroll' with 'scroll_at' in ACTION_SUBSETS for clarity and consistency * Refactor ACTION_SUBSETS to replace 'scroll_at' with 'scroll' for consistency; update browser launch args comments in env.py * Update test_scroll to include 'bid' in action_set subsets for improved functionality; add main execution block for direct testing --------- Co-authored-by: Copilot <[email protected]> Co-authored-by: recursix <[email protected]>

TLSDC added 3 commits April 22, 2025 16:18

added openai/anthropic compatible tool json descriptions

79d4141

added raw page as possible output to reduce overhead

80e8843

moving the benchmark section out of bgym (to agentlab)

31c3224

TLSDC mentioned this pull request Apr 23, 2025

Dev branch for the ToolUseAgent ServiceNow/AgentLab#239

Closed

TLSDC marked this pull request as draft April 23, 2025 18:48

TLSDC requested review from amanjaiswal73892 and Copilot April 23, 2025 18:48

Copilot AI reviewed Apr 23, 2025

View reviewed changes

browsergym/core/src/browsergym/core/action/highlevel.py Outdated Show resolved Hide resolved

TLSDC requested a review from recursix April 23, 2025 19:00

Update browsergym/core/src/browsergym/core/action/highlevel.py

3d19bea

Co-authored-by: Copilot <[email protected]>

recursix reviewed Apr 24, 2025

View reviewed changes

recursix previously approved these changes Apr 24, 2025

View reviewed changes

TLSDC added 3 commits May 2, 2025 15:20

adding benchmark stuff back in

6de7a02

backtracking

5911c1e

Merge branch 'tlsdc/tool_use_agents' of github.com:ServiceNow/Browser…

be39f6f

…Gym into tlsdc/tool_use_agents

TLSDC dismissed recursix’s stale review via be39f6f May 2, 2025 20:44

TLSDC and others added 5 commits May 2, 2025 16:45

Merge branch 'main' into tlsdc/tool_use_agents

2703be9

backtracking everything

ecb4632

Merge branch 'tlsdc/tool_use_agents' of github.com:ServiceNow/Browser…

26ac1e9

…Gym into tlsdc/tool_use_agents

brought back the 'eval' bc we need it actually

be0e45d

fixing broken imports

da73bf2

TLSDC requested a review from recursix May 2, 2025 21:06

TLSDC and others added 3 commits May 2, 2025 17:08

version bump

709332a

switching eval func to globals()

fca18ff

Update loop.py

3dca1b7

TLSDC commented May 7, 2025

View reviewed changes

browsergym/experiments/src/browsergym/experiments/loop.py Outdated Show resolved Hide resolved

TLSDC and others added 2 commits May 7, 2025 20:16

Update browsergym/experiments/src/browsergym/experiments/loop.py

808ce72

Add scroll_at function and update references in highlevel.py; refacto…

02978dd

…r browser launch args in env.py; clean up utils.py

Replace 'scroll' with 'scroll_at' in ACTION_SUBSETS for clarity and c…

67e7302

…onsistency

recursix reviewed May 20, 2025

View reviewed changes

recursix added 2 commits May 20, 2025 10:57

Refactor ACTION_SUBSETS to replace 'scroll_at' with 'scroll' for cons…

b2583f9

…istency; update browser launch args comments in env.py

Update test_scroll to include 'bid' in action_set subsets for improve…

616f6b9

…d functionality; add main execution block for direct testing

amanjaiswal73892 approved these changes May 20, 2025

View reviewed changes

amanjaiswal73892 marked this pull request as ready for review May 20, 2025 16:05

amanjaiswal73892 merged commit 6bc4c51 into main May 20, 2025
13 checks passed

amanjaiswal73892 deleted the tlsdc/tool_use_agents branch May 20, 2025 16:05

korbit-ai bot reviewed May 20, 2025

View reviewed changes

amanjaiswal73892 mentioned this pull request Jul 16, 2025

Tool Describe for AbstractActionSet #336

Closed

Some changes in preparation for agentlab's new ToolUseAgent and the new APIs #340

Some changes in preparation for agentlab's new ToolUseAgent and the new APIs #340

Uh oh!

Conversation

TLSDC commented Apr 23, 2025 • edited by korbit-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description by Korbit AI

What change is being made?

Why are these changes being made?

Uh oh!

korbit-ai bot commented Apr 23, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

recursix Apr 24, 2025

Choose a reason for hiding this comment

Uh oh!

recursix Apr 24, 2025

Choose a reason for hiding this comment

Uh oh!

recursix Apr 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

recursix May 20, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

korbit-ai bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Uh oh!

korbit-ai bot May 20, 2025

Choose a reason for hiding this comment

Unsafe Raw Page Object Exposure

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot May 20, 2025

Choose a reason for hiding this comment

Observation extraction logic lacks clear structure

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot May 20, 2025

Choose a reason for hiding this comment

Replace type mapping chain with dictionary

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot May 20, 2025

Choose a reason for hiding this comment

Uncached Function Introspection

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

korbit-ai bot May 20, 2025

Choose a reason for hiding this comment

Missing API Validation in Tool Description Generation

What is the issue?

Why this matters

Suggested change ∙ Feature Preview

Provide feedback to improve future suggestions

Uh oh!

Reviewers

TLSDC commented Apr 23, 2025 •

edited by korbit-ai bot

Loading

recursix Apr 24, 2025 •

edited

Loading

korbit-ai bot left a comment •

edited

Loading