Skip to content

Conversation

@TLSDC
Copy link
Collaborator

@TLSDC TLSDC commented Apr 23, 2025

Description by Korbit AI

What change is being made?

Upgrade BrowserGym to version 0.14.0, introduce support for new agent types and APIs, add a new scrolling function, and modify environment configuration to allow raw page output.

Why are these changes being made?

These updates are made to support the upcoming ToolUseAgent and new APIs in agentlab by enhancing action description conversion, updating dependencies to the latest version, and providing flexibility in the environment's observation output to accommodate raw page data when required. Additionally, code cleanup and optimizations in requirements and imports have been performed for better maintainability.

Is this description stale? Ask me to generate a new description by commenting /korbit-generate-pr-description

@korbit-ai
Copy link

korbit-ai bot commented Apr 23, 2025

Korbit doesn't automatically review large (3000+ lines changed) pull requests such as this one. If you want me to review anyway, use /korbit-review.

@TLSDC TLSDC marked this pull request as draft April 23, 2025 18:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR prepares the codebase for integration with agentlab’s new ToolUseAgent and APIs by removing legacy benchmark utilities and introducing new tooling methods and environment observation options.

  • Removed legacy benchmark files and associated imports.
  • Added a new parameter and observation space branch for using raw page output in the browser environment.
  • Extended high-level action functionality with a to_tool_description method for API integration.

Reviewed Changes

Copilot reviewed 14 out of 16 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
browsergym/experiments/src/browsergym/experiments/benchmark/* Deprecated benchmark code removed to support the new API integration.
browsergym/experiments/src/bgym/init.py Removed benchmark import references.
browsergym/core/src/browsergym/core/env.py Added use_raw_page_output support with updated observation space.
browsergym/core/src/browsergym/core/action/highlevel.py Introduced to_tool_description method to translate actions to tool descriptors.
browsergym/core/src/browsergym/core/action/base.py Added a placeholder for to_tool_descriptor; note the naming inconsistency with highlevel.py.
Files not reviewed (2)
  • browsergym/experiments/src/browsergym/experiments/benchmark/metadata/assistantbench.csv: Language not supported
  • browsergym/experiments/src/browsergym/experiments/benchmark/metadata/miniwob.csv: Language not supported
Comments suppressed due to low confidence (3)

browsergym/core/src/browsergym/core/action/base.py:37

  • There is an inconsistency between 'to_tool_descriptor' in base.py and the implemented 'to_tool_description' in highlevel.py. Consider aligning the naming convention across these classes.
def to_tool_descriptor(self) -> list[Any]:

browsergym/core/src/browsergym/core/env.py:561

  • The function _get_obs() uses 'copy.deepcopy' but there is no corresponding 'import copy' in the file. Please add 'import copy' at the top.
copy.deepcopy(self.chat.messages)

browsergym/core/src/browsergym/core/env.py:572

  • The code calls 'time.time()' but there is no visible import for the 'time' module. Ensure that 'import time' is added to avoid runtime errors.
np.asarray([time.time() - self.start_time])

@TLSDC TLSDC requested a review from recursix April 23, 2025 19:00
for param_name, param in signature.parameters.items():
param_type = "string" # Default to string if type is not specified
if param.annotation != inspect.Parameter.empty:
if param.annotation == str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if/else is just a mapping. This could be done more concisely with a dict.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type_map = {
        str: "string",
        float: "number",
        int: "number",
        bool: "boolean",
        dict: "object",
        list: "array"
    } 

# return the constructed python code
return python_code

def to_tool_description(self, api="openai") -> list[dict]:
Copy link
Collaborator

@recursix recursix Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good candidate for an easy unit test. But perhaps we could isolate it into a separate function first. Perhaps tool = get_tool(tool_name, action).

recursix
recursix previously approved these changes Apr 24, 2025
@TLSDC TLSDC requested a review from recursix May 2, 2025 21:06
tools = []
for tool_name, action in self.action_set.items():
# Parse the signature to extract parameter names and types
parameters = {"type": "object", "properties": {}, "required": []}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isolate from here into a function get_tool(name, action) -> tool and unit test it.

recursix added 2 commits May 20, 2025 10:57
…istency; update browser launch args comments in env.py
…d functionality; add main execution block for direct testing
@amanjaiswal73892 amanjaiswal73892 marked this pull request as ready for review May 20, 2025 16:05
@amanjaiswal73892 amanjaiswal73892 merged commit 6bc4c51 into main May 20, 2025
13 checks passed
@amanjaiswal73892 amanjaiswal73892 deleted the tlsdc/tool_use_agents branch May 20, 2025 16:05
Copy link

@korbit-ai korbit-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category Issue Status
Security Unsafe Raw Page Object Exposure ▹ view
Readability Observation extraction logic lacks clear structure ▹ view
Readability Replace type mapping chain with dictionary ▹ view
Performance Uncached Function Introspection ▹ view
Functionality Missing API Validation in Tool Description Generation ▹ view
Files scanned
File Path Reviewed
browsergym/core/src/browsergym/core/init.py
browsergym/experiments/src/bgym/init.py
browsergym/core/src/browsergym/core/action/base.py
browsergym/experiments/src/browsergym/experiments/benchmark/configs.py
browsergym/experiments/src/browsergym/experiments/benchmark/utils.py
browsergym/core/src/browsergym/core/action/functions.py
browsergym/core/src/browsergym/core/action/highlevel.py
browsergym/experiments/src/browsergym/experiments/loop.py
browsergym/core/src/browsergym/core/env.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X


if self.use_raw_page_output:
obs = {
"page": self.page,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsafe Raw Page Object Exposure category Security

Tell me more
What is the issue?

Direct exposure of the raw Playwright page object in the observation space when use_raw_page_output=True.

Why this matters

Exposing the raw browser page object gives unrestricted access to potentially sensitive page data and browser control methods that could be exploited maliciously.

Suggested change ∙ Feature Preview

Instead of exposing the raw page object, create a restricted interface that only exposes necessary page properties and methods:

"page": {
    "url": self.page.url,
    "title": self.page.title(),
    # Add other needed properties with proper sanitization
}
Provide feedback to improve future suggestions

Nice Catch Incorrect Not in Scope Not in coding standard Other

💬 Looking for more details? Reply to this comment to chat with Korbit.

Comment on lines 604 to 623
def _get_obs(self):

if self.use_raw_page_output:
obs = {
"page": self.page,
"chat_messages": tuple(copy.deepcopy(self.chat.messages)),
"goal": _try_to_extract_legacy_goal(self.goal_object), # legacy goal, deprecated
"goal_object": tuple(
copy.deepcopy(self.goal_object)
), # new goal format, list of messages openai style
"open_pages_urls": tuple(page.url for page in self.context.pages),
"open_pages_titles": tuple(page.title() for page in self.context.pages),
"active_page_index": np.asarray([self.context.pages.index(self.page)]),
"url": self.page.url, # redundant with "open_pages_urls" and "active_page_index"
"last_action": self.last_action,
"last_action_error": self.last_action_error,
"elapsed_time": np.asarray([time.time() - self.start_time]),
}
return obs
for retries_left in reversed(range(EXTRACT_OBS_MAX_TRIES)):
try:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation extraction logic lacks clear structure category Readability

Tell me more
What is the issue?

The _get_obs method has a complex structure with two very different code paths and duplicated field extraction logic.

Why this matters

The method's structure makes it hard to understand what fields are common between the two observation types and obscures the relationship between the observation_space definition and actual observation creation.

Suggested change ∙ Feature Preview

Split the method into smaller, focused functions:

def _get_obs(self):
    base_obs = self._get_base_observation()
    if self.use_raw_page_output:
        return base_obs
    return {**base_obs, **self._get_extended_observation()}

def _get_base_observation(self):
    return {
        "chat_messages": tuple(copy.deepcopy(self.chat.messages)),
        # ... other common fields
    }

def _get_extended_observation(self):
    return self._extract_with_retries()
Provide feedback to improve future suggestions

Nice Catch Incorrect Not in Scope Not in coding standard Other

💬 Looking for more details? Reply to this comment to chat with Korbit.

Comment on lines +536 to +547
param_type = "string" # Default to string if type is not specified
if param.annotation != inspect.Parameter.empty:
if param.annotation is str:
param_type = "string"
elif param.annotation is float or param.annotation is int:
param_type = "number"
elif param.annotation is bool:
param_type = "boolean"
elif param.annotation is dict:
param_type = "object"
elif param.annotation is list:
param_type = "array"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace type mapping chain with dictionary category Readability

Tell me more
What is the issue?

Long type mapping chain using if-elif statements could be replaced with a more readable mapping dictionary.

Why this matters

The current nested if-else structure makes the code harder to scan and maintain. Future type additions would require extending the if-else chain.

Suggested change ∙ Feature Preview
TYPE_MAPPING = {
    str: "string",
    float: "number",
    int: "number",
    bool: "boolean",
    dict: "object",
    list: "array"
}

param_type = "string"  # Default to string if type is not specified
if param.annotation != inspect.Parameter.empty:
    param_type = TYPE_MAPPING.get(param.annotation, "string")
Provide feedback to improve future suggestions

Nice Catch Incorrect Not in Scope Not in coding standard Other

💬 Looking for more details? Reply to this comment to chat with Korbit.

Comment on lines +533 to +535
parameters = {"type": "object", "properties": {}, "required": []}
signature = inspect.signature(globals()[tool_name])
for param_name, param in signature.parameters.items():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncached Function Introspection category Performance

Tell me more
What is the issue?

The to_tool_description method performs introspection inside a loop for each action, which is an expensive operation that could be cached.

Why this matters

Repeatedly calling inspect.signature() for the same functions across multiple calls creates unnecessary overhead, especially if this method is called frequently.

Suggested change ∙ Feature Preview

Cache the function signatures and parameter information during initialization or as a class variable to avoid repeated introspection:

def __init__(self, ...):
    # existing code
    self._cached_signatures = {}
    for tool_name in self.action_set:
        self._cached_signatures[tool_name] = {
            'signature': inspect.signature(globals()[tool_name]),
            'parameters': self._extract_parameters(globals()[tool_name])
        }
Provide feedback to improve future suggestions

Nice Catch Incorrect Not in Scope Not in coding standard Other

💬 Looking for more details? Reply to this comment to chat with Korbit.

# return the constructed python code
return python_code

def to_tool_description(self, api="openai") -> list[dict]:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing API Validation in Tool Description Generation category Functionality

Tell me more
What is the issue?

The to_tool_description method defaults to 'openai' API but lacks validation for unsupported API values.

Why this matters

Invalid API values could lead to unexpected behavior as the method will use 'parameters' as the default schema key for any unrecognized API value.

Suggested change ∙ Feature Preview

Add validation for supported API values:

def to_tool_description(self, api="openai") -> list[dict]:
    supported_apis = {"openai", "anthropic"}
    if api not in supported_apis:
        raise ValueError(f"Unsupported API: {api}. Must be one of {supported_apis}")
    
    schema_keys = {
        "openai": "parameters",
        "anthropic": "input_schema",
    }
    schema = schema_keys[api]  # Will now always be valid
Provide feedback to improve future suggestions

Nice Catch Incorrect Not in Scope Not in coding standard Other

💬 Looking for more details? Reply to this comment to chat with Korbit.

layahaasini pushed a commit to layahaasini/BrowserGym that referenced this pull request Nov 21, 2025
…ew APIs (ServiceNow#340)

* added openai/anthropic compatible tool json descriptions

* added raw page as possible output to reduce overhead

* moving the benchmark section out of bgym (to agentlab)

* Update browsergym/core/src/browsergym/core/action/highlevel.py

Co-authored-by: Copilot <[email protected]>

* adding benchmark stuff back in

* backtracking

* backtracking everything

* brought back the 'eval' bc we need it actually

* fixing broken imports

* version bump

* switching eval func to globals()

* Update loop.py

* Update browsergym/experiments/src/browsergym/experiments/loop.py

* Add scroll_at function and update references in highlevel.py; refactor browser launch args in env.py; clean up utils.py

* Replace 'scroll' with 'scroll_at' in ACTION_SUBSETS for clarity and consistency

* Refactor ACTION_SUBSETS to replace 'scroll_at' with 'scroll' for consistency; update browser launch args comments in env.py

* Update test_scroll to include 'bid' in action_set subsets for improved functionality; add main execution block for direct testing

---------

Co-authored-by: Copilot <[email protected]>
Co-authored-by: recursix <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants