-
Notifications
You must be signed in to change notification settings - Fork 143
Some changes in preparation for agentlab's new ToolUseAgent and the new APIs #340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR prepares the codebase for integration with agentlab’s new ToolUseAgent and APIs by removing legacy benchmark utilities and introducing new tooling methods and environment observation options.
- Removed legacy benchmark files and associated imports.
- Added a new parameter and observation space branch for using raw page output in the browser environment.
- Extended high-level action functionality with a to_tool_description method for API integration.
Reviewed Changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| browsergym/experiments/src/browsergym/experiments/benchmark/* | Deprecated benchmark code removed to support the new API integration. |
| browsergym/experiments/src/bgym/init.py | Removed benchmark import references. |
| browsergym/core/src/browsergym/core/env.py | Added use_raw_page_output support with updated observation space. |
| browsergym/core/src/browsergym/core/action/highlevel.py | Introduced to_tool_description method to translate actions to tool descriptors. |
| browsergym/core/src/browsergym/core/action/base.py | Added a placeholder for to_tool_descriptor; note the naming inconsistency with highlevel.py. |
Files not reviewed (2)
- browsergym/experiments/src/browsergym/experiments/benchmark/metadata/assistantbench.csv: Language not supported
- browsergym/experiments/src/browsergym/experiments/benchmark/metadata/miniwob.csv: Language not supported
Comments suppressed due to low confidence (3)
browsergym/core/src/browsergym/core/action/base.py:37
- There is an inconsistency between 'to_tool_descriptor' in base.py and the implemented 'to_tool_description' in highlevel.py. Consider aligning the naming convention across these classes.
def to_tool_descriptor(self) -> list[Any]:
browsergym/core/src/browsergym/core/env.py:561
- The function _get_obs() uses 'copy.deepcopy' but there is no corresponding 'import copy' in the file. Please add 'import copy' at the top.
copy.deepcopy(self.chat.messages)
browsergym/core/src/browsergym/core/env.py:572
- The code calls 'time.time()' but there is no visible import for the 'time' module. Ensure that 'import time' is added to avoid runtime errors.
np.asarray([time.time() - self.start_time])
Co-authored-by: Copilot <[email protected]>
| for param_name, param in signature.parameters.items(): | ||
| param_type = "string" # Default to string if type is not specified | ||
| if param.annotation != inspect.Parameter.empty: | ||
| if param.annotation == str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if/else is just a mapping. This could be done more concisely with a dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type_map = {
str: "string",
float: "number",
int: "number",
bool: "boolean",
dict: "object",
list: "array"
} | # return the constructed python code | ||
| return python_code | ||
|
|
||
| def to_tool_description(self, api="openai") -> list[dict]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a good candidate for an easy unit test. But perhaps we could isolate it into a separate function first. Perhaps tool = get_tool(tool_name, action).
…Gym into tlsdc/tool_use_agents
…r browser launch args in env.py; clean up utils.py
| tools = [] | ||
| for tool_name, action in self.action_set.items(): | ||
| # Parse the signature to extract parameter names and types | ||
| parameters = {"type": "object", "properties": {}, "required": []} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isolate from here into a function get_tool(name, action) -> tool and unit test it.
…istency; update browser launch args comments in env.py
…d functionality; add main execution block for direct testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
| Category | Issue | Status |
|---|---|---|
| Unsafe Raw Page Object Exposure ▹ view | ||
| Observation extraction logic lacks clear structure ▹ view | ||
| Replace type mapping chain with dictionary ▹ view | ||
| Uncached Function Introspection ▹ view | ||
| Missing API Validation in Tool Description Generation ▹ view |
Files scanned
| File Path | Reviewed |
|---|---|
| browsergym/core/src/browsergym/core/init.py | ✅ |
| browsergym/experiments/src/bgym/init.py | ✅ |
| browsergym/core/src/browsergym/core/action/base.py | ✅ |
| browsergym/experiments/src/browsergym/experiments/benchmark/configs.py | ✅ |
| browsergym/experiments/src/browsergym/experiments/benchmark/utils.py | ✅ |
| browsergym/core/src/browsergym/core/action/functions.py | ✅ |
| browsergym/core/src/browsergym/core/action/highlevel.py | ✅ |
| browsergym/experiments/src/browsergym/experiments/loop.py | ✅ |
| browsergym/core/src/browsergym/core/env.py | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Check out our docs on how you can make Korbit work best for you and your team.
|
|
||
| if self.use_raw_page_output: | ||
| obs = { | ||
| "page": self.page, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsafe Raw Page Object Exposure 
Tell me more
What is the issue?
Direct exposure of the raw Playwright page object in the observation space when use_raw_page_output=True.
Why this matters
Exposing the raw browser page object gives unrestricted access to potentially sensitive page data and browser control methods that could be exploited maliciously.
Suggested change ∙ Feature Preview
Instead of exposing the raw page object, create a restricted interface that only exposes necessary page properties and methods:
"page": {
"url": self.page.url,
"title": self.page.title(),
# Add other needed properties with proper sanitization
}Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
| def _get_obs(self): | ||
|
|
||
| if self.use_raw_page_output: | ||
| obs = { | ||
| "page": self.page, | ||
| "chat_messages": tuple(copy.deepcopy(self.chat.messages)), | ||
| "goal": _try_to_extract_legacy_goal(self.goal_object), # legacy goal, deprecated | ||
| "goal_object": tuple( | ||
| copy.deepcopy(self.goal_object) | ||
| ), # new goal format, list of messages openai style | ||
| "open_pages_urls": tuple(page.url for page in self.context.pages), | ||
| "open_pages_titles": tuple(page.title() for page in self.context.pages), | ||
| "active_page_index": np.asarray([self.context.pages.index(self.page)]), | ||
| "url": self.page.url, # redundant with "open_pages_urls" and "active_page_index" | ||
| "last_action": self.last_action, | ||
| "last_action_error": self.last_action_error, | ||
| "elapsed_time": np.asarray([time.time() - self.start_time]), | ||
| } | ||
| return obs | ||
| for retries_left in reversed(range(EXTRACT_OBS_MAX_TRIES)): | ||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Observation extraction logic lacks clear structure 
Tell me more
What is the issue?
The _get_obs method has a complex structure with two very different code paths and duplicated field extraction logic.
Why this matters
The method's structure makes it hard to understand what fields are common between the two observation types and obscures the relationship between the observation_space definition and actual observation creation.
Suggested change ∙ Feature Preview
Split the method into smaller, focused functions:
def _get_obs(self):
base_obs = self._get_base_observation()
if self.use_raw_page_output:
return base_obs
return {**base_obs, **self._get_extended_observation()}
def _get_base_observation(self):
return {
"chat_messages": tuple(copy.deepcopy(self.chat.messages)),
# ... other common fields
}
def _get_extended_observation(self):
return self._extract_with_retries()Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
| param_type = "string" # Default to string if type is not specified | ||
| if param.annotation != inspect.Parameter.empty: | ||
| if param.annotation is str: | ||
| param_type = "string" | ||
| elif param.annotation is float or param.annotation is int: | ||
| param_type = "number" | ||
| elif param.annotation is bool: | ||
| param_type = "boolean" | ||
| elif param.annotation is dict: | ||
| param_type = "object" | ||
| elif param.annotation is list: | ||
| param_type = "array" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace type mapping chain with dictionary 
Tell me more
What is the issue?
Long type mapping chain using if-elif statements could be replaced with a more readable mapping dictionary.
Why this matters
The current nested if-else structure makes the code harder to scan and maintain. Future type additions would require extending the if-else chain.
Suggested change ∙ Feature Preview
TYPE_MAPPING = {
str: "string",
float: "number",
int: "number",
bool: "boolean",
dict: "object",
list: "array"
}
param_type = "string" # Default to string if type is not specified
if param.annotation != inspect.Parameter.empty:
param_type = TYPE_MAPPING.get(param.annotation, "string")Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
| parameters = {"type": "object", "properties": {}, "required": []} | ||
| signature = inspect.signature(globals()[tool_name]) | ||
| for param_name, param in signature.parameters.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uncached Function Introspection 
Tell me more
What is the issue?
The to_tool_description method performs introspection inside a loop for each action, which is an expensive operation that could be cached.
Why this matters
Repeatedly calling inspect.signature() for the same functions across multiple calls creates unnecessary overhead, especially if this method is called frequently.
Suggested change ∙ Feature Preview
Cache the function signatures and parameter information during initialization or as a class variable to avoid repeated introspection:
def __init__(self, ...):
# existing code
self._cached_signatures = {}
for tool_name in self.action_set:
self._cached_signatures[tool_name] = {
'signature': inspect.signature(globals()[tool_name]),
'parameters': self._extract_parameters(globals()[tool_name])
}Provide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
| # return the constructed python code | ||
| return python_code | ||
|
|
||
| def to_tool_description(self, api="openai") -> list[dict]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing API Validation in Tool Description Generation 
Tell me more
What is the issue?
The to_tool_description method defaults to 'openai' API but lacks validation for unsupported API values.
Why this matters
Invalid API values could lead to unexpected behavior as the method will use 'parameters' as the default schema key for any unrecognized API value.
Suggested change ∙ Feature Preview
Add validation for supported API values:
def to_tool_description(self, api="openai") -> list[dict]:
supported_apis = {"openai", "anthropic"}
if api not in supported_apis:
raise ValueError(f"Unsupported API: {api}. Must be one of {supported_apis}")
schema_keys = {
"openai": "parameters",
"anthropic": "input_schema",
}
schema = schema_keys[api] # Will now always be validProvide feedback to improve future suggestions
💬 Looking for more details? Reply to this comment to chat with Korbit.
…ew APIs (ServiceNow#340) * added openai/anthropic compatible tool json descriptions * added raw page as possible output to reduce overhead * moving the benchmark section out of bgym (to agentlab) * Update browsergym/core/src/browsergym/core/action/highlevel.py Co-authored-by: Copilot <[email protected]> * adding benchmark stuff back in * backtracking * backtracking everything * brought back the 'eval' bc we need it actually * fixing broken imports * version bump * switching eval func to globals() * Update loop.py * Update browsergym/experiments/src/browsergym/experiments/loop.py * Add scroll_at function and update references in highlevel.py; refactor browser launch args in env.py; clean up utils.py * Replace 'scroll' with 'scroll_at' in ACTION_SUBSETS for clarity and consistency * Refactor ACTION_SUBSETS to replace 'scroll_at' with 'scroll' for consistency; update browser launch args comments in env.py * Update test_scroll to include 'bid' in action_set subsets for improved functionality; add main execution block for direct testing --------- Co-authored-by: Copilot <[email protected]> Co-authored-by: recursix <[email protected]>
Description by Korbit AI
What change is being made?
Upgrade BrowserGym to version 0.14.0, introduce support for new agent types and APIs, add a new scrolling function, and modify environment configuration to allow raw page output.
Why are these changes being made?
These updates are made to support the upcoming ToolUseAgent and new APIs in agentlab by enhancing action description conversion, updating dependencies to the latest version, and providing flexibility in the environment's observation output to accommodate raw page data when required. Additionally, code cleanup and optimizations in requirements and imports have been performed for better maintainability.