Skip to content

Conversation

@pranavraja99
Copy link

This implements native function calling with true multi-turn using the tool role. Parsing logic is implemented to parse out the tool calls from the model output, execute and feed it into the model in its native format.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces native function calling for multi-turn conversations, which is a significant enhancement. The implementation includes a new 'browse' example with a Brave Search tool, environment, and tool parsers. The changes are well-structured, adding new components for the browsing task and modifying existing generator and inference client to plumb through tool specifications. My review includes suggestions to remove some dead code and debug statements, fix a bug in API key handling, and optimize a loop in the tool parser. Overall, this is a solid contribution.

Comment on lines +131 to +134
brave_api_key = os.getenv("BRAVE_API_KEY")

if not brave_api_key:
return {"error": "No BRAVE_API_KEY environment variable found. Please set it to use this function."}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The search method re-fetches the API key from the environment variable, ignoring self.api_key which was set in the constructor. This is redundant and can lead to incorrect behavior if the API key was provided via the constructor. You should use self.api_key throughout this method. Also, remember to update its usage on line 212.

Suggested change
brave_api_key = os.getenv("BRAVE_API_KEY")
if not brave_api_key:
return {"error": "No BRAVE_API_KEY environment variable found. Please set it to use this function."}
if not self.api_key:
return {"error": "No BRAVE_API_KEY environment variable found. Please set it or pass the api_key to the constructor."}


country = region_mapping.get(region, "ALL")

headers = {"Accept": "application/json", "Accept-Encoding": "gzip", "X-Subscription-Token": brave_api_key}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

To be consistent with using self.api_key from the constructor, this line should also be updated to use self.api_key instead of the local brave_api_key variable.

Suggested change
headers = {"Accept": "application/json", "Accept-Encoding": "gzip", "X-Subscription-Token": brave_api_key}
headers = {"Accept": "application/json", "Accept-Encoding": "gzip", "X-Subscription-Token": self.api_key}

Comment on lines 90 to 106
else:
tool_calls = []
for tool_call in result:
try:
evaled_tool_dict = json.loads(tool_call)
except Exception:
try:
evaled_tool_dict = ast.literal_eval(tool_call)
except Exception:
evaled_tool_dict = None

if evaled_tool_dict and evaled_tool_dict.get("name") and evaled_tool_dict.get("arguments"):
tool_calls.append(evaled_tool_dict)

content = model_output[: model_output.find(self.tool_call_start_token)]

return {"tools_called": bool(tool_calls), "tool_calls": tool_calls, "content": content}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The calculation of content on line 104 is inside the for loop, which means it's being re-calculated on every iteration with the same result. This is inefficient. You should move this calculation outside and before the loop.

Suggested change
else:
tool_calls = []
for tool_call in result:
try:
evaled_tool_dict = json.loads(tool_call)
except Exception:
try:
evaled_tool_dict = ast.literal_eval(tool_call)
except Exception:
evaled_tool_dict = None
if evaled_tool_dict and evaled_tool_dict.get("name") and evaled_tool_dict.get("arguments"):
tool_calls.append(evaled_tool_dict)
content = model_output[: model_output.find(self.tool_call_start_token)]
return {"tools_called": bool(tool_calls), "tool_calls": tool_calls, "content": content}
else:
tool_calls = []
content = model_output[: model_output.find(self.tool_call_start_token)]
for tool_call in result:
try:
evaled_tool_dict = json.loads(tool_call)
except Exception:
try:
evaled_tool_dict = ast.literal_eval(tool_call)
except Exception:
evaled_tool_dict = None
if evaled_tool_dict and evaled_tool_dict.get("name") and evaled_tool_dict.get("arguments"):
tool_calls.append(evaled_tool_dict)
return {"tools_called": bool(tool_calls), "tool_calls": tool_calls, "content": content}

Comment on lines 267 to 269
print("self.cfg.environment.env_class", self.cfg.environment.env_class)
generator: GeneratorInterface = self.get_generator(self.cfg, tokenizer, inference_engine_client)
print("GOT TO GENERATOR")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These print statements appear to be for debugging purposes. They should be removed from the final code to keep the output clean. If this information is valuable, consider using the logger with an appropriate level (e.g., logger.debug).

Suggested change
print("self.cfg.environment.env_class", self.cfg.environment.env_class)
generator: GeneratorInterface = self.get_generator(self.cfg, tokenizer, inference_engine_client)
print("GOT TO GENERATOR")
generator: GeneratorInterface = self.get_generator(self.cfg, tokenizer, inference_engine_client)

Comment on lines +45 to +49
def _parse_action(self, action: str) -> List[Optional[str]]:
match = None
if "<search>" in action and "</search>" in action:
match = re.search(r"<search>(.*?)</search>", action, re.DOTALL)
return [match.group(1)] if match else [None]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The _parse_action method appears to be unused. The step method uses self.tool_parser.extract_tool_calls(action) for parsing, making this method redundant. It should be removed to improve code clarity and maintainability.

trainer.eval_before_train=false \
generator.eval_sampling_params.temperature=0 \
generator.eval_sampling_params.stop='["</search>", "</answer>"]' \
trainer.export_path="$HOME/skyrl-search_6turns_maxgeneratelen_1536_Qwen3-8B/exports" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There appears to be a typo in the trainer.export_path. It's set to a path containing skyrl-search, but other related paths and names use skyrl-browse. For consistency, this should probably be skyrl-browse.

Suggested change
trainer.export_path="$HOME/skyrl-search_6turns_maxgeneratelen_1536_Qwen3-8B/exports" \
trainer.export_path="$HOME/skyrl-browse_6turns_maxgeneratelen_1536_Qwen3-8B/exports" \

trainer.export_path="$HOME/skyrl-search_6turns_maxgeneratelen_1536_Qwen3-8B/exports" \
trainer.eval_interval=50 \
$@

No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This file ends with an extra empty line containing a space, which is unusual. It's better to remove it and ensure the file ends with the last command or a single newline.

trainer.eval_before_train=false \
generator.eval_sampling_params.temperature=0 \
generator.eval_sampling_params.stop='["</search>", "</answer>"]' \
trainer.export_path="$HOME/skyrl-search_6turns_maxgeneratelen_1536_Qwen3-8B_lora/exports" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There appears to be a typo in the trainer.export_path. It's set to a path containing skyrl-search, but other related paths and names use skyrl-browse. For consistency, this should probably be skyrl-browse.

Suggested change
trainer.export_path="$HOME/skyrl-search_6turns_maxgeneratelen_1536_Qwen3-8B_lora/exports" \
trainer.export_path="$HOME/skyrl-browse_6turns_maxgeneratelen_1536_Qwen3-8B_lora/exports" \

trainer.export_path="$HOME/skyrl-search_6turns_maxgeneratelen_1536_Qwen3-8B_lora/exports" \
trainer.eval_interval=50 \
$@

No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This file ends with an extra empty line containing a space, which is unusual. It's better to remove it and ensure the file ends with the last command or a single newline.

# Close the environment
await self._run_in_executor_if_available(env.close)

# decoded_text = self.tokenizer.decode(input_ids)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This commented-out line appears to be a leftover from debugging. It should be removed to keep the code clean.

@pranavraja99 pranavraja99 changed the title Pranav/mt brave search rl Multi-turn native tool use RL Nov 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants