Multi-turn native tool use RL #643
base: main
Conversation
Code Review
This pull request introduces native function calling for multi-turn conversations, which is a significant enhancement. The implementation adds a new 'browse' example with a Brave Search tool, environment, and tool parsers. The changes are well structured, adding new components for the browsing task and modifying the existing generator and inference client to plumb through tool specifications. My review includes suggestions to remove some dead code and debug statements, fix a bug in API key handling, and optimize a loop in the tool parser. Overall, this is a solid contribution.
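For context on what "plumbing through tool specifications" means here: in OpenAI-style native tool use, each tool is described to the inference engine as a JSON schema passed alongside the chat messages. A minimal sketch; the field values are illustrative, not the PR's exact schema:

    # Illustrative tool spec for a web-search tool, in the common OpenAI-style format.
    brave_search_tool = {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Query the Brave Search API and return the top results.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query"},
                    "region": {"type": "string", "description": "Optional region code, e.g. 'US'"},
                },
                "required": ["query"],
            },
        },
    }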
brave_api_key = os.getenv("BRAVE_API_KEY")

if not brave_api_key:
    return {"error": "No BRAVE_API_KEY environment variable found. Please set it to use this function."}
The search method re-fetches the API key from the environment variable, ignoring self.api_key which was set in the constructor. This is redundant and can lead to incorrect behavior if the API key was provided via the constructor. You should use self.api_key throughout this method. Also, remember to update its usage on line 212.
Suggested change:
- brave_api_key = os.getenv("BRAVE_API_KEY")
- if not brave_api_key:
-     return {"error": "No BRAVE_API_KEY environment variable found. Please set it to use this function."}
+ if not self.api_key:
+     return {"error": "No BRAVE_API_KEY environment variable found. Please set it or pass the api_key to the constructor."}
country = region_mapping.get(region, "ALL")

headers = {"Accept": "application/json", "Accept-Encoding": "gzip", "X-Subscription-Token": brave_api_key}
To be consistent with using self.api_key from the constructor, this line should also be updated to use self.api_key instead of the local brave_api_key variable.
| headers = {"Accept": "application/json", "Accept-Encoding": "gzip", "X-Subscription-Token": brave_api_key} | |
| headers = {"Accept": "application/json", "Accept-Encoding": "gzip", "X-Subscription-Token": self.api_key} |
else:
    tool_calls = []
    for tool_call in result:
        try:
            evaled_tool_dict = json.loads(tool_call)
        except Exception:
            try:
                evaled_tool_dict = ast.literal_eval(tool_call)
            except Exception:
                evaled_tool_dict = None

        if evaled_tool_dict and evaled_tool_dict.get("name") and evaled_tool_dict.get("arguments"):
            tool_calls.append(evaled_tool_dict)

        content = model_output[: model_output.find(self.tool_call_start_token)]

    return {"tools_called": bool(tool_calls), "tool_calls": tool_calls, "content": content}
The calculation of content on line 104 is inside the for loop, which means it's being re-calculated on every iteration with the same result. This is inefficient. You should move this calculation outside and before the loop.
Suggested change:
  else:
      tool_calls = []
+     content = model_output[: model_output.find(self.tool_call_start_token)]
      for tool_call in result:
          try:
              evaled_tool_dict = json.loads(tool_call)
          except Exception:
              try:
                  evaled_tool_dict = ast.literal_eval(tool_call)
              except Exception:
                  evaled_tool_dict = None
          if evaled_tool_dict and evaled_tool_dict.get("name") and evaled_tool_dict.get("arguments"):
              tool_calls.append(evaled_tool_dict)
-         content = model_output[: model_output.find(self.tool_call_start_token)]
      return {"tools_called": bool(tool_calls), "tool_calls": tool_calls, "content": content}
| print("self.cfg.environment.env_class", self.cfg.environment.env_class) | ||
| generator: GeneratorInterface = self.get_generator(self.cfg, tokenizer, inference_engine_client) | ||
| print("GOT TO GENERATOR") |
These print statements appear to be for debugging purposes. They should be removed from the final code to keep the output clean. If this information is valuable, consider using the logger with an appropriate level (e.g., logger.debug).
| print("self.cfg.environment.env_class", self.cfg.environment.env_class) | |
| generator: GeneratorInterface = self.get_generator(self.cfg, tokenizer, inference_engine_client) | |
| print("GOT TO GENERATOR") | |
| generator: GeneratorInterface = self.get_generator(self.cfg, tokenizer, inference_engine_client) |
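If the information is worth keeping, a sketch of the logger-based alternative suggested above, reusing the names from the snippet and assuming a module-level logger; not the PR's actual code:

    import logging

    logger = logging.getLogger(__name__)

    # Emitted only when the log level is DEBUG, so normal runs stay quiet.
    logger.debug("env_class=%s", self.cfg.environment.env_class)
    generator: GeneratorInterface = self.get_generator(self.cfg, tokenizer, inference_engine_client)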
def _parse_action(self, action: str) -> List[Optional[str]]:
    match = None
    if "<search>" in action and "</search>" in action:
        match = re.search(r"<search>(.*?)</search>", action, re.DOTALL)
    return [match.group(1)] if match else [None]
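For context, a standalone rendering of the same logic with example inputs and outputs; the free-function name and examples are illustrative:

    import re
    from typing import List, Optional


    def parse_action(action: str) -> List[Optional[str]]:
        # Extract the query between <search> tags, if both tags are present.
        match = None
        if "<search>" in action and "</search>" in action:
            match = re.search(r"<search>(.*?)</search>", action, re.DOTALL)
        return [match.group(1)] if match else [None]


    print(parse_action("Let me check. <search>multi-turn tool use RL</search>"))  # ['multi-turn tool use RL']
    print(parse_action("no search tags here"))  # [None]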
trainer.eval_before_train=false \
generator.eval_sampling_params.temperature=0 \
generator.eval_sampling_params.stop='["</search>", "</answer>"]' \
trainer.export_path="$HOME/skyrl-search_6turns_maxgeneratelen_1536_Qwen3-8B/exports" \
There appears to be a typo in the trainer.export_path. It's set to a path containing skyrl-search, but other related paths and names use skyrl-browse. For consistency, this should probably be skyrl-browse.
Suggested change:
- trainer.export_path="$HOME/skyrl-search_6turns_maxgeneratelen_1536_Qwen3-8B/exports" \
+ trainer.export_path="$HOME/skyrl-browse_6turns_maxgeneratelen_1536_Qwen3-8B/exports" \
trainer.export_path="$HOME/skyrl-search_6turns_maxgeneratelen_1536_Qwen3-8B/exports" \
trainer.eval_interval=50 \
$@
(no newline at end of file)
trainer.eval_before_train=false \
generator.eval_sampling_params.temperature=0 \
generator.eval_sampling_params.stop='["</search>", "</answer>"]' \
trainer.export_path="$HOME/skyrl-search_6turns_maxgeneratelen_1536_Qwen3-8B_lora/exports" \
There appears to be a typo in the trainer.export_path. It's set to a path containing skyrl-search, but other related paths and names use skyrl-browse. For consistency, this should probably be skyrl-browse.
Suggested change:
- trainer.export_path="$HOME/skyrl-search_6turns_maxgeneratelen_1536_Qwen3-8B_lora/exports" \
+ trainer.export_path="$HOME/skyrl-browse_6turns_maxgeneratelen_1536_Qwen3-8B_lora/exports" \
trainer.export_path="$HOME/skyrl-search_6turns_maxgeneratelen_1536_Qwen3-8B_lora/exports" \
trainer.eval_interval=50 \
$@
(no newline at end of file)
# Close the environment
await self._run_in_executor_if_available(env.close)

# decoded_text = self.tokenizer.decode(input_ids)
This implements native function calling with true multi-turn interaction using the tool role. Parsing logic is implemented to parse the tool calls out of the model output, execute them, and feed the results back to the model in its native format.
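For readers unfamiliar with the flow, a minimal sketch of what feeding a tool result back "in its native format" typically looks like with OpenAI-style chat messages; the message contents and tool name are illustrative, not taken from this PR:

    import json

    # Assistant turn containing a parsed tool call.
    assistant_msg = {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "call_0",
                "type": "function",
                "function": {"name": "search", "arguments": json.dumps({"query": "multi-turn tool use RL"})},
            }
        ],
    }

    # The tool's output is appended with the tool role, referencing the call id,
    # and the extended message list is sent back for the next generation turn.
    tool_msg = {
        "role": "tool",
        "tool_call_id": "call_0",
        "content": json.dumps({"results": ["..."]}),
    }

    messages = [
        {"role": "user", "content": "Find recent work on multi-turn tool-use RL."},
        assistant_msg,
        tool_msg,
    ]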