Conversation

@samjia2000
Collaborator

…est, (2) enforcing 'max_tokens' on proxy server side, (3) switch to 'individual' mode in tool call scenarios, (4) fix prompt template and tool call parsing in completions.create function

Description

Fix proxy mode by:

  1. supporting 'extra_body' when creating completions requests
  2. enforcing 'max_tokens' on the proxy server side
  3. switching to 'individual' mode in tool call scenarios
  4. fixing the prompt template and tool call parsing in the completions.create function

Related Issue

Fixes #(issue)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

Additional Context

Training is validated on the Tau2 Airline domain:

[image]

Need help? Check the Contributing Guide or ask in
GitHub Discussions!

@gemini-code-assist
Contributor

Summary of Changes

Hello @samjia2000, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the proxy mode's capabilities by introducing support for 'extra_body' in completion requests and enforcing 'max_tokens' at the proxy level. It also refines the client's prompt templating and tool call parsing logic, ensuring more robust and flexible interactions with language models. Furthermore, the system now defaults to an 'individual' export mode for tool call scenarios and includes several minor bug fixes and configuration adjustments, particularly within the Tau2 example, to improve overall stability and control.

Highlights

  • Enhanced Proxy Request Handling: The proxy now supports an 'extra_body' field in completion requests, enabling more flexible parameter passing to the underlying client. It also enforces 'max_tokens' on the proxy server side, providing better control over generation length.
  • Improved Prompt Templating and Tool Call Parsing: The client's 'create' function has been refined to correctly handle prompt templates and tool call parsing across different chat template types ('hf' and 'concat'). A new generic '_ensure_dict' utility simplifies data serialization and validation.
  • Switch to 'Individual' Export Mode: The default export style for completion results has been changed from 'concat' to 'individual' in tool call scenarios, as reflected in the 'tau2_train.py' configuration, improving how results are stored and processed.
  • Refined Caching and Error Handling: Minor improvements were made to the caching mechanism, including a more informative warning for interaction ID mismatches and a change in how 'None' rewards are handled, now defaulting to 'Zero reward'.
  • Tau2 Example Updates: The Tau2 example configurations and agent behavior were updated to align with these changes, including disabling 'thinking' in the chat template and passing 'max_total_tokens' to the proxy for better control during training.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several important fixes and features, including support for extra_body in proxy mode, enforcement of max_tokens, and changes to prompt templating and tool call parsing. The changes generally improve the robustness and functionality of the system.

My review focuses on a few key areas:

  1. A critical bug in areal/experimental/openai/cache.py where filtering logic doesn't update the cache object correctly for the 'concat' export style.
  2. An opportunity to improve code clarity in areal/experimental/openai/client.py by removing an unused parameter from the new _ensure_dict function.
  3. A suggestion to improve the robustness of request parsing in areal/utils/proxy_utils.py by using the recommended Pydantic API.
  4. A minor issue with a weak assertion in the example examples/tau2/tau2_train.py.

Overall, the PR is a good step forward. Addressing these points will further improve the code quality and prevent potential bugs.

Comment on lines 201 to +210
 for id, interaction in self.items():
     if interaction.interaction_id != id:
-        raise ValueError(
-            f"Interaction ID mismatch: {interaction.interaction_id} != {id}"
+        logger.warning(
+            f"Interaction ID mismatch: {interaction.interaction_id} != {id}. It is possibly due to generation failure during trajectory generation."
         )
+cache = {
+    id: interaction
+    for id, interaction in cache.items()
+    if id == interaction.interaction_id
+}
Contributor


critical

The change from raise ValueError to logger.warning improves robustness. However, the current filtering implementation has a flaw. The cache variable is rebound to a new dict, but self (the InteractionCache instance) is not modified. The subsequent code for style == 'concat' still uses self, which contains the unfiltered interactions, leading to incorrect behavior. For style == 'individual', it correctly uses the filtered cache.

To fix this, you should modify self in-place. The suggested change also makes the filtering more efficient by using a single pass.

        for id, interaction in list(self.items()):
            if interaction.interaction_id != id:
                logger.warning(
                    f"Interaction ID mismatch: {interaction.interaction_id} != {id}. It is possibly due to generation failure during trajectory generation. Removing it."
                )
                del self[id]
        cache = self
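The pitfall the reviewer describes can be reproduced in isolation: rebinding a local name leaves the original object (and therefore self) untouched, while deleting keys in place filters every reference. A minimal sketch, where InteractionCache and both filter functions are illustrative stand-ins rather than the repo's code:

```python
class InteractionCache(dict):
    """Illustrative stand-in for the real cache; a plain dict subclass suffices here."""

def filter_by_rebinding(cache):
    # Rebinds the *local name* 'cache' to a new dict; the caller's object is untouched.
    cache = {k: v for k, v in cache.items() if k == v.get("interaction_id")}
    return cache

def filter_in_place(cache):
    # Deletes mismatched entries from the object itself, so 'self' sees the filtered view.
    for k, v in list(cache.items()):
        if k != v.get("interaction_id"):
            del cache[k]
    return cache

good = {"interaction_id": "a"}
bad = {"interaction_id": "zzz"}

c1 = InteractionCache({"a": good, "b": bad})
filter_by_rebinding(c1)
print(len(c1))  # → 2: the mismatched entry is still present in the original object

c2 = InteractionCache({"a": good, "b": bad})
filter_in_place(c2)
print(len(c2))  # → 1: only the consistent entry remains
```

This is why the suggested fix mutates self and only then sets cache = self: any later branch that reads self (the 'concat' path) sees the same filtered data.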

Comment on lines +61 to +74
def _ensure_dict(
    name: str,
    item: Any,
) -> Any:
    _item = None
    if isinstance(item, dict):
        _item = {k: _ensure_dict(name, v) for k, v in item.items() if v is not None}
    elif isinstance(item, BaseModel):
        _item = item.model_dump(exclude_none=True, mode="json")
    elif type(item).__name__ == "ValidatorIterator" or isinstance(item, list):
        _item = [_ensure_dict(name, i) for i in item]
    else:
        _item = item
    return _item
Contributor


medium

The name parameter in _ensure_dict is not used. It's passed in recursive calls but its value is never read. It should be removed to clean up the code. The call sites at lines 209, 236, and 518 in this file should be updated accordingly to _ensure_dict(messages_list_raw), _ensure_dict(tools), and _ensure_dict(input) respectively.

Suggested change

-def _ensure_dict(
-    name: str,
-    item: Any,
-) -> Any:
-    _item = None
-    if isinstance(item, dict):
-        _item = {k: _ensure_dict(name, v) for k, v in item.items() if v is not None}
-    elif isinstance(item, BaseModel):
-        _item = item.model_dump(exclude_none=True, mode="json")
-    elif type(item).__name__ == "ValidatorIterator" or isinstance(item, list):
-        _item = [_ensure_dict(name, i) for i in item]
-    else:
-        _item = item
-    return _item
+def _ensure_dict(
+    item: Any,
+) -> Any:
+    _item = None
+    if isinstance(item, dict):
+        _item = {k: _ensure_dict(v) for k, v in item.items() if v is not None}
+    elif isinstance(item, BaseModel):
+        _item = item.model_dump(exclude_none=True, mode="json")
+    elif type(item).__name__ == "ValidatorIterator" or isinstance(item, list):
+        _item = [_ensure_dict(i) for i in item]
+    else:
+        _item = item
+    return _item
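For reference, a helper of this shape can be exercised on nested message data. The sketch below duck-types the BaseModel branch via hasattr so it runs without pydantic installed; the function name and sample data are illustrative, not the repo's exact code:

```python
from typing import Any

def ensure_dict(item: Any) -> Any:
    """Recursively convert models to plain dicts, dropping None-valued keys (sketch of the suggested helper)."""
    if isinstance(item, dict):
        return {k: ensure_dict(v) for k, v in item.items() if v is not None}
    if hasattr(item, "model_dump"):  # duck-typed stand-in for the pydantic BaseModel check
        return ensure_dict(item.model_dump(exclude_none=True, mode="json"))
    if isinstance(item, list):
        return [ensure_dict(i) for i in item]
    return item

messages = [{"role": "user", "content": "hi", "name": None,
             "tool_calls": [{"id": "call_0", "arguments": None}]}]
print(ensure_dict(messages))
# → [{'role': 'user', 'content': 'hi', 'tool_calls': [{'id': 'call_0'}]}]
```

Note that None-valued keys are pruned at every nesting level, which matches what exclude_none=True does for pydantic models.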

Comment on lines +240 to +243
known_fields = {k: v for k, v in data.items() if k in cls.__annotations__.keys()}

# Extract extra fields
extra_body = {k: v for k, v in data.items() if k not in cls.__annotations__.keys()}
Contributor


medium

Using cls.__annotations__.keys() to get the fields of a Pydantic model is not robust. It might not work correctly with field aliases, inherited fields, or other advanced Pydantic features. The recommended way to get a model's fields in Pydantic v2+ is to use cls.model_fields.

Suggested change

-known_fields = {k: v for k, v in data.items() if k in cls.__annotations__.keys()}
-# Extract extra fields
-extra_body = {k: v for k, v in data.items() if k not in cls.__annotations__.keys()}
+known_fields = {k: v for k, v in data.items() if k in cls.model_fields}
+# Extract extra fields
+extra_body = {k: v for k, v in data.items() if k not in cls.model_fields}
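The reviewer's point can be seen without pydantic at all: a class's __annotations__ only covers fields declared on that class itself, so inherited fields silently fall through, whereas pydantic's cls.model_fields includes inherited and aliased fields. A plain-class illustration (the class names are hypothetical):

```python
class BaseRequest:
    max_tokens: int  # declared on the parent

class CompletionRequest(BaseRequest):
    model: str  # declared on the child

# __annotations__ is per-class: the inherited field is invisible here,
# so a filter built on it would misroute 'max_tokens' into extra_body.
print("model" in CompletionRequest.__annotations__)       # → True
print("max_tokens" in CompletionRequest.__annotations__)  # → False
```

With a request-model hierarchy, that misrouting would quietly move valid fields like max_tokens into extra_body, which is exactly the kind of bug model_fields avoids.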

Comment on lines +188 to +192
assert len(run_infos) == len(rewards), (
    len(run_infos),
    len(rewards),
    self.group_size,
)
Contributor


medium

The assertion len(run_infos) == len(rewards) is likely always true, as both run_infos and rewards are derived from self.group_size. This doesn't effectively check if the number of results matches the expected group size. A more explicit assertion against self.group_size with a descriptive message would be more useful.

        assert len(run_infos) == self.group_size and len(rewards) == self.group_size, (
            f"Expected {self.group_size} run_infos and rewards, but got {len(run_infos)} and {len(rewards)}"
        )

@garrett4wade garrett4wade marked this pull request as draft December 30, 2025 03:05
@samjia2000 samjia2000 requested a review from nuzant December 30, 2025 09:18
Collaborator

@nuzant nuzant left a comment


This PR may have conflict with changes introduced in #736 . I can take over and fork a new branch to fix the conflict, then post a new PR to merge changes introduced by #740 and this PR into main.

Comment on lines +203 to +204
logger.warning(
    f"Interaction ID mismatch: {interaction.interaction_id} != {id}. It is possibly due to generation failure during trajectory generation."
Collaborator


Is this only happening in individual mode? It seems this should never happen. If generation fails, the interaction should not be added into the cache.

Collaborator Author

@samjia2000 samjia2000 Dec 31, 2025


> Is this only happening in individual mode? It seems this should never happen. If generation fails, the interaction should not be added into the cache.

Generation could fail when enforcing max_tokens on the proxy server side.
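One way such a failure can arise: if the proxy clamps each request's max_tokens to the remaining token budget, a request that arrives after the budget is exhausted cannot generate at all. The sketch below is a hypothetical illustration of that policy; enforce_max_tokens, max_total_tokens, and the error handling are assumptions, not the repo's actual implementation:

```python
def enforce_max_tokens(request: dict, used_tokens: int, max_total_tokens: int) -> dict:
    """Hypothetical proxy-side clamp: never let a request exceed the remaining budget."""
    remaining = max_total_tokens - used_tokens
    if remaining <= 0:
        # Generation fails here even though the client sent a well-formed request,
        # which is how a trajectory can end up with a mismatched cached interaction.
        raise RuntimeError("token budget exhausted on proxy side")
    requested = request.get("max_tokens") or remaining
    return {**request, "max_tokens": min(requested, remaining)}

clamped = enforce_max_tokens({"max_tokens": 512}, used_tokens=3900, max_total_tokens=4096)
print(clamped["max_tokens"])  # → 196
```

Under this policy the client's requested 512 tokens are silently reduced to the 196 remaining, and a fully exhausted budget surfaces as a hard failure mid-trajectory.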

Comment on lines +61 to +73
def _ensure_dict(
    name: str,
    item: Any,
) -> Any:
    _item = None
    if isinstance(item, dict):
        _item = {k: _ensure_dict(name, v) for k, v in item.items() if v is not None}
    elif isinstance(item, BaseModel):
        _item = item.model_dump(exclude_none=True, mode="json")
    elif type(item).__name__ == "ValidatorIterator" or isinstance(item, list):
        _item = [_ensure_dict(name, i) for i in item]
    else:
        _item = item
Collaborator


We could rewrite a new ensure_input_type function and remove _ensure_message_dict_list.

Collaborator Author


> We could rewrite a new ensure_input_type function and remove _ensure_message_dict_list.

Agree
