Conversation

@samjia2000
Collaborator

…est, (2) enforcing 'max_tokens' on proxy server side, (3) switch to 'individual' mode in tool call scenarios, (4) fix prompt template and tool call parsing in completions.create function

Description

Fix proxy mode by:

  1. supporting 'extra_body' when creating completions requests
  2. enforcing 'max_tokens' on the proxy server side
  3. switching to 'individual' mode in tool call scenarios
  4. fixing the prompt template and tool call parsing in the completions.create function

Related Issue

Fixes #(issue)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

Additional Context

Training is validated on the Tau2 Airline domain:

[image]

Need help? Check the Contributing Guide or ask in
GitHub Discussions!

@gemini-code-assist
Contributor

Summary of Changes

Hello @samjia2000, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the proxy mode's capabilities by introducing support for 'extra_body' in completion requests and enforcing 'max_tokens' at the proxy level. It also refines the client's prompt templating and tool call parsing logic, ensuring more robust and flexible interactions with language models. Furthermore, the system now defaults to an 'individual' export mode for tool call scenarios and includes several minor bug fixes and configuration adjustments, particularly within the Tau2 example, to improve overall stability and control.

Highlights

  • Enhanced Proxy Request Handling: The proxy now supports an 'extra_body' field in completion requests, enabling more flexible parameter passing to the underlying client. It also enforces 'max_tokens' on the proxy server side, providing better control over generation length.
  • Improved Prompt Templating and Tool Call Parsing: The client's 'create' function has been refined to correctly handle prompt templates and tool call parsing across different chat template types ('hf' and 'concat'). A new generic '_ensure_dict' utility simplifies data serialization and validation.
  • Switch to 'Individual' Export Mode: The default export style for completion results has been changed from 'concat' to 'individual' in tool call scenarios, as reflected in the 'tau2_train.py' configuration, improving how results are stored and processed.
  • Refined Caching and Error Handling: Minor improvements were made to the caching mechanism, including a more informative warning for interaction ID mismatches and a change in how 'None' rewards are handled, now defaulting to 'Zero reward'.
  • Tau2 Example Updates: The Tau2 example configurations and agent behavior were updated to align with these changes, including disabling 'thinking' in the chat template and passing 'max_total_tokens' to the proxy for better control during training.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several important fixes and features, including support for extra_body in proxy mode, enforcement of max_tokens, and changes to prompt templating and tool call parsing. The changes generally improve the robustness and functionality of the system.

My review focuses on a few key areas:

  1. A critical bug in areal/experimental/openai/cache.py where filtering logic doesn't update the cache object correctly for the 'concat' export style.
  2. An opportunity to improve code clarity in areal/experimental/openai/client.py by removing an unused parameter from the new _ensure_dict function.
  3. A suggestion to improve the robustness of request parsing in areal/utils/proxy_utils.py by using the recommended Pydantic API.
  4. A minor issue with a weak assertion in the example examples/tau2/tau2_train.py.

Overall, the PR is a good step forward. Addressing these points will further improve the code quality and prevent potential bugs.

Comment on lines 201 to +210
 for id, interaction in self.items():
     if interaction.interaction_id != id:
-        raise ValueError(
-            f"Interaction ID mismatch: {interaction.interaction_id} != {id}"
+        logger.warning(
+            f"Interaction ID mismatch: {interaction.interaction_id} != {id}. It is possibly due to generation failure during trajectory generation."
         )
+cache = {
+    id: interaction
+    for id, interaction in cache.items()
+    if id == interaction.interaction_id
+}
Contributor


critical

The change from raise ValueError to logger.warning improves robustness. However, the current filtering implementation has a flaw. The cache variable is rebound to a new dict, but self (the InteractionCache instance) is not modified. The subsequent code for style == 'concat' still uses self, which contains the unfiltered interactions, leading to incorrect behavior. For style == 'individual', it correctly uses the filtered cache.

To fix this, you should modify self in-place. The suggested change also makes the filtering more efficient by using a single pass.

        for id, interaction in list(self.items()):
            if interaction.interaction_id != id:
                logger.warning(
                    f"Interaction ID mismatch: {interaction.interaction_id} != {id}. It is possibly due to generation failure during trajectory generation. Removing it."
                )
                del self[id]
        cache = self
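The pitfall the reviewer describes can be reproduced in isolation: rebinding a local name leaves the original object (and therefore self) untouched, while deleting keys in place filters every reference. A minimal sketch, where InteractionCache and both filter functions are illustrative stand-ins rather than the repo's code:

```python
class InteractionCache(dict):
    """Illustrative stand-in for the real cache; a plain dict subclass suffices here."""

def filter_by_rebinding(cache):
    # Rebinds the *local name* 'cache' to a new dict; the caller's object is untouched.
    cache = {k: v for k, v in cache.items() if k == v.get("interaction_id")}
    return cache

def filter_in_place(cache):
    # Deletes mismatched entries from the object itself, so 'self' sees the filtered view.
    for k, v in list(cache.items()):
        if k != v.get("interaction_id"):
            del cache[k]
    return cache

good = {"interaction_id": "a"}
bad = {"interaction_id": "zzz"}

c1 = InteractionCache({"a": good, "b": bad})
filter_by_rebinding(c1)
print(len(c1))  # → 2: the mismatched entry is still present in the original object

c2 = InteractionCache({"a": good, "b": bad})
filter_in_place(c2)
print(len(c2))  # → 1: only the consistent entry remains
```

This is why the suggested fix mutates self and only then sets cache = self: any later branch that reads self (the 'concat' path) sees the same filtered data.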

Comment on lines +61 to +74
def _ensure_dict(
    name: str,
    item: Any,
) -> Any:
    _item = None
    if isinstance(item, dict):
        _item = {k: _ensure_dict(name, v) for k, v in item.items() if v is not None}
    elif isinstance(item, BaseModel):
        _item = item.model_dump(exclude_none=True, mode="json")
    elif type(item).__name__ == "ValidatorIterator" or isinstance(item, list):
        _item = [_ensure_dict(name, i) for i in item]
    else:
        _item = item
    return _item
Contributor


medium

The name parameter in _ensure_dict is not used. It's passed in recursive calls but its value is never read. It should be removed to clean up the code. The call sites at lines 209, 236, and 518 in this file should be updated accordingly to _ensure_dict(messages_list_raw), _ensure_dict(tools), and _ensure_dict(input) respectively.

Suggested change

-def _ensure_dict(
-    name: str,
-    item: Any,
-) -> Any:
-    _item = None
-    if isinstance(item, dict):
-        _item = {k: _ensure_dict(name, v) for k, v in item.items() if v is not None}
-    elif isinstance(item, BaseModel):
-        _item = item.model_dump(exclude_none=True, mode="json")
-    elif type(item).__name__ == "ValidatorIterator" or isinstance(item, list):
-        _item = [_ensure_dict(name, i) for i in item]
-    else:
-        _item = item
-    return _item
+def _ensure_dict(
+    item: Any,
+) -> Any:
+    _item = None
+    if isinstance(item, dict):
+        _item = {k: _ensure_dict(v) for k, v in item.items() if v is not None}
+    elif isinstance(item, BaseModel):
+        _item = item.model_dump(exclude_none=True, mode="json")
+    elif type(item).__name__ == "ValidatorIterator" or isinstance(item, list):
+        _item = [_ensure_dict(i) for i in item]
+    else:
+        _item = item
+    return _item
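For reference, a helper of this shape can be exercised on nested message data. The sketch below duck-types the BaseModel branch via hasattr so it runs without pydantic installed; the function name and sample data are illustrative, not the repo's exact code:

```python
from typing import Any

def ensure_dict(item: Any) -> Any:
    """Recursively convert models to plain dicts, dropping None-valued keys (sketch of the suggested helper)."""
    if isinstance(item, dict):
        return {k: ensure_dict(v) for k, v in item.items() if v is not None}
    if hasattr(item, "model_dump"):  # duck-typed stand-in for the pydantic BaseModel check
        return ensure_dict(item.model_dump(exclude_none=True, mode="json"))
    if isinstance(item, list):
        return [ensure_dict(i) for i in item]
    return item

messages = [{"role": "user", "content": "hi", "name": None,
             "tool_calls": [{"id": "call_0", "arguments": None}]}]
print(ensure_dict(messages))
# → [{'role': 'user', 'content': 'hi', 'tool_calls': [{'id': 'call_0'}]}]
```

Note that None-valued keys are pruned at every nesting level, which matches what exclude_none=True does for pydantic models.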

Comment on lines +240 to +243
known_fields = {k: v for k, v in data.items() if k in cls.__annotations__.keys()}

# Extract extra fields
extra_body = {k: v for k, v in data.items() if k not in cls.__annotations__.keys()}
Contributor


medium

Using cls.__annotations__.keys() to get the fields of a Pydantic model is not robust. It might not work correctly with field aliases, inherited fields, or other advanced Pydantic features. The recommended way to get a model's fields in Pydantic v2+ is to use cls.model_fields.

Suggested change

-known_fields = {k: v for k, v in data.items() if k in cls.__annotations__.keys()}
-# Extract extra fields
-extra_body = {k: v for k, v in data.items() if k not in cls.__annotations__.keys()}
+known_fields = {k: v for k, v in data.items() if k in cls.model_fields}
+# Extract extra fields
+extra_body = {k: v for k, v in data.items() if k not in cls.model_fields}
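The reviewer's point can be seen without pydantic at all: a class's __annotations__ only covers fields declared on that class itself, so inherited fields silently fall through, whereas pydantic's cls.model_fields includes inherited and aliased fields. A plain-class illustration (the class names are hypothetical):

```python
class BaseRequest:
    max_tokens: int  # declared on the parent

class CompletionRequest(BaseRequest):
    model: str  # declared on the child

# __annotations__ is per-class: the inherited field is invisible here,
# so a filter built on it would misroute 'max_tokens' into extra_body.
print("model" in CompletionRequest.__annotations__)       # → True
print("max_tokens" in CompletionRequest.__annotations__)  # → False
```

With a request-model hierarchy, that misrouting would quietly move valid fields like max_tokens into extra_body, which is exactly the kind of bug model_fields avoids.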

Comment on lines +188 to +192
assert len(run_infos) == len(rewards), (
    len(run_infos),
    len(rewards),
    self.group_size,
)
Contributor


medium

The assertion len(run_infos) == len(rewards) is likely always true, as both run_infos and rewards are derived from self.group_size. This doesn't effectively check if the number of results matches the expected group size. A more explicit assertion against self.group_size with a descriptive message would be more useful.

        assert len(run_infos) == self.group_size and len(rewards) == self.group_size, (
            f"Expected {self.group_size} run_infos and rewards, but got {len(run_infos)} and {len(rewards)}"
        )

@garrett4wade garrett4wade marked this pull request as draft December 30, 2025 03:05
@samjia2000 samjia2000 requested a review from nuzant December 30, 2025 09:18
Collaborator

@nuzant nuzant left a comment


This PR may have conflict with changes introduced in #736 . I can take over and fork a new branch to fix the conflict, then post a new PR to merge changes introduced by #740 and this PR into main.

Comment on lines +203 to +204
logger.warning(
    f"Interaction ID mismatch: {interaction.interaction_id} != {id}. It is possibly due to generation failure during trajectory generation."
Collaborator


Is this only happening in individual mode? It seems this should never happen. If generation fails, the interaction should not be added into the cache.

Collaborator Author

@samjia2000 samjia2000 Dec 31, 2025


> Is this only happening in individual mode? It seems this should never happen. If generation fails, the interaction should not be added into the cache.

Generation could fail when enforcing max_tokens on the proxy server side.
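One way such a failure can arise: if the proxy clamps each request's max_tokens to the remaining token budget, a request that arrives after the budget is exhausted cannot generate at all. The sketch below is a hypothetical illustration of that policy; enforce_max_tokens, max_total_tokens, and the error handling are assumptions, not the repo's actual implementation:

```python
def enforce_max_tokens(request: dict, used_tokens: int, max_total_tokens: int) -> dict:
    """Hypothetical proxy-side clamp: never let a request exceed the remaining budget."""
    remaining = max_total_tokens - used_tokens
    if remaining <= 0:
        # Generation fails here even though the client sent a well-formed request,
        # which is how a trajectory can end up with a mismatched cached interaction.
        raise RuntimeError("token budget exhausted on proxy side")
    requested = request.get("max_tokens") or remaining
    return {**request, "max_tokens": min(requested, remaining)}

clamped = enforce_max_tokens({"max_tokens": 512}, used_tokens=3900, max_total_tokens=4096)
print(clamped["max_tokens"])  # → 196
```

Under this policy the client's requested 512 tokens are silently reduced to the 196 remaining, and a fully exhausted budget surfaces as a hard failure mid-trajectory.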

Comment on lines +61 to +73
def _ensure_dict(
    name: str,
    item: Any,
) -> Any:
    _item = None
    if isinstance(item, dict):
        _item = {k: _ensure_dict(name, v) for k, v in item.items() if v is not None}
    elif isinstance(item, BaseModel):
        _item = item.model_dump(exclude_none=True, mode="json")
    elif type(item).__name__ == "ValidatorIterator" or isinstance(item, list):
        _item = [_ensure_dict(name, i) for i in item]
    else:
        _item = item
Collaborator


We could rewrite a new ensure_input_type function and remove _ensure_message_dict_list.

Collaborator Author


> We could rewrite a new ensure_input_type function and remove _ensure_message_dict_list.

Agree
