
@emparu commented on Oct 1, 2025

…ate_step defaults

  1. Bug Fix: TypeError during generation for text-only models

The Problem:
When using a Gemma3CausalLM model configured for text-only processing (i.e., with vision_encoder=None and preprocessor=None), a call to causal_lm.generate() fails with a TypeError.

The root cause is that the internal generate_step method returns a dictionary containing an 'images': None key-value pair. This None value is eventually passed to ops.concatenate during the output normalization step, which does not accept None as a valid input. This workflow is common when pretraining a model from scratch.
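For illustration, here is a minimal sketch of the failure mode using keras.ops directly (this is an illustration, not the library's exact internals):

```python
from keras import ops

# Per-step outputs as described above: a text-only run still carries an
# 'images': None entry in each step's output dictionary.
step_outputs = [
    {"token_ids": ops.ones((1, 4), "int32"), "images": None},
    {"token_ids": ops.ones((1, 4), "int32"), "images": None},
]

# The output normalization step conceptually concatenates each field across
# steps; None is not a valid tensor input, so this fails with the TypeError
# described above.
ops.concatenate([out["images"] for out in step_outputs], axis=0)
```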

The Fix:
The generate_step method has been modified to only include the 'images' key in its returned dictionary if an image tensor is actually present. This ensures that a None value is never passed to downstream functions, resolving the TypeError.
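A minimal sketch of the fixed behavior (the helper name here is illustrative; the real change lives inside generate_step):

```python
def build_generate_output(token_ids, padding_mask, images=None):
    # Start with the fields every model configuration produces.
    out = {"token_ids": token_ids, "padding_mask": padding_mask}
    # Only multimodal runs attach an image tensor; text-only runs omit the
    # key entirely instead of passing 'images': None downstream.
    if images is not None:
        out["images"] = images
    return out
```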

Proof of Bug and Fix:
The following Colab notebook demonstrates the bug with the original code and shows the successful execution after applying this fix: https://colab.research.google.com/drive/1QVk2idB6fcdYYJb1cBQGaKHe5QSGjCti?usp=sharing

  2. Refactoring: Remove Hardcoded Stop Token

The Problem:
The internal generate_step method has a hardcoded default stop_token_ids=[106], which corresponds to the <end_of_turn> token. This is conceptually incorrect for a base architectural model, as the model itself should not have opinions about instruction-following or conversational tokens. This hardcoded value can interfere with pretraining or sampling raw text.

The Fix:
The method signature has been changed from stop_token_ids=[106] to stop_token_ids=None.

This is a safe, non-breaking change because the public-facing Gemma3CausalLM.generate() method is already responsible for setting the appropriate stop tokens when a user specifies stop_token_ids="auto".
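Abridged before/after signatures, plus a hypothetical sketch of the "auto" resolution that generate() performs (the resolver below is an assumption for illustration; the exact lookup in Keras Hub may differ):

```python
# Before: the base method hardcoded a conversational token id.
def generate_step(self, inputs, stop_token_ids=[106]):  # <end_of_turn>
    ...

# After: the base method carries no opinion about stop tokens.
def generate_step(self, inputs, stop_token_ids=None):
    ...

# generate() remains responsible for resolving "auto" via the tokenizer,
# so conversational stop tokens are applied only when the user asks for them.
def resolve_stop_token_ids(stop_token_ids, tokenizer):
    if stop_token_ids == "auto":
        return [tokenizer.token_to_id("<end_of_turn>")]
    return stop_token_ids
```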

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

@github-actions bot added the Gemma label (Gemma model specific issues) on Oct 1, 2025
@gemini-code-assist bot commented:
Summary of Changes

Hello @emparu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses two key areas within the Gemma3CausalLM model's generation logic. It fixes a TypeError that arose when using the model in a text-only configuration by refining how image-related outputs are handled. Additionally, it refactors the internal generate_step method to remove an inappropriate hardcoded stop token, making the model's behavior more flexible and architecturally sound for various use cases, including pretraining.

Highlights

  • Bug Fix: TypeError in text-only generation: Resolved a TypeError that occurred when using Gemma3CausalLM for text-only processing. Previously, the generate_step method would return an 'images': None entry, which caused ops.concatenate to fail. The fix ensures the 'images' key is only included in the output dictionary if an image tensor is actually present, preventing None from being passed to downstream functions.
  • Refactoring: Removed hardcoded stop token: The generate_step method no longer has a hardcoded default stop_token_ids=[106]. This default, corresponding to <end_of_turn>, was conceptually incorrect for a base architectural model and could interfere with pretraining. The signature has been changed to stop_token_ids=None, as the public generate() method already handles setting appropriate stop tokens.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces two valuable improvements to the Gemma3CausalLM model. First, it corrects a TypeError that occurred during text-only generation by ensuring the images key is only included in the output dictionary when an image is present. Second, it refactors the generate_step method to remove a hardcoded stop_token_ids default value, correctly changing it to None. This change not only makes the base model more generic but also resolves a potential issue with mutable default arguments. The changes are well-implemented and align with the project's coding standards. The pull request is well-documented with a clear explanation of the problem and the fix.
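For context on the mutable-default point: a Python list default is created once at function definition time, so any in-place mutation leaks across calls. A minimal illustration:

```python
def collect(item, bucket=[]):  # one shared list for every call
    bucket.append(item)
    return bucket

print(collect(1))  # [1]
print(collect(2))  # [1, 2]  <- state leaked from the first call

def collect_safe(item, bucket=None):  # idiomatic: default to None
    bucket = [] if bucket is None else bucket
    bucket.append(item)
    return bucket
```

Changing the default from [106] to None sidesteps this class of bug in addition to removing the conversational-token assumption.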
