Conversation

@mayank31398
Collaborator

No description provided.

Signed-off-by: Mayank Mishra <[email protected]>
Signed-off-by: Mayank Mishra <[email protected]>
Signed-off-by: Mayank Mishra <[email protected]>
Signed-off-by: Mayank Mishra <[email protected]>
@gemini-code-assist
Contributor

Summary of Changes

Hello @mayank31398, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds optional causal convolution to the GRU and RNN sequence mixer blocks. Through new configurable kernel_size and activation_function parameters, the recurrent layers can apply a causal convolutional filter to their inputs, improving their ability to capture local temporal dependencies while preserving the sequential processing nature of RNNs and GRUs. This provides greater flexibility in designing and optimizing sequence models.

Highlights

  • Configuration Updates: New kernel_size and activation_function parameters have been added to the _GRUArgs and _RNNArgs configuration classes, allowing for the specification of causal convolution properties.
  • Causal Convolution Integration: The GRU and RNN sequence mixer blocks now conditionally incorporate a ParameterizedConv1d layer and utilize a causal_convolution function in their forward pass when a kernel_size is provided. This enables these recurrent layers to perform causal convolutional filtering on their inputs.
  • State Management Enhancement: The cache mechanism for both GRU and RNN layers has been updated to store and retrieve separate states for the convolutional component (conv_state) and the recurrent component (ssm_state), ensuring proper state propagation during sequence generation. A minimal sketch of the combined flow follows this list.
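To make this concrete, here is a minimal, self-contained sketch of a depthwise causal convolution in front of a GRU. The class name, the use of torch.nn.Conv1d instead of the repository's ParameterizedConv1d, and the SiLU activation are illustrative assumptions only; the actual implementation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalConvGRUSketch(nn.Module):
    """Hypothetical illustration: a depthwise causal conv feeding a GRU."""

    def __init__(self, hidden_size: int, kernel_size: int | None) -> None:
        super().__init__()
        self.kernel_size = kernel_size
        if kernel_size is not None:
            # depthwise conv; (kernel_size - 1) positions of left context, no look-ahead
            self.conv1d = nn.Conv1d(
                hidden_size, hidden_size, kernel_size,
                groups=hidden_size, padding=kernel_size - 1, bias=False,
            )
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_size)
        if self.kernel_size is not None:
            seq_len = x.shape[1]
            y = self.conv1d(x.transpose(1, 2))  # (batch, hidden_size, seq_len + kernel_size - 1)
            y = y[..., :seq_len]                # trim the right overhang so position t only sees <= t
            x = F.silu(y).transpose(1, 2)       # stand-in for the configurable activation_function
        out, _ = self.gru(x)
        return out


# usage: the convolution is skipped entirely when kernel_size is None
block = CausalConvGRUSketch(hidden_size=64, kernel_size=4)
out = block(torch.randn(2, 16, 64))  # -> (2, 16, 64)
```

During incremental decoding, the last kernel_size - 1 inputs would be carried as a conv_state alongside the recurrent hidden state, which is what the dual-state cache changes are about.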

@gemini-code-assist bot left a comment

Code Review

This pull request adds support for causal convolution to the GRU and RNN sequence mixer blocks. The changes look good overall, introducing kernel_size and activation_function parameters to enable this feature. However, I've identified a couple of critical issues in both gru.py and rnn.py that need to be addressed. There's a line that will cause an AttributeError when convolution is disabled, and there's a fundamental mismatch with the GenerationCache implementation which will lead to TypeErrors. The cache logic needs to be updated to handle the new dual-state (convolution and recurrent) system.


self.reset_parameters()

mark_parameter_as_mup_learning_rate(self.conv1d.weight)

critical

This line will cause an AttributeError when kernel_size is None, as self.conv1d is only initialized within the else block where kernel_size is not None (lines 103-114).

Additionally, this call is redundant, as self.conv1d.weight is already marked for muP learning rate on line 116. Please remove this line to fix the crash and avoid duplication.
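A small sketch of the guarded initialization this comment is asking for, with torch.nn.Conv1d standing in for ParameterizedConv1d and the repository's muP helper shown only as a comment; the constructor arguments here are illustrative assumptions:

```python
import torch.nn as nn


class _GuardedConvInitSketch(nn.Module):
    """Hypothetical sketch: the muP marking lives inside the branch that creates conv1d."""

    def __init__(self, hidden_size: int, kernel_size: int | None) -> None:
        super().__init__()
        if kernel_size is None:
            self.conv1d = None
        else:
            self.conv1d = nn.Conv1d(
                hidden_size, hidden_size, kernel_size,
                groups=hidden_size, padding=kernel_size - 1, bias=False,
            )
            # mark_parameter_as_mup_learning_rate(self.conv1d.weight)  # repo helper; only here
        self.reset_parameters()
        # no second, unconditional mark_parameter_as_mup_learning_rate(self.conv1d.weight) here

    def reset_parameters(self) -> None:
        if self.conv1d is not None:
            nn.init.kaiming_uniform_(self.conv1d.weight)
```

This keeps the kernel_size=None path free of any access to self.conv1d.weight and avoids marking the weight twice.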

)
x = pack_sequence(inputs=x, cu_seqlens=cu_seqlens)

c, h = (None, None) if cache_params is None else cache_params.get_cache(self.layer_idx)

critical

There seems to be a mismatch with the GenerationCache implementation for RNN/GRU layers (_RNNCache).

  1. This line attempts to unpack two values (c, h) from cache_params.get_cache(self.layer_idx). However, the current _RNNCache.get_cache returns a single tensor, which will lead to a TypeError during unpacking.
  2. Similarly, on line 209, cache_params.update is called with conv_state and ssm_state arguments, but the _RNNCache.update method does not accept these keyword arguments, which will also cause a TypeError.

It appears the _RNNCache class needs to be updated to store and manage two separate states (one for the convolution and one for the GRU state) when causal convolution is used.
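A minimal sketch of what such a dual-state cache could look like; the class and method names mirror the ones discussed above, but the actual _RNNCache interface in this repository may differ:

```python
import torch


class _RNNCacheSketch:
    """Hypothetical per-layer cache holding both a conv state and a recurrent state."""

    def __init__(self, num_layers: int) -> None:
        self.conv_states: list[torch.Tensor | None] = [None] * num_layers
        self.ssm_states: list[torch.Tensor | None] = [None] * num_layers

    def get_cache(self, layer_idx: int) -> tuple[torch.Tensor | None, torch.Tensor | None]:
        # returning a pair lets call sites unpack: c, h = cache_params.get_cache(layer_idx)
        return self.conv_states[layer_idx], self.ssm_states[layer_idx]

    def update(
        self,
        layer_idx: int,
        conv_state: torch.Tensor | None = None,
        ssm_state: torch.Tensor | None = None,
    ) -> None:
        # accepts the keyword arguments used at the call sites flagged in this review
        if conv_state is not None:
            self.conv_states[layer_idx] = conv_state
        if ssm_state is not None:
            self.ssm_states[layer_idx] = ssm_state
```

With this shape, both the two-value unpacking and the update(conv_state=..., ssm_state=...) calls mentioned above would work.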


self.reset_parameters()

mark_parameter_as_mup_learning_rate(self.conv1d.weight)

critical

This line will raise an AttributeError if kernel_size is None, because self.conv1d is only defined within the conditional block (lines 78-89) where kernel_size is not None.

This call is also redundant, as self.conv1d.weight is already marked on line 91. Removing this line will fix the potential crash and the code duplication.

x = pack_sequence(inputs=x, cu_seqlens=cu_seqlens)

input_state = None if cache_params is None else cache_params.get_cache(self.layer_idx)
c, h = (None, None) if cache_params is None else cache_params.get_cache(self.layer_idx)

critical

There's a mismatch with the _RNNCache implementation that will cause runtime errors.

  1. This line attempts to unpack two values (c, h) from cache_params.get_cache(self.layer_idx), but _RNNCache.get_cache returns a single tensor, which will cause a TypeError.
  2. On line 170, cache_params.update is called with conv_state and ssm_state, but the _RNNCache.update method signature doesn't support these arguments, leading to another TypeError.

The _RNNCache needs to be updated to handle separate states for the convolution and the RNN when causal convolution is enabled.

Signed-off-by: Mayank Mishra <[email protected]>
@mayank31398 merged commit 390d4d9 into main on Jan 8, 2026
2 checks passed
@mayank31398 deleted the tuner branch on January 8, 2026 at 01:21