Conversation

@mayank31398
Collaborator

No description provided.

Signed-off-by: Mayank Mishra <[email protected]>
Signed-off-by: Mayank Mishra <[email protected]>
Signed-off-by: Mayank Mishra <[email protected]>
Signed-off-by: Mayank Mishra <[email protected]>
@gemini-code-assist
Contributor

Summary of Changes

Hello @mayank31398, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds optional causal convolution to the GRU and RNN sequence mixer blocks. Through new configurable kernel_size and activation_function parameters, the recurrent layers can apply a causal convolutional filter to their inputs, improving their ability to capture local temporal dependencies while preserving the sequential processing nature of RNNs and GRUs. This provides greater flexibility in designing and optimizing sequence models.

Highlights

  • Configuration Updates: New kernel_size and activation_function parameters have been added to the _GRUArgs and _RNNArgs configuration classes, allowing for the specification of causal convolution properties.
  • Causal Convolution Integration: The GRU and RNN sequence mixer blocks now conditionally incorporate a ParameterizedConv1d layer and utilize a causal_convolution function in their forward pass when a kernel_size is provided. This enables these recurrent layers to perform causal convolutional filtering on their inputs.
  • State Management Enhancement: The cache mechanism for both GRU and RNN layers has been updated to store and retrieve separate states for the convolutional component (conv_state) and the recurrent component (ssm_state), ensuring proper state propagation during sequence generation. A minimal sketch of the combined flow follows this list.
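To make this concrete, here is a minimal, self-contained sketch of a depthwise causal convolution in front of a GRU. The class name, the use of torch.nn.Conv1d instead of the repository's ParameterizedConv1d, and the SiLU activation are illustrative assumptions only; the actual implementation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalConvGRUSketch(nn.Module):
    """Hypothetical illustration: a depthwise causal conv feeding a GRU."""

    def __init__(self, hidden_size: int, kernel_size: int | None) -> None:
        super().__init__()
        self.kernel_size = kernel_size
        if kernel_size is not None:
            # depthwise conv; (kernel_size - 1) positions of left context, no look-ahead
            self.conv1d = nn.Conv1d(
                hidden_size, hidden_size, kernel_size,
                groups=hidden_size, padding=kernel_size - 1, bias=False,
            )
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_size)
        if self.kernel_size is not None:
            seq_len = x.shape[1]
            y = self.conv1d(x.transpose(1, 2))  # (batch, hidden_size, seq_len + kernel_size - 1)
            y = y[..., :seq_len]                # trim the right overhang so position t only sees <= t
            x = F.silu(y).transpose(1, 2)       # stand-in for the configurable activation_function
        out, _ = self.gru(x)
        return out


# usage: the convolution is skipped entirely when kernel_size is None
block = CausalConvGRUSketch(hidden_size=64, kernel_size=4)
out = block(torch.randn(2, 16, 64))  # -> (2, 16, 64)
```

During incremental decoding, the last kernel_size - 1 inputs would be carried as a conv_state alongside the recurrent hidden state, which is what the dual-state cache changes are about.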

@gemini-code-assist bot left a comment

Code Review

This pull request adds support for causal convolution to the GRU and RNN sequence mixer blocks. The changes look good overall, introducing kernel_size and activation_function parameters to enable this feature. However, I've identified a couple of critical issues in both gru.py and rnn.py that need to be addressed. There's a line that will cause an AttributeError when convolution is disabled, and there's a fundamental mismatch with the GenerationCache implementation which will lead to TypeErrors. The cache logic needs to be updated to handle the new dual-state (convolution and recurrent) system.


self.reset_parameters()

mark_parameter_as_mup_learning_rate(self.conv1d.weight)

critical

This line will cause an AttributeError when kernel_size is None, as self.conv1d is only initialized within the else block where kernel_size is not None (lines 103-114).

Additionally, this call is redundant, as self.conv1d.weight is already marked for muP learning rate on line 116. Please remove this line to fix the crash and avoid duplication.
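A small sketch of the guarded initialization this comment is asking for, with torch.nn.Conv1d standing in for ParameterizedConv1d and the repository's muP helper shown only as a comment; the constructor arguments here are illustrative assumptions:

```python
import torch.nn as nn


class _GuardedConvInitSketch(nn.Module):
    """Hypothetical sketch: the muP marking lives inside the branch that creates conv1d."""

    def __init__(self, hidden_size: int, kernel_size: int | None) -> None:
        super().__init__()
        if kernel_size is None:
            self.conv1d = None
        else:
            self.conv1d = nn.Conv1d(
                hidden_size, hidden_size, kernel_size,
                groups=hidden_size, padding=kernel_size - 1, bias=False,
            )
            # mark_parameter_as_mup_learning_rate(self.conv1d.weight)  # repo helper; only here
        self.reset_parameters()
        # no second, unconditional mark_parameter_as_mup_learning_rate(self.conv1d.weight) here

    def reset_parameters(self) -> None:
        if self.conv1d is not None:
            nn.init.kaiming_uniform_(self.conv1d.weight)
```

This keeps the kernel_size=None path free of any access to self.conv1d.weight and avoids marking the weight twice.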

)
x = pack_sequence(inputs=x, cu_seqlens=cu_seqlens)

c, h = (None, None) if cache_params is None else cache_params.get_cache(self.layer_idx)

critical

There seems to be a mismatch with the GenerationCache implementation for RNN/GRU layers (_RNNCache).

  1. This line attempts to unpack two values (c, h) from cache_params.get_cache(self.layer_idx). However, the current _RNNCache.get_cache returns a single tensor, which will lead to a TypeError during unpacking.
  2. Similarly, on line 209, cache_params.update is called with conv_state and ssm_state arguments, but the _RNNCache.update method does not accept these keyword arguments, which will also cause a TypeError.

It appears the _RNNCache class needs to be updated to store and manage two separate states (one for the convolution and one for the GRU state) when causal convolution is used.
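A minimal sketch of what such a dual-state cache could look like; the class and method names mirror the ones discussed above, but the actual _RNNCache interface in this repository may differ:

```python
import torch


class _RNNCacheSketch:
    """Hypothetical per-layer cache holding both a conv state and a recurrent state."""

    def __init__(self, num_layers: int) -> None:
        self.conv_states: list[torch.Tensor | None] = [None] * num_layers
        self.ssm_states: list[torch.Tensor | None] = [None] * num_layers

    def get_cache(self, layer_idx: int) -> tuple[torch.Tensor | None, torch.Tensor | None]:
        # returning a pair lets call sites unpack: c, h = cache_params.get_cache(layer_idx)
        return self.conv_states[layer_idx], self.ssm_states[layer_idx]

    def update(
        self,
        layer_idx: int,
        conv_state: torch.Tensor | None = None,
        ssm_state: torch.Tensor | None = None,
    ) -> None:
        # accepts the keyword arguments used at the call sites flagged in this review
        if conv_state is not None:
            self.conv_states[layer_idx] = conv_state
        if ssm_state is not None:
            self.ssm_states[layer_idx] = ssm_state
```

With this shape, both the two-value unpacking and the update(conv_state=..., ssm_state=...) calls mentioned above would work.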


self.reset_parameters()

mark_parameter_as_mup_learning_rate(self.conv1d.weight)

critical

This line will raise an AttributeError if kernel_size is None, because self.conv1d is only defined within the conditional block (lines 78-89) where kernel_size is not None.

This call is also redundant, as self.conv1d.weight is already marked on line 91. Removing this line will fix the potential crash and the code duplication.

x = pack_sequence(inputs=x, cu_seqlens=cu_seqlens)

input_state = None if cache_params is None else cache_params.get_cache(self.layer_idx)
c, h = (None, None) if cache_params is None else cache_params.get_cache(self.layer_idx)

critical

There's a mismatch with the _RNNCache implementation that will cause runtime errors.

  1. This line attempts to unpack two values (c, h) from cache_params.get_cache(self.layer_idx), but _RNNCache.get_cache returns a single tensor, which will cause a TypeError.
  2. On line 170, cache_params.update is called with conv_state and ssm_state, but the _RNNCache.update method signature doesn't support these arguments, leading to another TypeError.

The _RNNCache needs to be updated to handle separate states for the convolution and the RNN when causal convolution is enabled.

Signed-off-by: Mayank Mishra <[email protected]>
@mayank31398 merged commit 390d4d9 into main on Jan 8, 2026
2 checks passed
@mayank31398 deleted the tuner branch on January 8, 2026 at 01:21