[RNN, GRU] support causal convolution with RNN and GRU layers #360
Conversation
Signed-off-by: Mayank Mishra <[email protected]>
Summary of Changes

Hello @mayank31398, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces the capability to integrate causal convolution into the GRU and RNN sequence mixer blocks by adding configurable `kernel_size` and `activation_function` parameters.
Code Review
This pull request adds support for causal convolution to the GRU and RNN sequence mixer blocks. The changes look good overall, introducing `kernel_size` and `activation_function` parameters to enable this feature. However, I've identified a couple of critical issues in both `gru.py` and `rnn.py` that need to be addressed. There's a line that will cause an `AttributeError` when convolution is disabled, and there's a fundamental mismatch with the `GenerationCache` implementation which will lead to `TypeError`s. The cache logic needs to be updated to handle the new dual-state (convolution and recurrent) system.
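For orientation, here is a minimal sketch of the pattern being added, assuming a depthwise `nn.Conv1d` whose right overhang is trimmed to keep the convolution causal before it feeds the recurrence. The `CausalConvGRU` name, the SiLU stand-in for the configurable `activation_function`, and all shapes are illustrative, not the PR's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvGRU(nn.Module):
    """Illustrative only: a depthwise causal Conv1d feeding a GRU."""

    def __init__(self, hidden_size: int, kernel_size: int | None = None) -> None:
        super().__init__()
        self.kernel_size = kernel_size
        if kernel_size is not None:
            # padding=kernel_size - 1 pads both sides; trimming the right
            # overhang in forward() makes the convolution causal: the output
            # at time t only sees inputs at times <= t.
            self.conv1d = nn.Conv1d(
                hidden_size, hidden_size, kernel_size,
                padding=kernel_size - 1, groups=hidden_size,
            )
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, H)
        if self.kernel_size is not None:
            seqlen = x.size(1)
            x = self.conv1d(x.transpose(1, 2))[..., :seqlen].transpose(1, 2)
            x = F.silu(x)  # stand-in for the configurable activation_function
        out, _ = self.gru(x)
        return out
```

The key property is that slicing `[..., :seqlen]` after a `padding=kernel_size - 1` convolution discards the right overhang, so each output step depends only on current and past inputs.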
```python
self.reset_parameters()

mark_parameter_as_mup_learning_rate(self.conv1d.weight)
```
This line will cause an `AttributeError` when `kernel_size` is `None`, as `self.conv1d` is only initialized within the `else` block where `kernel_size` is not `None` (lines 103-114).

Additionally, this call is redundant, as `self.conv1d.weight` is already marked for muP learning rate on line 116. Please remove this line to fix the crash and avoid duplication.
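As a hedged sketch of the fix being requested, assuming a hypothetical `GRUMixer` class and a stub standing in for the project's `mark_parameter_as_mup_learning_rate` helper, the marking call lives only inside the guard where `self.conv1d` is guaranteed to exist:

```python
import torch.nn as nn

def mark_parameter_as_mup_learning_rate(param) -> None:
    # Stub for illustration; the real helper lives in the project.
    param._mup_lr = True

class GRUMixer(nn.Module):
    """Hypothetical sketch: only touch self.conv1d inside the guard."""

    def __init__(self, hidden_size: int, kernel_size: int | None = None) -> None:
        super().__init__()
        if kernel_size is not None:
            self.conv1d = nn.Conv1d(
                hidden_size, hidden_size, kernel_size,
                padding=kernel_size - 1, groups=hidden_size,
            )
            # Marked exactly once, where self.conv1d is guaranteed to exist.
            mark_parameter_as_mup_learning_rate(self.conv1d.weight)
        # No unconditional mark_parameter_as_mup_learning_rate(self.conv1d.weight)
        # after this point: that call crashes when kernel_size is None.
```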
```python
)
x = pack_sequence(inputs=x, cu_seqlens=cu_seqlens)

c, h = (None, None) if cache_params is None else cache_params.get_cache(self.layer_idx)
```
There seems to be a mismatch with the `GenerationCache` implementation for RNN/GRU layers (`_RNNCache`).

- This line attempts to unpack two values (`c`, `h`) from `cache_params.get_cache(self.layer_idx)`. However, the current `_RNNCache.get_cache` returns a single tensor, which will lead to a `TypeError` during unpacking.
- Similarly, on line 209, `cache_params.update` is called with `conv_state` and `ssm_state` arguments, but the `_RNNCache.update` method does not accept these keyword arguments, which will also cause a `TypeError`.

It appears the `_RNNCache` class needs to be updated to store and manage two separate states (one for the convolution and one for the GRU state) when causal convolution is used.
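A minimal sketch of such a dual-state cache, under the assumption that per-layer states are stored as plain tensors and that `update` should accept the `conv_state`/`ssm_state` keywords the forward pass already passes in (all names beyond those quoted above are illustrative):

```python
import torch

class _RNNCache:
    """Hypothetical dual-state cache: one (conv_state, recurrent_state)
    pair per layer, so layers with causal convolution can round-trip both."""

    def __init__(self, num_layers: int) -> None:
        self.conv_states: list[torch.Tensor | None] = [None] * num_layers
        self.recurrent_states: list[torch.Tensor | None] = [None] * num_layers

    def get_cache(self, layer_idx: int) -> tuple[torch.Tensor | None, torch.Tensor | None]:
        # Always return a 2-tuple so `c, h = cache_params.get_cache(...)`
        # unpacks cleanly, even for layers without a convolution state.
        return self.conv_states[layer_idx], self.recurrent_states[layer_idx]

    def update(
        self,
        layer_idx: int,
        conv_state: torch.Tensor | None = None,
        ssm_state: torch.Tensor | None = None,
    ) -> None:
        # Accept the keyword arguments used by the new forward pass.
        if conv_state is not None:
            self.conv_states[layer_idx] = conv_state
        if ssm_state is not None:
            self.recurrent_states[layer_idx] = ssm_state
```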
```python
self.reset_parameters()

mark_parameter_as_mup_learning_rate(self.conv1d.weight)
```
This line will raise an `AttributeError` if `kernel_size` is `None`, because `self.conv1d` is only defined within the conditional block (lines 78-89) where `kernel_size` is not `None`.

This call is also redundant, as `self.conv1d.weight` is already marked on line 91. Removing this line will fix the potential crash and the code duplication.
```diff
 x = pack_sequence(inputs=x, cu_seqlens=cu_seqlens)

-input_state = None if cache_params is None else cache_params.get_cache(self.layer_idx)
+c, h = (None, None) if cache_params is None else cache_params.get_cache(self.layer_idx)
```
There's a mismatch with the `_RNNCache` implementation that will cause runtime errors.

- This line attempts to unpack two values (`c`, `h`) from `cache_params.get_cache(self.layer_idx)`, but `_RNNCache.get_cache` returns a single tensor, which will cause a `TypeError`.
- On line 170, `cache_params.update` is called with `conv_state` and `ssm_state`, but the `_RNNCache.update` method signature doesn't support these arguments, leading to another `TypeError`.

The `_RNNCache` needs to be updated to handle separate states for the convolution and the RNN when causal convolution is enabled.
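To illustrate how the forward pass and a dual-state cache could fit together during generation, here is a hedged single-token decode step. `decode_step`, the `layer` attributes, and all shapes are hypothetical, not the PR's code; the unpadded `F.conv1d` over a rolling window of the last `kernel_size` frames is one standard way to run a causal convolution one token at a time.

```python
import torch
import torch.nn.functional as F

def decode_step(layer, x_t, cache_params):
    """Hypothetical one-token decode step against the dual-state cache.

    x_t: (batch, 1, hidden) input for the current token.
    """
    c, h = (None, None) if cache_params is None else cache_params.get_cache(layer.layer_idx)

    if layer.kernel_size is not None:
        xt = x_t.transpose(1, 2)  # (batch, hidden, 1)
        if c is None:
            # Empty rolling window: kernel_size - 1 zero frames of left context.
            c = xt.new_zeros(xt.size(0), xt.size(1), layer.kernel_size - 1)
        window = torch.cat([c, xt], dim=-1)  # (batch, hidden, kernel_size)
        # Unpadded depthwise conv over exactly one window -> one output frame.
        y = F.conv1d(window, layer.conv1d.weight, layer.conv1d.bias,
                     groups=xt.size(1))
        x_t = y.transpose(1, 2)  # (batch, 1, hidden)
        c = window[..., 1:]      # slide the window forward by one token

    out, h = layer.gru(x_t, h)  # advance the recurrent state by one step

    if cache_params is not None:
        cache_params.update(layer.layer_idx, conv_state=c, ssm_state=h)
    return out
```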
Signed-off-by: Mayank Mishra <[email protected]>
No description provided.