Conversation
Summary of Changes

Hello @maitry63, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request delivers a complete implementation of the ModernBERT architecture within Keras Hub. It encompasses the core backbone with its attention mechanisms and normalization techniques, a masked language model task for pre-training, and a custom tokenizer to process input data. The changes aim to provide a state-of-the-art BERT variant with improved architectural components.
Code Review
This pull request introduces a comprehensive implementation of the ModernBERT architecture, including the backbone, tokenizer, preprocessor, and a masked language model task. The code is well-structured and generally follows the repository's contribution guidelines. However, there are several critical issues related to model and layer implementation, serialization, and testing that need to be addressed. Key issues include incorrect model input/output definitions in the masked LM task, broken serialization in custom layers, and several bugs in the test suites that will prevent them from passing. I've provided detailed comments and suggestions to fix these issues.
```python
from modernbert_layers import (
    ModernBertMLP, ModernBertAttention, ModernBertEncoderLayer,
)
```
The import from modernbert_layers import ... is incorrect. This relative import will fail when the tests are run. Please use the full import path from the project root.
Suggested change:

```python
from keras_hub.src.models.modernbert.modernbert_layers import (
    ModernBertMLP, ModernBertAttention, ModernBertEncoderLayer,
)
```
```python
def setUp(self):
    self.vocab = ["[CLS]", "[PAD]", "[SEP]", "air", "Ġair", "plane", "Ġat"]
    self.vocab += ["port", "[MASK]", "[UNK]"]
    self.vocab = dict([(token, i) for i, token in enumerate(self.vocab)])
```
The vocabulary used in this test does not match the special tokens defined in ModernBertTokenizer. The test uses [CLS], [PAD], [SEP], while the tokenizer expects <|endoftext|>, <|padding|>, and <mask>. This will cause a KeyError during tokenizer initialization. Please update the test vocabulary to use the correct special tokens.
Suggested change:

```python
def setUp(self):
    self.vocab = ["<|endoftext|>", "<|padding|>", "<mask>", "air", "Ġair", "plane", "Ġat"]
    self.vocab += ["port", "[UNK]"]
    self.vocab = dict([(token, i) for i, token in enumerate(self.vocab)])
```
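For context on why the vocabulary carries both `air` and `Ġair`: in GPT-2-style byte-level BPE vocabularies, a leading `Ġ` marks a token that begins with a space. A toy illustration (not keras-hub code; the token IDs are just the positions in this example list):

```python
# Toy byte-level BPE vocabulary: "Ġ" denotes a token preceded by a space.
vocab = ["<|endoftext|>", "<|padding|>", "<mask>", "air", "Ġair", "plane", "Ġat", "port", "[UNK]"]
vocab = {token: i for i, token in enumerate(vocab)}

# "airport at airplane" could segment into these merge units:
tokens = ["air", "port", "Ġat", "Ġair", "plane"]
ids = [vocab[t] for t in tokens]
print(ids)  # [3, 7, 6, 4, 5]
```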
```python
self.init_kwargs = {
    "vocabulary_size": 10,
    "num_layers": 2,
    "num_heads": 4,
    "hidden_dim": 8,
    "intermediate_dim": 32,
}
```
The init_kwargs dictionary is missing the required local_attention_window argument for instantiating ModernBertBackbone. This will cause a TypeError when running the tests. Please add this argument to the dictionary.
Suggested change:

```python
self.init_kwargs = {
    "vocabulary_size": 10,
    "num_layers": 2,
    "num_heads": 4,
    "hidden_dim": 8,
    "intermediate_dim": 32,
    "local_attention_window": 128,
}
```

```python
def pack_inputs(self, inputs):
    """Pad and truncate to the target sequence length."""
    return ops.pad(
        inputs,
        axis=-1,
        constant_values=self.tokenizer.pad_token_id,
    )[:, :self.sequence_length]
```
The usage of ops.pad is incorrect. It does not accept an axis argument. You need to provide a paddings tensor that specifies the padding for each dimension. For padding to a fixed sequence length, you should calculate the required padding length and apply it to the sequence dimension.
Suggested change:

```python
def pack_inputs(self, inputs):
    """Pad and truncate to the target sequence length."""
    shape = ops.shape(inputs)
    pad_length = ops.maximum(0, self.sequence_length - shape[-1])
    paddings = [[0, 0] for _ in range(len(shape) - 1)] + [[0, pad_length]]
    padded_inputs = ops.pad(
        inputs,
        paddings,
        constant_values=self.tokenizer.pad_token_id,
    )
    return padded_inputs[..., : self.sequence_length]
```
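The pad-then-truncate logic can be sanity-checked with NumPy, whose `np.pad` uses the same per-dimension `[before, after]` pad-width convention as `keras.ops.pad`. A rough standalone sketch (function name and defaults are illustrative, not keras-hub code):

```python
import numpy as np

def pack_inputs(inputs, sequence_length, pad_token_id=0):
    """Pad the last axis with pad_token_id, then truncate to sequence_length."""
    pad_length = max(0, sequence_length - inputs.shape[-1])
    # One [before, after] pair per dimension; only pad the end of the last axis.
    paddings = [[0, 0]] * (inputs.ndim - 1) + [[0, pad_length]]
    padded = np.pad(inputs, paddings, constant_values=pad_token_id)
    return padded[..., :sequence_length]

x = np.array([[5, 6, 7]])
print(pack_inputs(x, 5).tolist())  # [[5, 6, 7, 0, 0]]
print(pack_inputs(x, 2).tolist())  # [[5, 6]]
```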
```python
self.backbone = ModernBertBackbone(
    vocabulary_size=100,
    num_layers=2,
    num_heads=2,
    hidden_dim=16,
    intermediate_dim=32,
)
```
The instantiation of ModernBertBackbone is missing the required local_attention_window argument. This will cause a TypeError when setting up the test. Please provide a value for this argument.
Suggested change:

```python
self.backbone = ModernBertBackbone(
    vocabulary_size=100,
    num_layers=2,
    num_heads=2,
    hidden_dim=16,
    intermediate_dim=32,
    local_attention_window=128,
)
```

```python
def test_serialization(self):
    layer = ModernBertEncoderLayer(
        hidden_dim=16, intermediate_dim=32, num_heads=2, local_attention_window=64
    )
    config = layer.get_config()
    new_layer = ModernBertEncoderLayer.from_config(config)
    self.assertEqual(new_layer.local_attention_window, 64)
```
The serialization test is quite minimal. It only checks if one attribute is correctly restored after calling from_config. According to the style guide (line 412), you should use self.run_layer_test() to verify layer functionality more thoroughly, including serialization, shape inference, and training. This would provide a much more robust test.
References

- The style guide recommends using `self.run_layer_test()` for testing individual layers to ensure all core functionality is covered. (link)
```python
"""ModernBERT tokenizer based on Byte-Pair Encoding (BPE).

ModernBERT uses a byte-level BPE tokenizer. This class handles the
transformation of raw text into token IDs and manages special tokens
such as [PAD], [CLS], and [MASK].

Args:
    vocabulary: dict or str. A dictionary mapping tokens to IDs, or a path
        to a JSON file containing the vocabulary.
    merges: list or str. A list of BPE merges, or a path to a merges file.
    **kwargs: Standard `BytePairTokenizer` arguments.
"""
```
The docstring is missing an Example section, which is required by the repository's style guide (line 369). Please add a usage example for ModernBertTokenizer.
References
- Docstrings must include comprehensive examples showing usage patterns. (link)
```python
"""
ModernBERT Encoder Layer.
"""
```
The docstring for ModernBertEncoderLayer is just a title. It's missing the Args and Example sections required by the repository's style guide (lines 529-530). Please provide a complete docstring that documents the layer's arguments and includes a usage example.
References
- Layer docstrings must document all parameters and include usage examples. (link)
```python
    intermediate_dim,
    num_layers,
    num_heads,
    local_attention_window,
```
The __init__ method is missing a default value for the local_attention_window argument. The docstring on line 26 specifies (default 128), but this is not reflected in the method signature. Please add the default value to the signature to match the documentation and avoid TypeError when the argument is not provided.
Suggested change:

```python
    local_attention_window=128,
```
Force-pushed from 0020e26 to 89e7dcb.
This PR continues work from closed PR #2256.

- ModernBertBackbone: Support for dynamic configurations, alternating between global and local (sliding-window) attention.
- Key architectural components: Implementation of RoPE (rotary positional embeddings), GeGLU activation, and RMSNorm.
- ModernBertMaskedLM: Masked language model task for pre-training and fine-tuning.
- ModernBertTokenizer: A dedicated tokenizer compatible with the reference implementation.
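For readers unfamiliar with two of the components listed above, here are rough NumPy sketches of RMSNorm and a local (sliding-window) attention mask. These are illustrative only and make no claim about the exact keras-hub implementation; the epsilon, window convention, and shapes are assumptions:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale by the root-mean-square of the features,
    # with no mean-centering and no bias (unlike LayerNorm).
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def sliding_window_mask(seq_len, window):
    # True where query i may attend to key j: |i - j| <= window // 2.
    # ModernBERT alternates layers like this with fully global layers.
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window // 2

x = np.array([[1.0, 2.0, 3.0]])
out = rms_norm(x, weight=np.ones(3))
print(np.round(out, 3))  # RMS of [1, 2, 3] is sqrt(14/3) ≈ 2.160

mask = sliding_window_mask(seq_len=5, window=2)
print(mask.astype(int))  # band matrix: each token sees itself and one neighbor each side
```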
Reference
Original PR - #2256