Conversation
Summary of Changes

Hello @maitry63, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request delivers a complete implementation of the ModernBERT architecture within Keras Hub. It encompasses the core backbone with its attention mechanisms and normalization techniques, a masked language model task for pre-training, and a custom tokenizer to process input data. The changes aim to provide a state-of-the-art BERT variant with improved architectural components.
Code Review
This pull request introduces a comprehensive implementation of the ModernBERT architecture, including the backbone, tokenizer, preprocessor, and a masked language model task. The code is well-structured and generally follows the repository's contribution guidelines. However, there are several critical issues related to model and layer implementation, serialization, and testing that need to be addressed. Key issues include incorrect model input/output definitions in the masked LM task, broken serialization in custom layers, and several bugs in the test suites that will prevent them from passing. I've provided detailed comments and suggestions to fix these issues.
```python
from modernbert_layers import (
    ModernBertMLP, ModernBertAttention, ModernBertEncoderLayer,
)
```
The import from modernbert_layers import ... is incorrect. This relative import will fail when the tests are run. Please use the full import path from the project root.
Suggested change:

```python
from keras_hub.src.models.modernbert.modernbert_layers import (
    ModernBertMLP, ModernBertAttention, ModernBertEncoderLayer,
)
```
```python
def setUp(self):
    self.vocab = ["[CLS]", "[PAD]", "[SEP]", "air", "Ġair", "plane", "Ġat"]
    self.vocab += ["port", "[MASK]", "[UNK]"]
    self.vocab = dict([(token, i) for i, token in enumerate(self.vocab)])
```
The vocabulary used in this test does not match the special tokens defined in ModernBertTokenizer. The test uses [CLS], [PAD], [SEP], while the tokenizer expects <|endoftext|>, <|padding|>, and <mask>. This will cause a KeyError during tokenizer initialization. Please update the test vocabulary to use the correct special tokens.
Suggested change:

```python
def setUp(self):
    self.vocab = ["<|endoftext|>", "<|padding|>", "<mask>", "air", "Ġair", "plane", "Ġat"]
    self.vocab += ["port", "[UNK]"]
    self.vocab = dict([(token, i) for i, token in enumerate(self.vocab)])
```
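For context on why the vocabulary carries both `air` and `Ġair`: in GPT-2-style byte-level BPE vocabularies, a leading `Ġ` marks a token that begins with a space. A toy illustration (not keras-hub code; the token IDs are just the positions in this example list):

```python
# Toy byte-level BPE vocabulary: "Ġ" denotes a token preceded by a space.
vocab = ["<|endoftext|>", "<|padding|>", "<mask>", "air", "Ġair", "plane", "Ġat", "port", "[UNK]"]
vocab = {token: i for i, token in enumerate(vocab)}

# "airport at airplane" could segment into these merge units:
tokens = ["air", "port", "Ġat", "Ġair", "plane"]
ids = [vocab[t] for t in tokens]
print(ids)  # [3, 7, 6, 4, 5]
```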
```python
self.init_kwargs = {
    "vocabulary_size": 10,
    "num_layers": 2,
    "num_heads": 4,
    "hidden_dim": 8,
    "intermediate_dim": 32,
}
```
The init_kwargs dictionary is missing the required local_attention_window argument for instantiating ModernBertBackbone. This will cause a TypeError when running the tests. Please add this argument to the dictionary.
Suggested change:

```python
self.init_kwargs = {
    "vocabulary_size": 10,
    "num_layers": 2,
    "num_heads": 4,
    "hidden_dim": 8,
    "intermediate_dim": 32,
    "local_attention_window": 128,
}
```

```python
def pack_inputs(self, inputs):
    """Pad and truncate to the target sequence length."""
    return ops.pad(
        inputs,
        axis=-1,
        constant_values=self.tokenizer.pad_token_id,
    )[:, :self.sequence_length]
```
The usage of ops.pad is incorrect. It does not accept an axis argument. You need to provide a paddings tensor that specifies the padding for each dimension. For padding to a fixed sequence length, you should calculate the required padding length and apply it to the sequence dimension.
Suggested change:

```python
def pack_inputs(self, inputs):
    """Pad and truncate to the target sequence length."""
    shape = ops.shape(inputs)
    pad_length = ops.maximum(0, self.sequence_length - shape[-1])
    paddings = [[0, 0] for _ in range(len(shape) - 1)] + [[0, pad_length]]
    padded_inputs = ops.pad(
        inputs,
        paddings,
        constant_values=self.tokenizer.pad_token_id,
    )
    return padded_inputs[..., : self.sequence_length]
```
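The pad-then-truncate logic can be sanity-checked with NumPy, whose `np.pad` uses the same per-dimension `[before, after]` pad-width convention as `keras.ops.pad`. A rough standalone sketch (function name and defaults are illustrative, not keras-hub code):

```python
import numpy as np

def pack_inputs(inputs, sequence_length, pad_token_id=0):
    """Pad the last axis with pad_token_id, then truncate to sequence_length."""
    pad_length = max(0, sequence_length - inputs.shape[-1])
    # One [before, after] pair per dimension; only pad the end of the last axis.
    paddings = [[0, 0]] * (inputs.ndim - 1) + [[0, pad_length]]
    padded = np.pad(inputs, paddings, constant_values=pad_token_id)
    return padded[..., :sequence_length]

x = np.array([[5, 6, 7]])
print(pack_inputs(x, 5).tolist())  # [[5, 6, 7, 0, 0]]
print(pack_inputs(x, 2).tolist())  # [[5, 6]]
```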
```python
self.backbone = ModernBertBackbone(
    vocabulary_size=100,
    num_layers=2,
    num_heads=2,
    hidden_dim=16,
    intermediate_dim=32,
)
```
The instantiation of ModernBertBackbone is missing the required local_attention_window argument. This will cause a TypeError when setting up the test. Please provide a value for this argument.
Suggested change:

```python
self.backbone = ModernBertBackbone(
    vocabulary_size=100,
    num_layers=2,
    num_heads=2,
    hidden_dim=16,
    intermediate_dim=32,
    local_attention_window=128,
)
```

```python
def test_serialization(self):
    layer = ModernBertEncoderLayer(
        hidden_dim=16, intermediate_dim=32, num_heads=2, local_attention_window=64
    )
    config = layer.get_config()
    new_layer = ModernBertEncoderLayer.from_config(config)
    self.assertEqual(new_layer.local_attention_window, 64)
```
The serialization test is quite minimal. It only checks if one attribute is correctly restored after calling from_config. According to the style guide (line 412), you should use self.run_layer_test() to verify layer functionality more thoroughly, including serialization, shape inference, and training. This would provide a much more robust test.
References

- The style guide recommends using `self.run_layer_test()` for testing individual layers to ensure all core functionality is covered. (link)
```python
"""ModernBERT tokenizer based on Byte-Pair Encoding (BPE).

ModernBERT uses a byte-level BPE tokenizer. This class handles the
transformation of raw text into token IDs and manages special tokens
such as [PAD], [CLS], and [MASK].

Args:
    vocabulary: dict or str. A dictionary mapping tokens to IDs, or a path
        to a JSON file containing the vocabulary.
    merges: list or str. A list of BPE merges, or a path to a merges file.
    **kwargs: Standard `BytePairTokenizer` arguments.
"""
```
The docstring is missing an Example section, which is required by the repository's style guide (line 369). Please add a usage example for ModernBertTokenizer.
References
- Docstrings must include comprehensive examples showing usage patterns. (link)
```python
"""
ModernBERT Encoder Layer.
"""
```
The docstring for ModernBertEncoderLayer is just a title. It's missing the Args and Example sections required by the repository's style guide (lines 529-530). Please provide a complete docstring that documents the layer's arguments and includes a usage example.
References
- Layer docstrings must document all parameters and include usage examples. (link)
```python
    intermediate_dim,
    num_layers,
    num_heads,
    local_attention_window,
```
The __init__ method is missing a default value for the local_attention_window argument. The docstring on line 26 specifies (default 128), but this is not reflected in the method signature. Please add the default value to the signature to match the documentation and avoid TypeError when the argument is not provided.
Suggested change:

```python
    local_attention_window=128,
```
Force-pushed from 0020e26 to 89e7dcb.
This PR continues work from closed PR #2256.

- ModernBertBackbone: Support for dynamic configurations, alternating between global and local (sliding-window) attention.
- Key architectural components: Implementation of RoPE (rotary positional embeddings), GeGLU activation, and RMSNorm.
- ModernBertMaskedLM: Masked language model task for pre-training and fine-tuning.
- ModernBertTokenizer: A dedicated tokenizer compatible with the reference implementation.
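For readers unfamiliar with two of the components listed above, here are rough NumPy sketches of RMSNorm and a local (sliding-window) attention mask. These are illustrative only and make no claim about the exact keras-hub implementation; the epsilon, window convention, and shapes are assumptions:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale by the root-mean-square of the features,
    # with no mean-centering and no bias (unlike LayerNorm).
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def sliding_window_mask(seq_len, window):
    # True where query i may attend to key j: |i - j| <= window // 2.
    # ModernBERT alternates layers like this with fully global layers.
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window // 2

x = np.array([[1.0, 2.0, 3.0]])
out = rms_norm(x, weight=np.ones(3))
print(np.round(out, 3))  # RMS of [1, 2, 3] is sqrt(14/3) ≈ 2.160

mask = sliding_window_mask(seq_len=5, window=2)
print(mask.astype(int))  # band matrix: each token sees itself and one neighbor each side
```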
Reference
Original PR - #2256