Note: This repository is Python-first. Prefer the Python guidelines in this document.
- Python: Google Python Style Guide
- Shell: Google Shell Style Guide
Use `uv run` to execute scripts, rather than activating a virtual environment and calling `python` directly.

Don't:

```shell
source .venv/bin/activate
python examples/models/generate_from_hf.py
```

Do:

```shell
uv run python examples/models/generate_from_hf.py
```

Exception: `docker/Dockerfile.ci` is exempt from this rule.
- The code developed for Megatron-Bridge should conform to Python 3.10+.
- Maximum line length is 119 characters (matching ruff configuration).
- Indent code with 4 spaces. Do not use tabs.
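For reference, these settings would correspond to a ruff configuration in `pyproject.toml` along these lines (a sketch; the repository's actual configuration may include additional keys):

```toml
[tool.ruff]
# Matches the 119-character limit stated above.
line-length = 119
target-version = "py310"

[tool.ruff.format]
# Matches the double-quote string convention stated below.
quote-style = "double"
```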
Naming conventions:

| Type | Convention | Example |
|---|---|---|
| Files | snake_case | `some_file.py` |
| Classes | PascalCase | `class SomeClass` |
| Functions and Methods | snake_case | `def my_awesome_function():` |
| Local Variables | snake_case; prefix `k` for variable names that start with a number | `my_variable = ...`, `k_99th_percentile = ...` |
| Global Variables | upper snake_case with a `G_` prefix | `G_MY_GLOBAL = ...` |
| Constants | upper snake_case | `MY_CONSTANT = ...` |
- Avoid shadowing variables declared in an outer scope.
- Initialize all externally visible members of a class in the constructor.
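As an illustrative sketch (the class and attribute names here are hypothetical, not part of the repository), initializing every externally visible attribute in `__init__` makes the full public state of the class visible in one place:

```python
class CheckpointWriter:
    """Writes model checkpoints to a target directory."""

    def __init__(self, output_dir: str, max_keep: int = 3):
        # All externally visible members are initialized in the constructor,
        # including those only populated later, so readers see the full state.
        self.output_dir = output_dir
        self.max_keep = max_keep
        self.saved_paths: list[str] = []  # populated by save()
        self.last_error: Exception | None = None  # set on failure

    def save(self, step: int) -> str:
        path = f"{self.output_dir}/checkpoint_{step}.pt"
        self.saved_paths.append(path)
        return path
```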
Organize imports in the following order, separated by blank lines:

- Future imports
- Standard library imports
- Third-party imports (including `megatron.core`, `torch`, `transformers`)
- First-party imports (`megatron.bridge.*`)
- Local folder imports

Example:

```python
from __future__ import annotations

import abc
import logging

import torch
from megatron.core import parallel_state as mpu
from transformers import PreTrainedModel

from megatron.bridge.models.model_bridge import MegatronModelBridge
from megatron.bridge.utils.common_utils import print_rank_0
```

- Use double quotes for strings (matching ruff formatter configuration).
- For interfaces that may be used outside a file, prefer docstrings over comments.
- Comments should be reserved for code within a function, or interfaces that are local to a file.
- If a piece of code is commented out, a comment around it should describe what it does and why it is disabled. Otherwise it is a debug leftover and should be removed before merging.
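A sketch of the difference (the function names here are invented for illustration):

```python
def reference_sum(values: list[float]) -> float:
    """Slow reference implementation, kept for numerical bisection."""
    total = 0.0
    for v in values:
        total += v
    return total


def total_loss(values: list[float]) -> float:
    # NOTE: the reference path below is intentionally disabled; it is kept
    # so numerical differences can be bisected against the fast path.
    # Re-enable by uncommenting the next line.
    # return reference_sum(values)
    return sum(values)
```

A bare `# return reference_sum(values)` with no surrounding explanation would be a debug leftover to delete before merging.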
Use the Google style, which can be parsed by Sphinx.

Example:

```python
def convert_weights(
    source_model: torch.nn.Module,
    target_model: torch.nn.Module,
    mapping: MegatronParamMapping,
) -> dict[str, torch.Tensor]:
    """Convert weights from source to target model format.

    This function handles the conversion of weights between HuggingFace
    and Megatron model formats, including tensor parallel distribution.

    Args:
        source_model: The source model containing weights to convert.
        target_model: The target model that will receive converted weights.
        mapping: Parameter mapping defining the conversion rules.

    Returns:
        Dictionary mapping parameter names to converted weight tensors.

    Raises:
        ValueError: If source and target models have incompatible shapes.
    """
    ...
```

- Avoid using reflection when functionality can be easily achieved without reflection.

For example, instead of:

```python
def make_complex(*args):
    x, y = args
    return dict(**locals())
```

Do:

```python
def make_complex(x, y):
    return {"x": x, "y": y}
```

- When using try-except blocks, limit the except to the smallest set of errors possible.
For example, instead of:

```python
try:
    open(path, "r").read()
except:
    print("Failed to open file")
```

Do:

```python
try:
    open(path, "r").read()
except FileNotFoundError:
    print("Failed to open file")
```

- When using try-except blocks to handle multiple possible variable types (i.e., duck typing), keep the body of the try as small as possible, using the else block to implement the logic.
For example, instead of:

```python
try:
    f.seek(0)
    f.read()
except AttributeError:
    ...  # Not a file-like object, do something else
```

Do:

```python
try:
    f.seek  # Do not call, to minimize the chance of an unrelated failure
except AttributeError:
    ...  # Not a file-like object, do something else
else:
    f.seek(0)
    f.read()
```

- Use type hints for function arguments and return types.
- Use `T | None` for nullable types (not `Optional[T]`).
- Use `X | Y` for union types (not `Union[X, Y]`).
- Use `TypeVar` for generic type parameters.
- Use built-in generics (`list`, `dict`, `tuple`) instead of `typing` equivalents.
Example:

```python
from typing import TypeVar

T = TypeVar("T", bound=torch.nn.Module)


def get_module_by_name(
    model: T,
    name: str,
    default: torch.nn.Module | None = None,
) -> torch.nn.Module | None:
    """Get a module from a model by its name."""
    ...


def convert_weights(
    weights: torch.Tensor | dict[str, torch.Tensor],
) -> dict[str, torch.Tensor]:
    """Convert weights, accepting either a single tensor or a dict."""
    ...
```

- Use `dataclasses` or `NamedTuple` for configuration objects.
- Be explicit about required vs optional fields.
- Do not add arbitrary defaults for configs; be as explicit as possible.

Example:

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Configuration for model architecture."""

    hidden_size: int
    num_layers: int
    num_attention_heads: int
    vocab_size: int
    max_position_embeddings: int = 2048
    hidden_dropout: float = 0.1
    attention_dropout: float = 0.1
    use_flash_attention: bool | None = None
```

When adding new model bridges, follow these conventions:
- Create a new directory under `src/megatron/bridge/models/<model_name>/`
- Implement the parameter mapping in `param_mapping.py`
- Implement the model bridge in `model_bridge.py`
- Register the model in the appropriate registry
- Always validate tensor shapes before copying weights.
- Handle tensor parallel and pipeline parallel distribution correctly.
- Use `print_rank_0` for logging to avoid duplicate output across ranks.
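A minimal shape check before a weight copy might look like the following sketch (the helper name and error message are illustrative, not the repository's actual API):

```python
def validate_shapes(
    name: str,
    src_shape: tuple[int, ...],
    dst_shape: tuple[int, ...],
) -> None:
    """Raise ValueError if a source weight cannot be copied into the target.

    Shapes are compared exactly; call this before any in-place copy so a
    mismatch fails loudly with the parameter name instead of corrupting state.
    """
    if src_shape != dst_shape:
        raise ValueError(
            f"Shape mismatch for '{name}': source {src_shape} vs target {dst_shape}"
        )
```

In practice the shapes would come from `tensor.shape`; checking them up front turns a silent broadcasting bug into an immediate, named error.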
Recipes should be placed under `src/megatron/bridge/recipes/<model_name>/` and include:

- Model configuration defaults
- Training hyperparameters
- Parallelism settings
- Data configuration

Use descriptive names that include the model size and configuration:

```
llama3_8b.py
llama3_70b.py
qwen2_7b_instruct.py
```
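A recipe module might follow a shape like this sketch (the `RecipeConfig` dataclass and its fields are assumptions for illustration, not the repository's actual recipe API; placeholder values only):

```python
from dataclasses import dataclass


@dataclass
class RecipeConfig:
    """Illustrative recipe bundle: model, training, and parallelism settings."""

    hidden_size: int
    num_layers: int
    micro_batch_size: int
    tensor_parallel_size: int
    pipeline_parallel_size: int


def llama3_8b() -> RecipeConfig:
    """Hypothetical recipe for an 8B model; values are placeholders."""
    return RecipeConfig(
        hidden_size=4096,
        num_layers=32,
        micro_batch_size=1,
        tensor_parallel_size=1,
        pipeline_parallel_size=1,
    )
```

The point of the convention is that `llama3_8b.py` exports one well-named entry point per model size, so a user can discover the recipe from the filename alone.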
When a new markdown doc is added under `docs/**/*.md`, or a markdown file is renamed, ensure that `docs/index.md` is updated so the document appears in the most appropriate section.

Important: all new key features (e.g., enabling a new model or a new parallelism strategy) must include documentation updates. The documentation should:
- Explain the motivation and purpose of the feature
- Outline the technical approach and architecture
- Provide clear usage examples and instructions for users
- Document internal implementation details where appropriate
- Place unit tests in `tests/unit_tests/`
- Name test files with the `test_` prefix: `test_model_bridge.py`
- Use pytest fixtures for common setup
- Use `pytest.mark` to categorize tests (unit, integration, system)
- Place functional tests in `tests/functional_tests/`
- Use subprocess for tests that require process isolation
- Document hardware requirements for GPU tests
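A process-isolated test can be sketched as follows (the test name and inline snippet are illustrative; a real functional test would launch an actual script from the repository):

```python
import subprocess
import sys


def test_runs_in_isolated_process():
    """Run a snippet in a fresh interpreter so global state cannot leak."""
    # sys.executable launches the same Python interpreter; "-c" runs an
    # inline snippet in a child process with its own module/CUDA state.
    result = subprocess.run(
        [sys.executable, "-c", "print('bridge ok')"],
        capture_output=True,
        text=True,
        timeout=60,
    )
    assert result.returncode == 0
    assert result.stdout.strip() == "bridge ok"
```

Isolation matters for tests that initialize distributed or CUDA state, which cannot be cleanly torn down within a single pytest process.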
Use appropriate pytest markers:

```python
import pytest


@pytest.mark.unit
def test_parameter_mapping():
    """Test that parameter mapping is correct."""
    ...


@pytest.mark.integration
def test_model_loading():
    """Test end-to-end model loading."""
    ...
```

Add the following NVIDIA copyright header to all Python files and shell scripts. The header should appear at the top of the file:
```
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```