Note: This repository is Python-first. Prefer the Python guidelines in this document.
- Python: Google Python Style Guide
- Shell: Google Shell Style Guide
Use `uv run` to execute scripts, rather than activating a virtual environment and calling `python` directly.

Don't:

```shell
source .venv/bin/activate
python examples/models/generate_from_hf.py
```

Do:

```shell
uv run python examples/models/generate_from_hf.py
```

Exception: `docker/Dockerfile.ci` is exempt from this rule.
- The code developed for Megatron-Bridge should conform to Python 3.10+.
- Maximum line length is 119 characters (matching ruff configuration).
- Indent code with 4 spaces. Do not use tabs.
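For reference, these settings would correspond to a ruff configuration in `pyproject.toml` along these lines (a sketch; the repository's actual configuration may include additional keys):

```toml
[tool.ruff]
# Matches the 119-character limit stated above.
line-length = 119
target-version = "py310"

[tool.ruff.format]
# Matches the double-quote string convention stated below.
quote-style = "double"
```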
Naming conventions:

| Type | Convention | Example |
|---|---|---|
| Files | snake_case | `some_file.py` |
| Classes | PascalCase | `class SomeClass` |
| Functions and Methods | snake_case | `def my_awesome_function():` |
| Local Variables | snake_case; prefix `k` for variable names that start with a number | `my_variable = ...`, `k_99th_percentile = ...` |
| Global Variables | upper snake_case with a `G_` prefix | `G_MY_GLOBAL = ...` |
| Constants | upper snake_case | `MY_CONSTANT = ...` |
- Avoid shadowing variables declared in an outer scope.
- Initialize all externally visible members of a class in the constructor.
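As an illustrative sketch (the class and attribute names here are hypothetical, not part of the repository), initializing every externally visible attribute in `__init__` makes the full public state of the class visible in one place:

```python
class CheckpointWriter:
    """Writes model checkpoints to a target directory."""

    def __init__(self, output_dir: str, max_keep: int = 3):
        # All externally visible members are initialized in the constructor,
        # including those only populated later, so readers see the full state.
        self.output_dir = output_dir
        self.max_keep = max_keep
        self.saved_paths: list[str] = []  # populated by save()
        self.last_error: Exception | None = None  # set on failure

    def save(self, step: int) -> str:
        path = f"{self.output_dir}/checkpoint_{step}.pt"
        self.saved_paths.append(path)
        return path
```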
Organize imports in the following order, separated by blank lines:

- Future imports
- Standard library imports
- Third-party imports (including `megatron.core`, `torch`, `transformers`)
- First-party imports (`megatron.bridge.*`)
- Local folder imports

Example:

```python
from __future__ import annotations

import abc
import logging

import torch
from megatron.core import parallel_state as mpu
from transformers import PreTrainedModel

from megatron.bridge.models.model_bridge import MegatronModelBridge
from megatron.bridge.utils.common_utils import print_rank_0
```

- Use double quotes for strings (matching ruff formatter configuration).
- For interfaces that may be used outside a file, prefer docstrings over comments.
- Comments should be reserved for code within a function, or interfaces that are local to a file.
- If a piece of code is commented out, a comment around it should describe what it does and why it is disabled. Otherwise it is a debug leftover and should be removed before merging.
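A sketch of the difference (the function names here are invented for illustration):

```python
def reference_sum(values: list[float]) -> float:
    """Slow reference implementation, kept for numerical bisection."""
    total = 0.0
    for v in values:
        total += v
    return total


def total_loss(values: list[float]) -> float:
    # NOTE: the reference path below is intentionally disabled; it is kept
    # so numerical differences can be bisected against the fast path.
    # Re-enable by uncommenting the next line.
    # return reference_sum(values)
    return sum(values)
```

A bare `# return reference_sum(values)` with no surrounding explanation would be a debug leftover to delete before merging.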
Use the Google style, which can be parsed by Sphinx.

Example:

```python
def convert_weights(
    source_model: torch.nn.Module,
    target_model: torch.nn.Module,
    mapping: MegatronParamMapping,
) -> dict[str, torch.Tensor]:
    """Convert weights from source to target model format.

    This function handles the conversion of weights between HuggingFace
    and Megatron model formats, including tensor parallel distribution.

    Args:
        source_model: The source model containing weights to convert.
        target_model: The target model that will receive converted weights.
        mapping: Parameter mapping defining the conversion rules.

    Returns:
        Dictionary mapping parameter names to converted weight tensors.

    Raises:
        ValueError: If source and target models have incompatible shapes.
    """
    ...
```

- Avoid using reflection when functionality can be easily achieved without reflection.

For example, instead of:

```python
def make_complex(*args):
    x, y = args
    return dict(**locals())
```

Do:

```python
def make_complex(x, y):
    return {"x": x, "y": y}
```

- When using try-except blocks, limit the except to the smallest set of errors possible.
For example, instead of:

```python
try:
    open(path, "r").read()
except:
    print("Failed to open file")
```

Do:

```python
try:
    open(path, "r").read()
except FileNotFoundError:
    print("Failed to open file")
```

- When using try-except blocks to handle multiple possible variable types (i.e., duck typing), keep the body of the try as small as possible, using the else block to implement the logic.
For example, instead of:

```python
try:
    f.seek(0)
    f.read()
except AttributeError:
    ...  # Not a file-like object, do something else
```

Do:

```python
try:
    f.seek  # Do not call, to minimize the chance of an unrelated failure
except AttributeError:
    ...  # Not a file-like object, do something else
else:
    f.seek(0)
    f.read()
```

- Use type hints for function arguments and return types.
- Use `T | None` for nullable types (not `Optional[T]`).
- Use `X | Y` for union types (not `Union[X, Y]`).
- Use `TypeVar` for generic type parameters.
- Use built-in generics (`list`, `dict`, `tuple`) instead of `typing` equivalents.
Example:

```python
from typing import TypeVar

T = TypeVar("T", bound=torch.nn.Module)


def get_module_by_name(
    model: T,
    name: str,
    default: torch.nn.Module | None = None,
) -> torch.nn.Module | None:
    """Get a module from a model by its name."""
    ...


def convert_weights(
    weights: torch.Tensor | dict[str, torch.Tensor],
) -> dict[str, torch.Tensor]:
    """Convert weights, accepting either a single tensor or a dict."""
    ...
```

- Use `dataclasses` or `NamedTuple` for configuration objects.
- Be explicit about required vs optional fields.
- Do not add arbitrary defaults for configs; be as explicit as possible.

Example:

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Configuration for model architecture."""

    hidden_size: int
    num_layers: int
    num_attention_heads: int
    vocab_size: int
    max_position_embeddings: int = 2048
    hidden_dropout: float = 0.1
    attention_dropout: float = 0.1
    use_flash_attention: bool | None = None
```

When adding new model bridges, follow these conventions:
- Create a new directory under `src/megatron/bridge/models/<model_name>/`
- Implement the parameter mapping in `param_mapping.py`
- Implement the model bridge in `model_bridge.py`
- Register the model in the appropriate registry
- Always validate tensor shapes before copying weights.
- Handle tensor parallel and pipeline parallel distribution correctly.
- Use `print_rank_0` for logging to avoid duplicate output across ranks.
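A minimal shape check before a weight copy might look like the following sketch (the helper name and error message are illustrative, not the repository's actual API):

```python
def validate_shapes(
    name: str,
    src_shape: tuple[int, ...],
    dst_shape: tuple[int, ...],
) -> None:
    """Raise ValueError if a source weight cannot be copied into the target.

    Shapes are compared exactly; call this before any in-place copy so a
    mismatch fails loudly with the parameter name instead of corrupting state.
    """
    if src_shape != dst_shape:
        raise ValueError(
            f"Shape mismatch for '{name}': source {src_shape} vs target {dst_shape}"
        )
```

In practice the shapes would come from `tensor.shape`; checking them up front turns a silent broadcasting bug into an immediate, named error.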
Recipes should be placed under `src/megatron/bridge/recipes/<model_name>/` and include:

- Model configuration defaults
- Training hyperparameters
- Parallelism settings
- Data configuration

Use descriptive names that include the model size and configuration:

```
llama3_8b.py
llama3_70b.py
qwen2_7b_instruct.py
```
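A recipe module might follow a shape like this sketch (the `RecipeConfig` dataclass and its fields are assumptions for illustration, not the repository's actual recipe API; placeholder values only):

```python
from dataclasses import dataclass


@dataclass
class RecipeConfig:
    """Illustrative recipe bundle: model, training, and parallelism settings."""

    hidden_size: int
    num_layers: int
    micro_batch_size: int
    tensor_parallel_size: int
    pipeline_parallel_size: int


def llama3_8b() -> RecipeConfig:
    """Hypothetical recipe for an 8B model; values are placeholders."""
    return RecipeConfig(
        hidden_size=4096,
        num_layers=32,
        micro_batch_size=1,
        tensor_parallel_size=1,
        pipeline_parallel_size=1,
    )
```

The point of the convention is that `llama3_8b.py` exports one well-named entry point per model size, so a user can discover the recipe from the filename alone.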
When a new markdown doc is added under `docs/**/*.md`, or a markdown file is renamed, ensure that `docs/index.md` is updated so the document appears in the most appropriate section.

Important: all new key features (e.g., enabling a new model or a new parallelism strategy) must include documentation updates. The documentation should:
- Explain the motivation and purpose of the feature
- Outline the technical approach and architecture
- Provide clear usage examples and instructions for users
- Document internal implementation details where appropriate
- Place unit tests in `tests/unit_tests/`
- Name test files with the `test_` prefix: `test_model_bridge.py`
- Use pytest fixtures for common setup
- Use `pytest.mark` to categorize tests (unit, integration, system)
- Place functional tests in `tests/functional_tests/`
- Use subprocess for tests that require process isolation
- Document hardware requirements for GPU tests
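A process-isolated test can be sketched as follows (the test name and inline snippet are illustrative; a real functional test would launch an actual script from the repository):

```python
import subprocess
import sys


def test_runs_in_isolated_process():
    """Run a snippet in a fresh interpreter so global state cannot leak."""
    # sys.executable launches the same Python interpreter; "-c" runs an
    # inline snippet in a child process with its own module/CUDA state.
    result = subprocess.run(
        [sys.executable, "-c", "print('bridge ok')"],
        capture_output=True,
        text=True,
        timeout=60,
    )
    assert result.returncode == 0
    assert result.stdout.strip() == "bridge ok"
```

Isolation matters for tests that initialize distributed or CUDA state, which cannot be cleanly torn down within a single pytest process.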
Use appropriate pytest markers:

```python
import pytest


@pytest.mark.unit
def test_parameter_mapping():
    """Test that parameter mapping is correct."""
    ...


@pytest.mark.integration
def test_model_loading():
    """Test end-to-end model loading."""
    ...
```

Add the following NVIDIA copyright header to all Python files and shell scripts. The header should appear at the top of the file:
```
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```