58 changes: 32 additions & 26 deletions bionemo-recipes/README.md
@@ -1,15 +1,15 @@
# BioNeMo Recipes

BioNeMo Recipes provides an easy path for the biological foundation model training community to scale up transformer-based models efficiently. Rather than offering a batteries-included training framework, we provide **model checkpoints** with TransformerEngine (TE) layers and **training recipes** that demonstrate how to achieve maximum throughput with popular open-source frameworks and fully sharded data parallel (FSDP) scale-out.
BioNeMo Recipes provides an easy path for the biological foundation model training community to scale up transformer-based models efficiently. Rather than offering a batteries-included training framework, BioNeMo Recipes provides **model checkpoints** with TransformerEngine (TE) layers and **training recipes** that demonstrate how to achieve maximum throughput with popular open-source frameworks and fully sharded data parallel (FSDP) scale-out.

## Overview

The biological AI community is actively prototyping model architectures and needs tooling that prioritizes extensibility, interoperability, and ease-of-use alongside performance. BioNeMo Recipes addresses this by offering:
The biological AI community actively prototypes model architectures and needs tooling that prioritizes extensibility, interoperability, and ease of use alongside performance. BioNeMo Recipes addresses this by offering:

- **Flexible scaling**: Scale from single-GPU prototyping to multi-node training without complex parallelism configurations
- **Flexible scaling**: Scales from single-GPU prototyping to multi-node training without complex parallelism configurations
- **Framework compatibility**: Works with popular frameworks like HuggingFace Accelerate, PyTorch Lightning, and vanilla PyTorch
- **Performance optimization**: Leverages TransformerEngine and megatron-FSDP for state-of-the-art training efficiency
- **Research-friendly**: Hackable, readable code that researchers can easily adapt for their experiments
- **Research-friendly**: Contains hackable and readable code that researchers can easily adapt for their experiments

### Performance Benchmarks

Expand All @@ -21,6 +21,8 @@ The biological AI community is actively prototyping model architectures and need

### Use Cases

The use cases of BioNeMo Recipes include:

- **Foundation Model Developers**: AI researchers and ML engineers developing novel biological foundation models who need to scale up prototypes efficiently
- **Foundation Model Customizers**: Domain scientists looking to fine-tune existing models with proprietary data for drug discovery and biological research

Expand Down Expand Up @@ -48,9 +50,9 @@ Abbreviations:
- BF16: [brain-float 16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format), a common 16 bit float format for deep learning.
- FP8<sup>[1]</sup>: [8-bit floating point](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html), a compact format for weights allowing for faster training and inference.
- MXFP8<sup>[2]</sup>: [Microscaling 8-bit floating point](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html), as compact as FP8 but with better numerical precision.
- NVFP4<sup>[2]</sup>: [NVIDIA 4-bit floating point](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html#Beyond-FP8---training-with-NVFP4), faster than FP8, retaining accuracy via multi-scale.
- THD: **T**otal **H**eads **D**imension, also known as ["sequence packing"](https://docs.nvidia.com/nemo-framework/user-guide/24.07/nemotoolkit/features/optimizations/sequence_packing.html#sequence-packing-for-sft-peft). A way to construct a batch with sequences of different length so there are no pads, therefore no compute is wasted on computing attention for padding tokens. This is in contrast to **B**atch **S**equence **H**ead **D**imension (BSHD) format, which uses pads to create a rectangular batch.
- CP: Context parallel, also known as sequence parallel. A way to distribute the memory required to process long sequences across multiple GPUs. For more information please see [context parallel](./recipes/context_parallel.md)
- NVFP4<sup>[2]</sup>: [NVIDIA 4-bit floating point](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html#Beyond-FP8---training-with-NVFP4), faster than FP8, retaining accuracy using multi-scale.
- THD: **T**otal **H**eads **D**imension, also known as ["sequence packing"](https://docs.nvidia.com/nemo-framework/user-guide/24.07/nemotoolkit/features/optimizations/sequence_packing.html#sequence-packing-for-sft-peft). A way to construct a batch with sequences of different lengths so there are no pads, which results in no compute wasted on computing attention for padding tokens. This is in contrast to **B**atch **S**equence **H**ead **D**imension (BSHD) format, which uses pads to create a rectangular batch.
- CP: Context parallel, also known as sequence parallel. A way to distribute the memory required to process long sequences across multiple GPUs. For more information, refer to [context parallel](./recipes/context_parallel.md).

\[1\]: Requires [compute capability](https://developer.nvidia.com/cuda-gpus) 9.0 and above (Hopper+) <br/>
\[2\]: Requires [compute capability](https://developer.nvidia.com/cuda-gpus) 10.0 and 10.3 (Blackwell), 12.0 support pending <br/>
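
The difference between THD and BSHD batches described above can be illustrated with a minimal pure-Python sketch. The helper names `pack_thd` and `pad_bshd`, the `pad_id` value, and the token IDs are hypothetical and chosen only for illustration. Recording cumulative sequence offsets is one common way to mark sequence boundaries in a packed batch, and is not necessarily how the recipes implement it.

```python
from itertools import accumulate


def pack_thd(sequences):
    """Concatenate variable-length sequences into one flat token list (no pads)."""
    tokens = [tok for seq in sequences for tok in seq]
    # cu_seqlens[i] is the offset where sequence i starts;
    # the final entry is the total number of tokens in the batch.
    cu_seqlens = [0, *accumulate(len(seq) for seq in sequences)]
    return tokens, cu_seqlens


def pad_bshd(sequences, pad_id=0):
    """Pad every sequence to the longest length to form a rectangular batch."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]


# Three toy sequences of lengths 3, 1, and 2 (illustrative token IDs).
sequences = [[5, 6, 7], [8], [9, 10]]

packed_tokens, cu_seqlens = pack_thd(sequences)
padded = pad_bshd(sequences)

# Slots in the padded batch that hold padding instead of real tokens.
padded_slots = sum(len(row) for row in padded)
wasted = padded_slots - len(packed_tokens)

print(packed_tokens, cu_seqlens, wasted)
```

In this toy batch the padded layout spends a third of its slots on padding, while the packed layout stores only real tokens plus the boundary offsets.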
Expand All @@ -63,7 +65,7 @@ This repository contains two types of components:

Huggingface-compatible `PreTrainedModel` classes that use TransformerEngine layers internally. These are designed to be:

- **Distributed via Hugging Face Hub**: Pre-converted checkpoints available at [huggingface.co/nvidia](https://huggingface.co/nvidia)
- **Distributed through Hugging Face Hub**: Pre-converted checkpoints available at [huggingface.co/nvidia](https://huggingface.co/nvidia)
- **Drop-in replacements**: Compatible with `AutoModel.from_pretrained()` without additional dependencies
- **Performance optimized**: Leverage TransformerEngine features like FP8 training and context parallelism

Expand All @@ -82,7 +84,11 @@ Recipes are **not pip-installable packages** but serve as reference implementati

## Quick Start

### Using Models
This section describes how you can get started with BioNeMo Recipes.

### Loading Models

Run the following to load a BioNeMo model.

```python
from transformers import AutoModel, AutoTokenizer
Expand All @@ -94,6 +100,8 @@ tokenizer = AutoTokenizer.from_pretrained("nvidia/AMPLIFY_120M")

### Running Recipes

Build and run recipes with the following.

```bash
# Navigate to a recipe
cd recipes/esm2_native_te_mfsdp
Expand All @@ -103,13 +111,9 @@ docker build -t esm2_recipe .
docker run --rm -it --gpus all esm2_recipe python train.py
```

______________________________________________________________________
## Setting Up the Development Environment

## Developer Guide

### Setting Up Development Environment

1. **Install pre-commit hooks:**
1. Install pre-commit hooks:

```bash
pre-commit install
Expand All @@ -130,9 +134,9 @@ ______________________________________________________________________
docker run --rm -it --gpus all my_tag pytest -v .
```

### Coding Guidelines
## Coding Guidelines

We prioritize **readability and simplicity** over comprehensive feature coverage:
BioNeMo Recipes prioritizes **readability and simplicity** over comprehensive feature coverage:

- **KISS (Keep It Simple) over DRY (Don't Repeat Yourself)**: It's better to have clear, duplicated code than complex
abstractions
Expand All @@ -141,7 +145,7 @@ We prioritize **readability and simplicity** over comprehensive feature coverage

### Testing Strategy

We use a three-tier testing approach:
BioNeMo Recipes uses a three-tier testing approach:

#### L0 Tests (Pre-merge)

Expand All @@ -166,9 +170,11 @@ We use a three-tier testing approach:

### Adding New Components

With BioNeMo Recipes, you can add new components including models and recipes.

#### Adding a New Model

Models should be pip-installable packages that can export checkpoints to Hugging Face. See the
Models should be pip-installable packages that can export checkpoints to Hugging Face. Refer to the
[models README](models/README.md) for detailed guidelines on:

- Package structure and conventions
Expand All @@ -178,7 +184,7 @@ Models should be pip-installable packages that can export checkpoints to Hugging

#### Adding a New Recipe

Recipes should be self-contained Docker environments demonstrating specific training patterns. See
Recipes should be self-contained Docker environments demonstrating specific training patterns. Refer to
the [recipes README](recipes/README.md) for guidance on:

- Directory structure and naming
Expand Down Expand Up @@ -209,14 +215,14 @@ We aim to provide the fastest available training implementations for biological

## Contributing

We welcome contributions that advance the state of biological foundation model training. Please ensure your contributions:
We welcome contributions that advance the state of biological foundation model training. Ensure your contributions:

1. Follow our coding guidelines emphasizing clarity
2. Include appropriate tests (L0 minimum, L1/L2 as applicable)
3. Provide clear documentation and examples
4. Maintain compatibility with our supported frameworks
- Follow our coding guidelines emphasizing clarity
- Include appropriate tests (L0 minimum, L1/L2 as applicable)
- Provide clear documentation and examples
- Maintain compatibility with our supported frameworks

For detailed contribution guidelines, see our individual component READMEs:
For detailed contribution guidelines, refer to our individual component READMEs:

- [Models Development Guide](models/README.md)
- [Recipes Development Guide](recipes/README.md)
10 changes: 5 additions & 5 deletions bionemo-recipes/models/README.md
@@ -1,14 +1,14 @@
# Models Directory

This directory contains HuggingFace-compatible model implementations that use TransformerEngine layers internally. These models are designed to be distributed via the Hugging Face Hub and serve as drop-in replacements for standard transformer models with enhanced performance.
This directory contains HuggingFace-compatible model implementations that use TransformerEngine layers internally. These models are designed to be distributed through the Hugging Face Hub and serve as drop-in replacements for standard transformer models with enhanced performance.

## Overview

Models in this directory are **not intended to be pip-installed directly**. Instead, they serve as:

1. **Reference implementations** of biological foundation models using TransformerEngine
2. **Conversion utilities** for transforming existing model checkpoints to TE-compatible format
3. **Export tools** for preparing model releases on the Hugging Face Hub
- **Reference implementations** of biological foundation models using TransformerEngine
- **Conversion utilities** for transforming existing model checkpoints to TE-compatible format
- **Export tools** for preparing model releases on the Hugging Face Hub

Users will typically interact with these models by loading pre-converted checkpoints directly from the Hugging Face Hub using standard transformers APIs.

Expand All @@ -33,7 +33,7 @@ To add a new model to this directory, you must provide:
#### 3. Checkpoint Export Script

- **`export.py`**: Script that packages all necessary files for Hugging Face Hub upload
- **Complete asset bundling**: Must include all required files (see [Export Requirements](#export-requirements))
- **Complete asset bundling**: Must include all required files; refer to [Export Requirements](#export-requirements)
- **Automated process**: Should be runnable with minimal manual intervention

#### 4. Open Source License
13 changes: 7 additions & 6 deletions bionemo-recipes/models/amplify/README.md
@@ -1,9 +1,8 @@
# AMPLIFY Optimized with NVIDIA TransformerEngine

This folder contains source code and tests for an AMPLIFY model that inherits from the transformers `PreTrainedModel`
class and uses TransformerEngine layers. Users don't need to install this package directly, but can load the
model directly from HuggingFace Hub using the standard transformers API (see [Inference Examples](#inference-examples)
below).
class and uses TransformerEngine layers. Users do not need to install this package; they can load the
model directly from HuggingFace Hub using the standard transformers API. For more information, refer to [Inference Examples](#inference-examples).

## Feature support

Expand All @@ -18,7 +17,7 @@ The AMPLIFY implementation natively supports the following TransformerEngine-pro
| **Import from HuggingFace checkpoints** | ✅ Supported |
| **Export to HuggingFace checkpoints** | 🚧 Under development |

See [BioNeMo Recipes](../../recipes/README.md) for more details on how to use these features to accelerate model
Refer to [BioNeMo Recipes](../../recipes/README.md) for more details on how to use these features to accelerate model
training and inference.

## Links to HF checkpoints
Expand All @@ -34,7 +33,7 @@ Pre-trained AMPLIFY models are available on HuggingFace as part of the NVIDIA
## Runtime Requirements

We recommend using the latest [NVIDIA PyTorch container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
for optimal performance and compatibility. See the provided Dockerfile for details.
for optimal performance and compatibility. Refer to the provided Dockerfile for details.

## Inference Examples

Expand All @@ -61,7 +60,7 @@ output = model(**inputs)
## Recipe Links

Training recipes are available in the `bionemo-recipes/recipes/` directory. AMPLIFY can be trained using the same
recipes as ESM-2, simply by switching the model_tag to reference the AMPLIFY model, e.g. `nvidia/AMPLIFY_120M`, and
recipes as ESM-2, simply by switching the model_tag to reference the AMPLIFY model, such as `nvidia/AMPLIFY_120M`, and
changing the dataset as appropriate.

- **[esm2_native_te](../../recipes/esm2_native_te/)** - Demonstrates training with a simple native PyTorch training
Expand Down Expand Up @@ -118,3 +117,5 @@ Or, upload all models at once with:
```bash
for dir in *; do huggingface-cli upload nvidia/$(basename "$dir") "$dir/"; done
```

15 changes: 7 additions & 8 deletions bionemo-recipes/models/esm2/README.md
Expand Up @@ -2,8 +2,7 @@

This folder contains source code and tests for an ESM-2 model that inherits from the transformers `PreTrainedModel`
class and uses TransformerEngine layers. Users don't need to install this package directly, but can load the
model directly from HuggingFace Hub using the standard transformers API (see [Inference Examples](#inference-examples)
below).
model directly from HuggingFace Hub using the standard transformers API. For more information, refer to [Inference Examples](#inference-examples).

## Feature support

Expand All @@ -18,7 +17,7 @@ The ESM-2 implementation natively supports the following TransformerEngine-provi
| **Import from HuggingFace checkpoints** | ✅ Supported |
| **Export to HuggingFace checkpoints** | ✅ Supported |

See [BioNeMo Recipes](../../recipes/README.md) for more details on how to use these features to accelerate model
Refer to [BioNeMo Recipes](../../recipes/README.md) for more details on how to use these features to accelerate model
training and inference.

## Links to HF checkpoints
Expand All @@ -38,7 +37,7 @@ Pre-trained ESM-2 models converted from the original Facebook weights are availa
## Runtime Requirements

We recommend using the latest [NVIDIA PyTorch container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
for optimal performance and compatibility. See the provided Dockerfile for details.
for optimal performance and compatibility. Refer to the provided Dockerfile for details.

## Inference Examples

Expand Down Expand Up @@ -101,7 +100,7 @@ hf_model = convert_esm_te_to_hf(te_model)
hf_model.save_pretrained("/path/to/hf_checkpoint")
```

Load and Test the Exported Model
### Loading and Testing the Exported Model

Load the exported model and perform validation:

Expand All @@ -114,8 +113,8 @@ tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")

### Validating Converted Models

See the commands in [Inference Examples](#inference-examples) above to load and test both the original and converted
models to ensure loss and logit values are similar. See also the golden value tests in
To validate the converted models, refer to the commands in [Inference Examples](#inference-examples) above to load and test both the original and converted
models to ensure loss and logit values are similar. Additionally, refer to the golden value tests in
[test_modeling_esm_te.py](tests/test_modeling_esm_te.py) and [test_convert.py](tests/test_convert.py).
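
As a rough illustration of what "similar" logit values can mean in practice, the following sketch compares two sets of logits element by element within a tolerance. The function name `logits_match`, the tolerance values, and the numbers are all assumptions made for illustration: the logits are synthetic, not real model outputs, and the golden value tests referenced above define the actual thresholds.

```python
import math


def logits_match(reference, candidate, rel_tol=1e-3, abs_tol=1e-3):
    """Compare two equally shaped 2D lists of logits.

    Returns (ok, max_abs_diff), where ok is True only if every pair of
    values agrees within the given relative or absolute tolerance.
    """
    if len(reference) != len(candidate):
        raise ValueError("logit batches have different numbers of rows")

    ok = True
    max_abs_diff = 0.0
    for ref_row, cand_row in zip(reference, candidate):
        if len(ref_row) != len(cand_row):
            raise ValueError("logit rows have different lengths")
        for ref, cand in zip(ref_row, cand_row):
            max_abs_diff = max(max_abs_diff, abs(ref - cand))
            if not math.isclose(ref, cand, rel_tol=rel_tol, abs_tol=abs_tol):
                ok = False
    return ok, max_abs_diff


# Synthetic stand-ins for logits from the original and converted models.
hf_logits = [[0.12, -1.50, 3.20], [2.00, 0.00, -0.75]]
te_logits = [[0.1203, -1.4996, 3.2011], [1.9992, 0.0004, -0.7503]]

ok, max_diff = logits_match(hf_logits, te_logits)
print(f"match={ok}, max abs diff={max_diff:.4f}")
```

With real models, the two inputs would be the logits produced by the original and converted checkpoints for the same tokenized batch, converted to nested lists or compared with an equivalent tensor utility.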

## Developer Guide
Expand Down Expand Up @@ -153,7 +152,7 @@ Now deploy the converted checkpoints to the HuggingFace Hub by running the follo
huggingface-cli upload nvidia/${MODEL_NAME} $PWD/checkpoint_export/${MODEL_NAME}
```

Or, upload all models at once with:
You can also upload all models at once with:

```bash
cd checkpoint_export