
Commit 7af9601

cp: [docs] 25.11 release notes into r0.2.0 (#1518)
Signed-off-by: Ananth Subramaniam <[email protected]>
1 parent feaab97 commit 7af9601

5 files changed: 92 additions & 50 deletions

CHANGELOG.md

Lines changed: 58 additions & 0 deletions
@@ -1,5 +1,63 @@
 # Changelog
 
+## NVIDIA Megatron-Bridge 0.2.0
+
+* Model Collection Support
+
+  * LLM
+    * HuggingFace conversion + training recipes:
+      * GPT-OSS
+      * Qwen3 Next
+      * Nemotron-H
+      * Nemotron Nano v2
+      * Moonlight
+      * OLMoE
+      * GLM 4.5
+      * Gemma 3
+    * HuggingFace conversion support:
+      * Llama Nemotron
+      * Mistral
+      * Gemma
+      * Gemma 2
+  * VLM
+    * Nemotron Nano v2 VL
+    * Qwen 3 VL
+    * Qwen2.5 VL
+    * Gemma3 VL
+
+* Performance
+  * Megatron-Bridge support for new benchmarks
+    * Benchmarks for the GB300 system (same workloads as the GB200 system)
+      * GPT-OSS 120B
+      * Qwen3-Next 80B_a3B
+    * Support for linear attention on Blackwell (Gated Delta Networks)
+    * Pre-training with NVFP4 precision: Llama3 8B, Llama3 70B, Llama3.1 405B
+  * Megatron-Bridge support for benchmarks previously available only in NeMo 2.0
+    * Nemotron-H 56B
+    * Fine-tuning (SFT and LoRA): Llama3 8B and Llama3 70B
+  * HybridEP: DeepSeek V3 benchmarks on GB200 and GB300 systems now use HybridEP
+  * CUDA Graphs
+    * Full-model iteration CUDA graph used for dense models: Llama3 8B, Llama3 70B, Llama3.1 405B
+    * Fine-grained, Transformer-component-specific CUDA graphs used for MoE models
+
+* NVIDIA Model Optimizer integration
+  * Knowledge distillation
+  * Post-training quantization export
+  * Quantization-aware training
+
+* Enhanced LoRA support
+  * Support for expert layers
+  * Support for merging adapters for export to HuggingFace
+
+* Fine-tuning dataset improvements: OpenAI messages format conversion, chat template support
+* Integration with NVIDIA-DLFW-Inspect for tensor statistics collection & monitoring
+* Support for sample-based training
+
+## NVIDIA Megatron-Bridge 0.1.0rc4
+
+* Fix docs build
+* Update performance scripts
+
 ## NVIDIA Megatron-Bridge 0.1.0rc3
 
 * Model Collection Support
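The "Enhanced LoRA support" entry above mentions merging adapters for export to HuggingFace. As an illustration only, the sketch below shows what adapter merging looks like using the Hugging Face PEFT API; the base model name and adapter path are placeholders, and Megatron-Bridge's own exporter (not shown in this diff) may differ.

```python
# Illustrative sketch only: merge a trained LoRA adapter into its base model so
# the result can be saved as a plain HuggingFace checkpoint. Uses the HF PEFT
# API for illustration; the checkpoint name and adapter path are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
lora = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # hypothetical adapter dir
merged = lora.merge_and_unload()             # fold the LoRA deltas into the base weights
merged.save_pretrained("merged-checkpoint")  # loadable with plain transformers, no adapter needed
```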
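The 0.2.0 notes also list fine-tuning dataset improvements: OpenAI messages format conversion and chat template support. The sketch below illustrates that data format and how a chat template renders it into a single training string; it uses the Hugging Face tokenizer API and an arbitrary example checkpoint, not Megatron-Bridge's own dataset utilities.

```python
# Illustrative sketch only: an OpenAI-style "messages" conversation rendered to
# text with a model's chat template. The checkpoint name is an arbitrary example.
from transformers import AutoTokenizer

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What changed in Megatron-Bridge 0.2.0?"},
    {"role": "assistant", "content": "New model bridges, new benchmarks, and enhanced LoRA support."},
]

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)  # the conversation rendered with the model's chat template
```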

docs/releases/changelog.md

Lines changed: 3 additions & 49 deletions
@@ -1,50 +1,4 @@
-# Changelog
-
-## 25.09.01 NeMo Framework Container
-
-- Fix docs build
-- Update performance scripts
-
-## 25.09 NeMo Framework Container
-
-### Model Collection Support
-
-- Llama
-- Qwen 2, Qwen 3, Qwen 3 MoE
-- DeepSeek
-- Mamba
-- [Migration guide from Nemo 2 to Megatron Bridge](https://docs.nvidia.com/nemo/megatron-bridge/0.1.0/nemo2-migration-guide.html)
-- [Contribution guide for adding a new model](https://docs.nvidia.com/nemo/megatron-bridge/0.1.0/adding-new-models.html)
-- [Checkpoint conversion from Hugging Face to Megatron Bridge](https://docs.nvidia.com/nemo/megatron-bridge/0.1.0/bridge-guide.html#get-started-with-hugging-face-conversion)
-
-### [Performance](https://docs.nvidia.com/nemo/megatron-bridge/0.1.0/performance-summary.html)
-
-#### MoE LLM
-
-- Change the model to dropless with balanced gating
-- Fusion of operators in router function
-- Global permutation fusion with A2A dispatcher
-- EP A2A communication overlap with computation in both 1F1B pipelining and non-pipelined training
-- Precision-aware optimizer update to support BF16 states
-
-#### Megatron FSDP
-
-- Migration from mcore FSDP to megatron FSDP
-- Fusion of weight gradient copy to reduce-scatter communication buffer to WGRAD GEMM
-- Removed redundant optimizer operations
-- Use Zero1 (opt and master param sharding) in the replica domain of hybrid FSDP to further lower memory usage
-- IB-SHARP support for the IB AllReduce of hybrid FSDP in a patch with NCCL2.28
-
-#### MXFP8
-
-- Improved act grad all-gather overlap performance via userbuffer
-- Parameter all-gather overlap with computation while the communication buffer sharing with reduce-scatter
-- Fusion of MXFP8 scaling factor swizzling kernels
-- Use PDL (Programmatic Dependent Launch) for quantization kernels to lower CPU overhead
-
-#### Others
-
-- Full iteration cuda graph for dense model without pipelining
-- Fusion of activation and cast (currently tensor-wise scaling only)
-- Store SwiGLU input in FP8 to save activation memory
+```{include} ../../CHANGELOG.md
+:relative-docs: docs/
+```

docs/releases/known-issues.md

Lines changed: 6 additions & 0 deletions
@@ -2,6 +2,12 @@
 
 This page lists known issues and limitations in the current release.
 
+## 25.11
+
+- DeepSeek V3 on H100 fails with `RuntimeError: DeepEP error: timeout (dispatch CPU)` when DeepEP is used.
+- MODEL_TFLOP/s/GPU is printed as 0 to stdout for all hybrid models, such as Nemotron-H 56B.
+
+
 ## 25.09
 
 - **Pretraining DeepSeek in subchannel FP8 precision is not working.** Pretraining DeepSeek with current scaling FP8 is a workaround, but MTP loss does not converge.

docs/releases/software-versions.md

Lines changed: 24 additions & 0 deletions
@@ -1,5 +1,29 @@
 # Software Component Versions
 
+## NeMo Framework 25.11
+
+| Software Component | Version |
+|--------------------|---------|
+| PyTorch | 2.9.0a0 |
+| Megatron Core | dev:0.15.0 |
+| Transformer Engine | 2.9 |
+| Megatron-Bridge | 0.2.0 |
+| Megatron-FSDP | 0.2.0 |
+| Export-Deploy | 0.3.0 |
+| Evaluator | 0.2.0 |
+| NeMo | 2.6.0 |
+| NeMo Run | 0.7.0 |
+| TRT-ModelOpt | 0.37.0 |
+| NVRX | 0.4.1 |
+| CUDA | 13.0.1 |
+| cuDNN | 9.13.1.26 |
+| TRT-LLM | 1.1.0a0 |
+
+```{note}
+The NVIDIA NeMo™ Framework Training container is built on top of the NVIDIA Optimized Frameworks PyTorch 25.06 container: https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html
+```
+
+
 ## NeMo Framework 25.09
 
 | Software Component | Version |

src/megatron/bridge/package_info.py

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 MAJOR = 0
 MINOR = 2
 PATCH = 0
-PRE_RELEASE = "rc7"
+PRE_RELEASE = ""
 
 # Use the following formatting: (major, minor, patch, pre-release)
 VERSION = (MAJOR, MINOR, PATCH, PRE_RELEASE)
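Clearing PRE_RELEASE from "rc7" to "" is what turns the release candidate into the final 0.2.0 version. A minimal sketch of how such a version tuple is typically rendered into a version string follows; the actual assembly code in package_info.py is not part of this diff, so treat the expression below as an assumption.

```python
# Minimal sketch, assuming the common pattern of concatenating the tuple fields
# into a PEP 440 style version string; not the literal package_info.py code.
MAJOR, MINOR, PATCH, PRE_RELEASE = 0, 2, 0, ""

__version__ = f"{MAJOR}.{MINOR}.{PATCH}{PRE_RELEASE}"

assert __version__ == "0.2.0"  # with PRE_RELEASE = "rc7" this would be "0.2.0rc7"
```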
