Commit f6f3a30

abhinavg4 and snowmanwwg committed
Refactor README.md and performance-summary.md for clarity and conciseness
- Simplified descriptions of Megatron Bridge and AutoModel paths in README.md.
- Removed outdated comparison table to streamline content.
- Updated performance-summary.md to generalize model references and improve clarity.

Co-authored-by: Wenwen Gao <94138584+snowmanwwg@users.noreply.github.com>
1 parent: 7083f86 · commit: f6f3a30

File tree

2 files changed: +4, -32 lines changed


README.md

Lines changed: 1 addition & 29 deletions
@@ -99,18 +99,7 @@ uv run --group automodel torchrun --nproc-per-node=8 \
 
 ### Dual Training Paths
 
-- **Megatron Bridge Path**
-- State-of-the-art performance optimizations (TFLOPs)
-- 🎯 Advanced parallelism: Tensor (TP), Context (CP), Data (DP), etc.
-- 📈 Near-linear scalability to thousands of nodes
-- 🔧 Production-ready recipes with optimized hyperparameters
-
-- **AutoModel Path**
-- 🌐 PyTorch DTensor-native SPMD training
-- 🚀 Advanced parallelisms (TP, PP, etc.) coming soon!
-- 🔀 FSDP2-based Hybrid Sharding Data Parallelism (HSDP)
-- 📦 Sequence packing for efficient training
-- 🎨 Minimal ceremony with YAML-driven configs
+**Megatron Bridge** delivers maximum throughput and scalability with near-linear performance to thousands of nodes. **AutoModel** provides an easy on-ramp for experimentation and research with PyTorch-native SPMD training.
 
 ### Shared Capabilities
 

@@ -164,23 +153,6 @@ DFM/
 ├── examples/ # Example scripts and configs
 ```
 
-## 🎯 Choosing Your Path
-
-| Feature | Megatron Bridge | AutoModel |
-|---------|-----------------|-----------|
-| **Best For** | Maximum scale (1000+ GPUs) | Flexibility & fast iteration |
-| **Parallelism** | 6D (TP, CP, DP, etc.) | FSDP2; (TP, SP, CP available soon) |
-| **HF Integration** | Via bridge/conversion | HF-native (via DTensor) |
-| **Checkpoint Format** | Megatron + HF export | HF-native (SafeTensors with DCP) |
-| **Learning Curve** | Steeper (more knobs) | Gentler (YAML-driven) |
-| **Performance** | Highest at scale | Excellent, pytorch-native |
-
-**Recommendation**:
-- Start with **AutoModel** for quick prototyping and HF model compatibility
-- Move to **Megatron Bridge** when scaling to 100+ GPUs or need advanced parallelism
-- Use **both**: prototype with AutoModel, scale with Megatron Bridge!
-
-
 ## 🤝 Contributing
 
 We welcome contributions! Please see our Contributing Guide for details on:

docs/performance-summary.md

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@
 
 As part of the NVIDIA NeMo Framework, DFM provides the most recent training techniques for training advanced generative AI models, such as model parallelization, optimized attention mechanisms, and more, to achieve high training throughput.
 
-This page provides the current performance benchmarks for large language models using DFM across different GPU systems and configurations as we continue to optimize the model for optimal performance. Please refer to `examples/megatron/recipes/wan/conf` for updated YAML configurations.
+This page provides the current performance benchmarks for models using DFM across different GPU systems and configurations as we continue to optimize the model for optimal performance. Please refer to `examples/megatron/recipes/wan/conf` for updated YAML configurations.
 
 ## Nomenclature
 
@@ -29,9 +29,9 @@ Performance is measured using:
 :depth: 2
 ```
 
-## Performance Summary for Large Language Models
+## Performance Summary for Models
 
-Below are performance benchmarks for various large language models organized by release version.
+Below are performance benchmarks for various models using DFM framework.
 
 The performance data includes:
 