Refactor README.md and performance-summary.md for clarity and conciseness
- Simplified descriptions of Megatron Bridge and AutoModel paths in README.md.
- Removed outdated comparison table to streamline content.
- Updated performance-summary.md to generalize model references and improve clarity.
Co-authored-by: Wenwen Gao <94138584+snowmanwwg@users.noreply.github.com>
README.md

- 🔀 FSDP2-based Hybrid Sharding Data Parallelism (HSDP)
- 📦 Sequence packing for efficient training
- 🎨 Minimal ceremony with YAML-driven configs
+ **Megatron Bridge** delivers maximum throughput and scalability with near-linear performance to thousands of nodes. **AutoModel** provides an easy on-ramp for experimentation and research with PyTorch-native SPMD training.

  ### Shared Capabilities

@@ -164,23 +153,6 @@ DFM/
  ├── examples/          # Example scripts and configs
  ```

- ## 🎯 Choosing Your Path
-
- | Feature | Megatron Bridge | AutoModel |
- |---------|-----------------|-----------|
- | **Best For** | Maximum scale (1000+ GPUs) | Flexibility & fast iteration |
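As context for the HSDP and SPMD terms above, the sketch below shows what FSDP2-based hybrid sharding looks like in plain PyTorch. It is an illustration only, not DFM's actual training entry point; the model, mesh shape, and sizes are placeholder assumptions.

```python
# Minimal sketch of FSDP2-based hybrid sharding (HSDP) in plain PyTorch.
# Illustrative only, not DFM's actual training code; the model, mesh
# shape, and layer sizes are placeholders. Intended to run under
# torchrun with 8 GPUs, e.g.: torchrun --nproc-per-node 8 hsdp_sketch.py
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard  # FSDP2 API (PyTorch >= 2.6)

# 2-D mesh: parameters are replicated across the first dimension and
# sharded within the second, e.g. 2 replica groups of 4 shards each.
mesh = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
fully_shard(model, mesh=mesh)  # shard within a group, replicate across groups
```

Hybrid sharding trades memory for communication: gradients are reduce-scattered within each shard group and all-reduced across replica groups, which keeps the collectives small as the job scales out.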
docs/performance-summary.md (+3, -3)
@@ -2,7 +2,7 @@
  As part of the NVIDIA NeMo Framework, DFM provides the latest techniques for training advanced generative AI models, such as model parallelization, optimized attention mechanisms, and more, to achieve high training throughput.

- This page provides the current performance benchmarks for large language models using DFM across different GPU systems and configurations as we continue to optimize performance. Please refer to `examples/megatron/recipes/wan/conf` for updated YAML configurations.
+ This page provides the current performance benchmarks for models using DFM across different GPU systems and configurations as we continue to optimize performance. Please refer to `examples/megatron/recipes/wan/conf` for updated YAML configurations.

  ## Nomenclature
@@ -29,9 +29,9 @@ Performance is measured using:
  :depth: 2
  ```

- ## Performance Summary for Large Language Models
+ ## Performance Summary for Models

- Below are performance benchmarks for various large language models organized by release version.
+ Below are performance benchmarks for various models using the DFM framework.
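The recipe configurations referenced at `examples/megatron/recipes/wan/conf` are plain YAML, so loading one looks roughly like the sketch below. The filename and keys are hypothetical placeholders, not DFM's actual schema; consult the files in that directory for the real structure.

```python
# Hypothetical sketch: loading a recipe config like those referenced under
# examples/megatron/recipes/wan/conf. The filename and keys below are
# illustrative assumptions, not DFM's actual schema.
import yaml

with open("examples/megatron/recipes/wan/conf/pretrain.yaml") as f:  # hypothetical file
    cfg = yaml.safe_load(f)

# Hypothetical keys; the real recipes define their own structure.
print(cfg.get("model"), cfg.get("trainer"))
```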