Commit f6f3a30

abhinavg4 and snowmanwwg committed
Refactor README.md and performance-summary.md for clarity and conciseness
- Simplified descriptions of Megatron Bridge and AutoModel paths in README.md.
- Removed outdated comparison table to streamline content.
- Updated performance-summary.md to generalize model references and improve clarity.

Co-authored-by: Wenwen Gao <94138584+snowmanwwg@users.noreply.github.com>
1 parent: 7083f86 · commit: f6f3a30

File tree

2 files changed: +4, -32 lines changed


README.md

Lines changed: 1 addition & 29 deletions
@@ -99,18 +99,7 @@ uv run --group automodel torchrun --nproc-per-node=8 \
 
 ### Dual Training Paths
 
-- **Megatron Bridge Path**
-- State-of-the-art performance optimizations (TFLOPs)
-- 🎯 Advanced parallelism: Tensor (TP), Context (CP), Data (DP), etc.
-- 📈 Near-linear scalability to thousands of nodes
-- 🔧 Production-ready recipes with optimized hyperparameters
-
-- **AutoModel Path**
-- 🌐 PyTorch DTensor-native SPMD training
-- 🚀 Advanced parallelisms (TP, PP, etc.) coming soon!
-- 🔀 FSDP2-based Hybrid Sharding Data Parallelism (HSDP)
-- 📦 Sequence packing for efficient training
-- 🎨 Minimal ceremony with YAML-driven configs
+**Megatron Bridge** delivers maximum throughput and scalability with near-linear performance to thousands of nodes. **AutoModel** provides an easy on-ramp for experimentation and research with PyTorch-native SPMD training.
 
 ### Shared Capabilities
 

@@ -164,23 +153,6 @@ DFM/
 ├── examples/ # Example scripts and configs
 ```
 
-## 🎯 Choosing Your Path
-
-| Feature | Megatron Bridge | AutoModel |
-|---------|-----------------|-----------|
-| **Best For** | Maximum scale (1000+ GPUs) | Flexibility & fast iteration |
-| **Parallelism** | 6D (TP, CP, DP, etc.) | FSDP2; (TP, SP, CP available soon) |
-| **HF Integration** | Via bridge/conversion | HF-native (via DTensor) |
-| **Checkpoint Format** | Megatron + HF export | HF-native (SafeTensors with DCP) |
-| **Learning Curve** | Steeper (more knobs) | Gentler (YAML-driven) |
-| **Performance** | Highest at scale | Excellent, pytorch-native |
-
-**Recommendation**:
-- Start with **AutoModel** for quick prototyping and HF model compatibility
-- Move to **Megatron Bridge** when scaling to 100+ GPUs or need advanced parallelism
-- Use **both**: prototype with AutoModel, scale with Megatron Bridge!
-
-
 ## 🤝 Contributing
 
 We welcome contributions! Please see our Contributing Guide for details on:

docs/performance-summary.md

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@
 
 As part of the NVIDIA NeMo Framework, DFM provides the most recent training techniques for training advanced generative AI models, such as model parallelization, optimized attention mechanisms, and more, to achieve high training throughput.
 
-This page provides the current performance benchmarks for large language models using DFM across different GPU systems and configurations as we continue to optimize the model for optimal performance. Please refer to `examples/megatron/recipes/wan/conf` for updated YAML configurations.
+This page provides the current performance benchmarks for models using DFM across different GPU systems and configurations as we continue to optimize the model for optimal performance. Please refer to `examples/megatron/recipes/wan/conf` for updated YAML configurations.
 
 ## Nomenclature
 
@@ -29,9 +29,9 @@ Performance is measured using:
 :depth: 2
 ```
 
-## Performance Summary for Large Language Models
+## Performance Summary for Models
 
-Below are performance benchmarks for various large language models organized by release version.
+Below are performance benchmarks for various models using DFM framework.
 
 The performance data includes:
 