torchtitan/experiments/torchcomms/README.md (26 additions, 5 deletions)
@@ -25,10 +25,31 @@ Locally tested with:
 - **FSDP** (`fully_shard`) - Fully Sharded Data Parallel
 - **TP** - Tensor Parallelism
 - **PP** - Pipeline Parallelism
-- **CP** - Context Parallelism
+- **EP** - Expert Parallelism
+- **compile** - `torch.compile` integration
 
-### Roadmap
+### Performance
 
-- [ ] Add N-D parallelism E2E perf and convergence tests
-- [ ] Integrated and tested with Expert Parallelism
-- [ ] Integration and testing with `torch.compile`
+**Setup**: Similar setting to [docs/converging.md](../../docs/converging.md), based on [torchtitan/models/llama3/train_configs/llama3_8b.toml](../torchtitan/models/llama3/train_configs/llama3_8b.toml), but with `training.local_batch_size = 1`
+
+| Run Name | Parallelism | Distributed Library | Remarks |
⋮ (table rows not shown in this diff view)
+- **CP** (Context Parallelism) - Temporarily not working
+- **Memory Overhead** - TorchComms currently has higher peak memory usage; as a workaround, reduce `local_batch_size` to avoid out-of-memory errors.
+
+## Roadmap
+
+- [ ] Add N-D parallelism end-to-end performance and convergence tests
+- [ ] Test with additional models: DeepSeek-V3, Qwen3, Llama4, etc. at large scale
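
For context, the only concrete configuration change named in the diff is `training.local_batch_size = 1`, in the **Setup** line; lowering the same value is also the workaround given in the **Memory Overhead** remark. Below is a minimal sketch of how that override would sit on top of `llama3_8b.toml`; the `[training]` section name is inferred from the dotted key in the diff, and nothing else here is quoted from the actual config file.

```toml
# Sketch only: assumed [training] section layout of a torchtitan TOML config.
# Only training.local_batch_size = 1 comes from the README diff above; keeping
# it low is also the stated workaround for TorchComms' higher peak memory usage.
[training]
local_batch_size = 1
```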