Commit 7209894

[TorchComms] Update readme (#1913)
1 parent 951f6ff commit 7209894

File tree

2 files changed (+26, -5 lines)


torchtitan/experiments/torchcomms/README.md

Lines changed: 26 additions & 5 deletions
```diff
@@ -25,10 +25,31 @@ Locally tested with:
 - **FSDP** (`fully_shard`) - Fully Sharded Data Parallel
 - **TP** - Tensor Parallelism
 - **PP** - Pipeline Parallelism
-- **CP** - Context Parallelism
+- **EP** - Expert Parallelism
+- **compile** - `torch.compile` integration
 
-### Roadmap
+### Performance
 
-- [ ] Add N-D parallelism E2E perf and convergence tests
-- [ ] Integrated and tested with Expert Parallelism
-- [ ] Integration and testing with `torch.compile`
+**Setup**: Similar setup to [docs/converging.md](../../docs/converging.md), based on [torchtitan/models/llama3/train_configs/llama3_8b.toml](../torchtitan/models/llama3/train_configs/llama3_8b.toml), but with `training.local_batch_size = 1`
+
+| Run Name    | Parallelism        | Distributed Library | Remarks               |
+| ----------- | ------------------ | ------------------- | --------------------- |
+| (dist)DP8   | FSDP 8             | c10d.distributed    | Baseline              |
+| DP8         | FSDP 8             | torchcomms          | 1D test set           |
+| DP8_CP2_TP4 | FSDP 8, TP 4, CP 2 | torchcomms          | 3D test set           |
+| DP8_CP8    | FSDP 8, CP 8       | torchcomms          | CP with larger degree |
+
+**Results**:
+
+![Loss Curves](./asserts/images/loss_curves.png)
+
+
+### Known Issues
+
+- **CP** (Context Parallelism) - Temporarily not working
+- **Memory Overhead** - TorchComms has higher peak memory usage; as a workaround, reduce `local_batch_size` to avoid out-of-memory errors.
+
+## Roadmap
+
+- [ ] Add N-D parallelism end-to-end performance and convergence tests
+- Test with additional models (DeepSeek-V3, Qwen3, Llama4, etc.) at large scale
```
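The parallelism degrees in the table above compose multiplicatively: under the usual device-mesh convention, a run needs the product of its DP, CP, TP, and PP degrees in total ranks. A minimal sketch of that arithmetic (the `world_size` helper is illustrative, not a torchtitan or torchcomms API):

```python
def world_size(dp: int = 1, cp: int = 1, tp: int = 1, pp: int = 1) -> int:
    """Total ranks needed: the product of all parallelism degrees."""
    return dp * cp * tp * pp

# Runs from the table above:
print(world_size(dp=8))              # DP8         -> 8 GPUs
print(world_size(dp=8, cp=2, tp=4))  # DP8_CP2_TP4 -> 64 GPUs
print(world_size(dp=8, cp=8))        # DP8_CP8     -> 64 GPUs
```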
[Binary file added: loss_curves.png, 153 KB]