Computation communication overlap for DP #8

Draft
Conless wants to merge 2 commits into uw-syfi:main from Conless:dp-overlap

Conless (Collaborator) commented Jan 28, 2026

This PR implements a naive overlap of computation and communication for the data-parallel (DP) weight update.

Performance data:

$ python3 -m test.test_llama --iters 2 --dp_degree 2 --pp_degree 1 --num_stages 1 --schedule no-pp --model LLAMA_1B
Weight update time w/o overlap: 40.5ms
Weight update time w/ overlap: 34.1ms

Profiling with the torch profiler shows that, in the non-overlap version, the communication time (after the first parameter group) is about 8 ms. The overlap version saves 6.4 ms (40.5 ms down to 34.1 ms), which means the communication is mostly overlapped with computation.
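
To illustrate why overlapping hides most, but not all, of the communication, here is a minimal timeline model. It is a sketch under assumptions, not the code in this PR: it assumes the weight update is processed one parameter group at a time, that each group's communication can start only after that group's computation finishes, that communications run one after another, and that communication does not slow down computation. The function names and the per-group costs are hypothetical.

```python
from typing import Sequence


def serial_time(compute_ms: Sequence[float], comm_ms: Sequence[float]) -> float:
    """Total time when each group computes, then communicates, before the next starts."""
    return sum(compute_ms) + sum(comm_ms)


def overlapped_time(compute_ms: Sequence[float], comm_ms: Sequence[float]) -> float:
    """Total time when group i's communication runs while group i+1 computes."""
    compute_done = 0.0  # time at which the current group's computation finishes
    comm_done = 0.0     # time at which the current group's communication finishes
    for c, m in zip(compute_ms, comm_ms):
        compute_done += c
        # Communication waits for its own group's computation and for the
        # previous group's communication (assumed to be serialized).
        comm_done = max(comm_done, compute_done) + m
    return comm_done


# Hypothetical per-group costs in milliseconds, NOT measurements from this PR.
compute = [5.0, 5.0, 5.0, 5.0]
comm = [2.0, 2.0, 2.0, 2.0]

serial = serial_time(compute, comm)
overlapped = overlapped_time(compute, comm)
print(f"serial: {serial:.1f} ms, overlapped: {overlapped:.1f} ms")
```

In this model only the last group's communication stays exposed, because no later computation is left to hide it, so the saving is always somewhat smaller than the total communication time.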
