
allenwang28 commented Sep 23, 2025

Following @LucasLLC's suggestion, this enables multi-threaded DCP (torch.distributed.checkpoint) writes by setting single_file_per_rank = False and num_threads = 8.
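
A minimal stdlib sketch of what these two settings change, assuming DCP's behaviour is roughly this: with one file per rank, a single writer serializes everything; otherwise the items are spread over num_threads files that are written in parallel. write_checkpoint and the shard_N.bin naming are hypothetical, and this is not DCP's implementation. In torch.distributed.checkpoint itself these knobs live on FileSystemWriter, whose thread-count argument is named thread_count; that num_threads maps onto it is an assumption.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_checkpoint(state, out_dir, single_file_per_rank=True, num_threads=1):
    """Hypothetical sketch: write a dict of name -> bytes into out_dir.

    single_file_per_rank=True: one thread writes every item into one file.
    single_file_per_rank=False: items are dealt round-robin into num_threads
    buckets and each bucket is written to its own file on its own thread.
    Returns the paths of the files written.
    """
    items = sorted(state.items())
    if single_file_per_rank:
        buckets = [items]
        workers = 1
    else:
        buckets = [items[i::num_threads] for i in range(num_threads)]
        buckets = [b for b in buckets if b]  # skip empty buckets
        workers = num_threads

    def write_bucket(indexed_bucket):
        index, bucket = indexed_bucket
        path = os.path.join(out_dir, f"shard_{index}.bin")
        with open(path, "wb") as f:
            for _name, payload in bucket:
                f.write(payload)  # CPython releases the GIL during file I/O
        return path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(write_bucket, enumerate(buckets)))


# Usage: 20 fake 1 MiB "tensors" spread over 8 files written in parallel.
fake_state = {f"layer{i}.weight": os.urandom(1 << 20) for i in range(20)}
with tempfile.TemporaryDirectory() as tmp:
    paths = write_checkpoint(fake_state, tmp, single_file_per_rank=False, num_threads=8)
    print(len(paths))  # 8
```

Threads (rather than processes) are enough for this kind of speedup because the time goes into I/O, during which the GIL is released.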

For the 1.7B model, this cuts the push time from ~10s to ~3s.

Disabling the crc32 checksum brings the push time down further, to ~2s.
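
Why skipping crc32 helps can be shown with a small stdlib sketch, assuming the checksum is one CRC-32 per serialized record: computing it is an extra full pass over every byte being written. serialize and verify are hypothetical helpers. Recent PyTorch releases expose torch.serialization.set_crc32_options to turn this off for torch.save; whether that is the switch used in this PR is an assumption.

```python
import time
import zlib


def serialize(payloads, compute_crc32=True):
    """Hypothetical sketch: concatenate records, optionally storing a CRC-32 per record.

    Returns (blob, checksums); checksums is empty when compute_crc32 is False.
    """
    checksums = []
    for payload in payloads:
        if compute_crc32:
            # An extra full pass over every byte: the cost saved by disabling crc32.
            checksums.append(zlib.crc32(payload))
    return b"".join(payloads), checksums


def verify(payloads, checksums):
    """Loader-side check: recompute each CRC-32 and compare with the stored value."""
    return len(payloads) == len(checksums) and all(
        zlib.crc32(p) == c for p, c in zip(payloads, checksums)
    )


# Usage: compare wall time with and without checksums on 64 MiB of data.
records = [bytes(8 << 20)] * 8
for flag in (True, False):
    start = time.perf_counter()
    serialize(records, compute_crc32=flag)
    print(f"compute_crc32={flag}: {time.perf_counter() - start:.3f}s")
```

The trade-off is integrity checking: without stored checksums, the loader can no longer detect a corrupted record.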

meta-cla bot added the CLA Signed label Sep 23, 2025

LucasLLC left a comment

Next step toward zero overhead: make it async.
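
Making the save async would move the write off the caller's critical path. A minimal stdlib sketch, assuming the pattern is "snapshot synchronously, write in the background, hand back a future"; AsyncCheckpointer and write_fn are hypothetical. PyTorch's DCP has its own torch.distributed.checkpoint.async_save, which returns a future-like handle, but this sketch does not use it.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class AsyncCheckpointer:
    """Hypothetical sketch: snapshot on the caller's thread, write on a background thread."""

    def __init__(self):
        # One worker keeps saves ordered: a new save queues behind the previous one.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def save(self, state, write_fn) -> Future:
        # The snapshot is the only work done on the caller's thread. A shallow copy
        # is enough for immutable values such as bytes; real tensors would need
        # to be copied (e.g. staged to CPU memory) before training mutates them.
        snapshot = dict(state)
        return self._pool.submit(write_fn, snapshot)

    def close(self):
        self._pool.shutdown(wait=True)


# Usage: training keeps updating `state` while the old version is written.
ckpt = AsyncCheckpointer()
state = {"w": b"step-1"}
future = ckpt.save(state, lambda snap: print("wrote", snap))
state["w"] = b"step-2"  # does not affect the snapshot being written
future.result()  # block only when the checkpoint must be durable
ckpt.close()
```

With this shape, the remaining per-step overhead is the snapshot copy; the write itself overlaps with subsequent work.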

allenwang28 merged commit 62c878a into meta-pytorch:main Sep 23, 2025
5 checks passed
allenwang28 deleted the dcp_speedup branch September 23, 2025 23:28