@brownbaerchen
Contributor

The main advantage of this is that the NCCL communicator does not synchronise the stream before `Allreduce`, whereas it does synchronise it before `allreduce`. This slightly improves performance on GPUs.
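To illustrate why skipping that synchronisation helps, here is a toy, pure-Python sketch. It does not use NCCL, CuPy or mpi4py: `ToyStream` and `ToyNCCLComm` are hypothetical stand-ins. It assumes, as described above, that buffer-based `Allreduce` is enqueued on the stream without a host sync while object-based `allreduce` synchronises first. One plausible reason, assumed here, is that `allreduce` returns a host-side Python object. Only a single rank is simulated, so the reduced value equals the input.

```python
# Toy model only: no real GPU, NCCL or MPI is involved.
# ToyStream and ToyNCCLComm are hypothetical names for illustration.


class ToyStream:
    """Stand-in for a GPU stream: queues work and counts host synchronisations."""

    def __init__(self):
        self.pending = []
        self.sync_count = 0

    def enqueue(self, op):
        # Work is queued, not executed, mimicking asynchronous kernel launches.
        self.pending.append(op)

    def synchronize(self):
        # Block the "host" until all queued work has run.
        for op in self.pending:
            op()
        self.pending.clear()
        self.sync_count += 1


class ToyNCCLComm:
    """Single-rank stand-in for a communicator wrapper around NCCL."""

    def __init__(self, stream):
        self.stream = stream

    def Allreduce(self, sendbuf, recvbuf):
        # Buffer-based: the reduction is enqueued on the stream and the call
        # returns at once, so the host never waits here.
        def op():
            recvbuf[:] = sendbuf  # one rank: the sum is the input itself

        self.stream.enqueue(op)

    def allreduce(self, obj):
        # Object-based: the result is a host-side object, so pending device
        # work is (by assumption) synchronised before communicating.
        self.stream.synchronize()
        return obj  # one rank: the sum is the input itself


stream = ToyStream()
comm = ToyNCCLComm(stream)
send = [1.0, 2.0]
recv = [0.0, 0.0]

comm.Allreduce(send, recv)
print(stream.sync_count, recv)  # 0 [0.0, 0.0]: still queued, no sync yet

total = comm.allreduce(5.0)
print(stream.sync_count, recv, total)  # 1 [1.0, 2.0] 5.0: the sync flushed the queue
```

The counter is the point: the `Allreduce` call leaves `sync_count` at 0 and its result only materialises when something later forces the stream to finish, whereas `allreduce` blocks the host immediately.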

@pancetta merged commit 2f84eda into Parallel-in-Time:master on Oct 25, 2024
90 checks passed