Skip to content

Conversation

@devreal
Copy link
Contributor

@devreal devreal commented Oct 29, 2025

The switch from tree to bruck between 512 and 1023 processes leads to unexpected latency changes in benchmarks of other collectives. We should be consistent here. There is no good reason for why bruck would perform better in that range but not beyond.

Backport of #11023 to v6.0.x

The switch from tree to bruck between 512 and 1023 processes leads
to unexpected latency changes in benchmarks of other collectives.
We should be consistent here. There is no good reason for why bruck
would perform better in that range but not beyond.

Signed-off-by: Joseph Schuchart <[email protected]>
(cherry picked from commit 9bd7757)
@github-actions github-actions bot added this to the v6.0.0 milestone Oct 29, 2025
@bosilca
Copy link
Member

bosilca commented Oct 29, 2025

Honestly I am extremely cautious about this change, because it is not based on a valid experiment. I'm not saying that @devreal runs are not correct, I am saying that data is not enough to justify such a change because that timing is highly dependent on the network and the process placement and it's match (or mismatch) with the topology used in the barrier algorithm. I have yet to see an in depth analysis of such data related to this change.

@janjust
Copy link
Contributor

janjust commented Oct 30, 2025

@devreal since you opened the revert - do you want to close this?

@devreal
Copy link
Contributor Author

devreal commented Oct 30, 2025

The revert in #13483 should close this once merged into main. Closing it now. Please review #13483 so it can be merged too.

@devreal devreal closed this Oct 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants