Add CI test for fallback allgather, allreduce, broadcast, and reducescatter to NCCL operations #485

Merged
Binyang2014 merged 7 commits into main from qinghuazhou/nccl-rccl-integration-ci
Mar 27, 2025

Conversation

@seagater
Contributor

Add CI test for fallback allgather, allreduce, broadcast, and reducescatter to NCCL operations
Tests the following parameters:
-x MSCCLPP_ENABLE_NCCL_FALLBACK=TRUE
-x MSCCLPP_NCCL_LIB_PATH=/path_to_nccl/nccl/build/lib/libnccl.so
-x MSCCLPP_FORCE_NCCL_FALLBACK_OPERATION="allgather, allreduce, broadcast, reducescatter" or "all"
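As a rough illustration of how the three `-x` settings above fit together, here is a minimal Python sketch that assembles them into an argument list for an `mpirun` launch. The helper name `fallback_env_args` and the `COLLECTIVES` tuple are hypothetical and are not part of MSCCL++ or its CI scripts. The comma-plus-space separator simply mirrors the value quoted in the description. Whether the library accepts other separators is not stated here.

```python
# Hypothetical helper for illustration only; not taken from the MSCCL++ repository.
COLLECTIVES = ("allgather", "allreduce", "broadcast", "reducescatter")


def fallback_env_args(nccl_lib_path, operations):
    """Return mpirun `-x NAME=VALUE` arguments for the three variables in the PR description.

    `operations` is either the string "all" or a list of collective names.
    """
    if operations == "all":
        op_value = "all"
    else:
        unknown = [op for op in operations if op not in COLLECTIVES]
        if unknown:
            raise ValueError(f"unknown collective(s): {unknown}")
        # Separator copied from the PR description ("allgather, allreduce, ...").
        op_value = ", ".join(operations)

    env = {
        "MSCCLPP_ENABLE_NCCL_FALLBACK": "TRUE",
        "MSCCLPP_NCCL_LIB_PATH": nccl_lib_path,
        "MSCCLPP_FORCE_NCCL_FALLBACK_OPERATION": op_value,
    }

    args = []
    for name, value in env.items():
        args += ["-x", f"{name}={value}"]
    return args


if __name__ == "__main__":
    # The library path is the placeholder used in the PR description.
    print(fallback_env_args("/path_to_nccl/nccl/build/lib/libnccl.so", ["allgather", "allreduce"]))
```

Passing the result as a list (for example to `subprocess.run`) avoids shell quoting issues with the spaces inside the operation value.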

@seagater seagater requested a review from Binyang2014 March 20, 2025 20:11
@Binyang2014
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

… that CUDA is installed in the /usr/local/cuda
@Binyang2014
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

Contributor

@Binyang2014 Binyang2014 left a comment

Let's keep it simple: just run the tests with and without the NCCL fallback. For example, for allgather, run the test with MSCCLPP_FORCE_NCCL_FALLBACK_OPERATION="allgather" and with MSCCLPP_FORCE_NCCL_FALLBACK_OPERATION="other_collective". We can cover the other cases via unit tests.
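One way to read the suggestion above is as a two-case matrix per collective: one run where the forced fallback operation is the collective under test, and one where it names a different collective, so the tested one does not go through the fallback. The Python sketch below enumerates that matrix. The function name `fallback_test_matrix` is hypothetical, and picking the next collective in the list as the "other" one is an assumption made only for illustration. The comment uses `other_collective` as a placeholder.

```python
# Hypothetical sketch of the reviewer's with/without-fallback test matrix.
COLLECTIVES = ("allgather", "allreduce", "broadcast", "reducescatter")


def fallback_test_matrix(collectives=COLLECTIVES):
    """Return a list of (tested_collective, forced_operation, expects_fallback) tuples."""
    cases = []
    for i, name in enumerate(collectives):
        # Assumption: any collective other than the tested one serves as "other_collective".
        other = collectives[(i + 1) % len(collectives)]
        cases.append((name, name, True))    # forced operation matches: fallback path
        cases.append((name, other, False))  # forced operation differs: no fallback
    return cases


if __name__ == "__main__":
    for tested, forced, expects_fallback in fallback_test_matrix():
        path = "NCCL fallback" if expects_fallback else "no fallback"
        print(f'{tested}: MSCCLPP_FORCE_NCCL_FALLBACK_OPERATION="{forced}" -> {path}')
```

This keeps the CI job at two runs per collective, leaving combinations such as multi-operation lists or "all" to unit tests, as the comment proposes.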

@Binyang2014
Contributor

Another thing: can we add documentation for these env variables?

@Binyang2014
Contributor

/azp run

@Binyang2014 Binyang2014 enabled auto-merge (squash) March 27, 2025 21:09
@Binyang2014 Binyang2014 merged commit 0f21ed4 into main Mar 27, 2025
14 checks passed
@Binyang2014 Binyang2014 deleted the qinghuazhou/nccl-rccl-integration-ci branch March 27, 2025 21:13
2 participants