@KeitaW KeitaW commented Jan 6, 2026

Update the CUDA, EFA installer, AWS OFI NCCL, NCCL, and NCCL tests versions in the Dockerfile.

Update the EFA installer to 1.45.1, which supports aws-ofi-nccl v1.17.2 (https://github.com/aws/aws-ofi-nccl/releases/tag/v1.17.2). From the EFA changelog (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-changelog.html):

> Upgrade to libnccl-ofi 1.17.2

EFA_INSTALLER_VERSION=1.45.1
AWS_OFI_NCCL_VERSION=1.17.2

Update NCCL to v2.28.7-1, the latest version supported by aws-ofi-nccl 1.17.2.

From https://github.com/aws/aws-ofi-nccl/releases:

> The 1.17.2 release series supports NCCL v2.28.7-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.17.1 and later).
> With this release, building with platform-aws requires Libfabric v1.22.0amzn4.0 or greater. And it is currently tested with versions up to Libfabric v2.3.1amzn1.0.
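For reviewers who want to sanity-check a version bump against the ranges quoted above, here is a minimal sketch in Python. It is not part of this PR or of aws-ofi-nccl; the helper names are hypothetical, and it assumes that NCCL and Libfabric version strings compare correctly as tuples of their numeric components (including the `amznX.Y` suffix), which the release notes do not state explicitly.

```python
import re

def parse_nccl(version: str) -> tuple[int, ...]:
    """Turn a string such as 'v2.28.7-1' into (2, 28, 7, 1)."""
    return tuple(int(n) for n in re.findall(r"\d+", version))

def parse_libfabric(version: str) -> tuple[int, ...]:
    """Turn a string such as 'v1.22.0amzn4.0' into (1, 22, 0, 4, 0)."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)amzn(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized Libfabric version: {version!r}")
    return tuple(int(g) for g in m.groups())

# Bounds copied from the aws-ofi-nccl 1.17.2 release notes quoted above.
MIN_NCCL = parse_nccl("v2.17.1")
MAX_NCCL = parse_nccl("v2.28.7-1")
MIN_LIBFABRIC = parse_libfabric("v1.22.0amzn4.0")
MAX_TESTED_LIBFABRIC = parse_libfabric("v2.3.1amzn1.0")

def nccl_in_supported_range(version: str) -> bool:
    """True if the NCCL version lies within the range the release notes list."""
    return MIN_NCCL <= parse_nccl(version) <= MAX_NCCL

def libfabric_status(version: str) -> str:
    """Classify a Libfabric version as 'too old', 'tested', or 'untested'."""
    v = parse_libfabric(version)
    if v < MIN_LIBFABRIC:
        return "too old"
    if v <= MAX_TESTED_LIBFABRIC:
        return "tested"
    return "untested"  # newer than the last version the release notes mention
```

Under these assumptions, `nccl_in_supported_range("v2.28.7-1")` returns `True`, and a Libfabric version newer than v2.3.1amzn1.0 is reported as `"untested"` rather than rejected, since the release notes only say what has been tested so far.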

Update CUDA to 12.9.1.

The NVIDIA repository at https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ provides the matching NCCL package for CUDA 12.9: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/libnccl2_2.28.7-1+cuda12.9_amd64.deb
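To show how the package name relates to the NCCL and CUDA versions, the following Python sketch rebuilds the URL above from its parts. The naming pattern is inferred from this single link and is an assumption, not a documented NVIDIA convention; `nccl_deb_url` is a hypothetical helper and is not used by the Dockerfile.

```python
# Repository root taken from the link in this PR description.
REPO = "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/"

def nccl_deb_url(nccl_version: str, cuda_version: str) -> str:
    """Build the libnccl2 .deb URL, assuming the name uses CUDA major.minor only."""
    cuda_major_minor = ".".join(cuda_version.split(".")[:2])  # '12.9.1' -> '12.9'
    return f"{REPO}libnccl2_{nccl_version}+cuda{cuda_major_minor}_amd64.deb"

url = nccl_deb_url("2.28.7-1", "12.9.1")
```

With the versions from this PR, `url` equals the package link given above.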

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@KeitaW KeitaW requested review from amanshanbhag, paragao and pbelevich and removed request for amanshanbhag January 6, 2026 08:44