The latest version of nccl-tests.Dockerfile, which is based on NCCL 2.27.7, shows a severe performance degradation compared with the previous version based on NCCL 2.27.5. Please update the Dockerfile to a newer NCCL release so that 2.27.7 is avoided.
The 10% performance degradation is observed when running on the same pair of P6-B200 nodes (16 GPUs in total) with AWS Batch/ECS. The tests were run with the images from https://gallery.ecr.aws/hpc-cloud/nccl-tests tagged `cuda12.8.1-efa1.42.0-ofiv1.16.0-ncclv2.27.5-1-testsv2.16.4` (NCCL 2.27.5) and `cuda12.8.1-efa1.43.2-ofiv1.16.3-ncclv2.27.7-1-testsv2.16.9` (NCCL 2.27.7).
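To make the comparison easy to reproduce, here is a minimal sketch that computes the relative drop between two runs. It assumes the stdout of each nccl-tests run (for example `all_reduce_perf`) was saved to a file and contains the `# Avg bus bandwidth : <value>` summary line that nccl-tests prints at the end of a run. The helper names and log file paths are hypothetical.

```python
import re

# Assumption: nccl-tests ends each run with a summary line such as
# "# Avg bus bandwidth    : 123.4" (value in GB/s).
AVG_BUSBW_RE = re.compile(r"#\s*Avg bus bandwidth\s*:\s*([0-9]+(?:\.[0-9]+)?)")


def avg_busbw(log_text: str) -> float:
    """Return the average bus bandwidth reported in an nccl-tests log."""
    match = AVG_BUSBW_RE.search(log_text)
    if match is None:
        raise ValueError("no '# Avg bus bandwidth' line found in log")
    return float(match.group(1))


def regression_pct(baseline: float, candidate: float) -> float:
    """Percent drop of candidate relative to baseline (positive means slower)."""
    return (baseline - candidate) / baseline * 100.0


def compare_logs(baseline_path: str, candidate_path: str) -> float:
    """Read two saved nccl-tests logs and return the bus bandwidth regression in %."""
    with open(baseline_path) as f:
        baseline = avg_busbw(f.read())
    with open(candidate_path) as f:
        candidate = avg_busbw(f.read())
    return regression_pct(baseline, candidate)
```

For example, `compare_logs("nccl-2.27.5.log", "nccl-2.27.7.log")` (hypothetical file names) returns a positive percentage when the 2.27.7 image is slower. Comparing the average bus bandwidth is only a coarse summary; per-message-size busbw columns may show where the regression is concentrated.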
