Releases: uxlfoundation/oneCCL
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.17.2
What's New 2021.17.2:
- Added single-process, multi-thread support for allreduce, allgatherv, and reduce_scatter on BMG
- Fixed performance issues for allgatherv
- Fixed a bug in comm split
- Fixed a bug in allgatherv with in-place operation
Note: The previous 2021.17.1 release is available only through binary distribution channels. It fixes compatibility issues with the manylinux_2_28 platform standard and contains no code changes.
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.7
This ccl_2021.15.7-arc branch introduces several enhancements for Intel ARC A and B Series GPUs:
- Allreduce LL chunking to work around the hardware bug
- Fixes for sub-communicators in allreduce and pt2pt
- The CCL benchmark now prints both algorithm and bus bandwidth
- Fixes for LL flag overflow, which can occur in long-running workloads (stress tests)
- Fix for a small GPU memory leak
- Chunking applied in Allgather to reduce contention
An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1
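The two commands above differ only in the final flag. As a minimal sketch (the helper name ccl_cmake_cmd is hypothetical and not part of oneCCL), a small shell function can select the flag by series and print the resulting command for review before it is run:

```shell
# Sketch only: ccl_cmake_cmd is a hypothetical helper, not a oneCCL tool.
# It prints the configure command from these release notes for the
# requested Intel ARC series (A or B) without executing it.
ccl_cmake_cmd() {
    local series="$1" flag
    case "$series" in
        A|a) flag="-DCCL_ENABLE_ARCA=1" ;;
        B|b) flag="-DCCL_ENABLE_ARCB=1" ;;
        *)   echo "usage: ccl_cmake_cmd A|B" >&2; return 1 ;;
    esac
    echo "cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp ${flag}"
}
```

Calling ccl_cmake_cmd B prints the B Series command. To execute it, run eval "$(ccl_cmake_cmd B)" from a build directory inside the source tree, with the icx and icpx compilers on PATH.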
Attached binaries:
The 2021.15.7.6 package is built with version 2025.2.0 of the Intel® oneAPI DPC++/C++ Compiler
The 2021.15.7.8 package is built with version 2025.3.2 of the Intel® oneAPI DPC++/C++ Compiler
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.17
What's New 2021.17:
- New API: Technical preview of NCCL*-like API alignment, adding the onecclcommDestroy, onecclGetErrorstring, and onecclGetLastError APIs
- Single-process, multi-thread support: Currently supports Allgather, Allreduce, Alltoall, ReduceScatter, Broadcast, pt2pt, and the Group API for scale-up
- Added operations: Support for user-defined reduction operations for scale-up; the Group API is extended to also support pt2pt operations
- Improved performance: Allgather optimizations for large messages in scale-out runs of up to 8 nodes
- BMG support: Added BMG support, for now available only in the open-source release
- Bug fixes and performance optimizations
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.6
This ccl_2021.15.6-arc branch introduces several enhancements for Intel ARC A and B Series GPUs:
- Bug fixes
- Added an OFI barrier implementation to optimize the CCL barrier in the OFI transport
- Applied chunking in Allgather scale-up (LL protocol)
- Code refactoring
An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1
Attached binaries:
The 2021.15.6.2 package is built with version 2025.0.0 of the Intel® oneAPI DPC++/C++ Compiler
The 2021.15.6.9 package is built with version 2025.2.0 of the Intel® oneAPI DPC++/C++ Compiler
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.16.2
What's new:
- Bug fixes
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.5
This ccl_2021.15.5-arc branch introduces several enhancements for Intel ARC A and B Series GPUs: bug fixes and refactoring, along with new implementations of Alltoall LL and one-way RDMA send-receive.
The cmake command is the same as before:
An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.16.1
What's new:
- Bug fixes
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.4
This ccl_2021.15.4-arc branch introduces several enhancements for Intel ARC A and B Series GPUs:
- Support for Reduce-Scatter and Point-To-Point in addition to the previously enabled Allreduce and Allgather
- Support for 8-bit datatypes (int8, uint8)
- Bug fixes, including removal of the previously required IGC_VISAOptions=-activeThreadsOnlyBarrier setting, which is no longer needed
The cmake command is the same as before:
An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.16
What's New 2021.16:
- Added SYCL graph support for Record and Replay for Allgather, Allreduce, Alltoall, ReduceScatter and Broadcast
- Added SYCL-based implementation of ring algorithm for Allgather
- Added SYCL-based implementation for Broadcast
- Added multithread support for the Allgather and ReduceScatter scale-up implementation
- Added a communicator attribute to specify blocking operations for CPU
- Bug fixes
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.3
This ccl_2021.15.3-arc branch adds support for Intel ARC A and B Series GPUs and includes some bug fixes.
An example of the cmake command for Intel ARC A Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for Intel ARC B Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1
If the system does not have GPU Peer-to-Peer (P2P) support, set the IGC_VISAOptions environment variable before compiling (export IGC_VISAOptions=-activeThreadsOnlyBarrier). On such a system, also run export IGC_VISAOptions=-activeThreadsOnlyBarrier in the shell before running the application.
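As a rough sketch of that workflow for the 2021.15.3 branch, the function below exports the variable only when the user marks the system as lacking P2P. The HAS_P2P variable and the function name setup_no_p2p_env are hypothetical conventions for this example. The release note does not describe a way to detect P2P support, so the value must be set by hand. The 2021.15.4 notes state that this setting is no longer required from that release on.

```shell
# Sketch only: HAS_P2P and setup_no_p2p_env are hypothetical names.
# Set HAS_P2P=0 by hand on a system without GPU Peer-to-Peer support.
setup_no_p2p_env() {
    if [ "${HAS_P2P:-1}" = "0" ]; then
        # Value taken verbatim from the 2021.15.3 release note.
        export IGC_VISAOptions=-activeThreadsOnlyBarrier
    fi
}
```

Call setup_no_p2p_env once in the shell used to compile, and again in the shell that launches the application if it is a different session.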