Releases: uxlfoundation/oneCCL

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.17.2

04 Feb 15:27
6649993

What's New 2021.17.2:

  • Added support for single process and multiple threads for allreduce, allgatherv, reduce_scatter on BMG
  • Fixed performance issues for allgatherv
  • Fixed a bug in comm split
  • Fixed a bug in allgatherv with inplace operation

Note: The previous release, 2021.17.1, is available only through binary distribution channels. It fixes compatibility issues with the manylinux_2_28 platform standard and contains no code changes.

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.7

18 Dec 00:07
c570491

This ccl_2021.15.7-arc branch introduces several enhancements for Intel ARC A and B Series GPU:

  • allreduce LL chunking to work around a hardware bug
  • fixes for sub-communicators for allreduce and pt2pt
  • CCL benchmark now prints both alg and bus bandwidth
  • fixes for LL flag overflow, which may happen in a long-running workload (stress test)
  • fix for a small GPU memory leak
  • applying chunking in Allgather to reduce contention

An example of the cmake command for Intel ARC A Series GPU :
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1

An example of the cmake command for Intel ARC B Series GPU :
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1
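
The two commands above differ only in their last flag. As a minimal sketch, a small shell function can pick that flag from the series letter. The function name ccl_arc_cmake_flags is hypothetical and not part of oneCCL; the flags themselves are copied unchanged from the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with oneCCL): print the cmake flags for an
# Intel ARC series, "A" or "B". Flag values are copied from the release notes.
ccl_arc_cmake_flags() {
    case "$1" in
        A|a) series_flag="-DCCL_ENABLE_ARCA=1" ;;
        B|b) series_flag="-DCCL_ENABLE_ARCB=1" ;;
        *)   echo "usage: ccl_arc_cmake_flags A|B" >&2; return 1 ;;
    esac
    # printf avoids any chance of the leading "-D" being read as an echo option.
    printf '%s\n' "-DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp $series_flag"
}

# Usage, from a build directory inside the source tree (left unquoted on
# purpose so the shell splits the flags into separate arguments):
#   cmake .. $(ccl_arc_cmake_flags B)
```

Any other argument makes the function print a usage line to stderr and return a non-zero status, so a typo in the series letter does not silently configure the wrong target.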

Attached binaries:

The 2021.15.7.6 package is built with version 2025.2.0 of the Intel® oneAPI DPC++/C++ Compiler

The 2021.15.7.8 package is built with version 2025.3.2 of the Intel® oneAPI DPC++/C++ Compiler

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.17

04 Dec 10:56
93f2621

What's New 2021.17:

  • New API: Technical preview of NCCL*-like API alignment, adding the onecclcommDestroy, onecclGetErrorstring, and onecclGetLastError APIs
  • Support for single process and multiple threads: Currently supports Allgather, Allreduce, Alltoall, ReduceScatter, Broadcast, pt2pt, and the Group API for scale-up
  • Added Operations: Added support for user-defined reduction operations for scale-up and extended the Group API to also support pt2pt operations
  • Improved Performance: Allgather optimizations for large messages for scale-out up to 8 nodes
  • Support for BMG: Added BMG support, currently available only in the open-source version
  • Bug fixes and performance optimizations

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.6

22 Oct 09:23
0a67730

This ccl_2021.15.6-arc branch introduces several enhancements for Intel ARC A and B Series GPU:

  • Bug Fixes
  • Added an OFI barrier implementation to optimize the CCL barrier in the OFI transport
  • Applied chunking in Allgather scale-up (LL protocol)
  • Code refactoring

An example of the cmake command for Intel ARC A Series GPU :
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1

An example of the cmake command for Intel ARC B Series GPU :
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1

Attached binaries:

The 2021.15.6.2 package is built with version 2025.0.0 of the Intel® oneAPI DPC++/C++ Compiler

The 2021.15.6.9 package is built with version 2025.2.0 of the Intel® oneAPI DPC++/C++ Compiler

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.16.2

24 Sep 11:29
4f1449d

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.5

23 Sep 09:08
52eee8d

This ccl_2021.15.5-arc branch introduces several enhancements for Intel ARC A and B Series GPU: bug fixes and refactoring, along with new implementations of Alltoall LL and one-way RDMA send-receive.

The cmake command is the same as before:

An example of the cmake command for Intel ARC A Series GPU :
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1

An example of the cmake command for Intel ARC B Series GPU :
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.16.1

01 Sep 11:53
f588098

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.4

31 Jul 17:44
57a9306

This ccl_2021.15.4-arc branch introduces several enhancements for Intel ARC A and B Series GPU:

  • Support for Reduce-Scatter and Point-To-Point in addition to previously enabled Allreduce and Allgather
  • Support for 8-bit datatypes (int8, uint8)
  • Bug fixes, including removal of the previously required IGC_VISAOptions=-activeThreadsOnlyBarrier setting, which is no longer needed

The cmake command is the same as before:

An example of the cmake command for Intel ARC A Series GPU :
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1

An example of the cmake command for Intel ARC B Series GPU :
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.16

02 Jul 13:05
303b41a

What's New 2021.16:

  • Added SYCL graph support for Record and Replay for Allgather, Allreduce, Alltoall, ReduceScatter and Broadcast
  • Added SYCL-based implementation of ring algorithm for Allgather
  • Added SYCL-based implementation for Broadcast
  • Added multithread support for the Allgather and ReduceScatter scale-up implementation
  • Added attribute in the communicator to specify blocking operations for CPU
  • Bug fixes

Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.3

12 Jun 20:40
def8705

This ccl_2021.15.3-arc branch adds support for Intel ARC A and B Series GPU and includes some bug fixes.

An example of the cmake command for Intel ARC A Series GPU :
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1

An example of the cmake command for Intel ARC B Series GPU :
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1

If the system does not have GPU Peer-to-Peer (P2P) support, set the IGC_VISAOptions environment variable before compiling (export IGC_VISAOptions=-activeThreadsOnlyBarrier). On a system without P2P support, also run export IGC_VISAOptions=-activeThreadsOnlyBarrier in the shell before running the application.
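
The P2P note above applies to this 2021.15.3 release; the 2021.15.4 notes state that the setting is no longer needed. As a rough sketch, the workaround can be wrapped in a shell fragment that is sourced before both the build and the run. HAS_GPU_P2P is a hypothetical variable that you set yourself; this fragment does not detect P2P support.

```shell
#!/bin/sh
# HAS_GPU_P2P is a hypothetical, user-set switch (1 = the system has GPU P2P).
# It defaults to 0 here, so the workaround is applied unless you opt out.
HAS_GPU_P2P="${HAS_GPU_P2P:-0}"

if [ "$HAS_GPU_P2P" != "1" ]; then
    # Per the release note, this must be set both before compiling oneCCL
    # and before running the application on systems without P2P.
    export IGC_VISAOptions=-activeThreadsOnlyBarrier
fi

# Show what the build and the application will see.
printf 'IGC_VISAOptions=%s\n' "${IGC_VISAOptions:-<unset>}"
```

Sourcing the same fragment in the build shell and the run shell keeps the two environments consistent, which is the point of the note.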