|
11 | 11 | ### Features: |
12 | 12 | ### Bugfixes: |
13 | 13 |
|
| 14 | +## 1.20.0 (December 2, 2025) |
| 15 | +### Features: |
| 16 | +#### UCP |
| 17 | + * Added new GPU device API for direct GPU-to-GPU communication |
| 18 | + * Added host API for GPU device management |
| 19 | + * Added device signaling API with cooperation levels and flags |
| 20 | + * Added API for working with offsets and channel id in device operations |
| 21 | + * Added method to write to local counter in device operations |
| 22 | + * Added local and remote address fields to memory list element in device API |
| 23 | + * Added device lane selection and allocated handle population |
| 24 | + * Added support for Direct NIC (DPU) data path with CUDA |
| 25 | + * Added rkey packing support for Direct NIC |
| 26 | + * Added sender flush mechanism when memory sys_dev differs from remote lane sys_dev |
| 27 | + * Added option to use a single network device per protocol |
| 28 | + * Added MIN_RMA_CHUNK_SIZE configuration parameter |
| 29 | + * Decreased default value for MIN_RMA_CHUNK_SIZE from 16k to 8k |
| 30 | + * Improved protocol lane selection with find_lanes callback to minimize overhead |
| 31 | + * Improved send-zcopy latency factor for fast-completion cases |
| 32 | + * Improved multi-ppn performance estimation |
| 33 | + * Removed deprecated ucp_mem functions |
| 34 | + * Deprecated ucp_request_alloc API |
| 35 | +#### UCT |
| 36 | + * Added new device API for GPU communication (rc_gda transport) |
| 37 | + * Added GDAKI transport with endpoint export to GPU |
| 38 | + * Added DEVX QP/CQ support on foreign memory |
| 39 | + * Added device API implementation for CUDA_IPC transport |
| 40 | + * Added device put multi, put partial, and atomic operations for CUDA_IPC |
| 41 | + * Added peer failure error handling capability for GDAKI |
| 42 | + * Added check for nvidia_peermem driver when using GDA transport |
| 43 | + * Enabled Direct NIC by default for IB transport |
| 44 | + * Added XDR performance recognition |
| 45 | + * Added support for mapping DMA_BUF handle via PCIe for Direct NIC |
| 46 | + * Improved GDR_COPY performance with fast-path cache lookup |
| 47 | +#### RDMA CORE (IB, ROCE, etc.) |
| 48 | + * Added ConnectX-9 device support |
| 49 | + * Split dp_ordering flag for DV/DevX transports |
| 50 | + * Added VRF tables support for RoCE reachability check |
| 51 | + * Added EFA-specific GPUDirect support detection |
| 52 | +#### TCP |
| 53 | + * Added routing table check during reachability verification |
| 54 | +#### UCS |
| 55 | + * Introduced lightweight rwlock data structure |
| 56 | + * Added built-in atomics for rcache rwlock |
| 57 | + * Improved VFS symlink paths and duplicate object handling |
| 58 | + * Disabled error signal interception by default |
| 59 | +#### CUDA |
| 60 | + * Added wrappers for NVML functions |
| 61 | + * Added hook for cuLibraryGetGlobal |
| 62 | + * Improved CUDA call logging |
| 63 | + * Improved source/destination memory type detection for lane performance estimation |
| 64 | + * Removed unsafe usage of cuCtxGetId |
| 65 | + * Added support for cuCtxCreate_v4 for newer CUDA versions |
| 66 | + * Improved context management for CUDA_IPC operations |
| 67 | +#### UCM |
| 68 | + * Changed module info print to debug level by default |
| 69 | +#### Tools |
| 70 | + * Added GDAKI kernel option to perftest |
| 71 | + * Added UCP CUDA device tests to perftest |
| 72 | + * Added MPI+CUDA example |
| 73 | + * Differentiated wakeup feature and extra info options in perftest |
| 74 | +#### Build |
| 75 | + * Added ability to build CUDA device code for supported architectures |
| 76 | + * Added ucx.spec into tarball for Universal Build System support |
| 77 | + * Added CUDA 13 support |
| 78 | + * Made GDA build fail when gpunetio is not found |
| 79 | +#### Packaging |
| 80 | + * Moved driver level dependencies under Recommends section in Debian packages |
| 81 | + * Added Provides field for upstream packages in Debian |
| 82 | + * Migrated JUCX publish from OSSRH to Central Portal |
| 83 | + * Added separate ib-mlx5-gda package |
| 84 | +#### CI/Testing |
| 85 | + * Added Rocky Linux support to release pipeline |
| 86 | + * Added RHEL 10 containers to build matrices |
| 87 | + * Added Debian 13 to CI build stage |
| 88 | + * Added ARM build testing |
| 89 | + * Switched to MOFED 25.07 |
| 90 | + * Switched GPU tests to Ubuntu 24.04 DOCA 3.1 (GPUNetIO) image |
| 91 | + * Added support for nvidia_peermem module in testing |
| 92 | + * Disabled Valgrind in CI Tests stage |
| 93 | + * Disabled tag matching offload tests |
| 94 | +#### GO Bindings |
| 95 | + * Made Go bindings thread-safe |
| 96 | +#### Documentation |
| 97 | + * Added note about reachability check mode in README |
| 98 | + * Mentioned NVLink as a supported transport |
| 99 | + * Documented return status for device APIs |
| 100 | +#### AWS EFA |
| 101 | + * Added RMA WRITE operations support |
| 102 | + * Added flush and fence operations for SRD |
| 103 | + * Enabled EFA SRD support in tests |
| 104 | +### Bugfixes: |
| 105 | +#### UCP |
| 106 | + * Fixed fallback to blocking registration for network device only |
| 107 | + * Fixed flush_state validity check before using it |
| 108 | + * Fixed single net dev filtering for single proto |
| 109 | + * Fixed rkey size estimation for rendezvous |
| 110 | + * Fixed memory invalidation without RNDV |
| 111 | + * Fixed gather_pending_requests to execute only when reconfig occurs |
| 112 | +#### UCT |
| 113 | + * Fixed CUDA_IPC protocol selection for cuda_ipc |
| 114 | + * Fixed GDA compilation issues |
| 115 | + * Fixed GDAKI wqe_idx overflow |
| 116 | + * Fixed MM FIFO room calculation for tail > head case |
| 117 | + * Fixed CUDA_IPC indices handling in put partial |
| 118 | + * Removed DOCA runtime dependency from GDAKI |
| 119 | + * Fixed GDA log spam by reducing DOCA log level |
| 120 | + * Fixed UAR support check when querying resources for GDA/MLX5 |
| 121 | + * Fixed crash in GGA transport when EXPORTED_MKEY flag is missing |
| 122 | +#### CUDA |
| 123 | + * Fixed stack overflow bug when calling cuPointerGetAttribute |
| 124 | + * Fixed mapping of DMA_BUF handle for Direct NIC |
| 125 | + * Returned object to mpool in case of failure in CUDA_COPY |
| 126 | + * Reduced log level of rkey unpacking failures |
| 127 | + * Handled cuMemRelease error status properly |
| 128 | + * Fixed context setting for local buffer in CUDA_IPC |
| 129 | + * Fixed host unregister error message (changed to diagnostic) |
| 130 | + * Fixed CUDA_IPC header installation |
| 131 | +#### RDMA CORE (IB, ROCE, etc.) |
| 132 | + * Fixed RoCE network device name reading |
| 133 | + * Fixed Direct NIC related issues |
| 134 | + * Reverted RC EP address size adaptation without flush_rkey |
| 135 | +#### UCS |
| 136 | + * Fixed ARCH header inclusion when building with nvcc (arm_neon.h) |
| 137 | + * Fixed VFS symlink path handling |
| 138 | + * Fixed netlink message receiving to continue until 'done' flag is set |
| 139 | +#### Build |
| 140 | + * Fixed NVCC search with explicit --with-cuda |
| 141 | + * Fixed ZE transport build failures |
| 142 | + * Fixed ucs_arch_get_cpu_flag compilation |
| 143 | + * Fixed CUDA device code build for supported architectures |
| 144 | +#### Testing |
| 145 | + * Fixed test_jenkins CI issues |
| 146 | + * Decreased rwlock test duration |
| 147 | + * Fixed error counting in gtest |
| 148 | + * Enabled retries for test_arch.memcpy |
| 149 | + * Fixed test_cuda_nvml condition relaxation |
| 150 | + * Skipped build when generating packages |
| 151 | + * Fixed CUDA device restoration in tests |
| 152 | + * Improved error detection in UCP device tests |
| 153 | + * Fixed global topo state cleanup during gtest |
| 154 | +#### Tools |
| 155 | + * Fixed perftest CUDA kernel issues |
| 156 | +#### GO Bindings |
| 157 | + * Fixed Go bindings compilation with CUDA |
| 158 | +#### IB/EFA |
| 159 | + * Fixed error message when FLID is not available |
| 160 | +#### Packaging |
| 161 | + * Fixed RPM SPEC debug_package macro execution on SLES16 |
| 162 | + |
14 | 163 | ## 1.19.1 (Sep 18, 2025) |
15 | 164 | ### Features: |
16 | 165 | #### UCP |