Commit 93d3c2a

NEWS: added v1.20.0 NEWS (#11271)
Signed-off-by: Roie Danino <rdanino@nvidia.com>
1 parent 211e261 commit 93d3c2a

1 file changed

+149
-0
lines changed

NEWS

Lines changed: 149 additions & 0 deletions
@@ -11,6 +11,155 @@
 ### Features:
 ### Bugfixes:

+## 1.20.0 (December 2, 2025)
+### Features:
+#### UCP
+* Added new GPU device API for direct GPU-to-GPU communication
+* Added host API for GPU device management
+* Added device signaling API with cooperation levels and flags
+* Added API for working with offsets and channel id in device operations
+* Added method to write to local counter in device operations
+* Added local and remote address fields to memory list element in device API
+* Added device lane selection and allocated handle population
+* Added support for Direct NIC (DPU) data path with CUDA
+* Added rkey packing support for Direct NIC
+* Added sender flush mechanism when memory sys_dev differs from remote lane sys_dev
+* Added option to use single network device per protocol
+* Added MIN_RMA_CHUNK_SIZE configuration parameter
+* Decreased default value for MIN_RMA_CHUNK_SIZE from 16k to 8k
+* Improved protocol lane selection with find_lanes callback to minimize overhead
+* Improved send-zcopy latency factor for fast-completion cases
+* Improved multi-ppn performance estimation
+* Removed deprecated ucp_mem functions
+* Deprecated ucp_request_alloc API
+#### UCT
+* Added new device API for GPU communication (rc_gda transport)
+* Added GDAKI transport with endpoint export to GPU
+* Added DEVX QP/CQ support on foreign memory
+* Added device API implementation for CUDA_IPC transport
+* Added device put multi, put partial, and atomic operations for CUDA_IPC
+* Added peer failure error handling capability for GDAKI
+* Added check for nvidia_peermem driver when using GDA transport
+* Enabled Direct NIC by default for IB transport
+* Added XDR performance recognition
+* Added support for mapping DMA_BUF handle via PCIe for Direct NIC
+* Improved GDR_COPY performance with fast-path cache lookup
+#### RDMA CORE (IB, ROCE, etc.)
+* Added ConnectX-9 device support
+* Split dp_ordering flag for DV/DevX transports
+* Added VRF tables support for RoCE reachability check
+* Added EFA-specific GPUDirect support detection
+#### TCP
+* Added routing table check during reachability verification
+#### UCS
+* Introduced lightweight rwlock data structure
+* Added built-in atomics for rcache rwlock
+* Improved VFS symlink paths and duplicate object handling
+* Disabled error signal interception by default
+#### CUDA
+* Added wrappers for NVML functions
+* Added hook for cuLibraryGetGlobal
+* Improved CUDA call logging
+* Improved source/destination memory type detection for lane performance estimation
+* Removed unsafe usage of cuCtxGetId
+* Added support for cuCtxCreate_v4 for newer CUDA versions
+* Improved context management for CUDA_IPC operations
+#### UCM
+* Changed module info print to debug level by default
+#### Tools
+* Added GDAKI kernel option to perftest
+* Added UCP cuda device tests to perftest
+* Added MPI+CUDA example
+* Differentiated wakeup feature and extra info options in perftest
+#### Build
+* Added ability to build CUDA device code for supported architectures
+* Added ucx.spec into tarball for Universal Build System support
+* Added CUDA 13 support
+* Added GDA build failure when gpunetio not found
+#### Packaging
+* Moved driver level dependencies under Recommends section in Debian packages
+* Added Provides field for upstream packages in Debian
+* Migrated JUCX publish from OSSRH to Central Portal
+* Added ib-mlx5-gda separate package
+#### CI/Testing
+* Added Rocky OS support to release pipeline
+* Added RHEL 10 containers to build matrices
+* Added Debian 13 to CI build stage
+* Added ARM build testing
+* Switched to MOFED 25.07
+* Switched GPU tests to Ubuntu 24.04 DOCA 3.1 (GPUNetIO) image
+* Added support for nvidia_peermem module in testing
+* Disabled Valgrind in CI Tests stage
+* Disabled tag matching offload tests
+#### GO Bindings
+* Made go bindings thread safe
+#### Documentation
+* Added note about reachability check mode in README
+* Mentioned nvlink as supported transport
+* Documented return status for device APIs
+#### AWS EFA
+* Added RMA WRITE operations support
+* Added flush and fence operations for SRD
+* Enabled EFA SRD support in tests
+### Bugfixes:
+#### UCP
+* Fixed fallback to blocking registration for network device only
+* Fixed flush_state validity check before using it
+* Fixed single net dev filtering for single proto
+* Fixed rkey size estimation for rendezvous
+* Fixed memory invalidation without RNDV
+* Fixed gather_pending_requests to execute only when reconfig occurs
+#### UCT
+* Fixed CUDA_IPC protocol selection for cuda_ipc
+* Fixed GDA compilation issues
+* Fixed GDAKI wqe_idx overflow
+* Fixed MM FIFO room calculation for tail > head case
+* Fixed CUDA_IPC indices handling in put partial
+* Removed DOCA runtime dependency from GDAKI
+* Fixed GDA log spam by reducing DOCA log level
+* Fixed UAR support check when querying resources for GDA/MLX5
+* Fixed crash in GGA transport when EXPORTED_MKEY flag is missing
+#### CUDA
+* Fixed stack overflow bug when calling cuPointerGetAttribute
+* Fixed mapping of DMA_BUF handle for Direct NIC
+* Returned object to mpool in case of failure in CUDA_COPY
+* Reduced log level of rkey unpacking failures
+* Handled cuMemRelease error status properly
+* Fixed context setting for local buffer in CUDA_IPC
+* Fixed host unregister error message (changed to diagnostic)
+* Fixed CUDA_IPC header installation
+#### RDMA CORE (IB, ROCE, etc.)
+* Fixed RoCE network device name reading
+* Fixed Direct NIC related issues
+* Reverted RC EP address size adaptation without flush_rkey
+#### UCS
+* Fixed ARCH header inclusion when building with nvcc (arm_neon.h)
+* Fixed VFS symlink path handling
+* Fixed netlink message receiving to continue until 'done' flag is set
+#### Build
+* Fixed NVCC search with explicit --with-cuda
+* Fixed ZE transport build failures
+* Fixed ucs_arch_get_cpu_flag compilation
+* Fixed CUDA device code build for supported architectures
+#### Testing
+* Fixed test_jenkins CI issues
+* Decreased rwlock test duration
+* Fixed error counting in gtest
+* Enabled retries for test_arch.memcpy
+* Fixed test_cuda_nvml condition relaxation
+* Skipped build when generating packages
+* Fixed CUDA device restoration in tests
+* Improved error detection in UCP device tests
+* Fixed global topo state cleanup during gtest
+#### Tools
+* Fixed perftest CUDA kernel issues
+#### GO Bindings
+* Fixed go bindings compilation with CUDA
+#### IB/EFA
+* Fixed error message when FLID is not available
+#### Packaging
+* Fixed RPM SPEC debug_package macro execution on SLES16
+
 ## 1.19.1 (Sep 18, 2025)
 ### Features:
 #### UCP
