fix d2d cupy transfer #594

Merged: Intron7 merged 2 commits into main from fix-d2d-copies on Feb 28, 2026
Intron7 commented Feb 27, 2026

Fix multi-GPU cudaErrorLaunchFailure during cross-device result aggregation when using RMM without pool allocation (pool_allocator=False).

The error manifests as a MemoryError on a tiny allocation during Phase 4 (cross-device copy dev1 → dev0):

MemoryError: std::bad_alloc: CUDA error (failed to allocate 15592 bytes) at:
.../rmm/mr/cuda_memory_resource.hpp:51: cudaErrorLaunchFailure unspecified launch failure

Root cause: When CuPy's cp.asarray() performs a cross-device D2D copy and the result is consumed inline (sums += cp.asarray(data["sums"])), allocation, copy, and addition all happen in a single statement. In that fused form, a stale asynchronous CUDA error surfaces through cudaMalloc (called by RMM's cuda_memory_resource), which is why the failure is reported as a MemoryError on an unrelated small allocation. Splitting the D2D copy into a separate assignment avoids this.

The bug is hidden by:

  • RMM pool allocator — never calls cudaMalloc, so the stale error is never surfaced
  • Small datasets — allocations complete before the stale state propagates

Affected functions: edistance.pairwise, co_occurrence, moran, geary (all multi-GPU paths)

Fix: Split inline cp.asarray D2D copy + addition into two statements, and sync non-blocking streams instead of null stream where applicable.

Before (fails with RMM no-pool on large workloads)

pairwise_sums += cp.asarray(data["sums"])

After (works reliably)

dev0_sums = cp.asarray(data["sums"])
pairwise_sums += dev0_sums
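
The two-statement pattern can be sketched as a small aggregation helper. This is only an illustration under stated assumptions, not the code from this PR: the names aggregate_on_device0, per_device_results, and xp are hypothetical, and NumPy stands in for CuPy so the sketch runs without a GPU. NumPy cannot reproduce the CUDA error; the sketch only shows the statement structure.

```python
import numpy as np


def aggregate_on_device0(per_device_results, xp=np):
    """Sum the "sums" array from each per-device result dict.

    xp is the array module. Assumption: on a multi-GPU machine the caller
    passes cupy and has already made the target device current. The
    default numpy only lets the sketch run without a GPU.
    """
    total = None
    for data in per_device_results:
        # Statement 1: bring the array over on its own, so the copy is
        # not fused with the addition below.
        local = xp.asarray(data["sums"])
        if total is None:
            total = xp.zeros_like(local)
        # Statement 2: accumulate from the already-local array.
        total += local
    return total


# Hypothetical per-device results, as plain host arrays.
results = [{"sums": np.ones((2, 2))}, {"sums": np.full((2, 2), 2.0)}]
print(aggregate_on_device0(results))
```

Keeping the copy in a named variable also gives an obvious place to add a synchronization or a debug check between the transfer and the accumulation if needed.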


codecov-commenter commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.50%. Comparing base (1407937) to head (22949e5).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #594   +/-   ##
=======================================
  Coverage   87.49%   87.50%           
=======================================
  Files          96       96           
  Lines        6781     6784    +3     
=======================================
+ Hits         5933     5936    +3     
  Misses        848      848           
| Files with missing lines | Coverage | Δ |
| --- | --- | --- |
| ...apids_singlecell/pertpy_gpu/_metrics/_edistance.py | 91.84% <100.00%> | (+0.04%) ⬆️ |
| src/rapids_singlecell/squidpy_gpu/_co_oc.py | 92.00% <100.00%> | (+0.06%) ⬆️ |
| src/rapids_singlecell/squidpy_gpu/_gearysc.py | 92.68% <100.00%> | (ø) |
| src/rapids_singlecell/squidpy_gpu/_moransi.py | 91.86% <100.00%> | (ø) |

Intron7 merged commit 5973271 into main on Feb 28, 2026
24 of 25 checks passed
Intron7 deleted the fix-d2d-copies branch on February 28, 2026 01:05