fix d2d cupy transfer by Intron7 · Pull Request #594 · scverse/rapids-singlecell

Intron7 · 2026-02-27T22:50:54Z

Fix multi-GPU cudaErrorLaunchFailure during cross-device result aggregation when using RMM without pool allocation (pool_allocator=False).

The error manifests as a MemoryError on a tiny allocation during Phase 4 (cross-device copy dev1 → dev0):

MemoryError: std::bad_alloc: CUDA error (failed to allocate 15592 bytes) at:
.../rmm/mr/cuda_memory_resource.hpp:51: cudaErrorLaunchFailure unspecified launch failure

Root cause: When CuPy's cp.asarray() performs a cross-device D2D copy and the result is consumed inline (sums += cp.asarray(data["sums"])), the fused allocation + copy + addition exposes a stale CUDA async error through cudaMalloc (called by RMM's cuda_memory_resource). Splitting the D2D copy into a separate assignment avoids this.
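
As a minimal sketch of that split inside a per-device aggregation loop: the helper name aggregate_sums and the layout of per_device_results are hypothetical and not taken from rapids-singlecell, and NumPy stands in for CuPy when no CUDA device is available so the snippet still runs.

```python
# Hypothetical helper; illustrates the two-statement pattern only.
try:
    import cupy as xp
    if xp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device")
except Exception:
    import numpy as xp  # CPU stand-in so the sketch stays runnable


def aggregate_sums(per_device_results, shape):
    """Accumulate per-device partial sums on the current device."""
    total = xp.zeros(shape, dtype=xp.float64)
    for data in per_device_results:
        # Statement 1: the copy to the current device happens on its own.
        local_sums = xp.asarray(data["sums"])
        # Statement 2: the in-place addition consumes the finished copy.
        total += local_sums
    return total


parts = [{"sums": [[1.0, 2.0]]}, {"sums": [[3.0, 4.0]]}]
print(aggregate_sums(parts, (1, 2)).tolist())
```

With CuPy, cp.asarray copies an array that lives on another device to the current device, so the first statement is where the D2D transfer would take place.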

The bug is hidden by:

RMM pool allocator — never calls cudaMalloc, so the stale error is never surfaced
Small datasets — allocations complete before the stale state propagates

Affected functions: edistance.pairwise, co_occurrence, moran, geary (all multi-GPU paths)

Fix: Split the inline cp.asarray D2D copy + addition into two statements, and synchronize the non-blocking streams instead of the null stream where applicable.

Before (fails with RMM no-pool on large workloads)

pairwise_sums += cp.asarray(data["sums"])

After (works reliably)

dev0_sums = cp.asarray(data["sums"])
pairwise_sums += dev0_sums
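
The stream half of the fix can be sketched in a similar way. This is an illustration under assumptions, not the code changed in this PR: the function name scale_on_side_stream is hypothetical, and a pure-Python fallback is used when no CUDA device is available.

```python
# Hypothetical example; not the actual rapids-singlecell code.
try:
    import cupy as cp
    if cp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device")
except Exception:
    cp = None  # no usable GPU: the function below falls back to pure Python


def scale_on_side_stream(values, factor):
    """Multiply values by factor on a non-blocking stream, then sync it."""
    if cp is None:
        return [v * factor for v in values]
    stream = cp.cuda.Stream(non_blocking=True)
    with stream:
        # Work launched here is queued on `stream`, not the null stream.
        out = cp.asarray(values) * factor
    # A non-blocking stream does not synchronize implicitly with the null
    # stream, so cp.cuda.Stream.null.synchronize() is not guaranteed to
    # wait for this work. Synchronize the stream that did the work.
    stream.synchronize()
    return out.tolist()


print(scale_on_side_stream([1.0, 2.0, 3.0], 2.0))
```

Synchronizing the stream that actually ran the work makes the ordering explicit before the result is read from another stream or device.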

codecov-commenter · 2026-02-27T23:22:31Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.50%. Comparing base (1407937) to head (22949e5).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #594   +/-   ##
=======================================
  Coverage   87.49%   87.50%           
=======================================
  Files          96       96           
  Lines        6781     6784    +3     
=======================================
+ Hits         5933     5936    +3     
  Misses        848      848

Files with missing lines	Coverage Δ
...apids_singlecell/pertpy_gpu/_metrics/_edistance.py	`91.84% <100.00%> (+0.04%)`	⬆️
src/rapids_singlecell/squidpy_gpu/_co_oc.py	`92.00% <100.00%> (+0.06%)`	⬆️
src/rapids_singlecell/squidpy_gpu/_gearysc.py	`92.68% <100.00%> (ø)`
src/rapids_singlecell/squidpy_gpu/_moransi.py	`91.86% <100.00%> (ø)`

fix d2d cupy transfer

62a669b

Intron7 added the run-gpu-ci label Feb 27, 2026

github-actions bot removed the run-gpu-ci label Feb 27, 2026

add release note

22949e5

Intron7 added the run-gpu-ci label Feb 27, 2026

github-actions bot removed the run-gpu-ci label Feb 27, 2026

Intron7 merged commit 5973271 into main Feb 28, 2026
24 of 25 checks passed

Intron7 deleted the fix-d2d-copies branch February 28, 2026 01:05

Intron7 merged 2 commits into main from fix-d2d-copies
