
Conversation

@amontoison (Member) commented Oct 17, 2025

#106

using CUDA, CUDA.CUSPARSE
using CUDSS
using LinearAlgebra
using SparseArrays

T = Float64
n = 100
A_cpu = sprand(T, n, n, 0.01)
A_cpu = A_cpu + A_cpu' + I  # symmetrize and shift the diagonal
b_cpu = rand(T, n)

A_gpu = CuSparseMatrixCSR(A_cpu)
b_gpu = CuVector(b_cpu)
x_gpu = similar(b_gpu)

# Solve on two GPUs (devices 0 and 1) with cuDSS multi-GPU (MG) mode.
device_indices = Cint[0, 1]

handle = CUDSS.cudssCreateMg(device_indices)
data = CudssData(handle)
config = CudssConfig(device_indices)
matrix = CudssMatrix(A_gpu, "S", 'F')  # symmetric matrix, full storage
solver = CudssSolver(matrix, config, data)

println("Analysis...")
cudss("analysis", solver, x_gpu, b_gpu)

println("Factorization...")
cudss("factorization", solver, x_gpu, b_gpu)

println("Solve...")
cudss("solve", solver, x_gpu, b_gpu)

println("Residual norm ||b - A*x||:")
r_gpu = b_gpu - A_gpu * x_gpu  # reuse the GPU matrix instead of rebuilding it
norm(r_gpu)

@amontoison amontoison requested a review from michel2323 October 17, 2025 19:29
@amontoison amontoison added the enhancement New feature or request label Oct 17, 2025
codecov bot commented Oct 17, 2025

Codecov Report

❌ Patch coverage is 29.85075% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.81%. Comparing base (9e5ffa6) to head (cd10ec9).
⚠️ Report is 1 commits behind head on main.

Files with missing lines   Patch %   Lines missing
src/management.jl           8.33%    33
src/helpers.jl             52.38%    10
src/interfaces.jl          50.00%     4
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #113      +/-   ##
==========================================
- Coverage   75.39%   72.81%   -2.58%     
==========================================
  Files           7        8       +1     
  Lines         764      824      +60     
==========================================
+ Hits          576      600      +24     
- Misses        188      224      +36     

☔ View full report in Codecov by Sentry.

@michel2323 (Member) left a comment

Tested on moonshot and Polaris.

@amontoison (Member, Author) commented Oct 22, 2025

@michel2323 What is missing now is CUDSS.mg_handle.
I need to understand how it works for other CUDA libraries.

I also checked the documentation and we can't use multiple GPUs for solving a batch of sparse linear systems:
https://docs.nvidia.com/cuda/cudss/advanced_features.html#multi-gpu-mg-mode

@michel2323 (Member) commented Oct 23, 2025

Handle
Inspired by CUSOLVER, the state now consists of the handle, the stream, and the devices. As with CUSOLVER, we don't cache handles but create a new one every time. That appears to be required because of the multiple GPUs: with several CUDA contexts in use by CUDSS, we can't push/pop a cached handle. Each change to the device configuration requires a new state and handle. CUDA.jl keeps one stream per device per task, so I am not sure whether one should call CUDA.jl's device!(1) and then ask for a handle_mg(). To my understanding, one should always call device!(0) before calling cudss_mg. Maybe we should add that?
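To make the device!(0) convention concrete, here is a minimal sketch of how a caller could guard an MG call. This is hypothetical, not the PR's implementation: the with_mg_handle helper is invented for illustration, and it assumes the low-level cudssDestroy wrapper is available in CUDSS.

```julia
using CUDA
using CUDSS

# Hypothetical sketch: switch to the first device in the configuration
# before creating the multi-GPU handle, since the handle is tied to the
# CUDA context that is current at creation time. A fresh handle is
# created per call (no caching) and destroyed when the work is done.
function with_mg_handle(f, device_indices::Vector{Cint})
    device!(device_indices[1])                    # e.g. device!(0)
    handle = CUDSS.cudssCreateMg(device_indices)
    try
        f(handle)
    finally
        CUDSS.cudssDestroy(handle)
    end
end
```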

Synchronization
Again inspired by CUSOLVER: we need to synchronize. Since CUDSS might launch work on internal streams, synchronizing our own stream is not enough; we need device_synchronize().
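As a sketch of where that synchronization point would sit (hedged; the exact call sites depend on the final implementation), the idea is to call CUDA.jl's device_synchronize() after each cudss phase, reusing the solver, x_gpu, and b_gpu from the example above:

```julia
using CUDA, CUDSS

# cuDSS may run work on internal streams across several devices, so
# synchronizing only the current task-local stream could miss pending
# work; device_synchronize() waits for all outstanding work on the
# current device before we touch the results.
for phase in ("analysis", "factorization", "solve")
    cudss(phase, solver, x_gpu, b_gpu)
    CUDA.device_synchronize()
end
```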
