
Conversation

@amontoison (Member) commented Oct 17, 2025

#106

using CUDA, CUDA.CUSPARSE
using CUDSS
using LinearAlgebra
using SparseArrays

T = Float64
n = 100
A_cpu = sprand(T, n, n, 0.01)
A_cpu = A_cpu + A_cpu' + I  # symmetrize and shift the diagonal
b_cpu = rand(T, n)

A_gpu = CuSparseMatrixCSR(A_cpu)
b_gpu = CuVector(b_cpu)
x_gpu = similar(b_gpu)

# Solve on two GPUs (devices 0 and 1) with cuDSS multi-GPU (MG) mode.
device_indices = Cint[0, 1]

handle = CUDSS.cudssCreateMg(device_indices)
data = CudssData(handle)
config = CudssConfig(device_indices)
matrix = CudssMatrix(A_gpu, "S", 'F')  # symmetric matrix, full storage
solver = CudssSolver(matrix, config, data)

println("Analysis...")
cudss("analysis", solver, x_gpu, b_gpu)

println("Factorization...")
cudss("factorization", solver, x_gpu, b_gpu)

println("Solve...")
cudss("solve", solver, x_gpu, b_gpu)

println("Residual norm ||b - A*x||:")
r_gpu = b_gpu - A_gpu * x_gpu  # reuse the GPU matrix instead of rebuilding it
norm(r_gpu)

@amontoison amontoison requested a review from michel2323 October 17, 2025 19:29
@amontoison amontoison added the enhancement New feature or request label Oct 17, 2025
codecov bot commented Oct 17, 2025

Codecov Report

❌ Patch coverage is 29.85075% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.81%. Comparing base (9e5ffa6) to head (cd10ec9).
⚠️ Report is 1 commits behind head on main.

Files with missing lines   Patch %   Lines missing
src/management.jl           8.33%    33
src/helpers.jl             52.38%    10
src/interfaces.jl          50.00%     4
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #113      +/-   ##
==========================================
- Coverage   75.39%   72.81%   -2.58%     
==========================================
  Files           7        8       +1     
  Lines         764      824      +60     
==========================================
+ Hits          576      600      +24     
- Misses        188      224      +36     

☔ View full report in Codecov by Sentry.

@michel2323 (Member) left a comment

Tested on moonshot and Polaris.

@amontoison (Member, Author) commented Oct 22, 2025

@michel2323 What is missing now is CUDSS.mg_handle.
I need to understand how it works for other CUDA libraries.

I also checked the documentation and we can't use multiple GPUs for solving a batch of sparse linear systems:
https://docs.nvidia.com/cuda/cudss/advanced_features.html#multi-gpu-mg-mode

@michel2323 (Member) commented Oct 23, 2025

Handle
Inspired by CUSOLVER, the state now consists of the handle, the stream, and the devices. As with CUSOLVER, we don't cache handles but create a new one every time. That appears to be required because of the multiple GPUs: with several CUDA contexts in use by CUDSS, we can't push/pop a cached handle. Each change to the device configuration requires a new state and handle. CUDA.jl keeps one stream per device per task, so I am not sure whether one should call CUDA.jl's device!(1) and then ask for a handle_mg(). To my understanding, one should always call device!(0) before calling cudss_mg. Maybe we should add that?
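To make the device!(0) convention concrete, here is a minimal sketch of how a caller could guard an MG call. This is hypothetical, not the PR's implementation: the with_mg_handle helper is invented for illustration, and it assumes the low-level cudssDestroy wrapper is available in CUDSS.

```julia
using CUDA
using CUDSS

# Hypothetical sketch: switch to the first device in the configuration
# before creating the multi-GPU handle, since the handle is tied to the
# CUDA context that is current at creation time. A fresh handle is
# created per call (no caching) and destroyed when the work is done.
function with_mg_handle(f, device_indices::Vector{Cint})
    device!(device_indices[1])                    # e.g. device!(0)
    handle = CUDSS.cudssCreateMg(device_indices)
    try
        f(handle)
    finally
        CUDSS.cudssDestroy(handle)
    end
end
```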

Synchronization
Again inspired by CUSOLVER: we need to synchronize. Since CUDSS might launch work on internal streams, synchronizing our own stream is not enough; we need device_synchronize().
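As a sketch of where that synchronization point would sit (hedged; the exact call sites depend on the final implementation), the idea is to call CUDA.jl's device_synchronize() after each cudss phase, reusing the solver, x_gpu, and b_gpu from the example above:

```julia
using CUDA, CUDSS

# cuDSS may run work on internal streams across several devices, so
# synchronizing only the current task-local stream could miss pending
# work; device_synchronize() waits for all outstanding work on the
# current device before we touch the results.
for phase in ("analysis", "factorization", "solve")
    cudss(phase, solver, x_gpu, b_gpu)
    CUDA.device_synchronize()
end
```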
