Initial attempt to wrap CUSOLVER.Xgeev #47
Conversation
lkdvos left a comment
Looks great! At some point we might want to look into actually formalizing the reordering of eigenvalues in some way and add that to the features, but this definitely looks good!
I left some small comments, but otherwise I'm happy to merge this.
```julia
_cmplx_ev_ixs = findall(!isreal, W) # these come in pairs, choose only the first of each pair
complex_ev_ixs = view(_cmplx_ev_ixs, 1:2:length(_cmplx_ev_ixs))
if !isempty(real_ev_ixs)
    real_threads = 128
```
is this something we can hard-code, or should we do some global `const _real_threads = Ref(128)` kind of thing to make it customizable?
We can make it customizable, yes. Maybe settable during init or something?
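A minimal sketch of what that could look like, using the `Ref` pattern from the comment above (the name and the env-var override in `__init__` are hypothetical, not what the PR does):

```julia
# Module-level default that stays adjustable at runtime without type instability.
const _REORDER_THREADS = Ref(128)

# Hypothetical: allow overriding the default when the package loads.
function __init__()
    if haskey(ENV, "MAK_REORDER_THREADS")
        _REORDER_THREADS[] = parse(Int, ENV["MAK_REORDER_THREADS"])
    end
end
```

Kernel launch sites would then read `_REORDER_THREADS[]` instead of the hard-coded 128.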
So it seems that the whole custom-kernels part of the code would be unnecessary if we directly use the support for complex W and V from my previous comment, meaning that there would be much less code to maintain and fewer parameters to configure.
```julia
    @cuda threads=real_threads blocks=real_blocks _reorder_kernel_real(real_ev_ixs, VR, n)
end
if !isempty(complex_ev_ixs)
    complex_threads = 128
```
same comment here
```julia
n == length(D) || throw(DimensionMismatch("length mismatch between A and D"))
if length(V) == 0
    jobvr = 'N'
elseif length(V) == n*n
```
Was there a specific reason for not having `size(V) == (n, n)`?
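For reference, the shape-based version of the check would look something like this (a sketch; the error message mirrors the hunk below):

```julia
if isempty(V)
    jobvr = 'N'
elseif size(V) == (n, n)
    jobvr = 'V'
else
    throw(DimensionMismatch("size of V must match size of A"))
end
```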
```julia
elseif length(V) == n*n
    jobvr = 'V'
else
    throw(DimensionMismatch("size of VR must match size of A"))
```
I guess we just call it `V` instead of `VR` in our function signatures and docstrings.
```julia
D2 = reinterpret($elty, D)
# reuse memory, we will have to reorder afterwards to bring real and imaginary
# components in the order as required for the Complex type
VR = reinterpret($elty, V)
```
It seems like `cusolverDnXgeev` supports inputting complex W and VR arguments, and then immediately returns the results as we want them, making the whole `reinterpret` and `_reorder_realeigendecomposition` machinery (forced upon us by LAPACK) unnecessary: https://docs.nvidia.com/cuda/cusolver/index.html?highlight=cusolverDnXgeev#cusolverdnxgeev
I don't think this is true for the eigenvectors, based on the testing I did. For VL:

> Array of dimension ldvl * n. If jobvl = CUSOLVER_EIG_MODE_VECTOR, the left eigenvectors u(j) are stored one after another in the columns of VL, in the same order as their eigenvalues. If datatypeVL is complex or the j-th eigenvalue is real, then u(j) = VL(:,j), the j-th column of VL. If dataTypeVL is real and the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then u(j) = VL(:,j) + i*VL(:,j+1) and u(j+1) = VL(:,j) - i*VL(:,j+1). If jobvl = CUSOLVER_EIG_MODE_NOVECTOR, VL is not referenced.
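To make the quoted convention concrete, here is a host-side sketch of the unpacking rule (a hypothetical helper for illustration; the PR instead performs this reordering in place on the GPU):

```julia
# Rebuild complex eigenvectors from LAPACK-style real storage, per the rule above.
function combine_pairs(VR::Matrix{T}, W::Vector{Complex{T}}) where {T<:Real}
    n = size(VR, 1)
    V = Matrix{Complex{T}}(undef, n, n)
    j = 1
    while j <= n
        if isreal(W[j])
            V[:, j] = VR[:, j]                            # u(j) = VR(:,j)
            j += 1
        else                                              # complex conjugate pair
            V[:, j]     = VR[:, j] .+ im .* VR[:, j + 1]  # u(j)   = VR(:,j) + i*VR(:,j+1)
            V[:, j + 1] = VR[:, j] .- im .* VR[:, j + 1]  # u(j+1) = VR(:,j) - i*VR(:,j+1)
            j += 2
        end
    end
    return V
end
```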
VL is anyway not supported, but it is the same explanation as for VR. Are we reading this differently:
"If datatypeVL is complex" (so even if A is real) "or the j-th eigenvalue is real" (that doesn't really matter), "then u(j) = VL(:,j), the j-th column of VL."
The datatype of VR must be the same as the datatype of A, though, per the table further down.
That is completely ridiculous. Who comes up with documentation written in such a confusing way? And why would you allow complex W but then still use the outdated LAPACK scheme for V? It feels like CUSOLVER managed to make the worst aspects of the LAPACK interface even worse.
Ok, given what we have, it might still be possible to simplify the implementation a bit by using the complex eigenvalue vector interface. If possible, I would prefer not having to maintain CUDA kernels in MatrixAlgebraKit (even though this likely never needs to be touched again). For example, I wouldn't actually mind assigning a temporary real VR that is distinct from the V in the arguments, and then simply copying over, but maybe it's possible without. But I'm also interested in @lkdvos's opinion. I might give it a try and then we can still decide what to do with this.
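As a rough sketch of that "temporary real VR" variant (`geev!` and `_combine_pairs!` are stand-in names for illustration, not the PR's actual internals):

```julia
using CUDA

function eig_full_real!(A::CuMatrix{T}, D::CuVector{Complex{T}}, V::CuMatrix{Complex{T}}) where {T<:Real}
    n = size(A, 1)
    VRtmp = CuMatrix{T}(undef, n, n)  # real scratch buffer, distinct from the output V
    geev!('N', 'V', A, D, VRtmp)      # complex D is accepted directly; VR stays real
    _combine_pairs!(V, VRtmp, D)      # copy over, pairing conjugate columns as documented
    return D, V
end
```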
I have to say that I have very little feeling for the performance implications of any of this, but I can say that I have no faith in cuTENSOR to keep their API stable at all, especially considering that complex numbers seem to be an afterthought for them most of the time anyway.
Additionally, this is exclusive to cuTENSOR, so presumably we'd need the kernels anyway for AMD?
I would try and make the choices that minimize maintenance in the long term, but I can't really say I know what that means practically 😉.
AMD currently doesn't seem to have a general eigenvalue decomposition, but when they do, who knows what interface they will adopt.
Technically this is all CUSOLVER, btw, not cuTENSOR
Also, FWIW, historically AMD tends to hew very close to the line CUDA adopts for easy compatibility, since their hip* libraries allow users to "drop in" AMD to replace CUSOLVER/CUBLAS/CUWHATEVER.
Had to write some kernels for the reordering, which could certainly be improved but do seem to work for now.
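For context, a simplified sketch of what such a reordering kernel can look like with CUDA.jl. This version assumes a separate real source buffer `VRtmp` rather than the in-place `reinterpret` trick the PR uses, which keeps the indexing simple, and it only handles the real-eigenvalue columns:

```julia
using CUDA

# One thread per element of the real-eigenvalue columns: widen real entries to
# complex (zero imaginary part) in the output matrix V.
function _expand_real_columns_kernel(real_ev_ixs, VRtmp, V, n)
    k = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if k <= length(real_ev_ixs) * n
        col = real_ev_ixs[cld(k, n)]   # which eigenvalue column this thread handles
        row = mod1(k, n)               # which row within that column
        @inbounds V[row, col] = complex(VRtmp[row, col])
    end
    return nothing
end

# Launch with the 128-thread default discussed above:
# total = length(real_ev_ixs) * n
# @cuda threads=128 blocks=cld(total, 128) _expand_real_columns_kernel(real_ev_ixs, VRtmp, V, n)
```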