Conversation

Member

@kshyatt kshyatt commented Aug 20, 2025

Had to write some kernels for the reordering, which could certainly be improved but do seem to work for now.

@kshyatt kshyatt requested a review from lkdvos August 20, 2025 08:51
@kshyatt kshyatt mentioned this pull request Aug 20, 2025

codecov bot commented Aug 20, 2025

Codecov Report

❌ Patch coverage is 94.52055% with 4 lines in your changes missing coverage. Please review.

Files with missing lines                                Patch %   Lines
...MatrixAlgebraKitCUDAExt/MatrixAlgebraKitCUDAExt.jl   33.33%    2 Missing ⚠️
ext/MatrixAlgebraKitCUDAExt/yacusolver.jl               98.14%    1 Missing ⚠️
src/implementations/eig.jl                              93.75%    1 Missing ⚠️

Files with missing lines                                Coverage Δ
src/MatrixAlgebraKit.jl                                 100.00% <ø> (ø)
ext/MatrixAlgebraKitCUDAExt/yacusolver.jl               94.78% <98.14%> (+1.15%) ⬆️
src/implementations/eig.jl                              98.59% <93.75%> (-1.41%) ⬇️
...MatrixAlgebraKitCUDAExt/MatrixAlgebraKitCUDAExt.jl   77.77% <33.33%> (-8.89%) ⬇️

Member

@lkdvos lkdvos left a comment


Looks great! At some point we might want to look into actually formalizing the reordering of eigenvalues in some way and add that to the features, but this definitely looks good!

I left some small comments, but otherwise I'm happy to merge this.

_cmplx_ev_ixs = findall(!isreal, W) # these come in pairs, choose only the first of each pair
complex_ev_ixs = view(_cmplx_ev_ixs, 1:2:length(_cmplx_ev_ixs))
if !isempty(real_ev_ixs)
real_threads = 128
Member


is this something we can hard-code or should we do some global const _real_threads = Ref(128) kind of thing to make it customizable?

Member Author


We can make it customizable, yes. Maybe settable during init or something?
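A sketch of what that could look like (hypothetical names: `_real_threads` mirrors the local variable above, and the environment-variable hook is just one possible init-time mechanism, not the merged implementation):

```julia
# Module-level Ref: the launch configuration becomes mutable at runtime
# instead of a hard-coded literal at the kernel launch site.
const _real_threads = Ref(128)

# Hypothetical init-time override; in a package this would live in __init__().
function _init_thread_config!()
    if haskey(ENV, "MAK_EIG_REAL_THREADS")
        _real_threads[] = parse(Int, ENV["MAK_EIG_REAL_THREADS"])
    end
    return _real_threads[]
end

# The launch site would then read the Ref, e.g.:
#   @cuda threads=_real_threads[] blocks=real_blocks _reorder_kernel_real(...)
```

Users could then also set `_real_threads[] = 256` directly to tune the launch without touching the source.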

Member


So it seems that the whole custom-kernels part of the code would be unnecessary if we directly use the support for complex W and V from my previous comment, meaning that there would be much less code to maintain and fewer parameters to configure.

@cuda threads=real_threads blocks=real_blocks _reorder_kernel_real(real_ev_ixs, VR, n)
end
if !isempty(complex_ev_ixs)
complex_threads = 128
Member


same comment here

@kshyatt kshyatt enabled auto-merge (squash) August 20, 2025 10:05
@kshyatt kshyatt merged commit 36ef228 into main Aug 20, 2025
10 checks passed
@kshyatt kshyatt deleted the ksh/cuda_eig branch August 20, 2025 10:27
n == length(D) || throw(DimensionMismatch("length mismatch between A and D"))
if length(V) == 0
jobvr = 'N'
elseif length(V) == n*n
Member


Was there a specific reason for not having size(V) == (n, n)?
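For reference, the two checks are not equivalent: `length(V) == n*n` accepts any array with n² elements, e.g. a flat buffer, while `size(V) == (n, n)` requires an actual n×n matrix. A quick CPU illustration:

```julia
n = 3
Vmat = zeros(n, n)    # an actual n×n matrix
Vvec = zeros(n * n)   # a flat vector with the same number of elements

length(Vmat) == n^2   # true
length(Vvec) == n^2   # also true: the length check cannot tell these apart
size(Vmat) == (n, n)  # true
size(Vvec) == (n, n)  # false: size(Vvec) is (9,)
```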

elseif length(V) == n*n
jobvr = 'V'
else
throw(DimensionMismatch("size of VR must match size of A"))
Member


I guess we just call it V instead of VR in our function signatures and doc strings.

D2 = reinterpret($elty, D)
# reuse memory, we will have to reorder afterwards to bring real and imaginary
# components in the order as required for the Complex type
VR = reinterpret($elty, V)
Member


It seems like cusolverDnXgeev supports inputting complex W and VR arguments, and then immediately returns the results as we want them, making the whole reinterpret and _reorder_realeigendecomposition (forced upon us by LAPACK) unnecessary: https://docs.nvidia.com/cuda/cusolver/index.html?highlight=cusolverDnXgeev#cusolverdnxgeev

Member Author


I don't think this is true for the eigenvectors, based on the testing I did. For VL:

Array of dimension ldvl * n. If jobvl = CUSOLVER_EIG_MODE_VECTOR, the left eigenvectors u(j) are stored one after another in the columns of VL, in the same order as their eigenvalues. If datatypeVL is complex or the j-th eigenvalue is real, then u(j) = VL(:,j), the j-th column of VL. If dataTypeVL is real and the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then u(j) = VL(:,j) + iVL(:,j+1) and u(j+1) = VL(:,j) - iVL(:,j+1). If jobvl = CUSOLVER_EIG_MODE_NOVECTOR, VL is not referenced.

Member

@Jutho Jutho Sep 3, 2025


VL is anyway not supported, but it is the same explanation as for VR. Are we reading this differently?
"If datatypeVL is complex" (so even if A is real) "or the j-th eigenvalue is real" (that doesn't really matter), "then u(j) = VL(:,j), the j-th column of VL."

Member Author


The datatype of VR must be the same as the datatype of A, though, per the table further down.
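The convention being debated here is the classic LAPACK real-geev packing, which can be reproduced on the CPU. A sketch of the unpacking logic (`unpack_eigvecs` is an illustrative helper, not the code in this PR):

```julia
using LinearAlgebra

# For a complex-conjugate pair at positions j, j+1 (WI[j] > 0), real geev
# stores Re(u_j) in VR[:, j] and Im(u_j) in VR[:, j+1]; LAPACK puts the
# eigenvalue with positive imaginary part first.
function unpack_eigvecs(WR, WI, VR)
    n = length(WR)
    V = zeros(complex(eltype(VR)), n, n)
    j = 1
    while j <= n
        if WI[j] == 0
            V[:, j] .= VR[:, j]
            j += 1
        else
            V[:, j]     .= VR[:, j] .+ im .* VR[:, j + 1]
            V[:, j + 1] .= VR[:, j] .- im .* VR[:, j + 1]
            j += 2
        end
    end
    return V
end

A = [0.0 -1.0; 1.0 0.0]                 # eigenvalues ±i
WR, WI, _, VR = LAPACK.geev!('N', 'V', copy(A))
V = unpack_eigvecs(WR, WI, VR)
A * V ≈ V * Diagonal(complex.(WR, WI))  # eigendecomposition holds up to roundoff
```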

Member


That is completely ridiculous. Who comes up with writing documentation in such a confusing way? And why would you allow complex W but then still use the outdated LAPACK scheme for V? It feels like CUSOLVER managed to make the worst aspects of the LAPACK interface even worse.

Member


Ok, given what we have, it might still be possible to simplify the implementation a bit by using the complex eigenvalue vector interface. If possible, I would prefer not having to maintain CUDA kernels in MatrixAlgebraKit (even though this likely never needs to be touched again). For example, I wouldn't actually mind assigning a temporary real VR that is distinct from the V in the arguments, and then simply copying over, but maybe it's possible without. But I'm also interested in @lkdvos's opinion. I might give it a try and then we can still decide what to do with this.

Member


I have to say that I have very little feeling for the performance implications of any of this, but I can say that I have no faith in cuTENSOR to keep their API stable at all, especially considering that complex numbers seem to be an afterthought for them most of the time anyway.
Additionally, this is exclusive to cuTENSOR, so presumably we'd need the kernels anyway for AMD?
I would try to make the choices that minimize maintenance in the long term, but I can't really say I know what that means practically 😉.

Member


AMD currently doesn't seem to have a general eigenvalue decomposition, but when they do, who knows what interface they will adopt.

Member Author


Technically this is all CUSOLVER, btw, not cuTENSOR

Member Author


Also FWIW, historically AMD tends to hew very close to the line CUDA adopts, for easy compatibility, since their hip* libraries allow users to "drop in" AMD replacements for CUSOLVER/CUBLAS/CUWHATEVER.
