Initial attempt to wrap CUSOLVER.Xgeev #47
Conversation
lkdvos left a comment
Looks great! At some point we might want to look into actually formalizing the reordering of eigenvalues in some way and add that to the features, but this definitely looks good!
I left some small comments, but otherwise I'm happy to merge this.
```julia
_cmplx_ev_ixs = findall(!isreal, W) # these come in pairs, choose only the first of each pair
complex_ev_ixs = view(_cmplx_ev_ixs, 1:2:length(_cmplx_ev_ixs))
if !isempty(real_ev_ixs)
    real_threads = 128
```
is this something we can hard-code, or should we do some global `const _real_threads = Ref(128)` kind of thing to make it customizable?
We can make it customizable, yes. Maybe settable during init or something?
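A minimal sketch of what that could look like, using the `Ref` pattern from the comment above (the name and the env-var override in `__init__` are hypothetical, not what the PR does):

```julia
# Module-level default that stays adjustable at runtime without type instability.
const _REORDER_THREADS = Ref(128)

# Hypothetical: allow overriding the default when the package loads.
function __init__()
    if haskey(ENV, "MAK_REORDER_THREADS")
        _REORDER_THREADS[] = parse(Int, ENV["MAK_REORDER_THREADS"])
    end
end
```

Kernel launch sites would then read `_REORDER_THREADS[]` instead of the hard-coded 128.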
So it seems that the whole custom-kernels part of the code would be unnecessary if we directly use the support for complex W and V from my previous comment, meaning that there would be much less code to maintain and fewer parameters to configure.
```julia
    @cuda threads=real_threads blocks=real_blocks _reorder_kernel_real(real_ev_ixs, VR, n)
end
if !isempty(complex_ev_ixs)
    complex_threads = 128
```
same comment here
```julia
n == length(D) || throw(DimensionMismatch("length mismatch between A and D"))
if length(V) == 0
    jobvr = 'N'
elseif length(V) == n*n
```
Was there a specific reason for not having `size(V) == (n, n)`?
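For reference, the shape-based version of the check would look something like this (a sketch; the error message mirrors the hunk below):

```julia
if isempty(V)
    jobvr = 'N'
elseif size(V) == (n, n)
    jobvr = 'V'
else
    throw(DimensionMismatch("size of V must match size of A"))
end
```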
```julia
elseif length(V) == n*n
    jobvr = 'V'
else
    throw(DimensionMismatch("size of VR must match size of A"))
```
I guess we just call it `V` instead of `VR` in our function signatures and docstrings.
```julia
D2 = reinterpret($elty, D)
# reuse memory, we will have to reorder afterwards to bring real and imaginary
# components in the order as required for the Complex type
VR = reinterpret($elty, V)
```
It seems like `cusolverDnXgeev` supports inputting complex W and VR arguments, and then immediately returns the results as we want them, making the whole `reinterpret` and `_reorder_realeigendecomposition` machinery (forced upon us by LAPACK) unnecessary: https://docs.nvidia.com/cuda/cusolver/index.html?highlight=cusolverDnXgeev#cusolverdnxgeev
I don't think this is true for the eigenvectors, based on the testing I did. For VL:

> Array of dimension ldvl * n. If jobvl = CUSOLVER_EIG_MODE_VECTOR, the left eigenvectors u(j) are stored one after another in the columns of VL, in the same order as their eigenvalues. If datatypeVL is complex or the j-th eigenvalue is real, then u(j) = VL(:,j), the j-th column of VL. If dataTypeVL is real and the j-th and (j+1)-st eigenvalues form a complex conjugate pair, then u(j) = VL(:,j) + i*VL(:,j+1) and u(j+1) = VL(:,j) - i*VL(:,j+1). If jobvl = CUSOLVER_EIG_MODE_NOVECTOR, VL is not referenced.
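To make the quoted convention concrete, here is a host-side sketch of the unpacking rule (a hypothetical helper for illustration; the PR instead performs this reordering in place on the GPU):

```julia
# Rebuild complex eigenvectors from LAPACK-style real storage, per the rule above.
function combine_pairs(VR::Matrix{T}, W::Vector{Complex{T}}) where {T<:Real}
    n = size(VR, 1)
    V = Matrix{Complex{T}}(undef, n, n)
    j = 1
    while j <= n
        if isreal(W[j])
            V[:, j] = VR[:, j]                            # u(j) = VR(:,j)
            j += 1
        else                                              # complex conjugate pair
            V[:, j]     = VR[:, j] .+ im .* VR[:, j + 1]  # u(j)   = VR(:,j) + i*VR(:,j+1)
            V[:, j + 1] = VR[:, j] .- im .* VR[:, j + 1]  # u(j+1) = VR(:,j) - i*VR(:,j+1)
            j += 2
        end
    end
    return V
end
```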
VL is anyway not supported, but it is the same explanation as for VR. Are we reading this differently:
"If datatypeVL is complex" (so even if A is real) "or the j-th eigenvalue is real" (that doesn't really matter), "then u(j) = VL(:,j), the j-th column of VL."
The datatype of VR must be the same as the datatype of A, though, per the table further down.
That is completely ridiculous. Who comes up with documentation written in such a confusing way? And why would you allow complex W but then still use the outdated LAPACK scheme for V? It feels like CUSOLVER managed to make the worst aspects of the LAPACK interface even worse.
Ok, given what we have, it might still be possible to simplify the implementation a bit by using the complex eigenvalue vector interface. If possible, I would prefer not having to maintain CUDA kernels in MatrixAlgebraKit (even though this likely never needs to be touched again). For example, I wouldn't actually mind assigning a temporary real VR that is distinct from the V in the arguments, and then simply copying over, but maybe it's possible without. But I'm also interested in @lkdvos's opinion. I might give it a try and then we can still decide what to do with this.
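As a rough sketch of that "temporary real VR" variant (`geev!` and `_combine_pairs!` are stand-in names for illustration, not the PR's actual internals):

```julia
using CUDA

function eig_full_real!(A::CuMatrix{T}, D::CuVector{Complex{T}}, V::CuMatrix{Complex{T}}) where {T<:Real}
    n = size(A, 1)
    VRtmp = CuMatrix{T}(undef, n, n)  # real scratch buffer, distinct from the output V
    geev!('N', 'V', A, D, VRtmp)      # complex D is accepted directly; VR stays real
    _combine_pairs!(V, VRtmp, D)      # copy over, pairing conjugate columns as documented
    return D, V
end
```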
I have to say that I have very little feeling for the performance implications of any of this, but I can say that I have no faith in cuTENSOR to keep their API stable at all, especially considering that complex numbers seem to be an afterthought for them most of the time anyway.
Additionally, this is exclusive to cuTENSOR, so presumably we'd need the kernels anyway for AMD?
I would try and make the choices that minimize maintenance in the long term, but I can't really say I know what that means practically 😉.
AMD currently doesn't seem to have a general eigenvalue decomposition, but when they do, who knows what interface they will adopt.
Technically this is all CUSOLVER, btw, not cuTENSOR
Also, FWIW, historically AMD tends to hew very close to the line CUDA adopts for easy compatibility, since their hip* libraries allow users to "drop in" AMD to replace CUSOLVER/CUBLAS/CUWHATEVER.
Had to write some kernels for the reordering, which could certainly be improved but do seem to work for now.
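For context, a simplified sketch of what such a reordering kernel can look like with CUDA.jl. This version assumes a separate real source buffer `VRtmp` rather than the in-place `reinterpret` trick the PR uses, which keeps the indexing simple, and it only handles the real-eigenvalue columns:

```julia
using CUDA

# One thread per element of the real-eigenvalue columns: widen real entries to
# complex (zero imaginary part) in the output matrix V.
function _expand_real_columns_kernel(real_ev_ixs, VRtmp, V, n)
    k = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if k <= length(real_ev_ixs) * n
        col = real_ev_ixs[cld(k, n)]   # which eigenvalue column this thread handles
        row = mod1(k, n)               # which row within that column
        @inbounds V[row, col] = complex(VRtmp[row, col])
    end
    return nothing
end

# Launch with the 128-thread default discussed above:
# total = length(real_ev_ixs) * n
# @cuda threads=128 blocks=cld(total, 128) _expand_real_columns_kernel(real_ev_ixs, VRtmp, V, n)
```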