…oid having a clean up tile in the middle of the W matrix
…ts to slate calls
… != q. Changed dd computing in qdwh for-now and minor changes
|
Check for warnings, i.e., add |
…trcondest. Reduced the condition number of the tested matrix
mgates3
left a comment
First pass through. Probably more changes later on.
…or update on gflops count
…from geqrf and geqrf_qdwh_full
All tests passed, except one failure for gels using cholqr.
lapack::Gflop<scalar_t>::potrf(m) +
blas::Gflop<scalar_t>::trsm(slate::Side::Left, m, n) );

double gflop_compute_H = blas::Gflop<scalar_t>::her2k(n, m);
Currently, this is really gemm, but eventually it should be herk (i.e., herkx) instead of her2k.
(I'll fix.)
mgates3
left a comment
Ack! Pending comments from a long time ago. I didn't confirm if these make sense.
    slate_error("Failed to converge.");
}
itconv++;
I see a previous commit about using double to avoid overflow. Can we just use double for everything instead of casting everything to real_t?
We can simplify by using some constants, e.g., const real_t r2 = 2.0. Or in double, just use constants 1.0, 2.0, etc. in the formulas.
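A minimal sketch of what the all-double version could look like, written with plain constants and no casts. The struct and function names (`QdwhParams`, `qdwh_params`, `qdwh_next_bound`) are hypothetical, and the `dd` formula and lower-bound update are taken from the published QDWH algorithm rather than from this PR, so treat them as assumptions to check against the actual code.

```cpp
#include <cmath>

// Hypothetical helper type for illustration; not part of SLATE.
struct QdwhParams {
    double a, b, c;
};

// Dynamically weighted Halley parameters for a lower bound L (0 < L <= 1)
// on the smallest singular value of the current iterate.
// Everything stays in double, so no real_t casts are needed.
inline QdwhParams qdwh_params(double L)
{
    const double L2  = L * L;
    // Assumed formula for dd (from the QDWH literature, not from this PR).
    const double dd  = std::cbrt(4.0 * (1.0 - L2) / (L2 * L2));
    const double sqd = std::sqrt(1.0 + dd);
    const double a   = sqd
        + std::sqrt(8.0 - 4.0 * dd + 8.0 * (2.0 - L2) / (L2 * sqd)) / 2.0;
    const double b   = (a - 1.0) * (a - 1.0) / 4.0;
    const double c   = a + b - 1.0;
    return {a, b, c};
}

// Updated lower bound after one iteration with parameters p.
inline double qdwh_next_bound(double L, const QdwhParams& p)
{
    const double L2 = L * L;
    return L * (p.a + p.b * L2) / (1.0 + p.c * L2);
}
```

At L = 1 the parameters reduce to (a, b, c) = (3, 1, 3), i.e., the plain Halley iteration, which is a convenient sanity check.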
    printf("\nConverged after %d. Check what is the issue, "
           "because QDWH needs <= 6 iterations.\n",
           itqr+itpo);
}
I moved these outputs to the tester, to maintain xSDK compatibility.
//her2k(one, A, W10, rzero, H, opts);
//auto AL = HermitianMatrix<scalar_t>(
//    slate::Uplo::Lower, H );
//slate::copy(AL, H, opts);
herkx or gemmtr, not her2k.
sqd = sqrt( r_one + real_t(dd) );
a1 = sqd + sqrt( real_t(8.0) - real_t(4.0) * real_t(dd) +
    real_t(8.0) * ( real_t(2.0) - L2 ) / ( L2 * sqd ) ) / real_t(2.0);
a = real(a1);
a1 and a are both double, so what does this real( ) call do?
auto R = TriangularMatrix<scalar_t>(
    Uplo::Upper, slate::Diag::NonUnit, R1 );
normR = norm(slate::Norm::One, R, opts);
slate::trcondest(slate::Norm::One, R, &Li, opts);
If trcondest takes Rnorm as does gecondest, then we can actually set Rnorm = 1.0 and avoid computing it entirely, since it just gets cancelled:
smin_est = Rnorm * rcond = Rnorm * 1 / (Rnorm * || R^{-1} ||_1) = 1 / || R^{-1} ||_1
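To make the cancellation concrete, here is a small self-contained check on a 2x2 upper-triangular matrix. It computes the 1-norm of the inverse exactly instead of estimating it, and all names (`Upper2`, `rcond_one`, `smin_est`) are hypothetical; it does not call SLATE and only assumes that rcond is defined as 1 / (Rnorm * || R^{-1} ||_1).

```cpp
#include <algorithm>
#include <cmath>

// 2x2 upper-triangular matrix [[r11, r12], [0, r22]]; illustration only.
struct Upper2 {
    double r11, r12, r22;
};

// Matrix 1-norm: maximum absolute column sum.
inline double norm_one(const Upper2& R)
{
    return std::max(std::fabs(R.r11), std::fabs(R.r12) + std::fabs(R.r22));
}

// Exact inverse of a nonsingular 2x2 upper-triangular matrix.
inline Upper2 inverse(const Upper2& R)
{
    return {1.0 / R.r11, -R.r12 / (R.r11 * R.r22), 1.0 / R.r22};
}

// Reciprocal condition number, given whatever value the caller passes as Rnorm.
inline double rcond_one(const Upper2& R, double Rnorm)
{
    return 1.0 / (Rnorm * norm_one(inverse(R)));
}

// Smallest-singular-value estimate: Rnorm cancels, leaving 1 / ||R^{-1}||_1.
inline double smin_est(const Upper2& R, double Rnorm)
{
    return Rnorm * rcond_one(R, Rnorm);
}
```

For R = [[4, 1], [0, 0.5]], passing the true norm (4) or simply 1.0 as Rnorm gives the same estimate, 1 / || R^{-1} ||_1 = 0.4.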
QDWH computes the polar decomposition A = U * H of a general matrix A, where U is the orthogonal polar factor and H is the Hermitian polar factor.
QDWH relies on Cholesky-based and QR-based iterations to compute the orthogonal polar factor U.
For the QR-based iterations, new customized geqrf_qdwh_full and unmqr_qdwh_full routines are included to take advantage of the identity structure of the matrix involved in those iterations.
QDWH requires a 2-norm estimate of the original matrix; norm2est, implemented using power iteration, is added and called in QDWH.
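For readers unfamiliar with the approach, a 2-norm estimate by power iteration can be sketched as below. This is a serial, dense illustration on `std::vector`, not the distributed norm2est in this PR; the function name `norm2est_sketch`, the starting vector, and the stopping tolerance are all assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // dense, row-major, m x n

// Euclidean norm of a vector.
static double nrm2(const std::vector<double>& v)
{
    double s = 0.0;
    for (double x : v) s += x * x;
    return std::sqrt(s);
}

// Estimate ||A||_2 by power iteration on A^T A.
// Returns 0 if the starting vector happens to lie in the null space of A.
double norm2est_sketch(const Matrix& A, int max_iter = 100, double tol = 1e-12)
{
    const std::size_t m = A.size();
    const std::size_t n = (m > 0) ? A[0].size() : 0;
    if (n == 0) return 0.0;

    std::vector<double> x(n, 1.0 / std::sqrt(double(n)));  // unit start vector
    double est = 0.0;
    for (int it = 0; it < max_iter; ++it) {
        std::vector<double> y(m, 0.0), z(n, 0.0);
        for (std::size_t i = 0; i < m; ++i)        // y = A x
            for (std::size_t j = 0; j < n; ++j)
                y[i] += A[i][j] * x[j];
        for (std::size_t i = 0; i < m; ++i)        // z = A^T y
            for (std::size_t j = 0; j < n; ++j)
                z[j] += A[i][j] * y[i];

        const double znorm = nrm2(z);              // approaches sigma_max^2
        if (znorm == 0.0) return 0.0;
        const double new_est = std::sqrt(znorm);
        for (std::size_t j = 0; j < n; ++j)        // normalize for next step
            x[j] = z[j] / znorm;

        const bool done = std::fabs(new_est - est) <= tol * new_est;
        est = new_est;
        if (done) break;
    }
    return est;
}
```

Convergence speed depends on the gap between the two largest singular values, so a production version would likely cap the iteration count and accept a looser tolerance.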
The following figure presents the performance of SLATE_QDWH on Summit using various numbers of nodes.
